# Importing data from VASP calculations into Citrination

Citrination is a public, un-siloed repository of materials data coupled to analysis and modeling tools.

By putting data on Citrination, you can:
 1. Share incremental results within your group
 1. Supplement your data with similar published data
 1. Release your data to the public as you publish associated papers
 1. Receive feedback on the quality of your DFT calculations
 1. View statistical analyses of the data as it comes in
 1. Build machine learning models on the data that update as data comes in

There are two steps for getting data from any format onto Citrination:
 1. formatting the data as a PIF
 1. uploading to Citrination

## Formatting VASP outputs as PIFs

We provide scripts to extract common conditions and properties from VASP calculations. Pass the path of a calculation directory to `directory_to_pif` and it returns a PIF!

In [1]:
from dfttopif import directory_to_pif
from os import chdir

chdir("./example_data")
rundir = "./Al.cF4"
pif = directory_to_pif(rundir, quality_report=True)

The PIF is a lightweight schema on top of the JSON format:

In [2]:
from pypif.pif import dumps

print(dumps(pif, indent=2)[:200])

{
  "properties": [
    {
      "name": "Converged",
      "scalars": [
        {
          "value": true
        }
      ],
      "conditions": [
        {
          "name": "XC Functional",
        


We'll dig into PIFs more later.
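
In the meantime, because a serialized PIF is plain JSON, the standard-library `json` module is enough to inspect one. The sketch below is only an illustration: `sample` is a hand-written fragment that mirrors the start of the output above, and `summary` is a hypothetical name, not part of `pypif`.

```python
import json

# Hand-written fragment mirroring the start of the output above; a real PIF
# produced by directory_to_pif contains many more properties and fields.
sample = '{"properties": [{"name": "Converged", "scalars": [{"value": true}]}]}'

record = json.loads(sample)

# Map each property name to the list of its scalar values.
summary = {
    prop["name"]: [scalar["value"] for scalar in prop.get("scalars", [])]
    for prop in record.get("properties", [])
}
print(summary)  # {'Converged': [True]}
```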

## Uploading files to Citrination

PIFs and other files can be uploaded to Citrination via the `citrination_client` package.

### Setting up the client

The client authenticates with your API key, which you can find on the "Account" page on Citrination. Keep this key out of your source code; a good practice is to store it in an environment variable and read it from there.

In [3]:
from citrination_client import CitrinationClient
from os import environ

client = CitrinationClient(environ['CITRINATION_API_KEY'], 'https://citrination.com')

We'll use the same client to query and download from Citrination later.
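
If the environment variable is missing, `environ['CITRINATION_API_KEY']` fails with a bare `KeyError`. A small wrapper can give a more readable message. This is a minimal sketch, not part of `citrination_client`: `get_api_key` and its `env` parameter are hypothetical names introduced here for illustration.

```python
import os


def get_api_key(var_name="CITRINATION_API_KEY", env=None):
    """Return the API key stored in `var_name`, or raise a readable error."""
    # `env` defaults to the real environment; passing a dict makes testing easy.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your Citrination API key".format(var_name)
        )
    return key
```

The client could then be created with `CitrinationClient(get_api_key(), 'https://citrination.com')`, the same call as in the cell above.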

### Creating a dataset

To upload data, we'll need to specify a dataset for it to live in. You'll only need to do this once. There are two ways to proceed.

First: this notebook uses the `citrination_client` to create a dataset for you to use in this and other tutorials. 

Second: you can create a dataset on the website using the directions below. You can also share datasets via the groups tab.

1. Log in to citrination.com
2. Navigate to `Add Data`
3. Enter a name (e.g. "tutorial dataset")
4. Select `dummy_csv.csv` from the `example_data` folder and upload it
5. If the upload succeeds, the page should reload automatically with your dataset_id in the URL:
     * https://citrination.com/datasets/**dataset_id**/show_files
6. Comment out the cell below and set `dataset_id` to the dataset_id found in step 5 above.



In [4]:
# Comment out this cell if you have an ID from a dataset you created via the website

import random
import string
import json

# Append a random 5-character suffix so the dataset name is unlikely to collide
random_string = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(5))
dataset_name = "Tutorial dataset " + random_string

# Create the dataset and decode the raw response body
dataset = client.create_data_set(name=dataset_name, description="Dataset for tutorial", share=0).content.decode('utf-8')

# The new dataset's ID is in the "id" field of the JSON response
dataset_id = json.loads(dataset)['id']

### Uploading a PIF to a dataset

The client uploads files, so we first write the PIF to `pif.json`.

We may also want to add a tag to the PIF, which will make it easier to search and filter PIFs later.

In [5]:
from pypif.pif import dump
from os.path import join

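# Tag the PIF so it is easier to search for and filter later
pif.tags = ["my_first_upload"]
# Write the PIF into the calculation directory, then upload the whole directory
with open(join(rundir, "pif.json"), "w") as fp:
    dump(pif, fp)
client.upload_file(rundir, dataset_id)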

'{"message": "Upload of files is complete.", "successes": ["./Al.cF4/INCAR", "./Al.cF4/pif.json", "./Al.cF4/OUTCAR", "./Al.cF4/POSCAR"], "failures": []}'
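
The response above is a JSON string, so it can be checked programmatically instead of by eye. The sketch below assumes the response always has the shape shown above (a string with `successes` and `failures` lists); other versions of the client may return something different. `failed_uploads` is a hypothetical helper, not part of `citrination_client`.

```python
import json


def failed_uploads(response_text):
    """Return the paths listed under "failures" in an upload response string."""
    report = json.loads(response_text)
    return report.get("failures", [])


# The response string shown above, copied verbatim.
response = '{"message": "Upload of files is complete.", "successes": ["./Al.cF4/INCAR", "./Al.cF4/pif.json", "./Al.cF4/OUTCAR", "./Al.cF4/POSCAR"], "failures": []}'

failures = failed_uploads(response)
if failures:
    print("These files failed to upload:", failures)
else:
    print("All files uploaded")
```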

## Hands-on: upload your DFT calculations

 * If you have your own DFT calculations, try formatting and uploading them
 * If you don't, you can use the Al-Cu data in `example_data`
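
If your calculations are spread over many directories, one way to collect them is to walk the directory tree and pick out the run directories. This is a minimal sketch under one assumption: a directory counts as a VASP run if it contains a file named `OUTCAR`. That criterion, and the helper name `find_vasp_runs`, are illustrative and not taken from `dfttopif`, which may need other files as well.

```python
import os


def find_vasp_runs(root):
    """Yield every directory under `root` that contains an OUTCAR file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # Assumption: the presence of OUTCAR marks a VASP calculation directory.
        if "OUTCAR" in filenames:
            yield dirpath
```

Each path it yields can be handled like `rundir` in the cells above: pass it to `directory_to_pif`, write the result to `pif.json` in that directory, and upload the directory with `client.upload_file`.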