![Banner logo](https://raw.githubusercontent.com/CitrineInformatics/community-tools/master/templates/fig/citrine_banner_2.png)

# Importing data from VASP calculations into Citrination

*Authors: Max Hutchinson, Carena Church, Enze Chen*

This is the first tutorial in a sequence that teaches you how to import density functional theory (DFT) data into Citrination. [Citrination](https://citrination.com) is a public, un-siloed repository of materials data coupled to analysis and modeling tools.

By putting data on Citrination, you can:
 1. Share incremental results within your group.
 1. Supplement your data with similar published data.
 1. Release your data to the public as you publish associated papers.
 1. Receive feedback on the quality of your DFT calculations.
 1. View statistical analyses of the data as it comes in.
 1. Build machine learning models on the data that update as data comes in.

There are two steps for getting data from any format onto Citrination:
1. Formatting the data as a [Physical Information File (PIF)](http://citrineinformatics.github.io/pif-documentation/schema_definition/index.html).
1. Uploading the PIF to Citrination.

## Python package imports

In [None]:
# Standard packages
import os
import uuid

# Third-party packages
from dfttopif import directory_to_pif
from pypif import pif
from pypif.obj import *
from citrination_client import CitrinationClient

## Formatting VASP outputs as PIFs

The `dfttopif` package extracts common conditions and properties from VASP calculations. Pass the path of a calculation directory to `directory_to_pif` and it returns a PIF.

In [None]:
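# Work from the example data folder (the path is relative, so run this cell only once)
os.chdir("./example_data")
# Directory holding the example Al calculation
rundir = "./Al.cF4"
# Parse the VASP outputs into a PIF and request a quality report
my_pif = directory_to_pif(rundir, quality_report=True)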

The PIF is a lightweight schema on top of the JSON format:

In [None]:
print(pif.dumps(my_pif, indent=4)[:200])
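
To make "a lightweight schema on top of JSON" concrete, here is a minimal sketch that uses only the standard `json` module. The record is written by hand and is hypothetical: the field names are assumed to follow the PIF schema, and the energy value is invented for illustration, not taken from the Al calculation above.

```python
import json

# Hypothetical, hand-written record that mimics the shape of a PIF.
# Field names are assumed to match the PIF schema; the values are made up.
record = {
    "category": "system.chemical",
    "chemicalFormula": "Al",
    "properties": [
        {
            "name": "Total Energy",
            "scalars": [{"value": -3.74}],
            "units": "eV",
        }
    ],
    "tags": ["my_first_upload"],
}

# Serialize to JSON text and read it back, as any JSON tool could.
text = json.dumps(record, indent=4)
round_tripped = json.loads(text)

# Walk the properties the same way you would for any nested JSON document.
for prop in round_tripped["properties"]:
    print(prop["name"], prop["scalars"][0]["value"], prop["units"])
```

Because a PIF is plain JSON underneath, generic tools can read it even without `pypif` installed.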

We'll dig into PIFs more later.

## Uploading files to Citrination

PIFs and other files can be uploaded to Citrination via the `citrination_client` package.

### Setting up the client

The client authenticates with your API key, which you can find on the "Account" page on Citrination. Keep the key out of your source code; storing it in an environment variable and reading it from there is a best practice.
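
If the environment variable is missing, a plain `os.environ[...]` lookup fails with a bare `KeyError`. The sketch below shows one way to give a clearer message. The helper name `get_api_key` is hypothetical and not part of `citrination_client`; only the variable name `CITRINATION_API_KEY` comes from this tutorial.

```python
import os


def get_api_key(env_var="CITRINATION_API_KEY"):
    """Return the API key stored in env_var, or raise a readable error.

    Hypothetical helper for illustration; not part of citrination_client.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your Citrination API key.".format(env_var)
        )
    return key
```

You could then pass `get_api_key()` as the `api_key` argument when constructing the client.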

In [None]:
site = 'https://citrination.com' # public site
client = CitrinationClient(api_key=os.environ['CITRINATION_API_KEY'], 
                           site=site)

We'll use the same client to query and download from Citrination later.

### Creating a dataset

To upload data, we'll need to specify a dataset for it to live in. You'll only need to do this once. There are two ways to proceed.

First, this notebook can use `citrination_client` to create a dataset for you to use in this and other tutorials.

Second, you can create a dataset on the website by following the directions below. You can also share datasets via the Teams tab.

1. Log in to https://citrination.com
1. Navigate to **Add Data**.
1. Enter a name (e.g. "Tutorial dataset").
1. Select `dummy_csv.csv` from the `example_data` folder and upload it.
1. If successful, the page should automatically reload with your **dataset_id** in the URL:
     * https://citrination.com/datasets/**dataset_id**/show_files
1. Comment out the cell below and set `dataset_id` to the **dataset_id** found in step 5 above.
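
If you would rather not copy the ID by hand, the sketch below pulls it out of a URL shaped like the one in step 5. The helper name `dataset_id_from_url` is hypothetical, the example ID is made up, and the code assumes the ID is an integer.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url):
    """Extract the ID from a URL of the form .../datasets/<id>/...

    Hypothetical helper; assumes the dataset ID is an integer.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if "datasets" not in parts:
        raise ValueError("No 'datasets' segment in URL: {}".format(url))
    idx = parts.index("datasets")
    if idx + 1 >= len(parts):
        raise ValueError("No ID after 'datasets' in URL: {}".format(url))
    return int(parts[idx + 1])


# The ID 12345 below is invented for illustration.
example_id = dataset_id_from_url("https://citrination.com/datasets/12345/show_files")
print(example_id)
```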

In [None]:
# Comment out this cell if you have an ID from a dataset you created via the website
dataset_name = "Tutorial dataset " + str(uuid.uuid4())[:6]
dataset = client.data.create_dataset(name=dataset_name, 
                                     description="Dataset for VASP tutorial.", 
                                     public=False)
dataset_id = dataset.id
print('Dataset created! {}/datasets/{}'.format(site, dataset_id))

### Uploading a PIF to a dataset

The client uploads files rather than in-memory objects, so we first write the PIF to `pif.json` inside the calculation directory and then upload that directory.

We may also want to add a tag to the PIF, which will make it easier to search and filter PIFs later.

In [None]:
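# Tag the PIF so it is easier to search for and filter later
my_pif.tags = ["my_first_upload"]
# Write the PIF next to the VASP files, then upload the whole directory
with open(os.path.join(rundir, "pif.json"), "w") as fp:
    pif.dump(my_pif, fp)
res = client.data.upload(dataset_id, rundir)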

## Hands-on: Upload your DFT calculations

* If you have your own DFT calculations, try formatting and uploading them.
* If you don't, you can use the `Al-Cu` data in `example_data`.
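
When a folder holds several calculations, it helps to list the run directories first. The sketch below walks a folder and collects every directory that contains an `OUTCAR` file. Both the helper name `find_vasp_runs` and the idea that each run directory holds an `OUTCAR` are assumptions; the layout of the `Al-Cu` folder is not described in this tutorial.

```python
import os


def find_vasp_runs(root, marker="OUTCAR"):
    """Return sorted paths of directories under root that contain marker.

    Hypothetical helper; assumes each VASP run directory holds an OUTCAR file.
    """
    runs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if marker in filenames:
            runs.append(dirpath)
    return sorted(runs)
```

Each path returned could then be passed to `directory_to_pif` and uploaded the same way as `rundir` above.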