# Advanced PIF tutorial
*Authors: Max Hutchinson, Carena Church, Enze Chen*

In this notebook, we will explore the **Physical Information File (PIF)** structure and [`pypif`](https://github.com/CitrineInformatics/pypif/tree/master/pypif) package in greater detail. At the end, we will use the [Python Citrination Client](http://citrineinformatics.github.io/python-citrination-client/tutorial/tutorial.html) to upload the PIFs to [Citrination](https://citrination.com/).

## Learning outcomes
By the end of this notebook, you should understand:
* The basic PIF structure and its advantages.
* How to use the `pypif` package to create and save PIFs.
* How the `DataClient` can upload data to Citrination.

## Python package imports

In [1]:
# Standard packages
import random
import string
import json
from os import environ

# Third-party packages
from pypif import pif
from pypif.obj import *
from citrination_client import CitrinationClient

## The PIF structure
The PIF is a __general__, __flexible__, and __hierarchical__ schema, stored as a JSON file, for representing information about physical devices and materials. These qualities let the PIF store a wide range of information on many kinds of physical systems, but they also require more careful thought about where to store information within the schema.

* The PIF focuses on information specific to physical systems, but is sufficiently __general__ to handle a wide range of systems.
* The PIF does not impose a rigid schema, and instead is __flexible__ in exactly where data is stored.
* The PIF has a __hierarchical__ structure that can store information on multiple levels.
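
Because a PIF is ultimately JSON, its shape can be sketched with plain Python dictionaries before any `pypif` classes are involved. The record below is only an illustration: it is written by hand, its key names are taken from the JSON output printed later in this notebook, and it is not validated against the PIF schema.

```python
import json

# A hand-written, PIF-like record built from plain dicts (no pypif needed).
# Key names mirror the JSON that pypif prints later in this notebook.
record = {
    "category": "system.chemical",
    "chemicalFormula": "Li0.0024Ni0.9976O",
    "properties": [
        {"name": "Crystallinity", "scalars": [{"value": "Polycrystalline"}]},
    ],
}

# Serialize the nested structure to an indented JSON string.
print(json.dumps(record, indent=4))

# Walk the hierarchy: system -> property -> scalar -> value.
first_property = record["properties"][0]
crystallinity = first_property["scalars"][0]["value"]
print(crystallinity)
```

The nesting is what makes the schema hierarchical: a value lives inside a scalar, which lives inside a property, which lives inside a system.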

Full documentation on the PIF schema can be [found online](http://citrineinformatics.github.io/pif-documentation/schema_definition/index.html), and the `pypif` code base is [public on GitHub](https://github.com/CitrineInformatics/pypif/tree/master).

## System

A [System](http://citrineinformatics.github.io/pif-documentation/schema_definition/system/System.html) is the primary building block in a PIF record. Systems contain three general categories of information relevant to physical systems:

* Identifiers - What is this?

* Preparation - How was this made?

* Properties - How does this perform and what are its characteristics?

When appropriate, a System can be sub-classed to provide more specific information. For example, [ChemicalSystem](http://citrineinformatics.github.io/pif-documentation/schema_definition/system/chemical/ChemicalSystem.html) contains all of the fields in System, but also adds fields for composition and chemical formula.

In [2]:
my_pif = ChemicalSystem()
print(pif.dumps(my_pif, indent=4))

{
    "category": "system.chemical"
}


The `dumps()` method used above converts a PIF object (or a list of PIF objects) into a JSON-encoded string.

## Identifiers
Identifiers tell the user which system a PIF record describes. Identifiers can take the form of:
* `chemical_formula`: A string representing the chemical formula.
* `names`: An array of strings representing the common names of the chemical system.
* `composition`: An array of [Composition](http://citrineinformatics.github.io/pif-documentation/schema_definition/system/chemical/common/Composition.html) objects specifying the elements and relative atomic/weight percentages.
* `ids`: An array of [Id](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/Id.html) objects associated with this system.

In [3]:
my_pif.chemical_formula = "Li0.0024Ni0.9976O"
print(pif.dumps(my_pif, indent=4))

{
    "category": "system.chemical",
    "chemicalFormula": "Li0.0024Ni0.9976O"
}


## Preparation
Processing information can also be included to detail how this material was made. The `preparation` field is an array of [ProcessStep](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/ProcessStep.html) objects with attributes including:
* `name`: A string representing the name of the process step.
* `details`: An array of [`Value`](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/Value.html) objects.

In [5]:
heat = ProcessStep(name="Heat Treatment")
heat.details = [Value(name="Temperature", 
                      scalars=[Scalar(value='600')], 
                      units='K')]
my_pif.preparation = [heat]
print(pif.dumps(my_pif, indent=4))

{
    "preparation": [
        {
            "name": "Heat Treatment",
            "details": [
                {
                    "name": "Temperature",
                    "scalars": [
                        {
                            "value": "600"
                        }
                    ],
                    "units": "K"
                }
            ]
        }
    ],
    "category": "system.chemical",
    "chemicalFormula": "Li0.0024Ni0.9976O"
}
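
Note that the temperature above is stored as the string `'600'`, while the property cells later in this notebook store plain numbers. If a consistent numeric form is preferred, values can be coerced before they are placed in a PIF. The helper below is a hypothetical sketch (`coerce_scalar` is not part of `pypif`) and works on the plain-dict form of a detail rather than on `pypif` objects.

```python
def coerce_scalar(value):
    """Return value as a float when it parses as a number, else unchanged."""
    if isinstance(value, (int, float)):
        return value
    try:
        return float(value)
    except (TypeError, ValueError):
        # Non-numeric strings such as "Polycrystalline" are left alone.
        return value

# Plain-dict stand-in for the "Temperature" detail shown above.
detail = {"name": "Temperature", "scalars": [{"value": "600"}], "units": "K"}

cleaned = [coerce_scalar(s["value"]) for s in detail["scalars"]]
print(cleaned)
```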


## Properties
Properties describe how this material performs and its various characteristics. The `properties` field is an array of [Property](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/Property.html) objects with a diverse set of attributes, including:
* `name`: A string representing the name of the property.
* `scalars`: An array of [`Scalar`](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/Scalar.html) objects, which contain the measured values with uncertainty.
* `units`: A string representing the units of the property value.
* `conditions`: An array of `Value` objects specifying the external conditions under which the property was measured or observed.
* `data_type`: A specific string from the set `{MACHINE_LEARNING, FIT, COMPUTATIONAL, EXPERIMENTAL}`.

However, you will see below that the PIF structure is very flexible and will accept slight deviations in format (e.g. how `scalars` are written). We recommend that you pick a syntax you like and *stay consistent* (so don't do what we're doing here).

In [6]:
crystallinity = Property(name="Crystallinity", scalars=[Scalar(value="Polycrystalline")])

# Raw string keeps the backslash in "\Ohm" literal and avoids an invalid-escape warning
resistivity = Property(name="Resistivity", units=r"\Ohm*cm",
                       scalars=[Scalar(value=28.8677), 
                                Scalar(value=0.2629), 
                                Scalar(value=0.0466)])
resistivity.conditions = [Value(name="Temperature", units="K", 
                                scalars=[Scalar(value=400), 
                                         Scalar(value=700), 
                                         Scalar(value=1000)])]

power_factor = Property(name="Power factor", units="W/mK", 
                        scalars=[Scalar(value=1.21E-4), 
                                 Scalar(value=1.66E-2), 
                                 Scalar(value=1.48E-1)])
power_factor.conditions = [Value(name="Temperature", units="K", 
                                 scalars=[Scalar(value=400), 
                                          Scalar(value=700), 
                                          Scalar(value=1000)])]

my_pif.properties = [crystallinity, resistivity, power_factor]

print(pif.dumps(my_pif, indent=4)[:800])
print('... omitted for space ...')

{
    "properties": [
        {
            "name": "Crystallinity",
            "scalars": [
                {
                    "value": "Polycrystalline"
                }
            ]
        },
        {
            "name": "Resistivity",
            "scalars": [
                {
                    "value": 28.8677
                },
                {
                    "value": 0.2629
                },
                {
                    "value": 0.0466
                }
            ],
            "units": "\\Ohm*cm",
            "conditions": [
                {
                    "name": "Temperature",
                    "scalars": [
                        {
                            "value": 400
                        },
                        {
                   
... omitted for space ...
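
In the cell above, each property lists three values and its `Temperature` condition lists three temperatures, which suggests they are matched by position. A minimal sketch of that pairing, written against the plain-dict (JSON) form of the property rather than `pypif` objects, could look like this. The helper name `pair_with_condition` is hypothetical, and positional alignment is an assumption based on how this notebook lays out its data.

```python
# Plain-dict stand-in for the Resistivity property built above.
resistivity = {
    "name": "Resistivity",
    "units": "\\Ohm*cm",
    "scalars": [{"value": 28.8677}, {"value": 0.2629}, {"value": 0.0466}],
    "conditions": [
        {"name": "Temperature", "units": "K",
         "scalars": [{"value": 400}, {"value": 700}, {"value": 1000}]}
    ],
}

def pair_with_condition(prop, condition_name):
    """Return (condition value, property value) pairs, matched by position."""
    condition = next(c for c in prop["conditions"] if c["name"] == condition_name)
    cond_values = [s["value"] for s in condition["scalars"]]
    prop_values = [s["value"] for s in prop["scalars"]]
    if len(cond_values) != len(prop_values):
        raise ValueError("condition and property arrays differ in length")
    return list(zip(cond_values, prop_values))

pairs = pair_with_condition(resistivity, "Temperature")
for temperature, value in pairs:
    print(temperature, value)
```

The length check is there because nothing in this sketch prevents the two arrays from drifting apart, and a silent mismatch would pair values with the wrong temperatures.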


## Reference information
Reference information can be included as an array of [`Reference`](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/Reference.html) objects in the `references` field. `Reference` objects have a large set of fields, including:
* `doi`: A string representing the DOI of the work.
* `url`: A string representing the URL of the website where the work can be accessed.
* `title`: A string representing the title of the work.
* `authors`: An array of [`Name`](http://citrineinformatics.github.io/pif-documentation/schema_definition/common/Name.html) objects, or just strings.

In [7]:
my_pif.references = [Reference(doi='10.1143/JJAP.38.L1336', 
                               url='https://iopscience.iop.org/article/10.1143/JJAP.38.L1336',
                               title='Li-Doped Nickel Oxide as a Thermoelectric Material',
                               authors=['Woosuck Shin', 'Norimitsu Murayama'])]
print(pif.dumps(my_pif, indent=4)[:400])
print('... omitted for space ...')

{
    "references": [
        {
            "doi": "10.1143/JJAP.38.L1336",
            "url": "https://iopscience.iop.org/article/10.1143/JJAP.38.L1336",
            "title": "Li-Doped Nickel Oxide as a Thermoelectric Material",
            "authors": [
                "Woosuck Shin",
                "Norimitsu Murayama"
            ]
        }
    ],
    "properties": [
        {
            "na
... omitted for space ...


## Hierarchical
In addition to the hierarchical structure we just described, a system can also store `sub_systems`, a list of additional `System` objects that make up the system.

In [8]:
thermoelectric_module = System(names=["Thermoelectric module"])
thermoelectric_module.sub_systems = [my_pif]
print(pif.dumps(thermoelectric_module, indent=4)[:300])
print('... omitted for space ...')

{
    "names": [
        "Thermoelectric module"
    ],
    "subSystems": [
        {
            "references": [
                {
                    "doi": "10.1143/JJAP.38.L1336",
                    "url": "https://iopscience.iop.org/article/10.1143/JJAP.38.L1336",
                    "title": 
... omitted for space ...
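
Since sub-systems can themselves contain sub-systems, reading such a record back usually means walking a tree. The sketch below does this recursively over the plain-dict (JSON) form, using the `subSystems` key seen in the output above. The function name `iter_systems` and the labelling rule are illustrative choices, not part of `pypif`.

```python
def iter_systems(system, depth=0):
    """Yield (depth, system) for a system dict and all nested subSystems."""
    yield depth, system
    for child in system.get("subSystems", []):
        yield from iter_systems(child, depth + 1)

# Trimmed-down version of the thermoelectric module from the cell above.
module = {
    "names": ["Thermoelectric module"],
    "subSystems": [
        {"category": "system.chemical", "chemicalFormula": "Li0.0024Ni0.9976O"}
    ],
}

labels = []
for depth, system in iter_systems(module):
    # Prefer the first name; fall back to the chemical formula if unnamed.
    label = (system.get("names") or [system.get("chemicalFormula", "<unnamed>")])[0]
    labels.append((depth, label))
    print("  " * depth + label)
```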


## Citrination metadata
Citrination metadata includes identifiers for querying and navigation:
* `uid`: A string representing the permanent ID associated with the system.
* `tags`: An array of strings representing tags that apply to the system.

In [9]:
my_pif.uid = "my-pif"
my_pif.tags = ["my_second_upload"]
print(pif.dumps(my_pif, indent=4)[:500])
print('... omitted for space ...')

{
    "tags": [
        "my_second_upload"
    ],
    "references": [
        {
            "doi": "10.1143/JJAP.38.L1336",
            "url": "https://iopscience.iop.org/article/10.1143/JJAP.38.L1336",
            "title": "Li-Doped Nickel Oxide as a Thermoelectric Material",
            "authors": [
                "Woosuck Shin",
                "Norimitsu Murayama"
            ]
        }
    ],
    "uid": "my-pif",
    "properties": [
        {
            "name": "Crystallinity",
         
... omitted for space ...


## The full PIF structure tree
The following image represents the tree structure for the major fields that you'll encounter when working with PIFs. It is not exhaustive, and is current as of `pypif 2.1.1`.

![PIF structure](fig/pif_structure.png "PIF structure")

## More on PIF I/O
So far, we have relied on the `dumps()` method to turn PIF objects into strings for output. The `dump()` method instead writes PIF objects (first argument) to a file object (second argument). An example is shown later in this notebook, in the upload step.

Analogously, `loads()` and `load()` perform the same conversions in the opposite direction.
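
These four names follow the convention of Python's built-in `json` module, where the trailing `s` stands for "string". As a stand-in that runs without `pypif`, the sketch below uses `json` on a plain dict and an in-memory `io.StringIO` buffer in place of a real file. With `pypif`, the same pattern would be expected to apply to PIF objects, but that is not exercised here.

```python
import io
import json

record = {"category": "system.chemical", "chemicalFormula": "Li0.0024Ni0.9976O"}

# dumps / loads: object <-> string
as_string = json.dumps(record)
from_string = json.loads(as_string)

# dump / load: object <-> file-like object
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind before reading back
from_file = json.load(buffer)

print(from_string == from_file)
```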

## Creating a dataset

If you have been completing the API Example tutorials in the order in which they're presented in the [README](README.md), feel free to use the dataset ID you created previously. Otherwise, the cell below will use the `citrination_client` to create a new dataset and corresponding `dataset_id`.

In [9]:
# Skip this cell if you have an ID from a dataset you created via the website

# Initialize the client
client = CitrinationClient(environ['CITRINATION_API_KEY'], 'https://citrination.com')
data_client = client.data

# The following lines create a dataset to use throughout the tutorial notebooks
random_string = ''.join([random.choice(string.ascii_uppercase + string.digits) for i in range(5)])
dataset_name = "Tutorial dataset " + random_string
dataset = data_client.create_dataset(name=dataset_name, description="Dataset for tutorial", public=False)
dataset_id = dataset.id  # use the public property rather than the private _id attribute
print(dataset_id)

174256


Now we will save our PIF object into a file and upload the file to our dataset on Citrination.

In [17]:
filename = "example_data/pif.json"
with open(filename, 'w') as fp:
    pif.dump(my_pif, fp)
result = data_client.upload(dataset_id, filename)
print('Upload successful? {0}'.format(result.successful()))

Upload successful? True


Once uploaded, a PIF can be viewed at a URL of the form `http://citrination.com/pifs/{dataset_id}/{version_number}/{uid}`.
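
As a small sketch, the URL can be assembled from the values used in this notebook. The helper `pif_url` is hypothetical, and the version number `1` is an assumption for a freshly created dataset; check the dataset page on Citrination for the actual version.

```python
def pif_url(dataset_id, version_number, uid, base="http://citrination.com"):
    """Fill in the PIF URL template given above."""
    return "{0}/pifs/{1}/{2}/{3}".format(base, dataset_id, version_number, uid)

# dataset_id and uid come from earlier cells; version 1 is assumed.
url = pif_url(174256, 1, "my-pif")
print(url)
```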

## Conclusion
This concludes the advanced tutorial on PIFs, where we dug into the details of the PIF hierarchy and how to use the `pypif` package. As mentioned before, you should now understand:
* The basic PIF structure and its advantages.
* How to use the `pypif` package to create and save PIFs.
* How the `DataClient` can upload data to Citrination.