![Banner logo](../templates/fig/citrine_banner_2.png "Banner logo")

# PyCC Data Views API Tutorial

*Authors: Enze Chen, Eric Lundberg*

In this notebook, we will cover how to *create* a data view using the [Citrination API](http://citrineinformatics.github.io/python-citrination-client/). Data views provide the configuration needed to perform machine learning and identify relationships in your data. We will demonstrate this functionality using the [Band gaps from Strehlow and Cook](https://citrination.com/datasets/1160/show_search?searchMatchOption=fuzzyMatch) dataset, for which we will create a view with the following mapping:

$$\text{Chemical formula (inorganic) + Crystallinity (categorical)} \longrightarrow \boxed{\text{ML model}} \longrightarrow \text{Band gap (real)}$$

## Table of contents
1. [Learning outcomes](#Learning-outcomes)
1. [Background knowledge](#Background-knowledge)
1. [Imports](#Python-package-imports)
1. [DataViewBuilder](#DataViewBuilder)
1. [DataViewsClient](#DataViewsClient)
1. [View properties](#Data-view-properties-and-analysis)
1. [ModelsClient](#ModelsClient-methods)
1. [Conclusion](#Conclusion)
1. [Additional resources](#Additional-resources)

## Learning outcomes

[Back to ToC](#Table-of-contents)

By the end of this tutorial, you will know how to:
* Create [`DataViewBuilder`](https://github.com/CitrineInformatics/python-citrination-client/blob/master/citrination_client/views/data_view_builder.py) objects.
* Create new data views from existing data using the [`DataViewsClient`](https://github.com/CitrineInformatics/python-citrination-client/blob/master/citrination_client/views/client.py).
* Perform operations on views using the `DataViewsClient`.

## Background knowledge

[Back to ToC](#Table-of-contents)

In order to get the most out of this tutorial, you should already be familiar with the following:
* Creating and accessing datasets through the API ([documentation](http://citrineinformatics.github.io/python-citrination-client/tutorial/data_examples.html) and [tutorial](1_data_client_api_tutorial.ipynb)).
* The data views [front-end UI](https://citrination.com/data_views).

## Python package imports

[Back to ToC](#Table-of-contents)


In [None]:
# Standard packages
import json
import os
import time
import uuid # generating random IDs

# Third-party packages
from citrination_client import (
    CategoricalDescriptor,
    CitrinationClient,
    InorganicDescriptor,
    RealDescriptor,
)
from citrination_client.views.data_view_builder import DataViewBuilder

## DataViewBuilder

[Back to ToC](#Table-of-contents)

The [`DataViewBuilder`](https://github.com/CitrineInformatics/python-citrination-client/blob/master/citrination_client/views/data_view_builder.py) class handles the configuration for data views and returns a **configuration** object that is an input for the `DataViewsClient`. The configuration specifies the datasets, model, and descriptors. Some of the important parameters to note are:
* **dataset_ids**: An array of strings, one for each dataset ID that should be included in the view.
* **descriptors**: A descriptor instance, which can be one of `RealDescriptor`, `InorganicDescriptor`, `OrganicDescriptor`, `CategoricalDescriptor`, or `AlloyCompositionDescriptor`.
    * **Note 1**: Chemical formulas for the API take the key `formula`.
    * **Note 2**: Properties take the key `Property <property name>`.
* **roles**: A role for each descriptor, as a string, which can be one of `input`, `output`, `latentVariable`, or `ignored`.

In [None]:
# Create ML configuration
dv_builder = DataViewBuilder()
dv_builder.dataset_ids(['172242']) # ID number for band gaps dataset

# Define descriptors
crystallinity = ['Single crystalline', 'Polycrystalline', 'Amorphous'] # Obtained from dataset
desc_crystal = CategoricalDescriptor(key='Property Crystallinity', categories=crystallinity)
dv_builder.add_descriptor(descriptor=desc_crystal, role='input')

desc_formula = InorganicDescriptor(key='formula', threshold=1.0) # threshold must be <= 1.0; this value becomes the default in future releases
dv_builder.add_descriptor(descriptor=desc_formula, role='input')

desc_bandgap = RealDescriptor(key='Property Band gap', lower_bound=0.0, upper_bound=1e2, units='eV')
dv_builder.add_descriptor(descriptor=desc_bandgap, role='output')

# Build the configuration once all the pieces are in place
view_config = dv_builder.build()
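
The key and role conventions listed above are easy to mistype, so it can help to check them before calling `add_descriptor()`. The following is a minimal sketch in plain Python that does not call the Citrination client at all. The helper names `property_key` and `check_role` are hypothetical, and treating the four roles listed above as the complete set is an assumption.

```python
# Roles listed in this tutorial; assumed here to be the complete set.
VALID_ROLES = {"input", "output", "latentVariable", "ignored"}


def property_key(name):
    """Build a property descriptor key, e.g. 'Property Band gap'."""
    return "Property {}".format(name.strip())


def check_role(role):
    """Return the role unchanged, or raise ValueError if it is not recognized."""
    if role not in VALID_ROLES:
        raise ValueError(
            "Unknown role {!r}; expected one of {}".format(role, sorted(VALID_ROLES))
        )
    return role


# Hypothetical plan mirroring the view built above: (descriptor key, role)
plan = [
    (property_key("Crystallinity"), check_role("input")),
    ("formula", check_role("input")),  # chemical formulas use the key 'formula'
    (property_key("Band gap"), check_role("output")),
]

for key, role in plan:
    print("{:<25} -> {}".format(key, role))
```

A misspelled role such as `'Input'` raises a `ValueError` locally, before any configuration is sent to the server.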

## DataViewsClient

[Back to ToC](#Table-of-contents)

Once you have your customized configuration, initialize a [`DataViewsClient`](https://github.com/CitrineInformatics/python-citrination-client/blob/master/citrination_client/views/client.py) instance and use it to create a data view from that configuration. The `create()` method returns the ID of the new data view, which you will need for subsequent analysis and retraining.

In [None]:
# Instantiate the base CitrinationClient
site = 'https://citrination.com'
client = CitrinationClient(api_key=os.environ.get('CITRINATION_API_KEY'), site=site)

# Instantiate the DataViewsClient
views_client = client.data_views

# Create a data view using the above configuration and store the ID
view_name = 'PyCC View ' + str(uuid.uuid4()) # random name to avoid clashes
view_desc = 'This view was created by the PyCC API tutorial.'
view_id = views_client.create(configuration=view_config, name=view_name, description=view_desc)
print('Data view {} was successfully created.'.format(view_id))
print('It can be accessed at {}/data_views/{}.'.format(site, view_id))
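
Note that `os.environ.get('CITRINATION_API_KEY')` returns `None` when the variable is not set, so a missing key is not reported at the point where it is read. The sketch below shows one way to fail early with a clear message. The helper `require_api_key` is hypothetical and not part of the Citrination client, and how the client itself handles a missing key is not covered here.

```python
import os


def require_api_key(var_name="CITRINATION_API_KEY", environ=None):
    """Return the API key stored in var_name, or raise if it is missing or empty."""
    environ = os.environ if environ is None else environ
    key = environ.get(var_name)
    if not key:
        raise RuntimeError(
            "Environment variable {} is not set.".format(var_name)
        )
    return key


# Demonstration with a stand-in mapping instead of the real environment
print(require_api_key(environ={"CITRINATION_API_KEY": "dummy-key"}))
```

In the cell above, the result of `require_api_key()` could be passed as the `api_key` argument in place of the direct `os.environ.get(...)` call.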

## Data view properties and analysis

[Back to ToC](#Table-of-contents)

Now that the view is on your Citrination site, you can use its ID to run a variety of analyses. For example, the `get()` method returns the view's metadata in a JSON-like format, so individual fields can be extracted by key.

In [None]:
view_metadata = views_client.get(view_id)
print('Name of view: {}'.format(view_metadata['name']))
print('Column names: {}'.format(view_metadata['selected_columns']))
print('Descriptor roles: {}'.format(view_metadata['configuration']['roles']))
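
Once the metadata is available, ordinary Python is enough to reorganize it. The sketch below groups descriptor keys by role. It runs on a hand-written stand-in dictionary rather than a live response. The three keys it reads (`name`, `selected_columns`, `configuration` / `roles`) come from the cell above, but the shape of `roles` as a mapping from descriptor key to role is an assumption, as are the sample values and the helper name `keys_by_role`.

```python
from collections import defaultdict

# Hypothetical stand-in for the result of views_client.get(view_id).
# Assumption: 'roles' maps each descriptor key to its role string.
sample_metadata = {
    "name": "PyCC View example",
    "selected_columns": ["formula", "Property Crystallinity", "Property Band gap"],
    "configuration": {
        "roles": {
            "formula": "input",
            "Property Crystallinity": "input",
            "Property Band gap": "output",
        }
    },
}


def keys_by_role(metadata):
    """Group descriptor keys by role, with keys sorted within each role."""
    grouped = defaultdict(list)
    for key, role in metadata["configuration"]["roles"].items():
        grouped[role].append(key)
    return {role: sorted(keys) for role, keys in grouped.items()}


for role, keys in keys_by_role(sample_metadata).items():
    print("{}: {}".format(role, keys))
```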

### Check status of services
If there's a lot of data, training might take some time, and you might want to check when `predict` services are ready. Other possible services include `experimental_design`, `data_reports`, and `model_reports`.

In [None]:
# Use a loop to monitor status
while True:
    predict_state = views_client.get_data_view_service_status(view_id).predict.reason
    print(predict_state)
    if predict_state == 'Predict services are ready.':
        break
    time.sleep(10)
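
The loop above runs until the services report ready, so it never exits if training fails or stalls. A minimal sketch of the same polling pattern with a timeout follows. The helper `wait_until` and its parameters are hypothetical, and the demonstration uses a stand-in check function instead of a live client.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=10.0):
    """Call check() repeatedly; return True once it succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in check that reports "ready" on its third call
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_check, timeout_s=5.0, interval_s=0.01))
```

With the client from this tutorial, `check` could be a function that compares `views_client.get_data_view_service_status(view_id).predict.reason` to `'Predict services are ready.'`, exactly as the loop above does.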

### Deleting a view
You can delete views very easily through the API, so handle with care!

In [None]:
# views_client.delete(id=view_id)
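
Because deletion is a single call, one option is to guard it by first confirming that the ID refers to the view you expect. The sketch below is one such guard. The function `delete_view_checked` and the `FakeViewsClient` class are hypothetical and exist only so the example runs without a live site. The only client call it relies on is `delete(id=...)`, as used in the cell above.

```python
class FakeViewsClient:
    """Hypothetical stand-in that records delete calls instead of sending them."""

    def __init__(self):
        self.deleted = []

    def delete(self, id):
        self.deleted.append(id)


def delete_view_checked(views_client, view_id, expected_name, actual_name):
    """Delete the view only if its actual name matches the expected one."""
    if actual_name != expected_name:
        raise ValueError(
            "Refusing to delete view {}: name is {!r}, expected {!r}".format(
                view_id, actual_name, expected_name
            )
        )
    views_client.delete(id=view_id)
    return True


demo_client = FakeViewsClient()
delete_view_checked(demo_client, "1234", "PyCC View demo", "PyCC View demo")
print(demo_client.deleted)
```

In this notebook, `actual_name` could come from `view_metadata['name']` and `expected_name` from the `view_name` variable defined earlier.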

## ModelsClient methods

[Back to ToC](#Table-of-contents)

The `ModelsClient` is now a [linked attribute](https://github.com/CitrineInformatics/python-citrination-client/blob/c1c34b6f848e8bfcdaf1bb5619ea54afe18220c2/citrination_client/views/client.py#L21) of the `DataViewsClient`, so methods such as `retrain()`, `predict()`, and `submit_design_run()` can be used with the `view_id` we just created.

In [None]:
models_client = views_client.models # the original way is client.models; both return the same object

We'll leave the tutorial for the `ModelsClient` and its associated methods as [a separate notebook](3_models_client_api_tutorial.ipynb).

## Conclusion

[Back to ToC](#Table-of-contents)

To recap, this notebook went through the steps for creating a data view using the API.
1. First, we used the `DataViewBuilder` object to specify the configuration.
2. Then, we used the `DataViewsClient` to create a data view from that configuration, which is simple as long as the configuration is correct.
3. Lastly, we explored some operations on the view, such as retrieving its metadata, checking the status of its services, and accessing the `ModelsClient`.

## Additional resources

[Back to ToC](#Table-of-contents)

It's now possible to conduct the major aspects of the Citrination workflow through the API, which should increase the speed and flexibility of informatics approaches. Some other topics that might interest you include:
* More details regarding client functions in the [code base](https://github.com/CitrineInformatics/python-citrination-client/blob/master/citrination_client/views/client.py).
* [DataClient](http://citrineinformatics.github.io/python-citrination-client/tutorial/data_examples.html) - This allows you to create datasets and upload data (PIF format only) using the API.
  * There is also a corresponding [tutorial](1_data_client_api_tutorial.ipynb).
* [ModelsClient](http://citrineinformatics.github.io/python-citrination-client/tutorial/models_examples.html) - This allows you to submit predict and design runs using the API.
  * There is also a corresponding [tutorial](3_models_client_api_tutorial.ipynb).
  * The `ModelsClient` is actually [linked as an attribute](https://github.com/CitrineInformatics/python-citrination-client/blob/c1c34b6f848e8bfcdaf1bb5619ea54afe18220c2/citrination_client/views/client.py#L21) of the `DataViewsClient`.