## AFLOW-ML API


AFLOW-ML provides an open RESTful API to directly access continuously updated ML models, which can be integrated into any workflow to retrieve predictions of electronic, thermal and mechanical properties. These types of interconnected cloud-based applications have the potential to further accelerate the adoption of machine learning methods into materials development.


### API structure


The AFLOW-ML API is structured around a REpresentational State Transfer (REST) architecture, which allows resources to be accessed using HTTP request methods. Each resource is located at an **endpoint**, which is identified by a URL composed of descriptive nouns.


Resources within the API are represented in JavaScript Object Notation (JSON), and are referred to as **objects**. Once at an endpoint, the user must specify how to interact with the object. This is referred to as an **action**, and is an HTTP request method. The API supports the two most common HTTP request methods, ```GET``` and ```POST```, where ```GET``` fetches an object and ```POST``` sends user-defined data to the server. Users therefore interact with the API by performing actions (```GET```, ```POST```) on endpoints (URLs) to retrieve objects (resources).

#### Endpoints

All endpoints are located at the base URL https://aflow.org/API/aflow-ml/ and are organized by the model and the returned object.

| Action | Endpoint               |        Object          |
| ----  |:-----------------------:| ----------------------:|
| POST  | /plmf/prediction        | Task                   |
| POST  | /mfd/prediction         | Task                   |
| GET   | /prediction/result/{id} |   Status or prediction |

In general, API usage involves uploading a material structure to a POST endpoint, {model}/prediction, and retrieving a prediction object from a GET endpoint, /prediction/result/{id}, as shown in the flowchart below:

![AFLOW-ML workflow](https://drive.google.com/uc?id=1seImUEYSDQ7Kn3GLhMIf9zAsZZjJVx3P)
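
The endpoint layout in the table above can be sketched as two small URL helpers. This is only an illustration: the function names `submission_url` and `result_url` are hypothetical and not part of the API, and the task id used below is a made-up placeholder.

```python
# Base URL and endpoint paths as listed in the endpoint table.
BASE_URL = "https://aflow.org/API/aflow-ml"
MODELS = ("plmf", "mfd")


def submission_url(model):
    """Return the POST endpoint used to submit a structure to a model."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return f"{BASE_URL}/{model}/prediction"


def result_url(task_id):
    """Return the GET endpoint used to fetch the status or prediction."""
    return f"{BASE_URL}/prediction/result/{task_id}"


print(submission_url("plmf"))
print(result_url("example-task-id"))  # placeholder id, not a real task
```

Keeping the model name as the only variable part of the submission URL mirrors the {model}/prediction pattern described above.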

```POST``` endpoints are responsible for the submission of a material structure for a prediction. In their request body, the file keyword is required. It must contain a string representation of the material’s crystal structure, in POSCAR 5 format.
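
As a rough sketch of what such a request body might look like, the snippet below form-encodes a string under the file keyword using Python's standard library, the same encoding used in the Python examples later in this document. The `poscar_text` value is a truncated placeholder chosen for illustration, not a valid structure.

```python
from urllib.parse import parse_qs, urlencode

# Truncated placeholder standing in for the contents of a POSCAR 5 file.
poscar_text = "example structure\n1.0\n 0.0 2.8 2.8\n"

# Encode the text under the required "file" keyword as a form body (bytes).
body = urlencode({"file": poscar_text}).encode("utf-8")

# Decoding the body again recovers the original text unchanged.
decoded = parse_qs(body.decode("utf-8"))["file"][0]
print(decoded == poscar_text)
```

Newlines and spaces in the structure file are percent-encoded, so the multi-line POSCAR text survives transport as a single form field.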

When the server receives a request, it returns a **task object** in the response body. The task object contains information about the submitted structure and has the following format:

```JavaScript
{
     "id": String,
     "model": String,
     "results_endpoint": String
}
```

When a material is posted to the API, a prediction task is created and added to a queue. Each task is assigned an identifier, the id keyword, used to fetch the prediction object at the endpoint referenced in the results_endpoint keyword. This endpoint, /prediction/result/{id}, supports the ```GET``` method and requires the id as a path parameter. Depending on the status of the prediction task, the response body returns a status object or prediction object. 

```JavaScript
// status object 
{
     "status": String,
     "description": String
}
```
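
One possible way to tell the two response types apart is sketched below. It assumes that only prediction objects carry keys beyond status and description, which is inferred from the object formats shown in this document rather than from a documented guarantee. The function name `classify_response` and the sample JSON strings are made up for illustration.

```python
import json


def classify_response(body):
    """Return ("prediction" or "status", parsed object) for a JSON body."""
    obj = json.loads(body)
    # Assumption: a status object holds only these two keys.
    extra_keys = set(obj) - {"status", "description"}
    kind = "prediction" if extra_keys else "status"
    return kind, obj


# Made-up sample bodies; real responses will differ.
pending = '{"status": "PENDING", "description": "queued"}'
done = '{"status": "SUCCESS", "description": "", "model": "plmf", "ml_egap": 0.0}'

print(classify_response(pending)[0])
print(classify_response(done)[0])
```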

Prediction objects are returned depending on the targeted model. Currently we support two models: molar fraction descriptor (mfd) and property-labeled materials fragments (plmf).

### ML models

#### property-labeled materials fragments (plmf)
The plmf model represents each crystal structure as a colored graph, where the atomic vertices are decorated by the reference properties of the corresponding elemental species. Topological neighbors are determined using a Voronoi tessellation, and these nodes are connected to form the graph. The final feature vector for the ML model is obtained by partitioning the full graph into smaller subgraphs, termed fragments in analogy with the fragment-based descriptors used in cheminformatics.

All plmf models are built with the Gradient Boosting Decision Tree (GBDT) method. Models for electronic and thermo-mechanical properties were trained on 26,674 and 2,829 materials entries from the AFLOW repository, respectively. All models are validated through y-randomization (label scrambling) and fivefold cross validation, with coefficient of determination (r^2) values in excess of 0.9 for most quantities. The mean absolute errors (MAE) and root mean square errors (RMSE) are:

| Property                             | MAE           | RMSE          |
| ------------------------------------ |:-------------:| -------------:|
| Electronic band gap                  | 0.035 eV      | 0.51 eV       |
| Bulk modulus                         | 8.68 GPa      | 14.3 GPa      |
| Shear modulus                        | 10.6 GPa      | 18.4 GPa      |
| Debye temperature                    | 35.9 K        | 57.0 K        |
| Heat capacity at constant pressure   | 0.05 k_B/atom | 0.09 k_B/atom |
| Heat capacity at constant volume     | 0.04 k_B/atom | 0.07 k_B/atom |

Further details on the model training and validation can be found here: https://www.nature.com/articles/ncomms15679


The plmf prediction object has the following structure:

```JavaScript
// plmf prediction object 
{
     "status": String,
     "description": String,
     "model": String,
     "citation": String,
     "ml_egap_type": String,
     "ml_egap": Number,
     "ml_energy_per_atom": Number,
     "ml_ael_bulk_modulus_vrh": Number,
     "ml_ael_shear_modulus_vrh": Number,
     "ml_agl_debye": Number,
     "ml_agl_heat_capacity_Cp_300K": Number,
     "ml_agl_heat_capacity_Cv_300K": Number,
     "ml_agl_heat_capacity_Cp_300K_per_atom": Number,
     "ml_agl_heat_capacity_Cv_300K_per_atom": Number,
     "ml_agl_thermal_conductivity_300K": Number,
     "ml_agl_thermal_expansion_300K": Number
}
```
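
A small helper for reading a plmf prediction object might look like the sketch below. The units are an assumption: they are taken from the error values quoted for each property in the model description, not from a documented API contract. The helper name `summarize_plmf` and the sample values are invented for illustration.

```python
# Assumed units, based on the units quoted for the model errors.
PLMF_UNITS = {
    "ml_egap": "eV",
    "ml_ael_bulk_modulus_vrh": "GPa",
    "ml_ael_shear_modulus_vrh": "GPa",
    "ml_agl_debye": "K",
}


def summarize_plmf(prediction):
    """Return one 'key = value unit' line per known key that is present."""
    lines = []
    for key, unit in PLMF_UNITS.items():
        if key in prediction:
            lines.append(f"{key} = {prediction[key]} {unit}")
    return lines


# Made-up values, not an actual AFLOW-ML prediction.
example = {"status": "SUCCESS", "ml_egap": 1.5, "ml_agl_debye": 400.0}
for line in summarize_plmf(example):
    print(line)
```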

#### molar fraction descriptor (mfd)
The mfd model predicts material properties from the chemical formula only: the descriptor vector has 87 components, the i-th component being the mole fraction of the i-th element in the compound (element 1 is H, element 2 is He, etc.). The model is built with nonlinear support vector machines and a radial basis function kernel. It is trained on a data set of 292 randomly selected compounds from the ICSD for which the vibrational properties were computed with DFT calculations. Pearson and Spearman correlations for k-fold cross-validation are in excess of 0.9 for all predicted properties. The mean absolute errors are 13.2 meV/atom (vibrational free energy) and 0.037 meV/(atom K) (vibrational entropy), and the root mean square errors are 18.8 meV/atom (vibrational free energy) and 0.052 meV/(atom K) (vibrational entropy). Further details on the model training and validation can be found here: https://pubs.acs.org/doi/abs/10.1021/acs.chemmater.7b00789
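
To make the descriptor concrete, the sketch below builds a mole-fraction vector for a composition. It is an illustration only: the `ELEMENTS` list is a hypothetical, truncated ordering (the first 12 elements by atomic number) standing in for the model's 87 elements, and `mole_fraction_vector` is not part of AFLOW-ML.

```python
# Hypothetical, truncated element ordering; the actual model uses 87 elements.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg"]


def mole_fraction_vector(composition, elements=ELEMENTS):
    """Map {symbol: count} to a list of mole fractions ordered like elements."""
    unknown = set(composition) - set(elements)
    if unknown:
        raise ValueError(f"elements not in descriptor: {sorted(unknown)}")
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must contain at least one atom")
    return [composition.get(symbol, 0) / total for symbol in elements]


# MgO: one Mg and one O per formula unit.
vector = mole_fraction_vector({"Mg": 1, "O": 1})
print(vector[ELEMENTS.index("O")], vector[ELEMENTS.index("Mg")])
```

Because only the composition enters the vector, two polymorphs with the same formula would receive the same descriptor under this scheme.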

The mfd prediction object takes the following structure:

```JavaScript
// mfd prediction object 
{
     "status": String,
     "description": String,
     "model": String,
     "citation": String,
     "ml_Cv": Number,
     "ml_Fvib": Number,
     "ml_Svib": Number
}
```




## Using the API

Retrieving a prediction takes two steps. First, the contents of a POSCAR 5 file (i.e., the geometry input file format for version 5 of VASP), here named test.poscar, are uploaded to the submission endpoint. Second, the id in the returned task object is used to fetch the result from the results endpoint. Both steps can be performed with an HTTP library such as Requests (Python), URLSession (iOS SDK), HttpURLConnection (Java), or Fetch (JavaScript), or with a command line tool such as Wget or cURL.


### cURL example

For this example, the UNIX utility cURL will be used. If cURL is not available on your machine, you can use the Python example below.

First, let's download test.poscar:

In [None]:
!curl -s https://aflowlib.duke.edu/AFLOW_SCHOOL/AFLOW_ML/test.poscar -o test.poscar
!cat test.poscar

The contents of a POSCAR are posted to the submission endpoint as follows:

In [None]:
!curl https://aflow.org/API/aflow-ml/plmf/prediction --data-urlencode file@test.poscar

The ```--data-urlencode``` flag URL-encodes the contents of the POSCAR file named ```test.poscar``` in the current directory. The task id from the JSON response above is then used to poll the results endpoint:

In [None]:
!curl https://aflow.org/API/aflow-ml/prediction/result/<INSERT_YOUR_TASK_ID_HERE>

### Python example

For this example, the urllib module from the Python 3 standard library will be used.

The contents of a POSCAR are posted to the submission endpoint as follows:

In [None]:
#!/usr/bin/python3
import json, sys, os
from pprint import pprint
from time import sleep
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.request import Request
from urllib.error import HTTPError

#set server and model
SERVER="https://aflow.org"
API="/API/aflow-ml"
MODEL="plmf"

#encode the structure file
poscar=open('test.poscar', 'r').read()
encoded_data = urlencode({'file': poscar,}).encode('utf-8')

#submit structure file to API and retrieve response
url = SERVER + API + "/" + MODEL + "/prediction"
request_task = Request(url, encoded_data)
task = urlopen(request_task).read()
task_json = json.loads(task.decode('utf-8'))
results_endpoint = task_json["results_endpoint"]
results_url = SERVER + API + results_endpoint

#print response
pprint(results_url)

The prediction can be retrieved by pasting the printed results_url string in place of the <RESULTS_URL> placeholder in the code below:

In [None]:
#!/usr/bin/python3
import json, sys, os
from pprint import pprint
from time import sleep
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.request import Request
from urllib.error import HTTPError

#set server and model
SERVER="https://aflow.org"
API="/API/aflow-ml"
MODEL="plmf"

results_url = "<RESULTS_URL>"

#retrieve response from results_url
request_results = Request(results_url)
results = urlopen(request_results).read()
results_json = json.loads(results)

#print status and results
if results_json["status"] == 'PENDING':
    print("AFLOW-ML prediction pending")
elif results_json["status"] == 'STARTED':
    print("AFLOW-ML prediction started")
elif results_json["status"] == 'FAILURE':
    print("Error: AFLOW-ML prediction failure")
elif results_json["status"] == 'SUCCESS':
    print("AFLOW-ML successful prediction")
    pprint(results_json)

The POSCAR submission and results retrieval steps can be combined into a single script, as shown below:

In [None]:
#!/usr/bin/python3
import json, sys, os
from pprint import pprint
from time import sleep
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.request import Request
from urllib.error import HTTPError

#set server and model
SERVER="https://aflow.org"
API="/API/aflow-ml"
MODEL="plmf"

#encode the structure file
poscar=open('test.poscar', 'r').read()
encoded_data = urlencode({'file': poscar,}).encode('utf-8')

#submit structure file to API and retrieve response
url = SERVER + API + "/" + MODEL + "/prediction"
request_task = Request(url, encoded_data)
task = urlopen(request_task).read()
task_json = json.loads(task.decode('utf-8'))
results_endpoint = task_json["results_endpoint"]
results_url = SERVER + API + results_endpoint

#retrieve response from results_url in a loop until task is complete
incomplete = True
while incomplete:
    request_results = Request(results_url)
    results = urlopen(request_results).read()
    results_json = json.loads(results)
    if results_json["status"] == 'PENDING':
        sleep(10)
        continue
    elif results_json["status"] == 'STARTED':
        sleep(10)
        continue
    elif results_json["status"] == 'FAILURE':
        print("Error: AFLOW-ML prediction failure")
        incomplete = False
    elif results_json["status"] == 'SUCCESS':
        print("AFLOW-ML successful prediction")
        pprint(results_json)
        incomplete = False
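
The loop above waits indefinitely while the task is pending. A bounded variant could be sketched as follows. The names `poll_until_done`, `fetch`, and `max_attempts` are hypothetical, the default interval and attempt count are arbitrary, and the demonstration uses canned responses instead of contacting the server.

```python
import time

# Statuses after which the examples above stop polling.
TERMINAL = {"SUCCESS", "FAILURE"}


def poll_until_done(fetch, interval=10.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() until it reports a terminal status or attempts run out.

    fetch is a zero-argument callable returning the decoded JSON object.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result.get("status") in TERMINAL:
            return result
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Offline demonstration: a fake fetch that succeeds on the third call.
responses = iter([
    {"status": "PENDING"},
    {"status": "STARTED"},
    {"status": "SUCCESS", "ml_egap": 0.0},  # made-up value
])
final = poll_until_done(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

With the variables from the script above, `fetch` could be `lambda: json.loads(urlopen(results_url).read())`. Passing the fetch and sleep functions as arguments keeps the polling logic testable without network access.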

## Exercises

1. Convert the following VASP 4 POSCAR to the VASP 5 format by adding a line that lists the element symbols in alphabetical order, directly beneath the lattice vectors. Modify the cell below and run it; the file will be saved as `MgO_rocksalt.poscar`:

In [None]:
%%writefile MgO_rocksalt.poscar 
MgO/AB_cF8_225_a_b.AB params=5.63931 SG=225 [ANRL doi: 10.1016/j.commatsci.2017.01.017 (part 1), doi: 10.1016/j.commatsci.2018.10.043 (part 2)]
1.000000
   0.00000000000000   2.81965500000000   2.81965500000000
   2.81965500000000   0.00000000000000   2.81965500000000
   2.81965500000000   2.81965500000000   0.00000000000000
1 1 
Direct(2) [A1B1] 
   0.00000000000000   0.00000000000000   0.00000000000000  Mg    
   0.50000000000000   0.50000000000000   0.50000000000000  O     

2. Copy and modify the code above to retrieve the PLMF properties for this structure. Is it a metal or an insulator? What are its bulk and shear modulus values?

In [None]:
#!/usr/bin/python3
import json, sys, os
from pprint import pprint
from time import sleep
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.request import Request
from urllib.error import HTTPError

#set server and model
SERVER="https://aflow.org"
API="/API/aflow-ml"
MODEL="plmf"

#encode the structure file
poscar=open('MgO_rocksalt.poscar', 'r').read()
encoded_data = urlencode({'file': poscar,}).encode('utf-8')

#submit structure file to API and retrieve response
url = SERVER + API + "/" + MODEL + "/prediction"
request_task = Request(url, encoded_data)
task = urlopen(request_task).read()
task_json = json.loads(task.decode('utf-8'))
results_endpoint = task_json["results_endpoint"]
results_url = SERVER + API + results_endpoint

#retrieve response from results_url in a loop until task is complete
incomplete = True
while incomplete:
    request_results = Request(results_url)
    results = urlopen(request_results).read()
    results_json = json.loads(results)
    if results_json["status"] == 'PENDING':
        sleep(10)
        continue
    elif results_json["status"] == 'STARTED':
        sleep(10)
        continue
    elif results_json["status"] == 'FAILURE':
        print("Error: AFLOW-ML prediction failure")
        incomplete = False
    elif results_json["status"] == 'SUCCESS':
        print("AFLOW-ML successful prediction")
        pprint(results_json)
        incomplete = False

3. Use the MFD model to predict its vibrational free energy and entropy.

In [None]:
#!/usr/bin/python3
import json, sys, os
from pprint import pprint
from time import sleep
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.request import Request
from urllib.error import HTTPError

#set server and model
SERVER="https://aflow.org"
API="/API/aflow-ml"
MODEL="mfd"

#encode the structure file
poscar=open('MgO_rocksalt.poscar', 'r').read()
encoded_data = urlencode({'file': poscar,}).encode('utf-8')

#submit structure file to API and retrieve response
url = SERVER + API + "/" + MODEL + "/prediction"
request_task = Request(url, encoded_data)
task = urlopen(request_task).read()
task_json = json.loads(task.decode('utf-8'))
results_endpoint = task_json["results_endpoint"]
results_url = SERVER + API + results_endpoint

#retrieve response from results_url in a loop until task is complete
incomplete = True
while incomplete:
    request_results = Request(results_url)
    results = urlopen(request_results).read()
    results_json = json.loads(results)
    if results_json["status"] == 'PENDING':
        sleep(10)
        continue
    elif results_json["status"] == 'STARTED':
        sleep(10)
        continue
    elif results_json["status"] == 'FAILURE':
        print("Error: AFLOW-ML prediction failure")
        incomplete = False
    elif results_json["status"] == 'SUCCESS':
        print("AFLOW-ML successful prediction")
        pprint(results_json)
        incomplete = False