Materials APIs:

An API (Application Programming Interface) is a set of functions for accessing the features or data of an operating system, application, or service.


The most common type of web API is the REST (REpresentational State Transfer) API. REST APIs are typically used to GET data from, and PUT data onto, a system such as a server. Many materials databases provide APIs so that researchers can obtain data.
In this notebook, we will mostly explore how to obtain data from APIs so we can use it in Python scripts.
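
As a small offline illustration of GET versus PUT, the sketch below builds two requests with urllib without sending them. The URL and the payload are placeholders made up for this example, not a real endpoint.

```python
import json
from urllib import request

# Hypothetical endpoint, used only to build (not send) the requests.
EXAMPLE_URL = 'https://example.com/api/materials/1'

# A request with no body defaults to GET: "give me this resource".
get_req = request.Request(EXAMPLE_URL)

# A PUT request carries a body: "store this data at this resource".
payload = json.dumps({'formula': 'SiO2'}).encode('utf-8')
put_req = request.Request(
    EXAMPLE_URL,
    data=payload,
    method='PUT',
    headers={'Content-Type': 'application/json'},
)

print(get_req.get_method())  # GET
print(put_req.get_method())  # PUT
```

Nothing is transmitted until a request is passed to request.urlopen(), which is the call the rest of this notebook uses for GET requests.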

In [2]:
import json
from urllib import request
import pandas as pd

Let us try to access an API. We will use a very simple API that gets random pictures of dogs.


In [3]:
# You will first notice that this API is a link. Copy and paste the link into your browser to follow it. You will be greeted with some strange lines of text.
# This text is in a format called JSON (JavaScript Object Notation), which is very similar to a Python dictionary object.
DOG_API = 'https://dog.ceo/api/breeds/image/random'

#Now let's get the data we want into our code. First we open the URL and read the response, then decode it as UTF-8 text, and finally parse that JSON text with a JSON loader.
response = json.loads(request.urlopen(DOG_API).read().decode('utf-8'))

#Let's print our response. You will notice it is now a Python dictionary (which looks similar to the JSON format).

print(response)


{'message': 'https://images.dog.ceo/breeds/basenji/n02110806_4024.jpg', 'status': 'success'}
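
To make the JSON-to-dictionary step concrete without contacting the server, here is a minimal sketch that parses a hand-written copy of the response text shown above and then converts it back to JSON text.

```python
import json

# The same kind of text the dog API returns, written out by hand.
raw_text = '{"message": "https://images.dog.ceo/breeds/basenji/n02110806_4024.jpg", "status": "success"}'

parsed = json.loads(raw_text)      # JSON text -> Python dict
print(type(parsed).__name__)       # dict
print(parsed['status'])            # success

round_trip = json.dumps(parsed)    # Python dict -> JSON text
print(type(round_trip).__name__)   # str
```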


In [4]:
# You will see that this dictionary has 2 keys and 2 corresponding values. The first one is 'message', the next one is 'status'. This API conveniently tells you whether the request worked.
# Another way to tell is whether you got an error. The type and format of the information returned is not always the same, so be sure to always read the documentation, or print the output to see it yourself!
# Dog CEO API Docs: https://dog.ceo/dog-api/documentation/random

#Now that we have our response, let's create a function that will give us a random dog link. This could be useful for some project, but we will not need it for the rest of this notebook.

#The -> str is a type hint: it documents that the function is meant to return a string. Python does not enforce it at run time.
def get_random_dog() -> str:
    response = json.loads(request.urlopen(DOG_API).read().decode('utf-8'))
    if response['status'] == 'success':
        return response['message']
    else:
        return "Oops! There has been an error"

dog_link = get_random_dog()

#Fun fact: if a character would otherwise clash with the code, you can use an escape sequence! \' is a literal ' inside a string, \n is a new line, \\ is a backslash, \" is a double quote, \t is a tab, \b is a backspace.
print("Here\'s a funny dog link! " + dog_link)




Here's a funny dog link! https://images.dog.ceo/breeds/whippet/n02091134_18590.jpg
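
A note on the `-> str` annotation used in the function above: it is a type hint, and Python itself treats it as documentation only. The toy function below (a made-up example, not part of the dog API code) is annotated as returning a string but returns an integer, and nothing complains at run time.

```python
def get_label() -> str:
    # The annotation says str, but Python does not check it at run time.
    return 42

value = get_label()
print(type(value).__name__)       # int
print(get_label.__annotations__)  # {'return': <class 'str'>}
```

External tools such as static type checkers can use these hints to flag the mismatch, but running the code does not.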


In [20]:
#Let's try some slightly more advanced code. APIs can sometimes be unreliable and not return data. Sometimes their servers are down or some of their data is invalid.
#Here we will write some code that deals with that and still continues to run! This is important if you are downloading a lot of things and do not want one failure to stop the whole script.

# -> str is a type hint saying this function is meant to always return a string; Python itself does not check it or raise an error.
def get_random_dog2(API) -> str:
    try:
        response = json.loads(request.urlopen(API).read().decode('utf-8'))
        if response['status'] == 'success':
            return response['message']
        else:
            return ""
    except request.HTTPError as e:
        print("Oops! Theres been a web error: Error Code: " + str(e))
        return ""
    except Exception as e:
        print("Oops there has been a different error! Stack Trace: \n" + str(e))
        return ""

dog_link = get_random_dog2(DOG_API)

print("Here\'s a funny dog link! " + dog_link)

#Let's try a messed-up link
BAD_API = "https://www.google.com"
dog_link = get_random_dog2(BAD_API)

#Another one
BAD_API = "asdlkj;afjlkslkdfal;jks"
dog_link = get_random_dog2(BAD_API)
#another
dog_link = get_random_dog2(DOG_API + 'asdasdasdasdas')

#The code keeps running even though each of these calls fails!
print("The code does not stop!")

#Keep in mind, however, that as you get better at Python there are better ways to handle exceptions, but for now this will work well.

Here's a funny dog link! https://images.dog.ceo/breeds/pug/n02110958_14683.jpg
Oops there has been a different error! Stack Trace: 
Expecting value: line 1 column 1 (char 0)
Oops there has been a different error! Stack Trace: 
unknown url type: 'asdlkj;afjlkslkdfal;jks'
Oops! Theres been a web error: Error Code: HTTP Error 404: Not Found
The code does not stop!
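
One possible refinement of the error handling above, offered as a sketch rather than the definitive approach: catch the specific exception types separately and return None on failure so the caller can tell "no data" apart from an empty string. The function name fetch_json, the opener parameter, and the fake openers are hypothetical helpers introduced here so the example runs without a network connection.

```python
import io
import json
from urllib import error, request


def fetch_json(url, opener=request.urlopen, timeout=10):
    """Return the parsed JSON at url, or None if anything goes wrong."""
    try:
        with opener(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode('utf-8'))
    except error.HTTPError as e:       # server answered with an error status
        print(f"HTTP error {e.code}: {e.reason}")
    except error.URLError as e:        # server could not be reached at all
        print(f"Could not reach the server: {e.reason}")
    except ValueError as e:            # malformed URL or a body that is not JSON
        print(f"Bad URL or invalid JSON: {e}")
    return None


# Stand-in openers (hypothetical) that mimic urlopen without using the network.
def fake_success(url, timeout=None):
    return io.BytesIO(b'{"status": "success", "message": "dog.jpg"}')

def fake_not_json(url, timeout=None):
    return io.BytesIO(b'<html>not json</html>')

def fake_404(url, timeout=None):
    raise error.HTTPError(url, 404, 'Not Found', None, None)

good = fetch_json('placeholder', opener=fake_success)
bad_body = fetch_json('placeholder', opener=fake_not_json)
missing = fetch_json('placeholder', opener=fake_404)
print(good, bad_body, missing)
```

HTTPError is listed before URLError because it is a subclass of URLError; in the other order the more specific handler would never run.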


Activity 1: <br> Read the documentation at https://dog.ceo/dog-api/documentation/random. Write a function that takes a number, n, of dog pictures to fetch and returns a list of dog picture links. 
<br>Bonus Activity:
<br>Add an optional "breed" parameter that restricts the search to one breed of dog. Bonus points if you raise an exception when the request fails and print out why it failed. (Invalid breed? Server failure?)
<br>Furthermore, you can make all parameters optional and default to one image if none are specified. Return a list of size 0 or greater (empty if the request fails).


In [19]:
#YOUR CODE HERE


def get_n_random_dog(n=1, breed=None) -> list:
    # Pick the endpoint: any breed when no breed is given, otherwise that breed only.
    if breed is None:
        API = f'https://dog.ceo/api/breeds/image/random/{n}'
    else:
        API = f'https://dog.ceo/api/breed/{breed}/images/random/{n}'
    try:
        response = json.loads(request.urlopen(API).read().decode('utf-8'))
        if response['status'] == 'success':
            return response['message']
        print("Oops! There has been an error")
    except request.HTTPError as e:
        print("Oops! Theres been a web error: Error Code: " + str(e))
    except Exception as e:
        print("Oops there has been a different error! Stack Trace: \n" + str(e))
    # Any failure falls through to here and returns an empty list.
    return []

corgi = get_n_random_dog(breed='corgi')
dog_fetch2 = get_n_random_dog(n = 100 , breed='corgi')
print(len(dog_fetch2))


100
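
The URL-building part of this activity can be separated from the network call, which makes it easy to check on its own. The sketch below uses the two endpoint patterns from the cell above; the function name build_dog_url and the input check are additions for illustration.

```python
def build_dog_url(n=1, breed=None):
    """Return the Dog API URL for n random images, optionally of one breed."""
    if not isinstance(n, int) or n < 1:
        raise ValueError(f"n must be a positive integer, got {n!r}")
    if breed is None:
        return f'https://dog.ceo/api/breeds/image/random/{n}'
    return f'https://dog.ceo/api/breed/{breed}/images/random/{n}'


print(build_dog_url())
print(build_dog_url(3, breed='corgi'))
```

Rejecting a bad n before making the request gives a clearer error message than waiting for the server to refuse it.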


# AFLOW AFLUX API

In [7]:
#Let's try to access some materials database APIs. First, we will try AFLOW's AFLUX, as it is similar to the dog API.

#Let's open up the documentation for reference. Follow this link for basic usage: http://aflow.org/API/aflux/ and this one for more advanced usage: https://www.sciencedirect.com/science/article/pii/S0927025614003322?via%3Dihub
# Or run this block to get the documentation as text.
#It is very important to know how to read documentation and figure out how each database works! Usually, they come with a code tutorial as a quickstart guide.

response = request.urlopen('http://aflow.org/API/aflux/?help').read().decode('utf-8')
print(response)


[
    "       Welcome to Aflux - The Aflow search API (version 1.0)                   ",
    "            Aflux home:   https://aflow.org/API/aflux/?                        ",
    "                                                                               ",
    "Aflux is in its first stable release cycle (read, we think it works now).      ",
    "                                                                               ",
    "                                                                               ",
    "The purpose of Aflux is to expose our collection of materials data via         ",
    "arbitrary set based restrictions on any of the properties of interest. By      ",
    "default, matched results are returned as JSON serialized data. An Aflux        ",
    "summons is written entirely in the query portion of the Aflux home URL:        ",
    "        http://aflow.org/API/aflux/?<summons>                                  ",
    "                                    

In [8]:
# Now let's actually use AFLOW. In this example we will get every entry in the ICSD catalog with a band gap (Egap) greater than 0.5 eV.

# First let's set our base URL
AFLOW_API = 'http://aflow.org/API/aflux/'

# Now let's make an example summons. You can query many properties using AFLOW; here are just a few of them. You can add data columns to your request by adding the property name followed by (*).
#Example: to show the formation enthalpy per atom, add enthalpy_formation_atom(*)

AFLOW_SUMMONS = 'Egap(!*0.5),catalog(ICSD),$paging(0),energy_atom(*),prototype(*),enthalpy_formation_atom(*),ael_bulk_modulus_reuss(*)'

# Combine the two to make a link.
AFLOW_REQUEST = AFLOW_API + '?' + AFLOW_SUMMONS

# You could technically use the commented line and skip these steps, but it's nice to keep your summons separate so you can adjust it.
#
# AFLOW_REQUEST = 'http://aflow.org/API/aflux/?Egap(!*0.5),catalog(ICSD),$paging(0),energy_atom(*),prototype(*)'
#
#
#Now let's make the request. Keep in mind it will take a bit.
response = json.loads(request.urlopen(AFLOW_REQUEST).read().decode('utf-8'))

# Let's also dump all the data into a file so we don't have to make another request. If you are on Google Colab, keep in mind that this .json file won't persist after the session ends, so
# this works better on a local machine (a personal computer running Python) or a server.

with open('My_Aflow_Query.json', 'w') as f:
    json.dump(response, f)

#A trick with AFLOW is to use the search page. Go here: http://aflowlib.org/search/?search=
# Select "property filter", then add or specify any filters you want, or just tick a checkbox to add that property to your summons.
# Press search. Scroll down a little, click on "aflux summons", then copy that URL, paste it in as your AFLOW summons, and proceed.
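
Since the summons is just comma-separated pieces, it can also be assembled from lists, which makes it easier to add or remove a property. This is a minimal sketch using the same pieces as the cell above; the variable names filters, directives, and columns are chosen here for illustration. It only builds the URL and does not send the request.

```python
AFLOW_API = 'http://aflow.org/API/aflux/'

filters = ['Egap(!*0.5)', 'catalog(ICSD)']   # restrictions, copied from the summons above
directives = ['$paging(0)']                  # copied from the summons above
columns = ['energy_atom', 'prototype', 'enthalpy_formation_atom', 'ael_bulk_modulus_reuss']

# Each extra data column is requested as name(*).
summons = ','.join(filters + directives + [f'{name}(*)' for name in columns])
aflow_request = AFLOW_API + '?' + summons
print(aflow_request)
```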

In [9]:
#Open the JSON file we just saved.
with open('My_Aflow_Query.json', 'r') as f:
    aflow_data = json.load(f) #Note that we used json.load() instead of json.loads(). The key difference is that load() reads from a file, while loads() parses a string.

#Let's get the size -- you will notice there's a lot of data.
print("The size of the data list is " + str(len(aflow_data)))

#Let's print only the first 3 entries, since there are so many.
print(aflow_data[0:3])

#This is a very raw but machine-readable way to view the data. If you look carefully, you will notice it is a list of dictionaries. Each dictionary is one database entry, which represents one material.

The size of the data list is 768
[{'compound': 'Si34', 'auid': 'aflow:336d342eccc91ed6', 'aurl': 'aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Si136_ICSD_56721', 'spacegroup_relax': 227, 'Pearson_symbol_relax': 'cF136', 'Egap': 0.5324, 'catalog': 'ICSD', 'energy_atom': -5.08412, 'prototype': 'Si136_ICSD_56721', 'enthalpy_formation_atom': 0.339606, 'ael_bulk_modulus_reuss': 61.4778}, {'compound': 'K8Te12', 'auid': 'aflow:cf40cb0bd363aec2', 'aurl': 'aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/ORC/K2Te3_ICSD_2453', 'spacegroup_relax': 62, 'Pearson_symbol_relax': 'oP20', 'Egap': 0.5325, 'catalog': 'ICSD', 'energy_atom': -3.04685, 'prototype': 'K2Te3_ICSD_2453', 'enthalpy_formation_atom': -0.723416, 'ael_bulk_modulus_reuss': 14.6771}, {'compound': 'O1V1', 'auid': 'aflow:224a84fc80fdcc46', 'aurl': 'aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/O1V1_ICSD_28681', 'spacegroup_relax': 225, 'Pearson_symbol_relax': 'cF8', 'Egap': 0.5352, 'catalog': 'ICSD', 'energy_atom': -7.84912, 'prototype': 'O1V1_ICSD_28681', 
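
The save-then-reload steps above can be folded into one small caching helper, sketched below under the assumption that the query result is JSON-serializable. The names load_or_fetch and slow_query are hypothetical, and slow_query stands in for the real AFLOW request so the example runs offline.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return cached JSON from path if it exists; otherwise call fetch() and cache the result."""
    if os.path.exists(path):
        with open(path, 'r') as f:
            return json.load(f)        # load() reads from an open file
    data = fetch()
    with open(path, 'w') as f:
        json.dump(data, f)             # dump() writes to an open file
    return data


# Stand-in for a slow API call; counts how many times it is actually run.
calls = []
def slow_query():
    calls.append(1)
    return [{'compound': 'Si34', 'Egap': 0.5324}]

cache_path = os.path.join(tempfile.mkdtemp(), 'My_Aflow_Query.json')
first = load_or_fetch(cache_path, slow_query)    # runs the query and writes the file
second = load_or_fetch(cache_path, slow_query)   # reads the file, no query
print(len(calls))  # 1
```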

In [10]:
#Let's use pandas to organize this. You can convert any list of dictionaries into a DataFrame. We will use this data later.
aflow_df = pd.DataFrame(aflow_data)
#Let's display it

display(aflow_df)

Unnamed: 0,compound,auid,aurl,spacegroup_relax,Pearson_symbol_relax,Egap,catalog,energy_atom,prototype,enthalpy_formation_atom,ael_bulk_modulus_reuss
0,Si34,aflow:336d342eccc91ed6,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Si136...,227,cF136,0.5324,ICSD,-5.084120,Si136_ICSD_56721,0.339606,61.47780
1,K8Te12,aflow:cf40cb0bd363aec2,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/ORC/K2Te3...,62,oP20,0.5325,ICSD,-3.046850,K2Te3_ICSD_2453,-0.723416,14.67710
2,O1V1,aflow:224a84fc80fdcc46,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/O1V1_...,225,cF8,0.5352,ICSD,-7.849120,O1V1_ICSD_28681,-0.826067,155.70000
3,As4Mg6,aflow:e6aeabf7626b5275,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/CUB/As2Mg...,224,cP10,0.5419,ICSD,-3.284240,As2Mg3_ICSD_24485,-0.467180,48.85000
4,Rb8Sb8,aflow:8cf08ce1df285278,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/ORC/Rb1Sb...,19,oP16,0.5447,ICSD,-2.973860,Rb1Sb1_ICSD_14030,-0.548019,11.39940
...,...,...,...,...,...,...,...,...,...,...,...
763,F4Si1,aflow:28bd553cc68bd28a,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/BCC/F4Si1...,217,cI10,7.7111,ICSD,-5.715250,F4Si1_ICSD_14122,-3.142880,5.65556
764,Be6F12,aflow:57e8275b6787e4d9,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/BCC/Be1F2...,217,cI36,8.0374,ICSD,-5.764990,Be1F2_ICSD_173557,-3.278300,8.97033
765,Be1O1,aflow:c316b8d2b8b69038,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Be1O1...,225,cF8,8.1830,ICSD,-6.621540,Be1O1_ICSD_162676,-2.285460,241.17800
766,F1Li1,aflow:6ae27b3e38086da7,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/F1Li1...,225,cF8,8.7442,ICSD,-4.845470,F1Li1_ICSD_18012,-2.963190,70.40560
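
Once the data is in a DataFrame, filtering and sorting take one line each. The sketch below uses three entries copied from the output above; the choice of filter (negative formation enthalpy) and the variable name negative_hf are only for illustration.

```python
import pandas as pd

# Three entries copied from the query output above.
records = [
    {'compound': 'Si34', 'Egap': 0.5324, 'enthalpy_formation_atom': 0.339606, 'ael_bulk_modulus_reuss': 61.4778},
    {'compound': 'K8Te12', 'Egap': 0.5325, 'enthalpy_formation_atom': -0.723416, 'ael_bulk_modulus_reuss': 14.6771},
    {'compound': 'O1V1', 'Egap': 0.5352, 'enthalpy_formation_atom': -0.826067, 'ael_bulk_modulus_reuss': 155.7},
]
df = pd.DataFrame(records)

# Keep entries with negative formation enthalpy, largest bulk modulus first.
negative_hf = df[df['enthalpy_formation_atom'] < 0].sort_values('ael_bulk_modulus_reuss', ascending=False)
print(negative_hf[['compound', 'ael_bulk_modulus_reuss']])
```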


# Materials Project

The Materials Project also provides an API, but instead of building URLs ourselves we access it through an API client. An API client is a software package we import that gives us functions for querying the database. The results will look very similar, though.


In [11]:
#Uncomment this and run it if you are on Google Colab. Notebooks can run terminal commands by putting a ! in front of them; this one installs the package if you do not have it.
#!pip install mp_api
#
# Here's another Linux command just for fun. pwd = print working directory, and it shows the current directory (cd, by contrast, means change directory).
#!pwd

In [12]:
from mp_api.client import MPRester

# Go to https://materialsproject.org/api, sign up, and get an API key. Save it in api_key.txt or paste it into this variable. DO NOT SHARE YOUR KEY.
#
# What is an API key? 
# For many APIs, you are required to get a key. A key is basically a unique string of random 
# numbers and letters that authenticates you. 
# API keys are implemented for safety reasons. Without them, people could access and/or modify programs/databases they shouldn't be able to. 
# Also, requests are rate-limited, so you can only make so many per minute. It's not an issue for most individuals,
# but a company or research group may want to make a lot of requests, so they can get a special key that allows
# more requests than other users.
#
# If you ever leak an API key, most sites will let you regenerate it (invalidate the old key and get a new one).
# MP does not do that, so email them if that happens. 

with open('api_key.txt', 'r') as f:
    MY_API_KEY = f.read().strip()  # strip() removes the trailing newline so the key is valid
    
#Optionally, 
#MY_API_KEY = "<YOUR-API-KEY>"

# Explore these docs: https://docs.materialsproject.org/downloading-data/using-the-api/querying-data. They explain how to build queries.
# This specific query gets all compounds with the ABO3 formula (the perovskite stoichiometry) and requests several fields for them, including band gap, formation energy, and energy above hull. You can see the full list in the fields argument.

with MPRester(MY_API_KEY) as mpr:
    list_of_available_fields = mpr.summary.available_fields
    print(list_of_available_fields)
    data = mpr.summary.search(formula=["**O3"], fields=["material_id","formula_pretty", "formation_energy_per_atom","band_gap","theoretical","nsites","energy_above_hull", "symmetry",])

    
    
mp_df = pd.DataFrame([d.dict() for d in data])
mp_df.drop(columns=["fields_not_requested"], inplace=True)
mp_df = mp_df.dropna()
display(mp_df)





Retrieving SummaryDoc documents:   0%|          | 0/2544 [00:00<?, ?it/s]

Unnamed: 0,nsites,formula_pretty,symmetry,material_id,formation_energy_per_atom,energy_above_hull,band_gap,theoretical
0,20,NaNbO3,"{'crystal_system': Orthorhombic, 'symbol': 'Pm...",mp-4681,-2.833432,0.012851,2.3684,False
1,10,MnZnO3,"{'crystal_system': Trigonal, 'symbol': 'R-3', ...",mp-754318,-1.814413,0.005794,1.3225,True
2,20,MnAlO3,"{'crystal_system': Orthorhombic, 'symbol': 'Pn...",mp-1368992,-2.573144,0.161167,0.5506,True
3,30,AlBiO3,"{'crystal_system': Hexagonal, 'symbol': 'P6_3c...",mp-1376082,-2.046846,0.517672,0.7618,True
4,5,BaBiO3,"{'crystal_system': Cubic, 'symbol': 'Pm-3m', '...",mp-545783,-2.222888,0.021211,0.0000,False
...,...,...,...,...,...,...,...,...
2539,10,YAlO3,"{'crystal_system': Trigonal, 'symbol': 'R-3c',...",mp-756214,-3.694554,0.045807,4.9586,True
2540,20,ErScO3,"{'crystal_system': Orthorhombic, 'symbol': 'Pn...",mp-1212985,-3.972788,0.040889,4.4171,True
2541,5,AlSnO3,"{'crystal_system': Cubic, 'symbol': 'Pm-3m', '...",mp-1426094,-1.354196,1.318631,0.0000,True
2542,20,CoAgO3,"{'crystal_system': Triclinic, 'symbol': 'P-1',...",mp-1273356,-0.749871,0.082212,0.0000,True
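
Because the API key must never be shared, it helps to keep it out of the notebook itself. One common pattern, sketched here, is to look in an environment variable first and fall back to a text file such as the api_key.txt used above. The function name load_api_key and the variable name DEMO_API_KEY are hypothetical, and the key value is a dummy.

```python
import os


def load_api_key(env_var='MP_API_KEY', key_file='api_key.txt'):
    """Return the API key from an environment variable, falling back to a text file."""
    key = os.environ.get(env_var)
    if not key:
        with open(key_file, 'r') as f:
            key = f.read()
    key = key.strip()   # drop stray whitespace such as a trailing newline
    if not key:
        raise ValueError("No API key found")
    return key


# Demonstration with a dummy key stored in a made-up environment variable.
os.environ['DEMO_API_KEY'] = ' abc123\n'
demo_key = load_api_key(env_var='DEMO_API_KEY')
print(demo_key)  # abc123
```

The result can then be passed to the client in the same way as MY_API_KEY in the cell above.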


Another database: the Open Quantum Materials Database (OQMD), https://static.oqmd.org/static/docs/restful.html. You can do the same thing as before with AFLUX and MP. Follow the documentation and usage guide. Note: the Python client appears to be broken, so make URL calls as we did with AFLOW.


Activity 2: Let's get some data. Write code that gets the following features for ALL cubic structures. Use any database: MP, AFLOW, or both.



*   Composition/compound
*   Voigt bulk modulus (k_voigt in MP, ael_bulk_modulus_voigt in AFLOW)
*   Formation energy

<br>Bonus: save it as a JSON file. Querying large data sets takes a lot of time, and saving the result as JSON lets you access the data again without repeating the query. Also display it as a DataFrame and save it as a .csv file.

In [13]:
#YOUR CODE HERE

API = 'http://aflowlib.org/API/aflux/?$catalog(ICSD),crystal_system(Cubic),$paging(0),enthalpy_formation_atom(*),ael_bulk_modulus_voigt(*)'
response = json.loads(request.urlopen(API).read().decode('utf-8'))
with open('My_Aflow_Query.json', 'w') as f:
    json.dump(response, f)


aflow_df = pd.DataFrame(response)
aflow_df

Unnamed: 0,compound,auid,aurl,spacegroup_relax,Pearson_symbol_relax,crystal_system,enthalpy_formation_atom,ael_bulk_modulus_voigt
0,Ca1S1,aflow:007edcc20ee1c898,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Ca1S1...,225,cF8,cubic,-2.155440,57.1111
1,F9Y3,aflow:012e952be3f6311f,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/CUB/F3Y1_...,221,cP12,cubic,-3.921170,96.7167
2,Li2Se1,aflow:01a70da6b0c35513,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Li2Se...,225,cF12,cubic,-1.266900,33.8722
3,C3N4,aflow:02e917f98e603284,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/CUB/C3N4_...,215,cP7,cubic,0.695172,388.6500
4,Ca8H4N4,aflow:032b67f4287490a2,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/Ca2H1...,227,cF64,cubic,-0.796537,55.7889
...,...,...,...,...,...,...,...,...
654,K1N1,aflow:fe952ef1f59cda0a,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/FCC/K1N1_...,225,cF8,cubic,1.349030,19.5611
655,Sr1Tl1,aflow:fe4e8bc133dd6bd4,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/CUB/Sr1Tl...,221,cP2,cubic,-0.432300,26.3667
656,Cs1F1,aflow:ff3e8a3ebb991d74,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/CUB/Cs1F1...,221,cP2,cubic,-2.552680,24.8444
657,C1Tl1Y3,aflow:ff2e821a5a105eeb,aflowlib.duke.edu:AFLOWDATA/ICSD_WEB/CUB/C1Tl1...,221,cP5,cubic,-0.495610,77.3389
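
The bonus also asks for a .csv copy. A minimal sketch using only the standard library is shown below, with two rows copied from the output above standing in for the full query result. The file names cubic.json and cubic.csv are made up, and the files go into a temporary folder.

```python
import csv
import json
import os
import tempfile

# Two rows copied from the output above; a real query returns a longer list of the same shape.
rows = [
    {'compound': 'Ca1S1', 'enthalpy_formation_atom': -2.15544, 'ael_bulk_modulus_voigt': 57.1111},
    {'compound': 'F9Y3', 'enthalpy_formation_atom': -3.92117, 'ael_bulk_modulus_voigt': 96.7167},
]

out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, 'cubic.json')
csv_path = os.path.join(out_dir, 'cubic.csv')

# JSON keeps the list-of-dictionaries structure exactly.
with open(json_path, 'w') as f:
    json.dump(rows, f)

# CSV writes one header row, then one line per material.
with open(csv_path, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Read the CSV back to confirm it was written; values come back as strings.
with open(csv_path, newline='') as f:
    reloaded = list(csv.DictReader(f))
print(reloaded[0]['compound'])  # Ca1S1
```

With pandas, calling to_csv on the DataFrame (for example aflow_df.to_csv('cubic.csv', index=False)) writes the same kind of file in one call.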


In [14]:
#MP Way

with MPRester(MY_API_KEY) as mpr:
    list_of_available_fields = mpr.summary.available_fields
    print(list_of_available_fields)
    data = mpr.summary.search(fields=["material_id","formula_pretty", "formation_energy_per_atom","symmetry","k_voigt","crystal_system"], crystal_system = 'Cubic')
mp_df = pd.DataFrame([d.dict() for d in data])
mp_df.drop(columns=["fields_not_requested"], inplace=True)
mp_df = mp_df.dropna()
display(mp_df)






Retrieving SummaryDoc documents:   0%|          | 0/20977 [00:00<?, ?it/s]

Unnamed: 0,formula_pretty,symmetry,material_id,formation_energy_per_atom,k_voigt
11,Be2CuPt,"{'crystal_system': Cubic, 'symbol': 'Fm-3m', '...",mp-865869,-0.459644,171.238009
27,YIr,"{'crystal_system': Cubic, 'symbol': 'Pm-3m', '...",mp-30746,-0.820320,128.497981
36,Fe15Co,"{'crystal_system': Cubic, 'symbol': 'Pm-3m', '...",mp-18695,-0.010986,184.151059
37,ZrCr2,"{'crystal_system': Cubic, 'symbol': 'Fd-3m', '...",mp-903,-0.032432,181.142668
43,TiMn2W,"{'crystal_system': Cubic, 'symbol': 'Fm-3m', '...",mp-865656,-0.241861,243.149827
...,...,...,...,...,...
20925,Hf2FeIr,"{'crystal_system': Cubic, 'symbol': 'Fm-3m', '...",mp-864890,-0.670987,187.835987
20940,MgInPd2,"{'crystal_system': Cubic, 'symbol': 'Fm-3m', '...",mp-865043,-0.672360,108.786250
20943,La3Hf,"{'crystal_system': Cubic, 'symbol': 'Pm-3m', '...",mp-973024,0.198071,39.275499
20960,K2TiCl6,"{'crystal_system': Cubic, 'symbol': 'Fm-3m', '...",mp-27839,-2.184339,11.876738


In [15]:
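# dropna() removed the row labeled 1, so mp_df['symmetry'][1] raises KeyError: 1.
# Index by position instead to get the second remaining row.
mp_df['symmetry'].iloc[1]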
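
dropna() keeps the original row labels, which is why looking up label 1 can fail once rows have been dropped. The toy sketch below (made-up data, not from the Materials Project) shows the difference between label-based and position-based indexing, and how reset_index renumbers the rows.

```python
import pandas as pd

# Toy data: the middle row has a missing value.
df = pd.DataFrame({
    'formula_pretty': ['A', 'B', 'C'],
    'k_voigt': [100.0, None, 250.0],
})
clean = df.dropna()              # row label 1 is removed; labels 0 and 2 remain

print(clean.index.tolist())      # [0, 2]
print(clean['k_voigt'].iloc[1])  # 250.0, the second remaining row by position

renumbered = clean.reset_index(drop=True)
print(renumbered['k_voigt'][1])  # 250.0, label 1 exists again
```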