# Materials data used in the paper

<font size=4> 
    
The material structures and properties used in the article [Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties](https://arxiv.org/pdf/1710.10324.pdf) were downloaded from the following two open datasets.

- The Materials Project database [[link](https://www.materialsproject.org)]
- The Perovskite database [[link](https://cmr.fysik.dtu.dk/cubic_perovskites/cubic_perovskites.html)]

The authors cannot redistribute these two datasets, so users must download the data themselves and convert it into the proper format. This notebook is intended to do just that.

To make their results exactly reproducible, the authors provide three CSV files containing the material IDs of the crystals used in the paper. Note that the Materials Project database is constantly updated, so the structures and properties may have changed since the paper was published.
    
First, we read one of these CSV files, `mp-ids-46744.csv`.

In [30]:
import csv #Package to read csv files

In [31]:
materials_id=[] #List to store the materials project id
with open('/home/mlgraphs/CGCNN/cgcnn/data/material-data/mp-ids-46744.csv') as csv_file: #Open the csv file
    csv_reader = csv.reader(csv_file) #Read the csv file 
    for row in csv_reader: #Loop over the rows
        materials_id.append(row[0]) #Add the material id to the list

In [21]:
print('The materials ID of the first crystal is: ',materials_id[0])
print('The total number of crystals is: ',len(materials_id))

The materials ID of the first crystal is:  mp-754118
The total number of crystals is:  46744
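
The reading step above can also be wrapped in a small function that skips empty rows and strips stray whitespace. This is a minimal sketch, not part of the original notebook: the name `read_material_ids` is hypothetical, and it assumes the ID sits in the first column of each row, as in the cell above. The demonstration uses a temporary file rather than the real CSV.

```python
import csv
import os
import tempfile

def read_material_ids(csv_path):
    """Return the first column of every non-empty row as a list of strings."""
    ids = []
    with open(csv_path, newline="") as csv_file:
        for row in csv.reader(csv_file):
            # Skip blank lines and rows whose first field is empty
            if row and row[0].strip():
                ids.append(row[0].strip())
    return ids

# Offline demonstration on a small temporary file (made-up IDs)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "ids.csv")
    with open(demo_path, "w") as handle:
        handle.write("mp-1\n\nmp-2 \n")
    demo_ids = read_material_ids(demo_path)

print(demo_ids)
```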


In [22]:
from pymatgen.ext.matproj import MPRester  #Pymatgen (Python Materials Genomics) is a robust, 
#open-source Python library for materials analysis.
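
The next cell stores the API key directly in the notebook. A common alternative is to read it from an environment variable so the key is not saved with the notebook. The sketch below is an illustration only: the helper name `load_api_key` is hypothetical, and the variable name `PMG_MAPI_KEY` is an assumption about how the key was exported in your shell.

```python
import os

def load_api_key(env_var="PMG_MAPI_KEY"):
    """Read the Materials Project API key from an environment variable.

    The variable name is an assumption; use whichever name you exported.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Environment variable " + env_var + " is not set; "
            "export your Materials Project API key first."
        )
    return key

# Intended use in the notebook (not executed here):
# MAPI_KEY = load_api_key()
# mpr = MPRester(MAPI_KEY)
```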

In [23]:
MAPI_KEY = "lqhpkTOZo39u7SAV"  # Materials API key
mp_id = materials_id[0]  # mp-id 
mpr = MPRester(MAPI_KEY)  # object for connecting to MP Rest interface
mp_id=mpr.get_materials_id_from_task_id(mp_id)
structure = mpr.get_structure_by_material_id(mp_id) #read structure in pymatgen
structure #print structure summary

Structure Summary
Lattice
    abc : 4.947149500881695 4.947149500881695 5.523342
 angles : 90.0 90.0 119.99999332516595
 volume : 117.06920404587619
      A : 2.473575 -4.284357 0.0
      B : 2.473575 4.284357 0.0
      C : 0.0 0.0 5.523342
PeriodicSite: Sr (0.0000, 0.0000, 0.0000) [0.0000, 0.0000, 0.0000]
PeriodicSite: As (2.4736, 1.4281, 2.7617) [0.3333, 0.6667, 0.5000]
PeriodicSite: As (2.4736, -1.4281, 2.7617) [0.6667, 0.3333, 0.5000]
PeriodicSite: O (1.5275, 2.6457, 3.8195) [0.0000, 0.6175, 0.6915]
PeriodicSite: O (0.9461, 1.6387, 1.7039) [0.0000, 0.3825, 0.3085]
PeriodicSite: O (1.8922, -0.0000, 3.8195) [0.3825, 0.3825, 0.6915]
PeriodicSite: O (3.0549, -0.0000, 1.7039) [0.6175, 0.6175, 0.3085]
PeriodicSite: O (0.9461, -1.6387, 1.7039) [0.3825, 0.0000, 0.3085]
PeriodicSite: O (1.5275, -2.6457, 3.8195) [0.6175, 0.0000, 0.6915]
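
To read the summary above: each site lists Cartesian coordinates in parentheses and fractional coordinates in square brackets, and the two are related through the lattice vectors A, B and C. The following sketch reproduces that relation in plain Python using the numbers printed above. The helper names `frac_to_cart` and `cell_volume` are hypothetical, and the fractional values 0.3333 and 0.6667 are assumed to be rounded forms of 1/3 and 2/3.

```python
# Lattice vectors A, B, C copied from the structure summary
lattice = [
    (2.473575, -4.284357, 0.0),
    (2.473575, 4.284357, 0.0),
    (0.0, 0.0, 5.523342),
]

def frac_to_cart(frac, lattice):
    """Cartesian position = f_a * A + f_b * B + f_c * C."""
    return tuple(
        sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3)
    )

def cell_volume(lattice):
    """Cell volume as the absolute value of A . (B x C)."""
    a, b, c = lattice
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(sum(a[k] * cross[k] for k in range(3)))

# First As site, fractional coordinates [0.3333, 0.6667, 0.5000]
as_site = frac_to_cart((1 / 3, 2 / 3, 0.5), lattice)
volume = cell_volume(lattice)

print([round(x, 4) for x in as_site])
print(round(volume, 4))
```

The rounded position agrees with the first As site in the summary, and the computed volume agrees with the printed `volume` value.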

In [24]:
from pymatgen.io.cif import CifWriter #Package to convert pymatgen structure to cif file

In [7]:
CifWriter(structure,0.1).write_file('data/complete-data/'+mp_id+'.cif') #Convert to a cif file and 
#store it in data/complete-data
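
Writing the file as above will typically fail if the output directory does not exist yet. The sketch below creates the directory first and builds the target path. The helper name `cif_output_path` is hypothetical, and the demonstration writes into a temporary directory rather than `data/complete-data`.

```python
import tempfile
from pathlib import Path

def cif_output_path(out_dir, material_id):
    """Create out_dir if needed and return the path '<out_dir>/<id>.cif'."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out / (material_id + ".cif")

# Offline demonstration in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = cif_output_path(Path(tmp) / "data" / "complete-data", "mp-754118")
    parent_exists = target.parent.is_dir()
    target_name = target.name

# Intended use in the notebook (not executed here):
# CifWriter(structure, 0.1).write_file(str(cif_output_path('data/complete-data', mp_id)))
```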

In [11]:
current=[]

In [29]:
# Run this cell only once: it writes CIF files for entries 45000 to 46743 of the list
for i in range(45000, 46744):
    # Resolve the stored ID to the current Materials Project material ID
    mp_id_i = mpr.get_materials_id_from_task_id(materials_id[i])
    structure_i = mpr.get_structure_by_material_id(mp_id_i)
    CifWriter(structure_i, 0.1).write_file('data/complete-data/' + mp_id_i + '.cif')
    if i % 100 == 0:
        print(i)  # progress indicator

45000
45100
45200
45300
45400
45500
45600
45700
45800
45900
46000
46100
46200
46300
46400
46500
46600
46700
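
A long download loop like the one above can be interrupted by a network error, after which it has to be restarted by hand. The sketch below shows one possible way to make such a loop resumable: it skips files that already exist and records failures instead of stopping. It is an illustration under stated assumptions, not the notebook's method. The names `download_missing` and `fetch_and_write` are hypothetical, and it assumes each file is named after the ID in the input list, whereas the notebook names files after the ID returned by `get_materials_id_from_task_id`, which may differ.

```python
import tempfile
from pathlib import Path

def download_missing(ids, out_dir, fetch_and_write, log_every=100):
    """Write one CIF per ID, skipping files that already exist.

    fetch_and_write(material_id, target_path) is a hypothetical callback that
    downloads one structure and writes it to target_path.
    Returns a list of (material_id, error message) for entries that failed.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for n, material_id in enumerate(ids):
        if n % log_every == 0:
            print(n)  # progress indicator
        target = out / (material_id + ".cif")
        if target.exists():
            continue  # already written in an earlier run
        try:
            fetch_and_write(material_id, target)
        except Exception as exc:  # keep going; record the failure for a retry
            failed.append((material_id, str(exc)))
    return failed

# Offline demonstration with a stand-in callback instead of a real API call
def fake_fetch(material_id, target):
    if material_id == "mp-bad":
        raise ValueError("not found")
    target.write_text("data_" + material_id + "\n")

with tempfile.TemporaryDirectory() as tmp:
    first_run = download_missing(["mp-1", "mp-bad", "mp-2"], tmp, fake_fetch)
    written = sorted(p.name for p in Path(tmp).iterdir())
```

In the notebook, the callback would wrap the calls already used above, `mpr.get_structure_by_material_id` followed by `CifWriter(...).write_file(str(target))`.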
