# Create a database of structures

This notebook is a short demonstration of how to use SMACT and the Materials Project (MP) to generate a database of materials that can be used for structure prediction.

In the example below, we generate a database of garnet materials.

In [7]:
from smact.structure_prediction import prediction, database, mutation, probability_models, structure, utilities
import json
import itertools
from itertools import zip_longest
import smact

# An optional utility to display a progress bar
# for long-running loops. `pip install tqdm`.
from tqdm import tqdm
from ipywidgets import widgets

from pymatgen.ext.matproj import MPRester
from pprint import pprint
from pymatgen.analysis import structure_matcher
from pymatgen.util.plotting import pretty_plot
import pandas as pd

## Querying the MP for garnets

In [8]:
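# Create an MP REST client. The API key is read from the pymatgen configuration
# (PMG_MAPI_KEY) unless it is passed explicitly, e.g. MPRester("YOUR_API_KEY")
m = MPRester()
# Read a CSV file of MP ids exported from the Materials Project website
mp_df = pd.read_csv("_Materials Project .csv")

# Extract the list of MP ids
mp_ids = mp_df["Materials Id"].to_list()

print(f"We have {len(mp_ids)} entries with formula A3B2C3O12")
mp_df.head(10)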

We have 482 entries with formula A3B2C3O12


| | Materials Id | Formula | Spacegroup | Formation Energy (eV) | E Above Hull (eV) | Band Gap (eV) | Has Bandstructure | Volume | Nsites | Theoretical | Count | Density (gm/cc) | Crystal System | Unnamed: 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mp-1207933 | Y3Sc2(GaO4)3 | Ia3d | -3.393 | 0.0 | 3.595 | False | 1003.325 | 80 | True | 482.0 | 5.017 | cubic | |
| 1 | mp-1208208 | Y3Ga3(FeO6)2 | Ia3d | -2.878 | 0.0 | 2.63 | False | 971.32 | 80 | True | | 5.331 | cubic | |
| 2 | mp-1211646 | Li3Lu3(TeO6)2 | Ia3d | -2.798 | 0.0 | 3.112 | False | 918.323 | 80 | True | | 7.182 | cubic | |
| 3 | mp-6527 | Na3Li3In2F12 | Ia3d | -3.046 | 0.0 | 5.29 | True | 1072.334 | 80 | False | | 3.391 | cubic | |
| 4 | mp-556723 | Na3Fe2(AsO4)3 | Ia3d | -1.952 | 0.0 | 2.186 | True | 959.363 | 80 | False | | 4.136 | cubic | |
| 5 | mp-6247 | Na3Li3Fe2F12 | Ia3d | -3.036 | 0.0 | 4.068 | True | 995.938 | 80 | False | | 2.864 | cubic | |
| 6 | mp-15103 | Sr3Y2(GeO4)3 | Ia3d | -3.07 | 0.0 | 3.136 | True | 1165.601 | 80 | False | | 4.847 | cubic | |
| 7 | mp-1210576 | Nd3Sc2(FeO4)3 | Ia3d | -3.175 | 0.0 | 2.125 | False | 1069.499 | 80 | True | | 5.479 | cubic | |
| 8 | mp-1211226 | Li3Y3(TeO6)2 | Ia3d | -2.777 | 0.0 | 3.24 | False | 960.533 | 80 | True | | 5.081 | cubic | |
| 9 | mp-1211470 | Mn3Al2(GeO4)3 | Ia3d | -2.465 | 0.0 | 2.38 | False | 873.231 | 80 | True | | 4.782 | cubic | |
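If pandas is not available, the `Materials Id` column can also be read with the standard-library `csv` module. The following is a minimal sketch, assuming the exported file has a header row with a `Materials Id` column as in the table above; the helper name `read_mp_ids` and the inline sample data are illustrative only. Unlike the pandas cell, it also drops repeated ids while keeping file order.

```python
import csv
import io


def read_mp_ids(handle, column="Materials Id"):
    """Return the unique MP ids found in `column`, preserving file order."""
    reader = csv.DictReader(handle)
    seen = set()
    ids = []
    for row in reader:
        mp_id = (row.get(column) or "").strip()
        # Skip blank cells and ids that were already collected
        if mp_id and mp_id not in seen:
            seen.add(mp_id)
            ids.append(mp_id)
    return ids


# A tiny stand-in for the exported CSV (two columns only, one repeated row)
sample = io.StringIO(
    "Materials Id,Formula\n"
    "mp-1207933,Y3Sc2(GaO4)3\n"
    "mp-1208208,Y3Ga3(FeO6)2\n"
    "mp-1207933,Y3Sc2(GaO4)3\n"
)
mp_ids = read_mp_ids(sample)
print(mp_ids)  # ['mp-1207933', 'mp-1208208']
```

With the real export, the same helper would be called on an open file, e.g. `with open("_Materials Project .csv", newline="") as fh: mp_ids = read_mp_ids(fh)`.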


In [9]:
# Query the MP for every id in the list
data = m.query(
    criteria={"task_id": {"$in": mp_ids}},
    properties=["pretty_formula", "material_id", "spacegroup.symbol", "icsd_ids",
                "e_above_hull", "exp", "structure", "cif"],
)

### Structure matching
Here, we use the structure of Ca3Fe2(SiO4)3 (andradite) as a reference and keep only those queried materials that adopt the garnet structure.
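The filtering step boils down to "keep every document whose structure matches a reference". A small sketch of that pattern as a reusable helper follows; `filter_matching` and `toy_fit` are hypothetical names. In the notebook, `fit` would correspond to `SM.fit_anonymous` and `reference` to `known_garnet_structure`. The toy matcher here compares stoichiometry only and is not a substitute for a real structural comparison.

```python
from typing import Callable, Iterable, List


def filter_matching(
    docs: Iterable[dict],
    reference: object,
    fit: Callable[[object, object], bool],
    key: str = "structure",
) -> List[dict]:
    """Keep the documents whose `key` entry matches `reference` according to `fit`."""
    return [doc for doc in docs if fit(doc[key], reference)]


def toy_fit(a: dict, b: dict) -> bool:
    # Illustrative stand-in: compare sorted element counts, ignoring element labels
    return sorted(a.values()) == sorted(b.values())


docs = [
    {"material_id": "mp-a", "structure": {"Ca": 3, "Fe": 2, "Si": 3, "O": 12}},
    {"material_id": "mp-b", "structure": {"Na": 1, "Cl": 1}},
]
reference = {"Mn": 3, "Al": 2, "Ge": 3, "O": 12}
matched = filter_matching(docs, reference, toy_fit)
print([d["material_id"] for d in matched])  # ['mp-a']
```

Passing the matcher in as a callable lets the same helper serve both the A3B2C3O12 and the X3Y5O12 searches, which the notebook currently writes out as two separate loops.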

In [10]:
# Structure matcher; attempt_supercell=True allows matching cells that contain different numbers of sites
SM=structure_matcher.StructureMatcher(attempt_supercell=True)

# Get the reference garnet structure, Ca3Fe2(SiO4)3 (mp-6672)
known_garnet=m.query("mp-6672", properties=["pretty_formula","material_id","spacegroup.symbol","icsd_ids","e_above_hull","exp","structure","cif"])
known_garnet_structure=known_garnet[0]["structure"]

In [11]:
# Keep only the entries whose structure matches the reference garnet,
# up to a relabelling of the elements
fitted_data=[]
for i in data:
    if SM.fit_anonymous(i['structure'], known_garnet_structure):
        fitted_data.append(i)
print(len(fitted_data))

247


### Separating experimental and theoretical structures
Here, we determine which garnet structures are experimental (i.e., they have a corresponding entry in the ICSD) and which are theoretical (i.e., they have no ICSD entry and are known only from calculations submitted to the MP).
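The same rule is applied twice in this notebook, once per formula family, so it can be wrapped in a single partition function. A minimal sketch follows; `split_by_icsd` is a hypothetical helper, and the documents and ICSD id below are placeholders rather than real MP data. It also treats a missing or `None` `icsd_ids` entry as "no ICSD entry", which the notebook's `len(...)` check would not tolerate.

```python
from typing import Iterable, List, Tuple


def split_by_icsd(docs: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Split MP documents into (experimental, theoretical) lists.

    A document counts as experimental if its "icsd_ids" list is non-empty.
    """
    experimental, theoretical = [], []
    for doc in docs:
        # A missing key, None, or an empty list all mean "no ICSD entry"
        if doc.get("icsd_ids"):
            experimental.append(doc)
        else:
            theoretical.append(doc)
    return experimental, theoretical


docs = [
    {"material_id": "mp-1", "icsd_ids": [12345]},  # placeholder ICSD id
    {"material_id": "mp-2", "icsd_ids": []},
    {"material_id": "mp-3", "icsd_ids": None},
]
exp_docs, theo_docs = split_by_icsd(docs)
print(len(exp_docs), len(theo_docs))  # 1 2
```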


In [12]:
# Entries with at least one ICSD id are experimental; the rest are theoretical
experimental_list=[]
theoretical_list=[]
for i in fitted_data:
    if len(i["icsd_ids"])!=0:
        experimental_list.append(i)
    else:
        theoretical_list.append(i)
print(len(experimental_list))
print(len(theoretical_list))

51
196


## Other garnet structures
Garnets with formulae A3B2C3D12 and X3Y5Z12 share the same crystal structure. However, the pymatgen structure matcher cannot match a garnet with the former formula to one with the latter: anonymous matching maps species one-to-one, and the two formulae contain different numbers of distinct elements (four versus three). Likewise, querying the MP for the quaternary formula does not return materials with the ternary formula, and vice versa.

Hence, for this example we also query for garnets with the ternary formula.
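The mismatch is already visible at the level of composition. If element labels are discarded and only the reduced element counts are kept, two structures can correspond one-to-one only when these label-free counts agree. The sketch below illustrates this composition check with a hypothetical helper, `anonymous_signature` (not a pymatgen function); pymatgen's matcher additionally compares lattices and atomic positions, so agreeing signatures are necessary but not sufficient for a match.

```python
from functools import reduce
from math import gcd
from typing import Dict, Tuple


def anonymous_signature(composition: Dict[str, int]) -> Tuple[int, ...]:
    """Return the sorted, reduced element counts with element labels discarded."""
    divisor = reduce(gcd, composition.values())
    return tuple(sorted(count // divisor for count in composition.values()))


andradite = {"Ca": 3, "Fe": 2, "Si": 3, "O": 12}         # Ca3Fe2(SiO4)3, A3B2C3O12 type
yag = {"Y": 3, "Al": 5, "O": 12}                          # Y3Al5O12, X3Y5O12 type
other_quaternary = {"Mn": 3, "Al": 2, "Ge": 3, "O": 12}   # Mn3Al2(GeO4)3

print(anonymous_signature(andradite))  # (2, 3, 3, 12)
print(anonymous_signature(yag))        # (3, 5, 12)
print(anonymous_signature(andradite) == anonymous_signature(other_quaternary))  # True
print(anonymous_signature(andradite) == anonymous_signature(yag))               # False
```

The quaternary garnets share one signature and YAG has another with one fewer entry, which is why each family needs its own reference structure.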

In [13]:
#Query materials project for X3Y5O12
data_2=m.query("*3*5O12", properties=["pretty_formula","material_id","spacegroup.symbol","icsd_ids","e_above_hull","exp","structure","cif"])
print(f"There are {len(data_2)} materials with formula X3Y2Y3O12 in the MP")
print("")

# Get the reference structure: yttrium aluminium garnet (YAG), Y3Al5O12 (mp-3050)
YAG=m.query("mp-3050", properties=["pretty_formula","material_id","spacegroup.symbol","icsd_ids","e_above_hull","exp","structure","cif"])
YAG_structure=YAG[0]["structure"]

#Iterate over query data and verify if they have the correct structure
fitted_data_2=[]
for i in data_2:
    if SM.fit_anonymous(i['structure'], YAG_structure):
        fitted_data_2.append(i)
print(f"Of the {len(data_2)} materials, {len(fitted_data_2)} match the structure of YAG")
print("")

#Find the number of experimental and theoretical materials
experimental_list_2=[]
theoretical_list_2=[]
for i in fitted_data_2:
    if len(i["icsd_ids"])!=0:
        experimental_list_2.append(i)
    else:
        theoretical_list_2.append(i)
print(f"There are {len(experimental_list_2)} experimental garnet structures with formula X3Y5O12")
print(f"There are {len(theoretical_list_2)} theoretical structures with formula X3Y5O12")

print(f"Considering formulas A3B2C3O12 and X3Y5O12, there are {len(fitted_data)+len(fitted_data_2)} materials with the garnet structure in the Materials Project")
print("")
print(f"Of these {len(fitted_data)+len(fitted_data_2)} materials, {len(experimental_list)+len(experimental_list_2)} are experimental ")
print("")
print(f"Of these {len(fitted_data)+len(fitted_data_2)} materials, {len(theoretical_list)+len(theoretical_list_2)} are theoretical ")

There are 112 materials with formula X3Y2Y3O12 in the MP

Of the 112 materials, 43 match the structure of YAG

There are 30 experimental garnet structures with formula X3Y5O12
There are 13 theoretical structures with formula X3Y5O12
Considering formulas A3B2C3O12 and X3Y5O12, there are 290 materials with the garnet structure in the Materials Project

Of these 290 materials, 81 are experimental 

Of these 290 materials, 209 are theoretical 


In [14]:
new_fitted_data=fitted_data+fitted_data_2
new_theoretical_list=theoretical_list+theoretical_list_2
new_experimental_list=experimental_list+experimental_list_2
df_new=pd.DataFrame(new_fitted_data)

df_new.to_csv("MP_garnets.csv", index=False)
print(df_new.shape)
df_new.head()

(290, 8)


| | pretty_formula | material_id | spacegroup.symbol | icsd_ids | e_above_hull | exp | structure | cif |
|---|---|---|---|---|---|---|---|---|
| 0 | Na3Ti2(GeO4)3 | mp-1012670 | Ia-3d | [] | 0.037919 | {'tags': ['Substituion']} | [[4.65848625 6.211315 3.1056575 ] Na, [ 7.76... | # generated using pymatgen\ndata_Na3Ti2(GeO4)3... |
| 1 | Na3V2(GeO4)3 | mp-1012686 | Ia-3d | [] | 0.056257 | {'tags': []} | [[4.616742 6.155656 3.077828] Na, [7.6945700e+... | # generated using pymatgen\ndata_Na3V2(GeO4)3\... |
| 2 | Li3Cr2(GeO4)3 | mp-1012879 | Ia-3d | [] | 0.132669 | {'tags': []} | [[4.51952625 6.026035 3.0130175 ] Li, [7.532... | # generated using pymatgen\ndata_Li3Cr2(GeO4)3... |
| 3 | Li3Ti2(GeO4)3 | mp-1013749 | Ia-3d | [] | 0.085313 | {'tags': ['Substituion']} | [[4.567539 6.090052 3.045026] Li, [7.612565 0.... | # generated using pymatgen\ndata_Li3Ti2(GeO4)3... |
| 4 | Na3Co2(GeO4)3 | mp-1013794 | Ia-3d | [] | 0.141169 | {'tags': []} | [[4.54758375 6.063445 3.0317225 ] Na, [7.579... | # generated using pymatgen\ndata_Na3Co2(GeO4)3... |


In [15]:
# Flag entries without any ICSD ids as theoretical ("Yes"); the rest get "No"
df_new["theoretical?"] = ["No" if len(ids) != 0 else "Yes" for ids in df_new["icsd_ids"]]

## Storing in a SMACT-compatible database
Now that we have filtered the query data, we store the results in a local database.
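To illustrate the general idea of a table-per-category structure store, here is a sketch using the standard-library `sqlite3` module and an in-memory database. The two-column layout (a composition string plus serialized structure text), the composition strings and the placeholder structure text are assumptions for illustration; they are not a description of SMACT's actual schema or serialization format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path such as "example.db" to persist

# One table per category, mirroring the Garnets / Experimental / Theoretical split.
# Table names cannot be bound as SQL parameters, so they are fixed literals here.
for table in ("Garnets", "Experimental", "Theoretical"):
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(composition TEXT NOT NULL, structure TEXT NOT NULL)"
    )

# Placeholder rows: (composition, serialized structure)
rows = [
    ("Ca3Fe2Si3O12", "<structure text>"),
    ("Y3Al5O12", "<structure text>"),
]
conn.executemany("INSERT INTO Garnets VALUES (?, ?)", rows)
conn.commit()

# Look structures up by composition
found = conn.execute(
    "SELECT structure FROM Garnets WHERE composition = ?", ("Y3Al5O12",)
).fetchall()
print(len(found))  # 1
conn.close()
```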

In [16]:
#This creates the database object
DB=database.StructureDB("Garnets.db")

#These create tables within the database
DB.add_table("Garnets")
DB.add_table("Experimental")
DB.add_table("Theoretical")

In [17]:
# Convert the MP query documents into SMACT structures
structs = [database.parse_mprest(i) for i in new_fitted_data]
exp_structs = [database.parse_mprest(i) for i in new_experimental_list]
theo_structs = [database.parse_mprest(i) for i in new_theoretical_list]

# Populate the tables. Run these lines only the first time you run this notebook;
# comment them out on later runs to avoid adding the structures a second time
DB.add_structs(structs, "Garnets")
DB.add_structs(exp_structs, "Experimental")
DB.add_structs(theo_structs, "Theoretical")

Couldn't decorate mp-1012670 with oxidation states.
Couldn't decorate mp-1012686 with oxidation states.
Couldn't decorate mp-1012879 with oxidation states.
Couldn't decorate mp-1013749 with oxidation states.
Couldn't decorate mp-1013794 with oxidation states.
Couldn't decorate mp-1013795 with oxidation states.
Couldn't decorate mp-1013796 with oxidation states.
Couldn't decorate mp-1013797 with oxidation states.
Couldn't decorate mp-1013807 with oxidation states.
Couldn't decorate mp-1013808 with oxidation states.
Couldn't decorate mp-1013842 with oxidation states.
Couldn't decorate mp-1013849 with oxidation states.
Couldn't decorate mp-1013864 with oxidation states.
Couldn't decorate mp-1013916 with oxidation states.
Couldn't decorate mp-1157153 with oxidation states.
Couldn't decorate mp-1207912 with oxidation states.
Couldn't decorate mp-1210549 with oxidation states.
Couldn't decorate mp-1211091 with oxidation states.
Couldn't decorate mp-1214140 with oxidation states.
Couldn't dec

188
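The population step above is meant to run only once. If you build a similar store directly with the standard-library `sqlite3` module, a `UNIQUE` constraint combined with `INSERT OR IGNORE` makes repeated runs harmless. This sketch assumes a two-column layout (composition, structure text) with placeholder values; it illustrates a general SQLite technique and does not describe how `StructureDB.add_structs` itself handles duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS Garnets ("
    " composition TEXT NOT NULL,"
    " structure TEXT NOT NULL,"
    " UNIQUE (composition, structure))"
)

rows = [("Ca3Fe2Si3O12", "<structure A>"), ("Y3Al5O12", "<structure B>")]

# Simulate running the insertion cell twice: rows violating UNIQUE are skipped
for _ in range(2):
    conn.executemany("INSERT OR IGNORE INTO Garnets VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM Garnets").fetchone()[0]
print(count)  # 2
conn.close()
```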