[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AntObi/Materials-Project-tip-and-tricks/blob/master/next_gen/simple_queries.ipynb)

# Accessing data from the Materials Project (next-gen)

You will need an API key, which you can get from the Materials Project website (https://next-gen.materialsproject.org/api).

Note that the API key for the next-gen site is different from the one for the legacy site.

## Install dependencies

In [1]:
!pip install mp-api  # the next-gen API client; also installs pymatgen as a dependency


In [2]:

# MPRester from mp-api provides the next-gen endpoints such as mpr.summary
from mp_api.client import MPRester
import pandas as pd


In [3]:
#@title Enter your Materials Project API key
# Paste your own key here, and never share a notebook that still contains it
MP_API_KEY = "" #@param {type:"string"}
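Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you have stored the key in an environment variable named `MP_API_KEY` (the variable name is our choice; we believe recent versions of `mp-api` also look for a variable with this name, but check your installed version), is to read it at runtime. `get_api_key` is a hypothetical helper, not part of any library:

```python
import os


def get_api_key(env_var: str = "MP_API_KEY") -> str:
    """Return the Materials Project API key stored in an environment variable.

    Raises RuntimeError if the variable is unset or empty, so a missing key
    fails loudly before any request is sent.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

With the variable set, `MP_API_KEY = get_api_key()` can replace the form field above.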

## Getting structures

Let's say we want to find all the structures that contain lithium and have a band gap of at least 1 eV. We can query the Materials Project (MP) for these directly.
To query for materials containing a particular element, we use the `elements` parameter; to filter on band gap, we use the `band_gap` parameter. The criteria passed to `mpr.summary.search` are as follows:
```
elements = ['Li']  # A list of the elements that every result must contain

band_gap = (1, None)  # A (min, max) tuple in eV; (1, None) means a band gap of at least 1 eV with no upper limit
```
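To make the `(min, max)` convention concrete, the sketch below filters a plain list of band-gap values in the same way. It assumes both bounds are inclusive and that `None` leaves that side unbounded; `in_range` is a hypothetical helper for illustration, not the code `mp-api` runs, and the values are arbitrary numbers rather than real data.

```python
from typing import Optional, Tuple


def in_range(value: float, bounds: Tuple[Optional[float], Optional[float]]) -> bool:
    """Return True if value lies within (low, high); None means no bound on that side.

    Both ends are treated as inclusive, which is an assumption about how the
    API interprets the tuple.
    """
    low, high = bounds
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Arbitrary example values in eV, not taken from the Materials Project
gaps = [0.0, 0.5, 1.0, 3.2]
selected = [g for g in gaps if in_range(g, (1, None))]
```

Here `selected` keeps only the values of at least 1 eV, mirroring what `band_gap=(1, None)` asks the server for.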


For the full list of parameters that can be used in a Materials Project query, see the API documentation (https://api.materialsproject.org/docs#/).
Note that some parameters and fields are specific to a particular endpoint.

For simple queries like these, we will mainly use the summary endpoint.

`mpr.summary.search` lets us search the summary endpoint through the API.
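The cells below repeat the same criteria with small changes. One way to keep them consistent, sketched here as an assumption rather than the notebook's own method, is to store the shared criteria in a dictionary and unpack it into `mpr.summary.search`. `base_criteria` and `with_criteria` are hypothetical names:

```python
# Criteria shared by the queries in this notebook
base_criteria = {
    "elements": ["Li"],
    "band_gap": (1, None),
    "fields": ["material_id", "formula_pretty", "structure"],
}


def with_criteria(base: dict, **extra) -> dict:
    """Return a new criteria dict with the extra keyword arguments added or overridden."""
    return {**base, **extra}


# Same criteria, restricted to materials that are not flagged as theoretical
experimental_criteria = with_criteria(base_criteria, theoretical=False)
```

Inside a `with MPRester(MP_API_KEY) as mpr:` block, `mpr.summary.search(**experimental_criteria)` then passes these as keyword arguments. The merge is a shallow copy, so the lists are shared between dictionaries; copy them if you plan to mutate them.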


In [4]:
# Query the Materials Project
with MPRester(MP_API_KEY) as mpr:
    docs = mpr.summary.search(
        elements=['Li'],
        band_gap=(1, None),
        fields=['material_id', 'formula_pretty', 'structure'],
    )

print(len(docs))






9287


In [5]:
# We can convert the query data to a list of dictionaries and store them as a dataframe

query_dict = [{'material_id':doc.material_id, 'formula_pretty':doc.formula_pretty, 'structure':doc.structure} for doc in docs]

df=pd.DataFrame(query_dict)
df.head()

|   | material_id | formula_pretty | structure |
|---|---|---|---|
| 0 | mp-863431 | Li2Si3NiO8 | [[8.60083548 0.39698545 5.08828293] Li, [3.560... |
| 1 | mp-18860 | Li2VSiO5 | [[-6.00941540e-05 1.49899528e-04 2.26998892e... |
| 2 | mp-12829 | LiCaGaF6 | [[2.585103 1.49251265 2.46952125] Li, [ 2.58... |
| 3 | mp-29463 | LiBeN | [[1.04966183 2.16036022 4.18114524] Li, [1.167... |
| 4 | mp-777465 | Li3P3(WO6)2 | [[8.63653362 2.86967262 2.42977237] Li, [ 7.72... |
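The list comprehension above spells out each field by hand. A small generalization, sketched below, reads the same field names that were passed to `fields=` using `getattr`, so the query and the table stay in sync. `docs_to_records` is a hypothetical helper, and `SimpleNamespace` objects stand in for the `SummaryDoc` results so the sketch runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def docs_to_records(docs: Iterable[Any], fields: List[str]) -> List[Dict[str, Any]]:
    """Turn document objects into plain dicts, reading each requested field by name.

    Missing attributes become None instead of raising AttributeError.
    """
    return [{field: getattr(doc, field, None) for field in fields} for doc in docs]


# Stand-in documents for illustration; real ones come from mpr.summary.search
stand_in_docs = [
    SimpleNamespace(material_id="mp-12829", formula_pretty="LiCaGaF6"),
    SimpleNamespace(material_id="mp-29463", formula_pretty="LiBeN"),
]
records = docs_to_records(stand_in_docs, ["material_id", "formula_pretty"])
```

With real results, `pd.DataFrame(docs_to_records(docs, ['material_id', 'formula_pretty', 'structure']))` builds the same table as the cell above.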


We can refine our query by adding another parameter.
For example, we can exclude radioactive elements and transition metals (apart from Sc, Y, Zr and Nb) from our results using the `exclude_elements` parameter.

In [6]:
# A list of radioactive elements
radioactive_elements=['Tc', 'Pm', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr']

# A list of transition metal elements excluding Scandium (Sc), Yttrium (Y), Zirconium (Zr) and Niobium (Nb)
transition_metals = ['Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'La', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Ac']

# Merge the lists; Tc and Ac appear in both, so drop the duplicates
not_wanted = sorted(set(radioactive_elements + transition_metals))

# Query the Materials Project
with MPRester(MP_API_KEY) as mpr:
    docs = mpr.summary.search(
        elements=['Li'],
        exclude_elements=not_wanted,
        band_gap=(1, None),
        fields=['material_id', 'formula_pretty', 'structure'],
    )

print(len(docs))

# Convert the list of SummaryDoc objects to a list of dictionaries
query_dict = [{'material_id':doc.material_id, 'formula_pretty':doc.formula_pretty, 'structure':doc.structure} for doc in docs]

df=pd.DataFrame(query_dict)
df.head()




2125


|   | material_id | formula_pretty | structure |
|---|---|---|---|
| 0 | mp-12829 | LiCaGaF6 | [[2.585103 1.49251265 2.46952125] Li, [ 2.58... |
| 1 | mp-29463 | LiBeN | [[1.04966183 2.16036022 4.18114524] Li, [1.167... |
| 2 | mp-760650 | Li3Bi(PO4)2 | [[0.14229163 3.34725413 2.69849632] Li, [-0.14... |
| 3 | mp-560463 | Li2B4O7 | [[1.94472202 3.3890018 2.40886452] Li, [3.389... |
| 4 | mp-771890 | LiSnAsCO7 | [[3.35029459 4.52254884 7.52931378] Li, [6.722... |
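As a quick offline sanity check on results like these, one can parse the element symbols out of each `formula_pretty` string and confirm that none of them appear in `not_wanted`. This is a rough sketch that assumes formulas use standard one- or two-letter element symbols; pymatgen's `Composition` class is the more robust tool for real formula parsing. `elements_in_formula` and `violating_formulas` are hypothetical helpers.

```python
import re
from typing import Iterable, List, Set


def elements_in_formula(formula: str) -> Set[str]:
    """Extract element symbols: a capital letter followed by an optional lowercase letter."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def violating_formulas(formulas: Iterable[str], excluded: Iterable[str]) -> List[str]:
    """Return the formulas that contain at least one excluded element."""
    excluded_set = set(excluded)
    return [f for f in formulas if elements_in_formula(f) & excluded_set]
```

For the filtered query, `violating_formulas(df['formula_pretty'], not_wanted)` should return an empty list; for the unfiltered one it would flag entries such as Li2Si3NiO8, which contains Ni.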


## Experimental materials

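Using the API, we can also filter on whether a material is theoretical or has been observed experimentally. The `theoretical` parameter flags whether a material is theoretical; passing `theoretical=False` restricts the results to materials with an experimental counterpart (for example, an ICSD entry).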


### How many experimental materials are in Materials Project?

We can query the Materials Project for the material IDs of all the materials that are not flagged as theoretical.

In [7]:
# Request only the material IDs to keep the download small
with MPRester(MP_API_KEY) as mpr:
    docs = mpr.summary.search(theoretical=False, fields=['material_id'])

print(f'In the Materials Project there are {len(docs)} experimental materials.')




In the Materials Project there are 49601 experimental materials.


### How many experimental lithium materials have a band gap of at least 1 eV and contain neither radioactive elements nor transition metals (except Sc, Y, Zr and Nb)?

In [8]:
with MPRester(MP_API_KEY) as mpr:
    docs = mpr.summary.search(
        elements=['Li'],
        exclude_elements=not_wanted,
        band_gap=(1, None),
        theoretical=False,
        fields=['material_id', 'formula_pretty', 'structure'],
    )

print(len(docs))


query_dict = [{'material_id':doc.material_id, 'formula_pretty':doc.formula_pretty, 'structure':doc.structure} for doc in docs]

df=pd.DataFrame(query_dict)
df.head()


834


|   | material_id | formula_pretty | structure |
|---|---|---|---|
| 0 | mp-12829 | LiCaGaF6 | [[2.585103 1.49251265 2.46952125] Li, [ 2.58... |
| 1 | mp-29463 | LiBeN | [[1.04966183 2.16036022 4.18114524] Li, [1.167... |
| 2 | mp-560463 | Li2B4O7 | [[1.94472202 3.3890018 2.40886452] Li, [3.389... |
| 3 | mp-570948 | LiCaGaN2 | [[0.73696657 3.6844017 2.43078959] Li, [6.237... |
| 4 | mp-557962 | SrLiBS3 | [[0.20049776 1.08455836 2.29301875] Sr, [3.757... |
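The counts of 2125 and 834 above come from two separate requests. If you need both groups, an alternative sketch is to make a single request without `theoretical=...`, add `'theoretical'` to `fields`, and split the results locally. This assumes each returned document exposes a boolean `theoretical` attribute; `split_by_theoretical` is a hypothetical helper, and the `SimpleNamespace` objects with placeholder IDs stand in for real results.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def split_by_theoretical(docs: Iterable[Any]) -> Dict[str, List[Any]]:
    """Group documents into experimental and theoretical lists using their `theoretical` flag."""
    groups: Dict[str, List[Any]] = {"experimental": [], "theoretical": []}
    for doc in docs:
        key = "theoretical" if doc.theoretical else "experimental"
        groups[key].append(doc)
    return groups


# Placeholder documents for illustration only; the IDs and flags are not real data
sample = [
    SimpleNamespace(material_id="a", theoretical=False),
    SimpleNamespace(material_id="b", theoretical=True),
    SimpleNamespace(material_id="c", theoretical=False),
]
groups = split_by_theoretical(sample)
```

One request downloads more data at once, while two filtered requests keep each download smaller; which is preferable depends on whether you need both groups.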


## What properties can be queried?

The summary endpoint supports a wide range of search criteria. To see which arguments can be used in a query, we can print the docstring of `mpr.summary.search`.

In [9]:
with MPRester(MP_API_KEY) as mpr:
    print(mpr.summary.search.__doc__)


        Query core data using a variety of search criteria.

        Arguments:
            band_gap (Tuple[float,float]): Minimum and maximum band gap in eV to consider.
            chemsys (str, List[str]): A chemical system, list of chemical systems
                (e.g., Li-Fe-O, Si-*, [Si-O, Li-Fe-P]), or single formula (e.g., Fe2O3, Si*).
            crystal_system (CrystalSystem): Crystal system of material.
            density (Tuple[float,float]): Minimum and maximum density to consider.
            deprecated (bool): Whether the material is tagged as deprecated.
            e_electronic (Tuple[float,float]): Minimum and maximum electronic dielectric constant to consider.
            e_ionic (Tuple[float,float]): Minimum and maximum ionic dielectric constant to consider.
            e_total (Tuple[float,float]): Minimum and maximum total dielectric constant to consider.
            efermi (Tuple[float,float]): Minimum and maximum fermi energy in eV to consider.
            el



The output above (truncated here) shows that `mpr.summary.search` accepts an extensive set of arguments, so we can filter the Materials Project data quite precisely.
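Because the printed docstring is long, it can be handier to list just the argument names. The sketch below uses Python's standard `inspect` module and assumes the search method declares its criteria as named parameters, as the docstring suggests; with a live client you would call `search_arguments(mpr.summary.search)` inside the `with MPRester(...)` block. `search_arguments` is a hypothetical helper, and `fake_search` is a stand-in so the sketch runs without an API key.

```python
import inspect
from typing import Callable, List


def search_arguments(func: Callable) -> List[str]:
    """List the named parameters a callable accepts, skipping self, *args and **kwargs."""
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if name != "self" and param.kind not in skip
    ]


# Hypothetical stand-in using a few of the argument names from the docstring above
def fake_search(band_gap=None, chemsys=None, density=None, fields=None, **kwargs):
    return []
```

`help(mpr.summary.search)` is another built-in way to read the same documentation.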
