In [None]:
import warnings
warnings.filterwarnings('ignore')

# About this notebook

* Author: Anubhav Jain
* Github repo: https://github.com/computron/pymatgen_tutorials
* YouTube video: https://youtu.be/e4hSkv1Ghbk

## Important update!

As of 4/16/2025, pymatgen no longer uses mp-api. This notebook has been updated to use mp-api directly instead of through pymatgen. More information is in the v2025.4.16 entry of the changelog: https://pymatgen.org/CHANGES.html

![alt text](graphics/title.png "Learn Pymatgen Part 3: Getting Data with the Materials Project API")

# What is the Materials Project API?

* The Materials Project (https://www.materialsproject.org) is a **free** web site, database, and design tool that provides high-quality simulated properties of materials, computed with density functional theory.
* The API allows users to download the various data sets via computer programs such as ``pymatgen``.
    * Note: the API is RESTful, which means it is not tied to any one programming language or software library.
* Common uses of the data are to:
    * find materials with specific properties
    * help support a scientific analysis with theoretical data
    * use the data to train a machine learning model

    


  

# What data is available through the Materials Project API?
* At the time of this tutorial (August 2024), the data set contains:
    * &gt;150,000 materials structures
    * &gt;170,000 molecular structures
    * &gt;70,000 electronic band structures / density of states
    * &gt;50,000 x-ray absorption spectra
    * &gt;10,000 elastic tensors
    * &gt;7,000 dielectric tensors
    * &gt;3,000 piezoelectric tensors
    * &gt;1,500 phonon band structures / phonon DOS
    * (among other properties ...)

# Optional note #1: New API vs legacy API

* As of the time of this tutorial (August 2024), there are **two** APIs for the Materials Project.
* The new API allows you to access the most recent data sets, offers more features, and will be more future-proof to learn.
* This tutorial will only cover the **new** API. It will **not** cover the legacy API.

![alt text](graphics/api_comparison.png "Comparison of old and new Materials Project API")





# Optional note #2: pymatgen's MPRester and the new API


* There has been a bit of a messy history between the Materials Project API being accessed through pymatgen or through a dedicated library called ``mp-api``.
* As of 4/16/2025, pymatgen no longer supports ``mp-api`` use. See changelog: https://pymatgen.org/CHANGES.html
* This tutorial has therefore been updated to import and use ``mp-api`` directly without involving pymatgen. pymatgen maintains its own built-in functionality for querying the Materials Project API, but this is not covered in this tutorial.

# Required set up: Getting and setting your API Key

* You can get your API key at: https://next-gen.materialsproject.org/api

# Required set up: installing the ``mp-api`` library

* **To use the new API, you should install the ``mp-api`` library by running the following command**:
    * ``pip install mp-api``
* If you do not install ``mp-api``, pymatgen still offers built-in functionality for using the Materials Project API. However, that will not be covered in this tutorial.


# Importing MPRester

* If your setup is correct, you should be able to import ``MPRester``, and the type of the object you create should be ``mp_api.client.mprester.MPRester``.
* Note that when creating the ``MPRester`` object, you need to either supply your API key or configure it in the pymatgen configuration file (the latter is better for hiding your key from others).
    * Briefly, you can run the following command to enter your API key into your configuration file: ``pmg config --add PMG_MAPI_KEY <USER_API_KEY>``
    * Full instructions on configuring your API key: https://pymatgen.org/usage.html
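
As an alternative sketch for keeping the key out of the notebook, the key can be resolved from an environment variable before constructing ``MPRester``. The helper name ``resolve_api_key`` is hypothetical, and the variable names ``MP_API_KEY`` and ``PMG_MAPI_KEY`` are assumptions that should be checked against the documentation for your installed ``mp-api`` version.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the first API key found, or None (hypothetical helper)."""
    env = os.environ if env is None else env
    # An explicitly supplied key always wins.
    if explicit_key:
        return explicit_key
    # Assumed variable names; verify them for your mp-api version.
    for name in ("MP_API_KEY", "PMG_MAPI_KEY"):
        value = env.get(name)
        if value:
            return value
    # Nothing found: the caller can fall back to the configuration file.
    return None
```

With this helper, ``MPRester(resolve_api_key())`` would pass the key explicitly when one is found and otherwise pass ``None``, which is the same as calling ``MPRester()`` with no arguments.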

In [None]:
from mp_api.client import MPRester
mpr = MPRester()  # use this line if you set up the configuration file
# mpr = MPRester("YOUR_API_KEY")  # use this line to enter your API key manually

print(type(mpr)) # should be 'mp_api.client.mprester.MPRester'

# Retrieving data using MPRester: built-in convenience functions

* MPRester contains many convenience functions for getting many common types of data such as crystal structures
* To use these functions, you typically need to know the Materials Project ID (``material_id``) of the material you want data for
* You can get the ``material_id`` by browsing the Materials Project web site (www.materialsproject.org) or by performing a search via the API.
* We'll demonstrate API searches later; for now, we will use some known ``material_id`` values to get data.

In [None]:
# Retrieve the crystal structure for a specific material by its Materials Project ID

material_id = "mp-2534"  # Example material ID for GaAs

# Fetch structure for the material
structure = mpr.get_structure_by_material_id(material_id)

# Print fetched data
print(f"--Structure:--\n {structure}")

In [None]:
# Retrieve the electronic band structure and DOS by the Materials Project ID 
bs = mpr.get_bandstructure_by_material_id("mp-2534")  # mp-2534 is GaAs
dos = mpr.get_dos_by_material_id("mp-2534")

# Plot using pymatgen
from pymatgen.electronic_structure.plotter import BSDOSPlotter
bsp = BSDOSPlotter()
ax_bs, ax_dos = bsp.get_plot(bs, dos)

In [None]:
# Retrieve the phonon band structure by Materials Project ID
pbs = mpr.get_phonon_bandstructure_by_material_id("mp-406")  # mp-406 is CdTe

# Plot using pymatgen
from pymatgen.phonon.plotter import PhononBSPlotter
plotter = PhononBSPlotter(pbs)
plt = plotter.get_plot()

In [None]:
# Get the Wulff shape of a material (currently available for selected elements only)
ws = mpr.get_wulff_shape("mp-135")  # mp-135 is Li
ws.get_plot()

# Using Sub-Resters for Additional Functionality

* The built-in convenience functions of ``MPRester`` are just a small fraction of the functionality available through the Materials Project REST API
* To access other functions and data as well as to search for materials, you need to use one of the many "Sub-Resters" of MPRester
* Each "Sub-Rester" helps retrieve a particular type of data from the Materials Project using one of the REST API endpoints
    * The API endpoints are listed here: https://next-gen.materialsproject.org/api#accessing-data
    * These API endpoints correspond to the ``suffix`` parameter of the Sub-Rester
* Splitting the client into Sub-Resters helps parallelize development, maintenance, and testing of the code across these various endpoints
    * This benefits the developers of the REST API and also helps with data transfer efficiency
    * However, it can make usage more difficult for users because you need to find the appropriate Sub-Rester, and you may need to combine information from multiple Sub-Resters to accomplish your goal
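
To make the link between a Sub-Rester and its endpoint concrete, the sketch below assembles the kind of URL a request to an endpoint might use, without sending anything. The base URL, the ``_fields`` and ``_limit`` query parameter names, and the helper ``build_endpoint_url`` are all assumptions for illustration; the endpoint list linked above remains the authoritative reference.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://api.materialsproject.org/"  # assumed base URL


def build_endpoint_url(suffix, fields=None, limit=None, **criteria):
    """Build a URL for an endpoint suffix (hypothetical helper, no request sent)."""
    params = dict(criteria)
    if fields:
        # Assumed parameter name for restricting the returned fields.
        params["_fields"] = ",".join(fields)
    if limit is not None:
        # Assumed parameter name for capping the number of results.
        params["_limit"] = limit
    url = urljoin(BASE_URL, suffix.strip("/") + "/")
    return f"{url}?{urlencode(params)}" if params else url


url = build_endpoint_url(
    "materials/summary", fields=["material_id", "band_gap"], limit=10
)
print(url)
```

In practice the Sub-Resters build and send these requests for you, so a helper like this is only useful for understanding what happens underneath or for calling the API from another language.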

![alt text](graphics/subresters.png "SubResters and API interaction")

# Sub-Rester Example 1: Summary

* One of the most common Sub-Resters you may want to use is the ``SummaryRester``
* The ``SummaryRester`` is the closest you get to a "one-stop-shop" when searching for materials data. It allows you to search over a variety of different fields and retrieve many kinds of information about a material.

![alt text](graphics/SummaryRester.png "Diagram of the SummaryRester")

In [None]:
summary_subrester = mpr.materials.summary  # Sub-Resters are usually attributes of MPRester

results = summary_subrester.search(elements=["Si", "O"], # Si & O are required, but other elements also allowed
                                   exclude_elements=["Ca"], # no Calcium allowed
                                   num_elements=3, # 3 unique elements, i.e., ternaries
                                   band_gap=(0.5, 1.0))  # band gap from 0.5 - 1.0 eV

print(f"The number of returned materials is: {len(results)}")
print("--First material (SummaryDoc)--")
print(results[0])  # This is a SummaryDoc object for this Rester

In [None]:
# Accessing attributes of the SummaryDoc document
print(f"--Structure:--\n {results[0].structure}")
print(f"Band gap: {results[0].band_gap}")

In [None]:
# Being data-efficient by restricting 'fields'

# This will result in faster queries and less data transfer
results = mpr.materials.summary.search(elements=["Si", "O"], 
                                       num_elements=3,
                                       exclude_elements=["Ca"], # no Calcium allowed
                                       band_gap=(0.5, 1.0),
                                       fields=["material_id",  # We will just retrieve the data in these fields
                                               "band_gap", 
                                               "symmetry",
                                               "composition",
                                               "origins"])
print(results[0])

In [None]:
# Getting the individual calculations associated with certain properties - these are called "Tasks"
# Note that the origins list is not complete, so to get the origins of other properties you may need to use the Sub-Rester for that property
print(results[0].origins)

# Sub-Rester Example 2: Task

* Recall that a single material may have many calculations (Tasks) associated with it, and the overall data for a material is a combination of data taken from several Tasks.
* The ``TaskRester`` allows you to look up the details of an individual calculation, including input parameters and outputs
* It also has a convenience function to get the trajectory of a calculation (all steps in a structure relaxation)


![alt text](graphics/TaskRester.png "Diagram of the TaskRester")

In [None]:
results = mpr.materials.tasks.search(task_ids=["mp-1792681"])  # this is a "static" calculation

print("--input.parameters--")
print(results[0].input.parameters)
print("--output.structure--")
print(results[0].output.structure)
print("--output.forces--")
print(results[0].output.forces)
print("--output.energy--")
print(results[0].output.energy)

# note: more detailed information is in 'calcs_reversed'; only the first entry is shown here
print("--input--")
print(results[0].calcs_reversed[0].input)
print("--output--")
print(results[0].calcs_reversed[0].output)

In [None]:
traj = mpr.materials.tasks.get_trajectory("mp-19017")  # example of a custom function in a SubRester
print(traj[0])  # the entire relaxation trajectory as pymatgen.core.trajectory.Trajectory.as_dict()

# Sub-Rester Example 3: Elasticity

* Many materials properties have their own Sub-Rester associated with them, which allows you to search on those properties
* For example, to access the full elastic tensor and derived quantities such as sound velocity estimated from the elastic constants, one would use the ``ElasticityRester``
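
For context on what "estimated from the elastic constants" can mean, here is a minimal sketch of the standard isotropic-average relations between bulk modulus, shear modulus, density, and the longitudinal and transverse sound velocities. Whether the Materials Project uses exactly these expressions is an assumption, the function name is hypothetical, and the input numbers are arbitrary placeholders rather than Materials Project data.

```python
import math


def sound_velocities(bulk_modulus_gpa, shear_modulus_gpa, density_kg_m3):
    """Return (longitudinal, transverse) sound velocities in m/s."""
    k = bulk_modulus_gpa * 1e9  # GPa -> Pa
    g = shear_modulus_gpa * 1e9  # GPa -> Pa
    # Longitudinal waves depend on both moduli; transverse waves only on shear.
    v_long = math.sqrt((k + 4.0 * g / 3.0) / density_kg_m3)
    v_trans = math.sqrt(g / density_kg_m3)
    return v_long, v_trans


# Placeholder inputs for illustration only (not data for any real material).
v_long, v_trans = sound_velocities(60.0, 40.0, 5000.0)
print(f"longitudinal: {v_long:.0f} m/s, transverse: {v_trans:.0f} m/s")
```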

![alt text](graphics/ElasticityRester.png "Diagram of the ElasticityRester")



In [None]:
elasticity = mpr.materials.elasticity.search("mp-2534")[0]  # mp-2534 is GaAs
print(elasticity)

In [None]:
print(f"--elastic tensor--\n {elasticity.elastic_tensor}")
print(f"--elastic-constant-derived sound velocity--\n {elasticity.sound_velocity}")

# Note that one needs context for the data
# Below is NOT the thermal conductivity of crystalline GaAs but rather the glassy limit
print(f"--elastic-constant-derived glassy limit of thermal conductivity--\n {elasticity.thermal_conductivity}")

# Advanced searches: using multiple Resters

* Sometimes, you are not able to perform the query or get the data you want using a single Sub-Rester
* In this example, we want to search using both band gap and the total static dielectric constant, and get detailed data on both properties
* In that case, you need to merge information from multiple Sub-Resters
* A procedure for doing this is demonstrated next

![alt text](graphics/nested_search.png "Nesting search using multiple Resters")

In [None]:
from emmet.core.summary import HasProps

# Use SummaryRester for part 1 of search
search_1 = mpr.materials.summary.search(has_props=[HasProps.dielectric], 
                                        band_gap=[1.5, 3], 
                                        elements=["O"], 
                                        fields=["material_id", "band_gap"])
search_1_data_dict = {x.material_id: x for x in search_1}
search_1_mpids = list(search_1_data_dict.keys())  # plain list of IDs to pass to the next search

# Use DielectricRester for part 2 of search
search_2 = mpr.materials.dielectric.search(material_ids=search_1_mpids, 
                                           e_total=[5, 10], 
                                           fields=["material_id", "e_total", 
                                                   "e_ionic", "e_electronic", 
                                                   "composition"])
search_2_data_dict = {x.material_id: x for x in search_2}
search_2_mpids = search_2_data_dict.keys()

# (repeat the above steps for other steps to filter the data as needed)

In [None]:
# Now merge the data
from collections import namedtuple
MaterialsData = namedtuple("MaterialsData", ["summary_data", "dielectric_data"])  # container for the data

all_data = []
for mpid in search_2_mpids:  # only the mpids matching both criteria
    all_data.append(MaterialsData(summary_data=search_1_data_dict[mpid], 
                                  dielectric_data=search_2_data_dict[mpid]))

# Note that if you want the band_gap, you need to get it from summary_data.
# But if you want the e_total, you need to get it from dielectric_data!
print(f"Number of results: {len(all_data)}")
print("----First result properties----")
print(f"--band gap--\n {all_data[0].summary_data.band_gap}")
print(f"--static dielectric constant--\n{all_data[0].dielectric_data.e_total}")
print(f"--all summary data retrieved--\n{all_data[0].summary_data}")
print(f"--all dielectric data retrieved--\n{all_data[0].dielectric_data}")


# Being faster and more data-efficient with Materials Project

Knowing how to be data-efficient when using the REST API is good for several reasons. It helps you, because queries return much faster and use less data and memory on your side when conducting analyses. It also helps the Materials Project, because it avoids unnecessary data transfer costs. There are several things you can do to make your queries faster and more data-efficient:

* As already covered, restrict the data returned to the specific fields of interest, to the extent possible:
```
with MPRester("your_api_key_here") as mpr:
    docs = mpr.materials.summary.search(fields=["material_id", "volume", "elements"])

```

* If you are just exploring / testing queries and don't want to wait for thousands of results to be retrieved, use ``num_chunks=1`` and ``chunk_size=10`` parameters when calling ``search()`` to limit to 10 example results. This works for all searches with all Resters and avoids unnecessary calls:
```
with MPRester("your_api_key_here") as mpr:
    mpr.materials.summary.search(band_gap=[0,10], num_chunks=1, chunk_size=10)
```

* If you need to get data for many materials, pass the ``material_ids`` as a list. This minimizes the number of calls to the API (i.e., don't call ``search()`` thousands of times!):
```
with MPRester("your_api_key_here") as mpr:
    docs = mpr.materials.summary.search(material_ids=["mp-149", "mp-13", "mp-22526"])
```

* For more tips, see https://docs.materialsproject.org/downloading-data/using-the-api/tips-for-large-downloads
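
If an ID list is too large to send comfortably in one request, one possible pattern is to split it into a modest number of batches rather than looping over single IDs. The helper names ``batched_ids`` and ``fetch_in_batches`` and the default batch size are hypothetical, and ``search`` stands in for any Sub-Rester ``search`` method that accepts ``material_ids``. ``mp-api`` may already chunk requests internally, so treat this as a sketch rather than a required step.

```python
def batched_ids(material_ids, batch_size):
    """Yield successive lists of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(material_ids)
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def fetch_in_batches(search, material_ids, batch_size=500, **search_kwargs):
    """Call search once per batch of IDs and collect all returned documents."""
    docs = []
    for batch in batched_ids(material_ids, batch_size):
        # One call per batch instead of one call per material.
        docs.extend(search(material_ids=batch, **search_kwargs))
    return docs
```

A call might then look like ``fetch_in_batches(mpr.materials.summary.search, ids, fields=["material_id", "band_gap"])``, which keeps the field restriction from the first tip while limiting the number of requests.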

# More information

More information about the Materials Project API can be found in the official docs: https://docs.materialsproject.org/downloading-data/how-do-i-download-the-materials-project-database