![Banner logo](../fig/citrine_banner.png)

# Compare Band Gaps From Citrination and Materials Project

*Authors: Carena Church, Enze Chen*

This notebook demonstrates how to retrieve experimental band gaps from [Citrine's databases](https://citrination.com/) through the Citrination API client, using [MatMiner's](https://github.com/hackingmaterials/matminer) data retrieval tools, load them into a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html#pandas.DataFrame), and compare them with computed band gaps from the [Materials Project](https://www.materialsproject.org/) (MP).

**WARNING**: This notebook does not check for or extract explicit structural information from the experimental datasets it uses. It therefore compares the experimental band gap from Citrine with the computed band gap of the most stable MP structure for the same composition, which assumes that both values correspond to the same structure (polymorph). This is not always true; where it is not, the comparison for that composition is invalid.

## Prerequisites

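* Have the [`matminer`](https://pypi.org/project/matminer/) package installed (`pip` installable using `pip install matminer`). The notebook also imports `MPRester` from `pymatgen`.
* Set a Citrination API key in the environment variable `CITRINATION_API_KEY` and a Materials Project API key in `MAPI_KEY`.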

## Python package imports

In [None]:
# Standard packages
import os
import warnings

# Third-party packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from matminer.data_retrieval.retrieve_Citrine import CitrineDataRetrieval
try:
    # Location of MPRester in newer pymatgen releases
    from pymatgen.ext.matproj import MPRester
except ImportError:
    # Older pymatgen releases exposed it at the package top level
    from pymatgen import MPRester

# Set pandas view options
pd.set_option('display.width', 1000)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Filter warning messages from the notebook
warnings.filterwarnings('ignore')

## Step 1: Retrieve data

We first create an adapter to the Citrination API with MatMiner's `CitrineDataRetrieval` tool, passing it the API key stored in the `CITRINATION_API_KEY` environment variable.

In [None]:
c = CitrineDataRetrieval(os.environ.get('CITRINATION_API_KEY'))
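`os.environ.get` returns `None` when the variable is missing, so a typo in the variable name only surfaces later as an authentication failure. A minimal sketch of a fail-fast check, using a hypothetical helper `require_env` (not part of MatMiner or the Citrination client):

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message.

    Hypothetical helper: os.environ.get() silently returns None for a missing key,
    which would otherwise only show up later as an API authentication error.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; export your API key first."
        )
    return value
```

With it, the adapter could be created as `c = CitrineDataRetrieval(require_env('CITRINATION_API_KEY'))`, so a missing key stops the notebook at this cell instead of at the first query.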

Next, we retrieve up to 100 experimental band gap records from the Citrination database, keeping only the chemical formula and band gap fields, and display them in a pandas DataFrame.

In [None]:
df = c.get_dataframe(criteria={'data_type': 'EXPERIMENTAL', 'max_results': 100},
                     properties=['Band gap'], common_fields=['chemicalFormula'],
                     print_properties_options=False)

# Drop rows where the band gap is missing
df = df.dropna(subset=['Band gap'])

# Rename column
df = df.rename(columns={'Band gap': 'Experimental band gap'})

# Show first few rows of the DataFrame
df.head()
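The later cells cast the band gap columns with `.astype(float)`, which suggests the values may not arrive as floats. A sketch, assuming the `Band gap` column can hold numeric strings, unparseable text, or nulls, that converts it to numbers before dropping missing rows. The helper `clean_band_gaps` and the toy frame are hypothetical; the values are made up and are not Citrine data:

```python
import pandas as pd


def clean_band_gaps(frame, column='Band gap'):
    """Coerce `column` to float, drop rows where it is missing, and rename it."""
    out = frame.copy()
    # Anything that cannot be parsed as a number becomes NaN instead of raising
    out[column] = pd.to_numeric(out[column], errors='coerce')
    # Drop only rows whose band gap is missing; other columns are left alone
    out = out.dropna(subset=[column])
    return out.rename(columns={column: 'Experimental band gap'})


# Hypothetical input mixing a numeric string, a float, text, and a null
raw = pd.DataFrame({
    'chemicalFormula': ['X', 'Y', 'Z', 'W'],
    'Band gap': ['1.1', 3.4, 'n/a', None],
})
clean = clean_band_gaps(raw)
print(clean)
```

Because `errors='coerce'` silently turns entries such as `'n/a'` into `NaN`, it is worth checking which rows were dropped when running this on real records.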

## Step 2: Obtain MP band gaps

Next, we define a function that, for a given composition, queries MP for all entries with that formula and returns the computed band gap of the entry with the lowest energy per atom, i.e., the most stable polymorph in MP.
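The selection step can be isolated from the API call and tried offline. A minimal sketch, assuming each MP entry is a dict with `energy_per_atom` and `band_gap` keys (the two fields the function below reads); `most_stable_band_gap` is a hypothetical name and the entries are made-up values:

```python
def most_stable_band_gap(entries):
    """Return the band gap of the lowest-energy entry, or None if there are none.

    `entries` is assumed to be a list of dicts with 'energy_per_atom' and
    'band_gap' keys, mirroring the fields read from MP query results.
    """
    if not entries:
        return None
    # min() finds the ground state in a single pass; sorting the list is unnecessary
    ground_state = min(entries, key=lambda e: e['energy_per_atom'])
    return ground_state['band_gap']


# Hypothetical polymorphs of one formula (values made up for illustration)
polymorphs = [
    {'energy_per_atom': -5.10, 'band_gap': 0.8},
    {'energy_per_atom': -5.25, 'band_gap': 1.2},
    {'energy_per_atom': -5.05, 'band_gap': 0.0},
]
print(most_stable_band_gap(polymorphs))  # 1.2
```

Using `min` rather than sorting gives the same result in linear time and makes the intent (pick the ground state) explicit.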

### `get_MP_bandgaps`

In [None]:
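def get_MP_bandgaps(formula):
    """Return the MP band gap (eV) of the lowest-energy entry for `formula`."""
    try:
        # API key read from the environment variable "MAPI_KEY"
        with MPRester(os.environ.get('MAPI_KEY')) as mpr:
            entries = mpr.get_data(formula)
    except Exception:
        # Network or API errors: record the band gap as missing
        return pd.Series({'Computed band gap': np.nan})
    if not entries:
        # No MP entries for this formula
        return pd.Series({'Computed band gap': np.nan})
    # Lowest energy per atom = most stable polymorph in MP
    most_stable_entry = min(entries, key=lambda e: e['energy_per_atom'])
    return pd.Series({'Computed band gap': most_stable_entry['band_gap']})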

Then, we apply the above function to each composition in the "chemicalFormula" column of the previous DataFrame to get a column of computed band gaps from MP, and concatenate it with the original DataFrame.

In [None]:
mp_df = df.apply(lambda x: get_MP_bandgaps(x['chemicalFormula']), axis=1)
df = pd.concat([df, mp_df], axis=1)
df.head()
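The `chemicalFormula` column may contain the same formula more than once (for example, several measurements of one compound), in which case the cell above queries MP repeatedly for it. A sketch of memoizing the lookup with `functools.lru_cache`; `lookup_band_gap` is a hypothetical stub whose body stands in for a real MP query, and its values are made up:

```python
from functools import lru_cache

import pandas as pd

calls = []  # records which formulas actually reach the (stubbed) lookup


@lru_cache(maxsize=None)
def lookup_band_gap(formula):
    """Hypothetical stand-in for an MP query that returns a plain float."""
    calls.append(formula)
    fake_results = {'X': 1.0, 'Y': 2.5}  # made-up values for illustration
    return fake_results.get(formula, float('nan'))


demo = pd.DataFrame({'chemicalFormula': ['X', 'Y', 'X', 'X', 'Z']})
# Each distinct formula is looked up once; repeats are served from the cache
demo['Computed band gap'] = demo['chemicalFormula'].map(lookup_band_gap)
print(calls)  # ['X', 'Y', 'Z']
```

Two design caveats: the cached function should return an immutable value such as a float rather than a `pd.Series`, and `lru_cache` also caches failed lookups, so a transient network error stays cached until `lookup_band_gap.cache_clear()` is called.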

## Step 3: Plot the comparison

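Now we plot the computed band gaps against the experimental ones. The dashed line marks perfect agreement (computed = experimental); points below it are materials whose band gap MP underestimates.

In [None]:
fig, ax = plt.subplots()
exp_gap = df['Experimental band gap'].astype(float)
mp_gap = df['Computed band gap'].astype(float)
ax.scatter(exp_gap, mp_gap)
# Parity line y = x spanning the range of both data sets
lim = max(exp_gap.max(), mp_gap.max())
ax.plot([0, lim], [0, lim], 'k--')
ax.set_xlabel('Experimental band gap (eV)')
ax.set_ylabel('Computed band gap (eV)')
plt.show()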

## Step 4: Compute the error

In [None]:
rmse = ((df['Experimental band gap'].astype(float) - df['Computed band gap'].astype(float)) ** 2).mean() ** .5
print('The RMSE is {0:.4f} eV.'.format(rmse))
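The RMSE alone does not say whether MP tends to over- or underestimate. A sketch of a few complementary metrics, assuming paired sequences of band gaps in eV where either value may be `None` or `NaN`; `band_gap_errors` is a hypothetical helper, and like the pandas expression above it skips incomplete pairs, so its RMSE should agree with the value printed above:

```python
import math


def band_gap_errors(experimental, computed):
    """Compare paired band gaps (eV), skipping pairs where either value is missing.

    Returns RMSE, MAE, mean signed error (computed - experimental), the fraction
    of pairs where the computed gap is below the experimental one, and the count.
    """
    pairs = [
        (float(e), float(c))
        for e, c in zip(experimental, computed)
        if e is not None and c is not None
        and not math.isnan(float(e)) and not math.isnan(float(c))
    ]
    if not pairs:
        raise ValueError('no complete (experimental, computed) pairs')
    diffs = [c - e for e, c in pairs]
    n = len(diffs)
    return {
        'rmse': math.sqrt(sum(d * d for d in diffs) / n),
        'mae': sum(abs(d) for d in diffs) / n,
        'mean_signed_error': sum(diffs) / n,
        'fraction_underestimated': sum(d < 0 for d in diffs) / n,
        'n_pairs': n,
    }


# Toy input (made up); the last pair is skipped because one value is missing
stats = band_gap_errors([1.0, 2.0, 3.0, None], [0.5, 2.0, 2.0, 1.0])
print(stats)
```

In the notebook it could be called as `band_gap_errors(df['Experimental band gap'], df['Computed band gap'])`; a negative mean signed error together with a large `fraction_underestimated` would quantify the trend discussed in the conclusion.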

## Conclusion

This notebook demonstrates how to extract material property data, in this case band gaps, from the Citrination (through MatMiner) and Materials Project APIs into a pandas DataFrame, and how to plot the data using matplotlib. The comparison shows that, for many materials, the experimental band gap is higher than the computed one. This is expected given the well-known tendency of Density Functional Theory with LDA and GGA functionals, which suffer from self-interaction errors, to underestimate band gaps, so the result is consistent with the analysis, subject to the structural-matching caveat stated at the top of this notebook.