# Materials Project Integration

This notebook demonstrates how to use CrystalMath with the Materials Project database.
You will learn how to:

1. Set up your Materials Project API key
2. Search for materials by chemical formula
3. Fetch structures by mp-id
4. Run analysis on fetched structures
5. Compare calculated values with MP database values

**Prerequisites:**
- CrystalMath installed with MP support (`pip install crystalmath[all]`)
- Materials Project API key (free registration at materialsproject.org)

## 1. Set Up Materials Project API Key

You need a free API key from Materials Project. Get one at:
https://materialsproject.org/api

There are several ways to configure your API key:

In [None]:
import os

# Option 1: Set environment variable (recommended for security)
# os.environ["MP_API_KEY"] = "your-api-key-here"

# Option 2: Store the key as PMG_MAPI_KEY in ~/.config/.pmgrc.yaml,
# e.g. with `pmg config --add PMG_MAPI_KEY your-api-key-here`.
# This is the most convenient method for regular use.

# Check if API key is configured
api_key = os.environ.get("MP_API_KEY")
if api_key:
    print(f"API key configured: {api_key[:8]}...")
else:
    print("No API key found in environment. Will use ~/.config/.pmgrc.yaml if available.")
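The lookup order described above can be sketched as a small helper. This is an illustration, not the `mp_api` implementation. It assumes the client falls back from the `MP_API_KEY` environment variable to a `PMG_MAPI_KEY` entry in `~/.config/.pmgrc.yaml`. Its line-based parser is a deliberate simplification that only understands simple `KEY: value` lines, not full YAML. The helper names `resolve_mp_api_key` and `mask_key` are hypothetical.

```python
import os
import re
from pathlib import Path
from typing import Mapping, Optional


def resolve_mp_api_key(env: Mapping[str, str], pmgrc_text: Optional[str] = None) -> Optional[str]:
    """Return the MP API key from the environment, else from .pmgrc.yaml text.

    Assumed lookup order: MP_API_KEY first, then PMG_MAPI_KEY in the config file.
    Simplified parser: handles `PMG_MAPI_KEY: value` lines only, not full YAML.
    """
    key = env.get("MP_API_KEY")
    if key:
        return key
    if pmgrc_text:
        for line in pmgrc_text.splitlines():
            # Optional quotes around the value; stop at whitespace, quotes or comments
            match = re.match(r"\s*PMG_MAPI_KEY\s*:\s*['\"]?([^'\"\s#]+)", line)
            if match:
                return match.group(1)
    return None


def mask_key(key: str, visible: int = 4) -> str:
    """Show only the first few characters of a secret."""
    return key[:visible] + "*" * max(len(key) - visible, 0)


# Check the real environment and config file (if present)
config_path = Path.home() / ".config" / ".pmgrc.yaml"
text = config_path.read_text() if config_path.exists() else None
found_key = resolve_mp_api_key(os.environ, text)
print(mask_key(found_key) if found_key else "No API key found")
```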

In [None]:
# Import required modules
from crystalmath.high_level import HighThroughput
from mp_api.client import MPRester
import pandas as pd

print("Imports successful!")

## 2. Search for Materials by Formula

The Materials Project API allows searching by chemical formula, elements, band gap, and many other criteria.

In [None]:
# Search for all silicon structures
with MPRester() as mpr:
    # Search by formula
    si_docs = mpr.materials.summary.search(
        formula="Si",
        fields=["material_id", "formula_pretty", "energy_above_hull", "band_gap", "nsites"]
    )
    
print(f"Found {len(si_docs)} silicon structures")

# Display results as DataFrame
si_data = [
    {
        "mp_id": doc.material_id,
        "formula": doc.formula_pretty,
        "e_above_hull": doc.energy_above_hull,
        "band_gap": doc.band_gap,
        "n_atoms": doc.nsites
    }
    for doc in si_docs
]

df_si = pd.DataFrame(si_data)
df_si.sort_values("e_above_hull").head(10)
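A common next step after ranking the Si entries is to keep only polymorphs close to the convex hull. The sketch below does that on plain Python records so it runs offline. `SummaryRecord` is a hypothetical stand-in for the returned summary documents, the IDs and values are made up, and the 0.05 eV/atom cutoff is an arbitrary example threshold, not an MP recommendation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SummaryRecord:
    """Hypothetical stand-in for the fields requested from summary.search()."""
    material_id: str
    formula_pretty: str
    energy_above_hull: Optional[float]  # eV/atom; may be missing
    band_gap: Optional[float]           # eV


def near_hull(records: Iterable[SummaryRecord], max_e_hull: float = 0.05) -> List[SummaryRecord]:
    """Keep records within max_e_hull eV/atom of the hull, most stable first."""
    kept = [
        r for r in records
        if r.energy_above_hull is not None and r.energy_above_hull <= max_e_hull
    ]
    return sorted(kept, key=lambda r: r.energy_above_hull)


# Made-up records for illustration only (not MP data)
demo = [
    SummaryRecord("mp-A", "Si", 0.00, 0.6),
    SummaryRecord("mp-B", "Si", 0.30, 0.0),
    SummaryRecord("mp-C", "Si", 0.02, 1.1),
    SummaryRecord("mp-D", "Si", None, None),
]
for r in near_hull(demo):
    print(r.material_id, r.energy_above_hull)
```

With the DataFrame from the cell above, the equivalent filter is `df_si[df_si["e_above_hull"] <= 0.05]`; missing values become NaN and are excluded by the comparison.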

In [None]:
# Advanced search: Find semiconductors in a specific band gap range
with MPRester() as mpr:
    semiconductors = mpr.materials.summary.search(
        elements=["Si", "Ge"],           # Must contain both Si and Ge
        band_gap=(0.5, 2.0),              # Band gap between 0.5-2.0 eV
        is_stable=True,                   # On the convex hull (energy above hull = 0)
        fields=["material_id", "formula_pretty", "band_gap", "symmetry"]
    )

print(f"Found {len(semiconductors)} stable semiconductors")

for doc in semiconductors[:5]:
    print(f"  {doc.material_id}: {doc.formula_pretty}, gap = {doc.band_gap:.2f} eV")
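Because `elements` requires every listed element to be present, finding materials that contain Si *or* Ge takes one query per element, merged and deduplicated by material ID. The sketch below shows that merge step. `FAKE_DB` and `fake_search` are hypothetical offline stand-ins for real API responses, and `search_any_element` is an illustrative helper name, not part of `mp_api`.

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def search_any_element(
    elements: Iterable[str],
    search: Callable[[str], List[T]],
    key: Callable[[T], Hashable] = lambda r: r["material_id"],
) -> List[T]:
    """Union of per-element searches, deduplicated by key(record).

    `search(el)` is assumed to return records that contain element `el`.
    """
    merged: Dict[Hashable, T] = {}
    for el in elements:
        for record in search(el):
            # First occurrence wins; later duplicates are ignored
            merged.setdefault(key(record), record)
    return list(merged.values())


# Hypothetical offline data standing in for API responses
FAKE_DB = {
    "Si": [{"material_id": "mp-1", "formula": "Si"},
           {"material_id": "mp-3", "formula": "SiGe"}],
    "Ge": [{"material_id": "mp-2", "formula": "Ge"},
           {"material_id": "mp-3", "formula": "SiGe"}],
}


def fake_search(el: str) -> List[dict]:
    return FAKE_DB.get(el, [])


hits = search_any_element(["Si", "Ge"], fake_search)
print(sorted(r["material_id"] for r in hits))
```

With real summary documents, which are objects rather than dicts, pass `key=lambda doc: doc.material_id` and a `search` function that wraps `mpr.materials.summary.search(elements=[el], ...)` with the same filters as the cell above.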

## 3. Fetch Structure by mp-id

Once you identify a material of interest, you can fetch its full structure.

In [None]:
# Fetch diamond-cubic silicon (mp-149); by default the stored primitive cell (2 atoms) is returned
mp_id = "mp-149"

with MPRester() as mpr:
    # Get the structure
    structure = mpr.get_structure_by_material_id(mp_id)
    
    # Get additional properties from the database
    doc = mpr.materials.summary.get_document_by_id(mp_id)

print(f"Structure: {structure.composition.reduced_formula}")
print(f"Space group: {structure.get_space_group_info()[0]}")
print(f"Number of atoms: {len(structure)}")
print(f"Volume: {structure.volume:.3f} Å^3")
print("\nMaterials Project values:")
print(f"  Band gap: {doc.band_gap:.3f} eV")
print(f"  Energy above hull: {doc.energy_above_hull:.4f} eV/atom")
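The printed volume and atom count refer to the primitive cell. For the 8-atom cubic cell, `get_structure_by_material_id` accepts `conventional_unit_cell=True`. The worked example below shows how the two cells of diamond Si relate: the FCC primitive vectors a/2 (0,1,1), a/2 (1,0,1) and a/2 (1,1,0) have a triple product of a³/4, so the primitive cell holds 2 of the conventional cell's 8 atoms at the same volume per atom. The lattice parameter a = 5.43 Å is an approximate value chosen for illustration, not the MP value.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists (rows = lattice vectors)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))


a = 5.43  # Å, approximate cubic lattice parameter of Si (illustrative)

conventional = [[a, 0, 0], [0, a, 0], [0, 0, a]]                        # 8 atoms
primitive = [[0, a / 2, a / 2], [a / 2, 0, a / 2], [a / 2, a / 2, 0]]  # FCC primitive, 2 atoms

v_conv = abs(det3(conventional))
v_prim = abs(det3(primitive))

print(f"Conventional: {v_conv:.3f} Å^3, 8 atoms, {v_conv / 8:.3f} Å^3/atom")
print(f"Primitive:    {v_prim:.3f} Å^3, 2 atoms, {v_prim / 2:.3f} Å^3/atom")
```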

In [None]:
# Fetch multiple structures at once
mp_ids = ["mp-149", "mp-32", "mp-2534"]  # Si, Ge, GaAs

structures = {}
mp_properties = {}

with MPRester() as mpr:
    for mp_id in mp_ids:
        structures[mp_id] = mpr.get_structure_by_material_id(mp_id)
        doc = mpr.materials.summary.get_document_by_id(mp_id)
        mp_properties[mp_id] = {
            "formula": doc.formula_pretty,
            "band_gap": doc.band_gap,
            "formation_energy": doc.formation_energy_per_atom
        }

# Display fetched materials
pd.DataFrame(mp_properties).T
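Section 5 queries the same IDs again. One way to avoid repeated API round trips is to memoize the lookup. The sketch below wraps a hypothetical `fetch_summary` function in `functools.lru_cache`. In the notebook its body would call `mpr.materials.summary.get_document_by_id(mp_id)` on an open `MPRester`; here it returns a placeholder dict and counts calls so the caching is visible offline.

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts how often the underlying fetch actually runs


@lru_cache(maxsize=None)
def fetch_summary(mp_id: str) -> dict:
    """Hypothetical cached fetch; replace the body with a real API call."""
    CALLS["n"] += 1
    return {"material_id": mp_id}  # placeholder payload


for pid in ["mp-149", "mp-32", "mp-2534", "mp-149", "mp-32"]:
    fetch_summary(pid)

print(f"Underlying fetches: {CALLS['n']}")
print(fetch_summary.cache_info())
```

Because `lru_cache` returns the same object on every hit, treat cached documents as read-only. Alternatively, `mpr.materials.summary.search(material_ids=mp_ids, fields=[...])` should return all the documents in a single request.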

## 4. Run Analysis on Fetched Structures

CrystalMath provides a convenient `from_mp()` method that combines fetching and analysis.

In [None]:
# Direct analysis from Materials Project ID
results = HighThroughput.from_mp(
    material_id="mp-149",              # Silicon
    properties=["bands", "dos"],        # Calculate band structure and DOS
    codes={"dft": "vasp"},
    cluster="beefcake2",
    protocol="moderate"
)

print(f"Calculation completed for {results.formula}")
print(f"Calculated band gap: {results.band_gap_ev:.3f} eV")

In [None]:
# Alternatively, run analysis on a pre-fetched structure
results = HighThroughput.from_structure(
    structure=structure,
    properties=["relax", "bands", "dos"],
    codes={"dft": "vasp"},
    cluster="beefcake2",
    protocol="moderate"
)

print("Analysis complete!")
print(f"Band gap: {results.band_gap_ev:.3f} eV")
print(f"Direct gap: {results.is_direct_gap}")
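A gap is direct when the valence-band maximum (VBM) and the conduction-band minimum (CBM) occur at the same k-point. The sketch below applies that definition to toy band energies on a shared k-path. It is a generic illustration, not how CrystalMath computes `is_direct_gap`. The energies and k-point labels are made up and only loosely mimic silicon, whose VBM sits at Γ and whose CBM lies along Γ-X.

```python
from typing import List, Tuple


def gap_from_bands(
    valence_top: List[float],        # highest occupied band energy at each k-point (eV)
    conduction_bottom: List[float],  # lowest unoccupied band energy at each k-point (eV)
) -> Tuple[float, bool, int, int]:
    """Return (gap, is_direct, k_vbm, k_cbm) from band energies on a shared k-grid.

    For metals the gap is clamped to 0 and the direct flag is not meaningful.
    """
    k_vbm = max(range(len(valence_top)), key=valence_top.__getitem__)
    k_cbm = min(range(len(conduction_bottom)), key=conduction_bottom.__getitem__)
    gap = max(conduction_bottom[k_cbm] - valence_top[k_vbm], 0.0)
    return gap, k_vbm == k_cbm, k_vbm, k_cbm


# Toy, Si-like band edges along Γ -> X (made-up values)
k_labels = ["Γ", "Δ1", "Δ2", "Δ3", "X"]
vb = [0.00, -0.20, -0.60, -1.00, -1.20]
cb = [2.50, 1.60, 0.80, 0.60, 0.70]

gap, direct, i_v, i_c = gap_from_bands(vb, cb)
print(f"Gap {gap:.2f} eV, direct={direct}, VBM at {k_labels[i_v]}, CBM at {k_labels[i_c]}")
```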

## 5. Compare with Materials Project Database Values

Let's compare our calculated values with those in the Materials Project database.

In [None]:
# Fetch MP values for comparison
with MPRester() as mpr:
    mp_doc = mpr.materials.summary.get_document_by_id("mp-149")

# Create comparison table
comparison = pd.DataFrame({
    "Property": ["Band Gap (eV)", "Is Direct Gap", "Space Group"],
    "Materials Project": [
        f"{mp_doc.band_gap:.3f}",
        str(mp_doc.is_gap_direct),
        mp_doc.symmetry.symbol
    ],
    "CrystalMath (VASP)": [
        f"{results.band_gap_ev:.3f}",
        str(results.is_direct_gap),
        results.space_group
    ]
})

print("Comparison: Materials Project vs CrystalMath Calculation")
comparison

In [None]:
# Batch comparison for multiple materials
materials_to_compare = [
    "mp-149",   # Si
    "mp-32",    # Ge
    "mp-2534",  # GaAs
]

# Fetch MP reference values first, so the API session is not held open during calculations
with MPRester() as mpr:
    mp_docs = {
        mp_id: mpr.materials.summary.get_document_by_id(mp_id)
        for mp_id in materials_to_compare
    }

comparison_results = []

for mp_id, mp_doc in mp_docs.items():
    # Run CrystalMath calculation with the same code and cluster as above
    calc_results = HighThroughput.from_mp(
        material_id=mp_id,
        properties=["bands"],
        codes={"dft": "vasp"},
        cluster="beefcake2",
        protocol="fast",  # Use fast protocol for screening
    )

    comparison_results.append({
        "mp_id": mp_id,
        "formula": mp_doc.formula_pretty,
        "mp_gap": mp_doc.band_gap,
        "calc_gap": calc_results.band_gap_ev,
        "difference": abs(mp_doc.band_gap - calc_results.band_gap_ev),
    })

df_comparison = pd.DataFrame(comparison_results)
df_comparison
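The `difference` column holds per-material absolute deviations. Summarizing them as a mean absolute error (MAE), a mean signed error (which reveals systematic over- or under-estimation) and a maximum absolute error makes different runs easier to compare. The sketch below works on plain sequences; `gap_error_stats` is an illustrative helper name and the example numbers are made up.

```python
from statistics import mean
from typing import Dict, Sequence


def gap_error_stats(reference: Sequence[float], calculated: Sequence[float]) -> Dict[str, float]:
    """MAE, mean signed error (calc - ref) and max |error| for paired band gaps (eV)."""
    if len(reference) != len(calculated) or len(reference) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [c - r for r, c in zip(reference, calculated)]
    return {
        "mae": mean(abs(e) for e in errors),
        "mean_signed_error": mean(errors),
        "max_abs_error": max(abs(e) for e in errors),
    }


# Illustrative values only (not MP or CrystalMath results)
stats = gap_error_stats([1.0, 0.5, 1.5], [1.2, 0.4, 1.5])
print({k: round(v, 3) for k, v in stats.items()})
```

In the notebook, `gap_error_stats(df_comparison["mp_gap"], df_comparison["calc_gap"])` applies it to the batch results.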

In [None]:
import matplotlib.pyplot as plt

# Visualize comparison
fig, ax = plt.subplots(figsize=(8, 6))

# Parity plot
ax.scatter(df_comparison["mp_gap"], df_comparison["calc_gap"], s=100)

# Add labels
for _, row in df_comparison.iterrows():
    ax.annotate(row["formula"], (row["mp_gap"], row["calc_gap"]),
                xytext=(5, 5), textcoords="offset points")

# Add parity line
max_val = max(df_comparison["mp_gap"].max(), df_comparison["calc_gap"].max())
ax.plot([0, max_val], [0, max_val], 'k--', alpha=0.5, label='Parity')

ax.set_xlabel("Materials Project Band Gap (eV)")
ax.set_ylabel("CrystalMath Calculated Band Gap (eV)")
ax.set_title("Band Gap Comparison: MP Database vs CrystalMath")
ax.legend()
ax.set_aspect('equal')

plt.tight_layout()
plt.savefig("mp_comparison.png", dpi=300)
plt.show()

## 6. Export Results for Further Analysis

In [None]:
# Save comparison to CSV
df_comparison.to_csv("mp_comparison_results.csv", index=False)
print("Comparison results saved to mp_comparison_results.csv")

# Save individual calculation results
results.to_json("silicon_mp149_results.json")
print("Silicon results saved to silicon_mp149_results.json")

## Summary

In this notebook, you learned how to:

1. **Configure** the Materials Project API key
2. **Search** for materials by formula, elements, and properties
3. **Fetch** structures by mp-id
4. **Run** CrystalMath analysis on MP structures
5. **Compare** calculated values with MP database values

**Key Points:**
- Use `HighThroughput.from_mp()` for direct MP integration
- `MPRester` picks up your API key (from `MP_API_KEY` or `~/.config/.pmgrc.yaml`) and, used as a context manager, closes its connection when the block exits
- Compare your results with MP values to validate calculations

**Expected Differences:**
- DFT band gaps may differ due to:
  - Different pseudopotentials/basis sets
  - Different exchange-correlation functionals
  - Different k-point meshes
- Structural parameters should be very similar after relaxation
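To make the k-point item concrete: a Monkhorst-Pack mesh with q points per axis places fractional coordinates at u_r = (2r - q - 1)/(2q) for r = 1..q. The sketch below generates such a mesh without symmetry reduction, which shows that even meshes (e.g. 4×4×4) are offset from Γ while odd meshes include it, so two codes using different meshes sample the Brillouin zone differently. It is a generic illustration and does not reproduce the meshes used by MP or CrystalMath.

```python
from itertools import product
from typing import List, Tuple


def mp_coords(q: int) -> List[float]:
    """Monkhorst-Pack fractional coordinates along one axis: (2r - q - 1) / (2q)."""
    return [(2 * r - q - 1) / (2 * q) for r in range(1, q + 1)]


def mp_mesh(q1: int, q2: int, q3: int) -> List[Tuple[float, float, float]]:
    """Full (unreduced) Monkhorst-Pack mesh in fractional reciprocal coordinates."""
    return list(product(mp_coords(q1), mp_coords(q2), mp_coords(q3)))


for q in (2, 3, 4):
    print(q, mp_coords(q))

mesh = mp_mesh(4, 4, 4)
print(len(mesh), "k-points; includes Gamma:", (0.0, 0.0, 0.0) in mesh)
```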

**Next:** See `03_band_structure.ipynb` for detailed band structure analysis.