<a href="https://colab.research.google.com/github/Laere11/MatterGen/blob/main/mattergen_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **How Are the Crystal Structures Generated?**

MatterGen does not copy structures from a database. It generates new crystal structures with a diffusion model. The breakdown below explains how the 15 crystal structures are created:

**Training on Real Data:**
MatterGen was trained on a large dataset (over 600,000 stable materials) from sources like the Materials Project and Alexandria. This training teaches the model the underlying chemical rules and structural patterns that make a crystal stable.

**Diffusion-Based Generation:**
When you run the model (in unconditional mode in this case), it starts with a random initial structure (essentially random noise) and then gradually “denoises” that structure over many iterations. In each iteration, the model updates atomic positions, element assignments, and lattice parameters in a way that is informed by the patterns it learned during training.
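The loop described above can be sketched as a toy in plain Python. This is an illustration only, not MatterGen's model or API: the function name `toy_denoise`, the step size, and the noise schedule are all hypothetical, and the learned network is replaced by a hand-written pull toward a known target. It also handles only fractional coordinates, whereas MatterGen jointly updates element assignments and lattice parameters as well.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy reverse process on fractional coordinates (hypothetical helper).

    Starts from uniform noise in [0, 1) and repeatedly nudges each value
    toward `target` while adding noise that shrinks to zero by the last step.
    """
    rng = random.Random(seed)
    # Start from pure noise: one random fractional coordinate per target value.
    x = [rng.random() for _ in target]
    for step in range(steps):
        # Noise scale decays linearly and is exactly 0 on the final step.
        noise_scale = 0.1 * (1 - (step + 1) / steps)
        # A trained model would predict this update; here the known target
        # stands in for the model's prediction (an assumption of this toy).
        x = [
            (xi + 0.2 * (ti - xi) + rng.gauss(0.0, noise_scale)) % 1.0
            for xi, ti in zip(x, target)
        ]
    return x


if __name__ == "__main__":
    print(toy_denoise([0.25, 0.5, 0.75]))
```

The point of the sketch is the shape of the procedure: random start, many small corrections, and progressively less noise as the sample settles.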

**Batch Sampling:**
The 15 structures come from sampling a batch of candidates. The number of samples requested is the product of the specified batch size and number of batches. Each output is the end result of the iterative diffusion process: a structure that the model “imagined” based on its training.
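As a small worked example of that product, the snippet below computes the requested sample count from the two values passed to the generation command in this notebook. The helper name `total_samples` is hypothetical. It counts only what is requested from the sampler, not how many files are later kept or packaged.

```python
def total_samples(batch_size: int, num_batches: int) -> int:
    """Number of structures requested from one generation run."""
    return batch_size * num_batches


# Values taken from the generation command used in this notebook.
print(total_samples(batch_size=16, num_batches=1))  # prints 16
```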

**Novel and Chemically Plausible:**
The generated structures are novel candidate materials. They are neither randomly assembled nor pulled from a repository; they are sampled from the model’s learned distribution over crystal structures. Some may resemble known prototypes, but each is generated “on the fly” by the model.

In summary, the 15 crystal structures are generated by MatterGen’s diffusion process, which draws on patterns learned from real, stable materials. They are neither randomly assembled nor copied from a database; they are new candidates produced by the generative model.

The **mattergen_v2** code below sets up the entire MatterGen environment in Google Colab, ensuring all required dependencies are installed, and then executes MatterGen in its unconditional mode to produce valid candidate crystal structures.

In [None]:
%%bash
set -e

# --- System-level Setup: Update and install Git LFS ---
sudo apt-get update
sudo apt-get install -y git-lfs
git lfs install

# --- Clone or update the MatterGen repository ---
if [ -d "mattergen" ]; then
    echo "mattergen directory exists; updating..."
    cd mattergen && git pull && cd ..
else
    git clone https://github.com/microsoft/mattergen.git
fi

# --- Install uv (a fast package manager) ---
pip install uv

# --- Install key Python dependencies explicitly ---
pip install fire
pip install emmet-core
pip install omegaconf==2.3.0
pip install hydra-core==1.3.1
pip install hydra-joblib-launcher==1.1.5
pip install lmdb
pip install matplotlib==3.8.4
pip install matscipy
# Quote version specifiers: an unquoted ">=" is parsed by bash as output redirection.
pip install "mattersim>=1.1"
pip install monty==2024.7.30
pip install "notebook>=7.2.2"
pip install "pymatgen>=2024.6.4"
pip install SMACT
pip install "sympy>=1.11.1"
pip install tqdm
pip install "wandb>=0.10.33"
pip install pytorch-lightning==2.0.6

# --- Install PyTorch Geometric dependencies ---
# This assumes the Colab runtime ships PyTorch 2.5.1+cu124; if torch.__version__
# differs, adjust the wheel index URL below to match.
# The PyG packages are installed without explicit version pins,
# letting pip select matching wheels from the provided index.
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-2.5.1+cu124.html
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.5.1+cu124.html

# --- Install MatterGen in editable mode (skip dependencies so as not to override Colab defaults) ---
cd mattergen
pip install -e . --no-deps
cd ..

# --- Create output directory for generated samples ---
mkdir -p results

# --- Run MatterGen generation in unconditional mode ---
# Since mattergen_base is an unconditional model, we omit any property-conditioning arguments.
python mattergen/mattergen/scripts/generate.py results/ \
  --pretrained-name=mattergen_base \
  --batch_size=16 \
  --num_batches=1


Hit:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:5 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
git-lfs is already the newest version (3.0.2-1ubuntu0.3).
0 upgraded, 0 newly installed, 0 to remove and 30 not upgraded.
Git LFS initialized.
mattergen directory exists; updating...

W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
notebook 6.5.5 requires pyzmq<25,>=17, but you have pyzmq 26.2.1 which is incompatible.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
mattergen 1.0 requires autopep8, which is not installed.
mattergen 1.0 requires contextlib2, which is not installed.
mattergen 1.0 requires jupyterlab>=4.2.5, which is not installed.
mattergen 1.0 requires pylint, which is not installed.
mattergen 1.0 requires notebook>=7.2.2, but you have notebook 6.5.5 which is incompatible.
mattergen 1.0 requires torch==2.2.1+cu118; s

The 15 crystal structures can be viewed by unzipping the **generated_crystals_cif**.zip file and then opening **JSmol CIF viewer v2.html** in a web browser. Right-click in the viewer to open the hidden menu, select file/load/open local file/, then choose one of the 15 .CIF files.
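
The unzip step can also be done from Python. The following is a minimal sketch using only the standard library; the helper name `extract_cifs` and the output directory name are hypothetical, and the archive path is assumed to sit next to the notebook.

```python
import zipfile
from pathlib import Path


def extract_cifs(zip_path, out_dir):
    """Extract a zip archive and return the sorted paths of its .cif files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out)
    # Match the extension case-insensitively (.cif and .CIF both count).
    return sorted(
        p for p in out.rglob("*") if p.is_file() and p.suffix.lower() == ".cif"
    )


# Example usage (assumes the archive is in the current directory):
# for path in extract_cifs("generated_crystals_cif.zip", "generated_crystals_cif"):
#     print(path.name)
```

The returned paths can then be loaded one at a time in the JSmol viewer as described above.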