OpenAlphaDiffract is an open-source implementation of the AlphaDiffract research project. It provides a reproducible pipeline to:
- Create a dataset from the Materials Project
- Simulate powder diffraction patterns from those structures
- Train and evaluate models on the generated dataset
- Run an inference web app to try out the model

Related resources:
- Model weights: linked-liszt/OpenAlphaDiffract on Hugging Face
- Live demo: linked-liszt/OpenAlphaDiffract-UI on Hugging Face Spaces

- Acquire CIFs (`docker/downloader.Dockerfile`)
  - Uses the Materials Project API to fetch crystal structures as CIF files
  - Configurable via `configs/download.yaml`
  - Filters structures by checking conventional cell consistency across multiple angle tolerances. As of 10/22/2025, this removes ~4.4% of MP structures.
- GSAS-II XRD simulation (`docker/simulator.Dockerfile`)
  - Generates synthetic powder diffraction patterns from CIFs
  - Configurable via `configs/simulator.yaml` (e.g., instrument file, noise ranges, job parallelism)
  - Creates `.npy` files containing the simulated pattern and metadata, ready to be consumed by the training system
- OpenAlphaDiffract training (`docker/trainer.Dockerfile`)
  - Trains the multi-task AlphaDiffract model on the generated dataset
  - Configurable via `configs/trainer.docker.yaml` or `configs/trainer.local.yaml`
  - Logs checkpoints and metrics (CSV and, optionally, MLflow)
- XRD inference web app (`docker/ui.Dockerfile`)
  - FastAPI service for model inference with a React frontend
  - Accepts processed XRD patterns and returns predictions via `/api/predict`
  - Serves the built frontend from the same container
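The downloader's consistency filter can be pictured with the minimal sketch below. It is not the repository's implementation: it assumes a caller-supplied function (here the hypothetical `lookup`, standing in for a real symmetry analysis) that returns a structure's conventional lattice parameters at a given angle tolerance, and it keeps a structure only if those parameters agree at every tolerance. The tolerance values, the `rel_tol` threshold, and the helper names `is_consistent` and `filter_structures` are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

# (a, b, c, alpha, beta, gamma) of a conventional cell, in Angstrom and degrees.
Lattice = tuple[float, float, float, float, float, float]

# ASSUMPTION: illustrative angle tolerances in degrees, not the repo's values.
DEFAULT_TOLERANCES: tuple[float, ...] = (1.0, 3.0, 5.0)


def is_consistent(
    structure: object,
    conventional_cell_at: Callable[[object, float], Lattice],
    tolerances: Sequence[float] = DEFAULT_TOLERANCES,
    rel_tol: float = 1e-3,
) -> bool:
    """True if the conventional cell of `structure` is the same at every tolerance."""
    cells = [conventional_cell_at(structure, tol) for tol in tolerances]
    reference = cells[0]
    return all(
        all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(reference, cell))
        for cell in cells[1:]
    )


def filter_structures(structures: list, conventional_cell_at, **kwargs) -> List:
    """Drop structures whose conventional cell changes with the angle tolerance."""
    return [s for s in structures if is_consistent(s, conventional_cell_at, **kwargs)]


# Toy data: each "structure" maps tolerance -> conventional lattice parameters.
cubic = (4.0, 4.0, 4.0, 90.0, 90.0, 90.0)
stable = {1.0: cubic, 3.0: cubic, 5.0: cubic}
# At the tightest tolerance this toy structure looks tetragonal (c differs).
unstable = {1.0: (4.0, 4.0, 4.1, 90.0, 90.0, 90.0), 3.0: cubic, 5.0: cubic}


def lookup(structure: dict, tol: float) -> Lattice:
    """Hypothetical stand-in for a real symmetry analysis at tolerance `tol`."""
    return structure[tol]


kept = filter_structures([stable, unstable], lookup)
```

Comparing raw lattice parameters ignores axis permutations and cell-setting choices, so a production filter would likely also compare space-group numbers or use a proper structure matcher.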
Prerequisites:
- Docker and Docker Compose
- A Materials Project API key
Warning: Building the dataset and training the model require substantial disk space and compute:
- Expect to use 1 TB+ of disk space in total to replicate the paper's 100-variation dataset.
- We recommend running the simulation with ~100+ parallel processes. For reference, simulation took ~18 hours on 2x AMD EPYC 7742 CPUs (128 processes).
- Training took ~15 hours on a single H100 GPU.
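To budget simulation time on other hardware, the reference figure can be rescaled. This sketch assumes perfectly linear scaling with process count and the same per-process speed as the reference machine, both optimistic simplifications (real runs may be limited by I/O, memory, or slower cores). Under that assumption, 18 h × 128 processes ≈ 2,304 process-hours. The function name `estimate_wall_hours` is hypothetical.

```python
# Reference point from the warning above: ~18 h of simulation with 128 processes.
REFERENCE_HOURS = 18.0
REFERENCE_PROCESSES = 128


def estimate_wall_hours(
    processes: int,
    reference_hours: float = REFERENCE_HOURS,
    reference_processes: int = REFERENCE_PROCESSES,
) -> float:
    """Estimate simulation wall time, ASSUMING perfectly linear scaling."""
    if processes < 1:
        raise ValueError("processes must be >= 1")
    process_hours = reference_hours * reference_processes  # ~2304 process-hours
    return process_hours / processes


for n in (32, 64, 128, 256):
    print(f"{n:>3} processes: ~{estimate_wall_hours(n):.0f} h")
```

For example, halving the process count to 64 roughly doubles the estimate to ~36 hours.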
Setup:
- Copy the environment file and set your API key:
  - `cp .env.example .env`
  - Edit `.env` and set `MP_API_KEY`
  - Optionally, set `UID` and `GID` so the containers write files as your user.
- Download CIFs:
  - `scripts/download.sh` (or `docker compose run --rm downloader`)
  - CIFs will be written to `./data/raw_cif`
- Simulate diffraction patterns:
  - `scripts/simulate.sh` (or `docker compose run --rm simulator`)
  - Patterns will be written to `./data/dataset`
  - Errors, if any, go to `./data/error_logs`
- Train the model:
  - `docker compose run --rm trainer`
  - Checkpoints and logs will be written to `./outputs`
- Run the inference UI:
  - Move a model checkpoint to `./src/ui/models/xrd_model.ckpt`
  - `docker compose up ui`
  - Open http://localhost:7860
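Once the UI container is running, the `/api/predict` endpoint can also be called from a script. This is a hedged sketch using only the Python standard library: the JSON body shape `{"pattern": [...]}` and the helper names `build_predict_request` and `predict` are hypothetical, so check the FastAPI route in `src/ui` (or FastAPI's interactive docs at `/docs`, if not disabled) for the actual request schema.

```python
import json
import urllib.request
from typing import Any, Sequence

BASE_URL = "http://localhost:7860"  # address from the setup steps


def build_predict_request(
    pattern: Sequence[float], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for /api/predict.

    ASSUMPTION: the {"pattern": [...]} body is a hypothetical schema.
    """
    body = json.dumps({"pattern": [float(x) for x in pattern]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(pattern: Sequence[float], base_url: str = BASE_URL, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response (requires the running UI)."""
    with urllib.request.urlopen(build_predict_request(pattern, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload logic testable without a live server.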
Notes:
- You can pass extra CLI args to the simulator via `scripts/simulate.sh`, e.g., `--sims_per_file 1 --parallel_jobs 4`
- The default container commands and mounts are defined in `compose.yaml`
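The extra simulator arguments can also be assembled programmatically, for example to size `--parallel_jobs` to the current machine. In this sketch the helper name `simulate_command` is hypothetical, and the default of "all cores but one" is an assumed heuristic rather than a project recommendation; only the two flags come from the notes above.

```python
import os
import shlex
from typing import List, Optional


def simulate_command(sims_per_file: int = 1, parallel_jobs: Optional[int] = None) -> List[str]:
    """Build an argv list for scripts/simulate.sh with the documented flags."""
    if parallel_jobs is None:
        # ASSUMPTION: heuristic default that leaves one core free for the OS.
        parallel_jobs = max(1, (os.cpu_count() or 1) - 1)
    if sims_per_file < 1 or parallel_jobs < 1:
        raise ValueError("sims_per_file and parallel_jobs must be >= 1")
    return [
        "scripts/simulate.sh",
        "--sims_per_file", str(sims_per_file),
        "--parallel_jobs", str(parallel_jobs),
    ]


cmd = simulate_command(sims_per_file=1, parallel_jobs=4)
print(shlex.join(cmd))  # scripts/simulate.sh --sims_per_file 1 --parallel_jobs 4
# To launch from the repo root: subprocess.run(cmd, check=True)
```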
```
OpenAlphaDiffract/
├── configs/              - Pipeline configuration files
│   ├── instruments/      - GSAS-II instrument parameter files
│   └── resources/        - Space group distance matrix for GEMD loss
├── docker/               - Container definitions
├── scripts/              - User-facing scripts
├── src/
│   ├── downloader/       - Materials Project CIF acquisition
│   ├── simulator/        - GSAS-II powder diffraction simulation
│   ├── trainer/          - Model definition, dataset, and training loop
│   └── ui/               - FastAPI backend + React frontend
│       └── frontend/
└── compose.yaml
```
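The `configs/resources/` entry mentions a space-group distance matrix used by the GEMD loss. We have not reproduced the repository's exact loss; as an assumption, the sketch below treats it as an earth mover's distance (EMD) between the predicted class distribution $p$ and a one-hot target $y$. With a one-hot target the transport plan is forced (all predicted mass must move to the true class), so the EMD reduces to the expected ground distance $\sum_j p_j D_{yj}$. Predicting a "nearby" space group is therefore penalized less than predicting a distant one. The function names and the toy 3-class matrix are illustrative only.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def emd_to_one_hot(logits: np.ndarray, targets: np.ndarray, distance: np.ndarray) -> float:
    """Mean EMD between softmax(logits) and one-hot targets.

    logits: (N, K) class scores; targets: (N,) integer labels;
    distance: (K, K) symmetric ground-distance matrix with a zero diagonal.
    """
    probs = softmax(logits)
    # Row y of the distance matrix: cost of moving mass from each class to class y.
    costs = distance[targets]
    return float((probs * costs).sum(axis=-1).mean())


# Toy 3-class example: class 1 sits between classes 0 and 2.
D = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
near_miss = emd_to_one_hot(np.array([[0.0, 10.0, 0.0]]), np.array([0]), D)
far_miss = emd_to_one_hot(np.array([[0.0, 0.0, 10.0]]), np.array([0]), D)
```

Unlike plain cross-entropy, which treats every wrong class the same, this loss grows with how far the predicted mass lies from the true class under `D`.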
Tests run in CI via GitHub Actions on every push/PR to main. To run locally:
```bash
# All Python tests (from repo root)
pytest

# Individual components
pytest src/downloader/tests/ -v
pytest src/simulator/tests/ -v
pytest src/trainer/tests/ -v
pytest src/ui/tests/ -v

# Frontend tests
cd src/ui/frontend && npx vitest run
```

This project is licensed under the BSD 3-Clause License. See LICENSE for details.
We hope this code was helpful! Please consider citing our paper:
```bibtex
@article{andrejevic2026alphadiffract,
  title={AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data},
  author={Andrejevic, Nina and Du, Ming and Sharma, Hemant and Horwath, James P. and Luo, Aileen and Yin, Xiangyu and Prince, Michael and Toby, Brian H. and Cherukara, Mathew J.},
  journal={arXiv preprint arXiv:2603.23367},
  year={2026}
}
```