OpenAlphaDiffract

OpenAlphaDiffract is an open-source implementation of the AlphaDiffract research project. It provides a reproducible pipeline to:

Create a dataset from the Materials Project
Simulate powder diffraction patterns from those structures
Train and evaluate models on the generated dataset
Run an inference web app to try out the model

Pretrained Model & Demo

Model weights: linked-liszt/OpenAlphaDiffract on Hugging Face
Live demo: linked-liszt/OpenAlphaDiffract-UI on Hugging Face Spaces

Dataset Pipeline

Acquire CIFs (docker/downloader.Dockerfile)
- Uses the Materials Project API to fetch crystal structures as CIF files
- Configurable via configs/download.yaml
- Filters structures by checking that the conventional cell is consistent across multiple angle tolerances. This check excludes ~4.4% of Materials Project structures (as of 10/22/2025).
GSAS-II XRD Simulation (docker/simulator.Dockerfile)
- Generates synthetic powder diffraction patterns from CIFs
- Configurable via configs/simulator.yaml (e.g., instrument file, noise ranges, job parallelism)
- Writes .npy files containing the simulated patterns and metadata, ready to be consumed by the trainer
OpenAlphaDiffract Training (docker/trainer.Dockerfile)
- Trains the multi-task AlphaDiffract model on the generated dataset
- Configurable via configs/trainer.docker.yaml or configs/trainer.local.yaml
- Logs checkpoints and metrics (CSV and optional MLflow)
XRD Inference Web App (docker/ui.Dockerfile)
- FastAPI service for model inference with a React frontend
- Accepts processed XRD patterns and returns predictions via /api/predict
- Serves the built frontend from the same container
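
The consistency filter in the first pipeline stage can be pictured with the sketch below. It is an illustration under stated assumptions, not the downloader's actual code: the tolerance values, `is_consistent`, `filter_structures`, and the `describe` callable are all hypothetical names, and `describe` stands in for whatever symmetry analysis produces a comparable summary of the conventional cell.

```python
from typing import Callable, Hashable, Iterable, List

# Hypothetical tolerance values in degrees; the real ones are set by the project.
ANGLE_TOLERANCES = (1.0, 5.0, 10.0)


def is_consistent(
    describe_cell: Callable[[float], Hashable],
    tolerances: Iterable[float] = ANGLE_TOLERANCES,
) -> bool:
    """Return True if every angle tolerance yields the same cell description."""
    descriptions = {describe_cell(tol) for tol in tolerances}
    return len(descriptions) == 1


def filter_structures(structures, describe, tolerances=ANGLE_TOLERANCES) -> List:
    """Keep only structures whose description is stable across tolerances.

    `describe(structure, tol)` is a placeholder for a symmetry analysis that
    returns a hashable summary, for example a space group number.
    """
    return [
        s
        for s in structures
        if is_consistent(lambda tol, s=s: describe(s, tol), tolerances)
    ]
```

A structure whose summary changes when the tolerance changes is dropped; one that gives the same summary at every tolerance is kept.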

Training from Scratch Quickstart

Prerequisites:

Docker and Docker Compose
A Materials Project API key

Warning

Building the dataset and training the model require significant disk space and compute:

Expect to use 1 TB or more of disk space in total to replicate the paper's 100-variation dataset.
We recommend running the simulation with 100 or more parallel processes. For reference, simulation took ~18 hours on 2x AMD EPYC 7742 (128 processes).
Training took ~15 hours on a single H100 GPU.
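
To get a rough feel for simulation time on smaller machines, the reference figures above can be scaled by process count. The sketch below assumes ideal linear scaling and per-core speed comparable to the reference hardware, which real runs will not match exactly. The function name is hypothetical.

```python
REFERENCE_HOURS = 18.0      # reported simulation wall time
REFERENCE_PROCESSES = 128   # parallel processes used in that run


def estimate_simulation_hours(processes: int) -> float:
    """Estimate wall time in hours, assuming perfectly linear scaling."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return REFERENCE_HOURS * REFERENCE_PROCESSES / processes


for n in (128, 32, 4):
    print(f"{n:>3} processes: ~{estimate_simulation_hours(n):.0f} hours")
```

Under these assumptions, 32 processes would need roughly 72 hours and 4 processes roughly 576 hours, which is why a many-core machine is recommended.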

Setup:

Copy the environment file and set your API key:
- cp .env.example .env
- Edit .env and set MP_API_KEY
- Optionally set UID and GID so the containers write files as your user.
Download CIFs:
- scripts/download.sh (or docker compose run --rm downloader)
- CIFs will be written to ./data/raw_cif
Simulate diffraction patterns:
- scripts/simulate.sh (or docker compose run --rm simulator)
- Patterns will be written to ./data/dataset
- Errors (if any) go to ./data/error_logs
Train the model:
- docker compose run --rm trainer
- Checkpoints and logs will be written to ./outputs
Run the inference UI:
- Move a model checkpoint to ./src/ui/models/xrd_model.ckpt
- docker compose up ui
- Open http://localhost:7860
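
The download, simulate, and train steps above can also be chained from a small driver script. This is a minimal sketch, not part of the repository: `has_api_key` and `run_pipeline` are hypothetical helpers, and the only commands used are the docker compose invocations listed above. It checks that .env contains a non-empty MP_API_KEY before starting anything.

```python
import subprocess
from pathlib import Path

# Commands taken from the setup steps; the step names are illustrative.
STEPS = {
    "download": ["docker", "compose", "run", "--rm", "downloader"],
    "simulate": ["docker", "compose", "run", "--rm", "simulator"],
    "train": ["docker", "compose", "run", "--rm", "trainer"],
}


def has_api_key(env_path=".env") -> bool:
    """Return True if the env file defines a non-empty MP_API_KEY."""
    path = Path(env_path)
    if not path.is_file():
        return False
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "MP_API_KEY" and value.strip().strip("'\""):
            return True
    return False


def run_pipeline(steps=("download", "simulate", "train"), env_path=".env", dry_run=False):
    """Run the requested steps in order, stopping at the first failure."""
    if not has_api_key(env_path):
        raise RuntimeError(f"MP_API_KEY is missing or empty in {env_path}")
    commands = [STEPS[name] for name in steps]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show the command without executing it
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling `run_pipeline(dry_run=True)` from the repository root prints the three commands without starting any containers, which is a cheap way to check the setup before committing to a long run.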

Notes:

You can pass extra CLI args to the simulator via scripts/simulate.sh, e.g. --sims_per_file 1 --parallel_jobs 4
The default container commands and mounts are defined in compose.yaml

Project Structure

OpenAlphaDiffract/
├── configs/              - Pipeline configuration files
│   ├── instruments/      - GSAS-II instrument parameter files
│   └── resources/        - Space group distance matrix for GEMD loss
├── docker/               - Container definitions
├── scripts/              - User-facing scripts
├── src/
│   ├── downloader/       - Materials Project CIF acquisition
│   ├── simulator/        - GSAS-II powder diffraction simulation
│   ├── trainer/          - Model definition, dataset, and training loop
│   └── ui/               - FastAPI backend + React frontend
│       └── frontend/
└── compose.yaml

Testing

Tests run in CI via GitHub Actions on every push/PR to main. To run locally:

# All Python tests (from repo root)
pytest

# Individual components
pytest src/downloader/tests/ -v
pytest src/simulator/tests/ -v
pytest src/trainer/tests/ -v
pytest src/ui/tests/ -v

# Frontend tests
cd src/ui/frontend && npx vitest run

License

This project is licensed under the BSD 3-Clause License. See LICENSE for details.

Citation

If you find this code helpful, please consider citing our paper:

@article{andrejevic2026alphadiffract,
  title={AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data},
  author={Andrejevic, Nina and Du, Ming and Sharma, Hemant and Horwath, James P. and Luo, Aileen and Yin, Xiangyu and Prince, Michael and Toby, Brian H. and Cherukara, Mathew J.},
  journal={arXiv preprint arXiv:2603.23367},
  year={2026}
}
