AdvancedPhotonSource/OpenAlphaDiffract


OpenAlphaDiffract

OpenAlphaDiffract is an open-source implementation of the AlphaDiffract research project. It provides a reproducible pipeline to:

  • Create a dataset from the Materials Project
  • Simulate powder diffraction patterns from those structures
  • Train and evaluate models on the generated dataset
  • Run an inference web app to try out the model

Pretrained Model & Demo

Dataset Pipeline

  1. Acquire CIFs (docker/downloader.Dockerfile)

    • Uses the Materials Project API to fetch crystal structures as CIF files
    • Configurable via configs/download.yaml
    • Filters structures by checking that the conventional cell is consistent across multiple angle tolerances. This check excludes ~4.4% of MP structures (as of October 22, 2025).
  2. GSAS-II XRD Simulation (docker/simulator.Dockerfile)

    • Generates synthetic powder diffraction patterns from CIFs
    • Configurable via configs/simulator.yaml (e.g., instrument file, noise ranges, job parallelism)
    • Creates .npy files containing the simulated patterns and their metadata, ready to be consumed by the trainer
  3. OpenAlphaDiffract Training (docker/trainer.Dockerfile)

    • Trains the multi-task AlphaDiffract model on the generated dataset
    • Configurable via configs/trainer.docker.yaml or configs/trainer.local.yaml
    • Logs checkpoints and metrics (CSV and optional MLflow)
  4. XRD Inference Web App (docker/ui.Dockerfile)

    • FastAPI service for model inference with a React frontend
    • Accepts processed XRD patterns and returns predictions via /api/predict
    • Serves the built frontend from the same container
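
As a rough illustration of talking to the inference service, the sketch below builds a request for /api/predict using only the Python standard library. The POST method, the JSON body shape, and the `intensities` field name are assumptions made for illustration and are not taken from this repository; check the backend under src/ui for the real schema. The base URL assumes the UI is running locally on port 7860, as in the quickstart.

```python
import json
from urllib import request

# Assumption: the UI container is reachable at this address (see the quickstart).
BASE_URL = "http://localhost:7860"


def build_predict_request(intensities, base_url: str = BASE_URL) -> request.Request:
    """Build (but do not send) a request to /api/predict.

    The POST method and the {"intensities": [...]} body are hypothetical;
    the real schema is defined by the backend in src/ui.
    """
    body = json.dumps({"intensities": list(intensities)}).encode("utf-8")
    return request.Request(
        url=f"{base_url}/api/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_predict_request([0.0, 0.5, 1.0])
    print(req.get_method(), req.full_url)
    # To actually send it (requires the UI to be running):
    # with request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the payload before the service is up.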

Training from Scratch Quickstart

Prerequisites:

  • Docker and Docker Compose
  • A Materials Project API key

Warning

Building the dataset and training the model require significant disk space and compute:

  • Expect to use 1 TB or more of disk space in total to replicate the paper's 100-variation dataset
  • We recommend running the simulation with roughly 100 or more processes in parallel. For reference, simulation took ~18 hours on 2x AMD EPYC 7742 (128 processes).
  • Training took ~15 hours on a single H100 GPU.
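
Before starting a full run, it may help to confirm that the target filesystem has enough room. The following is a minimal sketch, not part of this repository: the helper name `free_space_report` and the 1 TB threshold (taken from the guideline above) are illustrative choices.

```python
import shutil

TB = 1000**4  # one decimal terabyte, in bytes


def free_space_report(path: str = ".", required_bytes: int = 1 * TB) -> tuple[bool, float]:
    """Return (enough, free_tb) for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes, usage.free / TB


if __name__ == "__main__":
    # Check the filesystem holding the current directory (e.g. the repo root).
    enough, free_tb = free_space_report(".")
    status = "OK" if enough else "below the ~1 TB guideline"
    print(f"Free space: {free_tb:.2f} TB ({status})")
```

If ./data or ./outputs live on a different volume, pass that path instead of the repository root.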

Setup:

  1. Copy the environment file and set your API key:

    • cp .env.example .env
    • Edit .env and set MP_API_KEY
    • Optionally set UID and GID so the containers write files as your user.
  2. Download CIFs:

    • scripts/download.sh (or docker compose run --rm downloader)
    • CIFs will be written to ./data/raw_cif
  3. Simulate diffraction patterns:

    • scripts/simulate.sh (or docker compose run --rm simulator)
    • Patterns will be written to ./data/dataset
    • Errors (if any) go to ./data/error_logs
  4. Train the model:

    • docker compose run --rm trainer
    • Checkpoints and logs will be written to ./outputs
  5. Run the inference UI:

    • Move a model checkpoint to ./src/ui/models/xrd_model.ckpt
    • docker compose up ui
    • Open http://localhost:7860
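
A quick preflight check can catch a missing API key before the downloader container starts. This is a hedged sketch that assumes .env uses plain KEY=VALUE lines; the helper names `read_env` and `missing_settings` are hypothetical and do not exist in this repository.

```python
from pathlib import Path


def read_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_settings(env: dict[str, str], required=("MP_API_KEY",)) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    if not Path(".env").exists():
        print("No .env found; run: cp .env.example .env")
    else:
        missing = missing_settings(read_env(".env"))
        print("Missing: " + ", ".join(missing) if missing else ".env looks complete")
```

UID and GID are optional, so only MP_API_KEY is treated as required here.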

Notes:

  • You can pass extra CLI args to the simulator via scripts/simulate.sh, e.g. --sims_per_file 1 --parallel_jobs 4
  • The default container commands and mounts are defined in compose.yaml
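
For scripted runs, the extra simulator arguments can be assembled programmatically. The sketch below only builds the command line; the `simulate_command` helper is hypothetical, and the only flags used are the two named in the note above.

```python
import shlex


def simulate_command(script: str = "scripts/simulate.sh", **options) -> list[str]:
    """Turn keyword options into `--name value` pairs appended to the script path."""
    argv = [script]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    cmd = simulate_command(sims_per_file=1, parallel_jobs=4)
    print(shlex.join(cmd))
    # prints: scripts/simulate.sh --sims_per_file 1 --parallel_jobs 4
    # To run it from the repo root:
    # import subprocess; subprocess.run(cmd, check=True)
```

A small run such as this one (one simulation per file, four jobs) is a reasonable smoke test before committing to the full dataset.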

Project Structure

OpenAlphaDiffract/
├── configs/              - Pipeline configuration files
│   ├── instruments/      - GSAS-II instrument parameter files
│   └── resources/        - Space group distance matrix for GEMD loss
├── docker/               - Container definitions
├── scripts/              - User-facing scripts
├── src/
│   ├── downloader/       - Materials Project CIF acquisition
│   ├── simulator/        - GSAS-II powder diffraction simulation
│   ├── trainer/          - Model definition, dataset, and training loop
│   └── ui/               - FastAPI backend + React frontend
│       └── frontend/
└── compose.yaml

Testing

Tests run in CI via GitHub Actions on every push/PR to main. To run locally:

# All Python tests (from repo root)
pytest

# Individual components
pytest src/downloader/tests/ -v
pytest src/simulator/tests/ -v
pytest src/trainer/tests/ -v
pytest src/ui/tests/ -v

# Frontend tests
cd src/ui/frontend && npx vitest run

License

This project is licensed under the BSD 3-Clause License. See LICENSE for details.

Citation

We hope this code was helpful! Please consider citing our paper:

@article{andrejevic2026alphadiffract,
  title={AlphaDiffract: Automated Crystallographic Analysis of Powder X-ray Diffraction Data},
  author={Andrejevic, Nina and Du, Ming and Sharma, Hemant and Horwath, James P. and Luo, Aileen and Yin, Xiangyu and Prince, Michael and Toby, Brian H. and Cherukara, Mathew J.},
  journal={arXiv preprint arXiv:2603.23367},
  year={2026}
}
