dfieser/hea-bench

hea-bench

An open, reproducible benchmark suite and reference baselines for high-entropy alloy (HEA) phase prediction.

DOI License: MIT tests: 157 coverage: 86.7%

TL;DR

  • A consolidated, deduplicated open dataset of 7,784 experimentally characterized multi-principal element alloys, merged from three primary sources (Borg 2020, Pei 2020, Peivaste 2023) with per-row source provenance.
  • Reference baseline implementations of the four canonical empirical phase-prediction rules (Yeh ΔSmix, Zhang δ, Guo–Liu VEC, Yang–Zhang Ω), wrapped as diagnostic classifiers that report sensitivity, specificity, and Wilson 95% CIs.
  • A clean, dependency-free Python API (pip install hea-bench) and a self-contained HTML calculator. The calculator runs entirely client-side, computes all six descriptors plus the Miedema decompositions, and applies the four phase-prediction rules to the entered composition. Open it by URL or double-click the file.

Using an AI coding agent to integrate this? See AGENTS.md for a machine-oriented guide to the API, exact return types and units, the fastest path to each task, and the mistakes to avoid.

Headline benchmark numbers (v0.1.0)

Running the four canonical rules against the consolidated benchmark produces the reference baselines below. These are pinned in tests so any drift in dataset, descriptor code, or rule thresholds surfaces as a test failure.

| Rule | n_eval | Accuracy | Sens (single-phase) | Spec (multi-phase) | Youden's J |
|---|---|---|---|---|---|
| Zhang δ < 6.5% | 6,651 | 56.7% | 99.0% | 8.5% | 0.075 |
| Yang Ω > 1.1 | 6,651 | 54.4% | 95.8% | 7.4% | 0.032 |
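
The columns above relate to each other as sketched below. This is a minimal illustration, not the code in classifiers/diagnostic_stats.py: the function names and the confusion-matrix layout are hypothetical, and single-phase is assumed to be the positive class, as the column headings suggest.

```python
import math


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J = sensitivity + specificity - 1 (0 means no better than chance)."""
    return sensitivity + specificity - 1.0


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def diagnostic_stats(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Summary statistics from a 2x2 confusion matrix (hypothetical layout)."""
    sens = tp / (tp + fn)   # share of observed positives predicted positive
    spec = tn / (tn + fp)   # share of observed negatives predicted negative
    acc = (tp + tn) / (tp + fn + tn + fp)
    return {"accuracy": acc, "sensitivity": sens,
            "specificity": spec, "youden_j": youden_j(sens, spec)}
```

Plugging in the Zhang δ row, 0.990 + 0.085 - 1 gives the tabulated J of 0.075: a rule can have near-perfect sensitivity and still carry almost no discriminating power.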

The Guo–Liu VEC rule predicts crystal structure rather than single-vs-multi, so it's evaluated stratified to single-phase observations (BCC|FCC only):

| Rule | n_eval | Accuracy | FCC sensitivity | BCC sensitivity |
|---|---|---|---|---|
| Guo–Liu VEC (FCC if VEC ≥ 8.0, BCC if VEC < 6.87) | 3,463 | 66.9% | 92.4% | 48.3% |
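
The thresholds in the table can be read as a three-way classifier. The sketch below is an illustration under stated assumptions, not the package's rules/guo implementation: VEC_TABLE is a small hand-entered subset of commonly tabulated valence electron counts, the function names are hypothetical, and the "mixed" label for the in-between region is borrowed from the calculator's FCC/BCC/mixed verdicts.

```python
# Illustrative subset of valence electron counts; the package's own
# table may use different values or cover more elements.
VEC_TABLE = {"Al": 3, "Co": 9, "Cr": 6, "Fe": 8, "Mn": 7, "Ni": 10}


def vec(composition: dict[str, float]) -> float:
    """Mole-fraction-weighted valence electron concentration."""
    total = sum(composition.values())
    return sum(frac / total * VEC_TABLE[el] for el, frac in composition.items())


def guo_liu_label(composition: dict[str, float],
                  fcc_min: float = 8.0, bcc_max: float = 6.87) -> str:
    """FCC if VEC >= 8.0, BCC if VEC < 6.87, otherwise 'mixed'."""
    # Round before comparing so floating-point noise (e.g. 7.999999999999999)
    # does not push a composition across a boundary.
    value = round(vec(composition), 6)
    if value >= fcc_min:
        return "FCC"
    if value < bcc_max:
        return "BCC"
    return "mixed"
```

The rounding step matters for the Cantor alloy, whose VEC sits exactly on the 8.0 boundary.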

Yeh ΔSmix is descriptive (no phase-prediction claim attached) — 47% of the consolidated benchmark passes the 1.5R HEA-class threshold, 37% sits in the MEA bin, 16% is dilute.
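
As a rough sketch of the binning described above: the ideal mixing entropy is ΔSmix = -R Σ c_i ln c_i, and the bins are cut on ΔSmix / R. The 1.5R HEA threshold comes from the text; the 1.0R lower edge of the MEA bin is an assumption based on the usual Yeh convention, and the function names are hypothetical.

```python
import math

R = 8.314  # J/(mol·K), the value listed for constants.py


def smix(composition: dict[str, float]) -> float:
    """Ideal configurational mixing entropy in J/(mol·K)."""
    total = sum(composition.values())
    fractions = [c / total for c in composition.values() if c > 0]
    return -R * sum(c * math.log(c) for c in fractions)


def yeh_class(composition: dict[str, float],
              hea_min: float = 1.5, mea_min: float = 1.0) -> str:
    """Bin by ΔSmix/R; mea_min = 1.0 is an assumed boundary, not from the README."""
    s_over_r = smix(composition) / R
    if s_over_r >= hea_min:
        return "HEA"
    if s_over_r >= mea_min:
        return "MEA"
    return "dilute"
```

An equimolar five-component alloy gives R · ln 5 ≈ 1.61R, which clears the 1.5R threshold.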

The publishable observation: on a consolidated benchmark drawn from three independent open sources, both binary rules collapse to "predict single-phase almost always" (Youden's J ≈ 0.03–0.08), and the VEC rule misses about half of observed BCC alloys (48.3% BCC sensitivity) despite catching 92% of FCC alloys. The canonical rules generalize poorly.

Quick start (Python)

pip install hea-bench

import hea_bench

cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}

hea_bench.smix(cantor)               # 13.381 J/(mol·K)  = R · ln 5
hea_bench.delta(cantor)              # 3.164 % atomic-size mismatch
hea_bench.vec(cantor)                # 8.0 valence electrons
hea_bench.mixing_enthalpy(cantor)    # -4.16 kJ/mol  (Miedema)
hea_bench.omega(cantor)              # 5.79  (Yang-Zhang)

# Apply the canonical rules
from hea_bench.rules import zhang_delta, yang_omega, guo_vec
zhang_delta.predict(cantor)          # 'single-phase'
yang_omega.predict(cantor)           # 'single-phase'
guo_vec.predict(cantor)              # 'FCC'

# Run the full rule benchmark against the consolidated v0.1.0 dataset
from hea_bench.evaluate import build_report
report = build_report()
print(report["rules"]["zhang_delta_6_5"]["accuracy"])  # 0.5670
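
For readers who want to see what two of these descriptors compute, here is a minimal sketch of the standard definitions: δ = 100 · sqrt(Σ c_i (1 − r_i / r̄)²) and Ω = T_m ΔS_mix / |ΔH_mix|. It is not the package's code. The function signatures are hypothetical, the melting points are commonly tabulated values entered by hand, and ΔS_mix and ΔH_mix are taken from the Cantor outputs quoted above.

```python
import math


def delta(composition: dict[str, float], radii: dict[str, float]) -> float:
    """Atomic-size mismatch in percent; radii are supplied by the caller."""
    total = sum(composition.values())
    fracs = {el: c / total for el, c in composition.items()}
    r_bar = sum(fracs[el] * radii[el] for el in fracs)
    return 100.0 * math.sqrt(
        sum(fracs[el] * (1.0 - radii[el] / r_bar) ** 2 for el in fracs))


def omega(t_melt_k: float, s_mix: float, h_mix_kj: float) -> float:
    """Ω = T_m ΔS_mix / |ΔH_mix|, converting ΔH_mix from kJ/mol to J/mol."""
    if h_mix_kj == 0:
        return math.inf  # ideal solution: the ratio is unbounded
    return t_melt_k * s_mix / abs(h_mix_kj * 1000.0)


# Assumed elemental melting points in kelvin (hand-entered, illustrative).
T_MELT = {"Co": 1768, "Cr": 2180, "Fe": 1811, "Mn": 1519, "Ni": 1728}
cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}

# Rule-of-mixtures melting temperature, then Ω from the quoted descriptor values.
t_m = sum(frac * T_MELT[el] for el, frac in cantor.items())
cantor_omega = omega(t_m, s_mix=13.381, h_mix_kj=-4.16)
```

With these inputs cantor_omega comes out close to the 5.79 quoted above; small differences would be expected if the package's melting-point table differs.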

Quick start (CLI)

hea-bench --version
python -m hea_bench.evaluate           # run all 4 rules on v0.1.0
python -m hea_bench.benchmark.coverage # coverage analysis on v0.1.0

Quick start (browser, no install)

A self-contained HTML calculator computes the descriptors, applies the four phase-prediction rules, and runs the Miedema decompositions entirely client-side. Two equivalent paths: open the hosted page by URL, or double-click web/index.html in a local checkout.

The page reports each rule's verdict (Yeh HEA/MEA/dilute, Zhang single/multi, Guo–Liu FCC/BCC/mixed, Yang–Zhang single/multi) alongside the computed descriptor values. Logic matches the Python library, including the six-decimal VEC-boundary rounding.

Architecture

                                ┌────────────────────────────┐
                                │  data/consolidated/v0.1.0/ │
                                │  - consolidated.csv        │
                                │  - rule_baselines.json     │
                                │  - coverage_report.json    │
                                │  - manifest.json           │
                                └─────────────▲──────────────┘
                                              │
                                              │ produced by
                                              │
   ┌─────────────────────┐    ┌───────────────┴───────────────┐
   │  data/raw/          │    │  src/hea_bench/               │
   │  - borg2020/        │───►│  - benchmark/                 │
   │  - pei2020/         │    │      consolidate.py           │
   │  - peivaste/        │    │      coverage.py              │
   │  (per-source READMEs│    │      loaders/{borg,pei,...}.py│
   │   + provenance)     │    │  - descriptors/{size, vec,    │
   └─────────────────────┘    │      melting, miedema, omega} │
                              │  - rules/{yeh, zhang,         │
                              │      guo, yang}               │
                              │  - classifiers/               │
                              │      diagnostic_stats.py      │
                              │  - evaluate.py                │
                              └──────────────┬────────────────┘
                                             │
                                             │ independent
                                             │ implementation
                                             ▼
                              ┌──────────────────────────────┐
                              │  web/   (standalone HTML +   │
                              │          JavaScript)         │
                              │  - index.html                │
                              │  - mathjax/   (vendored)     │
                              └──────────────────────────────┘

What's in the benchmark

data/consolidated/v0.1.0/consolidated.csv contains 7,784 unique compositions × 14 columns:

  • composition_key — alphabetically sorted element symbols + 4-decimal mole fractions, the canonical join key
  • n_elements, sources (semicolon-separated)
  • canonical_phase — one of BCC / FCC / HCP / multi-phase (blank when the contributing sources disagree)
  • has_conflict — 1 when the canonical_phase is blank because of a source-label disagreement
  • Per-source canonical and raw labels preserved verbatim
  • borg_processing, borg_doi, source_row_ids for provenance
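
One way to build a join key of the kind described in the first bullet is sketched below. The exact string layout (symbol immediately followed by the fraction, no separator) is a guess for illustration; consult data/consolidated/v0.1.0/README.md for the real format.

```python
def composition_key(composition: dict[str, float]) -> str:
    """Hypothetical canonical key: sorted symbols + 4-decimal mole fractions."""
    total = sum(composition.values())
    parts = [
        f"{element}{amount / total:.4f}"
        for element, amount in sorted(composition.items())
        if amount > 0  # drop elements with zero content
    ]
    return "".join(parts)
```

Normalizing before formatting means that {"Ni": 1, "Co": 1} and {"Co": 0.5, "Ni": 0.5} map to the same key, which is what makes cross-source deduplication possible.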

100 of the 7,784 compositions are cross-source label conflicts — flagged for downstream resolution rather than silently picked. The sources are: Borg 2020 (740 alloys), Pei 2020 (1,209 alloys), Peivaste 2023 (7,747 alloys).
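
The flag-rather-than-pick policy can be sketched as follows. This is an illustration of the behaviour described above, not the code in benchmark/consolidate.py; the function name and the treatment of rows with no label at all are assumptions.

```python
def merge_labels(labels_by_source: dict[str, str]) -> tuple[str, int]:
    """Return (canonical_phase, has_conflict) for one composition."""
    distinct = {label for label in labels_by_source.values() if label}
    if len(distinct) == 1:
        return distinct.pop(), 0   # all contributing sources agree
    if len(distinct) > 1:
        return "", 1               # disagreement: leave blank and flag
    return "", 0                   # no label available (assumed behaviour)
```

Downstream users can then filter on has_conflict == 0 for a clean evaluation set, or inspect the per-source labels to resolve the flagged rows themselves.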

See data/consolidated/v0.1.0/README.md for the full schema, per-source attribution, and a complete description of the consolidation rules. See data/raw/ for per-source provenance, licenses, and SHA-256s.

What's covered

  • 86.7% of the 7,784 compositions are scorable by every descriptor (δ, VEC, T_m, ΔS_mix, ΔH_mix, Ω) with the current 24-element ELEMENTAL_DATA table
  • 99.6% are scorable when only the Miedema-based descriptors are required (the vendored matminer pair table covers 75 elements)
  • Top elements whose addition would lift coverage to ~95%: Mg, C, Zn, B, Sn, Re (all already in the matminer pair table — pending v0.2.0 data release)
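
The coverage figures above come down to a set-inclusion check per composition. A minimal sketch, with hypothetical function names and a simple frequency heuristic for ranking missing elements (the package's coverage module may rank them differently):

```python
from collections import Counter


def coverage_fraction(compositions: list[dict[str, float]],
                      supported: set[str]) -> float:
    """Share of compositions whose elements are all in the supported set."""
    if not compositions:
        raise ValueError("no compositions given")
    scorable = sum(1 for comp in compositions if set(comp) <= supported)
    return scorable / len(compositions)


def most_blocking_elements(compositions: list[dict[str, float]],
                           supported: set[str], k: int = 3) -> list[str]:
    """Unsupported elements ranked by how many compositions contain them."""
    counts: Counter[str] = Counter()
    for comp in compositions:
        counts.update(set(comp) - supported)
    return [element for element, _ in counts.most_common(k)]


# Toy data for illustration only.
supported = {"Co", "Cr", "Fe", "Ni"}
toy = [
    {"Co": 0.5, "Ni": 0.5},
    {"Fe": 0.5, "Mg": 0.5},
    {"Cr": 0.4, "Mg": 0.3, "Zn": 0.3},
    {"Co": 0.5, "Cr": 0.5},
]
```

On the toy data, half of the compositions are scorable and Mg blocks the most rows.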

Re-run the coverage analysis on your own version of the dataset with:

python -m hea_bench.benchmark.coverage

Sources and attribution

Every primary source is cited per-row in the consolidated CSV. The data files in data/raw/ carry per-source READMEs with DOIs, licenses, and acquisition SHA-256s.

| Source | Citation | License | Status |
|---|---|---|---|
| Borg 2020 | Sci. Data 7, 430 (doi:10.1038/s41597-020-00768-9) | CC-BY-4.0 | Mirrored |
| Pei 2020 | npj Comput. Mater. 6, 50 (doi:10.1038/s41524-020-0308-7) | CC-BY-4.0 | Mirrored |
| Peivaste 2023 | Sci. Rep. 13, 22556 + GitHub | none on data | Pointer-only (fetch.py) |
| Miedema pair enthalpies | matminer MiedemaLiquidDeltaHf.tsv | BSD-3-Clause | Vendored (see descriptors/data/) |

Project layout

hea-bench/
├── data/
│   ├── raw/             per-source data with READMEs, licenses, SHAs
│   └── consolidated/    versioned benchmark releases (v0.1.0 here)
├── src/hea_bench/
│   ├── benchmark/       loaders, consolidator, coverage analysis
│   ├── descriptors/     ΔS_mix, δ, VEC, T_m, ΔH_mix, Ω + data tables
│   ├── rules/           four canonical empirical rules as classifiers
│   ├── classifiers/     diagnostic-stats machinery
│   ├── composition.py   formula parser, normalizer
│   ├── constants.py     R = 8.314
│   ├── evaluate.py      orchestrator: rules vs benchmark → headline stats
│   └── cli.py           command-line entry point
├── tests/               157 tests, all passing
├── web/                 self-contained HTML calculator (pure JS, no server)
└── pyproject.toml

Development

git clone <repo>
cd hea-bench
pip install -e ".[dev,data]"
python -m pytest tests/ -q

The HTML calculator (web/index.html) is an independent JavaScript implementation of the same descriptors and rules. When you modify Python descriptor code, sanity-check the calculator against the same composition (e.g. the Cantor alloy values) so the two surfaces don't drift.
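
A small comparison helper can make that sanity check repeatable. The sketch below is hypothetical: the reference numbers are the Cantor values from the quick start, and the calculator values are assumed to be typed in by hand from the page.

```python
import math


def find_drift(reference: dict[str, float], candidate: dict[str, float],
               rel_tol: float = 1e-3) -> dict[str, tuple]:
    """Return descriptors whose values are missing or differ beyond rel_tol."""
    mismatches = {}
    for name, expected in reference.items():
        got = candidate.get(name)
        if got is None or not math.isclose(expected, got,
                                           rel_tol=rel_tol, abs_tol=1e-9):
            mismatches[name] = (expected, got)
    return mismatches


# Cantor alloy values quoted in the Python quick start.
CANTOR_REFERENCE = {
    "smix": 13.381,             # J/(mol·K)
    "delta": 3.164,             # %
    "vec": 8.0,
    "mixing_enthalpy": -4.16,   # kJ/mol
    "omega": 5.79,
}
```

An empty result means the two surfaces agree to within the tolerance; any entry names the descriptor that drifted and shows both values.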

Contributing and support

Contributions, bug reports, and dataset additions are welcome. See CONTRIBUTING.md for development setup, the testing convention, and the data-provenance policy. To report a bug or ask a question, open a GitHub issue; for direct contact, email the maintainer at davjfies@gmail.com. Participation is governed by the Code of Conduct.

License

MIT. The vendored matminer Miedema data files remain under their upstream BSD-3-Clause license, preserved at descriptors/data/LICENSE.matminer.txt.

Citation

Citation metadata in CITATION.cff. When citing hea-bench, please also cite the original source datasets (Borg, Pei, Peivaste) and matminer — see data/raw/<source>/README.md for each source's preferred citation.

hea-bench is archived on Zenodo. The concept DOI 10.5281/zenodo.20346287 always resolves to the latest version; v0.1.0 specifically is 10.5281/zenodo.20346288.

Acknowledgements

All numerical parameters, formulas, threshold values, and benchmark numbers are derived from cited primary sources or computed in this codebase from documented inputs; the author verified outputs against the cited literature.
