hea-bench

An open, reproducible benchmark suite and reference baselines for high-entropy alloy (HEA) phase prediction.

TL;DR

A consolidated, deduplicated open dataset of 7,784 experimentally characterized multi-principal element alloys, merged from three primary sources (Borg 2020, Pei 2020, Peivaste 2023) with per-row source provenance.
Reference baseline implementations of the four canonical empirical phase-prediction rules (Yeh ΔS_mix, Zhang δ, Guo-Liu VEC, Yang-Zhang Ω), wrapped as proper diagnostic classifiers with sensitivity / specificity / Wilson 95% CIs.
A dependency-free Python API (pip install hea-bench) and a self-contained HTML calculator. The calculator runs entirely client-side, computes all six descriptors plus the Miedema decompositions, and applies the four phase-prediction rules to the entered composition. Open it by URL or double-click the file.

Using an AI coding agent to integrate this? See AGENTS.md for a machine-oriented guide to the API, exact return types and units, the fastest path to each task, and the mistakes to avoid.

Headline benchmark numbers (v0.1.0)

Running the four canonical rules against the consolidated benchmark produces the reference baselines below. These are pinned in tests so any drift in dataset, descriptor code, or rule thresholds surfaces as a test failure.

| Rule | n_eval | Accuracy | Sensitivity (single-phase) | Specificity (multi-phase) | Youden's J |
|---|---|---|---|---|---|
| Zhang δ < 6.5% | 6,651 | 56.7% | 99.0% | 8.5% | 0.075 |
| Yang Ω > 1.1 | 6,651 | 54.4% | 95.8% | 7.4% | 0.032 |

The Guo–Liu VEC rule predicts crystal structure rather than single- versus multi-phase, so it is evaluated only on single-phase observations labeled BCC or FCC:

| Rule | n_eval | Accuracy | FCC sensitivity | BCC sensitivity |
|---|---|---|---|---|
| Guo–Liu VEC (FCC if VEC ≥ 8.0, BCC if VEC < 6.87) | 3,463 | 66.9% | 92.4% | 48.3% |

Yeh ΔS_mix is descriptive (no phase-prediction claim attached): 47% of the consolidated benchmark passes the 1.5R HEA-class threshold, 37% falls in the MEA bin, and 16% is dilute.
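
For readers who want to see what the Yeh binning amounts to, here is a minimal sketch that does not use the hea_bench API. It assumes the ideal configurational entropy ΔS_mix = −R Σ cᵢ ln cᵢ and a lower MEA boundary of 1.0R. Only the 1.5R cut is stated above, so the `mea_floor` parameter, the function names, and the choice of `>=` at the boundaries are illustrative assumptions.

```python
import math

R = 8.314  # gas constant in J/(mol·K), the value listed for constants.py


def mixing_entropy(composition):
    """Ideal configurational entropy, -R * sum(c_i * ln c_i), in J/(mol·K)."""
    total = sum(composition.values())
    fractions = [c / total for c in composition.values() if c > 0]
    return -R * sum(c * math.log(c) for c in fractions)


def yeh_class(composition, mea_floor=1.0):
    """Bin a composition by entropy in units of R.

    The 1.5R HEA cut comes from the text; mea_floor (1.0R) is an assumption.
    """
    s_over_r = mixing_entropy(composition) / R
    if s_over_r >= 1.5:
        return "HEA"
    if s_over_r >= mea_floor:
        return "MEA"
    return "dilute"


cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}
entropy = mixing_entropy(cantor)   # R * ln 5 for an equiatomic quinary
label = yeh_class(cantor)          # ln 5 is above 1.5, so this lands in the HEA bin
```

An equiatomic binary (ln 2 ≈ 0.69) falls in the dilute bin under these assumed boundaries, and an equiatomic ternary (ln 3 ≈ 1.10) falls in the MEA bin.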

The publishable observation: on a consolidated benchmark drawn from three independent open sources, both binary rules collapse to predicting single-phase almost always (Youden's J ≈ 0.03–0.08), and the VEC rule misses about half of the observed BCC alloys while catching 92% of FCC alloys. On this benchmark, the canonical rules generalize poorly.
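
The diagnostic statistics quoted above follow from standard definitions. The sketch below is not the code in classifiers/diagnostic_stats.py; it assumes the usual formulas (Youden's J = sensitivity + specificity − 1, and the Wilson score interval with z = 1.96 for 95% coverage), and the function names are hypothetical.

```python
import math


def youden_j(sensitivity, specificity):
    """Youden's J: 0 means no better than chance, 1 means a perfect classifier."""
    return sensitivity + specificity - 1.0


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Reproduce the J column from the sensitivity and specificity in the table.
j_zhang = youden_j(0.990, 0.085)   # about 0.075
j_yang = youden_j(0.958, 0.074)    # about 0.032

# Generic illustration of the interval: 50 successes out of 100 trials.
low, high = wilson_interval(50, 100)
```

The Wilson interval is generally preferred over the normal approximation when a proportion sits near 0 or 1, which is the situation for the specificities reported here.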

Quick start (Python)

pip install hea-bench

import hea_bench

cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}

hea_bench.smix(cantor)               # 13.381 J/(mol·K)  = R · ln 5
hea_bench.delta(cantor)              # 3.164 % atomic-size mismatch
hea_bench.vec(cantor)                # 8.0 valence electrons
hea_bench.mixing_enthalpy(cantor)    # -4.16 kJ/mol  (Miedema)
hea_bench.omega(cantor)              # 5.79  (Yang-Zhang)

# Apply the canonical rules
from hea_bench.rules import zhang_delta, yang_omega, guo_vec
zhang_delta.predict(cantor)          # 'single-phase'
yang_omega.predict(cantor)           # 'single-phase'
guo_vec.predict(cantor)              # 'FCC'

# Run the full rule benchmark against the consolidated v0.1.0 dataset
from hea_bench.evaluate import build_report
report = build_report()
print(report["rules"]["zhang_delta_6_5"]["accuracy"])  # 0.5670
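
As a cross-check on the quick-start values, the Yang–Zhang parameter can be reproduced by hand from Ω = T_m · ΔS_mix / |ΔH_mix|, with T_m taken as the mole-fraction-weighted average of the elemental melting points. The sketch below is independent of hea_bench. The melting points are standard handbook values and are an assumption about what the library's data table contains; the mixing enthalpy is simply the −4.16 kJ/mol quoted above.

```python
import math

R = 8.314  # J/(mol·K)

# Handbook melting points in kelvin (assumed; not read from hea_bench).
MELTING_K = {"Co": 1768.0, "Cr": 2180.0, "Fe": 1811.0, "Mn": 1519.0, "Ni": 1728.0}


def omega_sketch(composition, mixing_enthalpy_kj_per_mol):
    """Omega = T_m * dS_mix / |dH_mix|, with dH converted from kJ/mol to J/mol."""
    if mixing_enthalpy_kj_per_mol == 0:
        raise ValueError("Omega is undefined when the mixing enthalpy is zero")
    s_mix = -R * sum(c * math.log(c) for c in composition.values() if c > 0)
    t_m = sum(c * MELTING_K[el] for el, c in composition.items())
    return t_m * s_mix / abs(mixing_enthalpy_kj_per_mol * 1000.0)


cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}
omega = omega_sketch(cantor, -4.16)  # close to the 5.79 quoted above
```

The unit conversion is the step most easily missed: ΔS_mix is in J/(mol·K) while ΔH_mix is quoted in kJ/mol, so one of them has to be rescaled before taking the ratio.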

Quick start (CLI)

hea-bench --version
python -m hea_bench.evaluate           # run all 4 rules on v0.1.0
python -m hea_bench.benchmark.coverage # coverage analysis on v0.1.0

Quick start (browser, no install)

A self-contained HTML calculator computes the descriptors, applies the four phase-prediction rules, and runs the Miedema decompositions entirely client-side. Two equivalent paths:

Open the hosted page: https://dfieser.github.io/hea-bench/
Or download / clone the repo and double-click web/index.html. No install, no terminal, no server.

The page reports each rule's verdict (Yeh HEA/MEA/dilute, Zhang single/multi, Guo–Liu FCC/BCC/mixed, Yang–Zhang single/multi) alongside the computed descriptor values. Logic matches the Python library, including the six-decimal VEC-boundary rounding.
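
The rounding matters because the Cantor alloy sits exactly on the FCC boundary (VEC = 8.0), and a floating-point sum can land a hair below it. The following sketch shows the idea under stated assumptions: the valence-electron counts are the conventional ones for these 3d metals, and the function names and the exact placement of the rounding are illustrative rather than taken from the library or the calculator.

```python
# Conventional valence-electron counts (assumed to match the library's table).
VALENCE_ELECTRONS = {"Cr": 6, "Mn": 7, "Fe": 8, "Co": 9, "Ni": 10}


def vec(composition):
    """Mole-fraction-weighted valence electron concentration."""
    total = sum(composition.values())
    return sum(c / total * VALENCE_ELECTRONS[el] for el, c in composition.items())


def guo_liu_verdict(composition, fcc_min=8.0, bcc_max=6.87, decimals=6):
    """Apply the thresholds after rounding VEC, so boundary cases are stable."""
    value = round(vec(composition), decimals)
    if value >= fcc_min:
        return "FCC"
    if value < bcc_max:
        return "BCC"
    return "mixed"


cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}
verdict = guo_liu_verdict(cantor)  # rounds to 8.0, which satisfies VEC >= 8.0
```

Without the rounding step, two implementations that sum the terms in a different order could disagree on a boundary composition even though both are numerically correct.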

Architecture

                                ┌────────────────────────────┐
                                │  data/consolidated/v0.1.0/ │
                                │  - consolidated.csv        │
                                │  - rule_baselines.json     │
                                │  - coverage_report.json    │
                                │  - manifest.json           │
                                └─────────────▲──────────────┘
                                              │
                                              │ produced by
                                              │
   ┌─────────────────────┐    ┌───────────────┴───────────────┐
   │  data/raw/          │    │  src/hea_bench/               │
   │  - borg2020/        │───►│  - benchmark/                 │
   │  - pei2020/         │    │      consolidate.py           │
   │  - peivaste/        │    │      coverage.py              │
   │  (per-source READMEs│    │      loaders/{borg,pei,...}.py│
   │   + provenance)     │    │  - descriptors/{size, vec,    │
   └─────────────────────┘    │      melting, miedema, omega} │
                              │  - rules/{yeh, zhang,         │
                              │      guo, yang}               │
                              │  - classifiers/               │
                              │      diagnostic_stats.py      │
                              │  - evaluate.py                │
                              └──────────────┬────────────────┘
                                             │
                                             │ independent
                                             │ implementation
                                             ▼
                              ┌──────────────────────────────┐
                              │  web/   (standalone HTML +   │
                              │          JavaScript)         │
                              │  - index.html                │
                              │  - mathjax/   (vendored)     │
                              └──────────────────────────────┘

What's in the benchmark

data/consolidated/v0.1.0/consolidated.csv — 7,784 unique compositions × 14 columns:

composition_key — alphabetically sorted element symbols + 4-decimal mole fractions, the canonical join key
n_elements, sources (semicolon-separated)
canonical_phase — one of BCC / FCC / HCP / multi-phase (blank when the contributing sources disagree)
has_conflict — 1 when the canonical_phase is blank because of a source-label disagreement
Per-source canonical and raw labels preserved verbatim
borg_processing, borg_doi, source_row_ids for provenance
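
To make the join key concrete, here is a sketch of how a key of this shape could be built. Only the ingredients (alphabetical element order, 4-decimal mole fractions) come from the schema description; the concatenation without a separator and the function name are assumptions, so consult data/consolidated/v0.1.0/README.md for the actual layout.

```python
def composition_key(composition, decimals=4):
    """Build an order-independent key from normalized mole fractions.

    Elements are sorted alphabetically and fractions fixed to `decimals`
    places. The exact string layout here is hypothetical.
    """
    total = sum(composition.values())
    if total <= 0:
        raise ValueError("composition must have a positive total")
    return "".join(
        f"{el}{composition[el] / total:.{decimals}f}" for el in sorted(composition)
    )


# The same alloy written two ways maps to one key.
key_a = composition_key({"Ni": 1, "Co": 1})
key_b = composition_key({"Co": 0.5, "Ni": 0.5})
```

Normalizing before formatting is what lets atomic ratios and mole fractions from different sources collapse onto the same row.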

100 of the 7,784 compositions carry cross-source label conflicts; these are flagged for downstream resolution rather than silently resolved. The sources are Borg 2020 (740 alloys), Pei 2020 (1,209 alloys), and Peivaste 2023 (7,747 alloys).
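
The flag-rather-than-pick policy can be sketched as follows. This is an illustration of the behavior described for canonical_phase and has_conflict, not the logic in consolidate.py; the function name, the input shape, and the source identifiers are hypothetical.

```python
def merge_labels(source_labels):
    """Combine per-source canonical labels for one composition.

    source_labels maps a source name to its canonical phase label.
    When sources disagree, canonical_phase is left blank and has_conflict
    is set to 1, so the disagreement stays visible downstream.
    """
    distinct = {label for label in source_labels.values() if label}
    merged = {
        "sources": ";".join(sorted(source_labels)),
        "canonical_phase": "",
        "has_conflict": 0,
    }
    if len(distinct) == 1:
        merged["canonical_phase"] = next(iter(distinct))
    elif len(distinct) > 1:
        merged["has_conflict"] = 1
    return merged


agree = merge_labels({"borg2020": "FCC", "peivaste": "FCC"})
clash = merge_labels({"borg2020": "FCC", "pei2020": "multi-phase"})
```

Downstream code can then filter on has_conflict == 0 for evaluation, or inspect the conflicting rows using the preserved per-source labels.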

See data/consolidated/v0.1.0/README.md for the full schema, per-source attribution, and a complete description of the consolidation rules. See data/raw/ for per-source provenance, licenses, and SHA-256s.

What's covered

86.7% of the 7,784 compositions are scorable by every descriptor (δ, VEC, T_m, ΔS_mix, ΔH_mix, Ω) with the current 24-element ELEMENTAL_DATA table
99.6% are scorable by the Miedema-based descriptors alone (the vendored matminer pair table covers 75 elements)
Top elements whose addition would lift coverage to ~95%: Mg, C, Zn, B, Sn, Re (all already in the matminer pair table; pending the v0.2.0 data release)

Re-run the coverage analysis on your own version of the dataset with:

python -m hea_bench.benchmark.coverage
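
Conceptually, a composition is scorable when every one of its elements appears in the relevant data table. The sketch below illustrates that idea and a rough ranking of the elements that block the most rows. It is not the coverage module itself; the function names are hypothetical, and the toy compositions and element set are made up for the example.

```python
from collections import Counter


def scorable_fraction(compositions, supported_elements):
    """Share of compositions whose elements are all in the supported set."""
    if not compositions:
        return 0.0
    hits = sum(1 for comp in compositions if set(comp) <= supported_elements)
    return hits / len(compositions)


def blocking_elements(compositions, supported_elements):
    """Count, per unsupported element, how many compositions contain it."""
    counts = Counter()
    for comp in compositions:
        counts.update(set(comp) - supported_elements)
    return counts.most_common()


# Toy data, invented for illustration only.
toy = [
    {"Co": 0.5, "Ni": 0.5},
    {"Al": 0.2, "Mg": 0.8},
    {"Fe": 0.5, "Mg": 0.3, "Zn": 0.2},
    {"Fe": 1.0},
]
supported = {"Al", "Co", "Fe", "Ni"}

fraction = scorable_fraction(toy, supported)
ranking = blocking_elements(toy, supported)
```

The ranking is only a first-order guide: a row missing two elements stays unscorable until both are added, so the gain from adding one element can be smaller than its count suggests.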

Sources and attribution

Every primary source is cited per-row in the consolidated CSV. The data files in data/raw/ carry per-source READMEs with DOIs, licenses, and acquisition SHA-256s.

| Source | Citation | License | Status |
|---|---|---|---|
| Borg 2020 | Sci. Data 7, 430 (doi:10.1038/s41597-020-00768-9) | CC-BY-4.0 | Mirrored |
| Pei 2020 | npj Comput. Mater. 6, 50 (doi:10.1038/s41524-020-0308-7) | CC-BY-4.0 | Mirrored |
| Peivaste 2023 | Sci. Rep. 13, 22556 + GitHub | none on data | Pointer-only (`fetch.py`) |
| Miedema pair enthalpies | matminer `MiedemaLiquidDeltaHf.tsv` | BSD-3-Clause | Vendored (see `descriptors/data/`) |

Project layout

hea-bench/
├── data/
│   ├── raw/             per-source data with READMEs, licenses, SHAs
│   └── consolidated/    versioned benchmark releases (v0.1.0 here)
├── src/hea_bench/
│   ├── benchmark/       loaders, consolidator, coverage analysis
│   ├── descriptors/     ΔS_mix, δ, VEC, T_m, ΔH_mix, Ω + data tables
│   ├── rules/           four canonical empirical rules as classifiers
│   ├── classifiers/     diagnostic-stats machinery
│   ├── composition.py   formula parser, normalizer
│   ├── constants.py     R = 8.314 J/(mol·K)
│   ├── evaluate.py      orchestrator: rules vs benchmark → headline stats
│   └── cli.py           command-line entry point
├── tests/               157 tests, all passing
├── web/                 self-contained HTML calculator (pure JS, no server)
└── pyproject.toml

Development

git clone <repo>
cd hea-bench
pip install -e ".[dev,data]"
python -m pytest tests/ -q

The HTML calculator (web/index.html) is an independent JavaScript implementation of the same descriptors and rules. When you modify Python descriptor code, sanity-check the calculator against the same composition (e.g. the Cantor alloy values) so the two surfaces don't drift.
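
One lightweight way to run that sanity check is to compare a handful of values against the Cantor alloy numbers from the Python quick start. The helper below is a sketch under stated assumptions: the reference values are the rounded figures quoted in this README, the tolerance is chosen to absorb that rounding, and the function name is hypothetical. Values read off the calculator would be typed in by hand.

```python
import math

# Rounded Cantor alloy values as quoted in the Python quick start.
CANTOR_REFERENCE = {
    "smix": 13.381,            # J/(mol·K)
    "delta": 3.164,            # %
    "vec": 8.0,
    "mixing_enthalpy": -4.16,  # kJ/mol
    "omega": 5.79,
}


def drifted(values, reference=CANTOR_REFERENCE, abs_tol=5e-3):
    """Return the names of descriptors that are missing or outside tolerance."""
    return sorted(
        name
        for name, expected in reference.items()
        if name not in values
        or not math.isclose(values[name], expected, rel_tol=0.0, abs_tol=abs_tol)
    )


# Example: values copied from one surface, with VEC deliberately off.
observed = dict(CANTOR_REFERENCE, vec=8.1)
problems = drifted(observed)
```

An empty list means the surface under test agrees with the quoted figures to within the tolerance; any names returned point at the descriptor to investigate first.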

Contributing and support

Contributions, bug reports, and dataset additions are welcome. See CONTRIBUTING.md for development setup, the testing convention, and the data-provenance policy. To report a bug or ask a question, open a GitHub issue; for direct contact, email the maintainer at davjfies@gmail.com. Participation is governed by the Code of Conduct.

License

MIT. The vendored matminer Miedema data files remain under their upstream BSD-3-Clause license, preserved at descriptors/data/LICENSE.matminer.txt.

Citation

Citation metadata is in CITATION.cff. When citing hea-bench, please also cite the original source datasets (Borg, Pei, Peivaste) and matminer; see data/raw/<source>/README.md for each source's preferred citation.

hea-bench is archived on Zenodo. The concept DOI 10.5281/zenodo.20346287 always resolves to the latest version; v0.1.0 specifically is 10.5281/zenodo.20346288.

Acknowledgements

All numerical parameters, formulas, threshold values, and benchmark numbers are derived from cited primary sources or computed in this codebase from documented inputs; the author verified outputs against the cited literature.
