LEAP Data Release

This repository contains data tables supporting the LEAP / Perovskite-RL manuscript. It is a data-release repository: model weights, Hugging Face training datasets, raw PDFs, logs, procurement tables, and the full internally curated candidate pool are not stored here.

Resource Links

Resource	Link	Notes
Manuscript	arXiv:2605.20242	Associated LEAP / Perovskite-RL preprint.
Model weights	JH976/Perovskite-RL	Perovskite-RL model repository. This link will change if the public model repository is published under a different name.
Training datasets	datasets/JH976/Perovskite-RL	SFT and GRPO datasets for the language-model training stages.
Data-release repository	WD928/LEAP	Tables and source data used for benchmark, ablation, and candidate-selection reporting.

Repository Map

Module	Path	Contents
Hot-start additive data	`data/`	36 experimentally characterized additives with hard descriptors, soft mechanistic-descriptor statistics, and relative changes in power conversion efficiency (PCE).
Mechanism benchmark	`benchmark/questions.csv`	32 multiple-choice questions from held-out literature sources.
Model benchmark results	`benchmark/model_results/`	Per-question answers for Perovskite-RL and baseline models.
Benchmark statistics	`benchmark/statistics/`	Accuracy summaries, exact McNemar-test tables, and Holm-Bonferroni-adjusted pairwise comparisons.
Representation ablation	`ablation/representation/`	Hard/soft/hybrid representation ablation source data and bootstrap confidence intervals for Figure 3 metrics.
Reasoning-source ablation	`ablation/reasoning_source/`	Perovskite-RL versus backbone soft-descriptor ablation tables and top-k diagnostics.
Decision-policy ablation	`ablation/decision_policy/`	Expected-improvement, predicted-mean, uncertainty, and random-policy comparison data.
Candidate selection	`candidate_selection/`	Round-specific top-50 validation shortlists with molecule identifiers and mechanism-score summaries.
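The benchmark statistics pair each baseline with Perovskite-RL on the same questions, which is the setting for an exact McNemar test followed by a Holm-Bonferroni correction across baselines. The sketch below shows one way such values can be computed from per-question correctness, assuming each model's answers are reduced to a list of booleans aligned by question. The function names and toy data are hypothetical, and this is not necessarily the exact procedure used to produce the CSVs in `benchmark/statistics/`.

```python
from math import comb


def discordant_counts(ref_correct: list[bool], other_correct: list[bool]) -> tuple[int, int]:
    """Count questions only the reference got right (b) and only the other model got right (c)."""
    if len(ref_correct) != len(other_correct):
        raise ValueError("answer lists must be aligned by question")
    b = sum(1 for r, o in zip(ref_correct, other_correct) if r and not o)
    c = sum(1 for r, o in zip(ref_correct, other_correct) if o and not r)
    return b, c


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value: binomial test of min(b, c) under Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant questions, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def holm_adjust(pvals: list[float]) -> list[float]:
    """Holm-Bonferroni step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Scale the rank-th smallest p-value by (m - rank), cap at 1, and keep it monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical toy data (not taken from benchmark/): correctness on 8 questions.
reference = [True, True, True, True, True, False, True, True]
baselines = {
    "baseline_a": [True, False, False, True, False, False, True, False],
    "baseline_b": [True, True, True, False, True, False, True, True],
}
raw = {name: exact_mcnemar_p(*discordant_counts(reference, ans)) for name, ans in baselines.items()}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))
print(raw, adjusted)
```

Only the discordant questions enter the exact test; questions that both models answer correctly (or both miss) carry no information about which model is better.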

Key Files

File	Description
`data/hot_start_additives.csv`	Main 36-additive hot-start table with measured PCE values and descriptors.
`benchmark/statistics/model_summary_with_ci.csv`	Benchmark accuracy summary with confidence intervals.
`benchmark/statistics/mcnemar_vs_reference_holm.csv`	Exact McNemar comparisons against Perovskite-RL with Holm-Bonferroni adjustment.
`ablation/representation/figure3_bootstrap_ci_table.csv`	Bootstrap confidence intervals for the Figure 3 representation-ablation metrics.
`candidate_selection/top50_validation_shortlists_mechanism_scores.csv`	Cleaned round-specific top-50 validation shortlist table.
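The representation-ablation table reports bootstrap confidence intervals. The resampling unit, number of replicates, and interval type actually used are defined in the manuscript, not here; the sketch below assumes a plain percentile bootstrap over independent items with the mean as the statistic. The function name, defaults, and toy data are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def percentile_bootstrap_ci(
    values: Sequence[float],
    stat: Callable[[Sequence[float]], float] = mean,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Return (point estimate, lower, upper) using the percentile bootstrap."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Each replicate resamples n items with replacement and recomputes the statistic.
    boot = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot[int((alpha / 2) * n_boot)]
    upper = boot[int((1 - alpha / 2) * n_boot) - 1]
    return stat(values), lower, upper


# Hypothetical per-item scores (1 = correct) for 32 items.
correct = [1] * 24 + [0] * 8
point, lo, hi = percentile_bootstrap_ci(correct)
print(f"estimate {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

If items are grouped (for example, several measurements per additive), resampling whole groups rather than individual rows keeps the interval from being too narrow.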

License

This repository is released under the Apache License 2.0. See the LICENSE file for details.

Citation

Please cite the associated arXiv preprint if you use this repository:

https://arxiv.org/abs/2605.20242
