This repository contains data tables supporting the LEAP / Perovskite-RL manuscript. It is a data-release repository: model weights, Hugging Face training datasets, raw PDFs, logs, procurement tables, and the full internally curated candidate pool are not stored here.
| Resource | Link | Notes |
|---|---|---|
| Manuscript | arXiv:2605.20242 | Associated LEAP / Perovskite-RL preprint. |
| Model weights | JH976/Perovskite-RL | Perovskite-RL model repository. Update this link if the public model repository uses a different name. |
| Training datasets | datasets/JH976/Perovskite-RL | SFT and GRPO datasets for the language-model training stages. |
| Data-release repository | WD928/LEAP | Tables and source data used for benchmark, ablation, and candidate-selection reporting. |

| Module | Path | Contents |
|---|---|---|
| Hot-start additive data | data/ | 36 experimentally characterized additives, hard descriptors, soft mechanistic descriptor statistics, and relative PCE changes. |
| Mechanism benchmark | benchmark/questions.csv | 32 multiple-choice questions from held-out literature sources. |
| Model benchmark results | benchmark/model_results/ | Per-question answers for Perovskite-RL and baseline models. |
| Benchmark statistics | benchmark/statistics/ | Accuracy summaries, exact McNemar-test tables, and Holm-Bonferroni-adjusted pairwise comparisons. |
| Representation ablation | ablation/representation/ | Hard/soft/hybrid representation ablation source data and bootstrap confidence intervals for Figure 3 metrics. |
| Reasoning-source ablation | ablation/reasoning_source/ | Perovskite-RL versus backbone soft-descriptor ablation tables and top-k diagnostics. |
| Decision-policy ablation | ablation/decision_policy/ | Expected-improvement, predicted-mean, uncertainty, and random-policy comparison data. |
| Candidate selection | candidate_selection/ | Round-specific top-50 validation shortlists with molecule identifiers and mechanism-score summaries. |

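
For readers who want to re-derive the benchmark statistics from the per-question answers, the following is a minimal sketch of an exact (binomial) McNemar test followed by a Holm-Bonferroni adjustment. It is not taken from the repository's own analysis code: the function names, the boolean-list input format, and the toy values are all assumptions made for illustration, and the released tables remain the authoritative numbers.

```python
from math import comb


def discordant_counts(reference_correct, other_correct):
    """Count questions where exactly one of the two models is correct.

    Both inputs are equal-length sequences of booleans (hypothetical format).
    Returns (b, c): b = reference right / other wrong, c = the reverse.
    """
    if len(reference_correct) != len(other_correct):
        raise ValueError("paired answer lists must have the same length")
    b = sum(1 for r, o in zip(reference_correct, other_correct) if r and not o)
    c = sum(1 for r, o in zip(reference_correct, other_correct) if o and not r)
    return b, c


def exact_mcnemar(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Toy values for illustration only; these are not results from the manuscript.
toy_reference = [True, True, True, True, True, False]
toy_baseline = [False, False, False, False, False, False]
toy_b, toy_c = discordant_counts(toy_reference, toy_baseline)
toy_p = exact_mcnemar(toy_b, toy_c)
```

With several baselines, one would compute one p-value per baseline against the reference model and pass the full list to `holm_adjust`.
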

| File | Description |
|---|---|
| data/hot_start_additives.csv | Main 36-additive hot-start table with measured PCE values and descriptors. |
| benchmark/statistics/model_summary_with_ci.csv | Benchmark accuracy summary with confidence intervals. |
| benchmark/statistics/mcnemar_vs_reference_holm.csv | Exact McNemar comparisons against Perovskite-RL with Holm-Bonferroni adjustment. |
| ablation/representation/figure3_bootstrap_ci_table.csv | Bootstrap confidence intervals for the Figure 3 representation-ablation metrics. |
| candidate_selection/top50_validation_shortlists_mechanism_scores.csv | Cleaned round-specific top-50 validation shortlist table. |

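
As a quick way to check a local clone, the sketch below reads the header and counts the data rows of each key file using only the Python standard library. It assumes the files are comma-delimited UTF-8 with a single header row and that the script runs from the repository root. `summarize_csv` and `KEY_FILES` are hypothetical names, and no column names are assumed.

```python
import csv
from pathlib import Path

# Paths copied from the key-file table; resolved relative to the repository root.
KEY_FILES = [
    "data/hot_start_additives.csv",
    "benchmark/statistics/model_summary_with_ci.csv",
    "benchmark/statistics/mcnemar_vs_reference_holm.csv",
    "ablation/representation/figure3_bootstrap_ci_table.csv",
    "candidate_selection/top50_validation_shortlists_mechanism_scores.csv",
]


def summarize_csv(path):
    """Return (header, n_rows) for a CSV file with one header row (assumed)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no lines
        n_rows = sum(1 for _ in reader)
    return header, n_rows


if __name__ == "__main__":
    for relative_path in KEY_FILES:
        path = Path(relative_path)
        if not path.exists():
            print(f"{relative_path}: not found (run from the repository root)")
            continue
        header, n_rows = summarize_csv(path)
        print(f"{relative_path}: {n_rows} rows, columns = {header}")
```

If the descriptions above hold, `data/hot_start_additives.csv` would be expected to report 36 data rows, one per additive.
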
This repository is released under the Apache License 2.0. See the LICENSE file for details.
Please cite the associated arXiv preprint if you use this repository: