A Benchmarking Evaluation Framework for Target-Driven Materials Formulation.
MatFormBench is an evaluation framework for inverse materials and formulation design. It provides synthetic benchmark tasks, algorithm configuration cards, baseline inverse-design algorithms, and evaluation metrics for comparing optimization, generative, and LLM-assisted design strategies.
This open version is designed for users who want to:
- run benchmark tasks on predefined synthetic formulation datasets;
- evaluate inverse-design algorithms with a unified scoring pipeline;
- compare classical optimization, surrogate-based search, generative models, and LLM-based methods;
- add their own algorithm card and test it under the same evaluation protocol.
The core oracle and metric implementations are distributed as protected compiled modules in this public release.
matformbench-openversion/
├── assets/
│ └── matformbench-icon.svg
├── inverse_algo_card/ # YAML algorithm configuration cards
├── inverse_algorithms_v2/ # Algorithm implementations and registry
├── inverse_metrics/ # Protected compiled metric module
│ └── metrics.cpython-310-x86_64-linux-gnu.so
├── inverse_utils/ # Result saving and score utilities
├── scripts/ # Helper scripts for selected LLM runs
├── synthetic_data/ # Protected compiled oracle module
│ └── oracle_0.cpython-310-x86_64-linux-gnu.so
├── task_registry.json # Task definitions
├── team_test_v2.py # Main evaluation entry point
├── run_batch_team_test_v2.py # General batch runner
├── run_gpr_test.py # GPR-only batch runner
├── run_batch_*.py # Model-specific batch runners
├── requirements.txt
└── README.md
This release includes precompiled protected modules:
synthetic_data/oracle_0.cpython-310-x86_64-linux-gnu.so
inverse_metrics/metrics.cpython-310-x86_64-linux-gnu.so
They are built for:
Linux x86_64 + CPython 3.10
Use Python 3.10 on Linux for the easiest setup. Other operating systems or Python versions require rebuilding the protected modules from the private source files.
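The platform requirement above can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name check_platform is hypothetical and not part of MatFormBench.

```python
import platform
import sys


def check_platform(system, machine, version):
    """Return a list of mismatches against Linux x86_64 + Python 3.10."""
    problems = []
    if system != "Linux":
        problems.append(f"OS is {system}, expected Linux")
    if machine != "x86_64":
        problems.append(f"architecture is {machine}, expected x86_64")
    if tuple(version[:2]) != (3, 10):
        problems.append(f"Python is {version[0]}.{version[1]}, expected 3.10")
    return problems


if __name__ == "__main__":
    issues = check_platform(platform.system(), platform.machine(), sys.version_info)
    # The modules are built for CPython, so report other interpreters as well.
    if platform.python_implementation() != "CPython":
        issues.append(f"interpreter is {platform.python_implementation()}, expected CPython")
    print("platform ok" if not issues else "\n".join(issues))
```

An empty list means the interpreter matches the target the precompiled modules were built for. It does not prove that the modules import, which is verified separately after installation.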
With Conda:
conda create -n matformbench python=3.10 -y
conda activate matformbench
pip install --upgrade pip
pip install -r requirements.txt
Or with venv:
python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Run these commands from the project root:
python -c "from synthetic_data import oracle_0; print('oracle module ok')"
python -c "from inverse_metrics import metrics; print('metrics module ok')"
If both commands print ok, the compiled modules are compatible with your Python environment.
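The same import check can be scripted so that an incompatible interpreter produces a readable message instead of a traceback. This is a sketch, not a tool shipped with the repository; check_modules is a hypothetical helper, and the two module paths are taken from the commands above.

```python
import importlib


def check_modules(names):
    """Try to import each module; map name -> None if ok, else the error text."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = None
        except ImportError as exc:  # also covers ModuleNotFoundError
            status[name] = str(exc)
    return status


if __name__ == "__main__":
    # Run from the project root so the two packages are importable.
    modules = ["synthetic_data.oracle_0", "inverse_metrics.metrics"]
    for name, err in check_modules(modules).items():
        print(f"{name}: {'ok' if err is None else 'FAILED: ' + err}")
```

A FAILED line usually points to a Python version or platform mismatch with the compiled .so files.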
Run a single algorithm card:
python team_test_v2.py \
--config inverse_algo_card/algo_card_ACO_GPR.yaml \
--task-registry task_registry.json
A successful run prints a result path such as:
Result JSON saved to: outputs/ACO_GPR_L1_Dataset-1.json
The output directory is ignored by Git by default.
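Run the general batch runner with its default selection:
python run_batch_team_test_v2.py \
--task-registry task_registry.json
Restrict the run to selected levels and datasets:
python run_batch_team_test_v2.py \
--task-registry task_registry.json \
--levels L1 L2 \
--datasets Dataset-1 Dataset-2
Restrict the run to selected algorithms:
python run_batch_team_test_v2.py \
--task-registry task_registry.json \
--algos GA_GPR PSO_GPR ACO_GPR \
--levels L1 \
--datasets Dataset-1
Run the GPR-only batch runner:
python run_gpr_test.py \
--task-registry task_registry.json \
--levels L1 L2 \
--datasets Dataset-1 Dataset-2
Model-specific batch runners follow the run_batch_*.py naming pattern. For example: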
python run_batch_Diffusion_DDIM.py \
--task-registry task_registry.json \
--levels L1 L2 \
--datasets Dataset-1 Dataset-2
Some algorithm cards call external LLM APIs. API keys are not included in this repository. Set the required environment variable before running those cards.
Linux or macOS shell:
export DEEPSEEK_API_KEY="your_key_here"
export MOONSHOT_API_KEY="your_key_here"
export ZAI_API_KEY="your_key_here"
Windows PowerShell:
$env:DEEPSEEK_API_KEY="your_key_here"
$env:MOONSHOT_API_KEY="your_key_here"
$env:ZAI_API_KEY="your_key_here"
Example LLM batch runs:
python scripts/run_deepseek_direct5_batch.py --levels L1 --datasets Dataset-1
python scripts/run_kimi_direct5_batch.py --levels L1 --datasets Dataset-1
python scripts/run_glm_direct5_batch.py --levels L1 --datasets Dataset-1
Each algorithm is configured by a YAML card in inverse_algo_card/. A typical card contains:
submission:
  algo_name: ACO_GPR
  author: anonymous
  description: ACO with GPR surrogate
task:
  level: L1
  dataset: Dataset-1
  split: train
  design: lhs
  seed: 42
algorithm:
  optimizer:
    name: ant_colony_optimization
    params:
      n_ants: 80
      n_iterations: 200
      evaporation_rate: 0.2
      random_state: 42
  surrogate_model:
    name: gpr
    params:
      normalize_y: true
      n_restarts_optimizer: 1
      kernel: trend_matern
evaluation:
  recommend:
    n_suggestions: 100
    n_choose: 5
  topk:
    K: 5
    n_rounds: 5
    min_success: 5
  dss:
    count: [10, 15, 30, 50, 100]
output:
  save_json: true
  output_dir: outputs
  file_name_rule: "{algo_name}_{level}_{dataset}.json"
  overwrite: true
mlflow:
  enabled: false
  tracking_uri: ""
  experiment_name: public_release
To test a new method, add an implementation under inverse_algorithms_v2/, register it in inverse_algorithms_v2/registry.py, then create a corresponding YAML card.
task_registry.json defines the available benchmark tasks. It organizes tasks by level, dataset name, feature columns, and target constraints.
Common examples:
L1 / Dataset-1
L1 / Dataset-2
L2 / Dataset-1
...
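To see which levels and datasets a registry actually defines, the file can be inspected without knowing its exact schema. The sketch below only lists nested dictionary keys; the helper name outline is hypothetical. It assumes the top level of task_registry.json is a JSON object and prints nothing otherwise.

```python
import json
from pathlib import Path


def outline(node, depth=2, indent=0):
    """Return indented lines listing dict keys down to `depth` levels."""
    lines = []
    if depth == 0 or not isinstance(node, dict):
        return lines
    for key, value in node.items():
        lines.append("  " * indent + str(key))
        lines.extend(outline(value, depth - 1, indent + 1))
    return lines


if __name__ == "__main__":
    registry_path = Path("task_registry.json")
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
        print("\n".join(outline(registry)))
```

Increasing depth shows more of the structure, such as the feature columns and target constraints stored for each task.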
The batch runner can override level and dataset in the YAML card at runtime, so one card can be evaluated across many tasks.
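The override can be pictured as copying the card once per task and rewriting its two task fields. The following sketch illustrates that idea and is not the batch runner's actual implementation. The helpers expand_card and result_name are hypothetical, and the card is reduced to the fields the example needs.

```python
import copy
from itertools import product


def expand_card(card, levels, datasets):
    """Yield one card copy per (level, dataset) pair with task fields overridden."""
    for level, dataset in product(levels, datasets):
        variant = copy.deepcopy(card)  # leave the original card untouched
        variant["task"]["level"] = level
        variant["task"]["dataset"] = dataset
        yield variant


def result_name(card):
    """Fill the card's file_name_rule from the card's own fields."""
    return card["output"]["file_name_rule"].format(
        algo_name=card["submission"]["algo_name"],
        level=card["task"]["level"],
        dataset=card["task"]["dataset"],
    )


# Reduced card for illustration; a real card carries more sections.
card = {
    "submission": {"algo_name": "ACO_GPR"},
    "task": {"level": "L1", "dataset": "Dataset-1"},
    "output": {"file_name_rule": "{algo_name}_{level}_{dataset}.json"},
}
names = [result_name(c) for c in expand_card(card, ["L1", "L2"], ["Dataset-1", "Dataset-2"])]
print(names)
```

Because the file name rule includes the level and dataset, each of the four variants maps to a distinct result file.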
Evaluation results are saved as JSON files. By default:
outputs/{level}/{dataset}/{algo_name}_{level}_{dataset}.json
Each result includes:
- submission metadata;
- task configuration;
- algorithm configuration;
- evaluation settings;
- individual metric results;
- final score pack.
outputs/ is intentionally excluded from Git because it can become large and may contain local experiment logs.