MatFormBench: A Benchmarking Evaluation Framework for Target-Driven Materials Formulation

A Benchmarking Evaluation Framework for Target-Driven Materials Formulation.

Overview

MatFormBench is an evaluation framework for inverse materials and formulation design. It provides synthetic benchmark tasks, algorithm configuration cards, baseline inverse-design algorithms, and evaluation metrics for comparing optimization, generative, and LLM-assisted design strategies.

This open version is designed for users who want to:

run benchmark tasks on predefined synthetic formulation datasets;
evaluate inverse-design algorithms with a unified scoring pipeline;
compare classical optimization, surrogate-based search, generative models, and LLM-based methods;
add their own algorithm card and test it under the same evaluation protocol.

The core oracle and metric implementations are distributed as protected compiled modules in this public release; their source code is not included.

Repository structure

matformbench-openversion/
├── assets/
│   └── matformbench-icon.svg
├── inverse_algo_card/               # YAML algorithm configuration cards
├── inverse_algorithms_v2/           # Algorithm implementations and registry
├── inverse_metrics/                 # Protected compiled metric module
│   └── metrics.cpython-310-x86_64-linux-gnu.so
├── inverse_utils/                   # Result saving and score utilities
├── scripts/                         # Helper scripts for selected LLM runs
├── synthetic_data/                  # Protected compiled oracle module
│   └── oracle_0.cpython-310-x86_64-linux-gnu.so
├── task_registry.json               # Task definitions
├── team_test_v2.py                  # Main evaluation entry point
├── run_batch_team_test_v2.py        # General batch runner
├── run_gpr_test.py                  # GPR-only batch runner
├── run_batch_*.py                   # Model-specific batch runners
├── requirements.txt
└── README.md

Platform requirement

This release includes precompiled protected modules:

synthetic_data/oracle_0.cpython-310-x86_64-linux-gnu.so
inverse_metrics/metrics.cpython-310-x86_64-linux-gnu.so

They are built for:

Linux x86_64 + CPython 3.10

Use Python 3.10 on Linux x86_64. Other operating systems, CPU architectures, or Python versions cannot load these binaries; supporting them would require rebuilding the protected modules from the private source files, which are not part of this release.
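Before installing anything, you can check whether the current interpreter matches the platform the prebuilt modules target. The sketch below is an illustrative helper (the function name `platform_problems` is ours, not part of the repository). It uses only the standard library and compares the interpreter against the `cpython-310-x86_64-linux-gnu` tag carried by the two `.so` files.

```python
import importlib.machinery
import platform
import sys

# Filename suffix carried by the prebuilt protected modules in this release.
REQUIRED_SUFFIX = ".cpython-310-x86_64-linux-gnu.so"


def platform_problems() -> list:
    """Return reasons why the prebuilt .so files may not load (empty list if none)."""
    problems = []
    if sys.implementation.name != "cpython":
        problems.append(f"interpreter is {sys.implementation.name}, not CPython")
    if sys.version_info[:2] != (3, 10):
        major, minor = sys.version_info[:2]
        problems.append(f"Python {major}.{minor} found, 3.10 required")
    if platform.system() != "Linux":
        problems.append(f"OS is {platform.system()}, Linux required")
    if platform.machine() != "x86_64":
        problems.append(f"architecture is {platform.machine()}, x86_64 required")
    # The import system only considers extension files whose suffix it recognizes.
    if REQUIRED_SUFFIX not in importlib.machinery.EXTENSION_SUFFIXES:
        problems.append(f"interpreter does not recognize the {REQUIRED_SUFFIX} extension suffix")
    return problems


if __name__ == "__main__":
    issues = platform_problems()
    print("platform ok" if not issues else "\n".join(issues))
```

The suffix check is the decisive one: a CPython 3.10 interpreter on Linux x86_64 lists that exact suffix in `EXTENSION_SUFFIXES`, and any other interpreter will simply not look for the file.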

Environment setup

Option A: Conda, recommended

conda create -n matformbench python=3.10 -y
conda activate matformbench
pip install --upgrade pip
pip install -r requirements.txt

Option B: Python venv

python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Verify the protected modules

Run these commands from the project root:

python -c "from synthetic_data import oracle_0; print('oracle module ok')"
python -c "from inverse_metrics import metrics; print('metrics module ok')"

If both commands print ok, the compiled modules are compatible with your Python environment.
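If you prefer a single check that reports the actual error instead of a traceback, the two imports above can be wrapped in a small script. This is a sketch; `check_imports` is a hypothetical helper name, and the module paths are the ones used in the commands above.

```python
import importlib
from typing import Dict, Iterable, Optional

# Module paths taken from the verification commands above.
PROTECTED_MODULES = ("synthetic_data.oracle_0", "inverse_metrics.metrics")


def check_imports(module_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Try to import each module; map its name to None on success or to the error text."""
    results: Dict[str, Optional[str]] = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:  # also covers ModuleNotFoundError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Run from the project root so that both packages are importable.
    for name, error in check_imports(PROTECTED_MODULES).items():
        print(f"{name}: {'ok' if error is None else error}")
```

An incompatible binary typically surfaces as an `ImportError` (for example a wrong ELF format), while running from the wrong directory surfaces as `ModuleNotFoundError`, so the error text helps tell the two cases apart.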

Quick start

Run a single algorithm card:

python team_test_v2.py \
  --config inverse_algo_card/algo_card_ACO_GPR.yaml \
  --task-registry task_registry.json

A successful run will print a result path such as:

Result JSON saved to: outputs/ACO_GPR_L1_Dataset-1.json

The output directory is ignored by Git by default.
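When scripting single runs, the saved path can be recovered from the "Result JSON saved to:" line shown above. The sketch below assumes that line is written to standard output (not to stderr or a log file); `find_result_path` and `run_card` are hypothetical helper names, and only the `--config` and `--task-registry` flags from the Quick start command are used.

```python
import subprocess
import sys
from typing import Optional

# Marker printed by team_test_v2.py after a successful run (see the example above).
RESULT_MARKER = "Result JSON saved to:"


def find_result_path(stdout: str) -> Optional[str]:
    """Return the path from the last 'Result JSON saved to:' line, or None if absent."""
    path = None
    for line in stdout.splitlines():
        line = line.strip()
        if line.startswith(RESULT_MARKER):
            path = line[len(RESULT_MARKER):].strip()
    return path


def run_card(config: str, registry: str = "task_registry.json") -> Optional[str]:
    """Run one algorithm card with the documented flags and return the saved JSON path."""
    proc = subprocess.run(
        [sys.executable, "team_test_v2.py", "--config", config, "--task-registry", registry],
        capture_output=True,
        text=True,
        check=True,  # raise if the evaluation exits with a non-zero status
    )
    return find_result_path(proc.stdout)
```

For example, `run_card("inverse_algo_card/algo_card_ACO_GPR.yaml")` would return the path printed by that run, or `None` if the assumed stdout message is not found.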

Run batch evaluations

Run all available algorithm cards on all registered tasks

python run_batch_team_test_v2.py \
  --task-registry task_registry.json

Run selected levels and datasets

python run_batch_team_test_v2.py \
  --task-registry task_registry.json \
  --levels L1 L2 \
  --datasets Dataset-1 Dataset-2

Run selected algorithms only

python run_batch_team_test_v2.py \
  --task-registry task_registry.json \
  --algos GA_GPR PSO_GPR ACO_GPR \
  --levels L1 \
  --datasets Dataset-1
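The filters above compose, and omitting one runs every value for it. A small helper can assemble the same command lines programmatically. This is a sketch using only the `--task-registry`, `--algos`, `--levels`, and `--datasets` flags shown in this README; `build_batch_cmd` is a hypothetical name.

```python
import shlex
import sys
from typing import List, Optional, Sequence


def build_batch_cmd(
    registry: str = "task_registry.json",
    algos: Optional[Sequence[str]] = None,
    levels: Optional[Sequence[str]] = None,
    datasets: Optional[Sequence[str]] = None,
    script: str = "run_batch_team_test_v2.py",
) -> List[str]:
    """Assemble an argument list for a batch runner from the documented flags."""
    cmd = [sys.executable, script, "--task-registry", registry]
    # Each filter is optional; an omitted filter is simply not passed.
    for flag, values in (("--algos", algos), ("--levels", levels), ("--datasets", datasets)):
        if values:
            cmd += [flag, *values]
    return cmd


if __name__ == "__main__":
    cmd = build_batch_cmd(algos=["GA_GPR", "PSO_GPR", "ACO_GPR"], levels=["L1"], datasets=["Dataset-1"])
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` launches the run. The `script` parameter can point at `run_gpr_test.py`, in which case leave `algos` unset, since that runner is only shown with `--levels` and `--datasets`.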

Run GPR baselines only

python run_gpr_test.py \
  --task-registry task_registry.json \
  --levels L1 L2 \
  --datasets Dataset-1 Dataset-2

Run a model-specific batch script

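Model-specific runners are provided for diffusion models (DDIM, DDPM, EDM, FlowMatching), GANs (CGAN, CTGAN, PacGAN, WGAN_GP), and VAEs (BetaVAE, CVAE, IWAE, VampPrior). For example, to run the DDIM diffusion baseline: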

python run_batch_Diffusion_DDIM.py \
  --task-registry task_registry.json \
  --levels L1 L2 \
  --datasets Dataset-1 Dataset-2
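To run every model-specific runner on the same tasks, the scripts can be discovered from the `run_batch_*.py` naming pattern. This sketch assumes each of them accepts the same `--task-registry`, `--levels`, and `--datasets` flags as the DDIM example above; `model_specific_runners` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# General-purpose runner; every other run_batch_*.py file is model-specific.
GENERAL_RUNNER = "run_batch_team_test_v2.py"


def model_specific_runners(root: str = ".") -> List[str]:
    """List model-specific batch scripts in the project root, sorted by name."""
    return sorted(
        p.name for p in Path(root).glob("run_batch_*.py") if p.name != GENERAL_RUNNER
    )


if __name__ == "__main__":
    for script in model_specific_runners():
        print(f"python {script} --task-registry task_registry.json --levels L1 --datasets Dataset-1")
```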

LLM-based algorithms

Some algorithm cards call external LLM APIs. API keys are not included in this repository. Set the environment variables required by the cards you plan to run before launching them.

Linux/macOS

export DEEPSEEK_API_KEY="your_key_here"
export MOONSHOT_API_KEY="your_key_here"
export ZAI_API_KEY="your_key_here"

Windows PowerShell

$env:DEEPSEEK_API_KEY="your_key_here"
$env:MOONSHOT_API_KEY="your_key_here"
$env:ZAI_API_KEY="your_key_here"

Example LLM batch runs:

python scripts/run_deepseek_direct5_batch.py --levels L1 --datasets Dataset-1
python scripts/run_kimi_direct5_batch.py --levels L1 --datasets Dataset-1
python scripts/run_glm_direct5_batch.py --levels L1 --datasets Dataset-1
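A missing key usually shows up only after a run has started, so it can help to check the environment first. In the sketch below, the mapping from helper script to variable is an assumption inferred from the provider names (Kimi is served by Moonshot, GLM by Z.ai); confirm it against each script before relying on it. `missing_keys` is a hypothetical helper name.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed mapping from helper script to the API-key variable it reads
# (inferred from provider names; verify against each script).
SCRIPT_KEYS = {
    "scripts/run_deepseek_direct5_batch.py": "DEEPSEEK_API_KEY",
    "scripts/run_kimi_direct5_batch.py": "MOONSHOT_API_KEY",
    "scripts/run_glm_direct5_batch.py": "ZAI_API_KEY",
}


def missing_keys(scripts: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the scripts whose API-key variable is unset or blank."""
    env = os.environ if env is None else env
    return [s for s in scripts if not env.get(SCRIPT_KEYS[s], "").strip()]


if __name__ == "__main__":
    for script in missing_keys(SCRIPT_KEYS):
        print(f"{script}: set {SCRIPT_KEYS[script]} before running")
```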

Algorithm card format

Each algorithm is configured by a YAML card in inverse_algo_card/. A typical card contains:

submission:
  algo_name: ACO_GPR
  author: anonymous
  description: ACO with GPR surrogate

task:
  level: L1
  dataset: Dataset-1
  split: train
  design: lhs
  seed: 42

algorithm:
  optimizer:
    name: ant_colony_optimization
    params:
      n_ants: 80
      n_iterations: 200
      evaporation_rate: 0.2
      random_state: 42
  surrogate_model:
    name: gpr
    params:
      normalize_y: true
      n_restarts_optimizer: 1
      kernel: trend_matern

evaluation:
  recommend:
    n_suggestions: 100
    n_choose: 5
  topk:
    K: 5
    n_rounds: 5
    min_success: 5
  dss:
    count: [10, 15, 30, 50, 100]

output:
  save_json: true
  output_dir: outputs
  file_name_rule: "{algo_name}_{level}_{dataset}.json"
  overwrite: true

mlflow:
  enabled: false
  tracking_uri: ""
  experiment_name: public_release
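The `file_name_rule` field combines values from the `submission` and `task` sections. Assuming the placeholders are filled like Python `str.format` fields (an assumption; the substitution code is not shown here), the card above yields the file name seen in the Quick start output. `result_file_name` is a hypothetical helper that works on the parsed card as a plain dictionary.

```python
from typing import Any, Dict


def result_file_name(card: Dict[str, Any]) -> str:
    """Fill output.file_name_rule with values from the submission and task sections."""
    rule = card["output"]["file_name_rule"]
    return rule.format(
        algo_name=card["submission"]["algo_name"],
        level=card["task"]["level"],
        dataset=card["task"]["dataset"],
    )


card = {
    "submission": {"algo_name": "ACO_GPR"},
    "task": {"level": "L1", "dataset": "Dataset-1"},
    "output": {"file_name_rule": "{algo_name}_{level}_{dataset}.json"},
}
print(result_file_name(card))  # ACO_GPR_L1_Dataset-1.json
```

With PyYAML installed, `yaml.safe_load` on a card file produces a dictionary of this shape.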

To test a new method, add an implementation under inverse_algorithms_v2/, register it in inverse_algorithms_v2/registry.py, then create a corresponding YAML card.
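The exact interface expected by `inverse_algorithms_v2/registry.py` is not documented in this README, so the following is only a generic name-to-class registry pattern, not the repository's API. Every name in it (`ALGORITHM_REGISTRY`, `register`, `RandomSearch`, `build_from_card`) is hypothetical. It shows how a name such as the card's `algorithm.optimizer.name` can be resolved to an implementation and instantiated with `algorithm.optimizer.params`.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping a card's optimizer name to an implementation class.
ALGORITHM_REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that records an implementation under `name`."""
    def decorator(cls: Type) -> Type:
        if name in ALGORITHM_REGISTRY:
            raise ValueError(f"algorithm {name!r} is already registered")
        ALGORITHM_REGISTRY[name] = cls
        return cls
    return decorator


@register("random_search")
class RandomSearch:
    """Hypothetical example method; keyword params come from optimizer.params."""

    def __init__(self, random_state: int = 0, **params: Any):
        self.random_state = random_state
        self.params = params


def build_from_card(optimizer_cfg: Dict[str, Any]) -> Any:
    """Instantiate the registered class named in an optimizer config block."""
    cls = ALGORITHM_REGISTRY[optimizer_cfg["name"]]
    return cls(**optimizer_cfg.get("params", {}))
```

Rejecting duplicate names keeps two implementations from silently shadowing each other when a new method is added.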

Tasks and datasets

task_registry.json defines the available benchmark tasks. It organizes tasks by level, dataset name, feature columns, and target constraints.

Common examples:

L1 / Dataset-1
L1 / Dataset-2
L2 / Dataset-1
...

The batch runner can override level and dataset in the YAML card at runtime, so one card can be evaluated across many tasks.
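How the batch runner performs this override internally is not documented here; the idea can be illustrated in plain Python by copying one parsed card per (level, dataset) pair and replacing only the task fields. `expand_card` is a hypothetical helper name.

```python
import copy
from typing import Any, Dict, Iterable, List, Tuple


def expand_card(card: Dict[str, Any], tasks: Iterable[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Return one deep copy of `card` per (level, dataset) pair, with task fields overridden."""
    expanded = []
    for level, dataset in tasks:
        variant = copy.deepcopy(card)  # keep the original card untouched
        variant["task"]["level"] = level
        variant["task"]["dataset"] = dataset
        expanded.append(variant)
    return expanded
```

A deep copy matters here because the card is nested: a shallow copy would share the `task` dictionary, so every variant would end up with the last level and dataset.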

Outputs

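Evaluation results are saved as JSON files. By default, they are organized as shown below; each run also prints the exact path it wrote to (see Quick start):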

outputs/{level}/{dataset}/{algo_name}_{level}_{dataset}.json

Each result includes:

submission metadata;
task configuration;
algorithm configuration;
evaluation settings;
individual metric results;
final score pack.
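Because result files may sit directly under outputs/ or in level/dataset subfolders, a recursive search collects them regardless of layout. The field names inside each file are not documented in this README, so this sketch only loads the files and reports their top-level keys; `load_results` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_results(root: str = "outputs") -> Dict[str, Any]:
    """Load every JSON file under `root` (recursively), keyed by its path relative to root."""
    base = Path(root)
    if not base.is_dir():
        return {}
    results = {}
    for path in sorted(base.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.relative_to(base).as_posix()] = json.load(fh)
    return results


if __name__ == "__main__":
    for name, payload in load_results().items():
        keys = list(payload) if isinstance(payload, dict) else type(payload).__name__
        print(name, keys)
```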

outputs/ is intentionally excluded from Git because it can become large and may contain local experiment logs.
