Code3DBench

Single-image to executable low-poly 3D code generation.

Code3DBench evaluates whether a multimodal model can look at one rendered reference image and write object-building Three.js code that reconstructs the object inside a fixed browser scaffold. The generated program is executed, its output is exported as a mesh, and the result is scored on executability, mesh geometry metrics, and image-space diagnostics.

Status

This repository includes the benchmark implementation, prompt templates, fixed Three.js scaffold, the 1012-object benchmark manifest, selected public CC0 GLB assets, compact result summaries, and paper/poster artifacts.

It intentionally does not include raw run directories, provider traces, or generated model-output programs. Those files are large, noisy, and can contain provider metadata that is not useful for a clean public starter.

Materials

Repository Contents

benchmark_assets/manifest.json: 1012-object benchmark manifest with category labels and source provenance.
benchmark_assets/assets/: selected public CC0 GLB assets plus referenced texture files.
modules/: ground-truth rendering, inference, execution, iterative revision, metric evaluation, and reporting modules.
tools/: helper scripts for asset preparation, review, aggregation, and exports.
prompts/: scaffold contract and model prompt templates.
tests/: lightweight tests for shared helpers and manifest/category behavior.
data/results/: compact paper result summaries, category scores, recovery counts, and TRELLIS.2 reference numbers.
docs/: paper, supplement, poster, and figure assets.

Benchmark At A Glance

Task: one reference image -> executable Three.js object code.
Dataset slice: 1012 deduplicated public CC0 low-poly objects from 39 packs across 8 categories.
Categories: character, container, food, furniture, nature, prop, tool, and vehicle.
Execution: generated code runs inside a fixed Three.js scaffold.
Geometry metrics: Chamfer L2, F@1, F@2, F@5, and normal consistency on exported normalized meshes.
Image diagnostics: SSIM, MSE, LPIPS, and CLIP similarity.
Iterative setting: single-shot compared with fixed-budget 3-step full-program revision.
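
As a rough illustration of the geometry metrics listed above, the following NumPy sketch computes a symmetric Chamfer distance and an F-score at a distance threshold on two sampled point clouds. It is not the repository's implementation (the full evaluation uses PyTorch3D). The function names, the sum-of-means Chamfer convention, and the meaning of `tau` are assumptions made for this example, and the exact thresholds behind F@1, F@2, and F@5 are not specified here.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between every point in a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.einsum("nmd,nmd->nm", diff, diff)


def chamfer_l2(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance from squared nearest-neighbour distances.

    Assumption: the two directional means are summed; other conventions
    (for example averaging the two terms) differ by a constant factor.
    """
    d2 = pairwise_sq_dists(pred, gt)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def f_score(pred: np.ndarray, gt: np.ndarray, tau: float) -> float:
    """Harmonic mean of precision and recall at distance threshold tau.

    Assumption: tau is in the same units as the normalized point coordinates.
    """
    d = np.sqrt(pairwise_sq_dists(pred, gt))
    precision = float((d.min(axis=1) <= tau).mean())  # pred points near gt
    recall = float((d.min(axis=0) <= tau).mean())     # gt points near pred
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

Both functions expect point clouds sampled from the exported and ground-truth meshes after the same normalization. The dense distance matrix keeps the sketch short but grows with N x M, so it is only practical for a few thousand points per cloud.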

Quickstart

Install the JavaScript dependencies used by the Three.js scaffold:

npm install

Install Python dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run a small benchmark smoke test:

export OPENROUTER_API_KEY=...
python3 run_benchmark.py \
  --run-id smoke_5 \
  --manifest benchmark_assets/manifest.json \
  --limit 5 \
  --max-iter 1 \
  --iterative-workers 1 \
  --execution-workers 1

Full metric evaluation uses PyTorch3D. Depending on your platform, you may need to install the PyTorch and PyTorch3D builds that match your CUDA version or CPU-only environment.

Data Files

The public manifest and compact summaries are intended to be readable without running the full benchmark:

benchmark_assets/manifest.json: final 1012-object benchmark slice.
data/category_labels.csv: object id to final category label.
data/category_counts.json: category counts used in the paper and poster.
data/results/main_report.json: single-shot, final@3, and best@<=3 model results.
data/results/category_scores.json: per-category summary table.
data/results/executability_recovery.json: executability recovery by retry step.
data/results/trellis_reference.json: TRELLIS.2 scale-reference numbers.
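
A minimal sketch of how the manifest might be inspected without running the benchmark. The manifest schema is not documented in this section, so the sketch assumes a list of object entries that each carry a `category` field; both that layout and the helper names `load_json` and `count_categories` are hypothetical and may need adjusting to the actual file.

```python
import json
from collections import Counter
from pathlib import Path


def load_json(path: str):
    """Parse a JSON file and return its contents."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def count_categories(entries, key: str = "category") -> Counter:
    """Count objects per category label.

    Assumption: `entries` is an iterable of dicts and `key` names the
    category field; the real manifest may use a different layout.
    """
    return Counter(entry[key] for entry in entries)
```

Under those assumptions, `count_categories(load_json("benchmark_assets/manifest.json"))` would return one count per category, which could then be compared against data/category_counts.json.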

Headline Results

Lower CD is better; higher F@5 is better. CD is the Chamfer distance, reported as CD x 10^3.

Model	Single CD	Single F@5	Final@3 CD	Final@3 F@5	Best@<=3 CD	Best@<=3 F@5
GPT-5.4	17.9	0.6175	13.6	0.6369	9.8	0.6882
GPT-5.4 mini	16.8	0.5939	20.9	0.5975	12.4	0.6543
Claude Sonnet 4.6	18.6	0.6048	18.1	0.6314	11.5	0.6820
Gemini 3 Flash	16.6	0.6548	15.2	0.6541	8.7	0.7279
Gemini 3.1 Pro	15.7	0.6437	14.4	0.6830	8.6	0.7435

The main empirical pattern is that runtime success recovers quickly, while exported mesh geometry remains the limiting factor. The best@<=3 diagnostic also shows that final-only iterative reporting can miss better within-budget geometry. For example, GPT-5.4 mini's final@3 CD (20.9) is worse than its single-shot CD (16.8), while its best@<=3 CD is 12.4.
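
The difference between final-only and best-within-budget reporting can be sketched for a single object as follows. This is an illustration under stated assumptions, not the benchmark's aggregation code: it assumes one Chamfer distance per revision step, uses `None` for a step whose program failed to execute, and takes "best" to mean the lowest CD among steps that produced a mesh. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def final_and_best_cd(
    step_cds: Sequence[Optional[float]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (final, best) Chamfer distance over a sequence of revision steps.

    final: the value at the last step (None if that step failed or no steps ran).
    best:  the lowest value among steps that produced a mesh (None if none did).
    """
    valid = [cd for cd in step_cds if cd is not None]
    final = step_cds[-1] if step_cds else None
    best = min(valid) if valid else None
    return final, best
```

When a later revision makes the geometry worse, `final` is larger than `best`, which is the gap that the Final@3 and Best@<=3 columns expose.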

Reproducibility Notes

config.json contains the paper model ids and provider routing, but no API keys.
OPENROUTER_API_KEY is required for model inference.
FAL_KEY is only needed for optional TRELLIS.2 reference tooling.
runs/ and output/ are ignored because benchmark runs can be large.
Raw provider traces and generated model programs are not part of this public starter.
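
Before starting a run, it can help to confirm that the environment variables named above are set. The helper below is a hypothetical pre-flight check written for this README, not part of run_benchmark.py; only the variable names OPENROUTER_API_KEY and FAL_KEY come from the notes above.

```python
import os
from typing import List, Mapping


def missing_api_keys(
    env: Mapping[str, str] = os.environ,
    need_trellis: bool = False,
) -> List[str]:
    """Return the names of required environment variables that are unset or empty.

    OPENROUTER_API_KEY is always required for model inference; FAL_KEY is
    only checked when the optional TRELLIS.2 reference tooling is needed.
    """
    required = ["OPENROUTER_API_KEY"]
    if need_trellis:
        required.append("FAL_KEY")
    return [name for name in required if not env.get(name)]
```

An empty list means every required key is present; passing a plain dict as `env` makes the check easy to test without touching the real environment.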

Citation

@misc{glazachev2026code3dbench,
  title  = {Code3DBench: Single-Image to Executable Low-Poly 3D Code Generation},
  author = {Glazachev, Vladimir},
  year   = {2026}
}

License

This repository is released under the MIT License. The benchmark source assets are public CC0 assets from third-party creators and retain pack-level and per-object provenance in benchmark_assets/manifest.json.
