Single-image to executable low-poly 3D code generation.
Code3DBench evaluates whether a multimodal model can look at one rendered reference image and write object-building Three.js code that reconstructs the object inside a fixed browser scaffold. The generated program is executed, exported as geometry, and scored with executability, mesh geometry metrics, and image-space diagnostics.
This repository includes the benchmark implementation, the prompt templates, the fixed Three.js scaffold, the 1012-object benchmark manifest, selected public CC0 GLB assets, compact result summaries, and paper/poster artifacts.
It intentionally does not include raw run directories, provider traces, or generated model-output programs. Those files are large, noisy, and can contain provider metadata that is not useful for a clean public starter.

- `benchmark_assets/manifest.json`: 1012-object benchmark manifest with category labels and source provenance.
- `benchmark_assets/assets/`: selected public CC0 GLB assets plus referenced texture files.
- `modules/`: ground-truth rendering, inference, execution, iterative revision, metric evaluation, and reporting modules.
- `tools/`: helper scripts for asset preparation, review, aggregation, and exports.
- `prompts/`: scaffold contract and model prompt templates.
- `tests/`: lightweight tests for shared helpers and manifest/category behavior.
- `data/results/`: compact paper result summaries, category scores, recovery counts, and TRELLIS.2 reference numbers.
- `docs/`: paper, supplement, poster, and figure assets.

- Task: one reference image -> executable Three.js object code.
- Dataset slice: 1012 deduplicated public CC0 low-poly objects from 39 packs across 8 categories.
- Categories: character, container, food, furniture, nature, prop, tool, and vehicle.
- Execution: generated code runs inside a fixed Three.js scaffold.
- Geometry metrics: Chamfer L2, F@1, F@2, F@5, and normal consistency on exported normalized meshes.
- Image diagnostics: SSIM, MSE, LPIPS, and CLIP similarity.
- Iterative setting: single-shot compared with fixed-budget 3-step full-program revision.
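
As a rough illustration of the geometry metrics listed above, the sketch below computes a symmetric Chamfer distance and an F-score on two small point sets in plain Python. It is not the benchmark's implementation (the full evaluation uses PyTorch3D), and several details are assumptions: that Chamfer L2 averages squared nearest-neighbour distances in both directions, that the F-score threshold `tau` is expressed in the same units as the normalized mesh coordinates, and the function names `chamfer_l2` and `f_score` themselves.

```python
import math


def _nearest_sq_dists(src, dst):
    """For each point in src, squared distance to its nearest neighbour in dst."""
    return [
        min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in dst)
        for p in src
    ]


def chamfer_l2(pred, gt):
    """Symmetric Chamfer distance (assumed convention: mean of squared distances)."""
    d_pred_to_gt = _nearest_sq_dists(pred, gt)
    d_gt_to_pred = _nearest_sq_dists(gt, pred)
    return (sum(d_pred_to_gt) / len(d_pred_to_gt)
            + sum(d_gt_to_pred) / len(d_gt_to_pred))


def f_score(pred, gt, tau):
    """F-score at threshold tau: harmonic mean of precision and recall."""
    # Precision: fraction of predicted points within tau of the reference.
    precision = sum(math.sqrt(d) <= tau for d in _nearest_sq_dists(pred, gt)) / len(pred)
    # Recall: fraction of reference points within tau of the prediction.
    recall = sum(math.sqrt(d) <= tau for d in _nearest_sq_dists(gt, pred)) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With points sampled from the predicted and reference meshes, `f_score(pred, gt, tau)` would be evaluated once per threshold. How F@1, F@2, and F@5 map to concrete `tau` values is defined by the benchmark code, not by this sketch.
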
Install the JavaScript dependencies used by the Three.js scaffold:
npm install

Install Python dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run a small benchmark smoke test:
export OPENROUTER_API_KEY=...
python3 run_benchmark.py \
--run-id smoke_5 \
--manifest benchmark_assets/manifest.json \
--limit 5 \
--max-iter 1 \
--iterative-workers 1 \
--execution-workers 1

Full metric evaluation uses PyTorch3D. Depending on your platform, installing PyTorch/PyTorch3D may require the package versions recommended for your CUDA or CPU environment.
The public manifest and compact summaries are intended to be readable without running the full benchmark:

- `benchmark_assets/manifest.json`: final 1012-object benchmark slice.
- `data/category_labels.csv`: object id to final category label.
- `data/category_counts.json`: category counts used in the paper and poster.
- `data/results/main_report.json`: single-shot, final@3, and best@<=3 model results.
- `data/results/category_scores.json`: per-category summary table.
- `data/results/executability_recovery.json`: executability recovery by retry step.
- `data/results/trellis_reference.json`: TRELLIS.2 scale-reference numbers.

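
For example, the category label file can be tallied without running any benchmark code. The sketch below is a minimal illustration, assuming the CSV has a header row and a column named `category`; the actual column names in `data/category_labels.csv` are not documented here, so treat `category_column` and the helper name `count_categories` as hypothetical.

```python
import csv
import io
from collections import Counter


def count_categories(csv_text, category_column="category"):
    """Count rows per category label in CSV text that has a header row.

    The column name is an assumption; pass the real one if it differs.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[category_column] for row in reader)
```

To apply it to the repository file, read the file contents first, for instance `count_categories(open("data/category_labels.csv", newline="").read())`, and compare the result against `data/category_counts.json`.
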
Lower CD (Chamfer L2) is better; higher F@5 is better. CD values are reported multiplied by 10^3.
| Model | Single CD | Single F@5 | Final@3 CD | Final@3 F@5 | Best@<=3 CD | Best@<=3 F@5 |
|---|---|---|---|---|---|---|
| GPT-5.4 | 17.9 | 0.6175 | 13.6 | 0.6369 | 9.8 | 0.6882 |
| GPT-5.4 mini | 16.8 | 0.5939 | 20.9 | 0.5975 | 12.4 | 0.6543 |
| Claude Sonnet 4.6 | 18.6 | 0.6048 | 18.1 | 0.6314 | 11.5 | 0.6820 |
| Gemini 3 Flash | 16.6 | 0.6548 | 15.2 | 0.6541 | 8.7 | 0.7279 |
| Gemini 3.1 Pro | 15.7 | 0.6437 | 14.4 | 0.6830 | 8.6 | 0.7435 |
The main empirical pattern is that runtime success recovers quickly, while exported mesh geometry remains the limiting factor. The best@<=3 diagnostic also shows that final-only iterative reporting can miss better within-budget geometry.
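
The gap between final@3 and best@<=3 comes from how one score is selected from the revision steps of a single object. The sketch below illustrates that selection under simplifying assumptions: every step within the budget produced a score, and the helper names `final_at_k` and `best_at_k` are hypothetical. How the benchmark handles non-executable steps and aggregates across objects is not shown here.

```python
def final_at_k(step_scores, k=3):
    """Score of the last attempt within a budget of k steps.

    Assumes step_scores holds at least one score.
    """
    return step_scores[:k][-1]


def best_at_k(step_scores, k=3, lower_is_better=True):
    """Best score among the first k attempts (min for CD, max for F-score)."""
    window = step_scores[:k]
    return min(window) if lower_is_better else max(window)


# Made-up per-step CD x 10^3 values for one object (not benchmark data).
steps = [18.0, 12.5, 15.0]
print(final_at_k(steps), best_at_k(steps))  # prints: 15.0 12.5
```

In this made-up case the second revision is the best one and the third regresses, so reporting only the final step would understate what was reached within the budget.
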

- `config.json` contains the paper model ids and provider routing, but no API keys.
- `OPENROUTER_API_KEY` is required for model inference.
- `FAL_KEY` is only needed for optional TRELLIS.2 reference tooling.
- `runs/` and `output/` are ignored because benchmark runs can be large.
- Raw provider traces and generated model programs are not part of this public starter.

@misc{glazachev2026code3dbench,
title = {Code3DBench: Single-Image to Executable Low-Poly 3D Code Generation},
author = {Glazachev, Vladimir},
year = {2026}
}

This repository is released under the MIT License. The benchmark source assets are public CC0 assets from third-party creators and retain pack-level and per-object provenance in `benchmark_assets/manifest.json`.
