VladimirGl/Code3DBench

Code3DBench

Single-image to executable low-poly 3D code generation.

Code3DBench evaluates whether a multimodal model can look at one rendered reference image and write object-building Three.js code that reconstructs the object inside a fixed browser scaffold. The generated program is executed, exported as geometry, and scored with executability, mesh geometry metrics, and image-space diagnostics.

Status

This repository includes the benchmark implementation, prompt templates, fixed Three.js scaffold, the 1012-object benchmark manifest, selected public CC0 GLB assets, compact result summaries, and paper/poster artifacts.

It intentionally does not include raw run directories, provider traces, or generated model-output programs. Those files are large, noisy, and can contain provider metadata that is not useful for a clean public starter.

Repository Contents

  • benchmark_assets/manifest.json: 1012-object benchmark manifest with category labels and source provenance.
  • benchmark_assets/assets/: selected public CC0 GLB assets plus referenced texture files.
  • modules/: ground-truth rendering, inference, execution, iterative revision, metric evaluation, and reporting modules.
  • tools/: helper scripts for asset preparation, review, aggregation, and exports.
  • prompts/: scaffold contract and model prompt templates.
  • tests/: lightweight tests for shared helpers and manifest/category behavior.
  • data/results/: compact paper result summaries, category scores, recovery counts, and TRELLIS.2 reference numbers.
  • docs/: paper, supplement, poster, and figure assets.

Benchmark At A Glance

  • Task: one reference image -> executable Three.js object code.
  • Dataset slice: 1012 deduplicated public CC0 low-poly objects from 39 packs across 8 categories.
  • Categories: character, container, food, furniture, nature, prop, tool, and vehicle.
  • Execution: generated code runs inside a fixed Three.js scaffold.
  • Geometry metrics: Chamfer L2, F@1, F@2, F@5, and normal consistency on exported normalized meshes.
  • Image diagnostics: SSIM, MSE, LPIPS, and CLIP similarity.
  • Iterative setting: single-shot compared with fixed-budget 3-step full-program revision.
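The geometry metrics listed above can be illustrated with a minimal pure-Python sketch. It assumes both meshes have already been sampled into normalized point lists, that Chamfer L2 is the sum of the two directional mean squared nearest-neighbour distances, and that F@k is the harmonic mean of precision and recall at a distance threshold. The function names, the brute-force search, and the threshold units are illustrative assumptions, not the repository's PyTorch3D-based implementation.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_l2(pred, gt):
    """Symmetric Chamfer distance using squared distances (one common convention)."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    return (sum(d * d for d in d_pred) / len(d_pred)
            + sum(d * d for d in d_gt) / len(d_gt))


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    precision = sum(d <= tau for d in d_pred) / len(d_pred)
    recall = sum(d <= tau for d in d_gt) / len(d_gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up toy point sets: the prediction is the ground truth shifted by 0.1 along x.
gt_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred_points = [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]

print(chamfer_l2(pred_points, gt_points))
print(f_score(pred_points, gt_points, tau=0.2))
```

A brute-force nearest-neighbour search is quadratic in the number of points, so this form is only suitable for small illustrative inputs.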

Quickstart

Install the JavaScript dependencies used by the Three.js scaffold:

npm install

Install Python dependencies:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run a small benchmark smoke test:

export OPENROUTER_API_KEY=...
python3 run_benchmark.py \
  --run-id smoke_5 \
  --manifest benchmark_assets/manifest.json \
  --limit 5 \
  --max-iter 1 \
  --iterative-workers 1 \
  --execution-workers 1

Full metric evaluation uses PyTorch3D. Install the PyTorch and PyTorch3D package versions recommended for your platform, whether that is a CUDA or a CPU-only environment.

Data Files

The public manifest and compact summaries are intended to be readable without running the full benchmark:

  • benchmark_assets/manifest.json: final 1012-object benchmark slice.
  • data/category_labels.csv: object id to final category label.
  • data/category_counts.json: category counts used in the paper and poster.
  • data/results/main_report.json: single-shot, final@3, and best@<=3 model results.
  • data/results/category_scores.json: per-category summary table.
  • data/results/executability_recovery.json: executability recovery by retry step.
  • data/results/trellis_reference.json: TRELLIS.2 scale-reference numbers.
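As a rough illustration of reading the manifest without running the benchmark, the sketch below counts objects per category. The manifest schema is not documented in this section, so the entry layout (a list of dicts) and the `category` and `id` field names are assumptions, and the sample entries are made up.

```python
from collections import Counter


def count_categories(entries, key="category"):
    """Count manifest entries per category label.

    `key` is a hypothetical field name; adjust it to the real manifest schema.
    """
    return Counter(entry[key] for entry in entries)


# Made-up sample entries standing in for the parsed manifest contents.
sample_entries = [
    {"id": "obj_001", "category": "vehicle"},
    {"id": "obj_002", "category": "food"},
    {"id": "obj_003", "category": "vehicle"},
]

counts = count_categories(sample_entries)
for label, n in sorted(counts.items()):
    print(label, n)
```

With the real file, the entries would come from `json.load` on benchmark_assets/manifest.json, after checking how its top-level structure is organized.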

Headline Results

Lower CD (Chamfer distance) is better; higher F@5 is better. CD values are reported multiplied by 10^3.

Model              Single CD  Single F@5  Final@3 CD  Final@3 F@5  Best@<=3 CD  Best@<=3 F@5
GPT-5.4            17.9       0.6175      13.6        0.6369       9.8          0.6882
GPT-5.4 mini       16.8       0.5939      20.9        0.5975       12.4         0.6543
Claude Sonnet 4.6  18.6       0.6048      18.1        0.6314       11.5         0.6820
Gemini 3 Flash     16.6       0.6548      15.2        0.6541       8.7          0.7279
Gemini 3.1 Pro     15.7       0.6437      14.4        0.6830       8.6          0.7435

The main empirical pattern is that executability recovers quickly under revision, while exported mesh geometry remains the limiting factor. The best@<=3 diagnostic also shows that reporting only the final revision can miss better geometry produced earlier within the same budget.
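The difference between final@3 and best@<=3 can be made concrete with a small sketch. It assumes each object has one CD value per step, with `None` marking a step whose program failed to execute. The function name, the `None` convention, and the example numbers are illustrative assumptions, not the repository's reporting code.

```python
def summarize_revisions(cd_per_step):
    """Summarize per-step Chamfer distances for one object.

    cd_per_step: list of CD values in step order; None marks a failed step
    (an assumed convention for this sketch).
    """
    valid = [cd for cd in cd_per_step if cd is not None]
    return {
        "single": cd_per_step[0],              # first attempt only
        "final": cd_per_step[-1],              # last step in the budget
        "best": min(valid) if valid else None, # lowest CD at any step
    }


# Made-up example: the second step is the best, but the last one regresses.
summary = summarize_revisions([17.0, 9.5, 14.0])
print(summary)
```

In this made-up case, reporting only the final step would give 14.0 even though a step within the same budget reached 9.5, which is the gap the best@<=3 column exposes.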

Reproducibility Notes

  • config.json contains the paper model ids and provider routing, but no API keys.
  • OPENROUTER_API_KEY is required for model inference.
  • FAL_KEY is only needed for optional TRELLIS.2 reference tooling.
  • runs/ and output/ are ignored because benchmark runs can be large.
  • Raw provider traces and generated model programs are not part of this public starter.

Citation

@misc{glazachev2026code3dbench,
  title  = {Code3DBench: Single-Image to Executable Low-Poly 3D Code Generation},
  author = {Glazachev, Vladimir},
  year   = {2026}
}

License

This repository is released under the MIT License. The benchmark source assets are public CC0 assets from third-party creators and retain pack-level and per-object provenance in benchmark_assets/manifest.json.
