Type what you want. Get a real CAD design.
Describe a product in plain words — a coffee maker, a kettle, a desk lamp — and an AI agent designs it live in Autodesk Fusion: real, editable, correctly sized parts, not a picture. It checks its own work by measuring the actual geometry, fixes its own mistakes, and only claims success when the numbers prove it.
Words are just one way in. Hand it a dimensioned engineering drawing and it reads the sheet and machines the part; hand it an existing STEP file and a change request and it edits the part in place — same loop, same measured finish line. See Four ways to ask for a part.
Under the hood: the LLM writes Fusion Python over Autodesk Fusion's MCP server; the CAD kernel measures the result, and only measurements decide PASS. Plain Python, no agent framework: the whole loop reads top to bottom.
One real run, replayed frame by frame: the agent places all seven components, measures its own work, catches the interpenetrating parts, deletes and rebuilds them, catches a floating basket, re-seats it — then the runner's independent kernel verification passes it. Full story: docs/drip-coffee-maker-walkthrough.md.
flowchart TD
P["📝 A sentence<br>plain-language description"] --> LIST
Y["📄 A brief YAML<br>your own acceptance gates"] --> LIST
D["📐 An engineering drawing<br>two-pass vision reads the sheet"] --> LIST
E["🧊 A CAD file + a change request<br>imported by fixed code"] --> LIST
LIST["📋 One contract: a checklist of measurable requirements<br>written by you, or proposed by the AI and approved by you"] --> LOOP
subgraph LOOP["🔁 The agent loop — builds the design, one small step at a time"]
direction TB
ORCH["Orchestrator<br>feeds in the task, the skills,<br>and the Fusion tools"] --> LLM["LLM<br>decides the next small step"]
LLM -- "writes a bit of CAD code" --> RUN["Run it in Fusion<br>a real part appears"]
LLM -- "unsure of the API" --> DOCS["Look it up<br>in Fusion's own manual"]
LLM -- "wants ground truth" --> MEAS["Measure the parts<br>fixed probe code — not written by the LLM"]
RUN -- "result, or the error to fix" --> LLM
DOCS -- "the real answer" --> LLM
MEAS -- "real sizes, volumes, positions" --> LLM
end
LOOP -- "the agent says: done" --> CHECK["🔍 Independent check<br>the harness re-measures every part with fixed code<br>and grades the checklist itself — no LLM involved"]
CHECK -- "some checks fail:<br>measured problems go back, next round" --> LOOP
CHECK -- "every check passes" --> DONE["✅ Verified design in Fusion"]
style LLM fill:#fff7ed,stroke:#ea580c
style ORCH fill:#fff7ed,stroke:#ea580c
style RUN fill:#eff6ff,stroke:#2563eb
style DOCS fill:#eff6ff,stroke:#2563eb
style MEAS fill:#eff6ff,stroke:#2563eb
style CHECK fill:#eff6ff,stroke:#2563eb
style DONE fill:#dcfce7,stroke:#059669
- You ask in whichever form you have — a plain sentence, a brief YAML, a dimensioned engineering drawing, or an existing CAD file plus a change request (the four ways in).
- Whatever came in becomes a written definition of "done". A prose request is turned into a checklist of measurable requirements — every part you named, every size, every capacity — which you approve before anything is built. A brief YAML's gates are used verbatim; a drawing contributes the facts read directly off the sheet.
- It designs in small steps. This is the agent loop: write a small piece of CAD code, run it in Fusion, look at what happened, continue. Parts appear one at a time in a real Fusion document. Unsure of something? It looks it up in the manual instead of guessing. A step fails? It reads the error and fixes it.
- It measures its own work. After building, it asks Fusion's geometry engine for the real numbers — sizes, volumes, positions — and fixes what's off: parts crossing each other, parts floating in the air, a tank that came out too small. (Watch it happen in the GIFs below.)
- The harness has the final word. When the agent says "done", the harness independently measures every part and grades the checklist itself. Anything failing goes back to the agent, with the measured numbers, for another round. It only reports success when every check passes.
The harness itself is small, plain Python you can read top to bottom: the agent loop (an LLM plus a handful of tools), the measuring step, and the grader — no agent framework. Anything risky (scripts touching your live Fusion document) asks a human first, and every step is logged.
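The round structure described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `run_rounds`, `CheckResult`, and the three callables are hypothetical names, and the toy stand-ins below replace the real agent, probe, and grader.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str  # the measured value, fed back to the agent on failure


def run_rounds(
    build: Callable[[str], None],
    probe: Callable[[], dict],
    grade: Callable[[dict], list],
    task: str,
    max_rounds: int = 3,
) -> tuple:
    """Build, measure, grade; feed measured failures back until PASS or the budget runs out."""
    prompt = task
    for round_no in range(1, max_rounds + 1):
        build(prompt)          # agent loop: the LLM writes and runs CAD scripts
        snapshot = probe()     # fixed measuring code, not written by the LLM
        failures = [r for r in grade(snapshot) if not r.ok]
        if not failures:
            return True, round_no
        # Only measured problems go back; the agent's own claims are ignored.
        prompt = task + "\nFix these measured problems:\n" + "\n".join(
            f"- {f.name}: {f.detail}" for f in failures
        )
    return False, max_rounds


# Toy stand-ins: the first build yields a 0.8 L tank, the repair round yields 1.1 L.
state = {"tank_l": 0.0}


def fake_build(prompt: str) -> None:
    state["tank_l"] = 1.1 if "Fix these" in prompt else 0.8


def fake_probe() -> dict:
    return dict(state)


def fake_grade(snap: dict) -> list:
    ok = snap["tank_l"] >= 1.0
    detail = f"water_tank cavity is {snap['tank_l']} L, need >= 1.0 L"
    return [CheckResult("cavity_volume", ok, detail)]


passed, rounds = run_rounds(fake_build, fake_probe, fake_grade, "a drip coffee maker")
print(passed, rounds)
```

The point of the shape is that `grade` never sees anything the LLM said, only the snapshot, and the loop is bounded by `max_rounds` rather than by the agent's confidence.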
One loop, one finish line — but four front doors. Whatever goes in, the same executor builds in live Fusion and the same kernel measurements decide PASS:
| you have | you run | what happens |
|---|---|---|
| a sentence | cad-agent build "a drip coffee maker with a 1.2 L tank" --enrich | enrichment turns the prose into measurable checks; the agent builds until the kernel passes them |
| a written contract | cad-agent build brief.yaml | your fuzzy intent in customer_request, your hard requirements as acceptance: gates, used verbatim (example) |
| an engineering drawing | python scripts/cadgenbench_run.py 101 | a two-pass vision stage distills the sheet into a dimensioned spec; the same build loop machines it and exports STEP |
| a CAD file + a change request | python scripts/cadgenbench_run.py 206 --editing | fixed code imports your STEP into Fusion; the agent finds the target faces by measurement and edits them in place |
The first two are the build command you have already seen. The last two ship as the CADGenBench adapter, scripts/cadgenbench_run.py — point --data at a folder of benchmark-style samples (input.png or input.step + description.yaml).
A real benchmark sample, end to end: the sheet above is the input; the rotating part is the exported STEP.
Reading the drawing is its own two-pass discipline, built on the same principle as the build loop — never let an unverified claim gate the work:
- Inventory pass. The vision model gets the full sheet plus four overlapping quadrant crops at native resolution (so small dimension text stays legible) and must transcribe every callout verbatim, view by view — after first committing, from the isometric views alone, to what the object is in 3D.
- Reconcile pass. The inventory comes back to the model to be merged into one contradiction-free spec: every feature must be placed by two agreeing views, every dimension must fit the envelope, and anything that cannot be reconciled is quarantined into confidence_notes instead of leaking into the build. (On the sample above it correctly ruled that the sheet's Ø200/Ø188 callouts cannot be holes, since they exceed the 110 mm part, and read them as outline blend arcs.)
- Only drawing facts become gates. The acceptance checks are just the envelope read off the sheet; the model's estimates stay in the prose spec where they can't force the build toward fiction.
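The envelope rule from the reconcile pass (a callout larger than the part cannot be a hole) is simple enough to show directly. The sketch below is an assumption about how such a filter could look; `Callout` and `split_by_envelope` are hypothetical names, and the envelope's second and third extents and the Ø12 callout are illustrative values, not read from the real sheet.

```python
from dataclasses import dataclass


@dataclass
class Callout:
    text: str           # verbatim transcription from the sheet
    diameter_mm: float


def split_by_envelope(callouts, envelope_mm):
    """Separate callouts that could be holes from those that cannot fit the part."""
    limit = max(envelope_mm)  # conservative: no hole is wider than the largest extent
    plausible, quarantined = [], []
    for c in callouts:
        (plausible if c.diameter_mm <= limit else quarantined).append(c)
    return plausible, quarantined


# 110 mm is the part size quoted above; the other two extents are made up.
envelope = (110.0, 80.0, 30.0)
callouts = [Callout("Ø200", 200.0), Callout("Ø188", 188.0), Callout("Ø12", 12.0)]

holes, notes = split_by_envelope(callouts, envelope)
print([c.text for c in holes], [c.text for c in notes])
```

Quarantined callouts are kept rather than dropped, which mirrors the idea of routing unreconciled readings into confidence_notes so they stay visible without becoming gates.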
Eight drawing samples ran end to end on live Fusion (claude-cli/opus): 8/8 exported STEPs passed the benchmark's validity gate — a single watertight solid inside the drawing-derived envelope, most in one round of 11–23 agent turns.
The reverse direction: the part already exists, and the request is a delta — "remove these holes", "make it 10 mm taller". A deterministic pre-step (no LLM) imports input.step into a fresh Fusion document; the agent then works under editing rules that mirror the build discipline:
- Measure before touching. The first script inventories the solid and locates the target faces by geometric predicates: planar faces at a given Z, cylindrical faces of radius r, faces whose normal points +X. Face indices are never guessed.
- Edit only what was asked. Direct-modeling features (offset/move faces, press-pull, or a local sketch+cut) on the located faces; every other face must stay exactly where it was.
- Re-measure the delta. After the edit the agent verifies the requested change actually happened — and that the part is still ONE watertight solid.
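Selecting faces by predicate rather than by index can be illustrated without the Fusion API. In this sketch `Face` is a plain record standing in for measured face data, the helper names are hypothetical, and every coordinate except y = −87.5 mm and the hole radii (taken from the run described in this section) is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    kind: str            # "plane" or "cylinder"
    origin: tuple        # point on the plane, or on the cylinder axis (mm)
    direction: tuple     # plane normal, or cylinder axis (unit vector)
    radius_mm: float = 0.0


def parallel(a, b, tol=1e-6):
    """True when two unit vectors are parallel or anti-parallel."""
    dot = sum(x * y for x, y in zip(a, b))
    return abs(abs(dot) - 1.0) <= tol


def planar_faces_at_z(faces, z_mm, tol=1e-3):
    return [
        f for f in faces
        if f.kind == "plane"
        and parallel(f.direction, (0.0, 0.0, 1.0))
        and abs(f.origin[2] - z_mm) <= tol
    ]


def cylinders_along(faces, axis, radius_mm=None, tol=1e-3):
    hits = [f for f in faces if f.kind == "cylinder" and parallel(f.direction, axis)]
    if radius_mm is not None:
        hits = [f for f in hits if abs(f.radius_mm - radius_mm) <= tol]
    return hits


faces = [
    Face("plane", (0.0, 0.0, 80.0), (0.0, 0.0, 1.0)),
    Face("cylinder", (20.0, -87.5, 40.0), (0.0, 1.0, 0.0), 6.0),
    Face("cylinder", (60.0, -87.5, 40.0), (0.0, 1.0, 0.0), 3.0),
    Face("cylinder", (90.0, -87.5, 40.0), (0.0, -1.0, 0.0), 3.0),  # flipped axis still counts
    Face("cylinder", (50.0, 10.0, 0.0), (0.0, 0.0, 1.0), 3.0),     # Z-axis hole, not a target
]

y_holes = cylinders_along(faces, (0.0, 1.0, 0.0))
print(sorted(f.radius_mm for f in y_holes))
```

A predicate survives re-import and re-ordering of faces, which an index does not; that is the reason the editing rules insist on it.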
A real editing run (claude-cli/opus, one round, 17 turns). Request: "On the furthest face of the part in the −Y direction … there are three holes with centerlines parallel to the Y axis. Remove these three holes." The agent found them by predicate — cylindrical faces whose axis is parallel to Y opening on the face at y = −87.5 mm (one Ø12, two Ø6) — and healed all three with a single delete-face operation. The kernel numbers close the case three independent ways: the part went 637 → 587 faces and +1.696 cm³ in volume, which equals the analytic volume of the three filled holes (π·6²·10 + 2·π·3²·10 mm³) to the third decimal; the bounding box stayed 201.0 × 195.0 × 80.0 mm; and the exported STEP passed the benchmark's validity gate as one watertight solid.
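The volume cross-check quoted above is easy to reproduce by hand. The snippet below only redoes the arithmetic from the formula in the text (one Ø12 and two Ø6 holes, each 10 mm deep); it does not touch any CAD data.

```python
import math

depth_mm = 10.0
radii_mm = [6.0, 3.0, 3.0]  # one Ø12 hole and two Ø6 holes

# Volume of material added back when the three cylindrical holes are filled.
filled_mm3 = sum(math.pi * r**2 * depth_mm for r in radii_mm)
filled_cm3 = filled_mm3 / 1000.0  # 1 cm³ = 1000 mm³

print(f"{filled_cm3:.3f} cm³")  # prints 1.696 cm³
```

The sum is 540π mm³, about 1696.46 mm³, which matches the kernel-reported +1.696 cm³.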
The drip coffee maker above, from a single cad-agent build invocation (claude-cli, one round, 21 turns, zero failed scripts). The runner's output — measurements from Fusion's kernel, not the model's claims:
brief: drip_coffee_maker_part (15 acceptance checks)
[ok] component_count: have 7 components, need >= 1
[ok] interference: no interference between components
[ok] supported: every body has a load path to the ground
[ok] component_exists: component 'base_unit' exists
[ok] component_exists: component 'rear_column' exists
[ok] component_exists: component 'water_tank' exists
[ok] component_exists: component 'brew_head' exists
[ok] component_exists: component 'filter_basket' exists
[ok] component_exists: component 'drip_tray' exists
[ok] component_exists: component 'carafe' exists
[ok] component_count: have 7 components, need >= 6
[ok] cavity_volume: component 'water_tank' cavity is 1.106 L, need >= 1.0 L
[ok] footprint: footprint is 240.0 x 200.0 mm, need <= 260 x 250 mm
[ok] height: height is 350.0 mm, need <= 416 mm
[ok] min_wall: all shelled walls >= 1.5 mm
--- measured composition (kernel, not self-reported) ---
base_unit/base_unit_body
drip_tray/drip_tray_body
rear_column/rear_column_body
water_tank/water_tank_body (cavity 1.106 L)
brew_head/brew_head_body
carafe/carafe_body (cavity 0.610 L)
filter_basket/filter_basket_body (cavity 0.161 L)
PASS after 1 round(s), 21 agent turn(s)
The component_exists / cavity_volume / footprint / height / min_wall checks were generated by the enrichment stage from the prose brief — nobody hand-wrote a contract.
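To make the report above concrete, here is a hedged sketch of how checks of this kind could be graded against a measured snapshot. The dictionary shapes, the `grade` function, and the key names are assumptions made for illustration; the real check vocabulary is documented in docs/architecture.md. The numbers are the ones from the run above.

```python
def grade(snapshot, checks):
    """Evaluate each check against measured values; return (ok, message) pairs."""
    results = []
    for check in checks:
        kind = check["kind"]
        if kind == "height":
            got = snapshot["height_mm"]
            ok = got <= check["max_mm"]
            msg = f"height is {got} mm, need <= {check['max_mm']} mm"
        elif kind == "footprint":
            x, y = snapshot["footprint_mm"]
            max_x, max_y = check["max_mm"]
            ok = x <= max_x and y <= max_y
            msg = f"footprint is {x} x {y} mm, need <= {max_x} x {max_y} mm"
        elif kind == "cavity_volume":
            name = check["component"]
            got = snapshot["cavities_l"].get(name, 0.0)  # missing part counts as 0 L
            ok = got >= check["min_l"]
            msg = f"component '{name}' cavity is {got} L, need >= {check['min_l']} L"
        else:
            raise ValueError(f"unknown check kind: {kind}")
        results.append((ok, f"{kind}: {msg}"))
    return results


snapshot = {
    "height_mm": 350.0,
    "footprint_mm": (240.0, 200.0),
    "cavities_l": {"water_tank": 1.106},
}
checks = [
    {"kind": "height", "max_mm": 416},
    {"kind": "footprint", "max_mm": (260, 250)},
    {"kind": "cavity_volume", "component": "water_tank", "min_l": 1.0},
]

results = grade(snapshot, checks)
for ok, line in results:
    print("[ok]" if ok else "[FAIL]", line)
```

Each message carries the measured value next to the threshold, so a failing line is already a usable repair instruction for the next round.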
Every GIF below is a script-for-script replay of a real session log — the frames are actual geometry states, not illustrations. All runs: cad-agent build "<prompt>" --enrich through the local claude -p CLI.
The hero GIF at the top of this README. The agent built all seven components, then caught its own interpenetrating parts and a floating filter basket from kernel measurements and rebuilt them before verification ever ran. Full walkthrough: docs/drip-coffee-maker-walkthrough.md.
The prompt (a two-tier part ontology)
A massing-component ontology: base_unit, rear_column, water_tank (1.0–1.8 L), brew_head cantilevering over an open ≥130 mm cup zone, filter_basket, drip_tray, carafe — with dimension ranges, materials, BUY/MADE internal parts, and C-silhouette proportion rules. Reproduced in full in the walkthrough.
The prompt
Design an electric kettle as distinct components:
- power_base: a low round base on the counter, about 170 mm diameter and 25 mm tall.
- body: a cylindrical vessel sitting on the power base, about 150 mm diameter and 200 mm tall, holding at least 1.5 L of water. Round its top rim with a 4 mm fillet.
- handle: a C-shaped or bar handle on the side of the body, big enough for a hand (at least 30 mm clearance between handle and body).
- spout: a short pouring spout at the rim, on the side opposite the handle.
- lid: a domed lid capping the body. Appearance matters: give the body a brushed stainless steel appearance, and the power base, handle and lid a matte black plastic appearance.
Highlights: cavity kernel-measured at 3.104 L (≥1.5 required), the 4 mm rim fillet applied, 32 mm measured hand clearance — and one live self-correction: the spout first extruded upside-down to z −225 mm; the agent saw it in its own measurement and rebuilt it at the rim. Appearances applied from Fusion's real library (Stainless Steel – Brushed Linear Long, Plastic – Matte (Black)).
The prompt
Design a desk lamp as distinct components:
- base: a heavy round base about 180 mm diameter and 30 mm tall, its top edge rounded with a 3 mm fillet.
- stem: a slim vertical column rising about 350 mm from the back half of the base.
- arm: a horizontal arm reaching forward from the top of the stem, overhanging the front of the base.
- shade: a downward-opening cone or bell at the end of the arm, hollow inside, aimed at the desk. Layout rules: the light zone under the shade must stay open - at least 250 mm of clear vertical space between the base top and the shade rim - and the shade must overhang past the front edge of the base so light lands in front of the lamp, not on it. Appearance matters: matte black base and stem, brass-colored shade.
Highlights: enrichment extended the brief — it added light_source, power_switch, and power_cord as required components, and the agent built all seven. The hollow brass shade cleared the 250 mm light zone at a measured 258 mm; stability passed with the center of mass 74.75 mm inside the base edge. Before shelling the shade, the agent looked up ShellFeatures.createInput instead of guessing — the exact API a skill-less baseline run once crashed on twice.
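The two layout figures above (light-zone clearance and stability margin) are both simple distances. The sketch below shows one plausible way to compute them; it is an assumption about the geometry, not the harness's actual check. The 90 mm radius and 30 mm base height follow from the prompt, while the centre-of-mass offset and the rim height are hypothetical inputs chosen so the results line up with the reported 74.75 mm and 258 mm.

```python
import math


def stability_margin_mm(com_xy, base_center_xy, base_radius_mm):
    """Distance from the centre of mass's ground projection to the edge of a round base.

    Positive means the projection lies inside the base.
    """
    offset = math.hypot(com_xy[0] - base_center_xy[0], com_xy[1] - base_center_xy[1])
    return base_radius_mm - offset


def light_zone_mm(base_top_z_mm, shade_rim_z_mm):
    """Clear vertical space between the top of the base and the shade rim."""
    return shade_rim_z_mm - base_top_z_mm


# Hypothetical inputs: COM 15.25 mm forward of the base centre, rim at z = 288 mm.
margin = stability_margin_mm((15.25, 0.0), (0.0, 0.0), 90.0)
clearance = light_zone_mm(30.0, 288.0)

print(margin, clearance, clearance >= 250.0)
```

A negative margin would mean the centre of mass projects outside the base, which is the tipping condition a stability gate exists to catch.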
Most CAD+LLM demos are chat wrappers: you talk, they call tools, you hope. This harness is built around one non-negotiable — the finish line is kernel-measured, never self-reported:
- The LLM gets freedom where geometry is synthesized. It free-writes small adsk Python scripts against the live document, looks up real API signatures on demand, and repairs from raw tracebacks: the standard coding-agent loop, applied to CAD.
- Determinism decides what is true. After the agent declares done, ONE atomic probe enumerates every solid (world-space boxes, volumes, cavity volumes via TemporaryBRepManager) and the brief's acceptance checks run against that snapshot. Failures, with measured coordinates, become the next round's task.
- The contract is generated, then gated. --enrich turns a prose brief into measurable checks (every named part becomes fail-able); a schema gate validates, a human approves.
- Risky actions are gated. Scripts against a live CAD document are critical: a human approves them unless you explicitly run --full-auto in a sandbox. The LLM can propose; it cannot approve.
- Everything is logged. Full session JSONL (every prompt, reply, tool call); the GIFs above are script-for-script replays of session logs.
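As an illustration of what a snapshot of world-space boxes makes possible, the sketch below flags interpenetrating and floating parts from axis-aligned boxes alone. This is a deliberate simplification and an assumption: boxes over-approximate real solids, and support is modelled only as stacking, so a cantilevered part attached at its side would need a richer rule. All names and dimensions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    lo: tuple  # (x, y, z) minimum corner, mm
    hi: tuple  # (x, y, z) maximum corner, mm


def overlap_volume(a, b):
    """Volume shared by two axis-aligned boxes in mm³ (0.0 if they only touch)."""
    volume = 1.0
    for i in range(3):
        extent = min(a.hi[i], b.hi[i]) - max(a.lo[i], b.lo[i])
        if extent <= 0:
            return 0.0
        volume *= extent
    return volume


def rests_on(upper, lower, tol=1e-3):
    """True when upper's bottom meets lower's top and they overlap in plan view."""
    if abs(upper.lo[2] - lower.hi[2]) > tol:
        return False
    return all(
        min(upper.hi[i], lower.hi[i]) - max(upper.lo[i], lower.lo[i]) > 0
        for i in (0, 1)
    )


def floating(boxes, ground_z=0.0, tol=1e-3):
    """Names of boxes with no stacking path down to the ground plane."""
    supported = {b.name for b in boxes if abs(b.lo[2] - ground_z) <= tol}
    changed = True
    while changed:  # propagate support upward until nothing new is reached
        changed = False
        for b in boxes:
            if b.name in supported:
                continue
            if any(rests_on(b, s) for s in boxes if s.name in supported):
                supported.add(b.name)
                changed = True
    return [b.name for b in boxes if b.name not in supported]


base = Box("base_unit", (0, 0, 0), (240, 200, 40))
column = Box("rear_column", (0, 0, 40), (60, 200, 300))
basket = Box("filter_basket", (120, 60, 150), (200, 140, 200))   # hangs in mid-air
tank = Box("water_tank", (40, 20, 100), (120, 180, 280))         # cuts into the column

print(floating([base, column, basket]))
print(overlap_volume(tank, column))
```

Both results come with coordinates attached, which is what lets a failure be handed back to the agent as a concrete task instead of a vague complaint.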
With Fusion running and its MCP server enabled (Text Commands palette → Options.Set MCPServerEnabled ON):
uv sync --group dev
uv run cad-agent build "a desk organizer with three pen slots and a phone rest" --enrich --yes
uv run pytest   # the full suite runs offline: no key, no network, no Fusion

build accepts a bare prompt (minimal safety contract: something exists, nothing interpenetrates or floats) or a brief YAML with explicit acceptance checks; see examples/tasks/ for a recorded acceptance case and docs/architecture.md for the check vocabulary. For the other two inputs (an engineering drawing, or a STEP file plus a change request), use scripts/cadgenbench_run.py (Four ways to ask for a part). Every run writes runs/<name>-<timestamp>/ with trace.jsonl, report.json, screenshots, and summary.json.

- --enrich: generate measurable checks from the request + knowledge before building
- --llm claude-cli | openrouter | anthropic: any chat LLM works; the protocol is plain text
- --yes: auto-approve consequential actions; live-document scripts still ask
- --full-auto: approve everything; for sandboxes and CI only
- No Fusion? uv run cad-agent mcp-sim --port 27183 & serves an offline stand-in (--mcp-url http://127.0.0.1:27183/mcp)
agent is a small coding agent (file tools + bash + live Fusion probes) whose knowledge comes from skills, not hardcoded lists:
uv run cad-agent agent "Probe the document and report every body's volume" --llm claude-cli

- Skills are skills/*/SKILL.md files, discovered at startup, read on demand (progressive disclosure: names in the prompt, bodies via the read tool). Shipped: fusion-mcp-usage (the MCP operating discipline the build agent follows), fusion-api-discovery (grep the version-exact API stubs in your local Fusion installation), extend-toolset.
- One fenced tool_call per turn over a plain-text protocol, so it runs on claude -p, OpenRouter (OPEN_ROUTER_API), the Anthropic API, or scripted replay: same loop, no framework.
- Every tool carries a risk class behind the ActionPolicy gate: safe (read, screenshots, measurements) runs freely; consequential (bash, write, edit) asks by default, --yes auto-approves; critical (fusion_script, MCP execute/update) asks a human on every call, even under --yes. Unattended = denied.
- It is also a generic MCP client: at startup it calls tools/list on Fusion's MCP server (and any --mcp-server URL) and registers every discovered tool, risk-classified, alongside the built-ins.
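The three risk classes and the two approval flags combine into a small decision table. The sketch below encodes the rules as stated in this README; the `Risk` enum and `decide` function are hypothetical names and are not claimed to match the real ActionPolicy interface.

```python
from enum import Enum


class Risk(Enum):
    SAFE = "safe"                    # read, screenshots, measurements
    CONSEQUENTIAL = "consequential"  # bash, write, edit
    CRITICAL = "critical"            # fusion_script, MCP execute/update


def decide(risk, *, yes=False, full_auto=False, human_available=True):
    """Return 'run', 'ask', or 'deny' for a single tool call."""
    if risk is Risk.SAFE or full_auto:
        return "run"                 # --full-auto approves everything (sandboxes, CI)
    if risk is Risk.CONSEQUENTIAL and yes:
        return "run"                 # --yes covers consequential actions only
    # Critical calls, and consequential ones without --yes, need a human.
    return "ask" if human_available else "deny"


for risk in Risk:
    print(risk.value, decide(risk), decide(risk, yes=True), decide(risk, full_auto=True))
```

Note the asymmetry: `yes=True` never promotes a critical call to "run", and with no human at the terminal the fallback is "deny", never a silent approval.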
Every session — build and agent mode — is logged in full (system prompt, every LLM prompt/reply, every tool call with params and results) to local JSONL. These files double as an eval corpus and a replay source.
- Default location: ~/.cad-agent/sessions/YYYYMMDD/<ts>-<label>-<id>.jsonl (override: CAD_AGENT_SESSION_DIR; opt out: CAD_AGENT_SESSIONS=off).
- OpenTelemetry (opt-in): uv sync --extra otel, then --otel (or CAD_AGENT_OTEL=1). Spans go to any OTLP backend via the standard env vars.
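For readers who want to find or post-process session files, this sketch shows how the documented path pattern and the two environment variables could resolve. It is a guess at the behaviour, not the project's implementation: the `<ts>` format, the `<id>` format, and whether the override replaces only the root directory are all assumptions, and `session_path` is a hypothetical helper.

```python
import os
import uuid
from datetime import datetime
from pathlib import Path


def session_path(label, env=None, now=None):
    """Resolve where a session log would go, or None when logging is switched off."""
    env = os.environ if env is None else env
    if env.get("CAD_AGENT_SESSIONS", "").lower() == "off":
        return None
    # Assumption: the override replaces the root; the dated subfolder is kept.
    root = Path(env.get("CAD_AGENT_SESSION_DIR") or Path.home() / ".cad-agent" / "sessions")
    now = now or datetime.now()
    ts = now.strftime("%H%M%S")         # assumption: the real <ts> format is not documented here
    session_id = uuid.uuid4().hex[:8]   # assumption: any short unique id
    return root / now.strftime("%Y%m%d") / f"{ts}-{label}-{session_id}.jsonl"


fixed = datetime(2025, 1, 2, 3, 4, 5)
print(session_path("build", env={"CAD_AGENT_SESSION_DIR": "/tmp/sessions"}, now=fixed))
print(session_path("build", env={"CAD_AGENT_SESSIONS": "off"}))
```

Because each line of a JSONL file is one self-contained record, the same files can be streamed into an eval script or replayed step by step without loading a whole session into memory.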
The load-bearing decision is where determinism lives: the LLM writes any Fusion Python it wants (freedom where geometry is synthesized), and deterministic code decides what is true (one atomic kernel probe + the brief's acceptance checks, with measured failures driving bounded repair rounds). Details: docs/architecture.md; a complete live run: docs/drip-coffee-maker-walkthrough.md.
Not a product, not a chat assistant, and not affiliated with Autodesk. It is a reference harness for building reliable codegen agents on top of a CAD kernel: bounded loops, measured verification, human gates on irreversible actions, every deterministic module unit-tested.
Apache-2.0 — see LICENSE.
This is an independent open-source project — not affiliated with, endorsed by, or sponsored by Autodesk, Inc. Autodesk and Fusion are trademarks of Autodesk, Inc.





