A multi-agent system that autonomously operates an Ubuntu Linux desktop to perform engineering design tasks. The system uses Google Gemini's Computer Use API (vision model) to see the screen, reason about what to do, and control the mouse/keyboard to drive applications like FreeCAD and Google Chrome.
Built for the Gemini API Developer Competition 2025.
- Overview
- Key Insight: Model Sophistication Matters
- System Architecture
- Demo
- Deployment
- Quick Start
- Agents
- Multi-Agent Workflow
- The Agentic Loop
- Macro Execution Engine
- Planner Intelligence
- Skill Learning Pipeline
- Coordinate System & Executors
- Project Structure
- Configuration Reference
- Known Limitations & Lessons Learned
- Future Work
- Troubleshooting
This project demonstrates a general-purpose computer use agent that can:
-
Design 3D parts in FreeCAD -- The CAD agent sees the FreeCAD GUI through screenshots, clicks menus, draws sketches, applies constraints, and performs Part Design operations (Pad, Pocket, Thickness, Fillet, etc.). It can also run Python macros directly in FreeCAD's console for precision geometry.
-
Research information online -- The Research agent opens a browser via Playwright, searches DuckDuckGo, reads pages, and extracts structured data with confidence scores and source URLs.
-
Generate professional reports -- The Documentation agent converts raw research data into formatted Word (.docx) and PDF documents with tables, sections, and citations.
-
Chain agents together -- The Planner can route a request like "Make a bracket for an M6 bolt" through Research (find bolt specs) -> Documentation (save report) -> CAD (build the 3D model with real dimensions).
| Component | Technology |
|---|---|
| Vision Model (Computer Use) | Google Gemini 3 Flash Preview (gemini-3-flash-preview) |
| Planning Model (Text-only) | Google Gemini 3.1 Pro Preview (gemini-3.1-pro-preview) |
| Desktop Control | xdotool (X11 input automation) |
| Macro Engine | Python -> FreeCAD console (via xclip paste) |
| Screenshots | scrot + PIL (resize to 1440x900) |
| Browser Control | Playwright (Chromium) |
| CAD Application | FreeCAD 1.0 |
| Document Generation | fpdf2 (PDF), python-docx (Word) |
| VM Environment | Ubuntu Linux, XFCE desktop, X11 display server |
Through extensive testing, we discovered that complex engineering workflows are absolutely possible with Computer Use -- but the quality of the output depends heavily on the model's reasoning capability, not just its ability to see and click.
| Capability | Gemini 3 Flash (current) | Gemini 3.1 Pro (needed for CU) |
|---|---|---|
| Open FreeCAD, navigate menus | Works reliably | Works reliably |
| Create simple shapes (cube, cylinder) | Works (~80% success) | Works |
| Draw sketches with constraints | Inconsistent | Reliable |
| Generate correct FreeCAD Python macros | Wrong API names, wrong face refs | Correct API usage |
| Multi-step designs (L-bracket + holes) | Fails silently, burns turns | Can reason through steps |
| Error recovery (undo, retry) | Repeats same mistake | Recognizes and adapts |
| Spatial reasoning (which face is "top"?) | Guesses Face6, Face12 | Understands geometry |
The vision model (gemini-3-flash-preview) can see the screen perfectly and click accurately. But CAD design requires reasoning about 3D geometry from 2D screenshots -- understanding which face is the "top face," how a pocket changes the shape, and what the correct FreeCAD Python API property name is. This is a reasoning task, not a vision task.
A more sophisticated model like gemini-3.1-pro-preview (when it supports Computer Use) would unlock:
- Correct macro generation -- right property names, right face selection logic
- Multi-feature designs in one shot -- L-brackets with holes, gears with teeth
- Self-correction -- recognizing when a macro failed and fixing the code
- Spatial planning -- knowing that after a Pocket, the face indices change
Our architecture anticipates this: the Planner (already running on gemini-3.1-pro) generates detailed step-by-step workflows with correct FreeCAD API examples. The CAD agent (running on Flash) follows them. When a stronger model becomes available for Computer Use, the agent can handle more complex reasoning directly, and the Planner can send simpler instructions.
Less instruction = better performance with current vision models. The CAD agent uses a minimal system instruction (~130 lines) for desktop navigation. All task-specific intelligence comes from the Planner's action plans.
+----------------+
| main.py |
| (CLI/REPL) |
+-------+--------+
| user request
v
+----------------+
| Planner | <- Gemini 3.1 Pro (text-only)
| (Router + | classifies task, extracts
| Plan Builder) | params, generates workflow
+-------+--------+
+------------+------------+
v v v
+--------+ +----------+ +----------------+
| CAD | | Research | | Research -> CAD |
| Agent | | Agent | | (chained) |
+---+----+ +----+-----+ +----------------+
| |
v v
+-----------+ +------------+
| Desktop | | Browser |
| Executor | | Executor |
| (xdotool) | |(Playwright)|
+-----------+ +------------+
| |
+-------+-------+ |
v v v
+-----------+ +------+ +--------+
| FreeCAD | |Macro | | Chrome |
| (GUI) | |Engine| |(DuckDG)|
+-----------+ +------+ +--------+
+-------------------------------------------+
| Shared Agentic Loop |
| screenshot -> Gemini -> function calls |
| -> executor -> screenshot -> repeat... |
+-------------------------------------------+
User: "Make a bracket for an M6 bolt"
|
+- Planner._plan() -> Gemini classifies as "research+cad"
|
+- Phase 1: ResearchAgent
| +- Opens DuckDuckGo -> searches "M6 bolt dimensions"
| +- Reads multiple websites, extracts data points
| +- Returns: {data_points: [{fact: "clearance hole", value: "6.6", unit: "mm"}, ...]}
| +- DocumentationAgent auto-generates Word + PDF report
|
+- Phase 2: Planner._extract_dimensions()
| +- Gemini extracts CAD params: {hole_diameter: "6.6mm", wall_thickness: "3mm"}
|
+- Phase 3: Planner._generate_cad_goal()
| +- Gemini 3.1 Pro generates step-by-step FreeCAD workflow
| +- Includes correct API examples, face-finding patterns
|
+- Phase 4: CADAgent
+- Gets enriched description + workflow + FreeCAD tips
+- Runs macros and/or drives GUI to build the model
# Full pipeline: research M6 bolt specs, then build bracket in FreeCAD
python3 main.py "Make a bracket for an M6 bolt"
# CAD only (skip research -- fast for testing/demos)
python3 main.py --cad "Make a bracket for an M6 bolt"
# CAD only with explicit dimensions
python3 main.py --cad --dims hole_diameter=6.6mm wall_thickness=3mm "L-bracket with bolt holes"
# Research only: find specs, generate report
python3 main.py "Research the standard dimensions of an M8 hex bolt"
# Simple CAD (auto-routes to CAD, no research needed)
python3 main.py "Create a 30mm cube in FreeCAD"Research tasks (~3-5 minutes):
- Planner routes to Research agent
- Browser opens DuckDuckGo, searches multiple queries
- Agent visits 2-4 websites, extracts structured data points
- Documentation agent generates Word + PDF reports in
outputs/research_results/
CAD tasks (~5-15 minutes depending on complexity):
- Planner generates a step-by-step FreeCAD workflow
- CAD agent opens FreeCAD (or uses existing window)
- Agent runs Python macros for precise geometry AND/OR uses GUI clicking
- Stage budgets track progress (setup -> base_shape -> features -> cleanup)
- Agent calls
task_complete()when finished
Chained tasks (Research -> CAD, ~8-20 minutes):
- Research phase finds real-world specifications
- Planner extracts concrete dimensions from research data
- CAD phase builds the model using researched specs
Provision a fully configured Google Cloud VM with everything installed:
# Prerequisites: gcloud CLI installed + authenticated + project set
export GEMINI_API_KEY="your-key"
# Create VM with Ubuntu + XFCE + Xvfb + VNC + FreeCAD + agent
./scripts/deploy.sh --gcp
# Custom options
./scripts/deploy.sh --gcp --name my-agent --zone us-east1-b --machine-type e2-standard-8What it creates:
e2-standard-4Ubuntu 22.04 VM with 50GB SSD- XFCE desktop on a virtual display (Xvfb at 1280x800)
- VNC server on port 5900 (watch the agent work remotely)
- FreeCAD, Chromium, Python venv, all dependencies
- Your API key injected into the environment
agentalias for quick start
After ~5-10 minutes setup:
# SSH in
gcloud compute ssh engineering-agent-v2 --zone=us-central1-a
# Start
agent
python main.py --cad "Make a bracket for an M6 bolt"
# Watch via VNC (separate terminal)
gcloud compute ssh engineering-agent-v2 --zone=us-central1-a -- -L 5900:localhost:5900
# Then connect VNC client to localhost:5900| Requirement | Specification |
|---|---|
| OS | Ubuntu 22.04+ LTS |
| Desktop | XFCE (lightweight, uses X11) |
| Display Server | X11 (NOT Wayland -- xdotool requires X11) |
| Screen Resolution | 1280x800 (16:10 aspect ratio) |
| RAM | 4 GB minimum, 8 GB recommended |
| Disk | 20 GB minimum |
| Network | Internet access for Gemini API + web research |
git clone https://github.com/Louisomeg/General_Computer_Use_Agent-.git
cd General_Computer_Use_Agent-
git checkout design
# Full automated setup (system deps + python + verify)
chmod +x scripts/deploy.sh
./scripts/deploy.sh# 1. System dependencies
sudo apt install -y python3 python3-pip python3-venv scrot xdotool xclip ffmpeg git
sudo apt install -y freecad chromium-browser
# 2. Python environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python -m playwright install chromium
# 3. API key
export GEMINI_API_KEY="your-key"
# 4. Verify
./scripts/deploy.sh --verify- 1280x800 resolution: Google recommends 1440x900 for Computer Use. 1280x800 is the closest available VM resolution with the same 16:10 aspect ratio. Screenshots are resized to 1440x900 before sending -- no distortion.
- XFCE desktop: Lightweight, X11-native, predictable UI layout. The system instruction teaches XFCE-specific navigation.
- X11 display server: Required by
xdotoolfor mouse/keyboard input. Wayland is not supported.
python main.pyAgentic Planner -- type a request or 'quit' to exit
Prefix with --cad to skip research
>>> Create a 50mm tall cylinder with radius 15mm
>>> --cad Make a bracket for an M6 bolt
>>> Research M6 bolt dimensions
>>> quit
# Full pipeline (planner decides if research is needed)
python main.py "Make a bracket for an M6 bolt"
# CAD only -- skip research for fast iteration
python main.py --cad "Make a bracket for an M6 bolt"
# CAD only with explicit dimensions
python main.py --cad --dims hole_diameter=6.6mm wall_thickness=3mm "L-bracket"
# Research only
python main.py "What are the standard dimensions of an M8 hex bolt?"File: agents/cad_agent.py
The CAD agent drives FreeCAD through a combination of GUI interaction (clicking menus, drawing geometry) and Python macro execution (precise programmatic geometry). It sees the FreeCAD window through screenshots, reasons about what to do, and acts.
Capabilities:
- Create 3D parts from descriptions and dimensions
- Run Python macros in FreeCAD for precise geometry
- Draw 2D sketches with geometry and constraints via GUI
- Apply Part Design operations (Pad, Pocket, Hole, Thickness, Fillet, Chamfer)
- Navigate FreeCAD menus and dialogs
- Handle error recovery (undo mistakes, close unexpected dialogs)
Stage Budgets -- prevents the agent from burning all turns on one step:
| Stage | Budget | Description |
|---|---|---|
| setup | 10 turns | Open FreeCAD, create body, enter first sketch |
| base_shape | 25 turns | Draw base profile, constrain, Pad |
| features | 50 turns | Holes, pockets, fillets, chamfers |
| cleanup | 10 turns | Fit view, verify, save |
| reserve | 25 turns | Recovery budget for undo/retry |
Key Design Decisions:
- Menu-driven interaction: Always use menu bar (large text targets) instead of toolbar icons (~24px). This was one of our most impactful design decisions -- dramatically improves click accuracy.
- Macro-first for geometry:
execute_freecad_macro(code)gives exact dimensions. GUI clicking is fallback. - One macro per feature: Never put the entire design in one macro. If line 5 fails, lines 6-20 fail silently.
- Dynamic face finding: Never hardcode
Face6orFace12. Find faces by position:max(shape.Faces, key=lambda f: f.CenterOfMass.z). - Verification gate: Agent must examine the screenshot before
task_complete()is accepted.
File: agents/research_agent.py
The Research agent browses the web using Playwright (Chromium). It uses DuckDuckGo exclusively -- most reliable for automated browsing (no CAPTCHAs, no cookie consent walls).
Capabilities:
- Web search via DuckDuckGo
- Multi-page browsing and data extraction
- Structured data output with confidence scoring and source URLs
- Automatic report generation via Documentation agent
Output Format:
{
"query": "M6 bolt dimensions",
"findings": {
"summary": "M6 bolts have a nominal diameter of 6mm...",
"data_points": [
{"fact": "Clearance hole (medium)", "value": "6.6", "unit": "mm", "source": "https://..."},
{"fact": "Head width across flats", "value": "10", "unit": "mm", "source": "https://..."}
],
"confidence": "high",
"sources": ["https://...", "https://..."]
}
}File: agents/documentation_agent.py
Converts raw research JSON into professionally formatted documents. Automatically called by the Research agent after data collection.
Output Formats:
- Word Document (.docx): Full report with headers, tables, styled text, citations
- PDF Document (.pdf): Matching PDF with tables and formatted sections
Output Location: outputs/research_results/
The Planner (core/agentic_planner.py) orchestrates multi-agent workflows using Gemini 3.1 Pro (text-only) to classify requests and generate plans.
| Request Type | Route | Example |
|---|---|---|
| Exact dimensions provided | cad only |
"Create a 50mm cylinder with 15mm radius" |
| Simple everyday object | cad only |
"Make a box for storing pens" |
| Information lookup | research only |
"What are M8 bolt specifications?" |
| Design needing real-world specs | research+cad |
"Make a bracket for an M6 bolt" |
| Desktop operation | cad only |
"Open FreeCAD and create a new body" |
--cad flag used |
cad only (skip research) |
Any request |
The pipeline includes quality gates between phases:
- Research quality gate: Checks if research produced data points with sufficient confidence. Fails fast if research returned nothing.
- Dimension extraction: Gemini picks ONE concrete value per dimension (not ranges).
- Verification gate: CAD agent must visually verify the result before declaring done.
File: core/agentic_loop.py
The core engine shared by all agents. Implements a multi-turn vision cycle:
+----------------------------------------------+
| Agentic Loop Cycle |
| |
| 1. Capture screenshot (scrot -> PNG bytes) |
| 2. Send to Gemini (screenshot + history) |
| 3. Gemini returns function calls |
| 4. Execute function calls via Executor |
| 5. Append results to conversation history |
| 6. Check stage budgets + turn warnings |
| 7. Repeat until task_complete or max_turns |
+----------------------------------------------+
| Scenario | Recovery |
|---|---|
| 400 INVALID_ARGUMENT | Reset history to initial prompt + fresh screenshot |
| Empty model response | Update screenshot in-place, retry (up to 3x) |
| Text-only response (no actions) | Inject nudge: "You must call a function" |
| Stage over budget | Inject warning to move to next stage |
| Max turns approaching | Inject countdown warnings (5, 4, 3, 2, 1) |
| Max consecutive API errors (5) | Return "api_error" status |
File: core/freecad_functions.py
The macro executor bridges the gap between the agent's reasoning and FreeCAD's Python API. Instead of relying solely on GUI clicking (imprecise for exact dimensions), the agent can run Python code directly in FreeCAD's embedded console.
- Agent calls
execute_freecad_macro(code)with FreeCAD Python code - Code is wrapped in
try/exceptthat writes errors to/tmp/agent_macro_log.txt - Macro is saved to
/tmp/agent_macro.py - FreeCAD window is found dynamically via
xdotool search --name FreeCAD - Window geometry is used to calculate Python console input position
- The run command is pasted via
xclip(fast) or typed viaxdotool(fallback) - Log file is read back to detect success or Python errors
- Errors are returned to the agent so it can self-correct
All user code is wrapped in error capture:
import traceback as _tb
_log = open('/tmp/agent_macro_log.txt', 'w')
try:
# user's FreeCAD code here
_log.write('OK\n')
except Exception as _e:
_log.write(f'ERROR: {_e}\n')
_log.write(_tb.format_exc())
finally:
_log.close()Possible return values:
{"success": true}-- macro ran without errors{"error": "FreeCAD macro error: ..."}-- Python exception with traceback{"success": false, "warning": "Macro produced no output..."}-- console wasn't focused
The macro tool description includes correct API examples so the model generates valid code:
# Find faces by position (NEVER hardcode Face6, Face12)
top_face = max(body.Shape.Faces, key=lambda f: f.CenterOfMass.z)
face_idx = body.Shape.Faces.index(top_face) + 1
# Clearance hole (circle + Pocket ThroughAll -- more reliable than PartDesign::Hole)
hole_sk = body.newObject('Sketcher::SketchObject', 'HoleSketch')
hole_sk.AttachmentSupport = [(body.Tip, f'Face{face_idx}')]
hole_sk.MapMode = 'FlatFace'
hole_sk.addGeometry(Part.Circle(FreeCAD.Vector(cx,cy,0), FreeCAD.Vector(0,0,1), 3.3))
doc.recompute()
hole_cut = body.newObject('PartDesign::Pocket', 'HoleCut')
hole_cut.Profile = hole_sk
hole_cut.Type = 1 # ThroughAll
doc.recompute()File: core/agentic_planner.py
The Planner does more than routing -- it generates detailed FreeCAD workflows using Gemini 3.1 Pro. This is critical because the vision model works best with clear, step-by-step instructions.
Instead of hardcoded shape templates, the Planner uses Gemini 3.1 Pro to generate workflows dynamically:
Input: "Make a bracket for an M6 bolt"
+ Research data: {hole_diameter: 6.6mm, wall_thickness: 3mm}
Output: Step 1: Create sketch on XY plane, 30x20mm rectangle, Pad 30mm
Step 2: Create sketch on top face, 27x20mm rectangle offset 3mm, Pocket 27mm
Step 3: Find top face of horizontal leg, circle r=3.3mm, Pocket ThroughAll
Step 4: Find outer face of vertical leg, circle r=3.3mm, Pocket ThroughAll
Step 5: Fillet inner corner 1mm radius
Research data comes with inconsistent key names. The Planner normalizes them:
PARAM_ALIASES = {
"length": "depth",
"total_width": "width",
"leg_thickness": "wall_thickness",
"bolt_hole_diameter": "hole_diameter",
"clearance_hole": "hole_diameter",
...
}The Planner tells Gemini 3.1 Pro exactly what FreeCAD operations exist, so it generates valid workflows:
Pad, Pocket, Hole, Fillet, Chamfer, Thickness, Mirrored,
Sketcher: Rectangle, Circle, Line, Arc, Point,
Constraints: Coincident, Horizontal, Vertical, DistanceX, DistanceY
Directory: pipeline/
Converts YouTube FreeCAD tutorial videos into structured YAML skill files. Currently disabled in the CAD agent (minimal-instruction approach works better with Flash), but the infrastructure is ready for stronger models.
YouTube URL -> Download (yt-dlp) -> Transcribe (VTT/Whisper) -> Keyframes (OpenCV MOG2)
-> Label (Gemini Vision) -> Filter (quality scoring) -> Build (YAML + PNG)
# Full pipeline
python -m pipeline.run_pipeline --url "https://www.youtube.com/watch?v=VIDEO_ID"
# Specific stages only
python -m pipeline.run_pipeline --url "VIDEO_ID" --stages keyframes,label,filter,build| Parameter | Default | Description |
|---|---|---|
threshold |
15000 | MOG2 keyframe detection sensitivity |
min_score |
3 | Minimum quality score (0-5) for filtering |
api_delay |
1.0s | Delay between Gemini API calls |
| Video format | h264, <=720p | AV1 codec not supported on most VMs |
Gemini Computer Use outputs coordinates on a normalized 0-1000 grid:
screen_x = int(normalized_x / 1000 * SCREEN_WIDTH) # 500/1000 * 1280 = 640
screen_y = int(normalized_y / 1000 * SCREEN_HEIGHT) # 500/1000 * 800 = 400
| Function | Description |
|---|---|
click_at(x, y) |
Left-click at normalized coordinates |
type_text_at(x, y, text) |
Click field, optionally clear, type text |
key_combination(keys) |
Key combo (e.g., "ctrl+z", "g+r" for rectangle) |
execute_freecad_macro(code) |
Run Python in FreeCAD console |
right_click_at(x, y) |
Right-click for context menus |
double_click_at(x, y) |
Double-click to open/select |
scroll_at(x, y, direction) |
Scroll at position |
drag_and_drop(...) |
Drag between two points |
task_complete(summary) |
Signal task completion |
General_Computer_Use_Agent-/
|
+-- main.py # Entry point (CLI + interactive, --cad flag)
+-- requirements.txt # Python dependencies
+-- scripts/
| +-- deploy.sh # Automated deployment (local + GCP VM)
|
+-- agents/ # Agent implementations
| +-- registry.py # @register decorator + get_agent() factory
| +-- cad_agent.py # FreeCAD CAD agent (GUI + macros)
| +-- research_agent.py # Web research agent (DuckDuckGo + Playwright)
| +-- documentation_agent.py # Word + PDF report generator
|
+-- core/ # Shared infrastructure
| +-- agentic_loop.py # Multi-turn vision loop (stage budgets, verification)
| +-- agentic_planner.py # Task router + LLM workflow generation
| +-- executor.py # Abstract Executor base class
| +-- desktop_executor.py # xdotool desktop executor
| +-- browser_executor.py # Playwright browser executor
| +-- screenshot.py # scrot capture + PIL resize
| +-- settings.py # Global config + system instruction
| +-- models.py # Task, TaskStatus data models
| +-- custom_tools.py # FreeCAD macro FunctionDeclaration
| +-- freecad_functions.py # Macro engine + xdotool wrappers
| +-- skill_retrieval.py # Demo skill matching (disabled)
|
+-- pipeline/ # YouTube -> Skill learning pipeline
| +-- run_pipeline.py # Pipeline orchestrator
| +-- crawl.py # yt-dlp download
| +-- transcribe.py # Subtitle extraction
| +-- extract_keyframes.py # OpenCV keyframe detection
| +-- label_actions.py # Gemini action labeling
| +-- filter_quality.py # Quality scoring
| +-- build_skills.py # YAML skill assembly
|
+-- skills/ # FreeCAD knowledge base (reference)
+-- outputs/ # Generated outputs
+-- research_results/ # Research JSON + Word + PDF reports
+-- cad_exports/ # Exported CAD files
| Setting | Default | Description |
|---|---|---|
SCREEN_WIDTH |
1280 | VM screen width in pixels |
SCREEN_HEIGHT |
800 | VM screen height in pixels |
MODEL_SCREEN_WIDTH |
1440 | Screenshot width sent to Gemini |
MODEL_SCREEN_HEIGHT |
900 | Screenshot height sent to Gemini |
ACTION_DELAY |
0.5s | Pause after each action |
TYPING_DELAY |
30ms | Delay between keystrokes (xdotool) |
CLICK_DELAY |
0.3s | Pause after mouse clicks |
DEFAULT_MODEL |
gemini-3-flash-preview |
Computer Use model |
PLANNING_MODEL |
gemini-3.1-pro-preview |
Text-only planning model |
| Variable | Required | Description |
|---|---|---|
GEMINI_API_KEY |
Yes | Google Gemini API key (get one here) |
DISPLAY |
Yes (auto) | X11 display (usually :0, set automatically) |
- Research agent is reliable -- DuckDuckGo avoids CAPTCHAs, structured data extraction works consistently
- Documentation agent produces professional reports every time
- Simple CAD shapes (cubes, cylinders, basic pads) succeed ~80% of the time
- Menu-driven interaction dramatically outperforms toolbar clicking
- Macro execution gives precise dimensions when the console is focused correctly
- Stage budgets prevent the agent from wasting all turns on one step
- Two-model architecture works well -- use the smartest model (3.1 Pro) for planning/reasoning, and the Computer Use model (Flash) for execution
-
FreeCAD Python API knowledge: The vision model generates macros with wrong property names, wrong face references, and incorrect constraint syntax. The Planner mitigates this by providing correct API examples, but the model still hallucinates.
-
Face selection after topology changes: After a Pocket or Pad, face indices change. The model guesses
Face6orFace12instead of using the dynamic face-finding pattern we provide. -
Silent macro failures: Even with error capture, some failures are subtle -- a sketch might be created but with wrong constraints, or a Pad might succeed but with wrong dimensions.
-
Long sessions (80+ turns): The vision model's accuracy degrades as the context window fills with screenshots.
-
Rate limits: Heavy Computer Use sessions (20-80+ turns) can exhaust free tier quotas. Typically resets within 30-60 minutes.
- Macro + GUI hybrid: Best results come from using macros for precise geometry and GUI clicking for navigation/menus. Neither approach alone is sufficient.
- One feature per macro: Critical lesson. A single macro creating sketch + pad + pocket + hole will silently fail at the first error. One feature per call with screenshot verification between each.
- Thickness over Pocket for hollowing: For hollow shapes, the Thickness tool (~24 turns) massively outperforms Pocket workflows (65+ turns, low success rate).
The architecture is ready. When a reasoning-class model supports Computer Use:
- Complex parts in one session: L-brackets with holes, gears, multi-body assemblies
- Self-correcting macros: Model reads error output, fixes API calls, retries
- Spatial reasoning: Understands which face is "top" after a pocket changes geometry
- 150+ turn sessions: Maintains precision and coherence across long designs
- Reduced Planner dependency: Model can figure out the workflow itself
- KiCad for PCB design
- Blender for 3D modeling and rendering
- LibreOffice for document editing
- General desktop automation beyond engineering
- Web interface for remote agent control and monitoring
- Better parallelism -- running multiple agents simultaneously
- More robust error recovery with learning from failure patterns
- Leveraging the Skill Learning Pipeline with stronger models
- Multi-part assemblies using FreeCAD's Assembly workbench
"ERROR: Set GEMINI_API_KEY first!"
export GEMINI_API_KEY="your-key""FreeCAD window not found" (macro execution)
- FreeCAD must be running and visible
- Check:
xdotool search --name FreeCADshould return a window ID
"Macro produced no output"
- The Python console input wasn't focused. The agent should retry.
- Verify Python console is visible: View -> Panels -> Python console in FreeCAD
- Install xclip for more reliable paste:
sudo apt install xclip
FreeCAD "Document Recovery" dialog
rm -rf ~/.local/share/FreeCAD/recovery/*
rm -rf ~/.FreeCAD/recovery/*Research agent fails
- Should not happen with DuckDuckGo. Agent auto-retries on the same search engine.
400 INVALID_ARGUMENT from Gemini API
- Handled automatically by history reset in the agentic loop
Rate limit errors (429)
- Free tier quotas reset within 30-60 minutes
- Use
--cadflag to skip research and reduce API calls
CAD agent hits max_turns without finishing
- Complex designs may need more turns
- Increase
max_turnsinagents/cad_agent.py(default: 120) - Use
--cadwith explicit--dimsfor faster iteration
- Louis -- , desktop executor, CAD agent, , macro engine
- Emmanuel -- Research agent, browser executor, documentation agent, parallel research
- George -- Core framework, agentic loop, planner
This project is part of the Gemini API Developer Competition 2025. See repository for license details.