A standalone UI cloning agent that converts screenshots into pixel-accurate HTML/CSS using a DesignCoder-inspired 3-phase pipeline. Feed it a screenshot, get back a self-contained HTML page that looks like the original.
```bash
# Install dependencies (using uv)
uv sync

# Install Playwright browsers (needed for rendering)
uv run playwright install chromium

# Set your API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the full pipeline
uv run python main.py ui-inspo/ui-booking-confirmation.jpg --name booking

# Or use VLLM single-shot codegen for a higher-fidelity baseline
uv run python main.py ui-inspo/ui-booking-confirmation.jpg --name booking \
    --codegen-model qwen/qwen-2.5-vl-72b-instruct
```

The pipeline has three phases. Each phase feeds into the next, and the whole thing runs as an async loop.
**Phase 1: Grouping.** Analyzes the screenshot to understand its structure before generating any code.
| Step | Module | What it does |
|---|---|---|
| 1.0 | `element_detection.py` | Detect all UI elements (buttons, text, images, etc.) with bounding boxes |
| 1.1 | `division.py` | Partition the page into 3-10 semantic regions (navigation, hero, content-grid, etc.), using element positions to inform boundaries |
| 1.2 | `semantic.py` | Label each element with type, content, and interactability via a label-only path that preserves Phase 1.0 bounding boxes |
| 1.3 | `grouping.py` | Build a hierarchical component tree whose parent-child relationships reflect visual containment |
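As an illustration of containment-based tree building (function names and the dict shape are hypothetical, not the actual `grouping.py` API), each element can be nested under the smallest detected bbox that encloses it:

```python
def contains(parent, child, tol=2):
    """True if child's bbox lies inside parent's, within a pixel tolerance.

    Bboxes are (x1, y1, x2, y2) tuples in pixel coordinates.
    """
    px1, py1, px2, py2 = parent
    cx1, cy1, cx2, cy2 = child
    return (cx1 >= px1 - tol and cy1 >= py1 - tol
            and cx2 <= px2 + tol and cy2 <= py2 + tol)


def build_tree(elements):
    """Nest each element under the smallest-area bbox that contains it.

    `elements` is a list of dicts with a "bbox" key; returns the list of
    root nodes, with each node gaining a "children" list.
    """
    def area(b):
        return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

    for e in elements:
        e["children"] = []
    roots = []
    for e in sorted(elements, key=lambda e: area(e["bbox"])):
        # The tightest (smallest-area) containing bbox becomes the parent.
        candidates = [p for p in elements
                      if p is not e and contains(p["bbox"], e["bbox"])]
        if candidates:
            parent = min(candidates, key=lambda p: area(p["bbox"]))
            parent["children"].append(e)
        else:
            roots.append(e)
    return roots
```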
Key details:
- Element detection runs first so region boundaries never split an element in half
- Regions use 0-1000 normalized coordinates from the model, rescaled to actual pixels
- Post-tightening overlap resolution handles cases where regions end up nested after bbox recomputation
- All coordinates are saved to `artifacts/` as overlay images for debugging
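For illustration, rescaling a 0-1000 normalized bbox to pixel coordinates might look like this (hypothetical helper, not the pipeline's actual code):

```python
def denormalize_bbox(bbox, width, height, scale=1000):
    """Map an (x1, y1, x2, y2) bbox from 0-`scale` normalized space to
    integer pixel coordinates, clamped to the image bounds."""
    def px(v, dim):
        return max(0, min(dim, round(v * dim / scale)))
    x1, y1, x2, y2 = bbox
    return (px(x1, width), px(y1, height), px(x2, width), px(y2, height))
```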
**Phase 2: Codegen.** Turns the component tree into HTML/CSS. Two modes:
**Tree-based (default):** generates HTML per region by walking the component tree. Each element gets a `data-elem-id` attribute and a semantic HTML tag. Layout CSS (flex direction, gap, padding) is computed from bounding-box geometry. Region roots are absolutely positioned within a page container.

**VLLM single-shot (`--codegen-model`):** sends the full screenshot to a strong vision-language model in one call to generate the complete page. The output is sanitized (external URLs stripped, images replaced with placeholders) and wrapped in a proper document structure. This typically produces a higher-fidelity baseline.

Both modes produce plain CSS only, no Tailwind.
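A rough sketch of the kind of geometry-to-CSS inference the tree-based mode performs (function name and heuristics are illustrative, not the actual `style_gen.py` logic), guessing flex direction and gap from sibling bboxes:

```python
def infer_flex(children):
    """Guess flex-direction and gap from sibling bboxes (x1, y1, x2, y2).

    Siblings laid out left-to-right with no horizontal overlap get "row";
    otherwise "column". Gap is the median spacing between neighbors.
    """
    if len(children) < 2:
        return {"flex-direction": "column", "gap": "0px"}
    xs = sorted(children, key=lambda b: b[0])
    ys = sorted(children, key=lambda b: b[1])
    # Horizontal layout if each sibling ends before the next one starts.
    row = all(a[2] <= b[0] for a, b in zip(xs, xs[1:]))
    if row:
        gaps = [b[0] - a[2] for a, b in zip(xs, xs[1:])]
    else:
        gaps = [b[1] - a[3] for a, b in zip(ys, ys[1:])]
    gaps.sort()
    gap = max(0, gaps[len(gaps) // 2])  # median, clamped at 0
    return {"flex-direction": "row" if row else "column", "gap": f"{gap}px"}
```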
**Phase 3: Refinement.** Iteratively improves the generated HTML by comparing it against the reference screenshot.
Each iteration:
- Render the HTML via Playwright, screenshot it at reference dimensions
- Compute per-region SSIM between reference and rendered
- Element closeup comparison: for each Phase 1 element bbox, crop both reference and rendered at the same coordinates, compute per-element SSIM
- Send the worst-scoring element closeups to VLLM for qualitative diff analysis
- Targeted repair: fix specific components using the issue list + closeup crops as visual context
- If SSIM drops, revert to best and stop; if SSIM plateaus, stop; if SSIM exceeds threshold, converge
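The per-element closeup step can be sketched with scikit-image (a minimal sketch assuming grayscale uint8 arrays; the function name is hypothetical and crop coordinates come from Phase 1 bboxes):

```python
import numpy as np
from skimage.metrics import structural_similarity


def element_ssim(ref, gen, bbox):
    """Crop both screenshots at the same bbox and return their SSIM.

    `ref` and `gen` are 2-D grayscale uint8 arrays of identical shape;
    `bbox` is (x1, y1, x2, y2) in pixels. Returns a score in [0, 1]
    for typical screenshots (higher is more similar).
    """
    x1, y1, x2, y2 = bbox
    ref_crop = ref[y1:y2, x1:x2]
    gen_crop = gen[y1:y2, x1:x2]
    return structural_similarity(ref_crop, gen_crop, data_range=255)
```

Worst-scoring crops are the natural candidates to forward to the VLLM for qualitative analysis.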
Artifacts saved each iteration:
- `iter_N.png` — rendered screenshot
- `iter_N_diff.png` — per-region diff overlay
- `artifacts/iter_N_closeups/` — ref and gen crops for each element
- `artifacts/iter_N_element_comparison.json` — full per-element SSIM log
```bash
# Full pipeline
uv run python main.py ui-inspo/sample.png --name my-component

# Specific phase
uv run python main.py ui-inspo/sample.png --phase 1    # grouping only
uv run python main.py ui-inspo/sample.png --phase 2    # codegen only
uv run python main.py ui-inspo/sample.png --phase 3    # refinement only

# VLLM codegen (single-shot from screenshot)
uv run python main.py ui-inspo/sample.png --codegen-model qwen/qwen-2.5-vl-72b-instruct

# Provider selection
uv run python main.py ui-inspo/sample.png --provider openrouter          # default
uv run python main.py ui-inspo/sample.png --provider fireworks
uv run python main.py ui-inspo/sample.png --vision-provider openrouter   # different model for vision

# Tuning
uv run python main.py ui-inspo/sample.png --max-iter 16 --output-dir ./my-output

# Run tests
uv run python -m pytest tests/ -v --tb=short
```

```
output/{component_id}/
├── reference.png        # Original screenshot
├── component.json       # Full state (regions, tree, iterations, metrics)
├── pipeline.log         # Detailed pipeline log
├── index.html           # Generated HTML
├── region_*.png         # Cropped region images from Phase 1
├── iter_*.png           # Rendered screenshots per iteration
├── iter_*_diff.png      # Per-region diff overlays
└── artifacts/
    ├── phase1_detection_overlay.png      # Element bboxes drawn on reference
    ├── phase1_segmentation_overlay.png   # Region bboxes with element outlines
    ├── phase1_drift_overlay.png          # Phase 1.0 vs 1.2 bbox drift
    ├── phase1_tree.json                  # Component tree structure
    ├── phase1_elements.json              # Per-region element data
    ├── phase2_colors.json                # Extracted color palette
    ├── phase3_metrics.json               # SSIM/TreeBLEU/etc per iteration
    ├── iter_N_closeups/                  # Per-element ref+gen crops
    │   ├── elem_0_ref.png
    │   ├── elem_0_gen.png
    │   └── ...
    └── iter_N_element_comparison.json    # Per-element SSIM log
```
```
├── main.py              # CLI entry point
├── config.py            # Configuration (providers, thresholds, codegen model)
├── loop.py              # Main orchestration loop (Phase 1 → 2 → 3)
├── llm_client.py        # OpenAI-compatible HTTP client (OpenRouter/Fireworks)
├── log.py               # Logging setup
│
├── phases/
│   ├── phase1_grouping/
│   │   ├── element_detection.py   # Pre-step: detect all UI elements with bboxes
│   │   ├── division.py            # 1.1: Partition into semantic regions
│   │   ├── semantic.py            # 1.2: Label elements (label-only path)
│   │   └── grouping.py            # 1.3: Build component tree
│   │
│   ├── phase2_codegen/
│   │   ├── html_gen.py            # 2.1: Tree-to-HTML + VLLM fullpage codegen
│   │   └── style_gen.py           # 2.2: Bbox-computed CSS + document assembly
│   │
│   └── phase3_refinement/
│       ├── element_comparator.py  # Per-element closeup SSIM + VLLM analysis
│       ├── repair.py              # Targeted component repair with closeup context
│       ├── matcher.py             # (legacy) DOM-to-tree matching
│       └── comparator.py          # (legacy) Per-component visual comparison
│
├── storage/
│   └── component.py     # Data models (Component, Region, Element, Tree, etc.)
│
├── utils/
│   ├── image.py         # SSIM, color extraction, cropping, base64
│   ├── dom.py           # Playwright rendering, DOM tree extraction
│   └── metrics.py       # TreeBLEU, Container Match, Tree Edit Distance
│
├── tests/               # 182 tests
│   ├── test_phase1.py
│   ├── test_phase2.py
│   ├── test_phase3.py
│   ├── test_metrics.py
│   ├── test_storage.py
│   └── test_utils.py
│
├── scripts/
│   ├── debug_phase1.py  # Standalone Phase 1 debugging
│   └── debug_phase2.py  # Standalone Phase 2 debugging
│
├── ui-inspo/            # Input screenshots
├── output/              # Generated output
├── requirements.txt
└── .env                 # API keys (not committed)
```
Create a `.env` file (or copy `.env.example`):

```
OPENROUTER_API_KEY=sk-or-v1-...
FIREWORKS_API_KEY=fw-...   # optional, only if using the fireworks provider
```

Key config parameters (in `config.py`):
| Parameter | Default | Description |
|---|---|---|
| `codegen_model` | `None` | VLLM model for single-shot codegen (e.g. `qwen/qwen-2.5-vl-72b-instruct`) |
| `max_iterations` | 32 | Max Phase 3 refinement iterations |
| `ssim_threshold` | 0.88 | Target SSIM to converge |
| `per_component_threshold` | 0.90 | Per-element SSIM threshold for flagging issues |
| `target_regions_min` / `max` | 3 / 10 | Region count bounds for Phase 1 |
| `plateau_patience` | 2 | Iterations with no improvement before stopping |
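These knobs map onto a config object roughly like the following (a sketch using the field names and defaults from the table; the actual `config.py` may be organized differently):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    """Pipeline tuning parameters (defaults as documented above)."""
    codegen_model: Optional[str] = None    # VLLM model for single-shot codegen
    max_iterations: int = 32               # Phase 3 refinement cap
    ssim_threshold: float = 0.88           # target SSIM to converge
    per_component_threshold: float = 0.90  # flag elements scoring below this
    target_regions_min: int = 3            # Phase 1 region count bounds
    target_regions_max: int = 10
    plateau_patience: int = 2              # flat iterations before stopping
```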
Tracked per iteration:
| Metric | Range | Description |
|---|---|---|
| SSIM | 0-1 | Per-region structural similarity (mean across regions) |
| Per-element SSIM | 0-1 | Closeup SSIM at each detected element bbox |
| TreeBLEU | 0-1 | Proportion of matching height-1 subtrees |
| Container Match | 0-1 | Percentage of containers with structurally equivalent matches |
| Tree Edit Distance | 0+ | Min operations to transform generated tree to reference |
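TreeBLEU, the proportion of matching height-1 subtrees, can be sketched as follows (trees here are nested `(tag, children)` tuples, a hypothetical representation rather than the actual `metrics.py` data model):

```python
from collections import Counter


def height1_subtrees(tree):
    """Collect every height-1 subtree as a (parent_tag, child_tags) pair."""
    tag, children = tree
    out = []
    if children:
        out.append((tag, tuple(c[0] for c in children)))
        for c in children:
            out.extend(height1_subtrees(c))
    return out


def tree_bleu(generated, reference):
    """Fraction of the reference's height-1 subtrees matched (with
    multiplicity) in the generated tree; 0-1, higher is better."""
    gen = Counter(height1_subtrees(generated))
    ref = Counter(height1_subtrees(reference))
    if not ref:
        return 1.0 if not gen else 0.0
    matched = sum(min(gen[k], n) for k, n in ref.items())
    return matched / sum(ref.values())
```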
```
aiohttp          # Async HTTP for API calls
playwright       # Browser rendering + DOM extraction
pillow           # Image processing
scikit-image     # SSIM computation
scikit-learn     # K-means color extraction
numpy            # Array ops
zss              # Zhang-Shasha tree edit distance
beautifulsoup4   # HTML parsing for targeted repair
python-dotenv    # .env file loading
```
- No SDK dependencies for LLM calls — pure HTTP to OpenRouter/Fireworks for flexibility
- Plain CSS only — no Tailwind utility classes, ever
- Element detection first — regions are informed by where elements actually are, not guessed from visual appearance
- Label-only Phase 1.2 — bounding boxes from Phase 1.0 are preserved through semantic labeling, preventing coordinate drift
- Per-element closeup comparison — catches issues that whole-page SSIM averages away (a 10px button misalignment barely moves global SSIM)
- Revert-if-worse — if a repair iteration drops SSIM, the best HTML is restored and refinement stops
- Dual/triple provider support — different models for code gen, vision analysis, and VLLM codegen
- Works from raw screenshots only (no Figma metadata like original DesignCoder)
- Bounding boxes give approximate sizes, not pixel-perfect measurements
- Static HTML only (no JavaScript for interactive behaviors)
- TreeBLEU and Container Match metrics currently report 0.0 (known issue, see FUTURE-005 in TODO.md)
- Vision model quality directly impacts Phase 1 accuracy
- DesignCoder (Chen et al., 2025) — Hierarchy-aware UI code generation
- VIGA — Write-run-render-compare-revise loop pattern
- abi/screenshot-to-code — Single-pass baseline comparison