How well can LLMs replicate UIs from screenshots? This project tests how well different LLMs can recreate application interfaces from screenshots.
View all examples in the full gallery
Each model received the same simple prompt along with a reference screenshot:
Generate this {APPLICATION_NAME} interface in HTML, CSS and JavaScript.
Return a single HTML file with embedded CSS and JavaScript.
Here are some of the most impressive outputs:
This one impressed me with how much actually works. The text formatting buttons are functional, the zoom slider works, and you can change paragraph alignment. That said, it's missing the word count feature that Opus and Gemini both implemented.
This looks almost identical to the real Jira interface. What really stands out is that you can actually drag cards between columns, and there's a nice smooth animation when you do. The card count at the top of each column even updates when you move things around.
If it wasn't for the missing album artwork images, I would have been fooled into thinking this was the real Spotify interface. The visual fidelity is remarkably close to the original. Though it's worth noting there aren't any actual functionality or interactive features beyond the appearance.
This is arguably the most impressive of all the interfaces. Gemini even implemented the Google Sheets logo as an SVG! You can navigate around the different cells, and it correctly updates the cell reference (like "A1", "B2") when you select a cell. You can type both directly in cells and in the formula bar at the top. The only thing missing is that none of the formatting buttons actually work.
This entire project was created using Claude Code. You can check out initial_prompt.txt to see the original prompt that kicked everything off, along with the resulting create_interface.py for calling the OpenRouter API and capture_screenshots.py for taking screenshots of the generated HTML pages.
After the initial creation, a few follow-up prompts were needed. Some of the Gemini outputs were getting truncated because OpenRouter was setting a max token limit on the API calls, so those had to be regenerated. There was also a round of regenerating screenshots that were captured at awkward aspect ratios that made them difficult to compare.
- Claude Sonnet 4.5
- Claude Opus 4.5
- GPT-5.1
- GPT-5.1 Codex
- Gemini 2.5 Pro
- Gemini 3 Pro Preview
- Grok 4 Fast
- Qwen3 VL 235B
- Microsoft Word
- Jira
- Spotify
- VS Code
- Google Sheets
Check out the full gallery to see outputs from all models across all applications.



