An LLM generates a Textual app - and run_test() decides whether it counts #6576

in5devilinspace · 2026-06-12T19:43:27Z


I built a small tool with an unusual relationship to Textual: it studies this repo, then has an LLM generate an original Textual app — and the only reason I trust the output at all is run_test().

The pipeline: shallow-clone a TUI repo → detect the framework with plain heuristics (extension counts + import grep, deliberately no AI in that step) → feed the README plus a few of the most framework-dense source files to Claude → ask for a small original app → then drive whatever comes back through the test pilot, headless:

async with app_cls().run_test() as pilot:
    await pilot.pause()
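
For anyone who wants to try the same idea, here is a minimal sketch of how those two lines could be wrapped into a pass/fail gate. It is not the exact code in the repo. The names `smoke_test` and `timeout` are hypothetical, and it assumes `app_cls` is a Textual `App` subclass with a no-argument constructor. As far as I understand, exceptions raised while the app runs under `run_test()` surface to the caller, which is what the broad `except` relies on.

```python
import asyncio


async def smoke_test(app_cls, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (passed, detail) for one generated app class.

    Assumption: app_cls is a Textual App subclass that takes no
    constructor arguments. Nothing here is Textual-specific beyond
    run_test() and pilot.pause().
    """

    async def drive() -> None:
        # Same two lines as above: mount headless, let pending messages settle.
        async with app_cls().run_test() as pilot:
            await pilot.pause()

    try:
        # The timeout keeps a hung app from stalling the whole pipeline.
        await asyncio.wait_for(drive(), timeout=timeout)
    except asyncio.TimeoutError:
        return False, f"timed out after {timeout}s"
    except Exception as exc:
        # Any error during compose, mount, or teardown counts as a failure.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "mounted and responded"
```

Calling `asyncio.run(smoke_test(GeneratedApp))` (with `GeneratedApp` standing in for whatever class the model produced) gives a boolean plus a short reason that can go straight into the run log.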

If it doesn't mount and respond, the run fails. No green-checkmark theater. Two of the generated apps are committed verbatim as examples — a "Pixel Pond" (drop pebbles, feed the fish, day/night toggle) and a prime-sieve visualizer: https://github.com/in5devilinspace/tui-master-agent (demo GIF of one uncut run in the README).
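
The framework-detection step in the pipeline above is roughly this shape. This is a simplified sketch rather than the repo's actual code: the `SIGNATURES` table, the `detect_framework` name, and the choice of frameworks other than Textual are illustrative assumptions.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# Hypothetical signature table: framework -> (file extension, import pattern).
# Only the Textual entry matters for this post; the others are illustrative.
SIGNATURES = {
    "textual": (".py", re.compile(r"^\s*(?:from|import)\s+textual\b", re.M)),
    "bubbletea": (".go", re.compile(r"github\.com/charmbracelet/bubbletea")),
    "ratatui": (".rs", re.compile(r"^\s*use\s+ratatui\b", re.M)),
}


def detect_framework(repo: Path) -> Optional[str]:
    """Guess the TUI framework from extension counts plus an import grep."""
    files = [p for p in repo.rglob("*") if p.is_file()]
    ext_counts = Counter(p.suffix for p in files)

    scores = {}
    for name, (ext, pattern) in SIGNATURES.items():
        if ext_counts[ext] == 0:
            continue  # no files in this language, skip the grep entirely
        # Count the files whose contents match the framework's import pattern.
        hits = sum(
            1
            for p in files
            if p.suffix == ext and pattern.search(p.read_text(errors="ignore"))
        )
        if hits:
            scores[name] = hits

    # The framework imported by the most files wins; None if nothing matched.
    return max(scores, key=scores.get) if scores else None
```

Keeping this step as plain counting and regex matching means a wrong guess is easy to debug: the scores show exactly which files matched.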

Honestly, the testing API is the unsung hero here. Generation is easy to demo and hard to trust; run_test() is what turns "the model wrote something plausible" into "this app actually runs." Thank you for building it.

One question for folks deeper in Textual's internals: what failure modes will a bare run_test() + pilot.pause() miss? I'm thinking timers, workers, async teardown — things where the app mounts fine but would misbehave seconds later. I'd like the verification gate to be as honest as possible, and I'd rather steal an established pattern than invent a flaky one.

(The story behind the project — it sat dead as a spec for 4.5 months before a deadline forced a scope-down — is written up here: https://dev.to/matt_b650aa89776af88513ae/i-scoped-a-multi-agent-tui-system-in-january-it-sat-dead-for-4-months-here-is-the-comeback-jp8)
