An LLM generates a Textual app - and run_test() decides whether it counts #6576
in5devilinspace
started this conversation in
Show and tell
Replies: 0 comments
I built a small tool with an unusual relationship to Textual: it studies this repo, then has an LLM generate an original Textual app, and the only reason I trust the output at all is `run_test()`.
The pipeline: shallow-clone a TUI repo → detect the framework with plain heuristics (extension counts + import grep, deliberately no AI in that step) → feed the README plus a few of the most framework-dense source files to Claude → ask for a small original app → then drive whatever comes back through the test pilot, headless.
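For readers who haven't used the test pilot, a gate of this kind might look roughly like the sketch below. It is an illustration, not the code from the repository: `smoke_test` and its `keys` default are hypothetical names, and `app` is assumed to be an instance of a Textual `App` subclass. The only Textual calls it relies on are `App.run_test()`, `Pilot.pause()` and `Pilot.press()`.

```python
import asyncio


async def smoke_test(app, keys=("tab", "enter")):
    """Mount `app` headless, send a few keys, and report (ok, reason).

    `app` is assumed to be a textual.app.App instance. The default
    `keys` are arbitrary placeholders, not taken from the project.
    """
    try:
        # run_test() runs the app headless and yields a Pilot for driving it.
        async with app.run_test() as pilot:
            await pilot.pause()        # let mounting and pending messages settle
            await pilot.press(*keys)   # simulate a few key presses
            await pilot.pause()        # let the handlers for those keys run
    except Exception as exc:
        # Errors raised while the app runs under the pilot end up here.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "mounted and handled input"
```

With a real app this would be called as `asyncio.run(smoke_test(GeneratedApp()))`, where `GeneratedApp` stands in for whatever class the model produced.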
If it doesn't mount and respond, the run fails. No green-checkmark theater. Two of the generated apps are committed verbatim as examples — a "Pixel Pond" (drop pebbles, feed the fish, day/night toggle) and a prime-sieve visualizer: https://github.com/in5devilinspace/tui-master-agent (demo GIF of one uncut run in the README).
Honestly, the testing API is the unsung hero here. Generation is easy to demo and hard to trust; `run_test()` is what turns "the model wrote something plausible" into "this app actually runs." Thank you for building it.
One question for folks deeper in Textual's internals: what failure modes will a bare `run_test()` + `pilot.pause()` miss? I'm thinking timers, workers, async teardown: things where the app mounts fine but would misbehave seconds later. I'd like the verification gate to be as honest as possible, and I'd rather steal an established pattern than invent a flaky one.
(The story behind the project, which sat dead as a spec for 4.5 months before a deadline forced a scope-down, is written up here: https://dev.to/matt_b650aa89776af88513ae/i-scoped-a-multi-agent-tui-system-in-january-it-sat-dead-for-4-months-here-is-the-comeback-jp8)