Autonomous Bare-Metal OS Agent | 📝 Code → 🖥️ VM Interact → 👁️ Inspect → 🔨 Harden
MaragingLoop is a ReAct-style AI agent engineered for autonomous bare-metal OS and kernel development. By orchestrating large language models with automated compilation, robust VirtualBox VM lifecycle control, and vision-based screenshot inspection, it iteratively writes, builds, tests, and refines low-level C code until the target system boots and behaves as specified.
Unlike text-only coding agents, MaragingLoop treats code generation and VM interaction as equally critical pillars of its development cycle. The LLM doesn't just write code in isolation: it actively modifies kernel source, compiles it, boots a live virtual machine, injects I/O, captures visual feedback, and then rewrites the code based on real-world execution results. This closed-loop process bridges abstract reasoning with bare-metal reality.
Highly Experimental !!!
You can see some images at https://gistnoesis.github.io/MaragingLoop/ and in the docs folder.
This project is LOCAL first: performance lags behind frontier models, and NO consideration for CREDENTIALS was tested (DISCLAIMER: use at your own risk; for example, if you put credentials in and they get leaked, that's not my responsibility).
Everything was created locally through prompting; development took 10 days. I constructed the tools by interacting with the llama webui, and the kernels either via the chat interface or via a single initial prompt starting from the current os folder state. (Higher towers in full autonomy, as in the shoggoth.db project, have not been included here to keep the project minimal.)
The single builderagent.py has been kept below 1000 lines (currently ~900 lines); you should read it and make sure it corresponds to what you want to run (e.g. by editing the system_prompt and summary_prompt, and the compaction threshold).
The model used is unsloth/Qwen3.6-35B-A3B-GGUF/Qwen3.6-35B-A3B-UD-IQ4_NL_XL.gguf with llama.cpp
```bash
./llama-server --model unsloth/Qwen3.6-35B-A3B-GGUF/Qwen3.6-35B-A3B-UD-IQ4_NL_XL.gguf --mmproj unsloth/Qwen3.6-35B-A3B-GGUF/mmproj-BF16.gguf --image-min-tokens 1024 --image-max-tokens 1024 --temp 1.0 --top-p 0.95 --min-p 0.00 --jinja --top-k 20 --ctx-size 150000 --reasoning_budget 3000 --presence_penalty 1.5 -ctk f16 -ctv f16 --host 0.0.0.0 --port 8080 --no-mmap
```
The options are not very rigid, but keeping a small reasoning_budget helps. The --host 0.0.0.0 is there because the LLM runs on another machine on the network (it allows anyone who can connect to your machine to use your llama server, so set this option according to your needs). The ctx-size is kept small for speed and memory reasons; if you need up to 250000 of context, you should offload the vision stack to the CPU, or you will encounter out-of-memory crashes. The number of image tokens can be adapted to trade speed for quality.
With a 4090 this produces ~150 tokens/s (starting at 180 tok/s and slowing down to 110 tok/s).
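builderagent.py talks to this server through the OpenAI-compatible chat API, pairing text with base64-encoded screenshots. A minimal sketch of building such a message (the helper name and exact payload shape here are illustrative, not copied from the repo):

```python
import base64

def build_vision_message(text: str, png_bytes: bytes) -> dict:
    """Build an OpenAI-style chat message pairing a text prompt with a
    base64-encoded screenshot, as accepted by llama.cpp's
    /v1/chat/completions endpoint when started with an --mmproj projector."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```

The resulting dict goes into the `messages` list of a `requests.post(...)` call against the completion endpoint.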
You'll need to use VirtualBox: create a virtual machine called "agentos", attach the os/os.iso generated by the first compilation to its virtual optical drive, and configure the boot order.
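That manual setup can also be scripted. A hedged sketch of the VBoxManage calls involved (the OS type and controller name are assumptions; adapt them to your machine):

```python
import subprocess

VM = "agentos"

def setup_commands(iso_path: str = "os/os.iso") -> list[list[str]]:
    """Return the VBoxManage invocations that create the VM, attach the
    generated ISO to a virtual optical drive, and set the boot order.
    "Other" as OS type and "IDE" as controller name are assumptions."""
    return [
        ["VBoxManage", "createvm", "--name", VM, "--ostype", "Other", "--register"],
        ["VBoxManage", "storagectl", VM, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", VM, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive", "--medium", iso_path],
        ["VBoxManage", "modifyvm", VM, "--boot1", "dvd", "--boot2", "none"],
    ]

def run_all(cmds: list[list[str]]) -> None:
    """Execute each VBoxManage command, failing fast on errors."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```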
The installation scripts are only necessary if you want more isolation of the LLM through Access Control Lists. The scripts were also vibecoded from the permissions.md files that I (the human) wrote. I have tested them and they worked on my machine (i.e., with the permissions set by the script, the agent tools were functional: the VM could create screenshots, inspect screenshots, send keyboard commands, etc.), but in the process I had to regenerate them many times, so read permissions.md and use them at your own risk. They were designed to be run from the directory where they sit, without having to specify arguments. I have put them in the repo for easier onboarding, but because permissions are highly dependent on your current setup, they are for advanced users only, and are there to help your trusted LLM agents write a working script for your specific machine.
What doesn't work yet: I couldn't get the mouse pointer to work, nor vibecode a correct network driver that detects network packets coming from the host (configured as NAT with port forwarding), even using the reference PCnet-FAST III driver from Linux (it looked like it would almost work, but that was at the beginning of the development phase). (The network stack needs extra tools beyond builderagent.py.)
The project is a boilerplate; you can easily add your own tools by asking your local llama. Give it the single builderagent.py and describe the new tool you want, then either copy-paste the result in place or regenerate the full file.
I have used VirtualBox, but you can ask your local llama to generate a QEMU version instead. In particular, with VirtualBox the mouse pointer seems to require Guest Additions (currently the mouse does not work for a bare-metal OS).
For example, you can try adding a copy_file_from_reference tool, or adding a SQLite database memory as is done in my other project: https://github.com/GistNoesis/Shoggoth.db/
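For reference, a new tool declaration in the OpenAI function-calling format that the agent's tools use might look like this (the schema below is an illustrative sketch, not code from the repo; builderagent.py would also need a matching Python handler):

```python
# Hypothetical declaration of the suggested copy_file_from_reference tool,
# written in the OpenAI function-calling schema.
COPY_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "copy_file_from_reference",
        "description": "Copy a file from the read-only reference tree into os/.",
        "parameters": {
            "type": "object",
            "properties": {
                "src": {"type": "string",
                        "description": "Path under the reference tree"},
                "dst": {"type": "string",
                        "description": "Destination path under os/"},
            },
            "required": ["src", "dst"],
        },
    },
}
```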
This agent control loop with vision, like when it plays Snake, is quite similar to my other project (not yet on GitHub) about driving a robot with an LLM toward an object (it works, but is natively too slow for real-time robotics; it can, however, be used to generate an initial dataset to train specialized neural networks).
I have tried to respect good engineering practices, but I'm only human. It was designed with security in mind. This is a harness with strong constraints, contrary to other agents that abandon all control to the whims of the LLM (what we used to call polymorphic malware). Here you are still in control: the LLM can't run arbitrary commands, except inside the VirtualBox VM. On your host, permissions are severely restricted to writing .c or .h files in specific folders and calling specific commands. But information in any file on your host can be trivially read by the LLM with simple tricks. I've also proposed a solution using Access Control Lists.
Like in the Shoggoth.db project, the main idea is setting up the foundation of a self-building tower. Make sure to keep control of the foundation, and push the safety problem to the big binary blob that is the model weights; those are generated deterministically from a dataset, which is itself self-distilled from interaction with a VM and a known wiki.
"Reflections on Trusting Trust"
Remember it's all AI slop.
The agent operates in a continuous think → code → interact → observe → refine cycle. Code manipulation and VM interaction are the twin engines that drive iteration forward.
- Reason: The LLM plans the next step using the `thinking` tool.
- 📝 Code Generation & Modification: The agent writes or updates `.c`/`.h` files in the `os/` directory, implements kernel functions, adjusts linker scripts, or modifies GRUB configs based on the current goal and prior feedback.
- 🔨 Build & Compile: Automated toolchain invocation (`gcc -m32`, `ld`, `grub-mkrescue`) transforms source into a bootable `os.iso`.
- 🖥️ VM Interaction (Critical Step): The agent boots a headless VirtualBox VM, waits for the bootloader/kernel to load, and injects precise keyboard/mouse input. This simulates real hardware boundaries and exposes runtime behavior that static analysis cannot reveal.
- 👁️ Vision Inspection: A screenshot is captured, base64-encoded, and fed directly into the LLM's context for visual analysis (boot logs, VGA output, kernel panics, GRUB errors, or unexpected behavior).
- Refine: Based on vision feedback, the agent iterates (rewriting code, fixing compilation errors, adjusting I/O sequences, or recovering VM states) until the kernel meets the specified requirements.
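The cycle above can be sketched as a minimal loop, with stub callables standing in for the real tools (all names here are illustrative, not the repo's):

```python
def react_loop(goal, plan, act, observe, done, max_iterations=3000):
    """Minimal ReAct skeleton: plan a step, execute it (write code, build,
    drive the VM), observe the result (compile log or screenshot), and stop
    when the goal is met or the safety limit is hit. Returns the number of
    iterations used."""
    feedback = None
    for i in range(max_iterations):
        step = plan(goal, feedback)   # Reason: decide the next action
        result = act(step)            # Code / Build / VM interaction
        feedback = observe(result)    # Vision or compiler feedback
        if done(feedback):            # Refine until the goal is met
            return i + 1
    return max_iterations
```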
| Category | Capabilities |
|---|---|
| 📝 Code Generation & Modification | Directly creates/edits .c/.h files, manages references, and writes kernel entry points. The LLM shapes the actual OS logic through precise file I/O tools. |
| 🖥️ Critical VM Interaction | Full VirtualBox lifecycle: headless boot, ACPI power cycling, precise mouse/keyboard injection, and robust state recovery (locked, crashed, running). |
| Vision-Driven Feedback | Captures VM screens, encodes to base64, and feeds them to the LLM for real-time visual debugging and behavior verification. |
| Bare-Metal Build Pipeline | Automated 32-bit cross-compilation: gcc -m32, ld -m elf_i386, as --32, and grub-mkrescue for ISO generation. |
| Context Management | Auto-summarization when token/message limits are reached. Prevents context overflow while preserving session state. |
| Graceful Interruption | Custom SIGINT handler finishes the current VM/build step before halting. Safe for long-running iterations. |
| Dual Usage Modes | Single-query mode for automation, or interactive chat mode for step-by-step guidance. |
- Python 3.9+
- `llama.cpp` server running locally:

  ```bash
  llama-server --model <path-to-model> --host 0.0.0.0 --port 8080
  ```

- VirtualBox with `VBoxManage` in `PATH`
- 32-bit cross-compilation toolchain: `gcc-multilib` / `gcc -m32`, `binutils` (for `ld`, `as`), `grub-mkrescue` (from `grub2-common` or `grub-pc-bin`)
- `requests` Python package
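As an illustration of how this toolchain fits together (a hedged sketch: the actual flags and file layout used by builderagent.py may differ, and `linker.ld`/`isodir` are assumed names):

```python
def build_commands(sources: list[str]) -> list[list[str]]:
    """Sketch of the 32-bit bare-metal pipeline: compile each .c file
    freestanding, link with the elf_i386 emulation against a linker script,
    then wrap the kernel into a GRUB-bootable ISO."""
    objs = [src.replace(".c", ".o") for src in sources]
    cmds = [["gcc", "-m32", "-ffreestanding", "-c", src, "-o", obj]
            for src, obj in zip(sources, objs)]
    cmds.append(["ld", "-m", "elf_i386", "-T", "linker.ld",
                 "-o", "isodir/boot/kernel.bin", *objs])
    cmds.append(["grub-mkrescue", "-o", "os/os.iso", "isodir"])
    return cmds
```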
```bash
# 1. Clone the repository
git clone https://github.com/<your-username>/MaragingLoop.git
cd MaragingLoop

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# 3. Install dependencies
pip install requests
```

Update the server endpoint in builderagent.py or use a .env file (with no spaces) if needed:

```bash
COMPLETION_API_URL="http://localhost:8080/v1/chat/completions"  # Default llama.cpp
```

```bash
# Run a single task
python builderagent.py "Make the kernel print 'Hello Maraging' to the VGA console and halt."

# Enter interactive chat mode
python builderagent.py chat
```

Environment Variables (Optional):
| Variable | Default | Description |
|---|---|---|
| `COMPLETION_API_URL` | `http://localhost:8080/v1/chat/completions` | LLM server endpoint |
| `VM_NAME` | `agentos` | VirtualBox VM name to control |
| `MAX_ITERATIONS` | `3000` | Safety limit for the ReAct loop |
| Group | Tools |
|---|---|
| 📝 Code & Build | write_file, read_file, read_reference_file, compile_kernel, compile_kernel_files, write_kernel |
| 🖥️ VM Interaction & Lifecycle | start_vm, stop_vm, set_mouse_position, send_keyboard_input, launch_current_vm |
| Vision Feedback | take_screenshot, take_screenshot_and_inspect, inspect_snapshot |
| Agent Flow | thinking, finish, calculator |
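As an example of what a screenshot tool reduces to, here is a hedged sketch built on VBoxManage's `controlvm <vm> screenshotpng` subcommand (the function names and paths are illustrative, not the repo's):

```python
import base64

def screenshot_command(vm_name: str = "agentos",
                       path: str = "shot.png") -> list[str]:
    """VBoxManage invocation that dumps the VM's current screen to a PNG."""
    return ["VBoxManage", "controlvm", vm_name, "screenshotpng", path]

def encode_screenshot(path: str) -> str:
    """Base64-encode the captured PNG so it can be embedded in the LLM's
    next message as a data URL."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# In the agent: subprocess.run(screenshot_command(), check=True),
# then encode_screenshot("shot.png") feeds the vision step.
```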
The name MaragingLoop draws from maraging steel, an ultra-high-strength alloy that hardens through controlled precipitation aging. Just as maraging steel gains resilience through repeated thermal and mechanical stress, MaragingLoop hardens bare-metal code through an iterative dual process: 📝 active code generation and 🖥️ live VM interaction.
Code is where the LLM shapes the kernelβs logic, memory layout, and hardware interfaces. VM interaction is where that logic is stress-tested against real execution boundaries, bootloader behavior, and VGA output. Neither step is optional: the agent writes code to solve the problem, boots it to see reality, reads the screen to understand what broke, and rewrites the code to fix it. Each cycle precipitates stability, transforming fragile prototypes into production-ready bare-metal systems.
This project is released under the MIT License. Contributions, bug reports, and feature requests are welcome. Please follow standard fork/pull-request workflows.
- llama.cpp for the high-performance local LLM server
- Oracle VM VirtualBox for robust VM automation and the `VBoxManage` CLI
- The OpenAI Function Calling specification for the native tool format
- The bare-metal OS community for relentless inspiration
Built for developers who want AI to actually touch metal, boot it, and fix what breaks. 🔧🖥️💻