Autonomous Bare-Metal OS Agent | 📝 Code → 🖥️ VM Interact → 👁️ Inspect → 🔨 Harden
MaragingLoop is a ReAct-style AI agent engineered for autonomous bare-metal OS and kernel development. By orchestrating large language models with automated compilation, robust VirtualBox VM lifecycle control, and vision-based screenshot inspection, it iteratively writes, builds, tests, and refines low-level C code until the target system boots and behaves as specified.
Unlike text-only coding agents, MaragingLoop treats code generation and VM interaction as equally critical pillars of its development cycle. The LLM doesn't just write code in isolation: it actively modifies kernel source, compiles it, boots a live virtual machine, injects I/O, captures visual feedback, and then rewrites the code based on real-world execution results. This closed-loop process bridges abstract reasoning with bare-metal reality.
Highly Experimental !!!
You can see some images at https://gistnoesis.github.io/MaragingLoop/ and in the docs folder.
This project is LOCAL first: performance lags behind frontier models, and NO consideration for CREDENTIALS was tested (DISCLAIMER: use at your own risk; for example, if you put credentials in and they get leaked, that's not my responsibility).
Everything was created locally through prompting; development took 10 days. I constructed the tools by interacting with the llama webui, and the kernels either via the chat interface or via a single initial prompt starting from the current os folder state. (Higher towers in full autonomy, as in the shoggoth.db project, have not been included here to keep the project minimal.)
The single builderagent.py has been kept below 1000 lines (currently ~900 lines); you should read it and make sure it corresponds to what you want to run (e.g. by editing the system_prompt and summary_prompt, and the compaction threshold).
The model used is unsloth/Qwen3.6-35B-A3B-GGUF/Qwen3.6-35B-A3B-UD-IQ4_NL_XL.gguf with llama.cpp
```bash
./llama-server --model unsloth/Qwen3.6-35B-A3B-GGUF/Qwen3.6-35B-A3B-UD-IQ4_NL_XL.gguf --mmproj unsloth/Qwen3.6-35B-A3B-GGUF/mmproj-BF16.gguf --image-min-tokens 1024 --image-max-tokens 1024 --temp 1.0 --top-p 0.95 --min-p 0.00 --jinja --top-k 20 --ctx-size 150000 --reasoning_budget 3000 --presence_penalty 1.5 -ctk f16 -ctv f16 --host 0.0.0.0 --port 8080 --no-mmap
```
The options are not very rigid, but keeping a small reasoning_budget helps. The --host 0.0.0.0 is there because the LLM runs on another machine on the network (it allows anyone who can connect to your machine to use your llama server, so set this option according to your needs). The ctx-size is kept small for speed and memory reasons; if you need up to 250000 of context, you should offload the vision stack to the CPU, or you will encounter out-of-memory crashes. The number of image tokens can be adapted to trade speed for quality.
With a 4090 this produces ~150 tokens/s (starting at 180 tok/s and slowing down to 110 tok/s).
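builderagent.py talks to this server through the OpenAI-compatible chat API, pairing text with base64-encoded screenshots. A minimal sketch of building such a message (the helper name and exact payload shape here are illustrative, not copied from the repo):

```python
import base64

def build_vision_message(text: str, png_bytes: bytes) -> dict:
    """Build an OpenAI-style chat message pairing a text prompt with a
    base64-encoded screenshot, as accepted by llama.cpp's
    /v1/chat/completions endpoint when started with an --mmproj projector."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```

The resulting dict goes into the `messages` list of a `requests.post(...)` call against the completion endpoint.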
You'll need to use VirtualBox: create a virtual machine called "agentos", attach the os/os.iso generated by the first compilation to its virtual optical drive, and configure the boot order.
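That manual setup can also be scripted. A hedged sketch of the VBoxManage calls involved (the OS type and controller name are assumptions; adapt them to your machine):

```python
import subprocess

VM = "agentos"

def setup_commands(iso_path: str = "os/os.iso") -> list[list[str]]:
    """Return the VBoxManage invocations that create the VM, attach the
    generated ISO to a virtual optical drive, and set the boot order.
    "Other" as OS type and "IDE" as controller name are assumptions."""
    return [
        ["VBoxManage", "createvm", "--name", VM, "--ostype", "Other", "--register"],
        ["VBoxManage", "storagectl", VM, "--name", "IDE", "--add", "ide"],
        ["VBoxManage", "storageattach", VM, "--storagectl", "IDE",
         "--port", "0", "--device", "0", "--type", "dvddrive", "--medium", iso_path],
        ["VBoxManage", "modifyvm", VM, "--boot1", "dvd", "--boot2", "none"],
    ]

def run_all(cmds: list[list[str]]) -> None:
    """Execute each VBoxManage command, failing fast on errors."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```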
The installation scripts are only necessary if you want more isolation of the LLM through Access Control Lists. The scripts were also vibecoded from the permissions.md files that I (the human) wrote. I have tested them and they worked on my machine (i.e., with the permissions set by the script, the agent tools were functional: the VM could create screenshots, inspect screenshots, send keyboard commands, etc.), but in the process I had to regenerate them many times, so read permissions.md and use them at your own risk. They were designed to be run from the directory where they sit, without having to specify arguments. I have put them in the repo for easier onboarding, but because permissions are highly dependent on your current setup, they are for advanced users only, and are there to help your trusted LLM agents write a working script for your specific machine.
What doesn't work yet: I couldn't get the mouse pointer to work, nor vibecode a correct network driver that detects network packets coming from the host (configured as NAT with port forwarding), even using the reference PCnet-FAST III driver from Linux (it looked like it would almost work, but that was at the beginning of the development phase). (The network stack needs extra tools beyond builderagent.py.)
The project is a boilerplate; you can easily add your own tools by asking your local llama. Give it the single builderagent.py and describe the new tool you want, then either copy-paste the result in place or regenerate the full file.
I have used VirtualBox, but you can ask your local llama to generate a QEMU version instead. In particular, with VirtualBox the mouse pointer seems to require Guest Additions (currently the mouse does not work for a bare-metal OS).
For example, you can try adding a copy_file_from_reference tool, or adding a SQLite database memory as is done in my other project: https://github.com/GistNoesis/Shoggoth.db/
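For reference, a new tool declaration in the OpenAI function-calling format that the agent's tools use might look like this (the schema below is an illustrative sketch, not code from the repo; builderagent.py would also need a matching Python handler):

```python
# Hypothetical declaration of the suggested copy_file_from_reference tool,
# written in the OpenAI function-calling schema.
COPY_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "copy_file_from_reference",
        "description": "Copy a file from the read-only reference tree into os/.",
        "parameters": {
            "type": "object",
            "properties": {
                "src": {"type": "string",
                        "description": "Path under the reference tree"},
                "dst": {"type": "string",
                        "description": "Destination path under os/"},
            },
            "required": ["src", "dst"],
        },
    },
}
```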
This agent control loop with vision, like when it plays Snake, is quite similar to my other project (not yet on GitHub) about driving a robot with an LLM toward an object (it works, but is natively too slow for real-time robotics; it can, however, be used to generate an initial dataset to train specialized neural networks).
I have tried to respect good engineering practices, but I'm only human. It was designed with security in mind. This is a harness with strong constraints, contrary to other agents that abandon all control to the whims of the LLM (what we used to call polymorphic malware). Here you are still in control: the LLM can't run arbitrary commands, except inside the VirtualBox VM. On your host, permissions are severely restricted to writing .c or .h files in specific folders and calling specific commands. But information in any file on your host can be trivially read by the LLM with simple tricks. I've also proposed a solution using Access Control Lists.
Like in the Shoggoth.db project, the main idea is setting up the foundation of a self-building tower. Make sure to keep control of the foundation, and push the safety problem to the big binary blob that is the model weights; those are generated deterministically from a dataset, which is itself self-distilled from interaction with a VM and a known wiki.
"Reflections on Trusting Trust"
Remember it's all AI slop.
The agent operates in a continuous think → code → interact → observe → refine cycle. Code manipulation and VM interaction are the twin engines that drive iteration forward.
- Reason: The LLM plans the next step using the `thinking` tool.
- 📝 Code Generation & Modification: The agent writes or updates `.c`/`.h` files in the `os/` directory, implements kernel functions, adjusts linker scripts, or modifies GRUB configs based on the current goal and prior feedback.
- 🔨 Build & Compile: Automated toolchain invocation (`gcc -m32`, `ld`, `grub-mkrescue`) transforms source into a bootable `os.iso`.
- 🖥️ VM Interaction (Critical Step): The agent boots a headless VirtualBox VM, waits for the bootloader/kernel to load, and injects precise keyboard/mouse input. This simulates real hardware boundaries and exposes runtime behavior that static analysis cannot reveal.
- 👁️ Vision Inspection: A screenshot is captured, base64-encoded, and fed directly into the LLM's context for visual analysis (boot logs, VGA output, kernel panics, GRUB errors, or unexpected behavior).
- Refine: Based on vision feedback, the agent iterates (rewriting code, fixing compilation errors, adjusting I/O sequences, or recovering VM states) until the kernel meets the specified requirements.
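The cycle above can be sketched as a minimal loop, with stub callables standing in for the real tools (all names here are illustrative, not the repo's):

```python
def react_loop(goal, plan, act, observe, done, max_iterations=3000):
    """Minimal ReAct skeleton: plan a step, execute it (write code, build,
    drive the VM), observe the result (compile log or screenshot), and stop
    when the goal is met or the safety limit is hit. Returns the number of
    iterations used."""
    feedback = None
    for i in range(max_iterations):
        step = plan(goal, feedback)   # Reason: decide the next action
        result = act(step)            # Code / Build / VM interaction
        feedback = observe(result)    # Vision or compiler feedback
        if done(feedback):            # Refine until the goal is met
            return i + 1
    return max_iterations
```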
| Category | Capabilities |
|---|---|
| 📝 Code Generation & Modification | Directly creates/edits .c/.h files, manages references, and writes kernel entry points. The LLM shapes the actual OS logic through precise file I/O tools. |
| 🖥️ Critical VM Interaction | Full VirtualBox lifecycle: headless boot, ACPI power cycling, precise mouse/keyboard injection, and robust state recovery (locked, crashed, running). |
| Vision-Driven Feedback | Captures VM screens, encodes to base64, and feeds them to the LLM for real-time visual debugging and behavior verification. |
| Bare-Metal Build Pipeline | Automated 32-bit cross-compilation: gcc -m32, ld -m elf_i386, as --32, and grub-mkrescue for ISO generation. |
| Context Management | Auto-summarization when token/message limits are reached. Prevents context overflow while preserving session state. |
| Graceful Interruption | Custom SIGINT handler finishes the current VM/build step before halting. Safe for long-running iterations. |
| Dual Usage Modes | Single-query mode for automation, or interactive chat mode for step-by-step guidance. |
- Python 3.9+
- `llama.cpp` server running locally:

  ```bash
  llama-server --model <path-to-model> --host 0.0.0.0 --port 8080
  ```

- VirtualBox with `VBoxManage` in `PATH`
- 32-bit cross-compilation toolchain: `gcc-multilib` / `gcc -m32`, `binutils` (for `ld`, `as`), `grub-mkrescue` (from `grub2-common` or `grub-pc-bin`)
- `requests` Python package
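As an illustration of how this toolchain fits together (a hedged sketch: the actual flags and file layout used by builderagent.py may differ, and `linker.ld`/`isodir` are assumed names):

```python
def build_commands(sources: list[str]) -> list[list[str]]:
    """Sketch of the 32-bit bare-metal pipeline: compile each .c file
    freestanding, link with the elf_i386 emulation against a linker script,
    then wrap the kernel into a GRUB-bootable ISO."""
    objs = [src.replace(".c", ".o") for src in sources]
    cmds = [["gcc", "-m32", "-ffreestanding", "-c", src, "-o", obj]
            for src, obj in zip(sources, objs)]
    cmds.append(["ld", "-m", "elf_i386", "-T", "linker.ld",
                 "-o", "isodir/boot/kernel.bin", *objs])
    cmds.append(["grub-mkrescue", "-o", "os/os.iso", "isodir"])
    return cmds
```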
```bash
# 1. Clone the repository
git clone https://github.com/<your-username>/MaragingLoop.git
cd MaragingLoop

# 2. Create a virtual environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows

# 3. Install dependencies
pip install requests
```

Update the server endpoint in builderagent.py or use a .env file (with no spaces) if needed:

```bash
COMPLETION_API_URL="http://localhost:8080/v1/chat/completions"  # Default llama.cpp
```

```bash
# Run a single task
python builderagent.py "Make the kernel print 'Hello Maraging' to the VGA console and halt."

# Enter interactive chat mode
python builderagent.py chat
```

Environment Variables (Optional):
| Variable | Default | Description |
|---|---|---|
| `COMPLETION_API_URL` | `http://localhost:8080/v1/chat/completions` | LLM server endpoint |
| `VM_NAME` | `agentos` | VirtualBox VM name to control |
| `MAX_ITERATIONS` | `3000` | Safety limit for the ReAct loop |
| Group | Tools |
|---|---|
| 📝 Code & Build | write_file, read_file, read_reference_file, compile_kernel, compile_kernel_files, write_kernel |
| 🖥️ VM Interaction & Lifecycle | start_vm, stop_vm, set_mouse_position, send_keyboard_input, launch_current_vm |
| Vision Feedback | take_screenshot, take_screenshot_and_inspect, inspect_snapshot |
| Agent Flow | thinking, finish, calculator |
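As an example of what a screenshot tool reduces to, here is a hedged sketch built on VBoxManage's `controlvm <vm> screenshotpng` subcommand (the function names and paths are illustrative, not the repo's):

```python
import base64

def screenshot_command(vm_name: str = "agentos",
                       path: str = "shot.png") -> list[str]:
    """VBoxManage invocation that dumps the VM's current screen to a PNG."""
    return ["VBoxManage", "controlvm", vm_name, "screenshotpng", path]

def encode_screenshot(path: str) -> str:
    """Base64-encode the captured PNG so it can be embedded in the LLM's
    next message as a data URL."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# In the agent: subprocess.run(screenshot_command(), check=True),
# then encode_screenshot("shot.png") feeds the vision step.
```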
The name MaragingLoop draws from maraging steel, an ultra-high-strength alloy that hardens through controlled precipitation aging. Just as maraging steel gains resilience through repeated thermal and mechanical stress, MaragingLoop hardens bare-metal code through an iterative dual process: 📝 active code generation and 🖥️ live VM interaction.
Code is where the LLM shapes the kernelβs logic, memory layout, and hardware interfaces. VM interaction is where that logic is stress-tested against real execution boundaries, bootloader behavior, and VGA output. Neither step is optional: the agent writes code to solve the problem, boots it to see reality, reads the screen to understand what broke, and rewrites the code to fix it. Each cycle precipitates stability, transforming fragile prototypes into production-ready bare-metal systems.
This project is released under the MIT License. Contributions, bug reports, and feature requests are welcome. Please follow standard fork/pull-request workflows.
- llama.cpp for the high-performance local LLM server
- Oracle VM VirtualBox for robust VM automation and the `VBoxManage` CLI
- The OpenAI Function Calling specification for the native tool format
- The bare-metal OS community for relentless inspiration
Built for developers who want AI to actually touch metal, boot it, and fix what breaks. 🔧🖥️💻