| title | Os Expert Environment Server | |
|---|---|---|
| emoji | 🌟 | |
| colorFrom | yellow | |
| colorTo | red | |
| sdk | docker | |
| pinned | false | |
| app_port | 8000 | |
| base_path | / | |
| sleep_timeout | ||
| tags |
|
We gave a language model a shell, a live Linux kernel, and 15 broken systems. We didn't just ask it to call tools; we asked it to diagnose forensic anomalies that no simple command can fix.
This is OS_EXPERT_ENV — a self-improving environment where an RL agent learns to diagnose and resolve real production Linux failures through "Sovereign" system dynamics, adversarial safety gating, and GRPO-ready trajectory gathering.
Bangalore OpenEnv Hackathon 2026 Submission | Built with OpenEnv | Deployed on HF Spaces | Training via Unsloth in Colab
Early in development, our agent was a "Tool Caller." It could run ls and cat, but it lacked reasoning. Our initial logs showed an agent "Reward Hacking"—receiving points for accidentally fixing problems it didn't understand or stumbling into success because the environment was too fragile. It was a "Mock" world where real system signals were missing.
As we scaled to the full 15-task suite, the system broke. We hit the "Cascade of Reconnect" failures. The environment crashed when the agent touched the firewall, and the LLM (Llama-3.1-8B) choked on its own context window, hitting the 8,192 token wall. We realized that a simple simulator wouldn't cut it. To build a truly Sovereign Agent, we had to build a Sovereign Environment.
We moved away from mocks and implemented real Linux Kernel Dynamics. We used os.fork() to create genuine Zombie Processes and os.unlink() on open handles to create File Descriptor Leaks. We built a Safety Oracle with an explicit "Allowed Writes" table to let the agent fix /etc/sudoers without burning down the host.
Today, OS_EXPERT_ENV is a Forensic Detective Gym that demands reasoning over simple automation.
We have implemented 15 deterministic tasks that test "Detective Reasoning" across four high-stakes categories:
| Category | Forensic Challenge | Innovation |
|---|---|---|
| Kernel Forensics | Zombie Reaping (T4) | The agent cannot just kill -9. It must identify the parent and signal it to reap the child. |
| Resource Leaks | Deleted-Open FDs (T9) | df says the disk is full, but du says it's empty. The agent must trace the FD handle back to a leaked PID. |
| Security Audits | Sudoers/SUID (T11/12) | The agent must distinguish between authorized configs and rogue backdoors using kernel stat checks. |
| Network Triage | Port Hijacking (T5) | A rogue process is aggressively binding to production ports. The agent must use net.ports to find the interloper. |
Built on the OpenEnv Framework, our environment prioritizes Forensic Realism and Host Stability.
-
The Safety Oracle: A robust gating system that prevents destructive commands (like
rm -rf /) while allowing surgical edits to protected files (like/etc/hosts) only for authorized task IDs. -
Dense Reward Rubric:
$R_{total} = R_{outcome} + R_{process} + R_{safety} - P_{risk} - P_{steps}$ . This rewards the journey (finding clues) as much as the destination (fixing the bug). - Context Sliding Window: A proprietary "Pinned Header" strategy that keeps the System Prompt and Task instructions pinned at the top while cycling out stale step history to prevent context overflows.
-
Lazy Reconnection: A resilient client-side logic that detects Hugging Face Space hibernation, pings the
/healthendpoint, and re-establishes the WebSocket connection without losing task progress.
We used Expert Iteration with GRPO logic to train a Qwen2.5-1.5B agent. The results prove that system forensics is a learnable skill.
- Initial Avg Reward: -0.155 (Baseline agents frequently hit Safety Oracle "walls" or failed to find hidden leaks).
- Post-Training Avg Reward: +3.669 (The agent learned to use specialized forensic tools and investigate before acting).
- Safety: Near-zero rate of "Recursive chmod Hacking" as the agent learned that specific file fixes yield higher rewards than world-writable blanket permissions.
from client import OsExpertEnv
from models import SovereignAction
# Connect to the live "Detective Gym"
with OsExpertEnv(base_url="https://aksharaguptahehehehehe-os-expert-env.hf.space") as env:
result = env.reset(task_id=4)
print(f"Goal: {result.observation.system_snapshot['task_description']}")
# Example action: list processes to find zombies
result = env.step(SovereignAction(tool="proc.list", params={}))
print(result.observation.tool_result.stdout)# From the environment directory
openenv push --repo-id your-username/os-expert-envos_expert_env/
├── client.py # Sovereign WebSocket client with lazy reconnect
├── inference.py # RL evaluation script with context management
├── pipeline/
│ ├── project_templates.py # The "Sovereign 15" setup logic (os.fork, etc.)
│ └── episode_generator.py # Episode initialization
├── reward/
│ ├── aggregator.py # Dense reward calculation (R_total)
│ ├── grader.py # Deterministic Python-based outcome checkers
│ └── safety_oracle.py # Task-aware security gating
└── env/
├── action_router.py # 35-tool inventory & Linux-native shims
└── world_state.py # System state tracking
Team: Future Reward Contributors: Akshara Gupta, Bhavya Jain