
🛡️ Maryada: Semantic Firewall for AI Agents

"AdBlock for AI Agents" — A safety middleware that prevents your autonomous agents from leaking data, breaking things, or getting hacked.



🚨 The Problem

Autonomous AI agents (like AutoGPT, Devin, or custom Jarvis-like builds) represent a paradigm shift in software capability, but they introduce unprecedented risks. Giving an agent unrestricted internet access is akin to giving a junior intern root access to your production environment without supervision. An agent might accidentally paste secret API keys into a public pastebin, or blindly execute a destructive command like DROP TABLE users because it hallucinated a debugging step.

Furthermore, agents are vulnerable to prompt injection attacks from the very web pages they browse. A malicious website can contain hidden text instructing the agent to ignore its original instructions and exfiltrate sensitive data. Traditional firewalls and WAFs are insufficient because they operate on strict rules and keywords—they lack the semantic understanding to distinguish between a legitimate database query during development and a catastrophic one in production.

Maryada is a semantic firewall for AI agents. It sits between your agent and the internet, understanding the intent of every request and the safety of every response before allowing traffic to pass.


⚡ What Maryada Does

Maryada acts as a transparent HTTP proxy, providing a multi-layered defense system:

  1. Local Proxy Interception: Inspects all HTTP traffic (requests and responses) flowing through your agent.
  2. Smart DLP (Data Loss Prevention): Uses regex patterns and JSON-aware traversal to locally identify and redact sensitive data like API keys, passwords, and PII before it leaves your machine.
  3. Semantic Policy Engine: Utilizes Google Gemini 3.0 Flash Preview to reason about the intent of a request (e.g., distinguishing "Deleting a temporary test file" vs. "Deleting a production database").
  4. Human-in-the-Loop: For high-risk actions (like deleting resources not owned by the agent), Maryada pauses the connection and requests explicit approval from a human operator via the CLI.
  5. Resource Ownership Tracking: Automatically tracks resources created by the agent, allowing it to manage its own lifecycle (create/delete) while preventing it from touching external critical infrastructure.
  6. Inbound Defense Shield: Scans incoming responses for potential prompt injection attacks to protect the agent's context.
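The layered checks above can be sketched as a short decision pipeline. This is an illustrative, std-only sketch — the enum variants, function names, and toy DLP pattern are assumptions for exposition, not Maryada's actual API:

```rust
// Hypothetical sketch of Maryada's layered decision flow. Each layer can
// short-circuit before traffic is forwarded. Names are illustrative only.

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Redact(String), // body rewritten before forwarding
    AskHuman,
    Block,
}

// Layer: local DLP — redact obvious secrets before anything leaves the machine.
// (The real project uses regex patterns and JSON-aware traversal; this toy
// version only masks a `password=` form field.)
fn dlp_scan(body: &str) -> Option<String> {
    body.find("password=").map(|i| {
        let end = body[i..].find('&').map(|j| i + j).unwrap_or(body.len());
        format!("{}password=[REDACTED]{}", &body[..i], &body[end..])
    })
}

// Layers: ownership check plus a stubbed stand-in for the semantic verdict.
fn decide(method: &str, path: &str, owned: bool, body: &str) -> Verdict {
    if let Some(redacted) = dlp_scan(body) {
        return Verdict::Redact(redacted);
    }
    match method {
        "DELETE" if !owned => Verdict::AskHuman, // high-risk, unowned resource
        "DELETE" => Verdict::Allow,              // agent manages its own lifecycle
        _ if path.contains("drop") => Verdict::Block,
        _ => Verdict::Allow,
    }
}

fn main() {
    assert_eq!(decide("GET", "/status", false, ""), Verdict::Allow);
    assert_eq!(decide("DELETE", "/resource/prod-db", false, ""), Verdict::AskHuman);
    println!("pipeline sketch ok");
}
```

Note the ordering: DLP runs first so that secrets are already redacted before any summary is sent anywhere, matching the flow in the architecture diagram below.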

🏗️ Architecture

```mermaid
sequenceDiagram
    participant Agent as 🤖 AI Agent
    participant Maryada as 🛡️ Maryada Proxy
    participant DLP as 🔍 Local DLP
    participant Gemini as 🧠 Gemini 3.0 (Thinking)
    participant Human as 👤 Human Operator
    participant Internet as 🌍 Internet

    Note over Agent: Uses http://localhost:8080

    Agent->>Maryada: HTTP Request (POST /db)

    Maryada->>DLP: Scan for PII
    DLP-->>Maryada: Redacted Body

    Maryada->>Maryada: Check ownership ledger
    Note right of Maryada: Did the agent create this resource?

    Maryada->>Gemini: "Is this safe? {sanitized summary}"

    alt is Unsafe/Critical
        Gemini-->>Maryada: BLOCK or ASK_HUMAN
        opt Ask Human
            Maryada->>Human: "Approve DELETE?"
            Human-->>Maryada: Yes/No
        end
    else is Safe
        Gemini-->>Maryada: ALLOW
        Maryada->>Internet: Forward Request
        Internet-->>Maryada: Response

        Maryada->>Gemini: "Check for Injection"
        alt Injection Detected
            Maryada->>Agent: 403 DANGER
        else Clean
            Maryada->>Agent: Forward Response
        end
    end
```

🧠 How Gemini 3 is Used (Runtime)

Maryada leverages Google Gemini 3.0 Flash Preview as its core reasoning engine. Unlike static rule-based systems, Maryada:

  1. Summarizes Context: Sends a sanitized summary of the request (method, URL, body preview) to Gemini 3.
  2. Intent Reasoning: Gemini analyzes the intent of the action against a provided safety policy (e.g., "relaxed-dev" or "strict-prod"). It distinguishes between benign and malicious actions based on semantic understanding.
  3. Structured Decisions: Gemini returns a structured verdict (ALLOW, BLOCK, ASK_HUMAN) along with a reasoning explanation.
  4. Enforcement: Maryada enforces this decision at the network layer. If Gemini detects a threat, the request is blocked before it ever leaves the network.
  5. Fail-Closed Design: If the Gemini API is unreachable or fails to respond, Maryada defaults to a "Fail-Closed" state, blocking potential threats rather than failing open.
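The fail-closed rule in step 5 can be expressed in a few lines: every failure mode of the model call (unreachable API, timeout, or an unparseable reply) collapses to BLOCK, never ALLOW. A minimal std-only sketch, with illustrative names that are not Maryada's actual API:

```rust
// Fail-closed verdict handling: errors never default to Allow.

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Block,
    AskHuman,
}

// Parse the model's structured reply. An unknown string is an error,
// not a silent default.
fn parse_verdict(raw: &str) -> Result<Verdict, String> {
    match raw.trim() {
        "ALLOW" => Ok(Verdict::Allow),
        "BLOCK" => Ok(Verdict::Block),
        "ASK_HUMAN" => Ok(Verdict::AskHuman),
        other => Err(format!("unparseable verdict: {other}")),
    }
}

// Fail-closed enforcement: a transport error OR a parse error both
// collapse to Block.
fn enforce(api_result: Result<&str, String>) -> Verdict {
    api_result.and_then(parse_verdict).unwrap_or(Verdict::Block)
}

fn main() {
    assert_eq!(enforce(Ok("ALLOW")), Verdict::Allow);
    assert_eq!(enforce(Err("connection timed out".to_string())), Verdict::Block);
    assert_eq!(enforce(Ok("lgtm!")), Verdict::Block); // garbage output blocks too
    println!("fail-closed sketch ok");
}
```

The design choice here is that the `Err` path and the "model said something weird" path share a single sink, so there is no code path from a failure to a forwarded request.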

🛠️ How This Project Was Built with Gemini

This project itself is a testament to the power of AI-assisted development. Maryada was built with Gemini 3 via Antigravity as a "co-pilot," not as an auto-generator.

  • Pair Programming: I used Gemini 3 to rapidly prototype the proxy architecture in Rust, leveraging its knowledge of hyper and tokio to handle complex async logic.
  • Design Refinement: Gemini helped explore different architectural approaches, such as where to place the DLP logic vs. the semantic analysis, and how to implement the "Ownership Illusion" for resource tracking.
  • Iterative Polish: From generating the initial scaffolding to refining the CLI output and color-coding, Gemini acted as a tireless partner, accelerating the development cycle significantly.
  • Human-Led: While Gemini provided code and suggestions, the core safety philosophy, architectural decisions, and final review were strictly human-led. This project showcases Gemini 3 as a powerful developer productivity tool that amplifies human capability.

🚀 Demo / Quick Start

1. Prerequisites

  • A Rust toolchain with cargo
  • A Google Gemini API key (exported as GEMINI_API_KEY below)

2. Setup & Run

```sh
# Clone the repo
git clone https://github.com/controlplanehq/maryada.git
cd maryada

# Export your API Key
export GEMINI_API_KEY="your_api_key_here"

# Start the Proxy
cargo run --bin maryada
```

3. Run the "Rogue Agent" Test

We included a test suite to simulate various agent behaviors. In a new terminal:

```sh
cargo run --bin rogue_agent
```

4. What You Will See

  1. Safe GET: Request to example.com → ✅ ALLOWED.
  2. PII Leak: Posting a password → 🛡️ REDACTED (modified before sending).
  3. Destructive Act: DELETE /resource/prod-db (unowned) → 👤 ASK HUMAN (the console prompts you).
  4. Resource Ownership: POST /resource/my-file (creates it), then DELETE /resource/my-file → ✅ ALLOWED (the agent owns it).
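The ownership behavior in steps 3 and 4 comes down to a ledger of resources the agent created through the proxy: a DELETE is auto-allowed only for a recorded resource, and everything else escalates. A std-only sketch with illustrative names (the real ledger implementation may differ):

```rust
// Hypothetical ownership ledger: resources created via the proxy are
// recorded, and only those may be deleted without human approval.
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    AskHuman,
}

#[derive(Default)]
struct Ledger {
    owned: HashSet<String>,
}

impl Ledger {
    // Called when a create (e.g. POST /resource/...) succeeds through the proxy.
    fn record_create(&mut self, path: &str) {
        self.owned.insert(path.to_string());
    }

    // Deleting an owned resource is the agent managing its own lifecycle;
    // deleting anything else escalates to the human operator.
    fn check_delete(&mut self, path: &str) -> Decision {
        if self.owned.remove(path) {
            Decision::Allow
        } else {
            Decision::AskHuman
        }
    }
}

fn main() {
    let mut ledger = Ledger::default();
    ledger.record_create("/resource/my-file");
    assert_eq!(ledger.check_delete("/resource/my-file"), Decision::Allow);
    assert_eq!(ledger.check_delete("/resource/prod-db"), Decision::AskHuman);
    println!("ledger sketch ok");
}
```

Removing the entry on delete also means a second DELETE of the same path escalates again, which matches the "manage its own lifecycle" framing: the agent only gets a free pass for resources it currently owns.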

⚠️ Limitations

  • Latency: Introducing an LLM round-trip for every request adds latency. Maryada is optimized for safety, not sub-millisecond throughput.
  • Context Window: Extremely large request bodies are currently truncated before analysis to fit within context windows and performance budgets.
  • Prompt Injection Arms Race: While Gemini 3 is highly capable, no AI model is immune to all forms of adversarial attacks. Maryada is a layer of defense, not a silver bullet.
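The truncation limitation above has one subtlety worth noting: cutting a UTF-8 body at an arbitrary byte offset can split a multi-byte character and produce an invalid string. A minimal sketch of boundary-safe truncation (the function name and budget are illustrative, not taken from the codebase):

```rust
// Truncate a request body to a byte budget for the model's context window,
// backing up to the nearest char boundary so multi-byte UTF-8 sequences
// are never split. Illustrative sketch only.

fn truncate_preview(body: &str, max_bytes: usize) -> &str {
    if body.len() <= max_bytes {
        return body;
    }
    // Walk back to the nearest char boundary at or below max_bytes.
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    &body[..end]
}

fn main() {
    assert_eq!(truncate_preview("hello", 10), "hello");
    assert_eq!(truncate_preview("héllo", 2), "h"); // 'é' is 2 bytes; can't split it
    println!("truncation sketch ok");
}
```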

🔮 What's Next

  • Streaming Analysis: Inspecting response bodies in real-time streams rather than buffering.
  • Custom Policy DSL: Allowing users to define complex policies in a simplified language.
  • Dashboard UI: A web-based dashboard to view blocked requests and manage approvals.

License

Maryada is licensed under the Apache License 2.0. See LICENSE for details.
