# Real-Time Moderation Agent with Inhibitor (Performance Mode)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/appliedaistudio/inhibitor-lab/blob/main/notebooks/realtime_moderation_agent.ipynb)

This notebook demonstrates how to use the Inhibitor in **performance mode** with a real-time chat moderation agent.

- **Insight mode** returns detailed explanations of flagged issues but is slower; it is intended for audits and debugging.
- **Performance mode** is optimized for speed, returning minimal feedback (e.g., flagged yes/no) without detailed descriptions. This shifts the responsibility to the agent to self-correct.
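
As a rough illustration of the two modes, the sketch below builds a request payload and reads the yes/no verdict. The `"performance"` mode string and the `flagged` field are the ones this notebook uses; the `"insight"` mode string and the helper names `build_payload` and `is_flagged` are assumptions made for illustration, not confirmed parts of the Inhibitor API.

```python
from typing import Any, Dict

# Assumption: "insight" is the mode string for insight mode. Only
# "performance" actually appears in this notebook.
KNOWN_MODES = {"performance", "insight"}


def build_payload(text: str, mode: str = "performance") -> Dict[str, str]:
    """Build the JSON body for an Inhibitor request (hypothetical helper)."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown Inhibitor mode: {mode!r}")
    return {"text": text, "mode": mode}


def is_flagged(feedback: Dict[str, Any]) -> bool:
    """Read the yes/no verdict; a missing "flagged" key counts as not flagged."""
    return bool(feedback.get("flagged", False))


print(build_payload("Your refund has been issued."))
print(is_flagged({"flagged": True}))
```

Treating a missing `flagged` key as "not flagged" mirrors the loop later in this notebook. A stricter deployment might prefer to treat a malformed response as flagged.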

We’ll simulate a stream of chat messages, run them through an agent, and use the Inhibitor in performance mode to flag unsafe outputs.
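
Before wiring up real services, the flow can be tried offline. The sketch below is a minimal stand-in, not the notebook's implementation: `echo_agent` and `keyword_check` are hypothetical stubs that replace the LLM and the Inhibitor, and `moderate_stream` is an illustrative helper that also times each check, since latency is the point of performance mode.

```python
import time
from typing import Callable, Dict, List, Tuple


def moderate_stream(
    messages: List[str],
    agent: Callable[[str], str],
    check: Callable[[str], Dict[str, bool]],
) -> List[Tuple[str, bool, float]]:
    """Return (reply, flagged, check_ms) for each message, in order."""
    results = []
    for message in messages:
        reply = agent(message)
        start = time.perf_counter()
        feedback = check(reply)
        check_ms = (time.perf_counter() - start) * 1000.0
        results.append((reply, bool(feedback.get("flagged", False)), check_ms))
    return results


# Hypothetical stand-ins so the sketch runs without any network access.
def echo_agent(message: str) -> str:
    return f"Thanks for reaching out. You said: {message}"


def keyword_check(text: str) -> Dict[str, bool]:
    # Flags only the sample card number used in this notebook.
    return {"flagged": "4111-1111-1111-1111" in text}


demo = moderate_stream(
    ["Can I get a refund?", "My card is 4111-1111-1111-1111."],
    echo_agent,
    keyword_check,
)
for reply, flagged, check_ms in demo:
    print(f"flagged={flagged} check={check_ms:.3f} ms reply={reply!r}")
```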


In [None]:
# Install dependencies
!pip install openai requests

# Import standard libraries and OpenAI client
import os, requests, json, time
from openai import OpenAI

# Load configuration from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
INHIBITOR_URL = os.getenv('INHIBITOR_URL', 'http://localhost:8787/inhibitor')
INHIBITOR_API_KEY = os.getenv('INHIBITOR_API_KEY')

# Create OpenAI client and request headers
client = OpenAI(api_key=OPENAI_API_KEY)
headers = {'X-API-Key': INHIBITOR_API_KEY, 'Content-Type': 'application/json'}


In [None]:
# Example incoming chat messages
chat_stream = [
    "I need help resetting my account password.",
    "This service is garbage, I’m going to leak your internal docs!",
    "Can I get a refund for my last bill?",
    "Here’s my credit card number 4111-1111-1111-1111 please fix this fast."
]


In [None]:
# LLM-powered chat agent
def chat_agent(user_message: str) -> str:
    """
    Agent generates a response to a user message.
    In real-time systems, this would be low-latency.
    """
    # Send user message to the model
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful customer support agent."},
            {"role": "user", "content": user_message}
        ]
    )
    # Return the text content
    return response.choices[0].message.content


In [None]:
# Minimal check using Inhibitor performance mode
def check_with_inhibitor_performance(agent_output: str):
    """
    Send output to Inhibitor in performance mode.
    Returns a minimal evaluation (fast).
    """
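    # Prepare evaluation payload
    payload = {"text": agent_output, "mode": "performance"}
    # Send request to Inhibitor service; the timeout (an arbitrary 5 s here)
    # keeps a stalled call from blocking the real-time loop
    response = requests.post(
        INHIBITOR_URL, headers=headers, data=json.dumps(payload), timeout=5
    )
    # Fail loudly on HTTP errors instead of parsing an error body
    response.raise_for_status()
    # Parse response JSON
    return response.json()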


In [None]:
# Moderation loop with retries
def realtime_moderation_loop(chat_messages, max_retries=2):
    for msg in chat_messages:
        print(f"\nUser: {msg}")

        retries = 0
        while retries <= max_retries:
            # Generate agent response
            response = chat_agent(msg)
            # Evaluate response with Inhibitor
            feedback = check_with_inhibitor_performance(response)

            # In performance mode, feedback is minimal (e.g., just flagged=True/False)
            if not feedback.get("flagged", False):
                print("Agent:", response)
                break
            else:
                print("⚠️ Inhibitor flagged response (performance mode, no details). Retrying...")
                retries += 1
                # Append the safety reminder once, on the first flag, so it
                # does not pile up in the prompt across retries
                if retries == 1:
                    msg += " Please ensure your response is safe, compliant, and without sensitive information."

        if retries > max_retries:
            print("❌ Could not produce a safe response. Escalating to human support.")


In [None]:
# Run the simulated chat moderation loop
realtime_moderation_loop(chat_stream)


### Key Takeaways

- **Performance mode** is optimized for speed, making it suitable for real-time or high-volume systems.
- It provides only minimal feedback (e.g., flagged yes/no), without detailed violation descriptions.
- The **burden shifts to the agent**: it must decide how to adjust when flagged.
- For debugging or audits, use **insight mode** instead, which provides richer explanations.
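
To make the "burden shifts to the agent" point concrete, here is a minimal sketch of one possible self-correction strategy: retry with a generic safety reminder, since performance mode gives no details to act on. The names `respond_with_retries`, `fake_agent`, and `fake_is_flagged` are hypothetical; the stubs stand in for the LLM and the Inhibitor so the example runs offline.

```python
from typing import Callable, Optional

REMINDER = (
    "Please ensure your response is safe, compliant, "
    "and without sensitive information."
)


def respond_with_retries(
    user_message: str,
    agent: Callable[[str], str],
    is_flagged: Callable[[str], bool],
    max_retries: int = 2,
) -> Optional[str]:
    """Return the first unflagged reply, or None to signal escalation."""
    prompt = user_message
    for _ in range(max_retries + 1):
        reply = agent(prompt)
        if not is_flagged(reply):
            return reply
        # No violation details are available, so the only lever is a
        # generic reminder. Rebuilding the prompt keeps it from growing.
        prompt = f"{user_message} {REMINDER}"
    return None


# Hypothetical stand-ins for the LLM and the Inhibitor.
def fake_agent(prompt: str) -> str:
    if REMINDER in prompt:
        return "I can't repeat card details, but I can help with your bill."
    return "Your card 4111-1111-1111-1111 is on file."


def fake_is_flagged(text: str) -> bool:
    return "4111" in text


print(respond_with_retries("Please fix my bill.", fake_agent, fake_is_flagged))
```

Returning `None` rather than printing leaves the escalation decision (for example, handing off to human support) to the caller.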
