# Magentic-One to FARA-GRC: Implementation Guide for Forensic M365 Auditing

This notebook demonstrates how **FARA-GRC** (Forensic AI-Reasoned Automation for Governance, Risk & Compliance) extends the **Magentic-One** multi-agent architecture (Fourney et al., 2024) to deliver court-admissible M365 compliance audits.

> 📖 **Citation**: Fourney, A., et al. (2024). *Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks*. Microsoft Research AI Frontiers. arXiv:2411.04468

### Key Architectural Innovations (from Magentic-One)
*   **Two-Ledger System**: The Orchestrator maintains a **Task Ledger** (high-level facts and plans) and a **Progress Ledger** (execution state).
*   **Nested Loop Logic**: An **Outer Loop** for strategic planning and an **Inner Loop** for tactical execution.
*   **Stall Counter**: A mechanism to detect unproductive loops and trigger re-planning (threshold ≤ 2).
*   **Educated Guesses**: Using LLM-generated hypotheses to guide search and reduce hallucination sensitivity.

### FARA-GRC Extensions for Forensic Auditing
| Magentic-One Concept | FARA-GRC Extension | Rationale |
|----------------------|--------------------|-----------| 
| WebSurfer Agent | **FaraWebSurfer** | Adds forensic metadata capture (timestamps, hashes) to every screenshot |
| Approval Heuristics | **ApprovalGuard** | Court-admissible chain-of-custody logging for all sensitive actions |
| Docker Isolation | **LXD Containers** | Stronger forensic isolation with read-only filesystem snapshots |
| Error Logging | **Forensic Failure Classification** | Maps errors to research-validated codes (see §7) |

---

## 1. Environment Setup for FARA-GRC

FARA-GRC is built on the **AutoGen** framework. For forensic-grade isolation and reproducibility, code execution happens within Docker/LXD containers.

```bash
pip install autogen-agentchat "autogen-ext[openai,docker]" magentic-ui
```

Ensure Docker is available for the `ComputerTerminal` and `Coder` agents. For maximum forensic isolation, consider **LXD** (see `docker-lxd-conversion-tools.md`).

## 2. Configuring LLM Endpoints for Multi-Agent Teams

Magentic-One benefits from model diversity. The **Orchestrator** and **Coder** often use high-reasoning models like `o1-preview`, while **WebSurfer** and **FileSurfer** use multimodal models like `gpt-4o`.

```python
import os
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Configuration for strategic agents (Orchestrator/Coder)
strategic_config = {
    "model": "o1-preview",
    "api_key": os.environ.get("OPENAI_API_KEY"),
}

# Configuration for multimodal agents (WebSurfer/FileSurfer)
multimodal_config = {
    "model": "gpt-4o",
    "api_key": os.environ.get("OPENAI_API_KEY"),
}

orchestrator_client = OpenAIChatCompletionClient(**strategic_config)
surfer_client = OpenAIChatCompletionClient(**multimodal_config)
```

> **🔬 FARA-GRC Note**: For M365 auditing, we recommend Azure OpenAI endpoints for **data residency compliance**. See `samples/sample_azure_agent.py` for configuration.
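
As a rough sketch of the Azure route, the connection settings could be gathered and validated before any client is constructed. The helper `azure_client_kwargs` and the environment variable names below are assumptions made for illustration; they are not taken from `samples/sample_azure_agent.py`. The returned keys are intended to line up with the keyword arguments of `AzureOpenAIChatCompletionClient` in `autogen_ext.models.openai`.

```python
import os

# Hypothetical environment variable names; adjust to your deployment.
REQUIRED_VARS = {
    "azure_endpoint": "AZURE_OPENAI_ENDPOINT",
    "azure_deployment": "AZURE_OPENAI_DEPLOYMENT",
    "api_version": "AZURE_OPENAI_API_VERSION",
    "api_key": "AZURE_OPENAI_API_KEY",
}


def azure_client_kwargs(model: str, env=None) -> dict:
    """Collect Azure OpenAI settings, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing Azure OpenAI settings: {', '.join(missing)}")
    kwargs = {key: env[var] for key, var in REQUIRED_VARS.items()}
    kwargs["model"] = model
    return kwargs


# Usage (requires the autogen-ext OpenAI extra to be installed):
# from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
# surfer_client = AzureOpenAIChatCompletionClient(**azure_client_kwargs("gpt-4o"))
```

Failing fast on a missing setting keeps a misconfigured audit run from starting at all, which is easier to explain in an audit trail than a run that fails midway.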

---

## 3. Implementing the Orchestrator with Task and Progress Ledgers

The Orchestrator is the "brain" of the system. Unlike simple routers, it uses two structured ledgers to maintain state.

### The Task Ledger (Outer Loop)
Maintains long-term memory:
1.  **Given/Verified Facts**: Ground truth discovered during the task (e.g., "MFA is enabled for all users").
2.  **Facts to Look Up**: Missing information required for the plan (e.g., "What is the current Conditional Access policy?").
3.  **Educated Guesses**: Hypotheses to guide agents when stuck (e.g., "MFA exemption might exist for break-glass accounts").
4.  **Task Plan**: The high-level sequence of steps (e.g., "1. Navigate to Azure AD → 2. Extract MFA report → 3. Compare against policy").

### The Progress Ledger (Inner Loop)
Maintains short-term execution state:
*   **Is the task complete?** (Requires evidence screenshot for "yes")
*   **Is the team looping?** (Triggers Stall Counter)
*   **Is progress being made?** (Used to detect "thrashing")
*   **Who is the next speaker?** (Agent routing)
*   **What is the specific instruction for them?** (Task delegation)
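
Because the Progress Ledger drives routing, it is worth rejecting malformed ledgers before acting on them. The following is a minimal sketch, assuming the Orchestrator's model returns the ledger as a JSON object; the function `parse_progress_ledger` is hypothetical, and the key names follow the dictionary shown in Section 5.

```python
import json

# One key per Progress Ledger question, with the type each answer must have.
PROGRESS_LEDGER_FIELDS = {
    "is_request_satisfied": bool,
    "is_team_in_loop": bool,
    "is_progress_being_made": bool,
    "next_speaker": str,
    "instruction_for_next_speaker": str,
}


def parse_progress_ledger(raw: str, known_agents: set[str]) -> dict:
    """Parse a JSON ledger and raise ValueError if it cannot be trusted."""
    ledger = json.loads(raw)
    if not isinstance(ledger, dict):
        raise ValueError("ledger must be a JSON object")
    for field, expected_type in PROGRESS_LEDGER_FIELDS.items():
        if not isinstance(ledger.get(field), expected_type):
            raise ValueError(f"{field!r} is missing or not a {expected_type.__name__}")
    if ledger["next_speaker"] not in known_agents:
        raise ValueError(f"unknown next_speaker: {ledger['next_speaker']!r}")
    return ledger


reply = json.dumps({
    "is_request_satisfied": False,
    "is_team_in_loop": False,
    "is_progress_being_made": True,
    "next_speaker": "FaraWebSurfer",
    "instruction_for_next_speaker": "Open the Conditional Access page",
})
agents = {"FaraWebSurfer", "FileSurfer", "Coder", "ComputerTerminal"}
ledger = parse_progress_ledger(reply, agents)
```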

### The Stall Counter (Research-Validated)
> When a loop is detected, or forward progress stalls, the counter is incremented. Once the counter exceeds its threshold (≤ 2 in the paper's experiments), the Orchestrator breaks from the inner loop and returns control to the outer loop (Fourney et al., 2024, §3.1).

This prevents the agents from looping indefinitely on failed strategies, which is critical for forensic auditing, where every action is logged and billed.

## 4. Defining Specialized Agents for M365 Forensic Auditing

Each agent is a specialist with a unique action space. FARA-GRC extends the base Magentic-One agents for compliance workflows:

| Magentic-One Agent | FARA-GRC Equivalent | M365 Audit Role |
|--------------------|---------------------|-----------------|
| **WebSurfer** | **FaraWebSurfer** | Navigate M365 Admin Center, capture forensic screenshots with metadata |
| **FileSurfer** | FileSurfer (unchanged) | Parse audit reports (CSV, Excel, PDF) |
| **Coder** | Coder (unchanged) | Generate compliance scripts, transform audit data |
| **ComputerTerminal** | Sandboxed in Docker/LXD | Execute scripts in isolated environment |

```python
from autogen_agentchat.agents import CodeExecutorAgent, AssistantAgent
from magentic_ui.agents.web_surfer.fara import FaraWebSurfer
from autogen_ext.code_executors.docker import DockerCommandLineCodeExecutor

# Initialize the FaraWebSurfer (forensic variant)
web_surfer = FaraWebSurfer(
    name="FaraWebSurfer",
    model_client=surfer_client,
    # Captures: timestamp, URL, viewport coords, element hash, user session ID
)

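# Initialize the Coder and Terminal (sandboxed).
# Note: the Docker executor must be started (`await executor.start()`) before
# the terminal agent runs code, and stopped (`await executor.stop()`) afterwards.
executor = DockerCommandLineCodeExecutor(image="python:3.11")
terminal = CodeExecutorAgent(name="ComputerTerminal", code_executor=executor)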

coder = AssistantAgent(
    name="Coder",
    model_client=orchestrator_client,
    system_message="You are a GRC automation specialist. Write Python to transform audit data."
)
```
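
To make the forensic metadata idea concrete, here is a minimal standard-library sketch of hashing a screenshot at capture time and re-checking it later. The `ScreenshotEvidence` record and both helper functions are hypothetical names introduced for illustration; this is not the `FaraWebSurfer` implementation, and it covers only a subset of the fields listed in the comment above.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScreenshotEvidence:
    path: str
    url: str
    sha256: str
    captured_at: str  # ISO 8601 timestamp in UTC


def record_screenshot(image_bytes: bytes, path: str, url: str, now=None) -> ScreenshotEvidence:
    """Hash the image bytes and stamp the capture time (timezone-aware `now`)."""
    now = now or datetime.now(timezone.utc)
    return ScreenshotEvidence(
        path=path,
        url=url,
        sha256=hashlib.sha256(image_bytes).hexdigest(),
        captured_at=now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )


def verify_screenshot(image_bytes: bytes, evidence: ScreenshotEvidence) -> bool:
    """Return True only if the bytes still match the recorded digest."""
    return hashlib.sha256(image_bytes).hexdigest() == evidence.sha256
```

Recording the digest at capture time lets a reviewer later confirm that the stored image is byte-for-byte the one the agent saw.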

---

## 5. The Orchestrator: Two-Ledger System in Detail

The Orchestrator is FARA-GRC's "brain", coordinating agents and maintaining state. Unlike simple routers, it uses the **Two-Ledger System** from Magentic-One (Fourney et al., 2024).

### Task Ledger (Outer Loop): Strategic Memory

```python
task_ledger = {
    "verified_facts": [
        "MFA is enabled for the 'All Users' group",
        "Break-glass account 'admin-emergency@contoso.com' exists"
    ],
    "facts_to_look_up": [
        "Is break-glass account exempt from MFA?",
        "When was MFA policy last modified?"
    ],
    "educated_guesses": [
        "Break-glass accounts are often exempted for recovery scenarios"
    ],
    "plan": [
        "Step 1: Navigate to Azure AD > Conditional Access",
        "Step 2: Find MFA policy and check exclusions",
        "Step 3: Capture evidence screenshot",
        "Step 4: Generate compliance report"
    ]
}
```
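
As the audit proceeds, open questions move into the verified facts. One way this update might look is sketched below; `resolve_fact` is a hypothetical helper, and the example fact is illustrative rather than a real finding. It returns a new ledger instead of mutating the old one, so earlier ledger states remain available for the audit trail.

```python
import copy


def resolve_fact(ledger: dict, question: str, fact: str) -> dict:
    """Return a copy of the ledger with `question` closed and `fact` recorded."""
    if question not in ledger["facts_to_look_up"]:
        raise KeyError(f"not an open question: {question!r}")
    updated = copy.deepcopy(ledger)
    updated["facts_to_look_up"].remove(question)
    updated["verified_facts"].append(fact)
    return updated


# Trimmed-down ledger with the same keys as the example above.
ledger = {
    "verified_facts": ["MFA is enabled for the 'All Users' group"],
    "facts_to_look_up": ["Is break-glass account exempt from MFA?"],
    "educated_guesses": [],
    "plan": [],
}
updated = resolve_fact(
    ledger,
    "Is break-glass account exempt from MFA?",
    "Break-glass account is excluded from the MFA policy",  # illustrative finding
)
```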

### Progress Ledger (Inner Loop): Tactical State

```python
progress_ledger = {
    "is_request_satisfied": False,
    "is_team_in_loop": False,
    "is_progress_being_made": True,
    "next_speaker": "FaraWebSurfer",
    "instruction_for_next_speaker": "Navigate to https://security.microsoft.com and click on 'Conditional Access'",
    "stall_count": 0  # Incremented if no progress; triggers re-plan at > 2
}
```

### The Stall Counter Mechanism

> *"If a loop is detected, or there is a lack of forward progress, the counter is incremented. As long as this counter remains below a threshold (≤ 2 in our experiments), the Orchestrator initiates the next team action... However, if the counter exceeds the threshold, the Orchestrator breaks from the inner loop, and proceeds with another iteration of the outer loop."* (Fourney et al., 2024, §3.1)

```python
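def check_stall(progress_ledger: dict, threshold: int = 2) -> bool:
    """Update the stall counter in place; return True if we should re-plan."""
    stalled = (
        progress_ledger["is_team_in_loop"]
        or not progress_ledger["is_progress_being_made"]
    )
    if stalled:
        progress_ledger["stall_count"] += 1  # Loop detected or no forward progress
    else:
        progress_ledger["stall_count"] = 0  # Reset on progress

    # Break to the outer loop only once the counter exceeds the threshold
    return progress_ledger["stall_count"] > threshold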

---

## 6. Example: Executing an MFA Compliance Audit

A typical FARA-GRC workflow mirrors the Magentic-One pattern but adds forensic evidence capture.

**Example Task**: *"Verify that MFA is enabled for all M365 users and generate an audit report."*

### Workflow Diagram

```mermaid
sequenceDiagram
    participant User
    participant Orchestrator
    participant FaraWebSurfer
    participant FileSurfer
    participant Coder
    participant ApprovalGuard

    User->>Orchestrator: "Audit MFA compliance"
    Orchestrator->>Orchestrator: Create Task Ledger & Plan
    
    loop Inner Loop (until stall or complete)
        Orchestrator->>FaraWebSurfer: Navigate to M365 Security Center
        FaraWebSurfer->>FaraWebSurfer: Capture screenshot + metadata
        FaraWebSurfer->>Orchestrator: "Found Conditional Access page"
        
        Orchestrator->>FaraWebSurfer: Click on MFA policy
        FaraWebSurfer->>ApprovalGuard: Check if action needs approval
        ApprovalGuard-->>FaraWebSurfer: Auto-approved (read-only)
        FaraWebSurfer->>Orchestrator: "MFA policy details captured"
    end
    
    Orchestrator->>Coder: Generate compliance report
    Coder->>Orchestrator: Report generated
    
    Orchestrator->>ApprovalGuard: Request human sign-off
    ApprovalGuard->>User: "Review audit findings?"
    User->>ApprovalGuard: Approved
    
    Orchestrator->>User: Final report with evidence package
```
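
The approval branch in the diagram (read-only actions pass automatically, everything else waits for a human) could be sketched as follows. This is not the API of `approval_guard.py`; `ApprovalDecision`, `review_action`, and the `ask_human` callback are hypothetical names used only to illustrate the decision.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ApprovalDecision:
    action: str
    approved: bool
    approved_by: Optional[str]  # "auto" for read-only actions, else the reviewer's ID


def review_action(
    action: str,
    read_only: bool,
    ask_human: Callable[[str], Optional[str]],
) -> ApprovalDecision:
    """Auto-approve read-only actions; defer everything else to a reviewer."""
    if read_only:
        return ApprovalDecision(action, True, "auto")
    reviewer = ask_human(action)  # returns the reviewer's ID, or None if rejected
    if reviewer is None:
        return ApprovalDecision(action, False, None)
    return ApprovalDecision(action, True, reviewer)


view = review_action("open_mfa_policy", read_only=True, ask_human=lambda a: None)
signoff = review_action("final_report", read_only=False, ask_human=lambda a: "user@contoso.com")
```

Returning a decision record, rather than a bare boolean, means every approval can be written to the chain-of-custody log with the identity of whoever granted it.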

### Evidence Package Output

```json
{
  "audit_id": "mfa-audit-2024-001",
  "timestamp": "2024-12-27T15:30:00Z",
  "findings": {
    "mfa_enabled": true,
    "user_coverage": "98.5%",
    "exclusions": ["admin-emergency@contoso.com"]
  },
  "evidence": [
    {
      "screenshot": "evidence/mfa-policy-main.png",
      "sha256": "a3f2b1c4d5e6...",
      "captured_at": "2024-12-27T15:28:12Z",
      "url": "https://security.microsoft.com/conditionalaccess/policy/abc123"
    }
  ],
  "chain_of_custody": {
    "auditor_id": "user@contoso.com",
    "session_id": "sess-7f8e9d0c",
    "approvals": [
      {"action": "final_report", "approved_by": "user@contoso.com", "at": "2024-12-27T15:32:00Z"}
    ]
  }
}
```
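
One common way to make a chain-of-custody log tamper-evident is to link each entry to the hash of the previous one. The sketch below illustrates that technique with the standard library only; it is an assumption about how such a log could be built, not a description of how FARA-GRC stores its approvals, and `append_entry` and `verify_log` are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(entry: dict, prev_hash: str) -> str:
    # Canonical JSON so the same entry always produces the same digest.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, entry: dict) -> None:
    """Append an entry whose hash covers both its content and its predecessor."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"entry": entry, "prev_hash": prev_hash, "hash": _entry_hash(entry, prev_hash)})


def verify_log(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _entry_hash(record["entry"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True


log = []
append_entry(log, {"action": "screenshot", "by": "FaraWebSurfer", "at": "2024-12-27T15:28:12Z"})
append_entry(log, {"action": "final_report", "by": "user@contoso.com", "at": "2024-12-27T15:32:00Z"})
```

Because each hash depends on everything before it, altering an early entry invalidates every later link, so a single verification pass detects the change.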

---

## 7. Forensic Failure Classification (From Research)

The Magentic-One paper identified systematic failure modes through automated log analysis. FARA-GRC uses these codes for **forensic failure classification**:

| Code | Definition | FARA-GRC Mitigation |
|------|------------|---------------------|
| `persistent-inefficient-actions` | Looping on failed strategies | **Stall Counter** triggers re-planning once the stall count exceeds 2 |
| `insufficient-verification-steps` | Marking complete without proof | **ApprovalGuard** requires evidence screenshot before sign-off |
| `inefficient-navigation-attempts` | Getting lost in UI | **OmniParser** for structured M365 navigation |
| `underutilized-resource-options` | Ignoring available tools | **Task Ledger** tracks all available agents/tools |
| `access-and-security-barriers` | Blocked by auth | **Authentik SSO** integration for delegated credentials |

> 📖 **Citation**: These error codes are derived from §4.2 of Fourney et al. (2024), where logs from 7 model configurations were analyzed across GAIA, AssistantBench, and WebArena benchmarks.
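
In code, the table above might be represented as an enumeration plus a lookup, so that log entries can only carry known codes. The class `FailureCode`, the mapping `MITIGATIONS`, and the helper `tag_failure` are hypothetical names for this sketch; the code strings and mitigations are copied from the table.

```python
from enum import Enum


class FailureCode(str, Enum):
    PERSISTENT_INEFFICIENT_ACTIONS = "persistent-inefficient-actions"
    INSUFFICIENT_VERIFICATION_STEPS = "insufficient-verification-steps"
    INEFFICIENT_NAVIGATION_ATTEMPTS = "inefficient-navigation-attempts"
    UNDERUTILIZED_RESOURCE_OPTIONS = "underutilized-resource-options"
    ACCESS_AND_SECURITY_BARRIERS = "access-and-security-barriers"


MITIGATIONS = {
    FailureCode.PERSISTENT_INEFFICIENT_ACTIONS: "Stall Counter",
    FailureCode.INSUFFICIENT_VERIFICATION_STEPS: "ApprovalGuard",
    FailureCode.INEFFICIENT_NAVIGATION_ATTEMPTS: "OmniParser",
    FailureCode.UNDERUTILIZED_RESOURCE_OPTIONS: "Task Ledger",
    FailureCode.ACCESS_AND_SECURITY_BARRIERS: "Authentik SSO",
}


def tag_failure(code: str, detail: str) -> dict:
    """Build a log record for a known code; unknown codes raise ValueError."""
    failure = FailureCode(code)
    return {"code": failure.value, "mitigation": MITIGATIONS[failure], "detail": detail}


record = tag_failure("access-and-security-barriers", "Sign-in page blocked the session")
```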

---

## 8. Evaluation: FARA-GRC Performance Targets

Building on Magentic-One's benchmark results, FARA-GRC targets **M365-specific audit accuracy**:

| Benchmark | Magentic-One (Paper) | FARA-GRC Target | Notes |
|-----------|----------------------|-----------------|-------|
| GAIA | 38.0% | N/A | General AI assistant benchmark |
| WebArena | 32.8% | N/A | General web navigation |
| AssistantBench | 27.7% | N/A | Time-consuming web tasks |
| **M365 MFA Audit** | N/A | **95%+** | Primary use case |
| **Conditional Access Verify** | N/A | **90%+** | Policy compliance |

> **📊 Why Higher Targets?** FARA-GRC operates on **structured enterprise UIs** (M365 Admin Center), not the open web. OmniParser achieves 85-95% accuracy on such UIs, far exceeding general web navigation benchmarks.
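
Checking a batch of audit runs against these targets is simple arithmetic. The sketch below reads "95%+" and "90%+" as "at least 0.95" and "at least 0.90"; the function `score_runs` is hypothetical, and the outcome counts are made-up illustrative data, not measured results.

```python
# Targets from the table above, expressed as fractions.
TARGETS = {"M365 MFA Audit": 0.95, "Conditional Access Verify": 0.90}


def score_runs(outcomes: dict) -> dict:
    """Compute per-task accuracy and compare it with the stated target."""
    report = {}
    for task, results in outcomes.items():
        if not results:
            raise ValueError(f"no runs recorded for {task!r}")
        accuracy = sum(results) / len(results)
        report[task] = {"accuracy": accuracy, "meets_target": accuracy >= TARGETS[task]}
    return report


# Hypothetical outcomes: True means the audit verdict matched the ground truth.
outcomes = {
    "M365 MFA Audit": [True] * 96 + [False] * 4,
    "Conditional Access Verify": [True] * 44 + [False] * 6,
}
report = score_runs(outcomes)
```

In this made-up batch the MFA audit clears its target (96 of 100 correct) while the Conditional Access check does not (44 of 50, or 88%).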

---

## 9. Next Steps: Implementing FARA-GRC

To build on this guide:

1. **Read the Full Architecture**: [m365-forensic-audit-system-design.md](../../m365-forensic-audit-system-design.md)
2. **Try the Browser Tutorial**: [web_agent_tutorial_full.ipynb](tutorials/web_agent_tutorial_full.ipynb) (learn the foundations step by step)
3. **Explore the Codebase**:
   - [approval_guard.py](../../src/magentic_ui/approval_guard.py): Human-in-the-loop safety
   - [teams/orchestrator/_orchestrator.py](../../src/magentic_ui/teams/orchestrator/_orchestrator.py): Two-Ledger implementation
   - [agents/web_surfer/fara/](../../src/magentic_ui/agents/web_surfer/fara/): Forensic browser agent

> **📖 Citation**: Fourney, A., Bansal, G., Mozannar, H., et al. (2024). *Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks*. Microsoft Research AI Frontiers. arXiv:2411.04468