# EU AI Act RAG Demo - Compliance Assistant

This notebook demonstrates **Retrieval-Augmented Generation (RAG)** for EU AI Act compliance queries.

## Architecture

```
User Question → Llama Stack Agent → Milvus Vector Search → Retrieved Context → Mistral 24B → Grounded Answer
```
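
The flow above can be sketched in plain Python. This is a minimal illustration, assuming two hypothetical callables, `retrieve` and `generate`, that stand in for the Milvus search and the Mistral 24B call. The function names, the prompt wording, and the stubs are illustrative only; in this notebook the Llama Stack Agent performs these steps internally.

```python
from typing import Callable, List


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the question, then ask the model to answer from it."""
    chunks = retrieve(question)           # vector search step
    context = "\n\n".join(chunks)         # retrieved context
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)               # grounded answer


# Hypothetical stand-ins so the sketch runs without a cluster.
def fake_retrieve(question: str) -> List[str]:
    return ["(placeholder chunk of EU AI Act text)"]


def fake_generate(prompt: str) -> str:
    # Echoes the prompt so the assembled text is visible.
    return prompt


print(answer_with_rag("Is CV screening high-risk?", fake_retrieve, fake_generate))
```

Passing the retriever and the generator as arguments keeps the sketch independent of any particular vector store or model server.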

## Key Features

- **Baseline vs. RAG Comparison**: See the difference between generic LLM responses and RAG-enhanced answers
- **Source Attribution**: Every answer includes precise citations (Article, Page, Document)
- **Compliance Use Case**: Demonstrates AI for legal/regulatory scenarios
- **Universal Relevance**: The EU AI Act applies to any AI system placed on the market or used in the EU, regardless of where the provider is based
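
Citations in a fixed format can be checked mechanically. The sketch below assumes answers follow the `[OJ p.X, Art.Y]` pattern requested in the agent instructions later in this notebook, with numeric page and article values. The helper name `extract_citations` and the sample sentence, including its page numbers, are made up for illustration.

```python
import re
from typing import List, Tuple

# Matches citations such as "[OJ p.51, Art.5]" (format assumed from the agent instructions).
CITATION_RE = re.compile(r"\[OJ p\.(\d+), Art\.(\d+)\]")


def extract_citations(answer: str) -> List[Tuple[int, int]]:
    """Return (page, article) pairs for every citation found in the answer."""
    return [(int(page), int(article)) for page, article in CITATION_RE.findall(answer)]


# Invented sample text; the page numbers are not taken from the real document.
sample = "Some practices are prohibited [OJ p.51, Art.5]. Others are high-risk [OJ p.53, Art.6]."
citations = extract_citations(sample)
print(citations)
print("has citations:", bool(citations))
```

An answer with no matches is a hint that the model ignored the citation format or answered without retrieved context.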


## Setup


In [None]:
# Install required packages
!pip install -q llama-stack-client openai


In [None]:
import os
import uuid
from openai import OpenAI
from llama_stack_client import LlamaStackClient, Agent, AgentEventLogger

# Configuration
MISTRAL_URL = "https://mistral-24b-quantized-private-ai-demo.apps.cluster-n8cnx.n8cnx.sandbox2830.opentlc.com/v1"
LLAMASTACK_URL = "http://llama-stack-service.private-ai-demo.svc:8321"
MODEL_ID = "mistral-24b-quantized"

# Initialize clients
vllm_client = OpenAI(base_url=MISTRAL_URL, api_key="dummy")
llama_client = LlamaStackClient(base_url=LLAMASTACK_URL)

print("✅ Clients initialized")
print(f"   Direct vLLM: {MISTRAL_URL}")
print(f"   Llama Stack: {LLAMASTACK_URL}")


## Test Questions

We'll test 4 types of queries that showcase RAG's value for compliance scenarios.


In [None]:
# Test questions covering different aspects of the EU AI Act
QUESTIONS = {
    "prohibited": "According to Article 5 of the EU AI Act, what AI practices are explicitly prohibited? List only the distinct categories found in the retrieved context.",
    "high_risk": "Is an AI-powered CV screening tool for hiring considered high-risk under the EU AI Act? Why or why not?",
    "timeline": "When do the main obligations of the EU AI Act come into force? What are the key dates?",
    "gpai": "What are the specific obligations for General Purpose AI (GPAI) models under the EU AI Act?"
}

# We'll use the "high_risk" question for the main demo
DEMO_QUESTION = QUESTIONS["high_risk"]

print("📋 Test Questions:")
for key, question in QUESTIONS.items():
    print(f"\n{key.upper()}:")
    print(f"  {question}")

## Scenario 1: Baseline (No RAG)

First, let's ask the model **without RAG**. The model will respond based only on its training data.


In [None]:
print("="*70)
print("BASELINE RESPONSE (No RAG - Direct vLLM)")
print("="*70)
print()
print(f"Question: {DEMO_QUESTION}")
print()

# Call vLLM directly (no RAG)
response_baseline = vllm_client.chat.completions.create(
    model=MODEL_ID,
    messages=[{"role": "user", "content": DEMO_QUESTION}],
    max_tokens=500
)

baseline_answer = response_baseline.choices[0].message.content
print(baseline_answer)
print()
print("="*70)


### ⚠️ Issues with Baseline Response:

- Generic, and possibly outdated, because it relies on training data alone
- No specific citations
- No reference to the actual EU AI Act text
- Vague recommendations


## Scenario 2: RAG (With Llama Stack Agent)

Now let's use **RAG with the Llama Stack Agent**. The agent will:
1. Search for relevant EU AI Act content in Milvus
2. Retrieve precise articles and sections
3. Generate an answer grounded in the actual legal text
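
Step 1 is a nearest-neighbour search over embeddings. The toy sketch below ranks chunks by cosine similarity in pure Python. It is an assumption-laden illustration: the two-dimensional vectors are invented, the helper names are hypothetical, and the similarity metric and embedding model actually configured in Milvus are not stated in this notebook.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the text of the k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Invented vectors; a real embedding model produces hundreds of dimensions.
toy_chunks = [
    ("chunk about hiring systems", [1.0, 0.0]),
    ("chunk about timelines", [0.0, 1.0]),
    ("chunk about employment", [0.9, 0.1]),
]
print(top_k([1.0, 0.0], toy_chunks, k=2))
```

A vector database does the same ranking with approximate indexes so that it scales beyond a linear scan.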


In [None]:
print("="*70)
print("RAG RESPONSE (With Vector Retrieval - Llama Stack Agent)")
print("="*70)
print()
print(f"Question: {DEMO_QUESTION}")
print()

try:
    # Create RAG agent using high-level API (Red Hat pattern)
    rag_agent = Agent(
        llama_client,
        model=MODEL_ID,  # Note: "model" not "model_id" for Agent class!
        instructions=(
            "You are an EU AI Act compliance assistant. "
            "Answer questions using ONLY information from the retrieved EU AI Act documents. "
            "\n"
            "For list-based questions about prohibited practices or similar:\n"
            "- State how many items the retrieved context contains\n"
            "- List each distinct item ONCE with [OJ p.X, Art.Y] citation\n"
            "- If items look similar, they are likely the same - list only once\n"
            "- STOP immediately after listing all distinct items\n"
            "\n"
            "For analytical questions (e.g., 'is X considered high-risk'):\n"
            "- Provide a clear, complete explanation\n"
            "- Reference specific Articles and Annexes with citations\n"
            "- Explain the reasoning and criteria\n"
            "\n"
            "If information is not in the sources, say 'Not found in sources.'"
        ),
        tools=[
            {
                "name": "builtin::rag/knowledge_search",
                "args": {"vector_db_ids": ["rag_documents"]},
            }
        ],
    )
    
    print("✅ Agent created")
    print()
    
    # Create session and query
    session_id = rag_agent.create_session(session_name=f"eu-ai-act-demo-{uuid.uuid4().hex[:8]}")
    print(f"✅ Session created: {session_id}")
    print()
    
    # Ask question with RAG (streaming)
    response = rag_agent.create_turn(
        messages=[{"role": "user", "content": DEMO_QUESTION}],
        session_id=session_id,
        stream=True,
    )
    
    # Capture and log the response
    rag_answer = ""
    for log in AgentEventLogger().log(response):
        log.print()
        # Capture all content chunks (streaming tokens)
        if hasattr(log, 'content') and log.content:
            rag_answer += log.content
    
    print()
    
except Exception as e:
    print(f"❌ Error: {e}")
    print("\nℹ️  Note: RAG requires documents to be ingested into Milvus.")
    print("   Run the Tekton pipeline to process documents first.")
    rag_answer = None

print("="*70)


## Side-by-Side Comparison


In [None]:
import textwrap

def print_comparison(baseline, rag):
    print("="*140)
    print(f"{'BASELINE (No RAG)':^70} | {'RAG (With Retrieval)':^70}")
    print("="*140)
    
    baseline_lines = textwrap.wrap(baseline or "N/A", width=68)
    rag_lines = textwrap.wrap(rag or "N/A", width=68)
    
    max_lines = max(len(baseline_lines), len(rag_lines))
    
    for i in range(max_lines):
        baseline_line = baseline_lines[i] if i < len(baseline_lines) else ""
        rag_line = rag_lines[i] if i < len(rag_lines) else ""
        print(f"{baseline_line:68} | {rag_line:68}")
    
    print("="*140)

if rag_answer:
    print_comparison(baseline_answer, rag_answer)
else:
    print("⚠️  RAG answer not available - documents may need to be ingested")


## Test All Questions

Let's test all 4 question types to see RAG's performance across different scenarios.


In [None]:
def test_question_with_rag(question, question_type):
    """Test a question with RAG"""
    print(f"\n{'='*70}")
    print(f"TEST: {question_type.upper()}")
    print(f"{'='*70}")
    print(f"\nQuestion: {question}")
    print()
    
    try:
        # Create new session for each question using the Agent class
        new_session_id = rag_agent.create_session(session_name=f"test-{question_type}-{uuid.uuid4().hex[:8]}")
        
        # Ask question with streaming
        response = rag_agent.create_turn(
            messages=[{"role": "user", "content": question}],
            session_id=new_session_id,
            stream=True,
        )
        
        # Capture and log the response
        answer = ""
        for log in AgentEventLogger().log(response):
            log.print()
            # Capture all content chunks (streaming tokens)
            if hasattr(log, 'content') and log.content:
                answer += log.content
        
        print()
        return answer
        
    except Exception as e:
        print(f"❌ Error: {e}")
        import traceback
        traceback.print_exc()
        return None

# Test all questions if agent is available
if rag_answer and 'rag_agent' in locals():
    results = {}
    for q_type, question in QUESTIONS.items():
        if q_type != "high_risk":  # Already tested
            results[q_type] = test_question_with_rag(question, q_type)
    
    print("\n✅ All questions tested!")
else:
    print("⚠️  Skipping additional tests - agent not available")


## Business Value: Compliance Assistant

### Why This Matters for Enterprises

**1. Compliance Cost Reduction**
- ❌ Without RAG: Manual legal review, expensive consultants, weeks of research
- ✅ With RAG: Instant, accurate compliance guidance with citations

**2. Risk Mitigation**
- ❌ Without RAG: Generic answers, potential non-compliance, fines up to €35M
- ✅ With RAG: Precise, cited answers grounded in actual legal text

**3. Speed to Market**
- ❌ Without RAG: Weeks to understand requirements
- ✅ With RAG: Minutes to get actionable guidance

**4. Global Relevance**
- EU AI Act applies to ANY AI system deployed in the EU
- Affects US, Asian, and global companies
- Penalties: Up to 7% of global annual turnover
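
The top penalty tier combines the two figures quoted in this section: for the most serious infringements the cap is €35M or 7% of worldwide annual turnover, whichever is higher. The sketch below works through that arithmetic. The function name and the turnover values are hypothetical, and lower tiers and the special rules for SMEs are deliberately left out.

```python
FIXED_CAP_EUR = 35_000_000   # fixed amount for the top penalty tier
TURNOVER_PERCENT = 7         # percentage of worldwide annual turnover


def max_fine_eur(annual_turnover_eur: int) -> int:
    """Upper bound of the top-tier fine: the higher of the fixed cap and 7% of turnover."""
    turnover_based = annual_turnover_eur * TURNOVER_PERCENT // 100
    return max(FIXED_CAP_EUR, turnover_based)


# Hypothetical turnover figures, in euros.
for turnover in (100_000_000, 1_000_000_000):
    print(f"turnover €{turnover:,} -> cap €{max_fine_eur(turnover):,}")
```

With these inputs the fixed €35M cap dominates for the smaller company, while the 7% rule dominates once turnover exceeds €500M.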

### Use Cases Beyond EU AI Act

This same architecture works for:
- **Legal**: Contract analysis, case law research
- **Healthcare**: Clinical guidelines, drug interactions
- **Finance**: Regulatory compliance (GDPR, MiFID II, etc.)
- **Manufacturing**: Safety standards, quality procedures
- **Any domain** with complex documentation requirements


## Red Hat AI Four Pillars

This demo showcases all four pillars:

### 1. ✅ Efficient Inferencing
- Quantized Mistral 24B (4-bit compression)
- 50%+ cost savings vs. full precision
- Tool calling for efficient RAG retrieval
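
A rough way to see where the savings come from is to estimate weight memory at different precisions. The sketch below assumes exactly 24 billion parameters and counts weights only; it ignores the KV cache, activations, and the per-group scale factors that quantization formats store, so real footprints are somewhat larger. It says nothing directly about cost, which also depends on hardware and throughput.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 24_000_000_000  # assumed round figure for a "24B" model

fp16_gb = weight_memory_gb(PARAMS, 16)
int4_gb = weight_memory_gb(PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f" 4-bit weights: {int4_gb:.0f} GB")
print(f"reduction: {1 - int4_gb / fp16_gb:.0%}")
```

Under these assumptions the weights shrink from 48 GB to 12 GB, which is what lets a 24B model fit on a single smaller GPU.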

### 2. ✅ Simplified Data Connection
- **Docling**: AI-powered document processing (tables, equations, annexes)
- **Tekton Pipeline**: Automated ingestion workflow
- **Milvus**: Enterprise vector database
- **Llama Stack**: Unified RAG runtime

### 3. ✅ Hybrid Cloud Flexibility
- All data on-premise (sovereign AI)
- Multi-tenant architecture (`ai-infrastructure` shared services)
- GitOps-managed deployments
- Air-gap ready

### 4. ✅ Agentic AI Delivery
- Llama Stack Agent with tool calling
- Autonomous retrieval and generation
- Foundation for Stage 3 (MCP servers)


## Summary

**What We Demonstrated:**
- ✅ RAG significantly improves answer quality for compliance queries
- ✅ Source attribution builds trust and auditability
- ✅ Automated document processing (Tekton pipeline)
- ✅ Enterprise-grade architecture (Llama Stack, Milvus, vLLM)
- ✅ Red Hat AI Four Pillars in action

**Key Takeaway:**  
RAG transforms LLMs from generic chatbots into **precise, auditable compliance assistants** that can save enterprises millions in legal review costs while mitigating regulatory risk.

---

**Status**: ✅ Production Ready  
**Demo Time**: 10-15 minutes  
**Audience**: Legal, Compliance, Risk Management, Enterprise AI
