# 03: Building the Filter Agent

**Duration:** 1 hour

**What You'll Learn:**
- What an "agent" actually is (spoiler: simpler than you think)
- Building a classification agent with reasoning
- Prompt engineering for consistent results
- Testing and evaluating agent performance

**What We're Building:**
An agent that reads tender descriptions and decides if they're relevant for our tech company. This is Agent #1 in our procurement pipeline.

---

## What is an Agent?

An "agent" sounds fancy, but it's really just:

```python
def agent(input_data):
    prompt = build_prompt(input_data)
    response = call_llm(prompt)
    return parse_response(response)
```

That's it! An agent is a function that:
1. Takes structured input
2. Builds a prompt
3. Calls an LLM
4. Returns structured output

No magic. Just good engineering.
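
To make the skeleton concrete, here is a minimal runnable sketch. `fake_llm` is a hypothetical stand-in that returns canned JSON, so no model server is needed. `build_prompt` and `parse_response` are illustrative helpers, not the versions built later in this notebook.

```python
import json

def build_prompt(input_data: dict) -> str:
    """Turn structured input into the text the model will see."""
    return f"Is this tender relevant? Title: {input_data['title']}"

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: returns canned JSON."""
    relevant = "software" in prompt.lower()
    return json.dumps({"is_relevant": relevant})

def parse_response(response: str) -> dict:
    """Turn the model's raw text back into structured output."""
    return json.loads(response)

def agent(input_data: dict) -> dict:
    prompt = build_prompt(input_data)
    response = fake_llm(prompt)
    return parse_response(response)

print(agent({"title": "Custom software development"}))
```

Swapping `fake_llm` for a real call changes nothing else in `agent`, which is the point of keeping the three steps separate.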

## Step 1: Setup

We'll use the structured output patterns from notebook 02.

In [None]:
!pip install httpx pydantic

In [None]:
import httpx
import json
import asyncio
from typing import List, Type, TypeVar
from pydantic import BaseModel, Field, ValidationError
from enum import Enum

# LLM configuration
BASE_URL = "http://localhost:1234/v1"
MODEL = "local-model"

print("✓ Imports ready")

## Step 2: Define Input and Output Models

Good agents have clear contracts. Let's define what goes in and what comes out.

In [None]:
# INPUT: Tender data
class Tender(BaseModel):
    """A procurement tender to analyze"""
    id: str
    title: str
    description: str
    organization: str
    deadline: str
    estimated_value: str | None = None

# OUTPUT: Filter result
class TenderCategory(str, Enum):
    """Possible tender categories"""
    CYBERSECURITY = "cybersecurity"
    AI = "ai"
    SOFTWARE = "software"
    OTHER = "other"

class FilterResult(BaseModel):
    """Structured output from filter agent"""
    is_relevant: bool = Field(description="Is this tender relevant?")
    confidence: float = Field(description="Confidence 0-1", ge=0, le=1)
    categories: List[TenderCategory] = Field(description="Detected categories")
    reasoning: str = Field(description="Explanation for decision")

print("✓ Models defined")
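
With the enum as defined, a category string outside the four values fails validation, and `call_llm` then treats it like any other error and retries. If you would rather map unknown labels to `OTHER`, a small normalizing helper is one option. This is a sketch: `parse_category` is a hypothetical name, and the enum is repeated only so the block runs on its own.

```python
from enum import Enum

class TenderCategory(str, Enum):
    """Same values as the enum defined above, repeated so this block runs alone."""
    CYBERSECURITY = "cybersecurity"
    AI = "ai"
    SOFTWARE = "software"
    OTHER = "other"

def parse_category(raw: str) -> TenderCategory:
    """Hypothetical helper: map a raw model string to a category, defaulting to OTHER."""
    try:
        # Enum lookup by value; normalize case and whitespace first
        return TenderCategory(raw.strip().lower())
    except ValueError:
        return TenderCategory.OTHER

print(parse_category(" AI ").value)
print(parse_category("blockchain").value)
```

Whether a silent fallback or a retry is better depends on how much you trust the model's labels; the notebook's strict behaviour is the safer default.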

## Step 3: Build the LLM Service

We'll reuse the structured output function from notebook 02, with improvements.
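
The function in the cell below retries failed calls with a fixed one-second pause. The retry pattern in isolation looks roughly like this. It is a sketch under assumptions: `with_retries` and `flaky` are hypothetical names, and the exponential backoff is a variation, not what the notebook's code does.

```python
import asyncio

async def with_retries(make_call, max_retries: int = 3, base_delay: float = 0.01):
    """Await make_call() up to max_retries times, doubling the pause after each failure."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return await make_call()
        except Exception as e:  # broad on purpose for the sketch
            last_error = e
            if attempt < max_retries - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Failed after {max_retries} attempts") from last_error

calls = {"count": 0}

async def flaky():
    """Hypothetical call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("malformed response")
    return "ok"

result = asyncio.run(with_retries(flaky))
print(result, calls["count"])
```

Inside a notebook, where an event loop is already running, use `await with_retries(flaky)` instead of `asyncio.run(...)`.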

In [None]:
T = TypeVar('T', bound=BaseModel)

def build_structured_prompt(prompt: str, model_class: Type[BaseModel]) -> str:
    """Add schema to prompt"""
    schema = model_class.model_json_schema()
    
    return f"""{prompt}

CRITICAL: Respond with ONLY valid JSON matching this schema:
{json.dumps(schema, indent=2)}

Do not include markdown, code blocks, or explanatory text.
Return ONLY the raw JSON object.
"""

async def call_llm(
    prompt: str,
    response_model: Type[T],
    system_prompt: str,
    temperature: float = 0.1,
    max_retries: int = 3
) -> T:
    """Call LLM with structured output and retries"""
    
    full_prompt = build_structured_prompt(prompt, response_model)
    
    for attempt in range(max_retries):
        try:
            async with httpx.AsyncClient(timeout=60.0) as client:
                response = await client.post(
                    f"{BASE_URL}/chat/completions",
                    json={
                        "model": MODEL,
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": full_prompt}
                        ],
                        "temperature": temperature,
                    },
                )
                
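                # Surface HTTP errors (4xx/5xx) here instead of failing later on a missing key
                response.raise_for_status()
                result = response.json()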
                content = result["choices"][0]["message"]["content"]
                
                # Clean response
                content = content.strip()
                if content.startswith("```json"):
                    content = content[7:]
                if content.startswith("```"):
                    content = content[3:]
                if content.endswith("```"):
                    content = content[:-3]
                content = content.strip()
                
                # Parse and validate
                data = json.loads(content)
                return response_model.model_validate(data)
                
        except Exception as e:
            if attempt < max_retries - 1:
                print(f"  ⚠ Attempt {attempt + 1} failed ({e}), retrying...")
                await asyncio.sleep(1)
            else:
                # Chain the original error so the traceback shows the root cause
                raise RuntimeError(f"Failed after {max_retries} attempts: {e}") from e

print("✓ LLM service ready")
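
The cleaning step in `call_llm` only handles code fences at the very start and end of the reply. Models sometimes add a sentence before or after the JSON, so a more tolerant approach is to take the span from the first `{` to the last `}`. A minimal sketch, where `extract_json` is a hypothetical helper and the sample reply is made up:

```python
import json

def extract_json(text: str) -> dict:
    """Parse the span from the first '{' to the last '}', ignoring text around it."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])

# Made-up reply with chatter and a fenced block around the JSON
raw = 'Sure! Here is the result:\n```json\n{"is_relevant": true, "confidence": 0.9}\n```'
parsed = extract_json(raw)
print(parsed)
```

This still fails if the surrounding chatter itself contains braces, so it complements the retry loop rather than replacing it.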

## Step 4: Build the Filter Agent

Now the core logic: a function that takes a tender and returns a filter result.

In [None]:
async def filter_tender(tender: Tender) -> FilterResult:
    """
    Filter Agent: Determines if a tender is relevant
    
    Key decisions:
    - Low temperature (0.1) for consistency
    - Clear criteria in prompt
    - Explicit system prompt for role-playing
    """
    
    # Build the prompt with clear criteria
    prompt = f"""Analyze this procurement tender:

TITLE: {tender.title}

DESCRIPTION: {tender.description}

ORGANIZATION: {tender.organization}

CRITERIA FOR RELEVANCE:
A tender is relevant if it involves:
1. Cybersecurity (threat detection, pentesting, security audits, SIEM, firewalls)
2. Artificial Intelligence/ML (AI solutions, automation, ML models, data science)
3. Software Development (custom software, web/mobile apps, SaaS, APIs)

A tender is NOT relevant if it's only:
- Hardware procurement (servers, computers, networking equipment)
- Physical infrastructure (buildings, cabling, facilities)
- Non-technical services (catering, cleaning, office supplies)

Analyze carefully and provide your assessment with reasoning.
"""
    
    system = """You are an expert procurement analyst specializing in technology tenders. 
You evaluate opportunities for a tech consultancy with expertise in cybersecurity, 
AI/ML, and software development. Be precise and conservative in your assessments."""
    
    return await call_llm(
        prompt=prompt,
        response_model=FilterResult,
        system_prompt=system,
        temperature=0.1  # Low temperature for consistent classification
    )

print("✓ Filter agent ready")
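
Because the prompt drives the result, it helps to be able to inspect it without calling a model. One way, sketched here, is to pull prompt construction into a pure function. `build_filter_prompt` is a hypothetical helper, and the criteria block is left out to keep the example short.

```python
def build_filter_prompt(title: str, description: str, organization: str) -> str:
    """Hypothetical helper: assemble the user prompt without calling a model."""
    return (
        "Analyze this procurement tender:\n\n"
        f"TITLE: {title}\n\n"
        f"DESCRIPTION: {description}\n\n"
        f"ORGANIZATION: {organization}\n\n"
        "Analyze carefully and provide your assessment with reasoning.\n"
    )

prompt = build_filter_prompt(
    title="Penetration Testing Services",
    description="Annual security assessment of web applications",
    organization="IT Security",
)
print(prompt)
```

A function like this can be unit-tested and diffed between prompt versions, which is harder when the f-string lives inside the async agent function.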

## Step 5: Test with Clear Cases

Let's start with obvious examples to verify basic functionality.

In [None]:
# TEST 1: Clearly relevant - AI + Cybersecurity
print("TEST 1: AI Cybersecurity System")
print("=" * 70)

tender1 = Tender(
    id="T001",
    title="AI-Powered Threat Detection and Response System",
    description="""Government agency requires development of machine learning-based 
    cybersecurity platform for real-time threat detection, automated incident response, 
    and integration with existing SIEM infrastructure. Must include AI model training 
    and continuous improvement capabilities.""",
    organization="National Cybersecurity Center",
    deadline="2024-12-31",
    estimated_value="$2.5M"
)

result1 = await filter_tender(tender1)
print(f"Relevant: {result1.is_relevant}")
print(f"Confidence: {result1.confidence:.2f}")
print(f"Categories: {[c.value for c in result1.categories]}")
print(f"Reasoning: {result1.reasoning}")
print()

In [None]:
# TEST 2: Clearly NOT relevant - Office furniture
print("TEST 2: Office Furniture")
print("=" * 70)

tender2 = Tender(
    id="T002",
    title="Office Furniture and Equipment Supply",
    description="""Supply 500 ergonomic office chairs, standing desks, filing cabinets, 
    and general office equipment for new government building. Installation and 
    3-year warranty required.""",
    organization="General Services Administration",
    deadline="2024-11-15",
    estimated_value="$750K"
)

result2 = await filter_tender(tender2)
print(f"Relevant: {result2.is_relevant}")
print(f"Confidence: {result2.confidence:.2f}")
print(f"Categories: {[c.value for c in result2.categories]}")
print(f"Reasoning: {result2.reasoning}")
print()

In [None]:
# TEST 3: Software development - clearly relevant
print("TEST 3: Custom Web Application")
print("=" * 70)

tender3 = Tender(
    id="T003",
    title="Custom Tax Portal Development",
    description="""Develop modern web application for tax filing and payment. 
    Must include secure user authentication, payment processing integration, 
    mobile responsive design, and REST API for third-party integrations.""",
    organization="Department of Revenue",
    deadline="2025-03-01",
    estimated_value="$1.2M"
)

result3 = await filter_tender(tender3)
print(f"Relevant: {result3.is_relevant}")
print(f"Confidence: {result3.confidence:.2f}")
print(f"Categories: {[c.value for c in result3.categories]}")
print(f"Reasoning: {result3.reasoning}")
print()

## Step 6: Test Edge Cases

The interesting part: ambiguous cases where the right answer isn't obvious.

In [None]:
# EDGE CASE 1: Hardware with software component
print("EDGE CASE 1: Hardware + Software")
print("=" * 70)

tender4 = Tender(
    id="T004",
    title="Firewall Hardware Procurement",
    description="""Purchase 50 enterprise-grade firewall appliances. 
    Vendor must provide installation, configuration, and integration with 
    existing network infrastructure. Includes built-in software license.""",
    organization="IT Security Department",
    deadline="2024-10-30",
    estimated_value="$500K"
)

result4 = await filter_tender(tender4)
print(f"Relevant: {result4.is_relevant}")
print(f"Confidence: {result4.confidence:.2f}")
print(f"Categories: {[c.value for c in result4.categories]}")
print(f"Reasoning: {result4.reasoning}")
print("\nExpected: NOT relevant (primarily hardware procurement)\n")

In [None]:
# EDGE CASE 2: Research/consulting vs development
print("EDGE CASE 2: AI Research Study")
print("=" * 70)

tender5 = Tender(
    id="T005",
    title="AI Feasibility Study and Recommendations",
    description="""Conduct research and provide recommendations on implementing 
    AI for fraud detection. Deliverable is a written report with strategic 
    recommendations. No software development required.""",
    organization="Financial Crimes Unit",
    deadline="2024-12-15",
    estimated_value="$150K"
)

result5 = await filter_tender(tender5)
print(f"Relevant: {result5.is_relevant}")
print(f"Confidence: {result5.confidence:.2f}")
print(f"Categories: {[c.value for c in result5.categories]}")
print(f"Reasoning: {result5.reasoning}")
print("\nNote: This is debatable - consulting vs development\n")

In [None]:
# EDGE CASE 3: Training vs development
print("EDGE CASE 3: Cybersecurity Training")
print("=" * 70)

tender6 = Tender(
    id="T006",
    title="Cybersecurity Awareness Training Program",
    description="""Provide comprehensive cybersecurity training for 500 employees. 
    Include phishing simulations, online courses, and quarterly workshops. 
    Training materials and learning management system required.""",
    organization="Human Resources Department",
    deadline="2024-11-30",
    estimated_value="$200K"
)

result6 = await filter_tender(tender6)
print(f"Relevant: {result6.is_relevant}")
print(f"Confidence: {result6.confidence:.2f}")
print(f"Categories: {[c.value for c in result6.categories]}")
print(f"Reasoning: {result6.reasoning}")
print("\nNote: Training services might or might not fit our business model\n")
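
Edge cases are also where repeated runs are most likely to disagree, even at low temperature. One way to check stability is to run `filter_tender` several times on the same tender and measure how often the decision matches the majority. A sketch of the measurement only; `agreement_rate` is a hypothetical helper and the decisions are illustrative values, not real model output.

```python
from collections import Counter

def agreement_rate(decisions: list[bool]) -> float:
    """Fraction of runs that match the most common decision."""
    if not decisions:
        raise ValueError("need at least one decision")
    most_common_count = Counter(decisions).most_common(1)[0][1]
    return most_common_count / len(decisions)

# Illustrative values only, not real model output
runs = [True, True, False, True, True]
print(f"Agreement: {agreement_rate(runs):.2f}")
```

In the notebook, `runs` could be collected with something like `[(await filter_tender(tender4)).is_relevant for _ in range(5)]`.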

## Step 7: Batch Testing

Let's test multiple tenders and see overall performance.

In [None]:
# Create a diverse batch of tenders
test_tenders = [
    Tender(
        id="BATCH-1",
        title="Machine Learning Model Development",
        description="Build and deploy predictive ML models for customer churn analysis",
        organization="Telecom Company",
        deadline="2024-12-01"
    ),
    Tender(
        id="BATCH-2",
        title="Janitorial Services Contract",
        description="Daily cleaning and maintenance services for office building",
        organization="Facilities Management",
        deadline="2024-11-01"
    ),
    Tender(
        id="BATCH-3",
        title="Penetration Testing Services",
        description="Annual security assessment and penetration testing of web applications",
        organization="IT Security",
        deadline="2024-10-15"
    ),
    Tender(
        id="BATCH-4",
        title="Vehicle Fleet Procurement",
        description="Purchase 20 vehicles for government fleet",
        organization="Transportation Department",
        deadline="2024-09-30"
    ),
    Tender(
        id="BATCH-5",
        title="Mobile App Development - iOS and Android",
        description="Develop citizen services mobile application with biometric authentication",
        organization="Digital Services",
        deadline="2025-01-15"
    ),
]

print("BATCH TESTING")
print("=" * 70)
print(f"Processing {len(test_tenders)} tenders...\n")

results = []
for tender in test_tenders:
    result = await filter_tender(tender)
    results.append((tender, result))
    
    status = "✓ RELEVANT" if result.is_relevant else "✗ NOT RELEVANT"
    print(f"{tender.id}: {status} (confidence: {result.confidence:.2f})")
    print(f"  Title: {tender.title[:60]}")
    print(f"  Categories: {[c.value for c in result.categories]}")
    print()

# Summary statistics
relevant_count = sum(1 for _, r in results if r.is_relevant)
avg_confidence = sum(r.confidence for _, r in results) / len(results)

print("\nSUMMARY:")
print(f"Relevant: {relevant_count}/{len(test_tenders)}")
print(f"Average confidence: {avg_confidence:.2f}")
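
The summary above counts how many tenders were marked relevant, but not whether those decisions were right. To evaluate the agent you need expected labels. The sketch below assumes labels for the five batch tenders (our reading of the criteria, not something the notebook defines) and uses placeholder predictions; `evaluate` is a hypothetical helper.

```python
def evaluate(expected: dict[str, bool], predicted: dict[str, bool]) -> dict[str, float]:
    """Compute accuracy, precision and recall for the 'relevant' class."""
    tp = sum(1 for k in expected if expected[k] and predicted[k])
    fp = sum(1 for k in expected if not expected[k] and predicted[k])
    fn = sum(1 for k in expected if expected[k] and not predicted[k])
    correct = sum(1 for k in expected if expected[k] == predicted[k])
    return {
        "accuracy": correct / len(expected),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Assumed labels for the batch above (not defined in the notebook)
expected = {"BATCH-1": True, "BATCH-2": False, "BATCH-3": True,
            "BATCH-4": False, "BATCH-5": True}
# Placeholder predictions for illustration; in the notebook use
# {t.id: r.is_relevant for t, r in results}
predicted = {"BATCH-1": True, "BATCH-2": False, "BATCH-3": True,
             "BATCH-4": False, "BATCH-5": False}

metrics = evaluate(expected, predicted)
print(metrics)
```

Recall matters most if missing a good tender is costly; precision matters most if reviewing irrelevant tenders is the bigger waste.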

## Step 8: Wrap It in a Class

Let's make it production-ready with a proper class structure.

In [None]:
class FilterAgent:
    """
    Production-ready Filter Agent
    
    Responsibilities:
    - Classify tenders by relevance
    - Provide confidence scores
    - Categorize by domain (AI, cybersecurity, software)
    - Explain reasoning
    """
    
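    def __init__(self, base_url: str = BASE_URL, temperature: float = 0.1):
        # NOTE: base_url is stored for reference only; call_llm still reads the
        # module-level BASE_URL, so changing it here does not redirect requests.
        self.base_url = base_url
        self.temperature = temperature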
        
    async def filter(self, tender: Tender) -> FilterResult:
        """Filter a tender for relevance"""
        
        prompt = f"""Analyze this procurement tender:

TITLE: {tender.title}
DESCRIPTION: {tender.description}
ORGANIZATION: {tender.organization}

CRITERIA FOR RELEVANCE:
Relevant if it involves:
1. Cybersecurity (threat detection, pentesting, security audits, SIEM)
2. Artificial Intelligence/ML (AI solutions, automation, ML models)
3. Software Development (custom software, web/mobile apps, SaaS)

NOT relevant if only:
- Hardware procurement
- Physical infrastructure
- Non-technical services

Provide your assessment with reasoning.
"""
        
        system = """You are an expert procurement analyst specializing in technology tenders.
Be precise and conservative in your assessments."""
        
        return await call_llm(
            prompt=prompt,
            response_model=FilterResult,
            system_prompt=system,
            temperature=self.temperature
        )

# Test the class
agent = FilterAgent()

test = Tender(
    id="CLASS-TEST",
    title="API Security Assessment",
    description="Conduct security review of REST APIs and provide remediation recommendations",
    organization="Tech Company",
    deadline="2024-12-01"
)

result = await agent.filter(test)
print(f"Agent test: {result.is_relevant} (confidence: {result.confidence:.2f})")
print(f"Categories: {[c.value for c in result.categories]}")
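
The batch loop in Step 7 awaits one tender at a time. With an agent object in hand, several tenders can be filtered concurrently while capping the number of requests in flight. A sketch under assumptions: `fake_filter` is a hypothetical stand-in for `agent.filter`, and `max_concurrent=2` is an arbitrary limit.

```python
import asyncio

async def fake_filter(tender_id: str) -> tuple[str, bool]:
    """Hypothetical stand-in for agent.filter(): pretends to call the model."""
    await asyncio.sleep(0.01)
    return tender_id, tender_id.endswith(("1", "3", "5"))

async def filter_many(tender_ids: list[str], max_concurrent: int = 2) -> list[tuple[str, bool]]:
    """Run fake_filter over all ids with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(tender_id: str) -> tuple[str, bool]:
        async with semaphore:
            return await fake_filter(tender_id)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(t) for t in tender_ids))

batch = asyncio.run(filter_many([f"BATCH-{i}" for i in range(1, 6)]))
print(batch)
```

Inside a notebook, use `await filter_many(...)` instead of `asyncio.run(...)`. A local model server may process requests one at a time, in which case concurrency brings little speedup.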

## 🎉 Congratulations!

You built a real AI agent!

## What You Learned

1. **Agents are just functions** - Input → Prompt → LLM → Output
2. **Clear criteria matter** - Specific instructions = better results
3. **Low temperature for consistency** - 0.1 for classification tasks
4. **System prompts set the role** - "You are an expert..."
5. **Test with edge cases** - Don't just test the obvious ones
6. **Confidence scores are useful** - Let downstream systems decide thresholds

## Design Decisions We Made

| Decision | Rationale |
|----------|----------|
| Temperature 0.1 | Consistent classification, not creative |
| Conservative criteria | Better to miss an opportunity than waste time |
| Required reasoning | Explainability for human review |
| Multiple categories | A tender can match multiple domains |
| Confidence score | Downstream agents can apply thresholds |
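
As an illustration of the last row, a downstream step could route results by confidence. This is a sketch, not part of the pipeline: `route` is a hypothetical function and the 0.7 threshold is an arbitrary value you would tune on your own data.

```python
def route(is_relevant: bool, confidence: float, threshold: float = 0.7) -> str:
    """Hypothetical routing rule: low-confidence results go to a human."""
    if confidence < threshold:
        return "review"
    return "accept" if is_relevant else "reject"

print(route(True, 0.95))   # confident and relevant
print(route(False, 0.90))  # confident and not relevant
print(route(True, 0.55))   # below threshold, needs a human look
```

Keeping the threshold outside the agent means it can change without touching the prompt or the model call.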

## Next Steps

Now that we can filter tenders, we need to **rate** them. Which relevant tenders are actually worth bidding on?

That's a more complex task requiring multi-dimensional scoring.

➡️ Continue to `04_rating_agent.ipynb`