# 04: Building the Rating Agent

**Duration:** 1 hour

**What You'll Learn:**
- Multi-dimensional scoring with LLMs
- Business logic in AI agents
- Prompt engineering for analytical tasks
- Balancing objective metrics with subjective reasoning

**What We're Building:**
An agent that evaluates filtered tenders on multiple dimensions (strategic fit, win probability, effort required) to help prioritize opportunities. This is Agent #2 in our pipeline.

**The Challenge:**
"Is it relevant?" is easy. "Is it worth bidding on?" requires business judgment.

---

## Why Rating Is Harder Than Filtering

Filtering is binary: relevant or not.

Rating requires:
- Multiple dimensions (fit, probability, effort)
- Comparative judgment (is 7.5 better than 6.8?)
- Business context (company capabilities, competition)
- Risk assessment (what could go wrong?)

This is where LLMs shine: fuzzy judgment based on incomplete information.

## Step 1: Setup

In [1]:
!pip install httpx pydantic



In [2]:
import httpx
import json
import asyncio
from typing import List, Type, TypeVar
from pydantic import BaseModel, Field

BASE_URL = "http://localhost:1234/v1"
MODEL = "local-model"
T = TypeVar('T', bound=BaseModel)

print("✓ Imports ready")

✓ Imports ready


## Step 2: Define Rating Output Schema

A good rating has multiple dimensions, not just a single score.

In [3]:
class RatingResult(BaseModel):
    """Multi-dimensional tender rating"""
    
    # Scores (0-10)
    overall_score: float = Field(description="Overall opportunity score 0-10", ge=0, le=10)
    strategic_fit: float = Field(description="How well this matches our expertise 0-10", ge=0, le=10)
    win_probability: float = Field(description="Likelihood of winning 0-10", ge=0, le=10)
    effort_required: float = Field(description="Complexity and resource needs 0-10", ge=0, le=10)
    
    # Qualitative analysis
    strengths: List[str] = Field(description="Top 3 strengths/opportunities")
    risks: List[str] = Field(description="Top 3 risks/challenges")
    recommendation: str = Field(description="Go/No-Go recommendation with reasoning")

# Test the model
example = RatingResult(
    overall_score=8.5,
    strategic_fit=9.0,
    win_probability=7.5,
    effort_required=8.0,
    strengths=[
        "Perfect match for our AI cybersecurity expertise",
        "High-value contract with long-term potential",
        "Existing relationship with client organization"
    ],
    risks=[
        "Tight timeline may strain resources",
        "Likely to attract large competitors",
        "Complex integration requirements"
    ],
    recommendation="GO - Strong strategic fit and reasonable win probability justify the effort."
)

print("Example rating:")
print(example.model_dump_json(indent=2))

Example rating:
{
  "overall_score": 8.5,
  "strategic_fit": 9.0,
  "win_probability": 7.5,
  "effort_required": 8.0,
  "strengths": [
    "Perfect match for our AI cybersecurity expertise",
    "High-value contract with long-term potential",
    "Existing relationship with client organization"
  ],
  "risks": [
    "Tight timeline may strain resources",
    "Likely to attract large competitors",
    "Complex integration requirements"
  ],
  "recommendation": "GO - Strong strategic fit and reasonable win probability justify the effort."
}


## Step 3: Build LLM Helper Function

**Important Learning:** In notebooks 02 and 03, we showed the LLM the JSON schema (the structure definition). That works, but LLMs sometimes get confused and return the schema itself instead of actual data!

**The Problem:**
```python
# Schema approach (what we used before)
schema = {"properties": {"score": {"type": "number"}}}
# LLM sees: "type": "number" 
# LLM returns: {"properties": {"score": {"type": "number"}}} ❌
```
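The failure mode above can also be caught after the fact, before validation runs. The following is a minimal sketch, assuming a schema echo can be recognized by JSON Schema keywords such as `properties`. The names `looks_like_schema` and `SCHEMA_MARKERS` are hypothetical and not part of this notebook's helper.

```python
import json

# Keys that appear in JSON Schema documents but not in our rating data.
# This set is an assumption chosen for the sketch, not an exhaustive list.
SCHEMA_MARKERS = {"properties", "$defs", "required", "type", "title"}

def looks_like_schema(raw: str, expected_fields: set) -> bool:
    """Return True when a reply resembles a JSON Schema rather than data."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not JSON at all, which is a different failure mode
    if not isinstance(data, dict):
        return False
    # Schema keywords are present...
    has_markers = "properties" in data or len(SCHEMA_MARKERS & data.keys()) >= 2
    # ...and none of the fields we actually asked for are.
    has_fields = bool(expected_fields & data.keys())
    return has_markers and not has_fields

schema_echo = '{"properties": {"score": {"type": "number"}}}'
real_data = '{"score": 7.2}'
print(looks_like_schema(schema_echo, {"score"}))
print(looks_like_schema(real_data, {"score"}))
```

A check like this could turn a confusing validation error into a clear retry message, though the heuristic may need tuning for models whose data fields share names with schema keywords.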

**The Solution:**
Show the LLM a concrete example instead! In practice, LLMs tend to follow a concrete example more reliably than an abstract schema.

```python
# Example approach (what we use now)
example = {"score": 8.5}
# LLM sees: 8.5
# LLM returns: {"score": 7.2} ✅
```

This is a **production pattern** used in real systems. Let's implement it!

In [4]:
# T was already defined in the setup cell; it is repeated so this cell runs on its own
T = TypeVar('T', bound=BaseModel)

def build_structured_prompt(prompt: str, model_class: Type[BaseModel]) -> str:
    """
    Build prompt with concrete example instead of abstract schema.
    
    Why? LLMs understand examples better than JSON schemas.
    Showing {"score": 8.5} is clearer than showing {"type": "number", "minimum": 0}
    """
    
    # Build a concrete example based on the model
    if model_class.__name__ == "RatingResult":
        # For RatingResult, show realistic rating values
        example = {
            "overall_score": 8.5,
            "strategic_fit": 9.0,
            "win_probability": 7.5,
            "effort_required": 8.0,
            "strengths": [
                "Strong match for our expertise",
                "Reasonable contract value",
                "Good client relationship"
            ],
            "risks": [
                "Tight timeline may be challenging",
                "Competition from larger firms",
                "Technical complexity"
            ],
            "recommendation": "GO - Strategic fit justifies the effort."
        }
    else:
        # Fallback: use schema if we don't have a specific example
        example = model_class.model_json_schema()
    
    return f"""{prompt}

CRITICAL INSTRUCTIONS:
You must respond with ONLY a valid JSON object containing YOUR ANALYSIS.
Do NOT return a schema, return ACTUAL DATA.

EXPECTED FORMAT (fill in with your actual analysis):
{json.dumps(example, indent=2)}

IMPORTANT:
- Use actual numbers (like 8.5), not descriptions
- Use actual text in arrays (3 items for strengths and risks)
- All fields are required
- No markdown, no code blocks, no explanations
- Start with {{ and end with }}

Your JSON response:"""

async def call_llm(
    prompt: str,
    response_model: Type[T],
    system_prompt: str,
    temperature: float = 0.1,
    max_retries: int = 3
) -> T:
    """Call LLM with structured output and retries"""
    
    full_prompt = build_structured_prompt(prompt, response_model)
    
    for attempt in range(max_retries):
        try:
            async with httpx.AsyncClient(timeout=60.0) as client:
                response = await client.post(
                    f"{BASE_URL}/chat/completions",
                    json={
                        "model": MODEL,
                        "messages": [
                            {"role": "system", "content": system_prompt},
                            {"role": "user", "content": full_prompt}
                        ],
                        "temperature": temperature,
                    },
                )
                
                response.raise_for_status()  # let HTTP errors trigger the retry loop
                result = response.json()
                content = result["choices"][0]["message"]["content"]
                
                # Clean response - remove markdown code blocks
                content = content.strip()
                if content.startswith("```json"):
                    content = content[7:]
                if content.startswith("```"):
                    content = content[3:]
                if content.endswith("```"):
                    content = content[:-3]
                content = content.strip()
                
                # Extract JSON by finding balanced braces (handles extra text)
                start_idx = content.find('{')
                if start_idx != -1:
                    brace_count = 0
                    end_idx = -1
                    for i in range(start_idx, len(content)):
                        if content[i] == '{':
                            brace_count += 1
                        elif content[i] == '}':
                            brace_count -= 1
                            if brace_count == 0:
                                end_idx = i
                                break
                    if end_idx != -1:
                        content = content[start_idx:end_idx + 1]
                
                # Parse and validate
                data = json.loads(content)
                return response_model.model_validate(data)
                
        except Exception as e:
            if attempt < max_retries - 1:
                print(f" !! Attempt {attempt + 1} failed: {str(e)[:80]}...")
                print(f" !! Retrying...")
                await asyncio.sleep(1)
            else:
                raise RuntimeError(f"Failed after {max_retries} attempts: {e}") from e

print("✓ LLM helper ready (using example-based prompts)")

✓ LLM helper ready (using example-based prompts)
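One limitation of the balanced-brace loop in `call_llm` is that it also counts braces inside string values, so a reply whose text contains a stray `}` would be cut short. A possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, lets the JSON parser itself decide where the object ends. The function name `extract_first_json_object` is hypothetical.

```python
import json

def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in `text`.

    raw_decode parses one JSON value starting at a given index and ignores
    whatever follows it, so braces inside string values and trailing
    commentary do not confuse it.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # this "{" did not start valid JSON; try the next one
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in response")

# The string value contains an unbalanced "}" that would end a naive brace count early.
reply = 'Sure! {"recommendation": "GO, with caveats :-}", "overall_score": 6.5} Hope this helps.'
print(extract_first_json_object(reply))
```

The result could be passed to `response_model.model_validate` exactly as the parsed `data` is today.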


## Step 4: Build the Rating Agent

This is where prompt engineering matters. We need to guide the LLM to think like a business analyst.

In [None]:
class Tender(BaseModel):
    """Input tender data"""
    id: str
    title: str
    description: str
    organization: str
    deadline: str
    estimated_value: str | None = None

async def rate_tender(
    tender: Tender, 
    categories: List[str],
    temperature: float = 0.1
) -> RatingResult:
    """
    Rating Agent: Evaluate business opportunity
    
    Key design decisions:
    - Low temperature (0.1) for consistent scoring
    - Multiple dimensions to avoid single-number bias
    - Explicit company context in prompt
    - Required strengths AND risks (balanced view)
    """
    
    prompt = f"""Rate this tender opportunity for a small tech consultancy:

TENDER DETAILS:
Title: {tender.title}
Client: {tender.organization}
Value: {tender.estimated_value or "Not specified"}
Deadline: {tender.deadline}
Categories: {', '.join(categories)}

DESCRIPTION:
{tender.description}

OUR COMPANY PROFILE:
- Small tech consultancy (10-15 people)
- Core expertise: AI/ML, Cybersecurity, Software Development
- Strong technical skills, limited by team size
- Track record with mid-sized government contracts
- Prefer projects lasting 3-12 months

EVALUATION CRITERIA:

1. STRATEGIC FIT (0-10):
   - How well does this match our expertise in {', '.join(categories)}?
   - Does it leverage our unique strengths?
   - Will it build valuable capabilities or relationships?

2. WIN PROBABILITY (0-10):
   - Are we genuinely competitive for this?
   - What's the likely competition (size, specialization)?
   - Do we have relevant experience and credibility?

3. EFFORT REQUIRED (0-10):
   - Technical complexity and scope
   - Resource requirements vs our team size
   - Timeline pressure and delivery risk

4. OVERALL SCORE (0-10):
   - Weighted assessment considering all factors
   - Value of opportunity vs investment required

Provide REALISTIC scores (not optimistic). Most opportunities should score 5-7.
Identify specific strengths and concrete risks.
Give a clear Go/No-Go recommendation.
"""
    
    system = """You are a business development expert evaluating tender opportunities.
You have 15 years of experience in government contracting and tech consulting.
Be analytical and realistic, not optimistic. Consider both opportunity and risk."""
    
    return await call_llm(
        prompt=prompt,
        response_model=RatingResult,
        system_prompt=system,
        temperature=temperature
    )

print("✓ Rating agent ready")

✓ Rating agent ready


## Step 5: Test with High-Value Opportunity

Let's rate a tender that should score well.

In [13]:
print("TEST 1: Strong Opportunity")
print("=" * 70)

tender1 = Tender(
    id="R001",
    title="AI-Powered Fraud Detection System",
    description="""Develop machine learning system to detect fraudulent transactions 
    in real-time. Must integrate with existing payment processing infrastructure. 
    Project includes model development, deployment, and 6 months of monitoring and 
    refinement. Team will work closely with our data science division.""",
    organization="State Financial Services Commission",
    deadline="2025-02-01",
    estimated_value="$850K"
)

rating1 = await rate_tender(
    tender1, 
    categories=["ai", "software"]
)

print(f"Overall Score: {rating1.overall_score}/10")
print(f"Strategic Fit: {rating1.strategic_fit}/10")
print(f"Win Probability: {rating1.win_probability}/10")
print(f"Effort Required: {rating1.effort_required}/10")
print(f"\nStrengths:")
for s in rating1.strengths:
    print(f"  + {s}")
print(f"\nRisks:")
for r in rating1.risks:
    print(f"  - {r}")
print(f"\nRecommendation: {rating1.recommendation}")
print()

TEST 1: Strong Opportunity
Overall Score: 6.0/10
Strategic Fit: 8.0/10
Win Probability: 5.0/10
Effort Required: 7.0/10

Strengths:
  + Strong AI/ML expertise aligns with fraud detection
  + Existing cybersecurity skills aid secure integration
  + Opportunity to deepen relationship with state commission

Risks:
  - Large competition from 50+ firm vendors
  - High technical complexity of real-time integration
  - Limited team size may strain delivery timeline

Recommendation: NO-GO - High effort and competitive risk outweigh strategic fit.



## Step 6: Test with Poor Fit

A tender that's technically relevant but a bad business opportunity.

In [14]:
print("TEST 2: Poor Fit (Scale Mismatch)")
print("=" * 70)

tender2 = Tender(
    id="R002",
    title="National Cybersecurity Infrastructure Modernization",
    description="""Massive 5-year program to modernize cybersecurity infrastructure 
    across all federal agencies. Requires dedicated team of 100+ engineers, 
    proven experience with large-scale deployments, and existing national security 
    clearances. Prime contractor will coordinate 10+ subcontractors.""",
    organization="Department of Homeland Security",
    deadline="2024-11-01",
    estimated_value="$250M"
)

rating2 = await rate_tender(
    tender2,
    categories=["cybersecurity"]
)

print(f"Overall Score: {rating2.overall_score}/10")
print(f"Strategic Fit: {rating2.strategic_fit}/10")
print(f"Win Probability: {rating2.win_probability}/10")
print(f"Effort Required: {rating2.effort_required}/10")
print(f"\nStrengths:")
for s in rating2.strengths:
    print(f"  + {s}")
print(f"\nRisks:")
for r in rating2.risks:
    print(f"  - {r}")
print(f"\nRecommendation: {rating2.recommendation}")
print()

TEST 2: Poor Fit (Scale Mismatch)
Overall Score: 4.0/10
Strategic Fit: 6.0/10
Win Probability: 3.0/10
Effort Required: 9.0/10

Strengths:
  + Strong cybersecurity expertise
  + Experience with AI/ML can add value to modernization
  + Existing government contract experience

Risks:
  - Requires 100+ engineers, far beyond our team size
  - Need national security clearances we lack
  - High competition from large prime contractors with established subcontractor networks

Recommendation: NO - The effort and resource gap outweigh the strategic fit.
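The outputs so far spell the verdict in more than one way ("GO", "NO-GO", and here just "NO"), because `recommendation` is free text. If downstream code needs a reliable decision, one option is to normalize the prefix. This is a sketch under the assumption that the verdict always precedes the first " - " separator, as it does in the outputs above. `Decision`, `_ALIASES`, and `parse_decision` are hypothetical names.

```python
from enum import Enum

class Decision(Enum):
    GO = "GO"
    NO_GO = "NO-GO"
    UNKNOWN = "UNKNOWN"

# Spellings mapped to a decision; this table is an assumption for the sketch.
_ALIASES = {
    "GO": Decision.GO,
    "NO-GO": Decision.NO_GO,
    "NO GO": Decision.NO_GO,
    "NOGO": Decision.NO_GO,
    "NO": Decision.NO_GO,
}

def parse_decision(recommendation: str) -> Decision:
    """Read the verdict that precedes the first ' - ' separator."""
    verdict = recommendation.split(" - ", 1)[0].strip().upper()
    return _ALIASES.get(verdict, Decision.UNKNOWN)

for text in [
    "GO - Strong strategic fit and reasonable win probability justify the effort.",
    "NO-GO - High effort and competitive risk outweigh strategic fit.",
    "NO - The effort and resource gap outweigh the strategic fit.",
]:
    print(parse_decision(text).value)
```

Anything that does not match falls back to `Decision.UNKNOWN`, which a pipeline could route to manual review instead of guessing.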



## Step 7: Test Edge Case - High Risk, High Reward

In [15]:
print("TEST 3: High Risk, High Reward")
print("=" * 70)

tender3 = Tender(
    id="R003",
    title="Experimental AI Research Platform",
    description="""Build novel AI research platform using cutting-edge techniques 
    (federated learning, differential privacy, quantum-resistant cryptography). 
    No existing commercial solutions. High technical risk but potential for 
    groundbreaking capabilities. 18-month timeline with staged milestones.""",
    organization="Defense Advanced Research Agency",
    deadline="2024-12-15",
    estimated_value="$1.2M"
)

rating3 = await rate_tender(
    tender3,
    categories=["ai", "cybersecurity"]
)

print(f"Overall Score: {rating3.overall_score}/10")
print(f"Strategic Fit: {rating3.strategic_fit}/10")
print(f"Win Probability: {rating3.win_probability}/10")
print(f"Effort Required: {rating3.effort_required}/10")
print(f"\nStrengths:")
for s in rating3.strengths:
    print(f"  + {s}")
print(f"\nRisks:")
for r in rating3.risks:
    print(f"  - {r}")
print(f"\nRecommendation: {rating3.recommendation}")
print()

TEST 3: High Risk, High Reward
Overall Score: 5.0/10
Strategic Fit: 8.0/10
Win Probability: 4.0/10
Effort Required: 9.0/10

Strengths:
  + Strong alignment with AI and cybersecurity expertise
  + Potential to develop cutting-edge capabilities
  + High contract value for a small firm

Risks:
  - Very high technical risk with no commercial precedent
  - Extremely tight 18-month timeline for a 10-15 person team
  - Likely competition from large, specialized defense contractors

Recommendation: NO - High effort and low win probability outweigh strategic benefits.
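The prompt asks for a "weighted assessment" but leaves the weights to the model. One way to balance that subjective number against an objective one is to recompute the overall score from the three dimensions and flag large gaps. The sketch below uses the scores printed in Tests 1 to 3. The weights (0.4, 0.4, 0.2), the effort inversion, and the 1.0 gap threshold are all assumptions made for illustration, not values the notebook defines.

```python
def weighted_score(fit: float, win: float, effort: float,
                   w_fit: float = 0.4, w_win: float = 0.4,
                   w_effort: float = 0.2) -> float:
    """Combine the three dimensions into one 0-10 score.

    Effort is inverted (10 - effort) because higher effort makes an
    opportunity less attractive. The weights are illustrative assumptions.
    """
    return w_fit * fit + w_win * win + w_effort * (10 - effort)

# (strategic_fit, win_probability, effort_required, overall_score) from Tests 1-3
observed = {
    "R001": (8.0, 5.0, 7.0, 6.0),
    "R002": (6.0, 3.0, 9.0, 4.0),
    "R003": (8.0, 4.0, 9.0, 5.0),
}

for tender_id, (fit, win, effort, llm_overall) in observed.items():
    computed = weighted_score(fit, win, effort)
    gap = abs(computed - llm_overall)
    flag = "CHECK" if gap > 1.0 else "ok"
    print(f"{tender_id}: computed={computed:.1f} llm={llm_overall:.1f} "
          f"gap={gap:.1f} {flag}")
```

With these particular weights, the recomputed values land within about 0.2 of the model's overall scores for all three tests, which suggests the model's overall score is at least internally consistent with its own sub-scores here.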



## Step 8: Comparative Analysis

Let's rate multiple tenders and compare them.

In [16]:
# Create diverse batch
batch_tenders = [
    ("Simple Web App", Tender(
        id="B1",
        title="Citizen Portal Development",
        description="Build responsive web portal for permit applications. Standard tech stack, 6-month timeline.",
        organization="City Services",
        deadline="2025-01-31",
        estimated_value="$450K"
    ), ["software"]),
    
    ("Complex ML System", Tender(
        id="B2",
        title="Predictive Maintenance ML Platform",
        description="ML system for predicting infrastructure failures. Real-time analytics, IoT integration.",
        organization="Transportation Authority",
        deadline="2025-03-01",
        estimated_value="$980K"
    ), ["ai", "software"]),
    
    ("Security Assessment", Tender(
        id="B3",
        title="Annual Penetration Testing",
        description="Quarterly pentesting of 20 web applications. Reports and remediation guidance.",
        organization="State IT Security",
        deadline="2024-12-01",
        estimated_value="$180K"
    ), ["cybersecurity"]),
]

print("COMPARATIVE RATING")
print("=" * 70)

ratings = []
for name, tender, categories in batch_tenders:
    rating = await rate_tender(tender, categories)
    ratings.append((name, tender, rating))

# Sort by overall score
ratings.sort(key=lambda x: x[2].overall_score, reverse=True)

print("\nRANKED OPPORTUNITIES:\n")
for i, (name, tender, rating) in enumerate(ratings, 1):
    print(f"{i}. {name}")
    print(f"   Overall: {rating.overall_score:.1f} | "
          f"Fit: {rating.strategic_fit:.1f} | "
          f"Win: {rating.win_probability:.1f} | "
          f"Effort: {rating.effort_required:.1f}")
    print(f"   Value: {tender.estimated_value}")
    print(f"   → {rating.recommendation[:80]}...")
    print()

COMPARATIVE RATING

RANKED OPPORTUNITIES:

1. Complex ML System
   Overall: 6.5 | Fit: 8.0 | Win: 6.0 | Effort: 7.5
   Value: $980K
   → GO - Strategic fit and client value outweigh the resource risks....

2. Security Assessment
   Overall: 6.5 | Fit: 8.0 | Win: 6.0 | Effort: 7.5
   Value: $180K
   → GO - Strategic fit outweighs moderate competition risk....

3. Simple Web App
   Overall: 5.0 | Fit: 6.0 | Win: 4.0 | Effort: 8.0
   Value: $450K
   → NO-GO - High effort and low win probability outweigh strategic fit....
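Two tenders above tie on every score, so their relative order comes only from the fact that Python's sort is stable and keeps input order. If ties should be broken deliberately, a compound sort key is one option. The sketch below is stdlib-only and uses plain tuples in place of `RatingResult`. The helper `parse_value` and the choice of contract value as the final tie-breaker are assumptions for illustration.

```python
import re
from typing import Optional

_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_value(text: Optional[str]) -> float:
    """Convert strings such as "$980K" or "$250M" to a number; 0.0 if unknown."""
    if not text:
        return 0.0
    match = re.fullmatch(r"\$\s*(\d+(?:\.\d+)?)\s*([KMB]?)", text.strip().upper())
    if match is None:
        return 0.0
    return float(match.group(1)) * _MULTIPLIERS[match.group(2)]

# (name, overall, win_probability, effort_required, estimated_value) from the output above
rows = [
    ("Security Assessment", 6.5, 6.0, 7.5, "$180K"),
    ("Simple Web App", 5.0, 4.0, 8.0, "$450K"),
    ("Complex ML System", 6.5, 6.0, 7.5, "$980K"),
]

def rank_key(row):
    _name, overall, win, effort, value = row
    # Higher overall first, then higher win probability, then lower effort,
    # then higher contract value (an assumed business preference).
    return (-overall, -win, effort, -parse_value(value))

ranked = sorted(rows, key=rank_key)
for position, (name, *_rest) in enumerate(ranked, 1):
    print(position, name)
```

The ranking no longer depends on the order in which tenders were submitted, only on the stated preferences.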



## Step 9: Production-Ready Class

Wrap everything in a reusable class.

In [17]:
class RatingAgent:
    """
    Production-ready Rating Agent
    
    Evaluates tender opportunities on multiple dimensions:
    - Strategic fit with company capabilities
    - Win probability considering competition
    - Effort required vs resources available
    
    Returns structured rating with explanation.
    """
    
    def __init__(
        self, 
        base_url: str = BASE_URL,
        temperature: float = 0.1,
        min_score: float = 7.0
    ):
        # Stored for reference only: rate_tender() still calls the module-level BASE_URL
        self.base_url = base_url
        self.temperature = temperature
        self.min_score = min_score
    
    async def rate(
        self, 
        tender: Tender, 
        categories: List[str]
    ) -> RatingResult:
        """Rate a tender opportunity"""
        return await rate_tender(tender, categories, self.temperature)
    
    def should_proceed(self, rating: RatingResult) -> bool:
        """Business logic: should we generate bid documents?"""
        return rating.overall_score >= self.min_score

# Test the class
agent = RatingAgent(min_score=7.0)

test = Tender(
    id="CLASS-TEST",
    title="Cloud Security Audit",
    description="Comprehensive security audit of AWS infrastructure",
    organization="Tech Startup",
    deadline="2024-12-01",
    estimated_value="$120K"
)

result = await agent.rate(test, ["cybersecurity"])
proceed = agent.should_proceed(result)

print(f"Rating: {result.overall_score:.1f}/10")
print(f"Proceed to bid document: {proceed}")
print(f"Recommendation: {result.recommendation}")

Rating: 6.5/10
Proceed to bid document: False
Recommendation: NO-GO - High effort and moderate win probability outweigh strategic fit.
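Step 8 awaited each rating one after another. For larger batches, a class like this might rate several tenders concurrently while capping the number of in-flight requests so a local model server is not overwhelmed. The following is a minimal sketch with `asyncio.gather` and `asyncio.Semaphore`. The name `rate_many`, the default limit of 3, and the stand-in coroutine `fake_rate` are assumptions; no real LLM is called.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

A = TypeVar("A")
R = TypeVar("R")

async def rate_many(
    items: Iterable[A],
    rate_one: Callable[[A], Awaitable[R]],
    max_concurrency: int = 3,
) -> List[R]:
    """Run rate_one over items with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: A) -> R:
        async with semaphore:
            return await rate_one(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(guarded(item) for item in items))

async def fake_rate(title: str) -> float:
    """Stand-in for an LLM call: sleeps briefly and returns a dummy score."""
    await asyncio.sleep(0.01)
    return float(len(title) % 10)

# In a notebook cell, use `await rate_many(...)` instead of asyncio.run(...)
scores = asyncio.run(rate_many(["Citizen Portal", "Pentest"], fake_rate))
print(scores)
```

With the agent defined above, `rate_one` could be something like `lambda pair: agent.rate(*pair)` applied to `(tender, categories)` pairs, though that wiring is untested here.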


## 🎉 Congratulations!

You built a sophisticated rating agent!

## What You Learned

1. **Multi-dimensional scoring** - Avoid single-number bias
2. **Balanced analysis** - Force consideration of both strengths and risks
3. **Company context matters** - Prompt includes capabilities and constraints
4. **Comparative ranking** - Multiple scores enable prioritization
5. **Business logic integration** - Thresholds and rules on top of AI
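As an illustration of point 5, deterministic rules can sit in front of the score threshold so that cheap, objective checks run first. This is a sketch only: `Scores`, `bid_decision`, the win-probability floor of 4.0, and the rule order are assumptions, and `today` is passed in explicitly so the result does not depend on when the code runs.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Scores:
    """Stand-in for the two RatingResult fields this sketch needs."""
    overall_score: float
    win_probability: float

def bid_decision(scores: Scores, deadline: str, today: date,
                 min_score: float = 7.0, min_win: float = 4.0) -> str:
    """Apply hard rules first, then the overall-score threshold."""
    if date.fromisoformat(deadline) < today:
        return "skip: deadline passed"
    if scores.win_probability < min_win:
        return "skip: win probability below floor"
    if scores.overall_score >= min_score:
        return "proceed"
    return "skip: overall score below threshold"

today = date(2024, 11, 15)
print(bid_decision(Scores(6.5, 6.0), "2024-12-01", today))
print(bid_decision(Scores(4.0, 3.0), "2024-11-01", today))
print(bid_decision(Scores(7.5, 6.0), "2025-03-01", today))
```

Returning a reason string rather than a bare boolean makes it easier to see which rule rejected a tender.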

## Design Decisions

| Decision | Rationale |
|----------|----------|
| Temperature 0.1 | Consistent scoring across tenders |
| Multiple dimensions | More nuanced than single score |
| Required strengths AND risks | Prevents overly optimistic ratings |
| 0-10 scale | Intuitive and fine-grained |
| Company profile in prompt | Context for realistic assessment |

## Prompt Engineering Lessons

1. **Explicit calibration** - "Most opportunities should score 5-7"
2. **Multiple perspectives** - Force analysis from different angles
3. **Concrete criteria** - Not just "rate this", but "rate on X, Y, Z"
4. **Role definition** - "15 years of experience" sets expectations
5. **Balanced instructions** - "Realistic, not optimistic"
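Whether the calibration instruction is working can be checked empirically. A small sketch, using the seven overall scores printed in Steps 5 to 9 of this notebook, measures what share of them fall in the 5-7 band the prompt asks for. The helper name `share_in_band` is hypothetical, and seven samples are far too few to draw conclusions from.

```python
from statistics import mean

def share_in_band(scores, low: float = 5.0, high: float = 7.0) -> float:
    """Fraction of scores inside the inclusive [low, high] band."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(low <= s <= high for s in scores) / len(scores)

# Overall scores from Tests 1-3, the comparative batch, and the class test
overall_scores = [6.0, 4.0, 5.0, 6.5, 6.5, 5.0, 6.5]

print(f"in 5-7 band: {share_in_band(overall_scores):.0%}")
print(f"mean overall: {mean(overall_scores):.2f}")
```

A share that drifts well below "most" after a prompt or model change would be a hint that the calibration line needs revisiting.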

## Next Steps

Now we can:
1. ✓ Filter tenders for relevance
2. ✓ Rate opportunities on multiple dimensions
3. ? Generate professional bid documents

Let's build the document generator!

➡️ Continue to `05_doc_generator.ipynb`