# **Tutorial 4** - Building AI Agents: From Conversations to Career Matching

Welcome to the world of intelligent AI agents! After getting on the leaderboard with Tutorial 3's quick submissions, it's time to build something actually good.

**Prerequisites**: You should have completed Tutorials 1-3, which covered setup, API basics, and your first submission.

## What You'll Learn

After completing this tutorial, you'll understand:
- **Multi-agent system architecture** using the Strands framework
- **Structured information extraction** from unstructured conversations  
- **Intelligent matching algorithms** that go beyond simple keyword matching
- **Best practices** for building efficient and cost-effective AI solutions

## The AI Agent Advantage

Remember Tutorial 3's random matcher? That probably scored around 5%. Now we'll build agents that actually understand context:

- **Contextual Understanding**: Agents grasp meaning, not just keywords
- **Dynamic Conversations**: They can ask follow-up questions and adapt to responses
- **Collaborative Intelligence**: Multiple agents can work together with specialized roles
- **Adaptation**: They adjust their questions and recommendations to what a persona says during the conversation

In our green careers challenge, we'll see how these capabilities create a powerful system for connecting Brazilian youth with sustainable career opportunities.

Let's build something intelligent!

## Agenda

1. [Environment Setup](#environment-setup)
   - Installing the Strands Agents framework
   - Quick system verification

2. [Understanding AI Agents](#understanding-ai-agents)
   - What makes AI agents different from traditional AI
   - Key components: Models, Tools, and Structured Output
   - Introduction to the Strands framework

3. [Core Components and Utilities](#core-utilities)
   - Essential functions for agent interaction
   - Data processing utilities for job and training information
   - Understanding the codebase architecture

4. [Data Structures and Models](#data-structures)
   - PersonaInfo, JobInfo, and TrainingInfo models
   - Structured data extraction with Pydantic
   - Why structured output is crucial for AI agents

5. [Building Your First AI Agent](#first-agent)
   - Creating conversation agents that can interview personas
   - Extracting information from natural language conversations
   - Managing conversation state and limits

6. [Multi-Agent Collaboration](#multi-agent-system)
   - Specialized information extraction agents
   - Intelligent matching and recommendation agents
   - Orchestrating multiple agents for complex workflows

7. [Real-World Application](#real-world-application)
   - Processing job descriptions and training programs at scale
   - Implementing sophisticated matching algorithms
   - Generating personalized career recommendations

8. [Testing and Optimization](#testing-optimization)
   - Validating your matching algorithm with real examples
   - Token usage optimization and cost management
   - Debugging AI agent interactions

9. [Conclusion and Next Steps](#conclusion)
   - Key learnings and best practices
   - Preparing for challenge submission
   - Advanced techniques for competitive advantage

---

# Environment Setup <a id='environment-setup'></a>

Since you've already set up your basic environment in previous tutorials, we only need to install the AI agents framework and verify our system is ready.

In [None]:
# Install required packages for AI agent development
# This may take a few minutes - be patient!
!pip install python-dotenv "strands-agents[mistral]" strands-agents-tools tqdm

The [Strands Agents](https://strandsagents.com/latest/) framework provides powerful tools for creating and managing AI agents. We'll also need supporting libraries for progress tracking and data handling.

**Note**: If you completed Tutorial 2, most dependencies are already installed. The cell above ensures we have the latest agent framework.

# Understanding AI Agents <a id='understanding-ai-agents'></a>

With our environment ready, let's dive into building AI agents with the [Strands Agents](https://strandsagents.com/latest/) framework and [Mistral](https://mistral.ai/) models.

**Important**: You can modify any code to optimize your solution, as long as you provide results in the required format (covered in the final sections).

## What Makes AI Agents Special?

AI agents are intelligent systems that can perceive, reason, and act autonomously. Unlike traditional rule-based systems, they adapt their behavior based on context and can work collaboratively to solve complex problems.

### Key Capabilities of Our AI Agents:

1. **Intelligent Conversations**: Conduct natural interviews with job seekers, asking follow-up questions based on responses
2. **Structured Extraction**: Convert unstructured text into precise, actionable data using Pydantic models
3. **Contextual Matching**: Understand semantic relationships between skills, not just keyword matches
4. **Collaborative Problem-Solving**: Multiple specialized agents work together on different aspects of the matching process

### The Strands Agents Framework

Strands provides enterprise-grade tools for building production AI agents:

- **Model Abstraction**: Works with multiple LLM providers (we use Mistral)
- **Structured Output**: Built-in support for Pydantic models ensures consistent data formats
- **Conversation Management**: Handles multi-turn dialogues with state preservation
- **Error Handling**: Robust retry mechanisms and graceful failure handling

### Our Multi-Agent Architecture

In this tutorial, we'll build a system with specialized agents:

- **Conversation Agent**: Interviews personas to gather career information
- **Extraction Agent**: Converts conversations into structured PersonaInfo objects
- **Job Matching Agent**: Finds suitable opportunities based on skills and preferences
- **Training Matching Agent**: Recommends learning paths to bridge skill gaps
- **Orchestration Logic**: Coordinates all agents to produce final recommendations

This division of labor makes each agent more focused and efficient, while the system as a whole handles complex workflows.
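
The division of labor above can be sketched as plain function composition. This is a minimal sketch in which stub functions stand in for the real agents; the names `interview`, `extract_profile`, `match_jobs`, `match_trainings`, and `recommend`, and the IDs they return, are hypothetical and are not part of Strands or of this notebook's code.

```python
from typing import Dict, List

# Each "agent" is modelled as a plain function so the control flow is visible.
# In the real system each stub would wrap an LLM call.

def interview(persona_id: str) -> List[str]:
    """Stub conversation agent: returns a canned transcript."""
    return [
        "Assistant: Can you tell me about your skills?",
        f"User: I am {persona_id} and I know solar installation.",
    ]

def extract_profile(conversation: List[str]) -> Dict[str, List[str]]:
    """Stub extraction agent: looks for one hard-coded skill in the transcript."""
    text = " ".join(conversation).lower()
    skills = ["solar installation"] if "solar installation" in text else []
    return {"skills": skills}

def match_jobs(profile: Dict[str, List[str]]) -> List[str]:
    """Stub job matcher: maps the skill to a made-up job ID."""
    return ["job_solar_01"] if "solar installation" in profile["skills"] else []

def match_trainings(profile: Dict[str, List[str]]) -> List[str]:
    """Stub training matcher: always suggests one made-up training ID."""
    return ["training_safety_01"]

def recommend(persona_id: str) -> Dict[str, List[str]]:
    """Orchestration logic: run the agents in order and merge their outputs."""
    conversation = interview(persona_id)
    profile = extract_profile(conversation)
    return {
        "jobs": match_jobs(profile),
        "trainings": match_trainings(profile),
    }

result = recommend("persona_001")
print(result)
```

Because each step only depends on the previous step's output, any stub can later be swapped for a real agent without touching the orchestration function.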

Let's start building:

In [None]:
# Core libraries for AI agent development
import json
import os
import sys
import boto3
import dotenv
import requests

from pathlib import Path
from typing import Dict, List, Optional, Tuple, Type, TypeVar, Any
from tqdm import tqdm

# Add parent directory to import our utilities
sys.path.append('..')
from src.utils import (
    save_json, 
    read_json, 
    load_file_content,
    get_job_paths,
    get_training_paths,
    sanity_check
)

# Structured data models
from pydantic import BaseModel, Field

# Strands Agents framework - our main AI agent library
from strands.agent import Agent
from strands.models.mistral import MistralModel

# AWS integration for API calls
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Type hints for better code quality
T = TypeVar('T')
M = TypeVar('M', bound=BaseModel)

# Load environment variables from our env file
dotenv.load_dotenv(".env")

print("✅ Imported shared utilities from src/")
print("💡 Notice: We're reusing code from Tutorial 3's infrastructure")

## System Health Check

Before we start building our AI agents, let's verify that our connection to the challenge infrastructure is working correctly. This sanity check ensures we can communicate with the persona API endpoints.

In [None]:
# Run the sanity check to verify our setup (reusing from src/utils.py)
sanity_check()

# Core Components and Utilities <a id='core-utilities'></a>

This section defines the essential building blocks for our AI agent system. These functions demonstrate key patterns in agent development: delegation of specialized tasks, structured data handling, and efficient batch processing.

## AI Agent Patterns in Our System:

### Agent Creation and Configuration
- **`get_agent()`**: Factory function that creates agents with specific roles and capabilities
- **System Prompts**: Define agent behavior, expertise, and response patterns

### Conversation Management
- **`send_message_to_chat()`**: Handles secure communication with persona agents via AWS API
- **`get_conversation()`**: Orchestrates multi-turn dialogues with goal-oriented questioning

### Intelligent Information Processing
- **`extract_info()`**: Uses specialized agents to convert unstructured text into structured data
- **`extract_info_to_json()`**: Batch processing with caching, retry logic, and progress tracking

### Data Discovery and Management
- **`load_file_content()`**, **`read_json()`**, **`save_json()`**: Standard I/O operations
- **`get_job_paths()`**, **`get_training_paths()`**: Discover available data files

## Why These Patterns Matter:

1. **Specialization**: Each agent has a clear, focused role
2. **Reusability**: Common patterns are abstracted into utility functions  
3. **Reliability**: Built-in error handling, retries, and caching
4. **Scalability**: Batch processing patterns handle large datasets efficiently
5. **Cost Control**: Caching prevents redundant API calls

Let's examine the implementation:

In [None]:
def get_agent(
    system_prompt: str = "",
    model_id: str = "mistral-medium-latest"
) -> Agent:
    """
    Create and configure an AI agent with specified capabilities.
    
    This is the core function for creating AI agents. The system prompt defines
    the agent's role, expertise, and behavior patterns. Different model IDs
    offer different capabilities and cost profiles.
    
    Args:
        system_prompt: Instructions defining the agent's role and behavior
        model_id: Mistral model to use (e.g., 'mistral-medium-latest', 'mistral-small-latest')
    
    Returns:
        Configured Agent ready for interaction
    """
    model = MistralModel(
        api_key=os.environ["MISTRAL_API_KEY"],
        model_id=model_id,
        stream=False  # Non-streaming for better control in batch operations
    )
    return Agent(model=model, system_prompt=system_prompt, callback_handler=None)

# Note: File I/O functions are now imported from src.utils
# This keeps our notebook focused on AI agent logic

# Data Structures and Models <a id='data-structures'></a>

**Structured output** is what separates professional AI agent systems from simple chatbots. Instead of generating free-form text that requires post-processing, our agents produce data that follows exact schemas, enabling reliable programmatic handling.

## The Pydantic Advantage

We use **Pydantic models** to define our data structures because they provide:

- **Type Safety**: Automatic validation ensures data integrity
- **Clear Contracts**: Agents know exactly what fields to populate
- **Error Prevention**: Invalid data is caught immediately, not in downstream processing
- **Documentation**: Field descriptions guide agent behavior
- **JSON Serialization**: Seamless conversion between Python objects and storage formats

## Our Data Models for Career Matching

### PersonaInfo: The Complete Candidate Profile
Captures everything needed to understand a job seeker's background, skills, and preferences. Notice how skills include both the skill name and proficiency level - this nuanced approach enables better matching than simple keyword lists.

### JobInfo: Structured Job Requirements  
Extracts the essential criteria for job opportunities. This standardized format lets us compare any job against any candidate programmatically.

### TrainingInfo: Learning Pathway Data
Describes what skills training programs develop and at what level. This enables us to recommend learning paths that bridge the gap between a candidate's current skills and job requirements.

### IDList: Agent Recommendations
A simple but powerful structure for when agents need to return multiple recommendations (job IDs, training IDs, etc.).

## The Power of Structured Thinking

Consider this transformation:

**Unstructured Agent Output:**
> "Maria seems like a good fit for environmental jobs in São Paulo since she has sustainability experience and project management skills, though she might need some training in data analysis."

**Structured Agent Output:**
```python
PersonaInfo(
    name="Maria Santos",
    skills=[("sustainability", "intermediate"), ("project_management", "beginner")],
    location="São Paulo",
    age="25",
    years_of_experience="3"
)
```

The structured approach enables:
- **Precise Matching**: Compare skill levels against job requirements
- **Automatic Processing**: No need to parse natural language descriptions
- **Quality Assurance**: Ensure all required information is captured
- **Scalable Operations**: Process thousands of candidates consistently

Let's examine our models:

In [None]:
class JobInfo(BaseModel):
    """
    Structured representation of job requirements and characteristics.
    
    This model captures the essential information needed to match candidates
    to job opportunities. The AI agent extracts this information from
    unstructured job descriptions.
    """
    required_skills: List[str] = Field(
        default_factory=list,
        description="List of required skills for the job."
    )
    location: str = Field(
        default="",
        description="Job location."
    )
    years_of_experience_required: str = Field(
        default="",
        description="Years of experience required to get this job."
    )

    def describe(self) -> str:
        """Create a human-readable description for AI agents to process."""
        skills = ', '.join(self.required_skills)
        return (
            f"Required skills: {skills}\nLocation: {self.location}\n"
            f"Years of experience required: {self.years_of_experience_required}"
        )


class TrainingInfo(BaseModel):
    """
    Represents training programs and the skills they provide.
    
    Each training program teaches specific skills at defined proficiency levels.
    This information is crucial for recommending learning paths to candidates.
    """
    skill_acquired_and_level: Tuple[str, str] = Field(
        default=("not specified", "not specified"),
        description="A pair of skill name and level. This tuple contains only 2 elements!"
    )

    def describe(self) -> str:
        """Create a human-readable description for AI agents to process."""
        skill = f'{self.skill_acquired_and_level[0]}: level {self.skill_acquired_and_level[1]}'
        return f"Acquired skills: {skill}"


class PersonaInfo(BaseModel):
    """
    Comprehensive profile of a job seeker extracted from conversations.
    
    This model captures all the essential information needed to match
    candidates with appropriate opportunities and training programs.
    """
    name: str = Field(
        default="",
        description="Persona's name"
    )
    skills: List[Tuple[str, str]] = Field(
        default_factory=list,
        description="List of pairs, each holding a skill and its level."
    )
    location: str = Field(
        default="unknown",
        description="Current location"
    )
    age: str = Field(
        default="unknown",
        description="Age of the persona"
    )
    years_of_experience: str = Field(
        default="unknown",
        description="Years of experience in a field."
    )

    def describe(self) -> str:
        """Create a comprehensive human-readable profile for AI agents."""
        skills = ', '.join([
            f'{skill}: {level}'
            for skill, level in self.skills
        ])
        return (
            f"Name: {self.name}\n"
            f"Skills: {skills}\n"
            f"Location: {self.location}\n"
            f"Age: {self.age}\n"
            f"Years of experience: {self.years_of_experience}"
        )


class IDList(BaseModel):
    """
    Simple container for lists of identifiers.
    
    Used when AI agents need to return multiple recommendations,
    such as a list of suitable job IDs or training program IDs.
    """
    values: List[str] = Field(default_factory=list, description="A list of string IDs (job IDs, training IDs, etc.)")

# Building Your First AI Agent <a id='first-agent'></a>

Now we'll create AI agents that can conduct intelligent conversations with job seekers and extract structured information. This demonstrates the core capability that makes AI agents powerful: **adaptive interaction**.

## The Persona System

In our challenge, "personas" are AI agents representing job seekers with unique backgrounds, skills, and career goals. Our job is to build agents that can:

1. **Conduct Adaptive Interviews**: Ask relevant follow-up questions based on responses
2. **Extract Complete Profiles**: Gather all necessary information efficiently  
3. **Manage Conversation Limits**: Work within resource constraints effectively

## Critical Resource Constraints

⚠️ **Important**: There are strict limits on persona interactions:
- **5 conversations per day** per persona per team
- **20 messages maximum** per conversation
- **Conversation IDs** must be tracked to maintain context

These constraints are essential for:
- Fair competition between teams
- Preventing brute-force solutions  
- Managing API costs and system load
- Encouraging efficient conversation design

## Conversation Strategy

Effective conversation agents follow these principles:

### Goal-Oriented Questioning
- Have a clear information-gathering objective
- Ask open-ended questions to encourage detailed responses
- Follow up on vague or incomplete answers

### Efficiency Under Constraints
- Gather maximum information in minimum turns
- Prioritize essential data over nice-to-have details
- Use conversational flow to collect multiple data points per exchange

### Context Preservation
- Maintain conversation history across turns
- Use conversation IDs to resume interrupted sessions
- Build on previous responses rather than repeating questions
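
To stay inside the limits listed earlier, it may help to count conversations and messages locally before each call. The sketch below is one possible way to do that; `ConversationBudget` is a hypothetical helper, not part of the challenge API, and it assumes every message you send counts toward the 20-message cap (how the server actually counts messages is not stated here, so treat the server as the source of truth).

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Optional, Tuple

MAX_CONVERSATIONS_PER_DAY = 5       # per persona, from the limits above
MAX_MESSAGES_PER_CONVERSATION = 20  # assumed to count the messages we send

class ConversationBudget:
    """Hypothetical local counter for the persona interaction limits."""

    def __init__(self) -> None:
        self._conversations: Dict[Tuple[str, date], int] = defaultdict(int)
        self._messages: Dict[str, int] = defaultdict(int)

    def start_conversation(self, persona_id: str, today: Optional[date] = None) -> bool:
        """Record a new conversation; return False if today's quota is spent."""
        key = (persona_id, today or date.today())
        if self._conversations[key] >= MAX_CONVERSATIONS_PER_DAY:
            return False
        self._conversations[key] += 1
        return True

    def record_message(self, conversation_id: str) -> bool:
        """Record one sent message; return False if the conversation is full."""
        if self._messages[conversation_id] >= MAX_MESSAGES_PER_CONVERSATION:
            return False
        self._messages[conversation_id] += 1
        return True

    def messages_left(self, conversation_id: str) -> int:
        """Number of messages that can still be sent in this conversation."""
        return MAX_MESSAGES_PER_CONVERSATION - self._messages.get(conversation_id, 0)

budget = ConversationBudget()
day = date(2025, 1, 1)
started = [budget.start_conversation("persona_001", day) for _ in range(6)]
print(started)  # the sixth attempt on the same day is refused

budget.record_message("conv-abc")
print(budget.messages_left("conv-abc"))
```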

Let's see how this works in practice:

In [None]:
def send_message_to_chat(message: str, persona_id: str, conversation_id: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """
    Send a message to a persona agent and receive their response.
    
    This function handles the low-level communication with the challenge API,
    managing AWS authentication and conversation state. Each persona maintains
    their own conversation context across multiple turns.
    
    Args:
        message: The message to send to the persona
        persona_id: Unique identifier for the persona (e.g., 'persona_001')
        conversation_id: Optional conversation ID for maintaining context
        
    Returns:
        Tuple of (persona_response, conversation_id) or None if failed
    """
    url = "https://cygeoykm2i.execute-api.us-east-1.amazonaws.com/main/chat"
    
    # Set up AWS authentication
    region = 'us-east-1'
    session = boto3.Session(region_name=region)
    credentials = session.get_credentials()
    headers = {'Content-Type': 'application/json'}
    
    # Prepare the message payload
    payload = {
        "persona_id": persona_id,
        "conversation_id": conversation_id,
        "message": message
    }

    # Create and sign the AWS request
    request = AWSRequest(
        method='POST',
        url=url,
        data=json.dumps(payload),
        headers=headers
    )
    SigV4Auth(credentials, 'execute-api', region).add_auth(request)

    # Send the request
    response = requests.request(
        method=request.method,
        url=request.url,
        headers=dict(request.headers),
        data=request.body
    )

    if response.status_code != 200:
        return None
    
    response_json = response.json()
    return response_json['response'], response_json['conversation_id']


def get_conversation(persona_id: str, max_turns: int = 5, print_conversation: bool = False, print_token_no: bool = False) -> List[str]:
    """
    Conduct a structured conversation with a persona to gather career information.
    
    This function demonstrates key AI agent capabilities:
    - Goal-oriented dialogue management
    - Adaptive questioning based on responses  
    - Context maintenance across conversation turns
    - Efficient information extraction within token limits
    
    Args:
        persona_id: Unique identifier for the persona to interview
        max_turns: Maximum conversation turns to prevent infinite loops
        print_conversation: Whether to display the full conversation
        print_token_no: Whether to show token usage statistics
        
    Returns:
        List of conversation messages for further processing
    """
    # Define the agent's role and objectives
    system_prompt = """
    Continue to ask questions about this person - do not provide the jobs, trainings or anything yet.
    You are a helpful and empathetic assistant. Your goal is to engage in a natural conversation with a persona to gather the following information:
    - Their name
    - Their skills and the **level of each skill**
    - Their current location
    - Their age
    - Their preferences
    - Years of experience

    Remember to always gather all of this information!
    Ask open-ended questions to encourage detailed responses. Be polite, patient, and adapt your questions based on their answers.
    If the persona is unsure or vague, gently probe for more details. Do not ask all questions at once; let the conversation flow naturally.
    **Do not comment on whatever the response is. Just ask questions to retrieve the information.**
    """
    conversation = []
    current_turn = 0
    total_tokens = 0
    conversation_agent = get_agent(system_prompt)
    conversation_id = None

    # greeting
    agent_message = "Hello! I'm here to help you find the best job or training opportunities. Can you tell me your name?"
    conversation_agent.messages = [{
        "role": "assistant",
        "content": [{
            "text": agent_message
        }]
    }]
    conversation.append(f"Assistant: {agent_message}")

    # Conduct the conversation
    while current_turn < max_turns:
        # Send message to persona and get response
        resp = send_message_to_chat(
            agent_message,
            persona_id,
            conversation_id
        )
        
        if resp is None:
            print(f"⚠️ Persona {persona_id} did not respond - ending conversation")
            break
            
        user_response, conversation_id = resp
        conversation.append(f"User: {user_response}")
        
        # Generate agent's next question/response
        agent_response = conversation_agent(user_response)
        total_tokens = agent_response.metrics.accumulated_usage['totalTokens']
        agent_message = str(agent_response)
        conversation.append(f"Assistant: {agent_message}")
        
        current_turn += 1
    
    # Optional debugging output
    if print_conversation:
        print('\n=== CONVERSATION ===')
        print('\n'.join(conversation))
        print('===================\n')
        
    if print_token_no:
        print(f'💡 Total tokens used: {total_tokens}')
        
    return conversation


# Multi-Agent Collaboration <a id='multi-agent-system'></a>

The real power of AI agents emerges when they work together. Our career matching system uses **specialized agents** that collaborate to solve different aspects of the challenge, each optimized for specific tasks.

## Agent Specialization Strategy

### Information Extraction Agents
- **Role**: Convert unstructured text into structured data
- **Why Specialized**: These agents are tuned for precise data extraction, with prompts optimized for identifying specific field types (skills, locations, experience levels)
- **Input**: Raw conversations, job descriptions, training materials
- **Output**: Structured Pydantic objects

### Matching Intelligence Agents  
- **Role**: Understand relationships between candidates and opportunities
- **Why Specialized**: These agents focus on semantic understanding - they know that "environmental sustainability" relates to "green energy" and "climate action"
- **Input**: Structured candidate and opportunity data
- **Output**: Ranked recommendation lists

### Career Path Agents
- **Role**: Bridge skill gaps with training recommendations
- **Why Specialized**: These agents understand learning progressions and skill prerequisites
- **Input**: Current skills vs. target job requirements
- **Output**: Personalized learning pathways

## Collaborative Intelligence Patterns

### 1. Pipeline Processing
Each agent performs one step in a sequential process:
```
Raw Conversation → Extraction Agent → PersonaInfo → Matching Agent → Job Recommendations
```

### 2. Cross-Validation  
Multiple agents can process the same input to improve accuracy:
```
Job Description → [Agent A, Agent B, Agent C] → Consensus JobInfo
```

### 3. Hierarchical Decision Making
High-level agents delegate to specialists:
```
Career Advisor Agent → [Job Matcher, Training Recommender, Location Analyzer] → Integrated Plan
```
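
The cross-validation pattern can be sketched with a simple majority vote over several extraction runs. This is only an illustration: `consensus_skills` is a hypothetical helper, the skill lists are made up, and the notebook's own code runs a single extraction per document.

```python
from collections import Counter
from typing import List

def consensus_skills(extractions: List[List[str]], min_votes: int = 2) -> List[str]:
    """Keep the skills that at least `min_votes` extraction runs agree on."""
    votes: Counter = Counter()
    for skills in extractions:
        # Normalise case and whitespace, and count each skill once per run
        votes.update({skill.strip().lower() for skill in skills})
    return sorted(skill for skill, count in votes.items() if count >= min_votes)

# Three made-up runs over the same job description
runs = [
    ["Solar installation", "Electrical safety"],
    ["solar installation", "project management"],
    ["Solar Installation", "electrical safety "],
]
agreed = consensus_skills(runs)
print(agreed)  # ['electrical safety', 'solar installation']
```

Each extra run costs another API call, so a vote like this trades tokens for stability; whether that trade pays off is something to measure rather than assume.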

## Enterprise-Grade Batch Processing

The `extract_info_to_json()` function demonstrates production AI patterns:

- **Intelligent Caching**: Skips items that are already in the results file, so no API call is repeated
- **Graceful Error Handling**: Retries a failed extraction up to `max_retries` times, then logs the error and moves on
- **Progress Monitoring**: Prints the running count of extracted items
- **Incremental Processing**: Resumes from the saved results file after an interruption
- **Periodic Checkpointing**: Writes results to disk during the run (controlled by `cache_period`) rather than only at the end

This approach is crucial when processing hundreds of documents efficiently and reliably while managing costs.
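
A delay that doubles between retries (exponential backoff) can be sketched as follows. `retry_with_backoff` is a hypothetical helper, not part of Strands or `src.utils`, and the retry loop in this notebook would need to call it explicitly. The `sleep` parameter exists only so the example can run without actually waiting.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

R = TypeVar("R")

def retry_with_backoff(
    fn: Callable[[], R],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ValueError,),
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call `fn`, waiting base_delay * 2**attempt seconds after each failed attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retries must be at least 1")

# Made-up flaky function: fails twice, then succeeds
calls = {"n": 0}

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("malformed output")
    return "ok"

delays: List[float] = []
result = retry_with_backoff(flaky, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```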

In [None]:
def extract_info(model: Type[M], text: str) -> M:
    """Ask a fresh agent to fill the Pydantic schema `model` from free-form `text`."""
    extraction_agent = get_agent()
    return extraction_agent.structured_output(output_model=model, prompt=text)


def extract_info_from_conversation(conversation: List[str]) -> PersonaInfo:
    text = '\n'.join(conversation)
    return extract_info(PersonaInfo, text)


def extract_info_from_job_path(path: str | Path) -> JobInfo:
    text = load_file_content(path)
    return extract_info(JobInfo, text)


def extract_info_from_training_path(path: str | Path) -> TrainingInfo:
    text = load_file_content(path)
    return extract_info(TrainingInfo, text)


def extract_info_to_json(
    model: Type[M],
    description_paths: List[str | Path],
    save_path: str | Path,
    cache_period: int = 20,
    max_retries: int = 3
):
    save_path = Path(save_path)

    if not save_path.exists():
        save_path.touch()
        save_json(save_path, {})

    extracted_data = read_json(save_path)
    description_paths = [Path(path) for path in description_paths]
    print(f'Total descriptions for {model.__name__}: {len(description_paths)}')
    print(f'Extracted infos: {len(extracted_data)}')

    counter = 0
    for path in description_paths:
        id_ = path.stem
        if id_ in extracted_data:
            continue  # Already extracted in a previous run
        text = load_file_content(path)
        err = None
        for _ in range(max_retries):
            try:
                info = extract_info(model, text)
            except ValueError as e:
                err = e  # Remember the last failure and try again
                continue
            extracted_data[id_] = info.model_dump_json()
            counter += 1
            # Checkpoint after every `cache_period` new extractions
            if counter % cache_period == 0:
                save_json(save_path, extracted_data)
                print(len(extracted_data))
            break
        else:
            # All attempts failed: report and move on to the next item
            print(f'Error for id: {id_}', err)
    save_json(save_path, extracted_data)


def extract_jobs_info_to_json(
    save_path: str | Path,
    cache_period: int = 20,
    max_retries: int = 3
):
    job_paths = get_job_paths()
    extract_info_to_json(
        model=JobInfo,
        description_paths=job_paths,
        save_path=save_path,
        cache_period=cache_period,
        max_retries=max_retries
    )


def extract_trainings_info_to_json(
    save_path: str | Path,
    cache_period: int = 20,
    max_retries: int = 3
):
    training_paths = get_training_paths()
    extract_info_to_json(
        model=TrainingInfo,
        description_paths=training_paths,
        save_path=save_path,
        cache_period=cache_period,
        max_retries=max_retries
    )


# Real-World Application: Intelligent Career Matching <a id='real-world-application'></a>

This is where our AI agents solve the core challenge: **intelligent matching that goes beyond keywords**. Our system understands context, relationships, and nuances that traditional matching systems miss.

## The Intelligence Advantage

### Traditional Keyword Matching:
- Person has "sustainability" → Job requires "environmental" → ❌ **No Match**
- Rigid, literal interpretation leads to missed opportunities

### AI Agent Matching:
- Agent understands semantic relationships: sustainability ↔ environmental ↔ green energy
- Considers skill transferability, location flexibility, experience scalability  
- ✅ **Intelligent Match** with confidence reasoning
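
The difference between the two approaches can be illustrated without any model at all. In the sketch below, `RELATED` is a hypothetical hand-written table that crudely stands in for the relationships an LLM infers from context; the real agents do not use such a table.

```python
from typing import Dict, Set

def keyword_match(candidate_skills: Set[str], required: Set[str]) -> Set[str]:
    """Literal matching: only identical strings count."""
    return candidate_skills & required

# Hypothetical relatedness table, a crude stand-in for semantic understanding
RELATED: Dict[str, Set[str]] = {
    "sustainability": {"environmental", "green energy"},
}

def related_match(candidate_skills: Set[str], required: Set[str]) -> Set[str]:
    """Match a requirement if the candidate has it or has a related skill."""
    expanded = set(candidate_skills)
    for skill in candidate_skills:
        expanded |= RELATED.get(skill, set())
    return expanded & required

candidate = {"sustainability", "project management"}
job = {"environmental", "project management"}

literal = keyword_match(candidate, job)
semantic = related_match(candidate, job)
print(sorted(literal))   # ['project management']
print(sorted(semantic))  # ['environmental', 'project management']
```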

## Multi-Layer Matching Intelligence

### 1. Semantic Understanding Layer
Our agents don't just match words - they understand meaning:
- "Organic farming" connects to "sustainable agriculture" and "food systems"
- "Community outreach" relates to "stakeholder engagement" and "social impact"
- "Data analysis" applies across environmental monitoring, impact measurement, and policy research

### 2. Context-Aware Evaluation Layer  
Agents consider multiple factors simultaneously:
- **Skill Level Alignment**: Does the candidate's proficiency match job requirements?
- **Growth Potential**: Can this person develop into the role with training?
- **Geographic Feasibility**: Are location constraints realistic?
- **Career Trajectory**: Does this opportunity align with their goals?

### 3. Learning Path Intelligence Layer
For skill gaps, agents recommend strategic development:
- **Prerequisite Analysis**: What foundational skills are needed first?
- **Progressive Learning**: How to build from current skills to job requirements?
- **Time-to-Competency**: Realistic timelines for skill development
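
The idea of progressive learning can be sketched as a rule-based filter: for each required skill, suggest only trainings that sit one level above what the candidate already has. This is an assumption-laden sketch. The `LEVELS` ordering, the `next_trainings` helper, and the sample IDs are all hypothetical, and the actual level labels in the data may differ.

```python
from typing import Dict, List, Tuple

# Assumed ordering of proficiency labels; the real data may use other labels
LEVELS = ["beginner", "intermediate", "advanced"]

def next_trainings(
    persona_skills: List[Tuple[str, str]],
    required_skills: List[str],
    trainings: Dict[str, Tuple[str, str]],
) -> List[str]:
    """Return training IDs teaching a required skill one level above the current one."""
    current = {
        skill: LEVELS.index(level)
        for skill, level in persona_skills
        if level in LEVELS
    }
    picks = []
    for training_id, (skill, level) in trainings.items():
        if skill not in required_skills or level not in LEVELS:
            continue
        # A skill the persona lacks sits at rank -1, so its next step is "beginner"
        if LEVELS.index(level) == current.get(skill, -1) + 1:
            picks.append(training_id)
    return sorted(picks)

persona = [("solar installation", "beginner")]
job_requirements = ["solar installation", "electrical safety"]
catalog = {
    "tr1": ("solar installation", "intermediate"),
    "tr2": ("solar installation", "advanced"),
    "tr3": ("electrical safety", "beginner"),
    "tr4": ("cooking", "beginner"),
}
suggested = next_trainings(persona, job_requirements, catalog)
print(suggested)  # ['tr1', 'tr3']
```

A filter like this could also run before the matching agent to shorten the list of trainings placed in its prompt.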

## Why This Approach Works at Scale

1. **Contextual Intelligence**: Understands meaning behind requirements and capabilities
2. **Holistic Evaluation**: Balances multiple factors for better recommendations  
3. **Personalized Pathways**: Each recommendation is tailored to individual circumstances
4. **Scalable Processing**: Same intelligent matching works for thousands of candidates
5. **Iterative Improvement**: Prompts and model choices can be refined as you review more interactions

The result is a career guidance system that thinks like an experienced career counselor, but operates at the scale and speed of modern AI.

Let's see this intelligence in action:

In [None]:
def find_job_matches_for_persona(
    persona_info: PersonaInfo,
    jobs_data: Dict[str, JobInfo],
) -> List[str]:
    jobs_text = "\n".join([
        f'{job_id}: {job_info.describe()}'
        for job_id, job_info in jobs_data.items()
    ])
    system_prompt = f"""
    You have a list of all available jobs. Given a candidate info provide
    a list of up to 4 job IDs that would match that candidate.
    The list might be empty if no match is found:
    {jobs_text}
    """
    agent = get_agent(system_prompt=system_prompt)
    res = agent.structured_output(IDList, persona_info.describe())
    return res.values


def find_training_matches_for_persona(
    persona_info: PersonaInfo,
    trainings_data: Dict[str, TrainingInfo]
) -> List[str]:
    trainings_text = "\n".join([
        f'{training_id}: {training_info.describe()}'
        for training_id, training_info in trainings_data.items()
    ])
    system_prompt = f"""
    You have a list of all available trainings. Given a candidate info provide
    a list of up to 4 training IDs that would match that candidate.
    The list might be empty if no match is found:\n
    {trainings_text}
    """
    agent = get_agent(system_prompt=system_prompt)
    res = agent.structured_output(IDList, persona_info.describe())
    return res.values


def find_training_matches_for_job(
    job_info: JobInfo,
    trainings_data: Dict[str, TrainingInfo],
) -> List[str]:
    trainings_text = "\n".join([
        f'{training_id}: {training_info.describe()}'
        for training_id, training_info in trainings_data.items()
    ])
    system_prompt = f"""
    You have a list of all available trainings. Given a job info provide
    a list of up to 4 training IDs that would be nice to have before
    taking that job. The list may be empty if no training fit:\n
    {trainings_text}
    """
    agent = get_agent(system_prompt=system_prompt)
    res = None
    err = None
    max_retries = 3
    retries = 0
    while retries < max_retries:
        try:
            res = agent.structured_output(IDList, job_info.describe())
            break
        except ValueError as e:
            retries += 1
            err = e
    if res is None:
        raise ValueError(f'Agent could not get matches for job {job_info}. Err: {err}')
    return res.values

# Extracting job info

In [None]:
jobs_save_path = './extracted_jobs_info.json'
extract_jobs_info_to_json(jobs_save_path)
jobs_info = read_json(jobs_save_path)
jobs_info = {
    job_id: JobInfo.model_validate_json(job_data)
    for job_id, job_data in jobs_info.items()
}

# Extracting trainings info

In [None]:
trainings_save_path = './extracted_trainings_info.json'
extract_trainings_info_to_json(trainings_save_path)
trainings_info = read_json(trainings_save_path)
trainings_info = {
    training_id: TrainingInfo.model_validate_json(training_data)
    for training_id, training_data in trainings_info.items()
}

# Match trainings to jobs

In [None]:
job_training_matches_path = Path('./job_training_matches.json')
job_training_matches = {}
for job_id, job_info in tqdm(jobs_info.items()):
    training_ids = find_training_matches_for_job(job_info, trainings_info)
    job_training_matches[job_id] = training_ids
save_json(job_training_matches_path, job_training_matches)

# Testing and Optimization <a id='testing-optimization'></a>

Testing AI agent systems requires evaluating both **technical correctness** and **intelligent behavior**. We need to verify that agents make sensible decisions, not just that code executes without errors.

## Testing AI Agent Intelligence

### Quality Metrics to Evaluate:
- ✅ **Relevance**: Do job recommendations align with candidate skills and interests?
- ✅ **Feasibility**: Are experience requirements and locations realistic?  
- ✅ **Growth Potential**: Do training recommendations create logical skill progression?
- ✅ **Semantic Understanding**: Does the system recognize skill relationships and transferability?

### Edge Cases to Test:
- **Minimal Information**: How does the system handle incomplete profiles?
- **Conflicting Preferences**: What happens with unrealistic expectations?
- **Unusual Skill Combinations**: Can agents find opportunities for unique backgrounds?
- **Geographic Constraints**: How does location filtering affect recommendations?

## Performance and Cost Optimization

### Token Efficiency Strategies:
1. **Prompt Optimization**: Concise, focused prompts reduce token usage
2. **Batch Processing**: Group similar operations to minimize API overhead
3. **Intelligent Caching**: Store results to avoid redundant processing
4. **Model Selection**: Use smaller models for simple tasks, larger for complex reasoning
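
The caching strategy above can be sketched as a small disk-backed memoizer. This is a minimal sketch under stated assumptions, not part of the tutorial's utilities: `cached_call`, the cache file name, and the hashing scheme are hypothetical names chosen for illustration, and `fake_agent_call` stands in for a real (paid) agent call.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_call(cache_path: Path, key_text: str, compute: Callable[[], List[str]]) -> List[str]:
    """Return the cached result for key_text, calling compute() only on a cache miss."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    # Hash the input text so long persona/job descriptions make compact keys
    key = hashlib.sha256(key_text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = compute()  # the expensive agent call would happen here
        cache_path.write_text(json.dumps(cache))
    return cache[key]


# Hypothetical stand-in for an agent call; counts how often it is invoked
calls = []


def fake_agent_call() -> List[str]:
    calls.append(1)
    return ["j1", "j7"]


demo_cache = Path(tempfile.mkdtemp()) / "match_cache.json"
first = cached_call(demo_cache, "persona description", fake_agent_call)
second = cached_call(demo_cache, "persona description", fake_agent_call)
print(first, second, len(calls))
```

The second lookup is served from disk, so the stand-in is invoked only once. In a real run the key text would likely need to include the prompt as well, since changing the prompt should invalidate earlier results.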

### Debugging AI Agent Behaviors:
- **Conversation Logging**: Track agent decisions and reasoning
- **Intermediate Results**: Examine extracted data quality at each stage
- **A/B Testing**: Compare different prompt strategies
- **Error Analysis**: Understand when and why agents make poor decisions
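
One lightweight way to get conversation logging and intermediate results is an append-only JSON Lines log with one record per agent decision. The sketch below is an assumption about how this could be done; `log_decision`, `read_log`, and the record fields are hypothetical and do not come from the tutorial's code.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, List


def log_decision(log_path: Path, stage: str, inputs: str, outputs: Any) -> None:
    """Append one JSON record describing a single agent decision."""
    record = {"ts": time.time(), "stage": stage, "inputs": inputs, "outputs": outputs}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_log(log_path: Path) -> List[Dict[str, Any]]:
    """Load all logged decisions for later error analysis."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical usage: log the output of each matching stage for one persona
demo_log = Path(tempfile.mkdtemp()) / "agent_decisions.jsonl"
log_decision(demo_log, "job_matching", "persona_001", ["j1", "j7"])
log_decision(demo_log, "training_matching", "persona_001", [])
records = read_log(demo_log)
print(len(records), records[0]["stage"])
```

Filtering the loaded records by `stage` or by empty `outputs` is a quick way to spot where in the pipeline recommendations go missing.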

## Competitive Intelligence

For the challenge, consider these optimization areas:
- **Conversation Efficiency**: Gather maximum information in minimum turns
- **Matching Sophistication**: Find connections that other teams miss
- **Recommendation Quality**: Provide actionable, personalized guidance
- **System Reliability**: Handle edge cases gracefully
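
One edge case worth guarding against is a model returning IDs that do not exist in the catalog, duplicates, or more IDs than requested. The following is a minimal sketch of such a guard; `sanitize_ids` is a hypothetical helper, and the limit of 4 simply mirrors the "up to 4" wording used in the matching prompts.

```python
from typing import Iterable, List, Set


def sanitize_ids(candidate_ids: Iterable[str], known_ids: Set[str], max_ids: int = 4) -> List[str]:
    """Drop unknown and duplicate IDs, keep the original order, and cap the length."""
    seen: Set[str] = set()
    cleaned: List[str] = []
    for item in candidate_ids:
        if item in known_ids and item not in seen:
            seen.add(item)
            cleaned.append(item)
    return cleaned[:max_ids]


# Hypothetical example: "j99" is not in the catalog and "j1" is repeated
cleaned = sanitize_ids(["j1", "j99", "j1", "j2"], known_ids={"j1", "j2", "j3"})
print(cleaned)
```

In this notebook, the known IDs could be taken from `set(jobs_info)` or `set(trainings_info)` before results are assembled.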

Let's test our system with a realistic example:

In [None]:
# Create a test persona with environmental/sustainability focus
persona = PersonaInfo(
    name='Pedro Araújo',
    skills=[('Food Safety', 'Intermediate'), ('Food Sustainability', 'Intermediate')],
    location='Brasília',
    age='16',  # Note: Young age may affect job eligibility
    years_of_experience='1'
)

print("🧪 Testing with Sample Persona:")
print("=" * 40)
print(persona.describe())
print("=" * 40)

# Test job matching
print("\n🎯 Finding Job Matches...")
persona_jobs = find_job_matches_for_persona(persona, jobs_info)
print(f"Recommended Jobs: {persona_jobs}")

# Test training matching  
print("\n📚 Finding Training Matches...")
persona_trainings = find_training_matches_for_persona(persona, trainings_info)
print(f"Recommended Trainings: {persona_trainings}")

print("\n💡 Analysis:")
print("- Are the job recommendations appropriate for someone with intermediate food safety/sustainability skills?")
print("- Do the training suggestions help develop complementary skills?") 
print("- How does the young age (16) affect job eligibility?")
print("- Are there regional opportunities available in Brasília?")

# Collecting conversations

In [None]:
persona_ids = [f'persona_{i:03}' for i in range(1, 101)]
cache_period = 4

personas_save_path = Path('./extracted_personas_info.json')
if not personas_save_path.exists():
    personas_save_path.touch()
    save_json(personas_save_path, {})

persona_infos = read_json(personas_save_path)
print(f'Total personas: {len(persona_ids)}')
print(f'Collected conversations: {len(persona_infos)}')

counter = 0
for persona_id in persona_ids:
    if persona_id not in persona_infos:
        conversation = get_conversation(persona_id, max_turns=2)
        persona_info = extract_info_from_conversation(conversation)
        persona_infos[persona_id] = persona_info.model_dump_json()
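        counter += 1
        # Checkpoint after every `cache_period` new conversations so that
        # progress survives an interruption
        if counter % cache_period == 0:
            save_json(personas_save_path, persona_infos)
            print(len(persona_infos))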
save_json(personas_save_path, persona_infos)

persona_infos = {
    persona_id: PersonaInfo.model_validate_json(persona_info)
    for persona_id, persona_info in persona_infos.items()
}


# Generating final data

Each line in the sample represents a single prediction result for a persona, formatted as a JSON object (JSON Lines format). The fields include:

- `persona_id`: The unique identifier for the persona.
- `predicted_type`: The type of recommendation made. It can be `"jobs+trainings"`, `"trainings_only"`, or `"awareness"`.
- Depending on the `predicted_type`, additional fields are included:
    - For `"jobs+trainings"`: a `jobs` list, where each item contains a `job_id` and a list of `suggested_trainings` for that job.
    - For `"trainings_only"`: a `trainings` list with recommended training IDs.
    - For `"awareness"`: a `predicted_items` field, e.g., `"too_young"`.

**Why this format?**

- **Consistent structure** ensures our endpoint can reliably parse and validate each prediction.
- **Flexible fields** support different recommendation types while keeping the schema simple and machine-readable.
- **Automation-ready**: This format enables direct ingestion into evaluation or deployment systems without manual intervention.

Participants must use this format to ensure compatibility with the challenge's automated result validation and scoring systems.

Example result:
```json
{"persona_id": "persona_001", "predicted_type": "trainings_only", "trainings": ["t1", "t2"]}
{"persona_id": "persona_002", "predicted_type": "jobs+trainings", "jobs": [{"job_id": "j1", "suggested_trainings": ["t3"]},{"job_id": "j2", "suggested_trainings": ["t34"]},{"job_id": "j7", "suggested_trainings": ["t1", "t33"]}]}
{"persona_id": "persona_127", "predicted_type": "awareness", "predicted_items": "too_young"}
```
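
A quick local check before submitting can catch malformed records early. The sketch below encodes only the field rules listed above; `validate_prediction` and `to_json_lines` are hypothetical helpers, the non-empty requirement for `predicted_items` is an assumption, and the real endpoint may enforce additional rules.

```python
import json
from typing import Any, Dict, List


def validate_prediction(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one prediction record (empty if none)."""
    problems: List[str] = []
    if not isinstance(record.get("persona_id"), str):
        problems.append("persona_id must be a string")
    kind = record.get("predicted_type")
    if kind == "jobs+trainings":
        jobs = record.get("jobs")
        if not isinstance(jobs, list):
            problems.append("jobs must be a list")
        else:
            for job in jobs:
                ok = (
                    isinstance(job, dict)
                    and "job_id" in job
                    and isinstance(job.get("suggested_trainings"), list)
                )
                if not ok:
                    problems.append(f"malformed job entry: {job!r}")
    elif kind == "trainings_only":
        if not isinstance(record.get("trainings"), list):
            problems.append("trainings must be a list")
    elif kind == "awareness":
        # Assumption: an empty predicted_items value is treated as invalid
        if not record.get("predicted_items"):
            problems.append("predicted_items must be non-empty")
    else:
        problems.append(f"unknown predicted_type: {kind!r}")
    return problems


def to_json_lines(records: List[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


sample = [
    {"persona_id": "persona_001", "predicted_type": "trainings_only", "trainings": ["t1", "t2"]},
    {"persona_id": "persona_127", "predicted_type": "awareness", "predicted_items": "too_young"},
]
problems = [validate_prediction(r) for r in sample]
jsonl = to_json_lines(sample)
print(problems)
print(jsonl)
```

Running the validator over every record and inspecting the non-empty problem lists gives a cheap sanity check that costs no API calls.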

In [None]:
results = []
for persona_id, persona_info in tqdm(persona_infos.items()):
    jobs = find_job_matches_for_persona(persona_info, jobs_info)
    data = {'persona_id': persona_id}
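    # Use .get() so a job ID the agent returned but that is missing from
    # job_training_matches does not raise a KeyError
    if jobs and any(job_training_matches.get(job_id) for job_id in jobs):
        data['predicted_type'] = 'jobs+trainings'
        data['jobs'] = [
            {
                'job_id': job_id,
                'suggested_trainings': job_training_matches.get(job_id, [])
            }
            for job_id in jobs
        ]
    elif not jobs:
        # No job matched: recommend trainings from the training catalog
        trainings = find_training_matches_for_persona(persona_info, trainings_info)
        data['predicted_type'] = 'trainings_only'
        data['trainings'] = trainings
    else:
        # Jobs matched, but none of them has suggested trainings
        data['predicted_type'] = 'awareness'
        data['predicted_items'] = ''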
    results.append(data)

In [None]:
results[0]

In [None]:
results_path = Path('./results.json')
save_json(results_path, results)

# Sending the results
Once your results are ready, send them for evaluation using the `send_results` function imported below. If the submission is accepted, the response status code will be 200. After a short while, your solution's score should appear on the main GDSC page.

In [None]:
# Use the send_results function from our utilities
from src.utils import send_results

# This is cleaner than having submission logic in notebooks
# See src/utils.py for the implementation

In [None]:
results = read_json(results_path)

# Submit using our shared utility function
response = send_results(results)

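if response and response.status_code == 200:
    try:
        print(response.json()['message'])
    except Exception:
        # The body was not JSON or had no 'message' field
        print("Submission successful!")
else:
    status = response.status_code if response is not None else 'no response'
    print(f"Submission failed: {status}")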

# Conclusion and Next Steps <a id='conclusion'></a>

Congratulations! You've built a sophisticated AI agent system that can conduct intelligent conversations, extract structured information, and make contextual recommendations for career matching.

## Key Concepts You've Mastered

### AI Agent Architecture
- **Specialized Agents**: Each agent focuses on specific tasks (conversation, extraction, matching)
- **Collaborative Intelligence**: Multiple agents work together to solve complex problems
- **Structured Output**: Pydantic models ensure consistent, processable data

### Production-Ready Patterns
- **Resource Management**: Working within API limits and conversation constraints
- **Error Handling**: Robust retry mechanisms and graceful failure handling  
- **Batch Processing**: Efficient handling of large datasets with caching and progress tracking
- **Cost Optimization**: Strategic model selection and token usage management

### Intelligent Matching Beyond Keywords
- **Semantic Understanding**: Agents grasp meaning and relationships between concepts
- **Context-Aware Decisions**: Multiple factors considered simultaneously for better recommendations
- **Personalized Pathways**: Each recommendation tailored to individual circumstances

## Best Practices for Competitive Success

1. **Optimize Conversation Efficiency**: Design agents that gather maximum information in minimum turns
2. **Enhance Matching Intelligence**: Find connections and opportunities that simpler systems miss
3. **Implement Robust Error Handling**: Ensure your system works reliably under all conditions
4. **Monitor Performance Metrics**: Track token usage, response times, and recommendation quality
5. **Test Edge Cases**: Validate behavior with unusual inputs and constraint scenarios

## Preparing for Challenge Submission

Your AI agent system is now ready to process the full challenge dataset. Remember:
- **Results Format**: Ensure your output matches the required JSON Lines structure
- **Quality over Quantity**: Focus on making intelligent recommendations rather than maximizing matches
- **Resource Management**: Monitor API usage and optimize for cost-effectiveness
- **Testing Validation**: Verify your system with diverse persona profiles before final submission

## Advanced Techniques to Explore

- **Ensemble Methods**: Combining multiple agent recommendations for improved accuracy
- **Dynamic Prompt Engineering**: Adapting agent behavior based on persona responses
- **Hierarchical Agent Systems**: Using supervisor agents to coordinate specialist agents
- **Feedback Loops**: Learning from persona interactions to improve future conversations
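
As an illustration of the ensemble idea, several independent matching runs can be combined by simple voting. This is a minimal sketch, assuming each run returns a list of IDs; `vote_on_ids` and its thresholds are hypothetical, and the hard-coded runs stand in for repeated agent calls.

```python
from collections import Counter
from typing import List


def vote_on_ids(runs: List[List[str]], min_votes: int = 2, max_ids: int = 4) -> List[str]:
    """Keep IDs recommended by at least min_votes runs, most-voted first."""
    # set(run) ensures a run that repeats an ID still counts as one vote
    counts = Counter(item for run in runs for item in set(run))
    # Sort by descending vote count, then by ID for a deterministic order
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, votes in ranked if votes >= min_votes][:max_ids]


# Hypothetical outputs of three matching runs for the same persona
runs = [["j1", "j2", "j7"], ["j1", "j7"], ["j2", "j1", "j9"]]
consensus = vote_on_ids(runs)
print(consensus)
```

Voting trades extra API calls for stability, so it is probably best reserved for personas where single runs disagree.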

You now have the foundation to build intelligent, scalable AI agent systems. The techniques you've learned extend far beyond job matching - they're applicable to any domain requiring intelligent automation and human-AI collaboration.

Good luck with the challenge!