# **Tutorial 3** - Introduction to AI Agents: Green Agents of Change

Welcome to the exciting world of AI agents! In this tutorial, we'll dive deep into building intelligent systems that can help solve real-world problems, specifically focusing on our challenge: connecting young people in Brazil with green and future-proof career opportunities.

## What You'll Learn

After completing this tutorial, you'll understand:
- **How to build multi-agent systems** using the Strands Agents framework
- **Real-world application** of AI agents for job matching and career guidance
- **Best practices** for developing sustainable and efficient AI solutions

## The Challenge Context

This year's Global Data Science Challenge focuses on empowering youth across Brazil to explore, discover, and pursue green and future-proof jobs. We're collaborating with AI agents to:

- Sift through thousands of job descriptions and training opportunities
- Match people to roles that fit their preferences and potential
- Recommend concrete learning paths to help them reach their goals

This involves sophisticated retrieval over real data, thoughtful matching and ranking algorithms, and smart prompt engineering - all powered by AI agents working together.

## Why AI Agents Matter

AI agents represent the next evolution in artificial intelligence. Unlike simple chatbots or traditional AI systems that follow fixed rules, AI agents can:

- **Adapt and Learn**: They can adjust their behavior based on new information and changing circumstances
- **Work Collaboratively**: Multiple agents can work together, each with specialized roles and expertise
- **Use Tools**: They can interact with external systems, databases, and APIs to accomplish complex tasks
- **Make Decisions**: They can evaluate options and choose the best course of action autonomously

In our green careers challenge, we'll see how these capabilities come together to create a powerful system for career guidance and opportunity matching.

Let's get started!

## Agenda

1. [Environment Setup](#environment-setup)
   - Installing required libraries and dependencies
   - Setting up API keys and environment variables
   - Data download and sanity checks

2. [Understanding AI Agents](#understanding-ai-agents)
   - What are AI agents and how do they work?
   - Key components: Models, Tools, and Structured Output
   - The Strands Agents framework

3. [Core Components and Utilities](#core-utilities)
   - Essential functions for agent interaction
   - Data loading and processing utilities
   - Understanding the codebase structure

4. [Data Structures and Models](#data-structures)
   - PersonaInfo, JobInfo, and TrainingInfo models
   - Structured data extraction with Pydantic
   - Why structured output matters

5. [Building Your First AI Agent](#first-agent)
   - Creating conversation agents
   - Extracting information from conversations
   - Testing agent interactions

6. [Multi-Agent Collaboration](#multi-agent-system)
   - Information extraction agents
   - Matching and recommendation agents
   - Orchestrating multiple agents together

7. [Real-World Application](#real-world-application)
   - Processing job descriptions and training programs
   - Matching personas to opportunities
   - Generating final recommendations

8. [Testing and Optimization](#testing-optimization)
   - Testing your matching algorithm
   - Token usage and efficiency considerations
   - Debugging AI agent systems

9. [Conclusion and Next Steps](#conclusion)
   - Summary of key learnings
   - Best practices for AI agent development
   - Preparing for the challenge submission

10. [Appendix](#appendix)
    - Framework comparisons and alternatives
    - Advanced techniques and resources
    - Troubleshooting guide

---

# Environment Setup <a id='environment-setup'></a>

Before we can start working with AI agents, we need to set up our development environment. This includes installing the necessary libraries, configuring API keys, and ensuring we have access to the challenge data.

## Installing Required Libraries

The most important library for this year's challenge is [Strands Agents](https://strandsagents.com/latest/), which provides a powerful framework for creating and managing AI agents. We'll also need several supporting libraries for data handling and API interactions.

Run the cell below once after every JupyterLab restart. To run it, click on it and press "Shift + Enter", or press the "▷" symbol on the top bar.

In [None]:
# Install required packages for AI agent development
# This may take a few minutes - be patient!
!pip install python-dotenv "strands-agents[mistral]" strands-agents-tools tqdm

Let's understand what each library does:

- **`python-dotenv`**: Manages environment variables securely, keeping sensitive information like API keys out of your code
- **`strands-agents[mistral]`**: The core framework for building AI agents, with Mistral LLM integration
- **`strands-agents-tools`**: Additional tools and utilities for agent development
- **`tqdm`**: Provides progress bars for long-running operations - very helpful when processing large datasets

The installation process may take a few minutes. You'll see various dependency installations, which is normal.

## Downloading Challenge Data

Next, we need to download the challenge dataset, which includes job descriptions, training programs, and persona profiles. You may have already done this in previous tutorials - if so, you can skip this step.

The dataset contains the following main components:
- **Jobs**: Structured job offers in JSON format stored in Markdown files
- **Training Programs**: Relevant training opportunities with skills and levels

If you haven't downloaded the data yet, run the cell below:

In [None]:
# Download the challenge dataset from AWS S3
# This command downloads all necessary data files quietly
!aws s3 cp s3://gdsc-25-data-bucket/ . --recursive --quiet
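
Once the download finishes, a quick sanity check can confirm that the files arrived. The sketch below counts the Markdown files in each folder. It assumes the data landed in `./data/jobs` and `./data/trainings` (the folders the utility functions later in this tutorial read from); `count_markdown_files` and `check_dataset` are hypothetical helper names, not part of the challenge code.

```python
from pathlib import Path
from typing import Dict, Union


def count_markdown_files(data_dir: Union[str, Path]) -> int:
    """Return the number of .md files directly inside data_dir (0 if it is missing)."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return 0
    return sum(1 for file in data_dir.iterdir() if file.suffix == ".md")


def check_dataset(root: Union[str, Path] = "./data") -> Dict[str, int]:
    """Count Markdown files in the jobs and trainings folders under root."""
    root = Path(root)
    # Assumption: the folder names match those used by the tutorial's path helpers
    counts = {name: count_markdown_files(root / name) for name in ("jobs", "trainings")}
    for name, count in counts.items():
        status = "OK" if count > 0 else "MISSING OR EMPTY"
        print(f"{name}: {count} files ({status})")
    return counts
```

Calling `check_dataset()` from the notebook's working directory prints one line per folder. A count of zero usually means the download did not run or ran in a different directory.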

## Setting Up Environment Variables

Security is crucial when working with AI agents and APIs. We use environment variables to store sensitive information like API keys, ensuring they never appear directly in our code.

Create an `env` file in your working directory to store sensitive variables. This file should contain your API keys and other configuration settings.

The previous tutorials explained how to obtain a Mistral API key.

Your `env` file should look like this:

```bash
MISTRAL_API_KEY=your_actual_api_key_here
```

**Important Security Notes:**
- Never commit the `env` file to version control
- Keep your API keys secret and secure
- If you suspect your key has been compromised, regenerate it immediately

**Note**: After updating the env file, you may need to restart your notebook for changes to take effect. Click the "⟳" symbol on the top bar to restart.
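
To confirm that the key was actually loaded without printing it in full, a small check like the one below can help. This is a minimal sketch: `require_env` and `mask_secret` are hypothetical helper names, and the demo uses a made-up value rather than a real key.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your env file and restart the notebook."
        )
    return value


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is never shown in full."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Demo with a fake mapping; in the notebook you would call
# require_env("MISTRAL_API_KEY") after dotenv.load_dotenv("env").
demo_env = {"MISTRAL_API_KEY": "abcd1234efgh"}
print(mask_secret(require_env("MISTRAL_API_KEY", demo_env)))
```

Failing early with a readable message is usually easier to debug than a `KeyError` raised deep inside an agent call.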

## Understanding Token Usage and Costs

When working with AI agents, it's important to understand that each interaction with the language model consumes "tokens" - the basic units of text processing. This has implications for both performance and cost:

- **Tokens**: Words and parts of words that the model processes
- **Cost**: Most AI models charge per token used
- **Efficiency**: Better prompts and structured approaches can reduce token usage

Throughout this tutorial, we'll show you how to build efficient AI agents that accomplish more with fewer tokens, making your solutions both faster and more cost-effective.
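
For a rough feel of how prompt length translates into tokens and cost, a back-of-the-envelope estimate can be sketched as follows. Both numbers in it are assumptions: the four-characters-per-token ratio is a common rule of thumb rather than the behavior of Mistral's tokenizer, and the price is a placeholder, so check the provider's pricing page for real values.

```python
import math

# Assumption: rough rule of thumb for English text, not Mistral's real tokenizer
CHARS_PER_TOKEN = 4
# Assumption: placeholder price per one million tokens, for illustration only
HYPOTHETICAL_PRICE_PER_MILLION = 1.0


def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Approximate the token count of a text from its character length."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def estimate_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Convert a token count into a cost, given a price per one million tokens."""
    return tokens / 1_000_000 * price_per_million_tokens


prompt = "Extract the required skills, location and experience from this job description."
tokens = estimate_tokens(prompt)
cost = estimate_cost(tokens, HYPOTHETICAL_PRICE_PER_MILLION)
print(f"Roughly {tokens} tokens, estimated cost {cost:.8f}")
```

The exact usage reported by the model is always the number to trust; an estimate like this is only useful for comparing prompt variants before sending them.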

---

# Understanding AI Agents <a id='understanding-ai-agents'></a>

Now that our environment is set up, let's dive into the core concepts behind AI agents and understand what makes them so powerful for solving complex problems like career matching.

# Basic solution
With the environment set up, we can move on to the code. As mentioned before, this year we are using [Strands Agents](https://strandsagents.com/latest/) as our main library for interacting with LLMs and creating AI agents. The models we are going to use come from [Mistral](https://mistral.ai/).

You can modify anything in the code as long as you provide the final data in the required format (more about the required format in upcoming sections).

## AI Agents

AI agents are intelligent systems that can perceive their environment, make decisions, and take actions to achieve specific goals. Unlike traditional software that follows predefined rules, AI agents can adapt, learn, and work autonomously.

### Key Characteristics of AI Agents:

1. **Autonomy**: They can operate independently without constant human oversight
2. **Reactivity**: They respond to changes in their environment
3. **Pro-activeness**: They take initiative to achieve their goals
4. **Social Ability**: They can interact with other agents and humans

### How AI Agents Differ from Traditional AI:

- **Traditional AI**: Processes input → produces output (like a function)
- **AI Agents**: Continuously interact with environment → make decisions → take actions → learn from results

### Components of Our AI Agent System:

1. **Language Models**: The "brain" that provides reasoning and language understanding
2. **Structured Output**: Ensures consistent, processable responses using Pydantic models
3. **Tools and APIs**: Allow agents to interact with external systems
4. **Conversation Management**: Handles multi-turn interactions with personas

In our career matching challenge, we'll use these components to create agents that can:
- Conduct natural conversations with job seekers
- Extract structured information from unstructured text
- Match people to appropriate opportunities
- Generate personalized recommendations

Let's start by importing the necessary libraries and setting up our foundation:

In [None]:
# Core libraries for AI agent development
import json
import os
import boto3
import dotenv
import requests

from pathlib import Path
from typing import Dict, List, Optional, Tuple, Type, TypeVar, Any
from tqdm import tqdm

# Structured data models
from pydantic import BaseModel, Field

# Strands Agents framework - our main AI agent library
from strands.agent import Agent
from strands.models.mistral import MistralModel

# AWS integration for API calls
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Type hints for better code quality
T = TypeVar('T')
M = TypeVar('M', bound=BaseModel)

# Load environment variables from our env file
dotenv.load_dotenv("env")

True

## System Health Check

Before we start building our AI agents, let's verify that our connection to the challenge infrastructure is working correctly. This sanity check ensures we can communicate with the persona API endpoints.

In [None]:
def sanity_check():
    """
    Verify connection to the challenge API infrastructure.
    
    This function tests our AWS credentials and API access by making a health check
    request to the challenge endpoints. This ensures we can interact with the persona
    agents that represent our job seekers.
    """
    base_url = "https://cygeoykm2i.execute-api.us-east-1.amazonaws.com/main/health"
    
    # Set up AWS session and credentials
    session = boto3.Session(region_name='us-east-1')
    credentials = session.get_credentials()
    region = 'us-east-1'
    headers = {'Content-Type': 'application/json'}
    
    # Create and sign the AWS request
    request = AWSRequest(method='GET', url=base_url, data=None, headers=headers)
    SigV4Auth(credentials, 'execute-api', region).add_auth(request)
    
    # Make the request
    response = requests.request(
        method=request.method,
        url=request.url,
        headers=dict(request.headers),
        data=request.body
    )

    print(f"Status: {response.status_code}")
    if response.status_code == 200:
        print("✅ Sanity check passed! You're ready to interact with personas.")
    else:
        print("❌ Sanity check failed!")
        print(f"Response: {response.text}")
        print("Please check your AWS credentials and network connection.")

In [None]:
# Run the sanity check to verify our setup
sanity_check()

Status: 200
Sanity check passed!


# Core Components and Utilities <a id='core-utilities'></a>

This section defines the essential building blocks for our AI agent system. These utility functions handle everything from creating agents to managing data files. Understanding these components is crucial for building effective AI solutions.

## Key Functions Overview:

### Agent Management
- **`get_agent()`**: Creates and configures AI agents with specific roles and capabilities
- **`send_message_to_chat()`**: Handles communication with persona agents
- **`get_conversation()`**: Manages multi-turn conversations with personas

### Data Management
- **`load_file_content()`**: Reads raw content from files (job descriptions, training programs)
- **`read_json()` / `save_json()`**: Handle JSON data persistence for caching results
- **`get_job_paths()` / `get_training_paths()`**: Discover available data files

### Information Extraction
- **`extract_info()`**: Uses AI agents to extract structured data from unstructured text
- **`extract_info_from_conversation()`**: Processes conversation transcripts
- **`extract_info_to_json()`**: Batch processes files with caching and error handling

Let's examine the implementation details:

In [None]:
def get_agent(
    system_prompt: str = "",
    model_id: str = "mistral-medium-latest"
) -> Agent:
    """
    Create and configure an AI agent with specified capabilities.
    
    This is the core function for creating AI agents. The system prompt defines
    the agent's role, expertise, and behavior patterns. Different model IDs
    offer different capabilities and cost profiles.
    
    Args:
        system_prompt: Instructions defining the agent's role and behavior
        model_id: Mistral model to use (e.g., 'mistral-medium-latest', 'mistral-small-latest')
    
    Returns:
        Configured Agent ready for interaction
    """
    model = MistralModel(
        api_key=os.environ["MISTRAL_API_KEY"],
        model_id=model_id,
        stream=False  # Non-streaming for better control in batch operations
    )
    return Agent(model=model, system_prompt=system_prompt, callback_handler=None)


def load_file_content(path: str | Path) -> str:
    """
    Load raw text content from a file.
    
    Used for reading job descriptions, training programs, and other text files
    that need to be processed by our AI agents.
    """
    path = Path(path)
    with path.open('r', encoding='utf-8') as file:
        return file.read()


def read_json(path: str | Path):
    """
    Load JSON data from a file.
    
    Essential for loading cached results, configuration files, and structured data.
    """
    path = Path(path)
    with path.open('r', encoding='utf-8') as file:
        data = json.load(file)
    return data


def save_json(path: str | Path, data: Dict | List):
    """
    Save data to a JSON file.
    
    Used for caching extracted information to avoid redundant API calls and
    preserve results between sessions.
    """
    path = Path(path)
    with path.open('w', encoding='utf-8') as file:
        json.dump(data, file)


def get_job_paths() -> List[Path]:
    """
    Discover all job description files in the dataset.
    
    Scans the jobs directory for Markdown files containing structured job information.
    Each file represents one job opportunity in our challenge dataset.
    """
    data_dir = Path('./data/jobs')
    paths = []
    for file in data_dir.iterdir():
        if file.suffix == '.md':
            paths.append(file)
    return paths


def get_training_paths() -> List[Path]:
    """
    Discover all training program files in the dataset.
    
    Scans the trainings directory for Markdown files containing training opportunities.
    Each file describes skills, levels, and learning outcomes.
    """
    data_dir = Path('./data/trainings')
    paths = []
    for file in data_dir.iterdir():
        if file.suffix == '.md':
            paths.append(file)
    return paths

# Data Structures and Models <a id='data-structures'></a>

One of the key advantages of modern AI agent systems is their ability to produce **structured output**. Instead of just generating free-form text, our agents can create data that follows specific schemas, making it easy to process programmatically.

We use **Pydantic models** to define these data structures. Pydantic provides:

- **Type validation**: Ensures data meets our requirements
- **Automatic parsing**: Converts AI-generated text into Python objects
- **Documentation**: Clear field descriptions help agents understand what to generate
- **Error handling**: Graceful failures when data doesn't match expected format

## Our Core Data Models

### PersonaInfo
Represents information about job seekers extracted from conversations. This structured approach ensures we capture all essential details consistently.

### JobInfo  
Contains the key requirements and characteristics of job opportunities, extracted from job descriptions.

### TrainingInfo
Describes training programs including skills acquired and proficiency levels.

### IDList
A simple container for lists of identifiers, used when agents need to return multiple recommendations.

## Why Structured Output Matters

In traditional AI systems, you might get responses like "This person seems interested in environmental work and has some experience with sustainability." Although a sentence like this contains the necessary information, it is hard to use in a program that expects strict data structures. With structured output, we get precise, actionable data:
```python
PersonaInfo(
    name="Maria Silva",
    skills=[("sustainability", "intermediate"), ("project_management", "beginner")],
    location="São Paulo",
    age="24",
    years_of_experience="2"
)
```

This structured approach enables:
- **Consistent processing**: Every persona has the same data fields
- **Easy matching**: We can compare skills, locations, and experience levels programmatically  
- **Reliable integration**: Other systems can depend on the data format
- **Quality assurance**: We can validate that all required information was captured
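
To illustrate the "easy matching" point, here is a minimal sketch of a programmatic comparison over structured fields. It uses plain dataclasses as simplified stand-ins for the Pydantic models in this tutorial; `Persona`, `Job`, and `missing_skills` are hypothetical names chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Persona:
    """Simplified stand-in for a structured persona profile."""
    name: str
    location: str
    skills: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Job:
    """Simplified stand-in for a structured job description."""
    job_id: str
    location: str
    required_skills: List[str] = field(default_factory=list)


def missing_skills(persona: Persona, job: Job) -> Set[str]:
    """Return the required skills that the persona does not list at any level."""
    have = {skill.lower() for skill, _level in persona.skills}
    return {skill for skill in job.required_skills if skill.lower() not in have}


maria = Persona(
    name="Maria Silva",
    location="São Paulo",
    skills=[("sustainability", "intermediate"), ("project_management", "beginner")],
)
job = Job(job_id="job_x", location="São Paulo", required_skills=["sustainability", "data_analysis"])

print(missing_skills(maria, job))  # {'data_analysis'}
print(maria.location == job.location)  # True
```

With free-form text, the same comparison would need another model call or fragile string parsing; with structured fields it is a set operation.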

Let's examine our data models:

In [None]:
class JobInfo(BaseModel):
    """
    Structured representation of job requirements and characteristics.
    
    This model captures the essential information needed to match candidates
    to job opportunities. The AI agent extracts this information from
    unstructured job descriptions.
    """
    required_skills: List[str] = Field(
        default_factory=list,
        description="List of required skills for the job."
    )
    location: str = Field(
        default="",
        description="Job location."
    )
    years_of_experience_required: str = Field(
        default="",
        description="Years of experience required to get this job."
    )

    def describe(self) -> str:
        """Create a human-readable description for AI agents to process."""
        skills = ', '.join(self.required_skills)
        return (
            f"Required skills: {skills}\nLocation: {self.location}\n"
            f"Years of experience required: {self.years_of_experience_required}"
        )


class TrainingInfo(BaseModel):
    """
    Represents training programs and the skills they provide.
    
    Each training program teaches specific skills at defined proficiency levels.
    This information is crucial for recommending learning paths to candidates.
    """
    skill_acquired_and_level: Tuple[str, str] = Field(
        default=("not specified", "not specified"),
        description="A pair of skill name and level. This tuple contains only 2 elements!"
    )

    def describe(self) -> str:
        """Create a human-readable description for AI agents to process."""
        skill = f'{self.skill_acquired_and_level[0]}: level {self.skill_acquired_and_level[1]}'
        return f"Acquired skills: {skill}"


class PersonaInfo(BaseModel):
    """
    Comprehensive profile of a job seeker extracted from conversations.
    
    This model captures all the essential information needed to match
    candidates with appropriate opportunities and training programs.
    """
    name: str = Field(
        default="",
        description="Persona's name"
    )
    skills: List[Tuple[str, str]] = Field(
        default_factory=list,
        description="List of (skill, level) pairs."
    )
    location: str = Field(
        default="unknown",
        description="Current location"
    )
    age: str = Field(
        default="unknown",
        description="Age of the persona"
    )
    years_of_experience: str = Field(
        default="unknown",
        description="Years of experience in a field."
    )

    def describe(self) -> str:
        """Create a comprehensive human-readable profile for AI agents."""
        skills = ', '.join([
            f'{skill}: {level}'
            for skill, level in self.skills
        ])
        return (
            f"Name: {self.name}\n"
            f"Skills: {skills}\n"
            f"Location: {self.location}\n"
            f"Age: {self.age}\n"
            f"Years of experience: {self.years_of_experience}"
        )


class IDList(BaseModel):
    """
    Simple container for lists of identifiers.
    
    Used when AI agents need to return multiple recommendations,
    such as a list of suitable job IDs or training program IDs.
    """
    values: List[str] = Field(default_factory=list, description="A list of string IDs (job IDs, training IDs, etc.)")
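
Because a language model fills in `IDList.values`, the list can contain IDs that do not exist in the dataset, or the same ID twice. A small post-processing step can guard against that. The sketch below is an assumption about how you might do it: `filter_known_ids` is a hypothetical helper, and the set of known IDs could be built from the file stems of the dataset paths.

```python
from typing import Iterable, List


def filter_known_ids(candidate_ids: Iterable[str], known_ids: Iterable[str]) -> List[str]:
    """Keep only IDs that exist in known_ids, preserving order and dropping duplicates."""
    known = set(known_ids)
    seen = set()
    result = []
    for id_ in candidate_ids:
        id_ = id_.strip()  # tolerate stray whitespace in model output
        if id_ in known and id_ not in seen:
            seen.add(id_)
            result.append(id_)
    return result


# Example: "j9" does not exist and "j1" is repeated
print(filter_known_ids(["j1", "j9", " j2 ", "j1"], ["j1", "j2", "j3"]))  # ['j1', 'j2']
```

Preserving the original order matters if the agent was asked to rank its recommendations.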

# Building Your First AI Agent <a id='first-agent'></a>

Now we'll create our first AI agents that can have conversations with job seekers (personas) and extract structured information from those conversations. This is where the magic happens - we're moving from static data processing to dynamic, intelligent interactions.

## The Persona System

In our challenge, "personas" are AI agents that represent fictional job seekers. Each persona has:
- A unique background and career interests
- Specific skills and experience levels  
- Location preferences and constraints
- Personal goals and motivations

Our job is to build agents that can:
1. **Conduct natural conversations** with these personas
2. **Extract key information** about their profiles and preferences
3. **Match them** to appropriate jobs and training opportunities

## Conversation Management

The `get_conversation()` function orchestrates multi-turn dialogues with personas. It demonstrates several key AI agent principles:

- **Goal-oriented behavior**: The agent has a clear objective (gather specific information)
- **Adaptive questioning**: Questions evolve based on persona responses
- **Context management**: The agent maintains conversation history
- **Efficiency awareness**: Limited turns prevent infinite conversations

### Conversation ID
⚠️ **Important:** There is a strict limit on the number of conversations each team can have with a persona. Teams are allowed up to 5 conversations per day per persona, and each conversation is capped at 20 messages. This limit ensures fair usage, prevents brute-forcing solutions, and helps manage token consumption.

The **conversation ID** was introduced for this reason. When you call the persona endpoint for the first time, you receive a conversation ID; pass it in the payload of every following request to return to the same conversation.
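
Since the daily limit is easy to exhaust while experimenting, it can help to keep a local count of the conversations you start. The sketch below is a best-effort illustration only: `ConversationBudget` and `max_turns_within_cap` are hypothetical names, the challenge API remains the source of truth, and how exactly the server counts the 20 messages (for example, whether both sides count) is an assumption.

```python
from datetime import date
from typing import Dict, Optional, Tuple

MAX_CONVERSATIONS_PER_DAY = 5       # per persona, from the challenge rules
MAX_MESSAGES_PER_CONVERSATION = 20  # from the challenge rules


class ConversationBudget:
    """Local counter of conversations started per persona and day."""

    def __init__(self) -> None:
        self._started: Dict[Tuple[str, date], int] = {}

    def remaining(self, persona_id: str, today: Optional[date] = None) -> int:
        """Return how many conversations can still be started today."""
        today = today or date.today()
        used = self._started.get((persona_id, today), 0)
        return MAX_CONVERSATIONS_PER_DAY - used

    def can_start(self, persona_id: str, today: Optional[date] = None) -> bool:
        return self.remaining(persona_id, today) > 0

    def record_start(self, persona_id: str, today: Optional[date] = None) -> None:
        """Register a new conversation, or raise if the local budget is used up."""
        today = today or date.today()
        if not self.can_start(persona_id, today):
            raise RuntimeError(f"Daily conversation budget used up for {persona_id}")
        key = (persona_id, today)
        self._started[key] = self._started.get(key, 0) + 1


def max_turns_within_cap(messages_per_turn: int = 2) -> int:
    """Assumption: one turn is one message sent plus one reply received."""
    return MAX_MESSAGES_PER_CONVERSATION // messages_per_turn
```

You would call `record_start()` only when sending a message without a conversation ID, because reusing an existing ID continues the same conversation. Note that this counter lives in memory, so it resets when the notebook restarts.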


Let's examine how this works:

In [None]:
def send_message_to_chat(message: str, persona_id: str, conversation_id: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """
    Send a message to a persona agent and receive their response.
    
    This function handles the low-level communication with the challenge API,
    managing AWS authentication and conversation state. Each persona maintains
    their own conversation context across multiple turns.
    
    Args:
        message: The message to send to the persona
        persona_id: Unique identifier for the persona (e.g., 'persona_001')
        conversation_id: Optional conversation ID for maintaining context
        
    Returns:
        Tuple of (persona_response, conversation_id) or None if failed
    """
    url = "https://cygeoykm2i.execute-api.us-east-1.amazonaws.com/main/chat"
    
    # Set up AWS authentication
    session = boto3.Session(region_name='us-east-1')
    credentials = session.get_credentials()
    region = 'us-east-1'
    headers = {'Content-Type': 'application/json'}
    
    # Prepare the message payload
    payload = {
        "persona_id": persona_id,
        "conversation_id": conversation_id,
        "message": message
    }

    # Create and sign the AWS request
    request = AWSRequest(
        method='POST',
        url=url,
        data=json.dumps(payload),
        headers=headers
    )
    SigV4Auth(credentials, 'execute-api', region).add_auth(request)

    # Send the request
    response = requests.request(
        method=request.method,
        url=request.url,
        headers=dict(request.headers),
        data=request.body
    )

    if response.status_code != 200:
        return None
    
    response_json = response.json()
    return response_json['response'], response_json['conversation_id']


def get_conversation(persona_id: str, max_turns: int = 5, print_conversation: bool = False, print_token_no: bool = False) -> List[str]:
    """
    Conduct a structured conversation with a persona to gather career information.
    
    This function demonstrates key AI agent capabilities:
    - Goal-oriented dialogue management
    - Adaptive questioning based on responses  
    - Context maintenance across conversation turns
    - Efficient information extraction within token limits
    
    Args:
        persona_id: Unique identifier for the persona to interview
        max_turns: Maximum conversation turns to prevent infinite loops
        print_conversation: Whether to display the full conversation
        print_token_no: Whether to show token usage statistics
        
    Returns:
        List of conversation messages for further processing
    """
    # Define the agent's role and objectives
    system_prompt = """
    Continue to ask questions about this person - do not provide the jobs, trainings or anything yet.
    You are a helpful and empathetic assistant. Your goal is to engage in a natural conversation with a persona to gather the following information:
    - Their name
    - Their skills and the **level of each skill**
    - Their current location
    - Their age
    - Their preferences
    - Years of experience

    Remember to always gather all of this information!
    Ask open-ended questions to encourage detailed responses. Be polite, patient, and adapt your questions based on their answers.
    If the persona is unsure or vague, gently probe for more details. Do not ask all questions at once; let the conversation flow naturally.
    **Do not comment on whatever the response is. Just ask questions to retrieve the information.**
    """
    conversation = []
    current_turn = 0
    total_tokens = 0
    conversation_agent = get_agent(system_prompt)
    conversation_id = None

    # greeting
    agent_message = "Hello! I'm here to help you find the best job or training opportunities. Can you tell me your name?"
    conversation_agent.messages = [{
        "role": "assistant",
        "content": [{
            "text": agent_message
        }]
    }]
    conversation.append(f"Assistant: {agent_message}")

    # Conduct the conversation
    while current_turn < max_turns:
        # Send message to persona and get response
        resp = send_message_to_chat(
            agent_message,
            persona_id,
            conversation_id
        )
        
        if resp is None:
            print(f"⚠️ Persona {persona_id} did not respond - ending conversation")
            break
            
        user_response, conversation_id = resp
        conversation.append(f"User: {user_response}")
        
        # Generate agent's next question/response
        agent_response = conversation_agent(user_response)
        total_tokens = agent_response.metrics.accumulated_usage['totalTokens']
        agent_message = str(agent_response)
        conversation.append(f"Assistant: {agent_message}")
        
        current_turn += 1
    
    # Optional debugging output
    if print_conversation:
        print('\n=== CONVERSATION ===')
        print('\n'.join(conversation))
        print('===================\n')
        
    if print_token_no:
        print(f'💡 Total tokens used: {total_tokens}')
        
    return conversation


# Multi-Agent Collaboration <a id='multi-agent-system'></a>

One of the most powerful aspects of AI agent systems is their ability to work together. In our career matching system, different agents specialize in different tasks:

## Information Extraction Agents

These agents take unstructured text (conversations, job descriptions, training materials) and convert them into structured data using our Pydantic models.

### Why Structured Extraction Matters

Consider this raw conversation excerpt:
> "Hi, I'm João from Rio. I've been working with sustainable farming for about 3 years now. I'm pretty good at organic cultivation and decent at project management..."

An extraction agent converts this into:
```python
PersonaInfo(
    name="João",
    location="Rio", 
    skills=[("organic_cultivation", "advanced"), ("project_management", "intermediate")],
    years_of_experience="3"
)
```

This transformation enables:
- **Consistent data formats** across all personas
- **Programmatic matching** based on skills and requirements
- **Quality validation** to ensure complete information capture
- **Efficient processing** of large datasets

## Batch Processing with Intelligence

The `extract_info_to_json()` function shows several practical design choices for batch processing:

- **Caching**: Avoids redundant API calls by storing results
- **Error handling**: Retries failed extractions up to `max_retries` times
- **Progress tracking**: Prints the number of extracted items as processing advances
- **Incremental processing**: Skips IDs that are already cached, so an interrupted run resumes where it left off

This approach is crucial when processing hundreds or thousands of documents efficiently and reliably.
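
When an API is rate limited, waiting between attempts, with the wait doubling each time (exponential backoff), is one way to make retries gentler. The sketch below is a generic illustration and not the tutorial's implementation: `retry_with_backoff` and its parameters are hypothetical, and which exceptions are worth retrying depends on the library you call.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


def retry_with_backoff(
    func: Callable[[], R],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[Exception], ...] = (ValueError,),
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call func, retrying on selected exceptions with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2**attempt seconds, plus a little random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

A call such as `retry_with_backoff(lambda: extract_info(JobInfo, text))` would wrap a single extraction. The `sleep` parameter exists so the waiting can be replaced in tests.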

In [None]:
def extract_info(model: Type[M], text: str) -> M:
    """Extract an instance of `model` from free-form text using structured output."""
    extraction_agent = get_agent()
    return extraction_agent.structured_output(output_model=model, prompt=text)


def extract_info_from_conversation(conversation: List[str]) -> PersonaInfo:
    text = '\n'.join(conversation)
    return extract_info(PersonaInfo, text)


def extract_info_from_job_path(path: str | Path) -> JobInfo:
    text = load_file_content(path)
    return extract_info(JobInfo, text)


def extract_info_from_training_path(path: str | Path) -> TrainingInfo:
    text = load_file_content(path)
    return extract_info(TrainingInfo, text)


def extract_info_to_json(
    model: Type[BaseModel],
    description_paths: List[str | Path],
    save_path: str | Path,
    cache_period: int = 20,
    max_retries: int = 3
):
    """
    Extract structured info from each file and cache the results as JSON.

    IDs already present in the cache file are skipped, so an interrupted
    run can be resumed without repeating API calls.
    """
    save_path = Path(save_path)

    # Start with an empty cache if none exists yet
    if not save_path.exists():
        save_json(save_path, {})

    extracted_data = read_json(save_path)
    description_paths = [Path(path) for path in description_paths]
    print(f'Total descriptions for {model.__name__}: {len(description_paths)}')
    print(f'Extracted infos: {len(extracted_data)}')

    counter = 0
    for path in description_paths:
        id_ = path.stem
        if id_ in extracted_data:
            continue  # already cached
        text = load_file_content(path)
        retries = 0
        err = None
        while retries < max_retries:
            try:
                info = extract_info(model, text)
                extracted_data[id_] = info.model_dump_json()
                counter += 1
                # Persist progress after every `cache_period` new extractions
                if counter % cache_period == 0:
                    save_json(save_path, extracted_data)
                    print(len(extracted_data))
                break
            except ValueError as e:
                retries += 1
                err = e
        else:
            # All attempts failed for this file
            print(f'Error for id: {id_}', err)
    save_json(save_path, extracted_data)


def extract_jobs_info_to_json(
    save_path: str | Path,
    cache_period: int = 20,
    max_retries: int = 3
):
    job_paths = get_job_paths()
    extract_info_to_json(
        model=JobInfo,
        description_paths=job_paths,
        save_path=save_path,
        cache_period=cache_period,
        max_retries=max_retries
    )


def extract_trainings_info_to_json(
    save_path: str | Path,
    cache_period: int = 20,
    max_retries: int = 3
):
    training_paths = get_training_paths()
    extract_info_to_json(
        model=TrainingInfo,
        description_paths=training_paths,
        save_path=save_path,
        cache_period=cache_period,
        max_retries=max_retries
    )


# Real-World Application: Intelligent Matching <a id='real-world-application'></a>

This is where our AI agents solve the core challenge: matching people to opportunities. Our matching system uses specialized agents that understand both human preferences and job requirements.

## The Matching Challenge

Traditional job matching often relies on simple keyword matching. Our AI agent approach is far more sophisticated:

### Traditional Approach:
- Person has "sustainability" skill
- Job requires "environmental" knowledge  
- ❌ **No Match** (different keywords)

### AI Agent Approach:
- Agent understands semantic relationships
- Recognizes that sustainability ↔ environmental knowledge
- ✅ **Match Found** with confidence score
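
The contrast above can be made concrete with a toy example. In this tutorial the language model judges whether two terms are related; the lookup table below is only a hypothetical stand-in for that judgement (`CONCEPTS`, `keyword_match`, and `concept_match` are illustrative names), and it produces a yes/no answer rather than a confidence score.

```python
from typing import Dict, Set

# Hypothetical concept map: each surface term points to a shared concept label.
CONCEPTS: Dict[str, str] = {
    "sustainability": "environment",
    "environmental": "environment",
    "ecology": "environment",
    "bookkeeping": "finance",
    "accounting": "finance",
}


def keyword_match(person_terms: Set[str], job_terms: Set[str]) -> bool:
    """Traditional approach: the exact same word must appear on both sides."""
    return bool(person_terms & job_terms)


def concept_match(person_terms: Set[str], job_terms: Set[str]) -> bool:
    """Map terms to concepts first, then compare the concept sets."""
    person_concepts = {CONCEPTS.get(term, term) for term in person_terms}
    job_concepts = {CONCEPTS.get(term, term) for term in job_terms}
    return bool(person_concepts & job_concepts)


person = {"sustainability"}
job = {"environmental"}
print(keyword_match(person, job))  # False: the words differ
print(concept_match(person, job))  # True: both map to "environment"
```

A fixed table like this breaks down as soon as a term is missing from it, which is the gap an LLM-based agent is meant to close.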

## Multi-Layer Matching Strategy

Our system employs multiple specialized agents:

### 1. Job Matching Agent
- Analyzes persona skills vs. job requirements
- Considers location preferences and constraints
- Evaluates experience levels and career growth potential
- Returns ranked list of suitable opportunities

### 2. Training Matching Agent  
- Identifies skill gaps between current abilities and career goals
- Recommends learning paths to bridge those gaps
- Considers prerequisite skills and learning progression
- Suggests training programs that align with career objectives

### 3. Career Path Agent
- Combines job and training recommendations
- Creates comprehensive development plans
- Balances immediate opportunities with long-term growth
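
As a rough illustration of the "combine" step, the sketch below pairs each recommended job with the trainings suggested for it. It assumes the two inputs are already available as plain Python values; `build_career_plan` and the sample IDs are hypothetical, and no balancing of short-term and long-term goals is attempted here.

```python
from typing import Dict, List, Union

PlanEntry = Dict[str, Union[str, List[str]]]


def build_career_plan(
    job_ids: List[str],
    trainings_by_job: Dict[str, List[str]],
) -> List[PlanEntry]:
    """Pair each recommended job with the trainings suggested for it."""
    return [
        {
            "job_id": job_id,
            # Jobs without any suggested training get an empty list.
            "suggested_trainings": trainings_by_job.get(job_id, []),
        }
        for job_id in job_ids
    ]


plan = build_career_plan(
    job_ids=["j1", "j7"],
    trainings_by_job={"j1": ["t3"], "j2": ["t34"]},
)
print(plan)
```

Here `j7` has no entry in `trainings_by_job`, so it appears in the plan with an empty training list instead of raising a `KeyError`.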

## Why This Approach Works

1. **Contextual Understanding**: Agents grasp the meaning behind requirements, not just keywords
2. **Holistic Evaluation**: Multiple factors are considered simultaneously
3. **Personalized Recommendations**: Each match is tailored to individual circumstances
4. **Scalable Intelligence**: The same agents can handle thousands of personas efficiently

Let's see how these matching agents work in practice:

In [None]:
def find_job_matches_for_persona(
    persona_info: PersonaInfo,
    jobs_data: Dict[str, JobInfo],
) -> List[str]:
    jobs_text = "\n".join([
        f'{job_id}: {job_info.describe()}'
        for job_id, job_info in jobs_data.items()
    ])
    system_prompt = f"""
    You have a list of all available jobs. Given a candidate's info, provide
    a list of up to 4 job IDs that match that candidate.
    The list may be empty if no match is found:
    {jobs_text}
    """
    agent = get_agent(system_prompt=system_prompt)
    res = agent.structured_output(IDList, persona_info.describe())
    return res.values


def find_training_matches_for_persona(
    persona_info: PersonaInfo,
    trainings_data: Dict[str, TrainingInfo]
) -> List[str]:
    trainings_text = "\n".join([
        f'{training_id}: {training_info.describe()}'
        for training_id, training_info in trainings_data.items()
    ])
    system_prompt = f"""
    You have a list of all available trainings. Given a candidate's info, provide
    a list of up to 4 training IDs that match that candidate.
    The list may be empty if no match is found:\n
    {trainings_text}
    """
    agent = get_agent(system_prompt=system_prompt)
    res = agent.structured_output(IDList, persona_info.describe())
    return res.values


def find_training_matches_for_job(
    job_info: JobInfo,
    trainings_data: Dict[str, TrainingInfo],
) -> List[str]:
    trainings_text = "\n".join([
        f'{training_id}: {training_info.describe()}'
        for training_id, training_info in trainings_data.items()
    ])
    system_prompt = f"""
    You have a list of all available trainings. Given a job's info, provide
    a list of up to 4 training IDs that would be nice to have before
    taking that job. The list may be empty if no training fits:\n
    {trainings_text}
    """
    agent = get_agent(system_prompt=system_prompt)
    res = None
    err = None
    max_retries = 3
    retries = 0
    while retries < max_retries:
        try:
            res = agent.structured_output(IDList, job_info.describe())
            break
        except ValueError as e:
            retries += 1
            err = e
    if res is None:
        raise ValueError(f'Agent could not get matches for job {job_info}. Err: {err}')
    return res.values

# Extracting job info

In [None]:
jobs_save_path = './extracted_jobs_info.json'
extract_jobs_info_to_json(jobs_save_path)
jobs_info = read_json(jobs_save_path)
jobs_info = {
    job_id: JobInfo.model_validate_json(job_data)
    for job_id, job_data in jobs_info.items()
}

Total descriptions for JobInfo: 200
Extracted infos: 200


# Extracting training info

In [None]:
trainings_save_path = './extracted_trainings_info.json'
extract_trainings_info_to_json(trainings_save_path)
trainings_info = read_json(trainings_save_path)
trainings_info = {
    training_id: TrainingInfo.model_validate_json(training_data)
    for training_id, training_data in trainings_info.items()
}

Total descriptions for TrainingInfo: 497
Extracted infos: 497


# Match trainings to jobs

In [None]:
job_training_matches_path = Path('./job_training_matches.json')
job_training_matches = {}
for job_id, job_info in tqdm(jobs_info.items()):
    training_ids = find_training_matches_for_job(job_info, trainings_info)
    job_training_matches[job_id] = training_ids
save_json(job_training_matches_path, job_training_matches)

100%|██████████| 200/200 [04:50<00:00,  1.45s/it]


# Testing and Optimization <a id='testing-optimization'></a>

Testing AI agent systems requires a different approach than traditional software testing. We need to verify not just that the code runs, but that the agents make intelligent decisions and provide valuable recommendations.

## Testing Your Matching Algorithm

Let's test our matching system with a sample persona to see how well it works:

### Test Case: Environmental Sustainability Professional

We'll create a persona representing someone interested in sustainable agriculture and environmental work, then see what opportunities our agents recommend.

**Key Testing Principles:**
1. **Diverse test cases**: Try personas with different backgrounds, skill levels, and locations
2. **Edge cases**: Test with minimal information, conflicting preferences, or unusual skill combinations  
3. **Quality validation**: Verify that recommendations make sense and are relevant
4. **Efficiency monitoring**: Track token usage and response times

### What to Look For:
- ✅ **Relevant matches**: Do recommended jobs align with the persona's skills and interests?
- ✅ **Appropriate difficulty**: Are experience requirements realistic for the candidate?
- ✅ **Geographic feasibility**: Do location constraints make sense?
- ✅ **Learning pathways**: Do training recommendations help bridge skill gaps?
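
Two of these checks are easy to automate. The sketch below is a minimal example under stated assumptions: `keep_known_ids` and `timed` are hypothetical helpers, not part of the tutorial's code. The first drops IDs that do not exist in the catalogue (a model can occasionally return an ID it was never given), and the second measures how long a call takes. Token usage depends on the model client and is not covered here.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def keep_known_ids(
    predicted: Iterable[str],
    known: Iterable[str],
    limit: int = 4,
) -> List[str]:
    """Drop IDs missing from the catalogue, remove duplicates, cap the length."""
    known_set = set(known)
    seen = set()
    kept: List[str] = []
    for id_ in predicted:
        if id_ in known_set and id_ not in seen:
            seen.add(id_)
            kept.append(id_)
    return kept[:limit]


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


catalogue = ["j1", "j2", "j3"]
raw_prediction = ["j2", "j9", "j2", "j1"]  # "j9" is unknown, "j2" is repeated
cleaned, seconds = timed(lambda: keep_known_ids(raw_prediction, catalogue))
print(cleaned, f"{seconds:.4f}s")
```

The same `timed` wrapper can be placed around any matching call to compare response times across personas.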

Let's run our test:

In [None]:
# Create a test persona with environmental/sustainability focus
persona = PersonaInfo(
    name='Pedro Araújo',
    skills=[('Food Safety', 'Intermediate'), ('Food Sustainability', 'Intermediate')],
    location='Brasília',
    age='16',  # Note: Young age may affect job eligibility
    years_of_experience='1'
)

print("🧪 Testing with Sample Persona:")
print("=" * 40)
print(persona.describe())
print("=" * 40)

# Test job matching
print("\n🎯 Finding Job Matches...")
persona_jobs = find_job_matches_for_persona(persona, jobs_info)
print(f"Recommended Jobs: {persona_jobs}")

# Test training matching  
print("\n📚 Finding Training Matches...")
persona_trainings = find_training_matches_for_persona(persona, trainings_info)
print(f"Recommended Trainings: {persona_trainings}")

print("\n💡 Analysis:")
print("- Are the job recommendations appropriate for someone with intermediate food safety/sustainability skills?")
print("- Do the training suggestions help develop complementary skills?") 
print("- How does the young age (16) affect job eligibility?")
print("- Are there regional opportunities available in Brasília?")

# Collecting conversations

In [None]:
persona_ids = [f'persona_{i:03}' for i in range(1, 101)]
cache_period = 4

personas_save_path = Path('./extracted_personas_info.json')
if not personas_save_path.exists():
    personas_save_path.touch()
    save_json(personas_save_path, {})

persona_infos = read_json(personas_save_path)
print(f'Total conversations for personas: {len(persona_ids)}')
print(f'Collected conversations: {len(persona_infos)}')

counter = 0
for persona_id in persona_ids:
    # Skip personas collected in a previous run
    if persona_id in persona_infos:
        continue
    conversation = get_conversation(persona_id, max_turns=2)
    persona_info = extract_info_from_conversation(conversation)
    persona_infos[persona_id] = persona_info.model_dump_json()
    counter += 1
    # Checkpoint after every `cache_period` newly collected personas
    if counter % cache_period == 0:
        save_json(personas_save_path, persona_infos)
        print(len(persona_infos))
save_json(personas_save_path, persona_infos)

persona_infos = {
    persona_id: PersonaInfo.model_validate_json(persona_info)
    for persona_id, persona_info in persona_infos.items()
}


Total conversations for personas: 100
Collected conversations: 100


# Generating final data

Each line in the sample represents a single prediction result for a persona, formatted as a JSON object (JSON Lines format). The fields include:

- `persona_id`: The unique identifier for the persona.
- `predicted_type`: The type of recommendation made. It can be `"jobs+trainings"`, `"trainings_only"`, or `"awareness"`.
- Depending on the `predicted_type`, additional fields are included:
    - For `"jobs+trainings"`: a `jobs` list, where each item contains a `job_id` and a list of `suggested_trainings` for that job.
    - For `"trainings_only"`: a `trainings` list with recommended training IDs.
    - For `"awareness"`: a `predicted_items` field, e.g., `"too_young"`.

**Why this format?**

- **Consistent structure** ensures our endpoint can reliably parse and validate each prediction.
- **Flexible fields** support different recommendation types while keeping the schema simple and machine-readable.
- **Automation-ready**: This format enables direct ingestion into evaluation or deployment systems without manual intervention.

Participants must use this format to ensure compatibility with the challenge's automated result validation and scoring systems.

Example result:
```json
{"persona_id": "persona_001", "predicted_type": "trainings_only", "trainings": ["t1", "t2"]}
{"persona_id": "persona_002", "predicted_type": "jobs+trainings", "jobs": [{"job_id": "j1", "suggested_trainings": ["t3"]},{"job_id": "j2", "suggested_trainings": ["t34"]},{"job_id": "j7", "suggested_trainings": ["t1", "t33"]}]}
{"persona_id": "persona_127", "predicted_type": "awareness", "predicted_items": "too_young"}
```
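
Before submitting, it can help to check each record against the field rules described above. The validator below is a sketch based only on that description, not the challenge's official validation; `validate_record` is a hypothetical helper, and the rule that `predicted_items` must be a non-empty string is an assumption.

```python
import json
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return the problems found in one prediction record (empty if none)."""
    problems: List[str] = []
    if not isinstance(record.get("persona_id"), str):
        problems.append("persona_id must be a string")

    kind = record.get("predicted_type")
    if kind == "jobs+trainings":
        jobs = record.get("jobs")
        if not isinstance(jobs, list):
            problems.append("jobs must be a list")
        else:
            for job in jobs:
                well_formed = (
                    isinstance(job, dict)
                    and isinstance(job.get("job_id"), str)
                    and isinstance(job.get("suggested_trainings"), list)
                )
                if not well_formed:
                    problems.append(f"malformed job entry: {job!r}")
    elif kind == "trainings_only":
        if not isinstance(record.get("trainings"), list):
            problems.append("trainings must be a list")
    elif kind == "awareness":
        # Assumption: an empty reason is treated as an error.
        items = record.get("predicted_items")
        if not isinstance(items, str) or not items:
            problems.append("predicted_items must be a non-empty string")
    else:
        problems.append(f"unknown predicted_type: {kind!r}")
    return problems


# Three JSON Lines records; the last one has an empty `predicted_items`.
sample = "\n".join([
    '{"persona_id": "persona_001", "predicted_type": "trainings_only", "trainings": ["t1", "t2"]}',
    '{"persona_id": "persona_002", "predicted_type": "jobs+trainings", '
    '"jobs": [{"job_id": "j1", "suggested_trainings": ["t3"]}]}',
    '{"persona_id": "persona_003", "predicted_type": "awareness", "predicted_items": ""}',
])

report = {}
for line in sample.splitlines():
    record = json.loads(line)
    report[record["persona_id"]] = validate_record(record)
print(report)
```

Only `persona_003` is reported, because its `predicted_items` value is empty.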

In [None]:
results = []
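for persona_id, persona_info in tqdm(persona_infos.items()):
    jobs = find_job_matches_for_persona(persona_info, jobs_info)
    data = {'persona_id': persona_id}
    if jobs and any(job_training_matches.get(job_id) for job_id in jobs):
        # At least one matched job has suggested trainings
        data['predicted_type'] = 'jobs+trainings'
        data['jobs'] = [
            {
                'job_id': job_id,
                # .get() avoids a KeyError if the agent returns an unknown job ID
                'suggested_trainings': job_training_matches.get(job_id, [])
            }
            for job_id in jobs
        ]
    elif not jobs:
        # No job matched: fall back to trainings matched directly to the persona
        trainings = find_training_matches_for_persona(persona_info, trainings_info)
        data['predicted_type'] = 'trainings_only'
        data['trainings'] = trainings
    else:
        # Jobs matched but none has suggested trainings. `predicted_items` is
        # left empty here; set a reason such as "too_young" before submitting.
        data['predicted_type'] = 'awareness'
        data['predicted_items'] = ''
    results.append(data)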

In [None]:
results[0]

In [None]:
results_path = Path('./results.jsonl')
with results_path.open('w', encoding='utf-8') as file:
    for res in results:
        line = json.dumps(res) + '\n'
        file.write(line)