# Introduction to Large Language Models (LLMs)

This notebook bridges the gap between rule-based agents and AI-powered agents by introducing the fundamentals of Large Language Models (LLMs). We'll cover what LLMs are, how to use them, and how they differ from the pattern-matching approach we saw in the SimpleAgent.

## Learning Objectives

By the end of this notebook, you'll understand:
- What LLMs are and how they differ from rule-based systems
- How to set up and use LLM APIs
- Basic prompting techniques
- Key concepts: tokens, temperature, and context windows
- How to build simple LLM-powered agents

## Prerequisites

- Completed the `agent_demo.ipynb` notebook
- Basic Python knowledge
- API keys for OpenAI or OpenRouter (we'll use OpenRouter for flexibility)

## 1. From Pattern Matching to Language Understanding

### Rule-Based Approach (What We've Seen)

In the SimpleAgent, we used pattern matching:
```python
if "weather" in user_input:
    return self.get_weather()
elif "time" in user_input:
    return self.get_time()
```

**Limitations:**
- Only understands exact patterns
- Can't handle variations or context
- Requires explicit programming for each case
- No real "understanding" of language

### LLM Approach (What We're Learning)

LLMs understand language naturally:
- "What's the weather?" ✓
- "How's the weather looking?" ✓
- "Is it going to rain?" ✓
- "Do I need an umbrella?" ✓

All these variations are understood without explicit programming!
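
To make the contrast concrete, here is a minimal sketch of how a keyword matcher handles the four phrasings above. The `rule_based_intent` helper is hypothetical (it only mimics the SimpleAgent style of matching and is not taken from that notebook):

```python
def rule_based_intent(user_input: str) -> str:
    """Keyword matching in the style of SimpleAgent (hypothetical helper)."""
    text = user_input.lower()
    if "weather" in text:
        return "weather"
    if "time" in text:
        return "time"
    return "unknown"

phrasings = [
    "What's the weather?",
    "How's the weather looking?",
    "Is it going to rain?",
    "Do I need an umbrella?",
]

# Only phrasings that contain the literal keyword are recognized
for phrase in phrasings:
    print(f"{phrase!r:32} -> {rule_based_intent(phrase)}")
```

The last two phrasings never mention the word "weather", so the matcher cannot connect them to the weather intent, while an LLM typically can.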

## 2. What Are Large Language Models?

**Large Language Models (LLMs)** are AI systems trained on vast amounts of text data to understand and generate human-like text. Think of them as incredibly sophisticated pattern recognition systems that have "read" much of the internet.

### Key Characteristics:

1. **Trained on Text**: Learned from billions of web pages, books, and articles
2. **Probabilistic**: Generate text one token at a time by sampling from probabilities learned during training
3. **Contextual**: Understand meaning based on surrounding context
4. **Generative**: Can create new, coherent text
5. **Versatile**: Can perform many tasks without specific programming

### Popular LLMs for Educational Use:
- **Claude Haiku** (Anthropic): Fast, reliable, and helpful - our default choice
- **DeepSeek** (DeepSeek): Excellent balance of cost and capability, strong at coding
- **Llama 3.3** (Meta): Open-source, multilingual, 70B parameters
- **GPT-4o-mini** (OpenAI): Efficient and reliable for general tasks
- **Gemma** (Google): Good for resource-constrained deployments

## 3. Setting Up LLM Access

We'll use OpenRouter, which provides access to multiple LLMs through a single API. This gives us flexibility to try different models.

**Note:** Make sure you have installed all dependencies from `requirements.txt` before running this notebook:
```bash
pip install -r requirements.txt
```

## 3.1 Understanding the LLM API Call Components

Before we make our first API call, let's understand each component of an LLM API request:

```python
response = client.chat.completions.create(
    model="anthropic/claude-3.5-haiku",    # Which LLM to use
    messages=[                             # The conversation history
        {
            "role": "system",              # Who is speaking
            "content": "You are helpful"   # What they're saying
        },
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    temperature=0.7,                       # Creativity level (0-2)
    max_tokens=100,                        # Maximum response length
    top_p=1.0,                            # Nucleus sampling (advanced)
    frequency_penalty=0,                   # Reduce repetition
    presence_penalty=0,                    # Encourage new topics
    stream=False                          # Get response all at once
)
```

### Key Components Explained:

1. **client**: The OpenAI client configured with OpenRouter's endpoint
   - Handles authentication and API communication
   - Routes requests to different model providers

2. **model**: Specifies which LLM to use
   - Format: `provider/model-name` (e.g., "anthropic/claude-3.5-haiku")
   - Different models have different strengths, costs, and speeds

3. **messages**: The conversation context as a list of message objects
   - Each message has a `role` and `content`
   - Maintains conversation history for context

4. **temperature** (0.0 - 2.0): Controls randomness
   - 0: Near-deterministic, always picks the most likely token
   - 0.7: Balanced creativity (a common choice, used throughout this notebook)
   - 2.0: Very creative, might be incoherent

5. **max_tokens**: Maximum length of the response
   - 1 token ≈ 0.75 words (a rough average for English text)
   - Prevents runaway responses and controls costs
   - Set based on your needs (e.g., 50 for short answers, 500 for essays)

6. **Optional Parameters**:
   - **top_p**: Nucleus sampling, an alternative to temperature; the model samples only from the most likely tokens whose combined probability reaches `top_p`
   - **frequency_penalty** (-2.0 to 2.0): Reduces word repetition
   - **presence_penalty** (-2.0 to 2.0): Encourages talking about new topics
   - **stream**: If True, receive response in chunks (like ChatGPT typing)
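
The effect of `temperature` and `top_p` can be illustrated on a toy example. The sketch below is a simplified illustration, not how any particular provider implements sampling: the logits are made-up scores for three candidate tokens, and `softmax_with_temperature` and `nucleus_filter` are hypothetical helper names.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the smallest set of top-ranked tokens whose cumulative probability reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append(index)
        cumulative += prob
        if cumulative >= top_p:
            break
    return kept

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens

for temp in (0.2, 0.7, 2.0):
    probs = softmax_with_temperature(toy_logits, temp)
    print(f"temperature={temp}: {[round(p, 3) for p in probs]}")

probs = softmax_with_temperature(toy_logits, 1.0)
print("top_p=0.8 keeps token indices:", nucleus_filter(probs, 0.8))
```

At low temperature almost all probability lands on the top token; at high temperature the three options become closer to equally likely, which is why output gets more varied.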

### Response Object Structure:

```python
response = {
    "id": "chatcmpl-abc123",              # Unique identifier
    "object": "chat.completion",           # Response type
    "created": 1677858242,                 # Timestamp
    "model": "anthropic/claude-3.5-haiku", # Model used
    "usage": {                             # Token usage info
        "prompt_tokens": 13,
        "completion_tokens": 7,
        "total_tokens": 20
    },
    "choices": [{                          # Array of responses (usually 1)
        "message": {
            "role": "assistant",
            "content": "Hello! How can I help you today?"
        },
        "finish_reason": "stop",           # Why generation stopped
        "index": 0
    }]
}
```

To extract the actual response text:
```python
response_text = response.choices[0].message.content
```
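
It can also help to check `finish_reason` when extracting the text, since a value of `"length"` means the reply was cut off by `max_tokens`. The sketch below uses a stand-in object built with `SimpleNamespace` that only mimics the shape of the response shown above; `extract_text` is a hypothetical helper, not part of the OpenAI client.

```python
from types import SimpleNamespace

def extract_text(response):
    """Return the first choice's text, flagging output truncated by max_tokens."""
    choice = response.choices[0]
    text = choice.message.content or ""
    if choice.finish_reason == "length":
        text += " [truncated: hit max_tokens]"
    return text

# Stand-in shaped like the response structure above (not a real API response)
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(role="assistant", content="Hello! How can I"),
            finish_reason="length",
            index=0,
        )
    ]
)

print(extract_text(fake_response))
```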

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI
import json

# Load environment variables from .env file
load_dotenv("../../.env")

# Initialize OpenAI client with OpenRouter configuration
client = OpenAI(
    base_url=os.getenv("OPENROUTER_BASE_URL"),
    api_key=os.getenv("OPENROUTER_API_KEY")
)

# Check if credentials are loaded
print(f"API Base URL: {os.getenv('OPENROUTER_BASE_URL')}")
print(f"API Key: {'Loaded successfully' if os.getenv('OPENROUTER_API_KEY') else 'Not found'}")

In [None]:
# Let's see a complete API call with all components labeled
def detailed_api_call_example():
    """Demonstrate a complete API call with all components"""
    
    # Configure the request
    request_config = {
        "model": "anthropic/claude-3.5-haiku",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant who explains things clearly."
            },
            {
                "role": "user",
                "content": "What is photosynthesis?"
            }
        ],
        "temperature": 0.7,      # Balanced creativity
        "max_tokens": 150,       # Enough for a paragraph
        "top_p": 1.0,           # Use all vocabulary (default)
        "frequency_penalty": 0,  # No penalty for repeated words
        "presence_penalty": 0    # No penalty for staying on topic
    }
    
    print("📤 REQUEST CONFIGURATION:")
    print("-" * 50)
    for key, value in request_config.items():
        if key != "messages":
            print(f"{key}: {value}")
    print("\nMessages:")
    for msg in request_config["messages"]:
        print(f"  [{msg['role']}]: {msg['content']}")
    
    # Make the API call
    try:
        response = client.chat.completions.create(**request_config)
        
        print("\n📥 RESPONSE DETAILS:")
        print("-" * 50)
        print(f"Response ID: {response.id}")
        print(f"Model used: {response.model}")
        print(f"Created at: {response.created}")
        
        print("\n💰 TOKEN USAGE:")
        print(f"Prompt tokens: {response.usage.prompt_tokens}")
        print(f"Response tokens: {response.usage.completion_tokens}")
        print(f"Total tokens: {response.usage.total_tokens}")
        
        print("\n🤖 ASSISTANT'S RESPONSE:")
        print("-" * 50)
        print(response.choices[0].message.content)
        
        print("\n🛑 FINISH REASON:", response.choices[0].finish_reason)
        print("(Common reasons: 'stop' = natural end, 'length' = hit max_tokens)")
        
    except Exception as e:
        print(f"\n❌ Error: {str(e)}")

# Run the example
detailed_api_call_example()

## 4. Your First LLM Call

Let's start with the simplest possible LLM interaction - asking it a question:

In [None]:
# Simple completion example
def ask_llm(question, model="anthropic/claude-3.5-haiku"):
    """Ask a question to an LLM and get a response"""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": question}
            ]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

# Try it out
response = ask_llm("What is the capital of France?")
print(response)

Notice how different this is from our SimpleAgent! We didn't program any facts about France - the LLM knows this from its training.

Let's compare with our SimpleAgent approach:

In [None]:
# Comparison: Rule-based vs LLM approach

# Rule-based approach (like SimpleAgent)
def answer_capital_rule_based(country):
    capitals = {
        "france": "Paris",
        "spain": "Madrid",
        "italy": "Rome"
        # We'd need to add every country manually!
    }
    return capitals.get(country.lower(), "I don't know that capital")

# LLM approach
def answer_capital_llm(country):
    return ask_llm(f"What is the capital of {country}?")

# Test both approaches
test_countries = ["France", "Spain", "Japan", "Kenya", "Brazil"]

print("Rule-based approach:")
for country in test_countries:
    print(f"  {country}: {answer_capital_rule_based(country)}")

print("\nLLM approach:")
for country in test_countries:
    print(f"  {country}: {answer_capital_llm(country)}")

## 5. Understanding the Chat Format

As we saw in the API components section, modern LLMs use a "chat" format. Let's explore how this works in practice with different message combinations and see how the roles affect the AI's behavior.

In [None]:
def chat_with_llm(user_message, system_message=None, model="anthropic/claude-3.5-haiku"):
    """Chat with an LLM using system and user messages"""
    messages = []
    
    # Add system message if provided
    if system_message:
        messages.append({"role": "system", "content": system_message})
    
    # Add user message
    messages.append({"role": "user", "content": user_message})
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

# Example 1: Without system message
print("Without system message:")
print(chat_with_llm("Tell me a joke"))
print("\n" + "="*50 + "\n")

# Example 2: With system message defining behavior
print("With system message (pirate):")
print(chat_with_llm(
    "Tell me a joke",
    system_message="You are a friendly pirate. Always speak like a pirate would."
))
print("\n" + "="*50 + "\n")

# Example 3: With system message for a specific task
print("With system message (teacher):")
print(chat_with_llm(
    "What is 2+2?",
    system_message="You are a patient math teacher. Explain your answers step by step."
))

## 6. Key Concept: Tokens in Practice

We learned that tokens are the basic units LLMs process. Now let's see how token usage affects your costs and performance in real scenarios.
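
Before calling the API, a rough offline estimate can be useful. The sketch below applies two rules of thumb used in this notebook (about 0.75 words per token and about 4 characters per token). `estimate_tokens` is a hypothetical helper, and real tokenizers will give different counts, especially for code or non-English text.

```python
def estimate_tokens(text: str) -> dict:
    """Rough token estimates for English text from two rules of thumb."""
    words = len(text.split())
    return {
        "by_words": round(words / 0.75),   # 1 token is roughly 0.75 words
        "by_chars": round(len(text) / 4),  # roughly 4 characters per token
    }

for sample in ["Hi", "What is the weather like?"]:
    print(f"{sample!r} -> {estimate_tokens(sample)}")
```

The two heuristics rarely agree exactly; the `usage` field returned by the API remains the authoritative count.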

In [None]:
# Let's see how token usage works
def analyze_token_usage(message, model="anthropic/claude-3.5-haiku"):
    """Analyze token usage for a message"""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
        max_tokens=50  # Limit response length
    )
    
    # Get token usage from response
    usage = response.usage
    
    return {
        "response": response.choices[0].message.content,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens
    }

# Test with different message lengths
messages = [
    "Hi",
    "What is the weather like?",
    "Can you explain quantum computing in simple terms that a child could understand?"
]

for msg in messages:
    result = analyze_token_usage(msg)
    print(f"Message: '{msg}'")
    print(f"Prompt tokens: {result['prompt_tokens']}")
    print(f"Response tokens: {result['completion_tokens']}")
    print(f"Total tokens: {result['total_tokens']}")
    print(f"Response: {result['response']}")
    print("\n" + "-"*50 + "\n")

## 7. Key Concept: Temperature in Action

We know temperature controls creativity (0 = deterministic, 2 = very creative). Let's experiment to see exactly how different temperature values affect various types of prompts.

In [None]:
def test_temperature(prompt, temperatures=(0, 0.5, 1.0, 1.5)):
    """Test how temperature affects responses"""
    print(f"Prompt: '{prompt}'\n")
    
    for temp in temperatures:
        response = client.chat.completions.create(
            model="anthropic/claude-3.5-haiku",
            messages=[{"role": "user", "content": prompt}],
            temperature=temp,
            max_tokens=50
        )
        
        print(f"Temperature {temp}: {response.choices[0].message.content}")
        print("-" * 70)

# Test with a creative prompt
test_temperature("Write a one-sentence story about a robot:")

print("\n" + "="*70 + "\n")

# Test with a factual prompt
test_temperature("What is 2+2?")

Notice how:
- Creative prompts benefit from higher temperature (more variety)
- Factual prompts should use lower temperature (consistent accuracy)

## 8. Basic Prompting Techniques

Good prompting is key to getting useful responses from LLMs. Here are fundamental techniques:

In [None]:
# Technique 1: Be Specific
print("❌ Vague prompt:")
print(ask_llm("Tell me about dogs")[:200] + "...\n")

print("✅ Specific prompt:")
print(ask_llm("List 3 interesting facts about golden retrievers' behavior"))
print("\n" + "="*70 + "\n")

In [None]:
# Technique 2: Provide Examples (Few-shot prompting)
few_shot_prompt = """
Classify the sentiment of these movie reviews as positive or negative:

Review: "This movie was absolutely fantastic! Best film of the year."
Sentiment: positive

Review: "Terrible waste of time. I want my money back."
Sentiment: negative

Review: "The cinematography was breathtaking and the acting was superb."
Sentiment:
"""

print("Few-shot prompting result:")
print(ask_llm(few_shot_prompt))

In [None]:
# Technique 3: Specify Output Format
format_prompt = """
Extract information about this product and return it as JSON:

"The new iPhone 15 Pro features a titanium design and starts at $999. 
It comes in 128GB, 256GB, and 512GB storage options and is available 
in Natural Titanium, Blue Titanium, White Titanium, and Black Titanium colors."

Return only valid JSON with keys: product_name, starting_price, storage_options, colors
"""

print("Structured output:")
response = ask_llm(format_prompt)
print(response)

# Try to parse it as JSON
try:
    data = json.loads(response)
    print("\n✅ Successfully parsed as JSON!")
    print(json.dumps(data, indent=2))
except json.JSONDecodeError:
    print("\n❌ Failed to parse as JSON")
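
Models sometimes wrap JSON in a markdown code fence even when asked for "only valid JSON", which makes a plain `json.loads` call fail. A minimal sketch of a more tolerant parser follows; `parse_json_reply` is a hypothetical helper, and it assumes the fence (if any) is the first and last line of the reply.

```python
import json

FENCE = "`" * 3  # a markdown code fence marker, built this way to keep the example readable

def parse_json_reply(reply: str):
    """Parse JSON from a model reply, tolerating a surrounding markdown code fence."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line (with or without "json")
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]         # drop the closing fence line
        text = "\n".join(lines)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None  # caller decides how to handle an unparseable reply

fenced_reply = FENCE + 'json\n{"product_name": "iPhone 15 Pro"}\n' + FENCE
print(parse_json_reply(fenced_reply))
print(parse_json_reply("Sorry, I cannot do that."))
```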

In [None]:
# Technique 4: Role-playing / Personas
def ask_with_persona(question, persona):
    return chat_with_llm(
        question,
        system_message=persona
    )

question = "How do I make a sandwich?"

personas = [
    "You are Gordon Ramsay, the famous chef. Be passionate but helpful.",
    "You are a nutritionist focused on healthy eating.",
    "You are explaining to a 5-year-old child. Use simple words."
]

for persona in personas:
    print(f"Persona: {persona}")
    print(ask_with_persona(question, persona)[:200] + "...\n")
    print("-" * 70 + "\n")

## 9. Building Your First LLM-Powered Agent

Now let's create an agent that's smarter than our SimpleAgent but simpler than the complex reasoning models we'll see later:

In [None]:
class LLMAgent:
    """A simple LLM-powered agent that can handle various tasks"""
    
    def __init__(self, model="anthropic/claude-3.5-haiku"):
        self.model = model
        self.system_prompt = """
You are a helpful AI assistant. You can help with various tasks including:
- Answering questions
- Providing weather information (you should mention you'd need current data)
- Doing calculations
- Setting reminders (you should acknowledge the reminder)
- General conversation

Be concise but friendly in your responses.
"""
        self.conversation_history = []
    
    def process_input(self, user_input):
        """Process user input and return a response"""
        # Add user message to history
        self.conversation_history.append({"role": "user", "content": user_input})
        
        # Prepare messages for API
        messages = [
            {"role": "system", "content": self.system_prompt}
        ] + self.conversation_history
        
        try:
            # Get response from LLM
            response = client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.7
            )
            
            # Extract the response
            assistant_response = response.choices[0].message.content
            
            # Add to history
            self.conversation_history.append({
                "role": "assistant", 
                "content": assistant_response
            })
            
            # Keep history manageable (last 10 messages)
            if len(self.conversation_history) > 10:
                self.conversation_history = self.conversation_history[-10:]
            
            return assistant_response
            
        except Exception as e:
            return f"I encountered an error: {str(e)}"
    
    def reset(self):
        """Reset conversation history"""
        self.conversation_history = []

# Create and test the agent
agent = LLMAgent()

# Test the same inputs we used with SimpleAgent
test_inputs = [
    "What's the weather like?",
    "What time is it?",
    "Remind me to buy groceries",
    "Calculate 15 * 3 + 7",
    "How do I make coffee?"
]

print("LLM Agent Responses:\n")
for user_input in test_inputs:
    print(f"User: {user_input}")
    print(f"Agent: {agent.process_input(user_input)}")
    print("-" * 70 + "\n")

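Because `LLMAgent` resends the recent conversation history with every request, it can refer back to earlier messages in the same session (for example, a name you mentioned a few turns ago), something our SimpleAgent couldn't do! The test inputs above are independent questions, so try a follow-up such as "What did I ask you to remind me about?" to see this in action.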
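
The history handling in `LLMAgent.process_input` can be sketched in isolation, without any API call. `build_messages` is a hypothetical helper that mirrors the same idea: the system prompt always goes first, followed by only the most recent messages.

```python
def build_messages(system_prompt, history, max_messages=10):
    """Prepend the system prompt to the most recent messages in the history."""
    recent = history[-max_messages:]
    return [{"role": "system", "content": system_prompt}] + recent

# Simulate seven question/answer turns (14 messages)
history = []
for turn in range(1, 8):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})

messages = build_messages("You are a helpful AI assistant.", history)
print(len(history), "messages in history,", len(messages), "sent to the API")
print("Oldest message still sent:", messages[1]["content"])
```

Anything older than the cutoff is simply forgotten, which is the trade-off of this kind of sliding-window memory: it keeps token usage bounded at the cost of losing early context.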

## 10. Comparing Different Models

Different LLMs have different strengths. Let's compare a few:

In [None]:
def compare_models(prompt, models):
    """Compare responses from different models"""
    print(f"Prompt: '{prompt}'\n")
    
    for model_name in models:
        try:
            response = client.chat.completions.create(
                model=model_name,
                messages=[{"role": "user", "content": prompt}],
                max_tokens=100,
                temperature=0.7
            )
            
            print(f"Model: {model_name}")
            print(f"Response: {response.choices[0].message.content}")
            print("-" * 70 + "\n")
            
        except Exception as e:
            print(f"Model: {model_name}")
            print(f"Error: {str(e)}")
            print("-" * 70 + "\n")

# Compare different models
models_to_test = [
    "anthropic/claude-3.5-haiku",  # Fast and efficient (default)
    "deepseek/deepseek-chat",  # Very cost-effective, strong performance
    "meta-llama/llama-3.3-70b-instruct",  # Open source, excellent general purpose
    "openai/gpt-4o-mini",  # OpenAI's efficient model
]

# Note: Some models might require different API setup or credits.
# This call tests every model in the list; slice it (e.g. models_to_test[:2])
# to test only the first two.
compare_models("Explain quantum computing in one sentence.", models_to_test)

## 11. Common Pitfalls and Best Practices

### Common Pitfalls:

In [None]:
# Pitfall 1: Assuming the LLM always tells the truth
print("Pitfall 1: Hallucination")
response = ask_llm("What happened in the 2027 World Cup?")
print(f"Response: {response}")
print("Note: The LLM might make up events that haven't happened!\n")

# Pitfall 2: Not handling API errors
print("Pitfall 2: Not handling errors")
try:
    # ask_llm() catches exceptions internally, so call the client directly
    # to see the raw failure (invalid model, rate limits, network issues, etc.)
    client.chat.completions.create(
        model="invalid-model-name",
        messages=[{"role": "user", "content": "Test"}]
    )
except Exception as e:
    print(f"Error occurred: {e}")
    print("Always wrap API calls in try-except blocks!\n")

# Pitfall 3: Sending sensitive information
print("Pitfall 3: Security concerns")
print("NEVER send:")
print("- Passwords or API keys")
print("- Personal identification (SSN, credit cards)")
print("- Confidential business data")
print("- Private user information")

### Best Practices:

In [None]:
class SafeLLMAgent:
    """An LLM agent with best practices implemented"""
    
    def __init__(self, model="anthropic/claude-3.5-haiku", max_retries=3):
        self.model = model
        self.max_retries = max_retries
        self.system_prompt = """
You are a helpful AI assistant. Always:
- Be honest about what you know and don't know
- Refuse to process sensitive information like passwords
- Provide helpful, accurate responses
- Acknowledge when you're not certain about something
"""
    
    def process_input(self, user_input, temperature=0.7, max_tokens=None):
        """Process input with error handling and retries"""
        
        # Input validation
        if not user_input or not isinstance(user_input, str):
            return "Please provide a valid text input."
        
        # Check for sensitive information (basic check)
        sensitive_keywords = ['password', 'ssn', 'credit card', 'api key']
        if any(keyword in user_input.lower() for keyword in sensitive_keywords):
            return "I cannot process requests containing sensitive information."
        
        # Retry logic for API calls
        for attempt in range(self.max_retries):
            try:
                response = client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {"role": "system", "content": self.system_prompt},
                        {"role": "user", "content": user_input}
                    ],
                    temperature=temperature,
                    max_tokens=max_tokens
                )
                
                return response.choices[0].message.content
                
            except Exception as e:
                if attempt < self.max_retries - 1:
                    print(f"Attempt {attempt + 1} failed, retrying...")
                    continue
                else:
                    return f"I'm having trouble processing your request. Error: {str(e)}"
    
    def estimate_cost(self, prompt, max_tokens=100):
        """Estimate the cost of a request (example for Claude Haiku)"""
        # Rough estimation: ~4 characters per token
        prompt_tokens = len(prompt) / 4
        total_tokens = prompt_tokens + max_tokens
        
        # Illustrative flat rate only; check current prices on OpenRouter.
        # Real pricing varies by model and usually charges more for output
        # tokens than for input tokens.
        cost_per_1k_tokens = 0.00025  # equivalent to $0.25 per 1M tokens
        estimated_cost = (total_tokens / 1000) * cost_per_1k_tokens
        
        return {
            "estimated_tokens": int(total_tokens),
            "estimated_cost": f"${estimated_cost:.6f}"
        }

# Test the safe agent
safe_agent = SafeLLMAgent()

# Test normal input
print("Normal input:")
print(safe_agent.process_input("What is machine learning?"))
print("\n" + "-"*70 + "\n")

# Test sensitive input
print("Sensitive input:")
print(safe_agent.process_input("My password is 12345"))
print("\n" + "-"*70 + "\n")

# Test cost estimation
print("Cost estimation:")
prompt = "Explain the theory of relativity"
print(f"Prompt: '{prompt}'")
print(safe_agent.estimate_cost(prompt))
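
The retry loop in `SafeLLMAgent` retries immediately after a failure. When the failure is a rate limit, waiting a little longer between attempts often works better. Below is a minimal sketch of exponential backoff; `call_with_backoff` and `flaky_call` are hypothetical names, and the simulated failure stands in for a real API error.

```python
import time

def call_with_backoff(func, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), waiting base_delay * 2**attempt seconds after each failed attempt."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt))

# Stand-in for an API call that fails twice, then succeeds
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

delays = []
# Record the delays instead of actually sleeping, so the demo runs instantly
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter is a design choice that keeps the helper easy to test; in normal use the default `time.sleep` applies.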

## 12. From Simple to Advanced: What's Next?

We've covered the basics of LLMs. Here's how this connects to what's coming:

### What We've Learned:
1. **LLMs vs Rule-Based**: LLMs understand language naturally
2. **API Usage**: How to call LLMs and handle responses
3. **Key Concepts**: Tokens, temperature, prompting
4. **Basic Agent**: Built a simple LLM-powered agent

### What's Coming Next:
1. **Thinking Models**: Models that show their reasoning process
2. **Advanced Prompting**: Chain of thought, ReAct, Tree of thoughts
3. **Tool Use**: Agents that can use external tools and APIs
4. **Memory Systems**: Long-term memory and context management
5. **Multi-Agent Systems**: Multiple agents working together

### Quick Comparison:

In [None]:
# Let's create a visual comparison of the three approaches
import pandas as pd

comparison_data = {
    "Aspect": [
        "Language Understanding",
        "Flexibility",
        "Context Awareness",
        "Learning Ability",
        "Transparency",
        "Cost",
        "Speed",
        "Reliability"
    ],
    "Rule-Based (SimpleAgent)": [
        "❌ Only exact patterns",
        "❌ Rigid, needs programming",
        "❌ No context memory",
        "❌ Cannot learn",
        "✅ Fully transparent",
        "✅ Free to run",
        "✅ Instant",
        "✅ 100% predictable"
    ],
    "Basic LLM (This notebook)": [
        "✅ Natural language",
        "✅ Very flexible",
        "✅ Short-term context",
        "❌ No in-session learning",
        "⚠️  Black box",
        "⚠️  Pay per use",
        "⚠️  ~1-2 seconds",
        "⚠️  Mostly reliable"
    ],
    "Advanced AI Agent (Next)": [
        "✅ Deep understanding",
        "✅ Extremely flexible",
        "✅ Long-term memory",
        "✅ Can adapt and improve",
        "✅ Shows reasoning",
        "❌ More expensive",
        "❌ Slower (3-10s)",
        "⚠️  Complex behavior"
    ]
}

df = pd.DataFrame(comparison_data)
print("Evolution of AI Agents:")
print("=" * 100)
print(df.to_string(index=False))

## Practice Exercises

Try these exercises to solidify your understanding:

### Exercise 1: Prompt Engineering
Create prompts to make the LLM:
1. Extract key points from a paragraph
2. Translate text to another language
3. Generate test data in a specific format

### Exercise 2: Build a Specialized Agent
Modify the LLMAgent class to create:
1. A customer service agent
2. A coding assistant
3. A creative writing helper

### Exercise 3: Experiment with Parameters
1. Test different temperature values for various tasks
2. Compare response quality across different models
3. Measure token usage for different prompt styles

In [None]:
# Space for your practice code

# Exercise 1 Example:
def extract_key_points(text):
    prompt = f"""
Extract the 3 most important points from this text.
Return them as a numbered list.

Text: {text}
"""
    return ask_llm(prompt)

# Try it out!
sample_text = """
Large Language Models have revolutionized natural language processing. 
They can understand context, generate human-like text, and perform 
various tasks without specific programming. However, they require 
careful prompting and have limitations like hallucination and 
high computational costs.
"""

print(extract_key_points(sample_text))

## 13. Core Agentic Components

Now that you understand how to work with LLMs, let's explore the four foundational components that make AI agents truly powerful:

1. **Planning** - Breaking down complex tasks into manageable steps
2. **Reasoning** - Thinking through problems logically 
3. **Tools** - Using external capabilities beyond text generation
4. **Memory** - Retaining and using information across interactions

We've already seen Planning and Reasoning in our examples. Now let's explore Tools and Memory, which transform simple LLMs into capable agents.

### 13.1 Tools: Extending Agent Capabilities

**Tools** allow AI agents to interact with the external world beyond just generating text. Instead of being limited to what they learned during training, agents can:

- **Calculate** precise mathematical results
- **Search** the web for current information  
- **Query** databases for specific data
- **Call APIs** to access external services
- **Execute code** to perform computations
- **Generate images** or process multimedia

#### Why Tools Matter:

1. **Accuracy**: LLMs can hallucinate facts or make calculation errors. Tools provide reliable, verifiable results.
2. **Currency**: Training data has a cutoff date. Tools enable access to real-time information.
3. **Capabilities**: Tools extend what agents can do beyond text processing.

Let's explore how this works in practice:

#### Tool Examples You've Already Seen

In the `agent_demo.ipynb` notebook, the SimpleAgent already demonstrated basic tool usage:
- **Weather API**: Getting real-time weather data from external services
- **Calculator**: Performing mathematical operations reliably  
- **Time Functions**: Accessing system information
- **File Operations**: Reading from CSV databases

#### Advanced Tool Concepts for AI Agents

Modern AI agents can use much more sophisticated tools:

**1. Function Calling**: LLMs can be trained to generate structured function calls
```json
{
  "function": "get_weather",
  "parameters": {"city": "San Francisco", "units": "celsius"}
}
```

**2. Tool Chaining**: Using output from one tool as input to another
- Search web → Extract URLs → Summarize content → Generate report

**3. Conditional Tool Use**: Using reasoning to decide when tools are needed
- "If the user asks about current events, use web search"
- "If the user asks for calculations, use calculator"

**4. Error Handling**: Gracefully handling tool failures
- Retry with different parameters
- Fall back to alternative tools
- Inform user about limitations

The key insight: **Tools transform LLMs from text generators into capable agents that can interact with the real world.**
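
To show how a structured function call like the JSON above might be turned into an actual function invocation, here is a minimal dispatcher sketch. Everything in it is an assumption for illustration: the `TOOLS` registry, the stub `get_weather` and `multiply` tools, and the `dispatch_tool_call` helper. Real function-calling APIs define their own request and response formats.

```python
import json

def get_weather(city: str, units: str = "celsius") -> str:
    """Placeholder tool; a real agent would call a weather API here."""
    return f"(stub) Weather for {city} in {units}"

def multiply(a: float, b: float) -> float:
    """Placeholder calculator tool."""
    return a * b

TOOLS = {"get_weather": get_weather, "multiply": multiply}

def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool request and run the matching function."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return "Error: tool call was not valid JSON"
    if not isinstance(call, dict):
        return "Error: tool call must be a JSON object"
    tool = TOOLS.get(call.get("function"))
    if tool is None:
        return f"Error: unknown tool {call.get('function')!r}"
    try:
        return str(tool(**call.get("parameters", {})))
    except TypeError as exc:  # wrong or missing parameters
        return f"Error: bad parameters ({exc})"

request = '{"function": "get_weather", "parameters": {"city": "San Francisco", "units": "celsius"}}'
print(dispatch_tool_call(request))
print(dispatch_tool_call('{"function": "launch_rocket", "parameters": {}}'))
```

Returning error strings rather than raising lets the agent pass the failure back to the LLM, which can then retry with different parameters or inform the user.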

### 13.2 Memory: Enabling Context and Learning

**Memory** allows AI agents to retain and use information across interactions, making them more context-aware and capable of learning from experience.

#### Types of Memory in AI Agents:

**1. Short-term Memory (Working Memory)**
- Conversation history within a single session
- Recent context and user preferences
- Temporary variables and state

**2. Long-term Memory (Persistent Memory)** 
- Information that persists across sessions
- User profiles and learned preferences
- Knowledge base of facts and procedures
- Historical interactions and outcomes

**3. Episodic Memory**
- Specific events and experiences
- "Remember when we discussed project X?"
- Context about past conversations

**4. Semantic Memory**
- General knowledge and concepts
- Rules, procedures, and methodologies
- Domain-specific expertise
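
The difference between short-term and long-term memory comes down to whether information survives the end of a session. As a minimal sketch, long-term memory can be approximated by writing facts to a JSON file. The `PersistentMemory` class is hypothetical, and a temporary directory is used here only so the demo cleans up after itself; a real agent would use a stable path or a database.

```python
import json
import tempfile
from pathlib import Path

class PersistentMemory:
    """Key-value memory saved to a JSON file so it survives across sessions."""

    def __init__(self, path):
        self.path = Path(path)
        # Load previously stored facts if the file already exists
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key, value):
        self.data[key] = value
        self.path.write_text(json.dumps(self.data, indent=2))

    def recall(self, key, default=None):
        return self.data.get(key, default)

with tempfile.TemporaryDirectory() as tmp:
    memory_file = Path(tmp) / "memory.json"

    session_one = PersistentMemory(memory_file)
    session_one.remember("name", "John")

    # A new object simulates a later session reading the same file
    session_two = PersistentMemory(memory_file)
    recalled = session_two.recall("name")
    print("Recalled in second session:", recalled)
```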

In [None]:
# Demonstrating Memory Concepts with our LLM Agent
class MemoryAwareLLMAgent:
    """Enhanced LLM agent that demonstrates different types of memory"""
    
    def __init__(self, model="anthropic/claude-3.5-haiku"):
        self.model = model
        
        # Short-term memory: conversation history
        self.conversation_history = []
        
        # Long-term memory: persistent information
        self.user_preferences = {}
        self.learned_facts = {}
        self.interaction_count = 0
        
        # System prompt that includes memory capabilities
        self.system_prompt = """
You are a helpful AI assistant with memory capabilities. You can:
- Remember information from this conversation (short-term memory)
- Learn and recall user preferences (long-term memory)
- Reference past interactions when relevant

When users share personal information, preferences, or important facts, 
respond with: "I'll remember that [summarize what you learned]"

If you detect that the user is sharing information worth remembering, 
end your response with a separate final line that begins with "MEMORY_STORE:", followed by key-value pairs in the format:
MEMORY_STORE: preference_<type>=value, fact_name=fact_value

Example: "MEMORY_STORE: name=John, preference_food=vegetarian"
"""
    
    def extract_memory_from_response(self, response):
        """Extract memory instructions from LLM response"""
        if "MEMORY_STORE:" in response:
            memory_line = response.split("MEMORY_STORE:")[1].split("\n")[0].strip()
            # Parse key=value pairs
            pairs = memory_line.split(",")
            for pair in pairs:
                if "=" in pair:
                    key, value = pair.split("=", 1)
                    key = key.strip()
                    value = value.strip()
                    
                    prefix = "preference_"
                    if key.startswith(prefix):
                        name = key[len(prefix):]  # Remove "preference_" prefix
                        self.user_preferences[name] = value
                        print(f"💾 Stored preference: {name} = {value}")
                    else:
                        self.learned_facts[key] = value
                        print(f"💾 Stored fact: {key} = {value}")
    
    def build_memory_context(self):
        """Build memory context for the system prompt"""
        memory_context = ""
        if self.user_preferences:
            memory_context += f"Known user preferences: {self.user_preferences}\n"
        if self.learned_facts:
            memory_context += f"Known facts about user: {self.learned_facts}\n"
        return memory_context
    
    def process_with_memory(self, user_input):
        """Process input while leveraging memory"""
        self.interaction_count += 1
        
        # Build memory context
        memory_context = self.build_memory_context()
        
        # Create enhanced system prompt with memory
        enhanced_system_prompt = self.system_prompt
        if memory_context:
            enhanced_system_prompt += f"\nCurrent memory about this user:\n{memory_context}"
        
        # Add to conversation history
        self.conversation_history.append({"role": "user", "content": user_input})
        
        # Prepare messages with memory context
        messages = [{"role": "system", "content": enhanced_system_prompt}]
        messages.extend(self.conversation_history[-6:])  # Last 6 messages for context
        
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.7,
                max_tokens=200
            )
            
            assistant_response = response.choices[0].message.content
            
            # Let the LLM decide what to remember and extract it
            self.extract_memory_from_response(assistant_response)
            
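            # Clean up the response: drop only the MEMORY_STORE lines, wherever they appear,
            # so the user-facing text is not lost when the marker comes first
            clean_response = "\n".join(
                line for line in assistant_response.split("\n")
                if "MEMORY_STORE:" not in line
            ).strip()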
            
            # Add assistant response to history
            self.conversation_history.append({
                "role": "assistant", 
                "content": clean_response
            })
            
            return clean_response
            
        except Exception as e:
            return f"I encountered an error: {str(e)}"
    
    def show_memory_state(self):
        """Display current memory state"""
        print(f"\n📊 Memory State - Interactions: {self.interaction_count}")
        print(f"   Preferences: {self.user_preferences}")
        print(f"   Facts: {self.learned_facts}")

# Demonstrate memory capabilities
memory_agent = MemoryAwareLLMAgent()

print("MEMORY-AWARE AGENT DEMONSTRATION:")
print("(The LLM decides what to remember, not keyword matching)")
print("=" * 70)

# Conversation that demonstrates memory
conversation_turns = [
    "Hi, my name is Sarah and I prefer vegetarian food",
    "What do you remember about me?", 
    "Can you suggest a restaurant for dinner?",
    "I also really love spicy food and I'm from Italy",
    "What do you know about my food preferences now?"
]

for turn in conversation_turns:
    print(f"\n👤 User: {turn}")
    response = memory_agent.process_with_memory(turn)
    print(f"🤖 Agent: {response}")
    memory_agent.show_memory_state()

#### Key Insights from Memory Demonstration

**What We Learned:**

1. **LLM-Driven Memory**: Instead of keyword matching, we let the LLM decide what information is worth remembering
2. **Structured Memory Storage**: Using a simple protocol (`MEMORY_STORE:`) to extract structured data from natural language responses  
3. **Context Injection**: Feeding remembered information back into the system prompt for future conversations
4. **Memory Types**: Demonstrating both facts (name, location) and preferences (food choices)

#### Memory in Production Systems

**Real-world memory systems use:**

- **Vector Databases**: Store conversation embeddings for semantic search
- **Knowledge Graphs**: Model relationships between entities (user → likes → spicy food)
- **Retrieval-Augmented Generation (RAG)**: Search relevant memories before generating responses
- **Memory Summarization**: Compress long conversation histories into key facts
- **Privacy Controls**: Allow users to view/edit/delete their stored information
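
The retrieval idea behind vector databases and RAG can be shown without any external service. The sketch below swaps real embeddings for simple word-count vectors and ranks stored memories by cosine similarity. That substitution is an assumption made to keep the example self-contained: a production system would use a learned embedding model and a vector store. The function names are hypothetical.

```python
import math
import re
from collections import Counter

def to_vector(text):
    """Stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_memories(query, memories, top_k=1):
    """Return up to top_k memories most similar to the query, skipping zero matches."""
    query_vec = to_vector(query)
    scored = [(cosine_similarity(query_vec, to_vector(m)), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored[:top_k] if score > 0]

stored_memories = [
    "User prefers vegetarian food",
    "User is from Italy",
    "User is planning a trip to Japan",
]
relevant = retrieve_memories("Which food does the user like?", stored_memories)
print(relevant)
```

Only the retrieved memories would then be injected into the prompt, rather than everything the agent has stored.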

#### Memory Challenges

- **Context Window Limits**: LLMs can process only a limited number of tokens at once
- **Relevance Filtering**: Not all memories are relevant to the current conversation
- **Memory Conflicts**: Handling contradictory information over time
- **Privacy & Security**: Protecting sensitive personal information
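
One common response to context window limits is to trim conversation history to a token budget instead of a fixed message count. The sketch below assumes a rough heuristic of about four characters per token, which is only an approximation for English text. A real system would use the model's own tokenizer. `estimate_tokens` and `trim_history` are hypothetical helpers.

```python
def estimate_tokens(text):
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Keep the most recent messages whose estimated total fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "a" * 40},       # about 10 tokens
    {"role": "assistant", "content": "b" * 40},  # about 10 tokens
    {"role": "user", "content": "c" * 40},       # about 10 tokens
]
trimmed = trim_history(history, max_tokens=25)
print([m["content"][0] for m in trimmed])
```

With a budget of 25 estimated tokens, only the two most recent messages fit, so the oldest one is dropped.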

### 13.3 The Four Pillars Working Together

The true power of AI agents emerges when all four components work in harmony:

**Example: A Personal Assistant Agent**

1. **Planning**: "User wants to plan a vacation - I need to break this into steps: budget, dates, preferences, booking"

2. **Reasoning**: "Based on their previous trips (memory) and stated budget constraints, I should recommend mid-range options in Europe rather than luxury resorts"

3. **Tools**: "Let me search flight prices (web search tool), check weather forecasts (weather API), and calculate total costs (calculator tool)"

4. **Memory**: "I remember they prefer cultural experiences over beaches, are vegetarian, and had a great time in Italy last year"

**The Result**: A capable agent that can handle complex, multi-step tasks while personalizing the experience based on learned preferences and real-time information.

## 14. From Simple to Advanced: What's Next?

We've now covered the complete foundation of agentic AI systems. Here's how this connects to what's coming:

### What We've Learned:
1. **LLMs vs Rule-Based**: LLMs understand language naturally
2. **API Usage**: How to call LLMs and handle responses
3. **Key Concepts**: Tokens, temperature, prompting
4. **Basic Agent**: Built a simple LLM-powered agent
5. **Four Pillars of Agentic AI**: Planning, Reasoning, Tools, Memory

### What's Coming Next:
1. **Thinking Models**: Models that show their reasoning process (you've seen this in the DeepSeek notebook)
2. **Advanced Prompting**: Chain of Thought, ReAct, Tree of Thoughts
3. **Production Tools**: Real API integrations and tool frameworks
4. **Advanced Memory**: Vector databases, RAG systems, knowledge graphs
5. **Complex Workflows**: Agents handling multi-step business processes