# Agentic Artificial Intelligence
## Exercise - Unit 02: Large Language Models

Welcome to the second unit of the Agentic Artificial Intelligence course! 

### Learning Objectives
By the end of this unit, you will:
1. understand tokenizers for language
2. understand chat templates and prompt templates
3. understand how to build simple LLM applications using chains
4. build your first (simple) LLM-powered AI agent

## What are LLMs?
Large Language Models (LLMs) are AI models proficient in comprehending and producing human language. Trained on extensive text datasets, they acquire linguistic patterns, structures, and nuances, and are typically characterized by millions to billions of parameters.

The majority of contemporary LLMs build on the Transformer architecture, a deep learning model rooted in the "Attention" mechanism. It was introduced in the 2017 paper "Attention Is All You Need" and has garnered considerable attention since Google's introduction of BERT in 2018.

LLMs operate on a straightforward, highly effective principle: predicting the subsequent token based on a preceding sequence. A "token" is the fundamental unit of information an LLM processes, analogous to a word but optimized for efficiency, often representing sub-word units. For example, an LLM's vocabulary might be significantly smaller than the total number of words in a language (e.g., Llama 2 with ~32,000 tokens), utilizing combinable sub-word tokens like "interest" and "ing" to form "interesting," or appending "ed" for "interested." You can explore various tokenizers in the interactive playground below:

### Experimenting with tokenizers

In [1]:
# You might need to install the libraries if you haven't already
# pip install transformers torch

from transformers import AutoTokenizer

text_to_tokenize = "Tokenization is fascinating and the course Agentic Artificial Intelligence is exiting!"


def tokenize_text(text_to_tokenize, tokenizer_name = "bert-base-uncased"):

    # Load a tokenizer
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    # 1. Encode the text into token IDs
    encoded_ids = tokenizer.encode(text_to_tokenize)
    print(f"Original Text: '{text_to_tokenize}'")
    print(f"Token IDs: {encoded_ids}")

    # 2. Convert the IDs back into tokens (the actual text pieces)
    tokens = tokenizer.convert_ids_to_tokens(encoded_ids)
    print(f"Tokens: {tokens}")

    # 3. Decode the IDs back to the original string
    decoded_text = tokenizer.decode(encoded_ids)
    print(f"Decoded Text: '{decoded_text}'")

tokenize_text(text_to_tokenize)

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


Original Text: 'Tokenization is fascinating and the course Agentic Artificial Intelligence is exiting!'
Token IDs: [101, 19204, 3989, 2003, 17160, 1998, 1996, 2607, 4005, 2594, 7976, 4454, 2003, 22371, 999, 102]
Tokens: ['[CLS]', 'token', '##ization', 'is', 'fascinating', 'and', 'the', 'course', 'agent', '##ic', 'artificial', 'intelligence', 'is', 'exiting', '!', '[SEP]']
Decoded Text: '[CLS] tokenization is fascinating and the course agentic artificial intelligence is exiting! [SEP]'


In [2]:
tokenize_text(text_to_tokenize, 'gpt2')

Original Text: 'Tokenization is fascinating and the course Agentic Artificial Intelligence is exiting!'
Token IDs: [30642, 1634, 318, 13899, 290, 262, 1781, 15906, 291, 35941, 9345, 318, 33895, 0]
Tokens: ['Token', 'ization', 'Ġis', 'Ġfascinating', 'Ġand', 'Ġthe', 'Ġcourse', 'ĠAgent', 'ic', 'ĠArtificial', 'ĠIntelligence', 'Ġis', 'Ġexiting', '!']
Decoded Text: 'Tokenization is fascinating and the course Agentic Artificial Intelligence is exiting!'


Large Language Models (LLMs) utilize special, model-specific tokens to structure their generated output and input prompts. These tokens delineate components like sequences, messages, and responses, with the End-of-Sequence (EOS) token being particularly crucial. The exact form and variety of these special tokens differ significantly across different LLM providers, as further illustrated in the table below.

| Model | Provider | EOS Token | Functionality |
| :--- | :--- | :--- | :--- |
| **GPT-4** | OpenAI | `<\|endoftext\|>` | End of message text |
| **Llama 3** | Meta (Facebook AI Research) | `<\|eot_id\|>` | End of sequence |
| **DeepSeek-R1** | DeepSeek | `<\|end_of_sentence\|>` | End of message text |
| **SmolLM2** | Hugging Face | `<\|im_end\|>` | End of instruction or message |
| **Gemma** | Google | `<end_of_turn>` | End of conversation turn |

### Next token prediction

Large Language Models (LLMs) operate autoregressively, where each generated token serves as input for predicting the subsequent token. This iterative process continues until the model generates an End-of-Sequence (EOS) token, signaling the completion of the output.

<img src="AutoregressionSchema.gif" alt="Alt text" width="800">

An LLM decodes text iteratively until it encounters the End-of-Sequence (EOS) token. During each decoding loop:

1.  **Input Processing:** The input text is first tokenized. The model then generates a comprehensive representation of this token sequence, encoding both the meaning and positional information of each token.
2.  **Likelihood Scoring:** This sequence representation is fed into the model, which subsequently outputs scores indicating the probability of every token in its vocabulary being the next in the sequence.
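
The loop described above can be sketched in a few lines of plain Python. This is only a toy illustration: the vocabulary and scores in `TOY_LOGITS` are invented, the "model" looks at the last token only (a real LLM conditions on the whole sequence), and always picking the highest-scoring token (greedy decoding) is just one simple selection strategy assumed here.

```python
import math

# Toy "model": for each current token, raw scores (logits) for the next token.
# The vocabulary and scores are invented for illustration only.
TOY_LOGITS = {
    "The": {"capital": 2.0, "of": 0.1, "<eos>": -1.0},
    "capital": {"of": 3.0, "is": 0.5, "<eos>": -1.0},
    "of": {"France": 2.5, "the": 1.0, "<eos>": -1.0},
    "France": {"is": 2.0, "<eos>": 0.0},
    "is": {"Paris": 3.0, "<eos>": 0.5},
    "Paris": {"<eos>": 4.0, "is": 0.1},
}


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def greedy_decode(prompt_tokens, eos_token="<eos>", max_new_tokens=10):
    """Repeatedly append the most likely next token until EOS appears."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Simplification: score the next token from the last token only
        probs = softmax(TOY_LOGITS[tokens[-1]])
        next_token = max(probs, key=probs.get)
        if next_token == eos_token:
            break  # the EOS token ends generation
        tokens.append(next_token)
    return tokens


generated = greedy_decode(["The", "capital", "of", "France", "is"])
print(" ".join(generated))
```

Each generated token is appended to the sequence and becomes part of the input for the next step, which is exactly the autoregressive behaviour described in this section.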

<img src="DecodingFinal.gif" alt="Alt text" width="800">

In [3]:
import gradio as gr
from gradio_client import Client

def get_decoding_visualization(input_text):
    """
    Calls the remote Gradio app and returns the HTML visualization.
    """
    client = Client("agents-course/decoding_visualizer")
    result = client.predict(
        input_text=input_text,
        api_name="/get_beam_search_html"
    )
    
    # The client.predict() result may be a tuple; handle it to get the string.
    if isinstance(result, tuple):
        html_string = result[0]
    else:
        html_string = result
        
    return gr.HTML(value=html_string)

# Create the Gradio Interface
demo = gr.Interface(
    fn=get_decoding_visualization,
    inputs=gr.Textbox(label="Input Text", value="The Capital of France is"),
    outputs=gr.HTML(label="Decoding Visualization"),
    title="Decoding Visualizer",
    description="Visualize decoding steps from a remote model by entering text below."
)

# Launch the Gradio app
demo.launch()

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.




Loaded as API: https://agents-course-decoding-visualizer.hf.space ✔


If you want to learn more about Natural Language Processing, I recommend checking out the Hugging Face NLP course: https://huggingface.co/learn/llm-course/chapter1/1

## Attention is all you need

In the Transformer architecture, a crucial element is "Attention." This mechanism recognizes that when predicting the next word, not all words in a sentence contribute equally to the meaning. For instance, in the sentence "The capital of France is …", words such as "France" and "capital" are most significant for determining the subsequent word.

<img src="AttentionSceneFinal.gif" alt="Alt text" width="800">

The ability to pinpoint the most relevant words for predicting the next token has significantly boosted LLM effectiveness. While the fundamental principle of next-token prediction remains, substantial progress since GPT-2 has focused on scaling neural networks and extending the attention mechanism to handle increasingly longer sequences. This has led to the concept of "context length," which defines the maximum number of tokens an LLM can process and its corresponding attention span.
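
To make the idea of "relevance" concrete, here is a minimal sketch of how attention weights can be computed with scaled dot-product attention. The 2-dimensional key vectors and the query vector are hand-picked, hypothetical values; in a real Transformer they are learned and have hundreds or thousands of dimensions.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


tokens = ["The", "capital", "of", "France", "is"]
# Hypothetical key vectors, hand-picked for illustration
keys = [
    [0.1, 0.0],  # The
    [1.0, 0.8],  # capital
    [0.0, 0.1],  # of
    [0.9, 1.0],  # France
    [0.2, 0.1],  # is
]
query = [1.0, 1.0]  # hypothetical query for the position being predicted

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    print(f"{token:>8}: {weight:.2f}")
```

With these made-up vectors, "capital" and "France" receive the largest weights, mirroring the example sentence above.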

## How can I use LLMs?

You have two primary methods for utilizing models:

1.  **Local Execution:** This option is viable if your hardware meets the necessary specifications.
2.  **Cloud/API Access:** You can leverage cloud services, such as the Hugging Face Serverless Inference API.

In this course, we will primarily interact with models through APIs, with a later focus on deploying and running these models on your local machine.

## Making API Calls to LLMs with LangChain and Gemini

Now that we understand how LLMs work internally, let's learn how to actually use them in practice through API calls. We'll use **LangChain**, a popular framework for building applications with LLMs, and **Google's Gemini** as our LLM provider.

### What is LangChain?

LangChain is a framework designed to simplify the development of applications using large language models. It provides:

- **Unified API**: Work with different LLM providers (OpenAI, Google, Anthropic, etc.) using a consistent interface
- **Prompt Templates**: Structure and manage prompts effectively
- **Memory**: Maintain conversation context across multiple interactions
- **Chains**: Combine multiple LLM calls and operations
- **Tools**: Integrate LLMs with external APIs and services

### Why Use APIs Instead of Local Models?

1. **No Hardware Requirements**: You don't need powerful GPUs
2. **Latest Models**: Access to state-of-the-art models like Gemini Pro
3. **Scalability**: Handle multiple requests without resource constraints
4. **Maintenance-Free**: No need to manage model updates or infrastructure


### Setting Up Your Environment

Before we can make API calls, we need to set up our environment with the necessary API keys.


In [1]:
# Import the course helper that checks whether the API key is configured
from agentic_ai.utils.helpers import check_api_setup

# Check if we have the required API key
api_configured = check_api_setup()


✅ Google API key found!
Key ends with:****eg


### Your First LLM API Call

Let's make our first API call to Google's Gemini model using LangChain!

In [2]:
from langchain.chat_models import init_chat_model

# Initialize the Gemini model
llm = init_chat_model("gemini-2.5-flash-lite", model_provider="google_genai", temperature=0.7, max_tokens=2000)

# Make your first API call!
response = llm.invoke("Explain what a Large Language Model is in simple terms.")

response.pretty_print()


Imagine a giant brain that's been trained on an enormous amount of text – like all the books, articles, websites, and conversations you can imagine. That's essentially what a **Large Language Model (LLM)** is.

Here's a breakdown in simple terms:

*   **"Large":** It's called "large" because it has a massive number of parameters (think of these like the connections in a brain) and it's been trained on an incredibly huge amount of data. The more data and parameters, the more complex and capable it is.

*   **"Language":** Its primary job is to understand, generate, and work with human language. It learns the patterns, grammar, facts, and even nuances of how we communicate.

*   **"Model":** It's a "model" because it's a mathematical representation that has learned from the data. It's not a conscious being, but rather a sophisticated program that can predict what word is most likely to come next in a sequence.

**What can it do? Think of it like a super-powered autocomplete or a very kn

### Understanding the Response Object

The LLM doesn't just return a string - it returns a rich response object with metadata.

In [3]:
# Iterating over the response object yields (field name, value) pairs
for field in response:
    print(field)

('content', 'Imagine a giant brain that\'s been trained on an enormous amount of text – like all the books, articles, websites, and conversations you can imagine. That\'s essentially what a **Large Language Model (LLM)** is.\n\nHere\'s a breakdown in simple terms:\n\n*   **"Large":** It\'s called "large" because it has a massive number of parameters (think of these like the connections in a brain) and it\'s been trained on an incredibly huge amount of data. The more data and parameters, the more complex and capable it is.\n\n*   **"Language":** Its primary job is to understand, generate, and work with human language. It learns the patterns, grammar, facts, and even nuances of how we communicate.\n\n*   **"Model":** It\'s a "model" because it\'s a mathematical representation that has learned from the data. It\'s not a conscious being, but rather a sophisticated program that can predict what word is most likely to come next in a sequence.\n\n**What can it do? Think of it like a super-pow

In [4]:
print("📋 Response Object Details:")
print("=" * 40)
print(f"Type: {type(response)}")
print(f"Content: {response.content[:100]}...")
print(f"Response metadata: {response.response_metadata}")
print(f"Usage metadata: {response.usage_metadata}")

📋 Response Object Details:
Type: <class 'langchain_core.messages.ai.AIMessage'>
Content: Imagine a giant brain that's been trained on an enormous amount of text – like all the books, articl...
Response metadata: {'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}
Usage metadata: {'input_tokens': 12, 'output_tokens': 596, 'total_tokens': 608, 'input_token_details': {'cache_read': 0}}


🔍 Available Attributes:
- content: The actual text response
- response_metadata: Model-specific metadata
- usage_metadata: Token usage information

### Understanding Chat Templates

Recap: A chat template's job is to convert a list of messages into a single, formatted string.

Let's define a sample conversation as a list of dictionaries. This is a standard format you'll encounter frequently.

In [5]:
conversation = [
    {"role": "system", "content": "You are Gemini, a helpful AI assistant built by Google."},
    {"role": "user", "content": "Hello! Can you write a short, 3-line poem about programming?"},
    {"role": "assistant", "content": "Sure, here's a poem about programming:..."},
    {"role": "user", "content": "That's great! Can you explain chat templates in LLMs?"}
]

If we were to guess the format, we might just join the content together. However, this would be the wrong approach.

In [6]:
# This is NOT the correct way to do it!
manual_prompt = ""
for message in conversation:
    manual_prompt += f"{message['role']}: {message['content']}\n"

print(manual_prompt)

system: You are Gemini, a helpful AI assistant built by Google.
user: Hello! Can you write a short, 3-line poem about programming?
assistant: Sure, here's a poem about programming:...
user: That's great! Can you explain chat templates in LLMs?



Feeding this string to a model is unlikely to produce a good response, because it is not the format the model was trained on.<br>
I won't demonstrate it here, because LangChain sends the conversation to the provider as structured messages and the model's chat template is applied for us.

Instead, we should use a chat template to format the conversation.
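
As an illustration of what such a template produces, here is a minimal hand-written sketch of a ChatML-style format, which marks message boundaries with `<|im_start|>` and `<|im_end|>` (the latter is the token listed for SmolLM2 in the table above). The function name `apply_chatml_template` is hypothetical, and every model ships its own template, so treat the exact layout as an assumption rather than the format of any particular model.

```python
def apply_chatml_template(messages, add_generation_prompt=True):
    """Render role/content messages as one ChatML-style prompt string."""
    parts = []
    for message in messages:
        # Each message is wrapped in start/end markers together with its role
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you explain chat templates in LLMs?"},
]

formatted = apply_chatml_template(example_conversation)
print(formatted)
```

Unlike the naive `role: content` join, the special tokens give the model unambiguous boundaries between turns.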

The good thing about using LangChain is that it handles this for us automatically.

In [7]:
# Build a prompt template from our conversation; LangChain converts each dict into a message object
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(conversation)
# The template has no input variables, so we invoke it with an empty dict
print(prompt.invoke({}))

messages=[SystemMessage(content='You are Gemini, a helpful AI assistant built by Google.', additional_kwargs={}, response_metadata={}), HumanMessage(content='Hello! Can you write a short, 3-line poem about programming?', additional_kwargs={}, response_metadata={}), AIMessage(content="Sure, here's a poem about programming:...", additional_kwargs={}, response_metadata={}), HumanMessage(content="That's great! Can you explain chat templates in LLMs?", additional_kwargs={}, response_metadata={})]


- As you can see, LangChain automatically interpreted our list of message dictionaries and converted it into its own message format.
- It does this with its own classes `SystemMessage`, `HumanMessage`, and `AIMessage`, which are then translated into the format a specific model expects.
- Note: This only works with models supported by the LangChain framework.

In [8]:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

system_message = SystemMessage(content="You are a helpful assistant that can answer questions and help with tasks.")
human_message_1 = HumanMessage(content="What is the capital of France?")
ai_message = AIMessage(content="The capital of France is Paris.")
human_message_2 = HumanMessage(content="What is the capital of Germany?")

# Now let's put this into a chat template
prompt = ChatPromptTemplate.from_messages([system_message, human_message_1, ai_message, human_message_2])

prompt

ChatPromptTemplate(input_variables=[], input_types={}, partial_variables={}, messages=[SystemMessage(content='You are a helpful assistant that can answer questions and help with tasks.', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is the capital of France?', additional_kwargs={}, response_metadata={}), AIMessage(content='The capital of France is Paris.', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is the capital of Germany?', additional_kwargs={}, response_metadata={})])

In [9]:
# Invoke the prompt template on its own to see the formatted messages
prompt.invoke({})

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant that can answer questions and help with tasks.', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is the capital of France?', additional_kwargs={}, response_metadata={}), AIMessage(content='The capital of France is Paris.', additional_kwargs={}, response_metadata={}), HumanMessage(content='What is the capital of Germany?', additional_kwargs={}, response_metadata={})])

In [10]:
# Invoke the prompt template to get formatted messages, then pass to LLM
llm.invoke(prompt.invoke({}))

AIMessage(content='The capital of Germany is Berlin.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run--57798d64-c66c-4ee3-8b8b-b8d872406107-0', usage_metadata={'input_tokens': 38, 'output_tokens': 7, 'total_tokens': 45, 'input_token_details': {'cache_read': 0}})

### Working with Prompt Templates

- Not to be confused with chat templates!
- Instead of hardcoding prompts, we can use templates to make our prompts dynamic and reusable.

In [11]:
from langchain_core.prompts import PromptTemplate

# Create a prompt template
template = """You are an expert {role}. 

Question: {question}

Please provide a detailed answer that includes:
1. A clear explanation
2. Real-world examples
3. Practical implications

Answer:"""

prompt = PromptTemplate(
    input_variables=["role", "question"],
    template=template
)

# Use the template with different inputs
formatted_prompt = prompt.format(
    role="machine learning engineer",
    question="How do transformer models work?"
)

print("📝 Generated Prompt:")
print("=" * 40)
print(formatted_prompt)
print("\n" + "=" * 40)

# Get the response
response = llm.invoke(formatted_prompt)
print("\n🤖 Response:")
print(response.content)

📝 Generated Prompt:
You are an expert machine learning engineer. 

Question: How do transformer models work?

Please provide a detailed answer that includes:
1. A clear explanation
2. Real-world examples
3. Practical implications

Answer:


🤖 Response:
As an expert machine learning engineer, I'm thrilled to dive into the fascinating world of Transformer models. They have revolutionized Natural Language Processing (NLP) and are increasingly making their mark in other domains.

## How Transformer Models Work: A Deep Dive

At its core, a Transformer model is a neural network architecture designed to handle sequential data, most notably text. Its breakthrough lies in its ability to process entire sequences in parallel, unlike previous recurrent neural networks (RNNs) or convolutional neural networks (CNNs) that processed data step-by-step. This parallel processing, combined with a novel attention mechanism, allows Transformers to capture long-range dependencies and contextual relationships

### Specifying data types with Pydantic's BaseModel

Sometimes we want to specify the data type of something the LLM works with or returns.

For example, suppose you use an LLM to read PDFs and want to get the first name, last name, and mobile number of every person appearing in them.<br>
In your script you make an API call to the LLM and want to save the response to a .csv file.<br>
Unfortunately, the LLM sometimes does not return the data in the format you asked for, e.g. {first_name: <first_name>}.

In this case you could, and should, use Pydantic's `BaseModel` (not to be confused with the base model underlying an instruct model!).

#### What is Pydantic?

**Pydantic** is a data validation library for Python that uses type hints to validate data structures. It's particularly useful when working with LLMs because it allows you to:

1. **Define structured outputs**: Specify exactly what format you want data in
2. **Automatic validation**: Ensure data matches expected types
3. **Parse complex data**: Convert dictionaries into structured Python objects
4. **Generate JSON schemas**: Create clear specifications for LLM outputs

The core of Pydantic is the `BaseModel` class, which you inherit from to define your data structures.


In [12]:
from pydantic import BaseModel

# Define a simple person data structure
class Person(BaseModel):
    first_name: str
    last_name: str
    age: int
    email: str

# Create an instance
person = Person(
    first_name="Alice",
    last_name="Smith",
    age=30,
    email="alice@example.com"
)

print("✅ Person object created successfully!")
print(f"Full name: {person.first_name} {person.last_name}")
print(f"Age: {person.age}")
print(f"Email: {person.email}")

# Convert to dictionary
print("\n📋 As dictionary:")
print(person.model_dump())

# Convert to JSON
print("\n📝 As JSON:")
print(person.model_dump_json(indent=2))


✅ Person object created successfully!
Full name: Alice Smith
Age: 30
Email: alice@example.com

📋 As dictionary:
{'first_name': 'Alice', 'last_name': 'Smith', 'age': 30, 'email': 'alice@example.com'}

📝 As JSON:
{
  "first_name": "Alice",
  "last_name": "Smith",
  "age": 30,
  "email": "alice@example.com"
}


### Pydantic's Type Validation

One of the most powerful features of Pydantic is automatic type validation. Let's see what happens when we try to create a Person with invalid data:

In [13]:
# Pydantic will try to coerce types when possible
print("Example 1: Type coercion")
person2 = Person(
    first_name="Bob",
    last_name="Jones",
    age="25",  # String will be converted to int
    email="bob@example.com"
)
print(f"✅ Age '25' (string) was converted to {person2.age} (int)")
print(f"Type of age: {type(person2.age)}\n")

# But it will raise a ValidationError for invalid data
from pydantic import ValidationError

print("Example 2: Invalid data")
try:
    invalid_person = Person(
        first_name="Charlie",
        last_name="Brown",
        age="not a number",  # This can't be converted to int
        email="charlie@example.com"
    )
except ValidationError as e:
    print(f"❌ Validation error: {e}")

Example 1: Type coercion
✅ Age '25' (string) was converted to 25 (int)
Type of age: <class 'int'>

Example 2: Invalid data
❌ Validation error: 1 validation error for Person
age
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not a number', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/int_parsing


### Advanced Pydantic Features

Pydantic provides additional features for more complex data structures:

In [14]:
from typing import Optional, List
from pydantic import BaseModel, Field

class ContactInfo(BaseModel):
    """Contact information for a person extracted from a document."""
    first_name: str = Field(description="Person's first name")
    last_name: str = Field(description="Person's last name")
    mobile: str = Field(description="Mobile phone number")
    email: Optional[str] = Field(default=None, description="Email address if available")
    
class DocumentExtraction(BaseModel):
    """Results from extracting contacts from a document."""
    document_name: str
    contacts: List[ContactInfo]
    extraction_date: str

# Example: Simulating extraction from a PDF
extraction_result = DocumentExtraction(
    document_name="business_cards.pdf",
    contacts=[
        ContactInfo(
            first_name="Sarah",
            last_name="Johnson",
            mobile="+1-555-0123",
            email="sarah.j@company.com"
        ),
        ContactInfo(
            first_name="Michael",
            last_name="Chen",
            mobile="+1-555-0124"
            # email is optional, so we can omit it
        )
    ],
    extraction_date="2025-10-05"
)

print("📄 Document Extraction Result:")
print("=" * 50)
print(f"Document: {extraction_result.document_name}")
print(f"Found {len(extraction_result.contacts)} contacts:\n")

for i, contact in enumerate(extraction_result.contacts, 1):
    print(f"{i}. {contact.first_name} {contact.last_name}")
    print(f"   Mobile: {contact.mobile}")
    if contact.email:
        print(f"   Email: {contact.email}")
    print()

# This structured data can easily be converted to CSV, JSON, or database records
print("💾 As JSON (ready for saving to file or database):")
print(extraction_result.model_dump_json(indent=2))

📄 Document Extraction Result:
Document: business_cards.pdf
Found 2 contacts:

1. Sarah Johnson
   Mobile: +1-555-0123
   Email: sarah.j@company.com

2. Michael Chen
   Mobile: +1-555-0124

💾 As JSON (ready for saving to file or database):
{
  "document_name": "business_cards.pdf",
  "contacts": [
    {
      "first_name": "Sarah",
      "last_name": "Johnson",
      "mobile": "+1-555-0123",
      "email": "sarah.j@company.com"
    },
    {
      "first_name": "Michael",
      "last_name": "Chen",
      "mobile": "+1-555-0124",
      "email": null
    }
  ],
  "extraction_date": "2025-10-05"
}


### Using Pydantic with LLMs

Now that we understand Pydantic's BaseModel, let's see how it helps when working with LLMs. LangChain has built-in support for Pydantic models, allowing you to get structured outputs from LLMs!

In [15]:
from pydantic import BaseModel, Field
from typing import List

# Define the structure we want the LLM to return
class MovieReview(BaseModel):
    """A movie review with structured information."""
    title: str = Field(description="The movie title")
    year: int = Field(description="The year the movie was released")
    genre: List[str] = Field(description="List of genres (e.g., Action, Comedy, Drama)")
    rating: float = Field(description="Rating from 0.0 to 10.0")
    summary: str = Field(description="Brief one-sentence summary")
    recommendation: str = Field(description="Would you recommend it? (Yes/No/Maybe)")

# Use LangChain's structured output feature
# This asks the LLM for output matching our schema and validates it as a MovieReview
structured_llm = llm.with_structured_output(MovieReview)

# Ask the LLM to analyze a movie
prompt = """Analyze the movie 'The Matrix' and provide a review with the following information:
- Title
- Release year
- Genres
- Your rating (0-10)
- Brief one-sentence summary
- Whether you'd recommend it"""

print("🎬 Requesting structured movie review from LLM...")
print("=" * 50)

# The LLM will return a MovieReview object, not just text!
review = structured_llm.invoke(prompt)

print(f"✅ Received structured output!\n")
print(f"Title: {review.title}")
print(f"Year: {review.year}")
print(f"Genre: {', '.join(review.genre)}")
print(f"Rating: {review.rating}/10.0")
print(f"Summary: {review.summary}")
print(f"Recommendation: {review.recommendation}")

print(f"\n📊 Data type: {type(review)}")
print(f"✅ This is a validated MovieReview object, not just text!")

# We can now easily work with this data
print("\n💾 Converting to different formats:")
print("\n1. As dictionary:")
print(review.model_dump())

print("\n2. As JSON:")
print(review.model_dump_json(indent=2))

🎬 Requesting structured movie review from LLM...
✅ Received structured output!

Title: The Matrix
Year: 1999
Genre: Action, Sci-Fi
Rating: 9.5/10.0
Summary: A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its creators.
Recommendation: Yes

📊 Data type: <class '__main__.MovieReview'>
✅ This is a validated MovieReview object, not just text!

💾 Converting to different formats:

1. As dictionary:
{'title': 'The Matrix', 'year': 1999, 'genre': ['Action', 'Sci-Fi'], 'rating': 9.5, 'summary': 'A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its creators.', 'recommendation': 'Yes'}

2. As JSON:
{
  "title": "The Matrix",
  "year": 1999,
  "genre": [
    "Action",
    "Sci-Fi"
  ],
  "rating": 9.5,
  "summary": "A computer hacker learns from mysterious rebels about the true nature of his reality and his role in the war against its creators.",
  "recommendatio

### Why Use Structured Outputs with Pydantic?

Using Pydantic models with LLMs provides several key advantages:

1. **Consistency**: The LLM is asked to return data in the exact format you specify, and the result is validated against it, making your code more reliable
2. **Type Safety**: Pydantic validates types automatically - if the LLM returns invalid data, you'll get a clear error
3. **Easy Integration**: The structured output can be directly saved to databases, CSV files, or used in your application
4. **No Parsing Needed**: You don't need to write regex or parsing code to extract information from text
5. **Self-Documenting**: The Field descriptions help the LLM understand what you want

This is especially valuable for production applications where you need reliable, predictable outputs!
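
As a sketch of the "Easy Integration" point, the snippet below writes contact records to CSV using only the standard library. The records are hypothetical stand-ins shaped like what `model_dump()` returns for each contact, and the helper name `contacts_to_csv` is an assumption made for this example.

```python
import csv
import io

# Hypothetical records, shaped like the output of `model_dump()` per contact
contacts = [
    {"first_name": "Sarah", "last_name": "Johnson",
     "mobile": "+1-555-0123", "email": "sarah.j@company.com"},
    {"first_name": "Michael", "last_name": "Chen",
     "mobile": "+1-555-0124", "email": None},
]


def contacts_to_csv(records):
    """Serialize a list of flat dictionaries to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    for record in records:
        # Write missing values (None) as empty cells
        writer.writerow(
            {key: "" if value is None else value for key, value in record.items()}
        )
    return buffer.getvalue()


csv_text = contacts_to_csv(contacts)
print(csv_text)
```

To write to disk instead, the same `DictWriter` can be pointed at a file opened with `open(path, "w", newline="")`.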

### Practical Example: Extracting Contact Information from Text

Remember the PDF extraction scenario we mentioned earlier? Let's see how Pydantic makes this easy:

In [16]:
from pydantic import BaseModel, Field
from typing import List, Optional

# Define the exact structure we want
class Contact(BaseModel):
    """A single contact extracted from text."""
    first_name: str = Field(description="Person's first name")
    last_name: str = Field(description="Person's last name")
    phone: str = Field(description="Phone number in any format")
    email: Optional[str] = Field(default=None, description="Email address if mentioned")
    company: Optional[str] = Field(default=None, description="Company name if mentioned")

class ContactList(BaseModel):
    """List of contacts extracted from a document."""
    contacts: List[Contact] = Field(description="All contacts found in the text")

# Create a structured LLM
contact_extractor = llm.with_structured_output(ContactList)

# Sample text that might come from a PDF or document
sample_text = """
From the business meeting notes:

Sarah Johnson from TechCorp reached out regarding the partnership. 
Her contact details are sarah.johnson@techcorp.com and 555-0123.

Also met with Michael Chen, mobile: (555) 0124. He's with DataSystems Inc.

Follow up with Jennifer Lopez at 555.0125, jlopez@example.com
"""

print("📄 Extracting contacts from text...")
print("=" * 50)
print(f"Original text:\n{sample_text}\n")
print("=" * 50)

# Extract contacts with structured output
prompt = f"""Extract all person contact information from this text. 
Include first name, last name, phone number, email (if present), and company (if mentioned).

Text:
{sample_text}
"""

result = contact_extractor.invoke(prompt)

print(f"\n✅ Extracted {len(result.contacts)} contacts:\n")

for i, contact in enumerate(result.contacts, 1):
    print(f"{i}. {contact.first_name} {contact.last_name}")
    print(f"   📞 Phone: {contact.phone}")
    if contact.email:
        print(f"   📧 Email: {contact.email}")
    if contact.company:
        print(f"   🏢 Company: {contact.company}")
    print()

# The validated result can now be written to a CSV file or a database
print("💾 Ready to save to CSV/database:")
print("-" * 50)
import json
print(json.dumps(result.model_dump(), indent=2))

📄 Extracting contacts from text...
Original text:

From the business meeting notes:

Sarah Johnson from TechCorp reached out regarding the partnership. 
Her contact details are sarah.johnson@techcorp.com and 555-0123.

Also met with Michael Chen, mobile: (555) 0124. He's with DataSystems Inc.

Follow up with Jennifer Lopez at 555.0125, jlopez@example.com



✅ Extracted 3 contacts:

1. Sarah Johnson
   📞 Phone: 555-0123
   📧 Email: sarah.johnson@techcorp.com
   🏢 Company: TechCorp

2. Michael Chen
   📞 Phone: (555) 0124
   🏢 Company: DataSystems Inc.

3. Jennifer Lopez
   📞 Phone: 555.0125
   📧 Email: jlopez@example.com

💾 Ready to save to CSV/database:
--------------------------------------------------
{
  "contacts": [
    {
      "first_name": "Sarah",
      "last_name": "Johnson",
      "phone": "555-0123",
      "email": "sarah.johnson@techcorp.com",
      "company": "TechCorp"
    },
    {
      "first_name": "Michael",
      "last_name": "Chen",
      "phone": "(555) 0124",
     

### Creating LLM Chains

Chains allow us to combine prompts and LLMs into reusable components, which makes them the foundation of more complex AI applications. A chain sequences a series of calls, not just to an LLM, but also to other components such as prompt templates and output parsers.

You can think of it as a way to create a reusable and structured interaction with an LLM for a specific task.

At its core, a simple chain does the following:
1. Receives input variables.
2. Uses a PromptTemplate to format those variables into a complete prompt string.
3. Sends the formatted prompt to an LLM.
4. Returns the LLM's output.
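
The four steps above can be sketched without any library. This is a minimal illustration, not LangChain's implementation: `fake_llm` and `make_chain` are hypothetical names, and `fake_llm` only pretends to be a model so the example runs offline.

```python
# A library-free sketch of the four steps of a simple chain.
# `fake_llm` is a stand-in for a real model call (an assumption for illustration).

def fake_llm(prompt: str) -> str:
    """Pretend LLM: reports the prompt length instead of generating text."""
    return f"[model reply to a {len(prompt)}-character prompt]"

def make_chain(template: str, llm):
    """Return a function that formats the template and calls the LLM."""
    def chain(**variables) -> str:
        prompt = template.format(**variables)  # step 2: fill the template
        return llm(prompt)                     # steps 3 and 4: call the LLM, return its output
    return chain

summarize = make_chain("Summarize in one sentence: {text}", fake_llm)
reply = summarize(text="Chains combine prompts and LLMs.")  # step 1: input variables
print(reply)
```

Swapping `fake_llm` for a function that calls a real model would leave the rest of the sketch unchanged, which is the point of separating the template from the model call.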

#### Why use LLM Chains?
The main purpose of using chains is to create more complex applications by linking different components together in a sequence. Instead of writing repetitive code to handle prompts and LLM calls, you can encapsulate that logic into a chain.
This has several benefits:
- Modularity: Chains are self-contained and can be easily reused across your application.
- Composition: You can link multiple chains together to create more sophisticated workflows. For example, the output of one chain can be the input to another.
- Simplicity: They provide a high-level, easy-to-understand interface for working with LLMs.
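
To illustrate the composition benefit, here is a small sketch in plain Python where the output of one step becomes the input of the next. The two "chains" are hypothetical stand-ins that return fixed strings rather than calling a model.

```python
from functools import reduce

def compose(*steps):
    """Chain callables left to right: the output of one is the input of the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

# Stand-ins for two LLM-backed chains (hypothetical, no model is called).
def outline_chain(topic: str) -> str:
    return f"Outline for '{topic}': intro, example, summary"

def draft_chain(outline: str) -> str:
    return f"Draft based on -> {outline}"

write_article = compose(outline_chain, draft_chain)
print(write_article("tokenizers"))
```

Because each step only needs to accept the previous step's output, either stand-in could be replaced or reused in another pipeline without touching the other.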

In [17]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Create a prompt template for generating creative content
creative_template = """You are a creative writing assistant.

Topic: {topic}
Style: {style}
Length: {length}

Write a {style} piece about {topic} that is approximately {length} long.
Make it engaging and original.

Content:"""

creative_prompt = PromptTemplate(
    input_variables=["topic", "style", "length"],
    template=creative_template
)

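# Create a chain that combines the prompt and LLM
# (LLMChain is deprecated in recent LangChain releases; the same pipeline can be written as `creative_prompt | llm`)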
creative_chain = LLMChain(
    llm=llm,
    prompt=creative_prompt,
    verbose=True  # This will show us what's happening behind the scenes
)

# Use the chain by calling .run()
result = creative_chain.run(
    topic="artificial intelligence in daily life",
    style="short story",
    length="2-3 paragraphs"
)

print("📖 Generated Creative Content:")
print("=" * 50)
print(result)

> Entering new LLMChain chain...
Prompt after formatting:
You are a creative writing assistant.

Topic: artificial intelligence in daily life
Style: short story
Length: 2-3 paragraphs

Write a short story piece about artificial intelligence in daily life that is approximately 2-3 paragraphs long.
Make it engaging and original.

Content:

> Finished chain.
📖 Generated Creative Content:
The morning alarm didn't blare; it hummed a gentle, personalized melody, a composition crafted overnight by my home AI, "Aura," based on my sleep cycle data. As I stretched, the smart blinds silently parted, revealing a sky painted in soft pre-dawn hues. Aura had already brewed my coffee to the exact temperature and strength I preferred, and the news feed, curated to my interests, scrolled across the kitchen counter display, highlighting only the truly relevant stories. It was a symphony of seamless efficiency, where every need, anticipated and met before I even consciou

#### When to Use a Simple LLMChain
A standard LLMChain is your go-to for any task that can be accomplished with a single, stateless call to an LLM. Think of it as a "one-shot" operation.
Use Cases:
- Summarization: You provide a piece of text and ask the LLM to summarize it.
- Question-Answering (without external knowledge): Answering a question based only on the information you provide in the prompt.
- Text Transformation: Rephrasing a sentence, changing the tone of a paragraph (e.g., from formal to casual), or translating text.
- Simple Extraction: Pulling out specific pieces of information from a block of text, like a name, date, or company from an email.
- Brainstorming/Generation: Generating ideas, product names, marketing copy, or a simple piece of code based on a description.

The key characteristic is that the task doesn't require memory of past interactions or multiple logical steps.

#### When You Need Different, More Complex Chains
You need to move beyond a simple LLMChain when your task involves multiple steps, requires external data, or needs to make decisions.

Example 1: Question-Answering Over Your Own Documents
Use this when the LLM must answer questions based on specific data it wasn't trained on (e.g., a PDF, a database, or a website).
- Chain Type: RetrievalQA Chain
- Example: You have a 100-page technical manual for a product. You want to build a chatbot that can answer user questions about it.
1. The RetrievalQA chain first takes the user's question ("How do I reset the device?").
2. It searches your document for the most relevant chunks of text (the "retrieval" step).
3. It then feeds those relevant chunks, along with the original question, to the LLM to generate a final answer.

This grounds the answer in your specific data and reduces the risk that the LLM makes things up.
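
The retrieve-then-answer flow can be sketched as follows. This is a toy under stated assumptions: word overlap stands in for the embedding-based similarity search a real retrieval chain would typically use, the manual excerpts are invented, and the function names are hypothetical. The final LLM call is omitted; the sketch stops at the prompt that would be sent.

```python
import re

def tokenize(text: str) -> set:
    """Lowercase word set; a crude stand-in for embedding-based similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, chunks: list, k: int = 1) -> list:
    """Return the k chunks sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list) -> str:
    """Combine the retrieved chunks and the question into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

# Hypothetical manual excerpts, invented for illustration.
manual_chunks = [
    "To reset the device, hold the power button for ten seconds.",
    "The battery lasts about eight hours on a full charge.",
    "Clean the screen with a dry microfiber cloth.",
]

question = "How do I reset the device?"
top_chunks = retrieve(question, manual_chunks)
print(build_prompt(question, top_chunks))
```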

Example 2: Choosing a Path Based on Input
Use this when you have several chains (with different prompts) and want to choose dynamically which one to run based on the user's query.
- Chain Type: RouterChain
- Example: A customer service bot that can handle different types of queries.
1. If the user's input is about a "billing issue," the RouterChain sends it to the BillingChain.
2. If the input is about a "technical problem," it routes it to the TechnicalSupportChain.
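
The routing idea can be sketched in a few lines. Note the assumptions: a router chain would normally ask an LLM to pick the destination, whereas this sketch uses simple keyword rules so that it runs offline, and all three chains are hypothetical stand-ins.

```python
# Stand-ins for specialized chains (hypothetical; a real chain would call an LLM).
def billing_chain(query: str) -> str:
    return f"[billing] Handling: {query}"

def technical_support_chain(query: str) -> str:
    return f"[tech] Handling: {query}"

def default_chain(query: str) -> str:
    return f"[general] Handling: {query}"

# Keyword rules replace the LLM-based routing decision in this sketch.
ROUTES = [
    (("invoice", "billing", "refund", "charge"), billing_chain),
    (("error", "crash", "bug", "install"), technical_support_chain),
]

def route(query: str) -> str:
    """Send the query to the first chain whose keywords match, else to the default."""
    lowered = query.lower()
    for keywords, chain in ROUTES:
        if any(word in lowered for word in keywords):
            return chain(query)
    return default_chain(query)

print(route("I was charged twice on my last invoice"))
print(route("The app shows an error when I log in"))
```

A fallback such as `default_chain` is worth keeping in any router, since some queries will match none of the specialized paths.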

To start with something simpler, let's first build a conversation chain to enable actual conversations with our LLM.

### Adding Memory to Conversations

One of the most important features for AI agents is the ability to remember previous parts of a conversation. Let's implement conversation memory!

In [18]:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# Create memory to store conversation history
memory = ConversationBufferMemory()

# Create a conversation chain with memory
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# Let's have a conversation!
print("🗣️ Starting a conversation with memory:")
print("=" * 50)

# First message
response1 = conversation.predict(input="Hi! My name is Alex and I'm learning about AI agents.")
print("User: Hi! My name is Alex and I'm learning about AI agents.")
print(f"AI: {response1}\n")

# Second message - a follow-up that relies on the earlier context
response2 = conversation.predict(input="What are the key components I should focus on?")
print("User: What are the key components I should focus on?")
print(f"AI: {response2}\n")

# Third message - test whether it remembers the name
response3 = conversation.predict(input="Can you remind me what my name is?")
print("User: Can you remind me what my name is?")
print(f"AI: {response3}\n")

# Let's examine what's stored in memory
print("🧠 Memory Contents:")
print("=" * 30)
print(memory.buffer)

🗣️ Starting a conversation with memory:

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi! My name is Alex and I'm learning about AI agents.
AI:

> Finished chain.
User: Hi! My name is Alex and I'm learning about AI agents.
AI: Hello Alex! It's so wonderful to meet you! I'm delighted you're interested in AI agents. That's a topic I find absolutely fascinating, and I'm more than happy to chat with you about it.

You see, I'm a large language model, an AI myself, and I've been trained by Google. My "brain" is essentially a massive neural network, and I've processed an enormous amount of text and code. This allows me to understand and generate human-like text, translate languag
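
Conceptually, the memory used above works by storing each exchange and pasting the whole history into the next prompt, because the model itself is stateless. The following is a minimal sketch of that idea, not LangChain's implementation; `BufferMemory`, `build_prompt`, and `memory_sketch` are hypothetical names.

```python
class BufferMemory:
    """Keep every turn and render it as text for the next prompt."""

    def __init__(self):
        self.turns = []

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

def build_prompt(memory: BufferMemory, user_input: str) -> str:
    """Prepend the stored history so a stateless model can 'remember'."""
    return f"Current conversation:\n{memory.buffer}\nHuman: {user_input}\nAI:"

memory_sketch = BufferMemory()
memory_sketch.save("Hi! My name is Alex.", "Hello Alex!")
print(build_prompt(memory_sketch, "Can you remind me what my name is?"))
```

Since the full history is resent on every turn, the prompt grows with the conversation, which is the motivation for windowed or summarizing memory types.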

### Streaming Responses

For better user experience, especially with long responses, we can stream the response as it's being generated instead of waiting for the complete response.

In [19]:
import time

# Create a streaming LLM
streaming_llm = init_chat_model(
    "gemini-2.5-flash-lite",
    model_provider="google_genai",
    temperature=0.7,
    max_tokens=2000,
    streaming=True,
)

print("🌊 Streaming Response Example:")
print("=" * 40)
print("Question: Explain the concept of attention mechanism in transformers in detail.\n")
print("Response (streaming):")
print("-" * 20)

# Stream the response
for chunk in streaming_llm.stream("Explain the concept of attention mechanism in transformers in detail."):
    print(chunk.content, end='', flush=True)
    time.sleep(0.02)  # Small delay to make streaming visible

print("\n" + "-" * 20)
print("✅ Streaming complete!")


🌊 Streaming Response Example:
Question: Explain the concept of attention mechanism in transformers in detail.

Response (streaming):
--------------------
Let's dive deep into the concept of the **attention mechanism** in Transformers. It's the core innovation that allows these models to excel at sequential data processing, particularly in natural language processing (NLP).

## The Problem Attention Solves: Limitations of Traditional RNNs/LSTMs

Before attention, Recurrent Neural Networks (RNNs) and their variants like Long Short-Term Memory (LSTM) were the go-to for sequence modeling. They process information sequentially, maintaining a "hidden state" that summarizes past information. However, they had a few key limitations:

1.  **Information Bottleneck:** As sequences get longer, the hidden state becomes a bottleneck. It has to compress all the relevant information from the entire past sequence into a fixed-size vector. This leads to forgetting information from earlier parts of the s
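
The streaming pattern itself is just iteration over partial results. As a rough offline sketch, a Python generator can play the role of the model: `fake_stream` and `consume` are hypothetical helpers, and the fixed-size slicing is an assumption (real models emit token-sized chunks of varying length).

```python
from typing import Iterator

def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Yield the text in small pieces, like a model emitting partial output."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]

def consume(stream: Iterator[str]) -> str:
    """Print each chunk on arrival and return the full response."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)  # show partial output immediately
        parts.append(chunk)
    print()
    return "".join(parts)

full_text = consume(fake_stream("Attention lets each token look at every other token."))
```

Collecting the chunks while printing them is a common choice, because the complete response is usually still needed afterwards (for example, to store it in memory).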

#### Limits of LLM Chains - From Chains to Agents
Chains are powerful, but they have limitations:
- **Rigidity**: Chains follow a predetermined path. They execute steps A -> B -> C. They are not good at handling unexpected inputs or dynamically changing their course of action. This is the primary reason to use an Agent instead, which can make decisions on the fly.
- **Error Propagation**: In a long SequentialChain, an error or a poorly-formed output from an early step will negatively impact all subsequent steps. A small "hallucination" in step 1 can become a major factual error by step 5.
- **Complexity and Debugging**: Very long and complex chains can become difficult to manage, debug, and optimize. It can turn into "prompt engineering hell," where tweaking one prompt breaks another one down the line.
- **Latency and Cost**: Every step in a chain that involves an LLM is another API call. This increases the total time it takes to get a final answer and increases the cost, as you're using more tokens.

Example: Interacting with APIs or External Tools
Use this when the LLM needs to take action or get information from the outside world (e.g., check the weather, perform a calculation, or search the web).
- This is where you move from Chains to Agents. An Agent uses an LLM not just to generate text, but to decide which "tool" to use next. While not strictly a "chain," it's the logical next step.
- Example: A user asks, "What's the weather like in Paris right now, and can you write a short poem about it?"
1. The Agent's LLM decides it first needs to use the Weather API tool.
2. It calls the tool with "Paris" as the input and gets the current weather data.
3. It then uses that data as context for a second LLM call to generate the poem.
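
The weather example can be sketched as a tiny decide-act loop. Everything here is an assumption made for illustration: `fake_llm_decide` hard-codes the decisions a real LLM would make, `get_weather` returns canned (invented) data instead of calling an API, and the final "poem" is only a placeholder string.

```python
# Hypothetical tool with canned data; a real tool would call a weather API.
def get_weather(city: str) -> str:
    canned = {"Paris": "18 degrees and light rain"}
    return canned.get(city, "unknown")

TOOLS = {"get_weather": get_weather}

def fake_llm_decide(user_request: str, observations: list) -> dict:
    """Stand-in for the LLM's decision: pick a tool first, then answer."""
    if not observations:
        return {"action": "get_weather", "input": "Paris"}
    return {"action": "final", "input": f"A short poem about {observations[-1]} in Paris."}

def run_agent(user_request: str, max_steps: int = 3) -> str:
    """Loop: ask for a decision, run the chosen tool, feed the result back."""
    observations = []
    for _ in range(max_steps):
        decision = fake_llm_decide(user_request, observations)
        if decision["action"] == "final":
            return decision["input"]
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))
    return "Stopped: step limit reached."

answer = run_agent("What's the weather like in Paris right now, and can you write a short poem about it?")
print(answer)
```

The difference from a chain is that the sequence of steps is not fixed in code: the loop runs whatever the decision function selects, and the step limit guards against a model that never reaches a final answer.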

## How are LLMs used in AI Agents?

LLMs are the core intelligence of AI Agents, enabling them to comprehend and produce human language. They are responsible for interpreting user instructions, maintaining conversational context, formulating plans, and selecting appropriate tools. For now, it's essential to understand that the LLM serves as the agent's "brain," a concept we will explore in greater detail later in this Unit.

### Practical Exercise: Building a Simple AI Assistant using an LLMChain

Let's combine everything we've learned to build a simple AI assistant that can help with different tasks!

In [20]:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferWindowMemory

class SimpleAIAssistant:
    """A simple AI assistant that can help with various tasks."""
    
    def __init__(self, llm):
        self.llm = llm
        # Use window memory to keep only recent conversation history
        self.memory = ConversationBufferWindowMemory(k=5)  # Keep last 5 exchanges
        
        # Define different prompt templates for different tasks
        self.templates = {
            'general': """You are a helpful AI assistant. You are knowledgeable, friendly, and concise.
            
            Previous conversation:
            {history}
            
            Human: {input}
            Assistant:""",
            
            'explain': """You are an expert educator. Explain complex topics in simple, easy-to-understand terms.
            
            Previous conversation:
            {history}
            
            Topic to explain: {input}
            
            Explanation:""",
            
            'code': """You are a coding assistant. Help with programming questions and provide clean, well-commented code.
            
            Previous conversation:
            {history}
            
            Coding request: {input}
            
            Response:"""
        }
        
    def chat(self, message, task_type='general'):
        """Chat with the assistant."""
        template = self.templates.get(task_type, self.templates['general'])
        prompt = PromptTemplate(
            input_variables=['history', 'input'],
            template=template
        )
        
        chain = LLMChain(
            llm=self.llm,
            prompt=prompt,
            memory=self.memory
        )
        
        response = chain.predict(input=message)
        return response
    
    def explain(self, topic):
        """Ask the assistant to explain a topic."""
        return self.chat(topic, task_type='explain')
    
    def help_with_code(self, coding_request):
        """Ask for coding help."""
        return self.chat(coding_request, task_type='code')
    
    def print_memory(self):
        """Print the conversation history."""
        print("🧠 Conversation History:")
        print("=" * 30)
        print(self.memory.buffer)
        print("=" * 30)
    
    def clear_memory(self):
        """Clear the conversation memory."""
        self.memory.clear()
        print("🧠 Memory cleared!")

# Create our AI assistant
assistant = SimpleAIAssistant(llm)

print("🤖 AI Assistant Created!")
print("You can now use:")
print("- assistant.chat('your message')")
print("- assistant.explain('topic')")
print("- assistant.help_with_code('coding question')")
print("- assistant.clear_memory()")

🤖 AI Assistant Created!
You can now use:
- assistant.chat('your message')
- assistant.explain('topic')
- assistant.help_with_code('coding question')
- assistant.clear_memory()
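
The assistant keeps only the last five exchanges. A rough way to picture that windowing behavior is a fixed-length queue: once it is full, each new exchange pushes out the oldest one. The sketch below is an illustration under that assumption, not LangChain's code, and `WindowMemory` is a hypothetical class.

```python
from collections import deque

class WindowMemory:
    """Keep only the last k exchanges, discarding older ones automatically."""

    def __init__(self, k: int = 5):
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    @property
    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def clear(self) -> None:
        self.turns.clear()

window = WindowMemory(k=2)
for i in range(1, 4):
    window.save(f"question {i}", f"answer {i}")
print(window.buffer)  # only exchanges 2 and 3 remain
```

A window bounds prompt size and cost, at the price of forgetting anything older than `k` exchanges, such as a name mentioned early in a long conversation.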


In [21]:
assistant.chat('Hi please help me with my course on agentic artificial intelligence.')

"Hi there! I'd be happy to help you with your agentic AI course. What can I assist you with today? Do you have specific questions about concepts, algorithms, or perhaps some assignments?"

In [22]:
assistant.explain('I dont know how to explain this to my 5 year old.')

'Okay, imagine you have a super-smart toy robot, right?\n\nThis robot isn\'t just like a regular toy that only does what you tell it to do when you press a button. This robot is special because it can **think a little bit for itself** and **decide what to do** to achieve a goal.\n\nThink about it like this:\n\n*   **Your regular toy robot:** If you press the "walk forward" button, it walks forward. If you press "turn left," it turns left. It only does *exactly* what you tell it.\n\n*   **An "agentic" robot (or AI):** This robot might have a goal, like "clean up the toys." Instead of you telling it "pick up this block," then "put it in the box," then "pick up that car," then "put it in the box," the agentic robot can **figure out how to do it on its own.**\n\nIt might:\n\n1.  **See** the toys.\n2.  **Decide** which toy to pick up first (maybe the closest one).\n3.  **Move** to the toy.\n4.  **Pick up** the toy.\n5.  **Find** the toy box.\n6.  **Put** the toy in the box.\n7.  Then, it **

In [23]:
assistant.print_memory()

🧠 Conversation History:
Human: Hi please help me with my course on agentic artificial intelligence.
AI: Hi there! I'd be happy to help you with your agentic AI course. What can I assist you with today? Do you have specific questions about concepts, algorithms, or perhaps some assignments?
Human: I dont know how to explain this to my 5 year old.
AI: Okay, imagine you have a super-smart toy robot, right?

This robot isn't just like a regular toy that only does what you tell it to do when you press a button. This robot is special because it can **think a little bit for itself** and **decide what to do** to achieve a goal.

Think about it like this:

*   **Your regular toy robot:** If you press the "walk forward" button, it walks forward. If you press "turn left," it turns left. It only does *exactly* what you tell it.

*   **An "agentic" robot (or AI):** This robot might have a goal, like "clean up the toys." Instead of you telling it "pick up this block," then "put it in the box," then "

In [24]:
assistant.clear_memory()

🧠 Memory cleared!


In [25]:
assistant.print_memory()

🧠 Conversation History:

