# LangChain Workshop 3: Tools, Agents, and Vectorstores

Welcome back! Today we're diving into some of the most powerful features of LangChain:

1. **Tools** - Custom functions that extend what LLMs can do
2. **Agents** - LLMs that can decide which tools to use and when
3. **Vectorstores** - Semantic search for finding relevant information

Let's get started!

In [11]:
# Install required packages
!pip install langchain langchain-openai langchain-community faiss-cpu python-dotenv langgraph requests beautifulsoup4 wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (pyproject.toml) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11757 sha256=78d4746bca55cd5c7af3975407daa6c6401f621fa6f63809d4d268ae13508efe
  Stored in directory: /home/mdvmlhtr/.cache/pip/wheels/63/47/7c/a9688349aa74d228ce0a9023229c6c0ac52ca2a40fe87679b8
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


## Setup: Loading Our API Keys Securely 🔐

First, let's load our environment variables.

In real projects, **NEVER** put API keys directly in your code! For this Kaggle workshop only, you can paste your key into the `KAGGLE_BACKUP` variable below. When running locally, use a `.env` file with the following content instead:
```.env
OPENAI_API_KEY="your-key"
```

In [25]:
# Only if running this on Kaggle
KAGGLE_BACKUP = "sk-..."  # Replace with your OpenAI key for Kaggle only

In [28]:
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.tools import tool
from langchain_community.vectorstores import FAISS
from langgraph.prebuilt import create_react_agent
from dotenv import load_dotenv

load_dotenv()

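# Load API key, falling back to the Kaggle backup when the environment variable is unset or empty
api_key = os.getenv("OPENAI_API_KEY") or KAGGLE_BACKUP
if api_key and api_key != "sk-...":  # ignore the unedited placeholder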
    print(f"✅ API key loaded successfully: {api_key[:12]}...")
else:
    print("❌ No API key found. Make sure you have a .env file with OPENAI_API_KEY=")

llm = ChatOpenAI(model="gpt-5-nano", temperature=0.3, api_key=api_key)

✅ API key loaded successfully: sk-proj-Cdxa...


## 1. Tools: Giving LLMs Superpowers

LLMs are great at text, but they can't reliably do math or access external data on their own. That's where **tools** come in!

A tool is just a Python function wrapped in the `@tool` decorator. Its name, type hints, and docstring tell the LLM:
- What the function does
- What inputs it needs
- When to use it

Let's create a simple calculator tool:

In [30]:
@tool
def calculator(expression: str) -> str:
    """Evaluates a mathematical expression with python syntax. Use this for any math calculations.
    
    Args:
        expression: A mathematical expression with python operators (e.g., "2 * 2" or "10 ** 2")
    """
    try:
        # WARNING: this is REALLY dangerous in production code! Do not use eval() with untrusted input!
        result = eval(expression)
        return f"The answer is {result}"
    except Exception as e:
        return f"Error calculating with input '{expression}': {str(e)}"

# Test it out!
print(calculator.invoke("15 * 7"))
print(calculator.invoke("15 x 2"))
print(f"Tool name: {calculator.name}")
print(f"Tool description: {calculator.description}")

The answer is 105
Error calculating with input '15 x 2': invalid syntax (<string>, line 1)
Tool name: calculator
Tool description: Evaluates a mathematical expression with python syntax. Use this for any math calculations.

    Args:
        expression: A mathematical expression with python operators (e.g., "2 * 2" or "10 ** 2")
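
The `eval()` warning in the calculator is worth taking seriously. One possible way to avoid `eval()`, sketched here under the assumption that only numeric literals and basic arithmetic operators are needed, is to parse the expression with Python's `ast` module and walk the resulting tree. The name `safe_eval` and the operator whitelist are illustrative choices, not part of the workshop code.

```python
import ast
import operator

# Whitelist of permitted operators (an illustrative choice; extend as needed)
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate an arithmetic expression without calling eval() (hypothetical helper)."""

    def _eval(node):
        # Plain numbers are allowed; booleans and strings are not
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, function calls, attribute access, etc. are all rejected
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_eval("15 * 7"))
print(safe_eval("10 ** 2"))
```

Anything outside the whitelist, such as a function call, raises `ValueError` instead of running. This sketch does not limit the size of the numbers involved, so a very large exponent could still take a long time to compute.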


In [31]:
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

@tool
def get_wikipedia_summary(topic: str) -> str:
    """Fetches a summary from Wikipedia for a given topic. Use this to get factual information about people, places, events, or concepts.
    
    Args:
        topic: The topic to search for on Wikipedia (e.g., "Albert Einstein", "Python programming")
    """
    try:
        wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
        summary = wikipedia.run(topic)
        return summary
    except Exception as e:
        return f"Error fetching Wikipedia data: {str(e)}"

# Test it!
print(get_wikipedia_summary.invoke("Python programming"))

Page: Python (programming language)
Summary: Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically type-checked and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language. Python 3.0, released in 2008, was a major revision and not completely backward-compatible with earlier versions. Recent versions, such as Python 3.13, 3.12 and older (and 3.14), have added capabilities and keywords for typing (and more; e.g. increasing speed); helping with (optional) static typing. Currently only versions in the 3.x series are supported.
Python consistently ranks as one of the most popular programming languages, and it has gained widespread use in the machine learning community. 

## 2. Agents: LLMs That Make Decisions

An **agent** is an LLM that can:
1. **Reason** about what to do
2. **Act** by calling tools
3. **Observe** the results
4. **Repeat** until it solves the problem

This is called the **ReAct** pattern (Reason + Act). 

What makes agents powerful is combining multiple tools - like using Wikipedia for facts AND a calculator for math. Let's see this in action!
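
To make the loop concrete, here is a rough, hand-rolled sketch of the reason/act/observe cycle. It is not how LangGraph implements agents: `fake_model` is a scripted stand-in for an LLM, and the `TOOLS` registry, the tool names, and the canned lookup text are all made up for illustration.

```python
from typing import Callable


def lookup(topic: str) -> str:
    """Canned lookup tool (hypothetical); a real tool would call an API."""
    return "Tokyo has a population of over 14 million."


def multiply(expression: str) -> str:
    """Multiplies two numbers written as 'a * b' (hypothetical tool)."""
    left, right = expression.split("*")
    return str(float(left) * float(right))


# Hypothetical tool registry: plain functions keyed by name
TOOLS: dict[str, Callable[[str], str]] = {"lookup": lookup, "multiply": multiply}


def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM: picks the next step from how many observations exist."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if len(observations) == 0:
        return {"action": "lookup", "input": "Tokyo"}
    if len(observations) == 1:
        return {"action": "multiply", "input": "14000000 * 0.05"}
    latest = observations[-1][len("Observation: "):]
    return {"final": f"Roughly {latest} tourists."}


def run_agent(question: str, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = fake_model(history)            # Reason: decide the next step
        if "final" in decision:
            return decision["final"]              # Done: the model produced an answer
        tool = TOOLS[decision["action"]]          # Act: call the chosen tool
        result = tool(decision["input"])
        history.append(f"Observation: {result}")  # Observe, then loop again
    return "Stopped: step limit reached."


print(run_agent("How many tourists are in Tokyo on an average day?"))
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model that never produces a final answer would otherwise run forever.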

In [32]:
# Create our agent with multiple tools
tools = [calculator, get_wikipedia_summary]

# LangGraph's create_react_agent makes this super easy!
agent = create_react_agent(llm, tools)

print("Agent created with tools:", [tool.name for tool in tools])

Agent created with tools: ['calculator', 'get_wikipedia_summary']


In [33]:
# Now let's ask a question that requires BOTH tools!
question = "What is the population and GDP of Tokyo according to Wikipedia? Assume 5 percent of people in Tokyo are tourists on the average day. What is the GDP per tourist of Tokyo?"

# Run the agent and watch it work
print(f"Question: {question}\n")
print("=" * 80)

messages = [{"role": "user", "content": question}]
result = agent.invoke({"messages": messages})

# The agent returns all messages - let's see its reasoning
for msg in result["messages"]:
    if hasattr(msg, "content") and msg.content:
        role = "User" if msg.type == "human" else "Agent" if msg.type == "ai" else "Tool"
        print(f"\n{role}: {msg.content}")

Question: What is the population and GDP of Tokyo according to Wikipedia? Assume 5 percent of people in Tokyo are tourists on the average day. What is the GDP per tourist of Tokyo?


User: What is the population and GDP of Tokyo according to Wikipedia? Assume 5 percent of people in Tokyo are tourists on the average day. What is the GDP per tourist of Tokyo?

Tool: Page: Tokyo
Summary: Tokyo, officially the Tokyo Metropolis, is the capital and most populous city in Japan. With a population of over 14 million in the city proper in 2023, it is one of the most populous urban areas in the world. The Greater Tokyo Area, which includes Tokyo and parts of six neighboring prefectures, is the most populous metropolitan area in the world, with 41 million residents as of 2024.
Lying at the head of Tokyo Bay, Tokyo is part of the Kantō region, on the central coast of Honshu, Japan's largest island. It is Japan's economic center and the seat of the Japanese government and the Emperor of Japan. The T

### What Just Happened? 🤯

In a typical run, the agent:
1. **Understood** that the question needs several steps
2. **Called** `get_wikipedia_summary` to look up Tokyo's population and GDP
3. **Extracted** the relevant numbers from the Wikipedia summary
4. **Called** `calculator` to estimate the number of tourists (population × 0.05) and divide the GDP by that number
5. **Returned** a final answer with context

This is the power of agents - they can chain together multiple tools to solve complex problems!

## 🎯 Exercise 1: Build Your Own Multi-Tool Agent

Create an agent that can answer geography and math questions! Here's what to build:

1. **Create a tool** called `get_country_info` that fetches information about a country from Wikipedia
2. **Use the calculator tool** we already have
3. **Create an agent** with both tools using `create_react_agent`
4. **Ask it this question**: "What is the area of Canada in square kilometers? If you drove across it at 100 km/h non-stop, how many hours would that take?"

Hints:
- Copy the pattern from `get_wikipedia_summary` but search for country names
- The agent will need to extract the area from Wikipedia, then calculate drive time
- Remember: distance = speed × time, so time = distance / speed (an area isn't really a distance, so for this exercise just treat the number as kilometers)
- Use `create_react_agent(llm, [your_tools])` to create the agent

In [None]:
# Your code here!



## 3. Vectorstores and Embeddings: Finding Relevant Information

Imagine you have a huge library of documents. How do you find the most relevant ones for a question?

Traditional search uses keywords, but **semantic search** understands meaning. Here's how:

### What are Embeddings?

**Embeddings** turn text into lists of numbers (vectors) that capture meaning:
- Similar texts → similar vectors
- "dog" and "puppy" → close together
- "dog" and "spaceship" → far apart

Think of it like a map where similar concepts are near each other!
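
As a rough illustration of "close together" versus "far apart", the sketch below compares vectors with cosine similarity, one common way to measure how alike two embeddings are. The three-dimensional vectors are made up for this example; real embedding models return far longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" (not produced by any real model)
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "spaceship": [0.1, 0.05, 0.95],
}

print("dog vs puppy:", cosine_similarity(toy_embeddings["dog"], toy_embeddings["puppy"]))
print("dog vs spaceship:", cosine_similarity(toy_embeddings["dog"], toy_embeddings["spaceship"]))
```

A value near 1 means the vectors point in almost the same direction (similar meaning), while a value near 0 means they are largely unrelated.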

### What are Vectorstores?

A **vectorstore** is a database that:
1. Stores text alongside its embedding
2. Can find similar texts super fast
3. Returns the most relevant documents for a query

We're using **FAISS** (Facebook AI Similarity Search) as our vectorstore.
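
To show the idea without any dependencies, here is a minimal brute-force sketch of what a vectorstore does. `ToyVectorStore` is a hypothetical class written for this example, and the vectors are made up; FAISS itself relies on optimized index structures rather than comparing the query against every item in Python.

```python
import math


class ToyVectorStore:
    """Brute-force in-memory store (illustrative only, not the FAISS API)."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        """Store a text alongside its (precomputed) embedding."""
        self._items.append((text, vector))

    def similarity_search(self, query_vector, k=2):
        """Return the k texts whose vectors are closest to the query."""
        # Rank every stored item by Euclidean distance (smaller = closer)
        ranked = sorted(
            self._items,
            key=lambda item: math.dist(item[1], query_vector),
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("Dogs love to play fetch.", [0.9, 0.8, 0.1])
store.add("Puppies need lots of sleep.", [0.85, 0.9, 0.15])
store.add("The rocket reached orbit.", [0.1, 0.05, 0.95])

# A made-up query vector that sits near the two dog-related entries
print(store.similarity_search([0.88, 0.85, 0.12], k=2))
```

In a real pipeline the query vector would come from the same embedding model that produced the stored vectors; mixing models makes the distances meaningless.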

### Loading a FAISS Vectorstore

We've prepped a FAISS vectorstore with some fun data from Kaggle. Let's load it:

In [None]:
# Initialize embeddings (same model used to create the vectorstore)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Load the pre-built vectorstore
vectorstore = FAISS.load_local(
    "faiss_index",  # Path to your vectorstore folder
    embeddings,
    allow_dangerous_deserialization=True  # Loading unpickles stored data, which can run arbitrary code; only use with trusted files!
)

print("Vectorstore loaded successfully!")

### Similarity Search

Now we can ask questions and find relevant documents:

In [None]:
# Find the 3 most relevant documents
query = "Tell me something funny"
docs = vectorstore.similarity_search(query, k=3)

for i, doc in enumerate(docs, 1):
    print(f"\n--- Document {i} ---")
    print(doc.page_content)
    print(f"Metadata: {doc.metadata}")

### RAG: Retrieval-Augmented Generation

**RAG** combines vectorstores with LLMs:
1. **Retrieve** relevant documents from the vectorstore
2. **Augment** the prompt with those documents
3. **Generate** an answer using the LLM

This lets the LLM answer questions about data it wasn't trained on!
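
The three steps can be sketched in plain Python before bringing in any library. Everything here is a stand-in: `retrieve` ranks documents by shared words instead of embeddings, and `generate` only pretends to call a model. The function names and the prompt wording are assumptions made for this example.

```python
def retrieve(question, documents, k=2):
    """Hypothetical retriever: ranks documents by words shared with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(question, context_docs):
    """Build a prompt that places the retrieved documents ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def generate(prompt):
    """Stand-in for an LLM call; it only reports the prompt size."""
    return f"(model response to a {len(prompt)}-character prompt)"


documents = [
    "the workshop covers tools and agents",
    "faiss stores embeddings for fast search",
    "bananas are yellow",
]
question = "how does faiss search embeddings"

context_docs = retrieve(question, documents)   # 1. Retrieve
prompt = augment(question, context_docs)       # 2. Augment
print(prompt)
print(generate(prompt))                        # 3. Generate
```

Pasting all retrieved documents into a single prompt like this is the simplest strategy; it works as long as the combined text fits in the model's context window.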

In [None]:
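# Note: RetrievalQA is a legacy (deprecated) chain; depending on your installed
# LangChain version, this import may emit a warning or need a different package.
from langchain.chains import RetrievalQA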

# Create a RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" all docs into the prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# Ask a question!
question = "What's the funniest thing in the dataset?"
result = qa_chain.invoke({"query": question})

print("Answer:", result["result"])
print("\nSources used:")
for doc in result["source_documents"]:
    print(f"- {doc.page_content[:100]}...")  # First 100 chars

## 🎯 Exercise 2: Experiment with Vectorstores

**Part A:** Try different queries with `similarity_search()` and see what documents get returned. Notice how it finds semantically similar content, not just keyword matches.

**Part B:** Modify the RAG chain to retrieve 5 documents instead of 3, then ask it a complex question that requires synthesizing information from multiple sources.

**Bonus Challenge:** Use `similarity_search_with_score()` to see how close each match is to your query. With FAISS's default distance metric, the score is a distance, so lower scores = better matches.

In [None]:
# Part A: Your queries here



In [None]:
# Part B: Modified RAG chain



In [None]:
# Bonus: Similarity search with scores



## Wrap Up

Today you learned:

- **Tools** let LLMs call Python functions to extend their capabilities
- **Agents** use the ReAct pattern to reason and decide which tools to use
- **Embeddings** turn text into vectors that capture semantic meaning
- **Vectorstores** enable fast similarity search over large datasets
- **RAG** combines retrieval with generation for answering questions about custom data

These are the building blocks for powerful AI applications like chatbots, research assistants, and more! We'll get started on that with a project next time.