# Wikipedia Agent

This notebook creates a Wikipedia agent that can:
1. Search Wikipedia for relevant pages using the Wikipedia API
2. Retrieve and analyze page content
3. Answer questions by searching first, then reading relevant pages

Based on the week 2 Pydantic AI examples.


## Setup and Imports


In [1]:
import requests
from pydantic_ai import Agent
from typing import List, Dict, Any
import urllib.parse


## Define Tools


In [2]:
def search_wikipedia(query: str) -> List[Dict[str, Any]]:
    """
    Search Wikipedia for pages matching the query.
    
    Args:
        query: Search term (URL-encoded automatically by requests)
        
    Returns:
        List of search results, each containing:
        - title: Page title
        - snippet: Text snippet from the page (may contain HTML highlight markup)
        - pageid: Wikipedia page ID
        - size: Page size in bytes
        - wordcount: Number of words on the page
        On failure, a single-item list with an "error" key.
    """
    try:
        # Wikipedia API search endpoint; passing the query via `params`
        # lets requests percent-encode spaces and special characters
        url = "https://en.wikipedia.org/w/api.php"
        params = {
            "action": "query",
            "format": "json",
            "list": "search",
            "srsearch": query,
        }
        
        # Add User-Agent header (best practice for Wikipedia API)
        headers = {
            'User-Agent': 'WikipediaAgent/1.0 (Educational Agent)'
        }
        
        response = requests.get(url, params=params, headers=headers, timeout=10)
        response.raise_for_status()
        
        data = response.json()
        search_results = data.get('query', {}).get('search', [])
        
        # Format results
        results = []
        for item in search_results:
            results.append({
                "title": item.get('title', ''),
                "snippet": item.get('snippet', ''),
                "pageid": item.get('pageid', ''),
                "size": item.get('size', 0),
                "wordcount": item.get('wordcount', 0)
            })
        
        return results
    except Exception as e:
        return [{"error": f"Error searching Wikipedia: {str(e)}"}]
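
Search terms can contain characters such as `&`, `#`, or `+` that have special meaning in a URL, so they need percent-encoding and not just space handling. As a minimal sketch (the helper name `build_search_url` is hypothetical and not part of this notebook), the standard library's `urllib.parse.urlencode` shows what a safely encoded search URL looks like:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_URL = "https://en.wikipedia.org/w/api.php"

def build_search_url(query: str) -> str:
    """Hypothetical helper: build a search URL with a percent-encoded query."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": query,
    }
    # urlencode percent-encodes reserved characters and turns spaces into "+"
    return f"{API_URL}?{urlencode(params)}"

url = build_search_url("AT&T C++ history")
print(url)

# Decoding the query string recovers the original search term unchanged
print(parse_qs(urlsplit(url).query)["srsearch"])
```

The round trip through `parse_qs` is a quick way to check that the server would receive the search term exactly as typed.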


In [3]:
def get_wikipedia_page(title: str) -> str:
    """
    Get the raw content of a Wikipedia page.
    
    Args:
        title: Wikipedia page title (exact match; spaces are converted to underscores
            and other characters are URL-encoded by requests)
        
    Returns:
        Raw page content in wikitext format, or error message if page not found
    """
    try:
        # Wikipedia expects underscores for spaces in page titles
        # Raw content endpoint: https://en.wikipedia.org/w/index.php?title=PAGE_TITLE&action=raw
        # Passing values via `params` lets requests percent-encode special characters
        url = "https://en.wikipedia.org/w/index.php"
        params = {
            "title": title.replace(" ", "_"),
            "action": "raw",
        }
        
        # Add User-Agent header (best practice for Wikipedia API)
        headers = {
            'User-Agent': 'WikipediaAgent/1.0 (Educational Agent)'
        }
        
        response = requests.get(url, params=params, headers=headers, timeout=10)
        response.raise_for_status()
        
        return response.text
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 404:
            return f"Error: Wikipedia page '{title}' not found. Please verify the page title from search results."
        return f"Error fetching page '{title}': HTTP {e.response.status_code}"
    except Exception as e:
        return f"Error fetching page '{title}': {str(e)}"


## Test Tools Individually


In [4]:
# Test the search tool
test_search = search_wikipedia("capybara")
print("Search Results:")
print("=" * 60)
for i, result in enumerate(test_search[:3], 1):
    print(f"\n{i}. {result.get('title', 'N/A')}")
    print(f"   Snippet: {result.get('snippet', 'N/A')[:100]}...")
    print(f"   Page ID: {result.get('pageid', 'N/A')}")


Search Results:

1. Capybara
   Snippet: The <span class="searchmatch">capybara</span> or greater <span class="searchmatch">capybara</span> (...
   Page ID: 6776

2. Capybara (disambiguation)
   Snippet: Look up <span class="searchmatch">capybara</span> or <span class="searchmatch">Capybara</span> in Wi...
   Page ID: 69085306

3. Lesser capybara
   Snippet: The lesser <span class="searchmatch">capybara</span> (Hydrochoerus isthmius) is a large semi-aquatic...
   Page ID: 23188846
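
The snippets above include `<span class="searchmatch">` highlight markup, which adds noise when they are printed or passed to a model. A minimal sketch of stripping it, assuming a regular expression is good enough for this simple markup (the helper name `clean_snippet` is hypothetical, and the entity decoding is a precaution in case snippets also carry HTML entities):

```python
import html
import re

def clean_snippet(snippet: str) -> str:
    """Hypothetical helper: drop HTML tags and decode entities in a search snippet."""
    # Remove anything that looks like an HTML tag, e.g. <span class="searchmatch">
    without_tags = re.sub(r"<[^>]+>", "", snippet)
    # Decode entities such as &quot; or &amp; into plain characters
    return html.unescape(without_tags).strip()

raw = (
    'The <span class="searchmatch">capybara</span> or greater '
    '<span class="searchmatch">capybara</span>'
)
print(clean_snippet(raw))
```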


In [5]:
# Test the get_page tool with a known page
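# Skip if the search failed and returned an error entry instead of results
if test_search and 'title' in test_search[0]: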
    test_title = test_search[0]['title']
    print(f"\nFetching page: {test_title}")
    print("=" * 60)
    page_content = get_wikipedia_page(test_title)
    print(f"Content length: {len(page_content)} characters")
    print("\nFirst 500 characters:")
    print("-" * 60)
    print(page_content[:500])
    print("...")



Fetching page: Capybara
Content length: 36016 characters

First 500 characters:
------------------------------------------------------------
{{Short description|Largest species of rodents}}
{{Other uses}}
{{Good article}}
{{pp|small=yes}}
{{Use dmy dates|date=July 2022}}
{{Speciesbox
| status            = LC
| status_system     = IUCN3.1
| status_ref        = <ref name="iucn status 19 November 2021">{{cite iucn |author=Reid, F. |date=2016 |title=''Hydrochoerus hydrochaeris'' |volume=2016 |article-number=e.T10300A22190005 |doi=10.2305/IUCN.UK.2016-2.RLTS.T10300A22190005.en |access-date=19 November 2021}}</ref>
| image             = Hy
...
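
The raw wikitext for this one page is about 36,000 characters, and the tool returns all of it to the model. If that turns out to be too much context, one option is to cap the text before returning it. The following is a sketch only: the helper name `truncate_wikitext` and the default budget of 8000 characters are assumptions, not values from this notebook.

```python
def truncate_wikitext(text: str, max_chars: int = 8000) -> str:
    """Hypothetical helper: keep at most max_chars, cutting at a line break."""
    if len(text) <= max_chars:
        return text
    # Prefer to cut at the last newline inside the budget so no line is split
    cut = text.rfind("\n", 0, max_chars)
    if cut <= 0:
        cut = max_chars  # no usable line break found; fall back to a hard cut
    return text[:cut] + "\n[... truncated ...]"

sample = "line one\nline two\nline three"
print(truncate_wikitext(sample, max_chars=12))
```

A cut at a fixed offset can drop the section that answers the question, so this trades completeness for a smaller prompt.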


## Create the Agent


In [6]:
wikipedia_agent_instructions = """
You are a helpful Wikipedia research assistant.

Your workflow when answering questions:
1. FIRST, use search_wikipedia(query) to find relevant Wikipedia pages
2. Review the search results to identify the most relevant pages
3. Use get_wikipedia_page(title) to retrieve the content of the most relevant pages
4. Analyze the page content to answer the user's question
5. Cite the Wikipedia pages you used in your answer

IMPORTANT GUIDELINES:
- Always start with a search - never try to answer without searching first
- Use specific search terms that match what the user is asking about
- Read multiple relevant pages if needed to provide a comprehensive answer
- When you get page content, look for specific information that answers the question
- Cite your sources by mentioning the Wikipedia page titles
- If search returns no results, try alternative search terms
- If a page is not found, try variations of the title or search again
"""

wikipedia_agent = Agent(
    name='wikipedia_agent',
    instructions=wikipedia_agent_instructions,
    tools=[search_wikipedia, get_wikipedia_page],
    model='openai:gpt-4o-mini'
)
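
With the agent, the model decides at run time which tools to call and in what order. The search-then-read workflow described in the instructions can be illustrated without an LLM as plain function composition. This is a sketch under stated assumptions: `search_then_read`, `fake_search`, and `fake_fetch` are hypothetical names, and the stand-in tools exist only so the example runs without network access.

```python
from typing import Any, Callable, Dict, List

def search_then_read(
    query: str,
    search: Callable[[str], List[Dict[str, Any]]],
    fetch: Callable[[str], str],
    max_pages: int = 2,
) -> Dict[str, str]:
    """Hypothetical helper: run a search, then fetch the top pages by title."""
    pages: Dict[str, str] = {}
    for hit in search(query)[:max_pages]:
        title = hit.get("title")
        if not title:  # skip error entries or malformed results
            continue
        pages[title] = fetch(title)
    return pages

# Stand-in tools so the sketch runs offline
def fake_search(query: str) -> List[Dict[str, Any]]:
    return [{"title": "Capybara"}, {"title": "Lesser capybara"}, {"title": "Rodent"}]

def fake_fetch(title: str) -> str:
    return f"wikitext for {title}"

print(search_then_read("capybara", fake_search, fake_fetch))
```

In this notebook, `search_wikipedia` and `get_wikipedia_page` could be passed in place of the stand-ins, since they have the same call shapes.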


## Test the Agent


In [7]:
# Test the agent with a question
question = "What is a capybara and where do they live?"

print(f"Question: {question}\n")
print("=" * 80)
print("AGENT PROCESSING...")
print("=" * 80)

result = await wikipedia_agent.run(user_prompt=question)

print("\n" + "=" * 80)
print("AGENT RESPONSE:")
print("=" * 80)
print(result.output)
print("=" * 80)


Question: What is a capybara and where do they live?

AGENT PROCESSING...

AGENT RESPONSE:
The **capybara** (*Hydrochoerus hydrochaeris*) is the largest living rodent in the world, native to South America. It is part of the genus *Hydrochoerus* and is closely related to guinea pigs and rock cavies. The name "capybara" comes from the Tupi language, meaning "one who eats slender leaves."

Capybaras primarily inhabit **savannas** and **dense forests**, often located near bodies of water such as lakes, rivers, swamps, and marshes. They thrive in environments that provide ample aquatic vegetation for their diet, which consists mainly of grasses and aquatic plants. These animals are social creatures and typically form groups of 10 to 20, although larger groups of up to 100 can occur, especially around reliable water sources during dry seasons. In terms of geographic distribution, capybaras can be found throughout almost all of South America, with the notable exception of Chile. They are also

## Try Your Own Questions

Test the agent with different questions:


In [None]:
# Try your own question here
your_question = "What is the history of artificial intelligence?"

print(f"Question: {your_question}\n")
print("=" * 80)

result = await wikipedia_agent.run(user_prompt=your_question)

print("\n" + "=" * 80)
print("AGENT RESPONSE:")
print("=" * 80)
print(result.output)
print("=" * 80)


Question: What is the history of artificial intelligence?


AGENT RESPONSE:
The history of artificial intelligence (AI) is a rich and complex narrative that spans several centuries, beginning with early myths about intelligent automata and progressing through major technological and conceptual breakthroughs.

### Antiquity to the Early 20th Century
The concept of artificial beings with intelligence can be traced back to ancient myths, such as the Greek automaton Talos. Interest in logic and reasoning by figures like Aristotle laid the groundwork for formal reasoning. The invention of the programmable digital computer in the 1940s marked a significant technological turning point, enabling discussions about building machines capable of thought.

### Birth of AI (1950s)
The field of AI research was officially established at the Dartmouth workshop in 1956, where pioneers like John McCarthy and Marvin Minsky aimed to explore whether machines could exhibit intelligent behavior. The optimism 