# Lesson 3: Chatbot Example

In this lesson, you will familiarize yourself with the chatbot example you will work on throughout this course. The example includes the tool definitions, the tool execution logic, and the chatbot code. Make sure to interact with the chatbot at the end of this notebook.

## Import Libraries

In [1]:
import arxiv
import json
import os
from typing import List
from dotenv import load_dotenv
import anthropic

## Tool Functions

In [2]:
PAPER_DIR = "papers"

The first tool searches for relevant arXiv papers based on a topic and stores each paper's info (title, authors, summary, PDF URL, and publication date) in a JSON file. The JSON files are organized by topic in the `papers` directory. The tool does not download the papers themselves.

In [3]:
def search_papers(topic: str, max_results: int = 5) -> List[str]:
    """
    Search for papers on arXiv based on a topic and store their information.
    
    Args:
        topic: The topic to search for
        max_results: Maximum number of results to retrieve (default: 5)
        
    Returns:
        List of paper IDs found in the search
    """
    
    # Use arxiv to find the papers 
    client = arxiv.Client()

    # Search for the most relevant articles matching the queried topic
    search = arxiv.Search(
        query = topic,
        max_results = max_results,
        sort_by = arxiv.SortCriterion.Relevance
    )

    papers = client.results(search)
    
    # Create directory for this topic
    path = os.path.join(PAPER_DIR, topic.lower().replace(" ", "_"))
    os.makedirs(path, exist_ok=True)
    
    file_path = os.path.join(path, "papers_info.json")

    # Try to load existing papers info
    try:
        with open(file_path, "r") as json_file:
            papers_info = json.load(json_file)
    except (FileNotFoundError, json.JSONDecodeError):
        papers_info = {}

    # Process each paper and add to papers_info  
    paper_ids = []
    for paper in papers:
        paper_ids.append(paper.get_short_id())
        paper_info = {
            'title': paper.title,
            'authors': [author.name for author in paper.authors],
            'summary': paper.summary,
            'pdf_url': paper.pdf_url,
            'published': str(paper.published.date())
        }
        papers_info[paper.get_short_id()] = paper_info
    
    # Save updated papers_info to json file
    with open(file_path, "w") as json_file:
        json.dump(papers_info, json_file, indent=2)
    
    print(f"Results are saved in: {file_path}")
    
    return paper_ids

In [4]:
search_papers("computers")

Results are saved in: papers/computers/papers_info.json


['1310.7911v2',
 'math/9711204v1',
 '2208.00733v1',
 '2504.07020v1',
 '2403.03925v1']

In [5]:
search_papers("intelligence")

Results are saved in: papers/intelligence/papers_info.json


['2304.02924v1',
 '1712.06440v1',
 '1912.09571v1',
 '2403.06591v1',
 '2409.14496v1']

In [6]:
search_papers("mammals")

Results are saved in: papers/mammals/papers_info.json


['2304.09238v1', '1705.02727v1', '0808.4014v2', '2210.02183v1', '2502.18214v1']
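The on-disk layout produced by `search_papers` can be sketched as follows. This is a minimal illustration rather than part of the lesson code: `topic_to_file_path` and `load_saved_ids` are hypothetical helper names, and the path rule is copied from the function above (lowercase the topic, replace spaces with underscores).

```python
import json
import os

PAPER_DIR = "papers"  # same constant as in the lesson


def topic_to_file_path(topic: str, paper_dir: str = PAPER_DIR) -> str:
    """Hypothetical helper: mirror the path rule used inside search_papers."""
    topic_dir = topic.lower().replace(" ", "_")
    return os.path.join(paper_dir, topic_dir, "papers_info.json")


def load_saved_ids(topic: str, paper_dir: str = PAPER_DIR) -> list:
    """Hypothetical helper: return the paper IDs already saved for a topic."""
    try:
        with open(topic_to_file_path(topic, paper_dir), "r") as json_file:
            return list(json.load(json_file).keys())
    except (FileNotFoundError, json.JSONDecodeError):
        # Nothing usable has been saved for this topic yet
        return []
```

Because `search_papers` loads the existing file before writing, repeated searches on the same topic add to the saved entries instead of replacing them, so `load_saved_ids` may return more IDs than a single search did.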

The second tool looks for information about a specific paper across all topic directories inside the `papers` directory.

In [7]:
def extract_info(paper_id: str) -> str:
    """
    Search for information about a specific paper across all topic directories.
    
    Args:
        paper_id: The ID of the paper to look for
        
    Returns:
        JSON string with paper information if found, error message if not found
    """
 
    for item in os.listdir(PAPER_DIR):
        item_path = os.path.join(PAPER_DIR, item)
        if os.path.isdir(item_path):
            file_path = os.path.join(item_path, "papers_info.json")
            if os.path.isfile(file_path):
                try:
                    with open(file_path, "r") as json_file:
                        papers_info = json.load(json_file)
                        if paper_id in papers_info:
                            return json.dumps(papers_info[paper_id], indent=2)
                except (FileNotFoundError, json.JSONDecodeError) as e:
                    print(f"Error reading {file_path}: {str(e)}")
                    continue
    
    return f"There's no saved information related to paper {paper_id}."

In [8]:
extract_info('1310.7911v2')

'{\n  "title": "Compact manifolds with computable boundaries",\n  "authors": [\n    "Zvonko Iljazovic"\n  ],\n  "summary": "We investigate conditions under which a co-computably enumerable closed set\\nin a computable metric space is computable and prove that in each locally\\ncomputable computable metric space each co-computably enumerable compact\\nmanifold with computable boundary is computable. In fact, we examine the notion\\nof a semi-computable compact set and we prove a more general result: in any\\ncomputable metric space each semi-computable compact manifold with computable\\nboundary is computable. In particular, each semi-computable compact\\n(boundaryless) manifold is computable.",\n  "pdf_url": "http://arxiv.org/pdf/1310.7911v2",\n  "published": "2013-10-29"\n}'

In [9]:
extract_info('2304.02924v1')

'{\n  "title": "The Governance of Physical Artificial Intelligence",\n  "authors": [\n    "Yingbo Li",\n    "Anamaria-Beatrice Spulber",\n    "Yucong Duan"\n  ],\n  "summary": "Physical artificial intelligence can prove to be one of the most important\\nchallenges of the artificial intelligence. The governance of physical\\nartificial intelligence would define its responsible intelligent application in\\nthe society.",\n  "pdf_url": "http://arxiv.org/pdf/2304.02924v1",\n  "published": "2023-04-06"\n}'

In [10]:
extract_info('1705.02727v1')

'{\n  "title": "Automatic Recognition of Mammal Genera on Camera-Trap Images using Multi-Layer Robust Principal Component Analysis and Mixture Neural Networks",\n  "authors": [\n    "Jhony-Heriberto Giraldo-Zuluaga",\n    "Augusto Salazar",\n    "Alexander Gomez",\n    "Ang\\u00e9lica Diaz-Pulido"\n  ],\n  "summary": "The segmentation and classification of animals from camera-trap images is due\\nto the conditions under which the images are taken, a difficult task. This work\\npresents a method for classifying and segmenting mammal genera from camera-trap\\nimages. Our method uses Multi-Layer Robust Principal Component Analysis (RPCA)\\nfor segmenting, Convolutional Neural Networks (CNNs) for extracting features,\\nLeast Absolute Shrinkage and Selection Operator (LASSO) for selecting features,\\nand Artificial Neural Networks (ANNs) or Support Vector Machines (SVM) for\\nclassifying mammal genera present in the Colombian forest. We evaluated our\\nmethod with the camera-trap images fro
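Note that `extract_info` returns a JSON string, not a dictionary, and that a missing paper yields a plain-text message instead of JSON. A caller that wants individual fields could decode the string along these lines. This is only a sketch: `parse_paper_info` is a hypothetical helper, and the sample record is a shortened copy of one of the outputs above.

```python
import json
from typing import Optional


def parse_paper_info(raw: str) -> Optional[dict]:
    """Hypothetical helper: decode the string returned by extract_info.

    Returns a dict on success, or None when the string is the plain-text
    "no saved information" message rather than JSON.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Sample input shaped like the output above (summary omitted for brevity)
sample = json.dumps({
    "title": "The Governance of Physical Artificial Intelligence",
    "authors": ["Yingbo Li", "Anamaria-Beatrice Spulber", "Yucong Duan"],
    "pdf_url": "http://arxiv.org/pdf/2304.02924v1",
    "published": "2023-04-06",
}, indent=2)

info = parse_paper_info(sample)
# The ID below is made up to trigger the not-found message format
missing = parse_paper_info("There's no saved information related to paper 0000.00000v1.")
```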

## Tool Schema

Here is the schema of each tool, which you will provide to the LLM.

In [11]:
tools = [
    {
        "name": "search_papers",
        "description": "Search for papers on arXiv based on a topic and store their information.",
        "input_schema": {
            "type": "object",
            "properties": {
                "topic": {
                    "type": "string",
                    "description": "The topic to search for"
                }, 
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to retrieve",
                    "default": 5
                }
            },
            "required": ["topic"]
        }
    },
    {
        "name": "extract_info",
        "description": "Search for information about a specific paper across all topic directories.",
        "input_schema": {
            "type": "object",
            "properties": {
                "paper_id": {
                    "type": "string",
                    "description": "The ID of the paper to look for"
                }
            },
            "required": ["paper_id"]
        }
    }
]
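To see how the `required` and `default` fields in a schema relate to the Python function signatures, here is a small sketch that checks a set of arguments against one schema entry. The lesson itself does not perform this check. `check_tool_args` is a hypothetical helper, and the schema below is a copy of the `search_papers` entry from the list above.

```python
def check_tool_args(tool_schema: dict, args: dict) -> dict:
    """Hypothetical helper: verify required keys and fill in schema defaults."""
    schema = tool_schema["input_schema"]

    # Every name listed under "required" must be present
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"Missing required argument(s): {', '.join(missing)}")

    # Optional properties fall back to their declared default, if any
    checked = dict(args)
    for name, spec in schema["properties"].items():
        if name not in checked and "default" in spec:
            checked[name] = spec["default"]
    return checked


# Copy of the search_papers entry from the tools list
search_papers_schema = {
    "name": "search_papers",
    "description": "Search for papers on arXiv based on a topic and store their information.",
    "input_schema": {
        "type": "object",
        "properties": {
            "topic": {"type": "string", "description": "The topic to search for"},
            "max_results": {
                "type": "integer",
                "description": "Maximum number of results to retrieve",
                "default": 5,
            },
        },
        "required": ["topic"],
    },
}

filled = check_tool_args(search_papers_schema, {"topic": "mammals"})
```

Here `filled` is `{"topic": "mammals", "max_results": 5}`, which matches the default in the `search_papers` signature.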

## Tool Mapping

This code maps each tool name to its Python function and executes the requested tool, converting whatever it returns into a string.

In [12]:
mapping_tool_function = {
    "search_papers": search_papers,
    "extract_info": extract_info
}

def execute_tool(tool_name, tool_args):
    
    result = mapping_tool_function[tool_name](**tool_args)

    if result is None:
        result = "The operation completed but didn't return any results."
        
    elif isinstance(result, list):
        result = ', '.join(result)
        
    elif isinstance(result, dict):
        # Convert dictionaries to formatted JSON strings
        result = json.dumps(result, indent=2)
    
    else:
        # For any other type, convert using str()
        result = str(result)
    return result
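As written, `execute_tool` raises a `KeyError` if the model asks for a name that is not in the mapping, and any exception inside a tool propagates to the caller. One possible way to turn those cases into text the model can read is sketched below. `execute_tool_safely`, its `registry` parameter, and the stand-in `fake_search` tool are all hypothetical and exist only so the example runs without network access.

```python
import json


def execute_tool_safely(tool_name, tool_args, registry):
    """Hypothetical variant of execute_tool that reports failures as text."""
    tool = registry.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"

    try:
        result = tool(**tool_args)
    except Exception as e:
        # Report the failure instead of interrupting the chat loop
        return f"Tool {tool_name} failed: {e}"

    # Same normalization as execute_tool: always hand back a string
    if result is None:
        return "The operation completed but didn't return any results."
    if isinstance(result, list):
        return ', '.join(str(item) for item in result)
    if isinstance(result, dict):
        return json.dumps(result, indent=2)
    return str(result)


# Stand-in tool so the sketch runs offline
def fake_search(topic, max_results=5):
    return [f"{topic}-{i}" for i in range(max_results)]


registry = {"search_papers": fake_search}
ok = execute_tool_safely("search_papers", {"topic": "a", "max_results": 2}, registry)
unknown = execute_tool_safely("delete_papers", {}, registry)
```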

## Chatbot Code

The chatbot handles the user's queries one at a time and does not persist memory across queries: each query starts a new conversation.

In [13]:
load_dotenv() 
client = anthropic.Anthropic()

### Query Processing

In [14]:
def process_query(query):
    
    messages = [{'role': 'user', 'content': query}]
    
    response = client.messages.create(max_tokens = 2024,
                                  model = 'claude-3-7-sonnet-20250219', 
                                  tools = tools,
                                  messages = messages)
    
    # Loop until the model replies without requesting a tool
    while True:
        tool_results = []

        for content in response.content:
            if content.type == 'text':
                print(content.text)
            
            elif content.type == 'tool_use':
                tool_id = content.id
                tool_args = content.input
                tool_name = content.name
                print(f"Calling tool {tool_name} with args {tool_args}")
                
                # Run the tool and collect its result for the next request
                result = execute_tool(tool_name, tool_args)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_id,
                    "content": result
                })
        
        # No tool was requested, so the text printed above is the final answer
        if not tool_results:
            break
        
        # Send back the full assistant turn followed by every tool result
        messages.append({'role': 'assistant', 'content': response.content})
        messages.append({'role': 'user', 'content': tool_results})
        response = client.messages.create(max_tokens = 2024,
                                  model = 'claude-3-7-sonnet-20250219', 
                                  tools = tools,
                                  messages = messages)
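A tool round trip in `process_query` extends the message history with an assistant message holding the `tool_use` block and a user message holding the matching `tool_result`. The shape of that history can be sketched with plain dictionaries. `tool_round_trip_messages` is a hypothetical helper and the ID `toolu_example` is made up; in the lesson code the assistant content holds the SDK's block objects, not dictionaries.

```python
def tool_round_trip_messages(query, tool_use_id, tool_name, tool_args, tool_result):
    """Hypothetical helper: build the three messages of one tool round trip."""
    return [
        # 1. The user's original query
        {"role": "user", "content": query},
        # 2. The assistant's request to call a tool
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": tool_use_id,
             "name": tool_name, "input": tool_args}
        ]},
        # 3. The tool output, sent back under the user role
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id,
             "content": tool_result}
        ]},
    ]


history = tool_round_trip_messages(
    query="Search for 2 papers on LLM interpretability",
    tool_use_id="toolu_example",  # made-up ID for illustration
    tool_name="search_papers",
    tool_args={"topic": "LLM interpretability", "max_results": 2},
    tool_result="id-1, id-2",     # placeholder result, not real paper IDs
)
```

The `tool_use_id` in the result must match the `id` of the request it answers, which is why `process_query` reads `content.id` before executing the tool.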

### Chat Loop

In [15]:
def chat_loop():
    print("Type your queries or 'quit' to exit.")
    while True:
        try:
            query = input("\nQuery: ").strip()
            if query.lower() == 'quit':
                break
    
            process_query(query)
            print("\n")
        except Exception as e:
            print(f"\nError: {str(e)}")
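The lack of memory noted earlier follows from `process_query` building a fresh `messages` list on every call. The contrast with a history that is shared across queries can be sketched without any API calls. Both `stateless_turn` and `StatefulSession` are hypothetical names, and `fake_reply` stands in for the model's answer.

```python
def stateless_turn(query):
    """Mirrors process_query: a fresh history is built for every query."""
    messages = [{"role": "user", "content": query}]
    return messages


class StatefulSession:
    """Hypothetical alternative: one history shared across queries."""

    def __init__(self):
        self.messages = []

    def turn(self, query, fake_reply):
        self.messages.append({"role": "user", "content": query})
        # fake_reply stands in for the model's answer; no API call is made
        self.messages.append({"role": "assistant", "content": fake_reply})
        return self.messages


first = stateless_turn("What papers about evolution biology do you know?")
second = stateless_turn("What is Penna model mentioned in the first article?")

session = StatefulSession()
session.turn("What papers about evolution biology do you know?", "Here are five papers...")
shared = session.turn("What is Penna model mentioned in the first article?", "The Penna model is...")
```

In the stateless case, `second` contains only the follow-up question, so the model has no way to know which article was "the first" one.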

Feel free to interact with the chatbot. Here's an example query: 

- Search for 2 papers on "LLM interpretability"

To access the `papers` folder: 1) click on the `File` option on the top menu of the notebook and 2) click on `Open` and then 3) click on `L3`.

In [None]:
chat_loop()

Type your queries or 'quit' to exit.



Query:  What papers about evolution biology do you know?


I can search for papers on evolutionary biology from the arXiv database for you. Let me do that now.
Calling tool search_papers with args {'topic': 'evolution biology', 'max_results': 5}
Results are saved in: papers/evolution_biology/papers_info.json
I've found 5 papers related to evolutionary biology. Let me get more detailed information about each of them for you:
Calling tool extract_info with args {'paper_id': '0706.3101v1'}
Calling tool extract_info with args {'paper_id': '0804.3945v1'}
Calling tool extract_info with args {'paper_id': 'q-bio/0511007v1'}
Calling tool extract_info with args {'paper_id': 'cond-mat/0001006v1'}
Calling tool extract_info with args {'paper_id': '1003.1888v1'}
Here are five papers on evolutionary biology from the arXiv database:

1. **The Penna Model of Biological Aging** (2007) by D. Stauffer
   - A review of computer simulations of biological aging, focusing on the Penna model from 1995.

2. **Dynamic origin of species** (2008) by Michael G. Sadovsky
  


Query:  What is Penna model mentioned in the first article?


I'll need to search for information about the Penna model. However, your question references "the first article" but I don't have context about which article you're referring to. To help you effectively, I'll need to search for papers related to the Penna model.
Calling tool search_papers with args {'topic': 'Penna model', 'max_results': 5}
Results are saved in: papers/penna_model/papers_info.json
Let me get more information about these papers to understand what the Penna model is and identify which might be "the first article" you're referring to:
Calling tool extract_info with args {'paper_id': 'q-bio/0607015v1'}
Calling tool extract_info with args {'paper_id': '0706.3101v1'}
This second paper appears to be a review of the Penna model. Let me check the remaining papers to get a more complete picture:
Calling tool extract_info with args {'paper_id': '1008.3246v1'}
Calling tool extract_info with args {'paper_id': 'q-bio/0411007v1'}
Calling tool extract_info with args {'paper_id': 'q-bi

<p style="background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px"> 🚨
&nbsp; <b>Different Run Results:</b> The output generated by AI chat models can vary with each execution due to their dynamic, probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">
<p> 💻 &nbsp; <b> To Access the <code>requirements.txt</code> file or the <code>papers</code> folder: </b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Open"</em> and finally 3) click on <em>"L3"</em>.</p>
</div>

In the next lessons, you will move the tool definitions out of the notebook and wrap them in an MCP server. You will then create an MCP client inside the chatbot to make it MCP-compatible.

## Resources

[Guide on how to implement tool use](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#how-to-implement-tool-use)

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">


<p> ⬇ &nbsp; <b>Download Notebooks:</b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Download as"</em> and select <em>"Notebook (.ipynb)"</em>.</p>

</div>