# Lesson 3: Chatbot Example

In this lesson, you will familiarize yourself with the chatbot example you will work on during this course. The example includes the tool definitions and execution, as well as the chatbot code. Make sure to interact with the chatbot at the end of this notebook.

In [30]:
!pip install arxiv
!pip install anthropic
!pip install openai
!pip install python-dotenv



## Import Libraries

In [31]:
import arxiv
import json
import os
from typing import List
from dotenv import load_dotenv
import anthropic
import openai

## Tool Functions

In [32]:
PAPER_DIR = "papers"

The first tool searches arXiv for papers relevant to a topic and stores each paper's information (title, authors, summary, PDF URL, and publication date) in a JSON file. The JSON files are organized by topic inside the `papers` directory. The tool does not download the papers themselves.

In [33]:
def search_papers(topic: str, max_results: int = 5) -> List[str]:
    """
    Search for papers on arXiv based on a topic and store their information.

    Args:
        topic: The topic to search for
        max_results: Maximum number of results to retrieve (default: 5)

    Returns:
        List of paper IDs found in the search
    """

    # Use arxiv to find the papers
    client = arxiv.Client()

    # Search for the most relevant articles matching the queried topic
    search = arxiv.Search(
        query = topic,
        max_results = max_results,
        sort_by = arxiv.SortCriterion.Relevance
    )

    papers = client.results(search)

    # Create directory for this topic
    path = os.path.join(PAPER_DIR, topic.lower().replace(" ", "_"))
    os.makedirs(path, exist_ok=True)

    file_path = os.path.join(path, "papers_info.json")

    # Try to load existing papers info
    try:
        with open(file_path, "r") as json_file:
            papers_info = json.load(json_file)
    except (FileNotFoundError, json.JSONDecodeError):
        papers_info = {}

    # Process each paper and add to papers_info
    paper_ids = []
    for paper in papers:
        paper_ids.append(paper.get_short_id())
        paper_info = {
            'title': paper.title,
            'authors': [author.name for author in paper.authors],
            'summary': paper.summary,
            'pdf_url': paper.pdf_url,
            'published': str(paper.published.date())
        }
        papers_info[paper.get_short_id()] = paper_info

    # Save updated papers_info to json file
    with open(file_path, "w") as json_file:
        json.dump(papers_info, json_file, indent=2)

    print(f"Results are saved in: {file_path}")

    return paper_ids

In [34]:
search_papers("computers")

Results are saved in: papers/computers/papers_info.json


['1310.7911v2',
 'math/9711204v1',
 '2208.00733v1',
 '2504.07020v1',
 '2403.03925v1']
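
To check what the tool stored, the saved file can be read back. The sketch below is only an illustration: `load_papers_info` is a hypothetical helper that is not part of the lesson code, and the demo writes made-up data to a temporary directory so that it runs without calling arXiv. It assumes the same naming rule as `search_papers` (lowercase topic, spaces replaced by underscores).

```python
import json
import os
import tempfile

def load_papers_info(topic: str, paper_dir: str = "papers") -> dict:
    """Hypothetical helper: load the JSON that search_papers wrote for a topic."""
    # Same naming rule as search_papers: lowercase, spaces become underscores
    topic_dir = topic.lower().replace(" ", "_")
    file_path = os.path.join(paper_dir, topic_dir, "papers_info.json")
    try:
        with open(file_path, "r") as json_file:
            return json.load(json_file)
    except (FileNotFoundError, json.JSONDecodeError):
        # Nothing saved yet (or the file is unreadable): treat as empty
        return {}

# Demo against a throwaway directory with made-up data (no network call)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "quantum_computing"))
    sample = {"0000.00000v1": {"title": "Placeholder title", "published": "2000-01-01"}}
    with open(os.path.join(tmp, "quantum_computing", "papers_info.json"), "w") as f:
        json.dump(sample, f, indent=2)

    loaded = load_papers_info("Quantum Computing", paper_dir=tmp)
    missing = load_papers_info("unknown topic", paper_dir=tmp)

print(list(loaded))  # the paper IDs stored for the topic
print(missing)       # empty dict when the topic has no saved file
```

Because `search_papers` merges new results into the existing dictionary before saving, repeated searches on the same topic add to the file rather than replace it.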

The second tool looks for information about a specific paper across all topic directories inside the `papers` directory.

In [35]:
def extract_info(paper_id: str) -> str:
    """
    Search for information about a specific paper across all topic directories.

    Args:
        paper_id: The ID of the paper to look for

    Returns:
        JSON string with paper information if found, error message if not found
    """

    for item in os.listdir(PAPER_DIR):
        item_path = os.path.join(PAPER_DIR, item)
        if os.path.isdir(item_path):
            file_path = os.path.join(item_path, "papers_info.json")
            if os.path.isfile(file_path):
                try:
                    with open(file_path, "r") as json_file:
                        papers_info = json.load(json_file)
                        if paper_id in papers_info:
                            return json.dumps(papers_info[paper_id], indent=2)
                except (FileNotFoundError, json.JSONDecodeError) as e:
                    print(f"Error reading {file_path}: {str(e)}")
                    continue

    return f"There's no saved information related to paper {paper_id}."

In [36]:
extract_info('1310.7911v2')

'{\n  "title": "Compact manifolds with computable boundaries",\n  "authors": [\n    "Zvonko Iljazovic"\n  ],\n  "summary": "We investigate conditions under which a co-computably enumerable closed set\\nin a computable metric space is computable and prove that in each locally\\ncomputable computable metric space each co-computably enumerable compact\\nmanifold with computable boundary is computable. In fact, we examine the notion\\nof a semi-computable compact set and we prove a more general result: in any\\ncomputable metric space each semi-computable compact manifold with computable\\nboundary is computable. In particular, each semi-computable compact\\n(boundaryless) manifold is computable.",\n  "pdf_url": "http://arxiv.org/pdf/1310.7911v2",\n  "published": "2013-10-29"\n}'

## Tool Schema

Here is the schema for each tool, which you will provide to the LLM.

In [37]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_papers",
            "description": "REQUIRED: Use this function to search for academic papers on arXiv. Call this whenever the user asks about papers, research, or wants to find information on any topic.",
            "parameters": {
                "type": "object",
                "properties": {
                    "topic": {
                        "type": "string",
                        "description": "The research topic to search for (e.g., 'machine learning', 'quantum computing')"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Number of papers to retrieve (default: 5, max: 20)",
                        "minimum": 1,
                        "maximum": 20,
                        "default": 5
                    }
                },
                "required": ["topic"],
                "additionalProperties": False
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "extract_info",
            "description": "REQUIRED: Use this function to get detailed information about a specific paper using its ID. Call this when user asks about a specific paper.",
            "parameters": {
                "type": "object",
                "properties": {
                    "paper_id": {
                        "type": "string",
                        "description": "The arXiv paper ID (e.g., '2301.12345', '1234.5678v1')"
                    }
                },
                "required": ["paper_id"],
                "additionalProperties": False
            }
        }
    }
]
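
The schema declares constraints such as `required`, `additionalProperties`, and the `minimum`/`maximum` bounds on `max_results`, but the `search_papers` function itself does not check them. As a rough sketch, arguments could be checked against the `parameters` block before a tool runs. `check_tool_args` is a hypothetical helper written for this illustration; it covers only the few keywords used in this lesson and is not a full JSON Schema validator.

```python
def check_tool_args(parameters: dict, args: dict) -> list:
    """Hypothetical helper: return a list of problems found in args (empty if none)."""
    problems = []
    properties = parameters.get("properties", {})
    # Every name listed under "required" must be present
    for name in parameters.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    # With additionalProperties set to False, unknown names are not allowed
    if parameters.get("additionalProperties") is False:
        for name in args:
            if name not in properties:
                problems.append(f"unexpected argument: {name}")
    # Check integer bounds where the schema declares them
    for name, value in args.items():
        spec = properties.get(name, {})
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if spec.get("type") == "integer" and is_int:
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{name} is below the minimum of {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{name} is above the maximum of {spec['maximum']}")
    return problems

# A copy of the search_papers parameters from the schema in this lesson
search_papers_parameters = {
    "type": "object",
    "properties": {
        "topic": {"type": "string"},
        "max_results": {"type": "integer", "minimum": 1, "maximum": 20, "default": 5},
    },
    "required": ["topic"],
    "additionalProperties": False,
}

ok = check_tool_args(search_papers_parameters, {"topic": "machine learning", "max_results": 3})
bad = check_tool_args(search_papers_parameters, {"max_results": 50, "limit": 1})
print(ok)
print(bad)
```

The first call reports no problems; the second reports the missing `topic`, the unknown `limit`, and the out-of-range `max_results`.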

## Tool Mapping

This code handles tool mapping and execution.

In [38]:
mapping_tool_function = {
    "search_papers": search_papers,
    "extract_info": extract_info
}

def execute_tool(tool_name, tool_args):
    result = mapping_tool_function[tool_name](**tool_args)
    if result is None:
        result = "The operation completed but didn't return any results."
    elif isinstance(result, list):
        result = ', '.join(result)
    elif isinstance(result, dict):
        result = json.dumps(result, indent=2)
    else:
        result = str(result)
    return result

# === SYSTEM PROMPT TO FORCE TOOL USE ===
SYSTEM_PROMPT = """You are an arXiv research assistant. You MUST use the provided tools for ALL paper-related queries.

MANDATORY RULES:
1. If user asks about papers, research, or any academic topic -> ALWAYS call search_papers()
2. If user mentions a paper ID -> ALWAYS call extract_info()
3. If user asks "search for papers on X" -> ALWAYS call search_papers() with topic "X"
4. If user asks about a specific paper -> ALWAYS call extract_info()

NEVER respond with general knowledge about papers or research topics. ALWAYS use the tools first.

Examples that REQUIRE tool use:
- "Find papers on machine learning" -> search_papers(topic="machine learning")
- "Search for 3 papers about quantum computing" -> search_papers(topic="quantum computing", max_results=3)
- "Tell me about paper 2301.12345" -> extract_info(paper_id="2301.12345")
- "What papers are there on LLM interpretability?" -> search_papers(topic="LLM interpretability")

You are an assistant that searches arXiv papers. Use your tools for EVERY paper-related query."""
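
The `execute_tool` function above looks the tool up in a dictionary and raises `KeyError` if the model asks for a name that is not registered. One possible variation, sketched here under stated assumptions, returns an error string instead so the message can be passed back to the model. `safe_execute_tool`, `fake_search`, and `demo_mapping` are hypothetical names used only for this example; the stand-in tool avoids any network access.

```python
import json

def fake_search(topic: str, max_results: int = 5) -> list:
    """Stand-in for search_papers so this sketch runs without network access."""
    return [f"{topic}-{i}" for i in range(max_results)]

demo_mapping = {"search_papers": fake_search}

def safe_execute_tool(tool_name: str, tool_args: dict, mapping: dict) -> str:
    """Hypothetical variant of execute_tool that reports problems as strings."""
    func = mapping.get(tool_name)
    if func is None:
        return f"Unknown tool: {tool_name}"
    try:
        result = func(**tool_args)
    except TypeError as e:
        # Typically raised when the arguments do not match the function signature
        return f"Invalid arguments for {tool_name}: {e}"
    # Same result normalization as execute_tool
    if result is None:
        return "The operation completed but didn't return any results."
    if isinstance(result, list):
        return ", ".join(str(item) for item in result)
    if isinstance(result, dict):
        return json.dumps(result, indent=2)
    return str(result)

good = safe_execute_tool("search_papers", {"topic": "x", "max_results": 2}, demo_mapping)
unknown = safe_execute_tool("nope", {}, demo_mapping)
wrong_args = safe_execute_tool("search_papers", {"subject": "x"}, demo_mapping)
print(good)
print(unknown)
print(wrong_args)
```

The chatbot code later in this lesson already wraps `execute_tool` in a `try`/`except`, so this variation mainly moves the error handling closer to the dispatch step.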

## Chatbot Code

The chatbot handles the user's queries one at a time and keeps no memory between them: each call to `process_query` starts from a fresh `messages` list.

In [39]:
load_dotenv()  # Load OPENAI_API_KEY from a .env file, if one is present
client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

### Query Processing

In [40]:
def process_query(query):
    # Start each query with the explicit system prompt
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': query}
    ]

    # Use a model with reliable function-calling support
    response = client.chat.completions.create(
        model='gpt-4o',  # Handles function calling well
        messages=messages,
        tools=tools,
        tool_choice='auto',  # Can be set to 'required' to push tool use harder
        max_tokens=2024,
        temperature=0.1  # More deterministic
    )

    # Loop flag (named so it does not shadow the process_query function)
    keep_processing = True
    iteration = 0
    max_iterations = 10  # Prevent infinite loops

    while keep_processing and iteration < max_iterations:
        iteration += 1
        message = response.choices[0].message

        print(f"\n--- Iteration {iteration} ---")

        # Print any text content
        if message.content:
            print(f"Assistant: {message.content}")

        # Handle any tool calls
        if message.tool_calls:
            print(f"🔧 Tool calls detected: {len(message.tool_calls)}")

            # Add the assistant message to the conversation
            messages.append({
                "role": "assistant",
                "content": message.content,
                "tool_calls": message.tool_calls
            })

            # Process each tool call
            for tool_call in message.tool_calls:
                tool_name = tool_call.function.name
                tool_id = tool_call.id

                try:
                    tool_args = json.loads(tool_call.function.arguments)
                except json.JSONDecodeError as e:
                    print(f"Error parsing tool arguments: {e}")
                    # Every tool call needs a matching tool message,
                    # so report the parsing error back to the model
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_id,
                        "content": f"Error parsing tool arguments: {e}"
                    })
                    continue

                print(f"Calling {tool_name} with args: {tool_args}")

                try:
                    result = execute_tool(tool_name, tool_args)
                    print(f"Tool result: {result[:200]}..." if len(result) > 200 else f"Tool result: {result}")
                except Exception as e:
                    result = f"Error executing tool: {str(e)}"
                    print(f"Tool error: {result}")

                # Add the tool result to the conversation
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_id,
                    "content": result
                })

            # Call the model again with the tool results
            response = client.chat.completions.create(
                model='gpt-4o',
                messages=messages,
                tools=tools,
                tool_choice='auto',
                max_tokens=2024,
                temperature=0.1
            )

        else:
            # No more tool calls: stop
            keep_processing = False
            if not message.content:
                print("No content received from assistant")
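
The control flow of `process_query` can be hard to follow alongside the API details. The sketch below reproduces only the loop, with the model replaced by a scripted stand-in so that it runs offline. `scripted_model`, `fake_search_papers`, and `run_tool_loop` are hypothetical names, and the plain dictionaries are a simplification of the response objects the real client returns.

```python
import json

def scripted_model(messages: list) -> dict:
    """Stand-in for the chat completion call: asks for one tool, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        # First turn: request a tool call, as a real model might
        return {
            "content": None,
            "tool_calls": [{
                "id": "call_1",
                "name": "search_papers",
                "arguments": json.dumps({"topic": "graphs", "max_results": 2}),
            }],
        }
    # Later turn: a tool result is available, so give a final answer
    return {"content": f"Found: {tool_results[-1]['content']}", "tool_calls": []}

def fake_search_papers(topic: str, max_results: int = 5) -> str:
    """Stand-in tool that returns made-up IDs."""
    return ", ".join(f"{topic}-{i}" for i in range(max_results))

def run_tool_loop(query: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iterations):
        reply = scripted_model(messages)
        if not reply["tool_calls"]:
            # No tool requested: this is the final answer
            return reply["content"]
        # Record the assistant turn that requested the tools
        messages.append({"role": "assistant", "content": reply["content"],
                         "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            args = json.loads(call["arguments"])
            result = fake_search_papers(**args)
            # Each tool call gets exactly one matching tool message
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
    return "Stopped: iteration limit reached"

answer = run_tool_loop("Search for 2 papers on graphs")
print(answer)
```

The pattern is the same as in the lesson code: call the model, run whatever tools it requests, append one tool message per call, and repeat until the model replies without tool calls or the iteration limit is reached.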

### Chat Loop

In [41]:
def chat_loop():
    print("OpenAI ArXiv Assistant (Fixed)")
    print("=" * 50)
    print("Examples:")
    print("• 'Search for 3 papers on machine learning'")
    print("• 'Find papers about quantum computing'")
    print("• 'Tell me about paper 2301.12345'")
    print("• Type 'quit' to exit")
    print("=" * 50)

    while True:
        try:
            query = input("\nQuery: ").strip()
            if query.lower() in ['quit', 'exit', 'q']:
                print("Goodbye!")
                break

            if not query:
                print("Please enter a query.")
                continue

            # Check whether this looks like a paper-related query
            paper_keywords = ['paper', 'search', 'find', 'arxiv', 'research', 'study']
            if not any(keyword in query.lower() for keyword in paper_keywords):
                print("Tip: I'm designed to search papers. Try: 'Search for papers on [topic]'")

            print(f"\nProcessing: '{query}'")
            process_query(query)
            print("\n" + "─" * 50)

        except KeyboardInterrupt:
            print("\nGoodbye!")
            break
        except Exception as e:
            print(f"\nError: {str(e)}")
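
The keyword check in `chat_loop` is a plain substring test, and it only prints a tip; the query is processed either way. A small sketch of the same test, pulled into a hypothetical helper named `looks_like_paper_query`, shows where it matches and where it does not.

```python
PAPER_KEYWORDS = ['paper', 'search', 'find', 'arxiv', 'research', 'study']

def looks_like_paper_query(query: str) -> bool:
    """Hypothetical helper mirroring the substring check in chat_loop."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in PAPER_KEYWORDS)

examples = [
    "Find papers about quantum computing",  # contains 'find' and 'paper'
    "Tell me about 2301.12345",             # a bare ID contains no keyword
    "My findings so far",                   # 'find' matches inside 'findings'
    "What is 2 + 2?",                       # no keyword
]
results = {text: looks_like_paper_query(text) for text in examples}
for text, matched in results.items():
    print(f"{matched!s:5}  {text}")
```

A query that gives only a paper ID would trigger the tip even though the system prompt tells the model to call `extract_info` for it, which is one reason the check is advisory rather than a filter.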

Feel free to interact with the chatbot. Here's an example query:

- Search for 2 papers on "LLM interpretability"

To access the `papers` folder: 1) click on the `File` option on the top menu of the notebook and 2) click on `Open` and then 3) click on `L3`.

In [42]:
# For debugging: check that the tools are registered
def test_tools():
    print("🔧 Testing tool registration...")
    print(f"Number of tools: {len(tools)}")
    for tool in tools:
        print(f"- {tool['function']['name']}: {tool['function']['description'][:100]}...")

if __name__ == "__main__":
    test_tools()
    chat_loop()

🔧 Testing tool registration...
Number of tools: 2
- search_papers: REQUIRED: Use this function to search for academic papers on arXiv. Call this whenever the user asks...
- extract_info: REQUIRED: Use this function to get detailed information about a specific paper using its ID. Call th...
🤖 OpenAI ArXiv Assistant (Fixed)
Examples:
• 'Search for 3 papers on machine learning'
• 'Find papers about quantum computing'
• 'Tell me about paper 2301.12345'
• Type 'quit' to exit

📝 Query: Search for 3 papers on machine learning

🔍 Processing: 'Search for 3 papers on machine learning'

--- Iteration 1 ---
🔧 Tool calls detected: 1
📞 Calling search_papers with args: {'topic': 'machine learning', 'max_results': 3}
Results are saved in: papers/machine_learning/papers_info.json
📋 Tool result: 1909.03550v1, 1811.04422v1, 1707.04849v1

--- Iteration 2 ---
🔧 Tool calls detected: 3
📞 Calling extract_info with args: {'paper_id': '1909.03550v1'}
📋 Tool result: {
  "title": "Lecture Notes: Optimization for 

<p style="background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px"> 🚨
&nbsp; <b>Different Run Results:</b> The output generated by AI chat models can vary with each execution due to their dynamic, probabilistic nature. Don't be surprised if your results differ from those shown in the video.</p>

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">
<p> 💻 &nbsp; <b>To access the <code>requirements.txt</code> file or the <code>papers</code> folder:</b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Open"</em> and finally 3) click on <em>"L3"</em>.</p>
</div>

In the next lessons, you will move the tool definitions out of the chatbot and wrap them in an MCP server. You will then create an MCP client inside the chatbot to make it MCP compatible.

## Resources

[Guide on how to implement tool use](https://docs.anthropic.com/en/docs/build-with-claude/tool-use/overview#how-to-implement-tool-use)

<div style="background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px">


<p> ⬇ &nbsp; <b>Download Notebooks:</b> 1) click on the <em>"File"</em> option on the top menu of the notebook and then 2) click on <em>"Download as"</em> and select <em>"Notebook (.ipynb)"</em>.</p>

</div>