# Hands-on: building autonomous agents

This is a bare-minimum implementation of an autonomous agent that can search arXiv for relevant papers and summarize their core ideas based on the user's research question.

In the first example, we create a highly abstracted version in which LangChain does all the orchestration under the hood.

In the second example, we dive into the details and build a similar autonomous agent from scratch, without LangChain.

## Installing packages & setting up the environment

In [None]:
!pip install langchain
!pip install arxiv
!pip install openai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting langchain
  Downloading langchain-0.0.154-py3-none-any.whl (709 kB)
Collecting aiohttp<4.0.0,>=3.8.3
  Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting dataclasses-json<0.6.0,>=0.5.7
  Downloading dataclasses_json-0.5.7-py3-none-any.whl (25 kB)
Collecting openapi-schema-pydantic<2.0,>=1.2
  Downloading openapi_schema_pydantic-1.2.4-py3-none-any.whl (90 kB)
Collecting async-timeout<5.0.0,>=4.0.0
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Coll

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "sk-......" # Replace the string content with your OpenAI API key

## A high-level abstracted example using LangChain

This section shows, at a high level, how to define an agent with LangChain:
- We first define the tool the language model can use, the arXiv API, including a name and a description for the tool, so that the LLM knows when to use it.
- We then initialize the agent by passing it the tools and the LLM.
- Finally, we ask the agent a question. The agent uses the ReAct framework to complete the task: it first reasons about whether it already has the information it needs, then decides to call the arXiv API to search for relevant papers, and once it has the papers, it summarizes their content and returns the answer to the user.
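In the ReAct loop, the model writes plain text in a fixed format (`Thought`, `Action`, `Action Input`, then an `Observation` is fed back in) until it writes `Final Answer`, and the executor must parse that text to decide which tool to call. The sketch below is a simplified, hypothetical parser for that format: `parse_react_step` is not a LangChain function, and LangChain's own output parser handles more edge cases.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for one ReAct-style reply (not LangChain's actual parser)
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def parse_react_step(text: str) -> Optional[Tuple[str, object]]:
    """Return ("final", answer), ("action", (tool, tool_input)), or None."""
    final = FINAL_RE.search(text)
    if final:
        # The model believes it is done; everything after the marker is the answer
        return ("final", final.group(1).strip())
    action = ACTION_RE.search(text)
    if action:
        tool_name = action.group(1).strip()
        # Drop surrounding whitespace and the quotes the model often adds
        tool_input = action.group(2).strip().strip('"')
        return ("action", (tool_name, tool_input))
    # Neither pattern matched: the caller must decide how to recover
    return None


reply = (
    "I should look this up on arxiv.\n"
    "Action: arxiv_search\n"
    'Action Input: "ReAct reasoning and acting in language models"'
)
print(parse_react_step(reply))
# → ('action', ('arxiv_search', 'ReAct reasoning and acting in language models'))
```

The executor would then run the named tool with the parsed input, append the result as an `Observation`, and call the model again.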

In [None]:
# import necessary packages
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.agents import Tool
from langchain.chat_models import ChatOpenAI
from langchain.utilities import ArxivAPIWrapper

In [None]:
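llm = ChatOpenAI(temperature=0) # Initialize the LLM; temperature 0 keeps its output as consistent as possible

# Wrap the arXiv search utility as a tool. The name and description are
# what the LLM sees when deciding whether to call the tool.
arxiv = ArxivAPIWrapper()
arxiv_tool = Tool(
    name="arxiv_search",
    description="Search on arxiv. The tool can search a keyword on arxiv for the top papers. It will return publishing date, title, authors, and summary of the papers.",
    func=arxiv.run
)

tools = [arxiv_tool]

# ZERO_SHOT_REACT_DESCRIPTION: the agent picks tools based only on their
# descriptions and follows the ReAct (Thought/Action/Observation) format
agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)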

agent_chain.run("What is ReAct reasoning and acting in language models?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mI'm not sure what ReAct reasoning and acting in language models is, so I should use arxiv_search to find out more information.
Action: arxiv_search
Action Input: "ReAct reasoning and acting in language models"[0m
Observation: [36;1m[1;3mPublished: 2023-03-10
Title: ReAct: Synergizing Reasoning and Acting in Language Models
Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Summary: While large language models (LLMs) have demonstrated impressive capabilities
across tasks in language understanding and interactive decision making, their
abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g.
action plan generation) have primarily been studied as separate topics. In this
paper, we explore the use of LLMs to generate both reasoning traces and
task-specific actions in an interleaved manner, allowing for greater synergy
between the two: reasoning traces help the mode

'ReAct reasoning and acting in language models is a technique that integrates reasoning and action generation in language models, allowing for greater synergy between the two. It has been applied to various language and decision-making tasks and has demonstrated effectiveness over state-of-the-art baselines.'

## Building an autonomous agent from scratch

In this example, we create an autonomous agent from scratch. The goal of this part is to help you understand how the components of an autonomous agent work, not to show the best implementation for production. For production, I recommend using a pre-packaged framework like LangChain to get the job done faster.
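Stripped to its essentials, such an agent is a loop over three components: an LLM that decides the next step, a set of tools it can call, and a memory that accumulates what has happened so far. The sketch below runs that loop offline under stated assumptions: the LLM is replaced by a hypothetical scripted stand-in (`make_scripted_llm`), the tool is a fake search function, and the reply format mirrors the `USE <tool name>:<parameter>` / `FINAL ANSWER:` convention that the prompt in the from-scratch code asks for.

```python
from typing import Callable, Dict, List


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical LLM stand-in: returns pre-written replies in order."""
    remaining = iter(replies)

    def llm(prompt: str) -> str:
        # Ignores the prompt; raises StopIteration if the script runs out
        return next(remaining)

    return llm


def run_agent(objective: str, llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    memory: List[str] = []  # everything the agent has thought and observed so far
    for _ in range(max_steps):
        prompt = f"Objective: {objective}\nMemory:\n" + "\n".join(memory)
        reply = llm(prompt).strip()
        if reply.startswith("FINAL"):
            # Covers both 'FINAL ANSWER: ...' and 'FINAL: CANNOT ANSWER'
            return reply
        if reply.startswith("USE "):
            # Split on the first ':' only, so the parameter may itself contain ':'
            name, _, param = reply[len("USE "):].partition(":")
            tool = tools.get(name.strip())
            observation = tool(param.strip()) if tool else "Tool not found"
            memory += [f"THOUGHT: {reply}", f"OBSERVATION: {observation}"]
        else:
            # Keep malformed replies in memory so the model can see its mistake
            memory.append(f"UNPARSEABLE: {reply}")
    return "Ended for reaching limit."


def fake_search(query: str) -> str:
    """Hypothetical tool standing in for a real arXiv search."""
    return f"3 papers about '{query}'"


llm = make_scripted_llm([
    "USE searchArxiv:ReAct prompting",
    "FINAL ANSWER: ReAct interleaves reasoning traces with tool calls.",
])
print(run_agent("What is ReAct?", llm, {"searchArxiv": fake_search}))
# → FINAL ANSWER: ReAct interleaves reasoning traces with tool calls.
```

Swapping the scripted stand-in for a real chat-completion call turns this into the same architecture as the agent built below.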


In [None]:
!pip install "openai<1.0"  # the code below uses the pre-1.0 openai.ChatCompletion API
!pip install arxiv

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting arxiv
  Downloading arxiv-1.4.7-py3-none-any.whl (12 kB)
Collecting feedparser (from arxiv)
  Downloading feedparser-6.0.10-py3-none-any.whl (81 kB)
Collecting sgmllib3k (from feedparser->arxiv)
  Downloading sgmllib3k-1.0.0.tar.gz (5.8 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: sgmllib3k
  Building wheel for sgmllib3k (setup.py) ... done
  Created wheel for sgmllib3k: filename=sgmllib3k-1.0.0-py3-none-any.whl size=6065 sha256=21571cc6f771b2e48a06b77dc56989ddd4372ec00f037ca3daa399f49c548f4f
  Stored in directory: /root/.cache/pip/wheels/f0/69/93/a47e9d621be168e9e33c7ce60524393c0b92ae83cf6c6e8

In [None]:
import openai  # this code uses the pre-1.0 openai SDK (openai.ChatCompletion)
import arxiv

# Set up the OpenAI API
openai.api_key = "sk-......" # Replace the string content with your OpenAI API key

def getResponse(prompt):
    """
    Wrap the OpenAI API call: send a single-turn prompt and return the reply text.
    """
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,  # We want consistent behavior, so we use the lowest temperature
        messages=[
            {"role": "system", "content": "You're a helpful assistant. Carefully follow the user's instructions."},
            {"role": "user", "content": prompt}
        ]
    )
    return str(response['choices'][0]['message']['content'])

def determineAction(objective, memory, tools):
    """
    Use GPT to determine the action to take by giving it the objective, memory, and tools.
    If it thinks it has achieved the objective, it gives the answer.
    If it needs more info, it picks a tool to get the relevant information based on the tool descriptions.
    """
    # Show the model each tool's name, description, and parameter
    # (not the Python function object, which means nothing to the model)
    tool_descriptions = "\n".join(
        f"{t['tool_name']}: {t['description']} Parameter: {t['parameter']}"
        for t in tools
    )
    # Show one memory entry per line instead of a Python list repr
    memory_text = "\n".join(memory)

    formattedPrompt = f"""Determine if the following memory is enough to answer
    the user's objective. Your past actions are stored in the memory for reference.
    If it is enough, answer the question in the format: 'FINAL ANSWER: <your answer>'.
    If the memory is not enough, you can use a tool in the available tools section
    to get more information. When using a tool, use this format:
    'USE <tool name>:<parameter>'. If no tool can help you achieve the user's
    objective, then answer 'FINAL: CANNOT ANSWER'.

    ```Objective
    Answer: {objective}
    ```

    ```Memory
    {memory_text}
    ```

    ```Available Tools
    {tool_descriptions}
    ```

    """
    response = getResponse(formattedPrompt)
    (finished, result, memory) = parseResponse(response, memory, tools)
    return (finished, result, memory)

def parseResponse(response, memory, tools):
    """
    Parse the response from GPT to determine whether the objective is finished.
    If it is finished, return the final answer.
    If the objective cannot be achieved with the memory and tools, GPT says it cannot answer.
    If GPT picks a tool, execute the tool and save the tool's result in memory.
    """
    finished = False
    response = response.strip()

    if response.startswith('FINAL ANSWER:') or response.startswith('FINAL: CANNOT ANSWER'):
        finished = True
        memory.append(response)
        return (finished, response, memory)
    elif response.startswith('USE '):
        # Split on the first ':' only, so a parameter containing ':' stays intact
        tool_part, _, parameter = response.partition(':')

        # "USE <tool name>" -> "<tool name>"
        tool_name = tool_part[len('USE '):].strip()
        parameter = parameter.strip()

        print("THOUGHT: " + response)
        memory.append("THOUGHT: " + response)

        result = executeTool(tool_name, parameter, tools)

        new_memory = "OBSERVATION: " + str(result)
        print(new_memory)
        memory.append(new_memory)

        return (finished, result, memory)
    else:
        # Unexpected format: record it so the next prompt shows the model its
        # mistake, and keep iterating instead of returning None (which would
        # crash the tuple unpacking in determineAction)
        print("UNEXPECTED RESPONSE: " + response)
        memory.append("UNEXPECTED RESPONSE (please use one of the required formats): " + response)
        return (finished, response, memory)

def executeTool(tool_name, parameter, tools):
    """
    Execute the tool that GPT picks, using the parameter it gives.
    Returns the execution result so that GPT can have the relevant info.
    """
    # Find the tool with the given name
    tool = None
    for t in tools:
        if t['tool_name'] == tool_name:
            tool = t
            break

    # If the tool is found, call its function with the given parameter
    if tool:
        return tool['function_name'](parameter)
    else:
        return "Tool not found"


def searchArxiv(keyword):
    """
    Wrap the arxiv search function as a tool for GPT.
    Input is a search keyword.
    Output is a list of tuples with the title, published date, authors, and summary of each paper.
    """
    # Perform a search with the given query (top 3 results)
    search = arxiv.Search(query=keyword, max_results=3)

    # Get the metadata for each result and extract the relevant information
    results = []
    for result in search.results():
        title = result.title
        published_date = result.published.strftime("%Y-%m-%d")
        authors = ", ".join(author.name for author in result.authors)
        summary = result.summary

        # Store the extracted information as a tuple of labeled strings
        results.append((
            "title: " + title,
            "published_date: " + published_date,
            "authors: " + authors,
            "summary: " + summary
        ))

    # Return the list of tuples containing the result information
    return results

def startAgent():
    """
    Initialize memory and tools for the GPT agent, ask for a user objective,
    and let the agent run iteratively until the objective is achieved.
    As a safety measure, it stops after 5 iterations in case things go wrong.
    """
    objective = input("What is your research question? ")
    # For simplicity, we just use a list to store everything.
    # In production, you would probably use a vector database.
    memory = []

    tools = [{'tool_name': 'searchArxiv',
              'description': """You can use this tool to search for scientific papers on Arxiv. The response will have title, author, published date, and summary.""",
              'function_name': searchArxiv,
              'parameter': 'search key word'}]

    max_iterations = 5
    n = 0
    while True:
        (finished, result, memory) = determineAction(objective, memory, tools)
        n += 1

        if finished:
            print(result)
            return

        # Stop once the model has been called max_iterations times
        if n >= max_iterations:
            print("Ended for reaching limit.")
            return


startAgent()


What is your research question? What is ReAct reasoning and acting in language models?
THOUGHT: USE searchArxiv:ReAct reasoning and acting in language models.
OBSERVATION: [('title: ReAct: Synergizing Reasoning and Acting in Language Models', 'published_date: 2022-10-06', 'authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao', 'summary: While large language models (LLMs) have demonstrated impressive capabilities\nacross tasks in language understanding and interactive decision making, their\nabilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g.\naction plan generation) have primarily been studied as separate topics. In this\npaper, we explore the use of LLMs to generate both reasoning traces and\ntask-specific actions in an interleaved manner, allowing for greater synergy\nbetween the two: reasoning traces help the model induce, track, and update\naction plans as well as handle exceptions, while actions allow it to interfac
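Each iteration appends a THOUGHT and a full OBSERVATION (three paper abstracts, as in the output above) to `memory`, and the whole list goes into the next prompt, so the prompt grows quickly. Short of the vector database mentioned in the code comments, one simple mitigation is to cap what goes into the prompt. The helper below is a hypothetical sketch (`trim_memory` and its character budgets are assumptions, not part of the original code): it truncates long entries and keeps only the most recent entries that fit in a character budget.

```python
from typing import List


def trim_memory(memory: List[str], max_chars: int = 4000,
                max_entry_chars: int = 1500) -> List[str]:
    """Hypothetical helper: keep the newest memory entries within a character budget.

    Entries longer than max_entry_chars are cut and marked with '...'.
    Entries are taken newest-first until the next one would exceed max_chars;
    the result keeps the original chronological order.
    """
    kept: List[str] = []
    used = 0
    for entry in reversed(memory):
        if len(entry) > max_entry_chars:
            entry = entry[:max_entry_chars] + "..."
        if used + len(entry) > max_chars:
            break  # older entries are dropped once the budget is spent
        kept.append(entry)
        used += len(entry)
    kept.reverse()  # restore oldest-to-newest order
    return kept


memory = [
    "THOUGHT: USE searchArxiv:ReAct",
    "OBSERVATION: " + "x" * 5000,  # stands in for a long list of abstracts
    "THOUGHT: USE searchArxiv:chain of thought",
    "OBSERVATION: short result",
]
trimmed = trim_memory(memory, max_chars=2000, max_entry_chars=100)
print([len(entry) for entry in trimmed])  # → [30, 103, 41, 25]
```

In `determineAction`, the prompt could then be built from `trim_memory(memory)` instead of the full list; the character budget is a rough proxy for the model's token limit.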