# AniMate Ver. 1.0.0

This notebook prototypes the LLM agent that will act as the brain of the LLM-based chatbot. In it I:
1. Define the tools available to the agent
2. Experiment with different prompt templates when constructing the agent

## Tool Construction

The API that the agent will use to find up-to-date anime-related information is the [Jikan API](https://jikan.moe/), an unofficial, open-source API that scrapes and parses MyAnimeList.net (MAL), one of the largest anime databases available, to serve its responses.

The capabilities I need to provide the agent for core functionality are listed below:
- **Retrieve anime ID:** Get the MAL ID of an anime from its name. Looked up via Jikan's anime search endpoint (`/anime?q={name}`)
- **Retrieve anime info:** Get general information about an anime. *Jikan API endpoint: getAnimeById*
- **Retrieve anime episodes:** Get episode information about an anime. *Jikan API endpoint: getAnimeEpisodes*
- **Retrieve anime characters:** Get character information, i.e. names and voice actors. *Jikan API endpoint: getAnimeCharacters*
- **Retrieve real-time statistics:** Get the live statistics on an anime. *Jikan API endpoint: getAnimeStatistics*
- **Retrieve recommendations:** Get other anime recommendations based on a given anime. *Jikan API endpoint: getAnimeRecommendations*
- **Retrieve recommendations (general):** Get a general list of top anime recommendations. *Jikan API endpoint: getTopAnime*
- **Retrieve sentiment:** Get sentiment around the anime. *Jikan API endpoint: getAnimeReviews*
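
As a quick reference, the operations above can be mapped onto Jikan v4 URL paths. The sketch below is a hypothetical helper (`JIKAN_PATHS` and `buildJikanUrl` are not part of the notebook), and the path templates are my reading of the Jikan v4 docs, so they should be checked against the OpenAPI spec before use.

```python
from typing import Optional

# Hypothetical lookup table: Jikan operationId -> v4 path template.
# Paths are assumed from the Jikan v4 docs; verify them against the OpenAPI spec.
JIKAN_BASE_URL = "https://api.jikan.moe/v4"

JIKAN_PATHS: dict[str, str] = {
    "getAnimeById": "/anime/{id}",
    "getAnimeEpisodes": "/anime/{id}/episodes",
    "getAnimeCharacters": "/anime/{id}/characters",
    "getAnimeStatistics": "/anime/{id}/statistics",
    "getAnimeRecommendations": "/anime/{id}/recommendations",
    "getTopAnime": "/top/anime",
    "getAnimeReviews": "/anime/{id}/reviews",
}


def buildJikanUrl(operationId: str, animeId: Optional[int] = None) -> str:
    """Return the full request URL for an operation, filling in the MAL id where the path needs one."""
    path = JIKAN_PATHS[operationId]  # raises KeyError for unknown operations
    if "{id}" in path:
        if animeId is None:
            raise ValueError(f"{operationId} requires an anime id")
        path = path.format(id=animeId)
    return JIKAN_BASE_URL + path


print(buildJikanUrl("getAnimeEpisodes", 1))
```

Six of the seven operations need a MAL ID, which is why the ID lookup tool is built first.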

### Retrieve Anime ID

The agent needs to be able to retrieve the MyAnimeList (MAL) ID of a given anime, because almost every endpoint of the Jikan API requires this ID in the request. Retrieving it is straightforward: Jikan's anime search endpoint returns the MAL IDs of titles matching a name, so the tool takes the name of the anime, searches for it, and returns the ID of the closest-matching title, e.g. *getId(anime_name: string) -> int*.

#### Imports

In [1]:
%pip install fuzzywuzzy
%pip install langchain
%pip install python-Levenshtein
%pip install openai
%pip install tiktoken

%pip install python-dotenv
%load_ext dotenv
%dotenv

In [2]:
import requests, os, yaml
from typing import Type, Optional
from fuzzywuzzy import fuzz
from pydantic import BaseModel, Field

from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.tools import BaseTool, Tool
from langchain_core.tools import ToolException
from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)

#### Utils

In [3]:
def searchAnimeName(animeName: str) -> list[dict]:
    # Pass the name via `params` so requests URL-encodes spaces, '&', etc.
    response = requests.get('https://api.jikan.moe/v4/anime', params={'q': animeName})
    response.raise_for_status()
    response_json = response.json()
    return response_json['data']

def getBestMatch(query: str, searchResults: list[dict]) -> tuple[str, int]:
    # Rank results by fuzz.ratio, a Levenshtein-based similarity score (0-100) between
    # the query and each title. The highest-scoring title is taken as the closest match.
    bestMatch = max(searchResults, key=lambda sr: fuzz.ratio(query, sr['title']))
    bestMatchTitle = bestMatch['title']
    bestMatchId = bestMatch['mal_id']
    return bestMatchTitle, bestMatchId

# Test
query = "Naruto Shippuden"
searchResults = searchAnimeName(query)
bestMatch = getBestMatch(query, searchResults)
print(bestMatch)
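
`fuzz.ratio` scores two strings from 0 to 100 by how many characters they share in matching blocks; fuzzywuzzy uses python-Levenshtein when it is installed and otherwise falls back to `difflib.SequenceMatcher`. To check the matching logic without calling the API, the sketch below re-implements the same "pick the highest score" idea with `difflib` only and runs it on hand-written sample data shaped like Jikan search results (titles and ids are illustrative). `similarity` and `bestMatchOffline` are hypothetical helpers, and their scores may differ slightly from `fuzz.ratio` when python-Levenshtein is installed.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """0-100 similarity score on the same scale as fuzz.ratio (difflib-based)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def bestMatchOffline(query: str, searchResults: list[dict]) -> tuple[str, int]:
    """Return (title, mal_id) of the result whose title is most similar to the query."""
    if not searchResults:
        raise ValueError("no search results to match against")
    best = max(searchResults, key=lambda sr: similarity(query, sr["title"]))
    return best["title"], best["mal_id"]


# Illustrative sample data shaped like items of Jikan's search 'data' list
sampleResults = [
    {"title": "Naruto", "mal_id": 20},
    {"title": "Naruto: Shippuuden", "mal_id": 1735},
    {"title": "Boruto: Naruto Next Generations", "mal_id": 34566},
]

print(bestMatchOffline("Naruto Shippuden", sampleResults))
```

Like `fuzz.ratio`, this comparison is case-sensitive, so lower-casing both strings before scoring is a possible refinement.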

#### Tool

In [4]:
class AnimeIDLookupSchema(BaseModel):
    query: str = Field(description="anime name")


class AnimeIDLookupTool(BaseTool):
    name: str = "anime_id_lookup"
    description: str = "Useful for when you need to retrieve the id of an anime"
    args_schema: Type[AnimeIDLookupSchema] = AnimeIDLookupSchema

    def _run(
        self,
        query: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Search Jikan for `query` and return the MAL id of the closest-matching title."""
        try:
            searchResults = searchAnimeName(query)
        except Exception as e:
            # Surface network/HTTP/JSON errors to the agent as a ToolException
            raise ToolException(str(e)) from e
        if not searchResults:
            raise ToolException("Anime not found")
        _, bestMatchId = getBestMatch(query, searchResults)
        return str(bestMatchId)

    async def _arun(
        self,
        query: str,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("anime_id_lookup does not support async")

#### Test

Experimentation is carried out using OpenAI function calling. The following resources are useful for creating agents and for learning how to run Llama 2 locally as an agent:
- https://www.pinecone.io/learn/llama-2/
- https://philschmid.github.io/easyllm/examples/llama2-agent-example/#basic-example-of-using-a-tool-with-llama-2-70b
- https://medium.com/@gil.fernandes/langchain-chat-with-custom-tools-functions-and-memory-e34daa331aa7

In [5]:
key: str = os.getenv("OPENAI_API_KEY")
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0, openai_api_key=key)

id_lookup_tool = AnimeIDLookupTool()
id_lookup_tool.handle_tool_error = True

tools = [id_lookup_tool]
agent_executor = initialize_agent(
    tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True
)
result = agent_executor({"input": "Tell me the anime ID of Cowboy Bepop"}) # LLM should return 1
result = agent_executor({"input": "Tell me the anime ID of ..."}) # Lookup should fail; handle_tool_error=True passes the error message back to the LLM instead of raising
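
Setting `handle_tool_error = True` makes LangChain return the `ToolException` message to the LLM as the tool's observation instead of raising it, which is what lets a failed lookup end gracefully. `handle_tool_error` also accepts a string or a callable that receives the exception and returns the text the LLM should see; it only intercepts `ToolException`, which is why the tools wrap their failures in it. The sketch below is a hypothetical callable (`formatToolError` and its wording are my own); it only reads the exception's message, so it can be tried with any exception type.

```python
def formatToolError(error: Exception) -> str:
    """Turn a tool failure into an instruction the LLM can act on."""
    # str(error) is the message passed to ToolException(...), e.g. "Anime not found"
    message = str(error) or "unknown error"
    return (
        f"Tool error: {message}. "
        "Check the spelling of the anime name or ask the user to clarify."
    )


# With the notebook's tool this would be wired up as:
#     id_lookup_tool.handle_tool_error = formatToolError
print(formatToolError(ValueError("Anime not found")))
```

Compared with `True`, a callable like this lets the observation nudge the LLM towards a recovery step rather than just reporting the failure.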

### Remaining functionality

The remaining capabilities are less trivial, since each one requires the agent to call the Jikan API. LangChain supports three main ways of doing this:
1. **OpenAPI** - If the API follows the OpenAPI spec, we can use the OpenAPI toolkit to parse the API's OpenAPI specification and provide the LLM with the available tools directly. Refer to: https://python.langchain.com/docs/use_cases/apis and https://python.langchain.com/docs/integrations/toolkits/openapi
2. **Non-OpenAPI** - If the API doesn't follow the OpenAPI spec, we can write our own documentation for the API and use it to construct the chain. As far as I know there is no standardised way of doing this, so it would require examining the examples in the LangChain docs and some trial and error. Refer to: https://python.langchain.com/docs/use_cases/apis
3. **Custom Tools** - We define a custom tool for each required endpoint, and the tool fetches the API's response under the hood.

Thankfully the Jikan API provides an OpenAPI spec, so approaches 1 and 3 are both viable. I experiment with both below to decide which is best suited to the final chatbot.
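
An OpenAPI spec describes each endpoint as a path holding one or more HTTP operations, each with an `operationId`, a description and parameters; this is the information the OpenAPI toolkit exposes to the LLM. The sketch below walks a tiny hand-written fragment in OpenAPI 3 shape (not the real Jikan spec, whose entries are far more detailed) and lists its operations. `listOperations` is a hypothetical helper, not LangChain's `reduce_openapi_spec`.

```python
from typing import Any

# HTTP methods that may appear as operations in an OpenAPI Path Item
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def listOperations(spec: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (METHOD, path, operationId) for every operation in an OpenAPI spec dict."""
    operations = []
    for path, pathItem in spec.get("paths", {}).items():
        for method, operation in pathItem.items():
            # Path items may also hold non-operation keys such as "parameters"
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path, operation.get("operationId", "")))
    return operations


# Hand-written fragment; values are illustrative
miniSpec = {
    "openapi": "3.0.0",
    "paths": {
        "/anime/{id}": {
            "get": {"operationId": "getAnimeById", "parameters": [{"name": "id", "in": "path"}]},
        },
        "/anime/{id}/episodes": {
            "parameters": [{"name": "id", "in": "path"}],
            "get": {"operationId": "getAnimeEpisodes"},
        },
    },
}

for op in listOperations(miniSpec):
    print(*op)
```

In the planner log further down this notebook, the `api_planner` step emits a plan line of exactly this form (`GET /anime/{id} ...`) before `api_controller` executes it.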

### OpenAPI

In [9]:
from langchain.agents.agent_toolkits.openapi.spec import reduce_openapi_spec
from langchain.agents.agent_toolkits.openapi import planner
from langchain.requests import RequestsWrapper

key: str = os.getenv("OPENAI_API_KEY")
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0, openai_api_key=key)
requests_wrapper = RequestsWrapper()

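# The Jikan spec is published as JSON; JSON is (for practical purposes) valid YAML,
# so the downloaded file can be parsed with the YAML loader below.
# from urllib.request import urlretrieve
# urlretrieve("https://raw.githubusercontent.com/jikan-me/jikan-rest/master/storage/api-docs/api-docs.json","jikan_openapi.yaml")

with open("jikan_openapi.yaml", encoding="utf-8") as f:
    # safe_load is sufficient for plain JSON/YAML data and avoids arbitrary object construction
    raw_jikan_api_spec = yaml.safe_load(f)
# Condense the spec (endpoints, descriptions, parameters) so it fits in the LLM's context
jikan_api_spec = reduce_openapi_spec(raw_jikan_api_spec)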

jikan_agent = planner.create_openapi_agent(jikan_api_spec, requests_wrapper, llm)

# Using simple OpenAPI agent
# NOTE: This agent does not have access to the id lookup tool, so the id must be included in the prompt

user_query = (
    "Tell me the plot of the anime Cowboy Bepop. It has an id of 1"
)
jikan_agent.run(user_query)

# OpenAPI agent + ID lookup tool
# Here the OpenAPI agent is used as a tool alongside the id lookup tool. Notice that the prompt no
# longer needs to include the anime ID. Instead, the main LLM is used as an orchestrator, and is instructed
# to pass the id retrieved using the id lookup tool to the OpenAPI agent.

jikan_tool = Tool.from_function(
    func=jikan_agent.run,
    name="anime_api_agent",
    description="useful for when you need to answer anime related queries. The input to this tool should be in the form '[query] anime id: [id]'",
    handle_tool_error=True,
)

id_lookup_tool = AnimeIDLookupTool()
id_lookup_tool.handle_tool_error = True

tools = [id_lookup_tool, jikan_tool]
agent_executor = initialize_agent(
    tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True
)

result = agent_executor({"input": "Tell me the plot of the anime Cowboy Bepop"})



> Entering new AgentExecutor chain...

Invoking: `anime_id_lookup` with `{'query': 'Cowboy Bebop'}`

1
Invoking: `anime_api_agent` with `plot anime id: 1`


> Entering new AgentExecutor chain...
Action: api_planner
Action Input: "plot anime id: 1"
Observation: 1. GET /anime/{id} to retrieve information about the anime with id 1.
Thought:I'm ready to execute the API calls.
Action: api_controller
Action Input: GET /anime/1

> Entering new AgentExecutor chain...
I need to make a GET request to the /anime/1 endpoint to retrieve information about the anime with id 1. I will use the requests_get tool to execute this API call.

Action: requests_get
Action Input:
{
  "url": "https://api.jikan.moe/v4/anime/1",
  "params": {},
  "output_instructions": {}
}

Observation: Title: Cowboy Bebop
Type: TV
Status: Finished Airing
Episodes: 26
Ai

### Custom Tools

#### Utils

In [25]:
def parseNames(objList: list[dict]) -> str:
    # Join the 'name' field of each entry (licensors, studios, producers, genres)
    return ", ".join(obj['name'] for obj in objList)

def getAnimeInfo(animeId: int) -> str:
    url = 'https://api.jikan.moe/v4/anime/{id}'.format(id=animeId)
    response = requests.get(url)
    response.raise_for_status()
    response_json = response.json()
    data = response_json['data']
    
    # Jikan's 'duration' already includes its unit (e.g. "24 min per ep"),
    # so no suffix is appended after {duration}.
    data_parsed = """
    TITLE: {title}
    URL: {url}
    TYPE: {type}
    STATUS: {status}
    NUM_EPISODES: {num_episodes}
    AIRING: {airing}
    AIRED: {from_} to {to}
    DURATION: {duration}
    RATING: {rating}
    SCORE: {score}
    SCORED BY: {scored_by}
    RANK: {rank}
    SYNOPSIS: {synopsis}
    BACKGROUND: {background}
    LICENSORS: {licensors}
    ANIMATION STUDIOS [NOT A PRODUCER]: {studios}
    PRODUCERS: {producers}
    GENRES: {genres}
    """.format(title=data['title'], 
               url=data['url'], 
               type=data['type'], 
               status=data['status'], 
               num_episodes=data['episodes'],
               airing=data['airing'],
               from_=data['aired']['from'],
               to=data['aired']['to'],
               duration=data['duration'],
               rating=data['rating'],
               score=data['score'],
               scored_by=data['scored_by'],
               rank=data['rank'],
               synopsis=data['synopsis'],
               background=data['background'],
               licensors=parseNames(data['licensors']),
               studios=parseNames(data['studios']),
               producers=parseNames(data['producers']),
               genres=parseNames(data['genres'])
               )
    
    return data_parsed

print(getAnimeInfo(1))


    TITLE: Cowboy Bebop
    URL: https://myanimelist.net/anime/1/Cowboy_Bebop
    TYPE: TV
    STATUS: Finished Airing
    NUM_EPISODES: 26
    AIRING: False
    AIRED: 1998-04-03T00:00:00+00:00 to 1999-04-24T00:00:00+00:00
    DURATION: 24 min per ep min per ep
    RATING: R - 17+ (violence & profanity)
    SCORE: 8.75
    SCORED BY: 944288
    RANK: 45
    SYNOPSIS: Crime is timeless. By the year 2071, humanity has expanded across the galaxy, filling the surface of other planets with settlements like those on Earth. These new societies are plagued by murder, drug use, and theft, and intergalactic outlaws are hunted by a growing number of tough bounty hunters.

Spike Spiegel and Jet Black pursue criminals throughout space to make a humble living. Beneath his goofy and aloof demeanor, Spike is haunted by the weight of his violent past. Meanwhile, Jet manages his own troubled memories while taking care of Spike and the Bebop, their ship. The duo is joined by the beautiful con artist Fa
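
`getAnimeInfo` combines the HTTP call with formatting, and any field that Jikan returns as `null` reaches the LLM as the literal text `None`. A possible refinement, sketched below, is to move the formatting into a separate function that can be tested offline and that skips empty fields. `formatAnimeInfo` and the sample dict are hypothetical, and only a subset of the notebook's fields is shown.

```python
from typing import Any


def formatAnimeInfo(data: dict[str, Any]) -> str:
    """Format a Jikan /anime/{id} 'data' object, omitting fields that are missing or null."""
    aired = data.get("aired") or {}
    fields: list[tuple[str, Any]] = [
        ("TITLE", data.get("title")),
        ("TYPE", data.get("type")),
        ("STATUS", data.get("status")),
        ("NUM_EPISODES", data.get("episodes")),
        ("AIRED FROM", aired.get("from")),
        ("AIRED TO", aired.get("to")),
        ("DURATION", data.get("duration")),
        ("SCORE", data.get("score")),
        # An empty genre list becomes None so the line is skipped
        ("GENRES", ", ".join(g["name"] for g in data.get("genres") or []) or None),
        ("BACKGROUND", data.get("background")),
    ]
    # Keep only fields with a value so the LLM never sees a literal "None"
    return "\n".join(f"{label}: {value}" for label, value in fields if value is not None)


# Illustrative sample shaped like a show that is still airing
sample = {
    "title": "Example Show",
    "type": "TV",
    "status": "Currently Airing",
    "episodes": None,
    "aired": {"from": "2024-01-01T00:00:00+00:00", "to": None},
    "duration": "24 min per ep",
    "score": 7.5,
    "genres": [{"name": "Action"}, {"name": "Comedy"}],
    "background": None,
}

print(formatAnimeInfo(sample))
```

Inside the notebook this would be called as `formatAnimeInfo(response.json()['data'])`, leaving `getAnimeInfo` responsible only for the request.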

#### Tools

In [23]:
class JikanInputSchema(BaseModel):
    id: int = Field(description="anime id")


class AnimeInfoLookupTool(BaseTool):
    name: str = "anime_info_lookup"
    description: str = "Useful for when you need to get general information on an anime"
    args_schema: Type[JikanInputSchema] = JikanInputSchema

    def _run(
        self,
        id: int,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        """Fetch and format general information for the anime with MAL id `id`."""
        try:
            animeInfo = getAnimeInfo(id)
            return animeInfo
        except Exception as e:
            raise ToolException(str(e)) from e

    async def _arun(
        self,
        id: int,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> str:
        """Use the tool asynchronously."""
        raise NotImplementedError("anime_info_lookup does not support async")
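
The remaining capabilities can follow the same pattern: a utils function that condenses the endpoint's response, wrapped in a `BaseTool`. As an example for `getAnimeCharacters`, the sketch below assumes each entry in the response's `data` list has a `character` (with a `name`), a `role` and a list of `voice_actors` (each with a `person.name` and a `language`); that shape should be checked against the Jikan spec. `parseCharacters`, its `language` and `limit` parameters, and the sample data (placeholder voice actor names) are hypothetical.

```python
from typing import Any


def parseCharacters(entries: list[dict[str, Any]], language: str = "Japanese", limit: int = 10) -> str:
    """Summarise character entries as 'name (role) - voice actor' lines."""
    lines = []
    # Cap the number of characters so long casts don't flood the LLM's context
    for entry in entries[:limit]:
        name = entry["character"]["name"]
        role = entry.get("role", "Unknown")
        # Keep only voice actors for the requested dub language
        actors = [
            va["person"]["name"]
            for va in entry.get("voice_actors", [])
            if va.get("language") == language
        ]
        actorText = ", ".join(actors) if actors else f"no {language} voice actor listed"
        lines.append(f"{name} ({role}) - {actorText}")
    return "\n".join(lines)


# Illustrative sample in the assumed response shape; voice actor names are placeholders
sampleEntries = [
    {
        "character": {"name": "Spiegel, Spike"},
        "role": "Main",
        "voice_actors": [
            {"person": {"name": "Placeholder, Japanese VA"}, "language": "Japanese"},
            {"person": {"name": "Placeholder, English VA"}, "language": "English"},
        ],
    },
    {"character": {"name": "Black, Jet"}, "role": "Main", "voice_actors": []},
]

print(parseCharacters(sampleEntries))
```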

#### Test

In [21]:
key: str = os.getenv("OPENAI_API_KEY")
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0, openai_api_key=key)

id_lookup_tool = AnimeIDLookupTool()
id_lookup_tool.handle_tool_error = True

info_lookup_tool = AnimeInfoLookupTool()
info_lookup_tool.handle_tool_error = True

tools = [id_lookup_tool, info_lookup_tool]
agent_executor = initialize_agent(
    tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True
)

result = agent_executor({"input": "Do you think I should watch Cowboy Bepop?"})



> Entering new AgentExecutor chain...

Invoking: `anime_id_lookup` with `{'query': 'Cowboy Bebop'}`

1
Invoking: `anime_info_lookup` with `{'id': 1}`

    TITLE: Cowboy Bebop
    URL: https://myanimelist.net/anime/1/Cowboy_Bebop
    TYPE: TV
    STATUS: Finished Airing
    NUM_EPISODES: 26
    AIRING: False
    AIRED: 1998-04-03T00:00:00+00:00 to 1999-04-24T00:00:00+00:00
    DURATION: 24 min per ep min per ep
    RATING: R - 17+ (violence & profanity)
    SCORE: 8.75
    SCORED BY: 944288
    RANK: 45
    SYNOPSIS: Crime is timeless. By the year 2071, humanity has expanded across the galaxy, filling the surface of other planets with settlements like those on Earth. These new societies are plagued by murder, drug use, and theft, and intergalactic outlaws are hunted by a growing number of tough bounty hunters.

Spike Spiegel and Jet Black pursue criminals throughout space to make a humble living. Beneath his go

## Agent Construction