# Implementing an LLM-powered recommendation system

## Data Preprocessing

In [1]:
import pandas as pd

anime = pd.read_csv('data/anime_with_synopsis.csv')
anime.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ..."
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0..."
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...


In [2]:
anime = anime.dropna()

In [4]:

anime['combined_info'] = anime.apply(lambda row: f"Title: {row['Name']}. Overview: {row['sypnopsis']} Genres: {row['Genres']}", axis=1)
anime['combined_info'][0]



"Title: Monster. Overview: Dr. Kenzou Tenma, an elite neurosurgeon recently engaged to his hospital director's daughter, is well on his way to ascending the hospital hierarchy. That is until one night, a seemingly small event changes Dr. Tenma's life forever. While preparing to perform surgery on someone, he gets a call from the hospital director telling him to switch patients and instead perform life-saving brain surgery on a famous performer. His fellow doctors, fiancée, and the hospital director applaud his accomplishment; but because of the switch, a poor immigrant worker is dead, causing Dr. Tenma to have a crisis of conscience. So when a similar situation arises, Dr. Tenma stands his ground and chooses to perform surgery on the young boy Johan Liebert instead of the town's mayor. Unfortunately, this choice leads to serious ramifications for Dr. Tenma—losing his social standing being one of them. However, with the mysterious death of the director and two other doctors, Dr. Tenma's

## Embeddings

In [5]:
# imports
import pandas as pd
import tiktoken
import os
import openai



# embedding model parameters
embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"  # this is the encoding for text-embedding-ada-002
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191

encoding = tiktoken.get_encoding(embedding_encoding)

# omit entries whose combined text is too long to embed
anime["n_tokens"] = anime.combined_info.apply(lambda x: len(encoding.encode(x)))
anime = anime[anime.n_tokens <= max_tokens]
len(anime)

16206
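
The cell above keeps only the rows whose token count fits within the embedding model's limit. As a minimal sketch of the same filter on plain Python lists, the snippet below uses a hypothetical whitespace tokenizer, `count_tokens`, as a stand-in for `tiktoken` (real token counts will differ) and a deliberately tiny `max_tokens` so that one toy text is dropped.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in for len(encoding.encode(text)): counts whitespace-separated words."""
    return len(text.split())


def filter_by_token_budget(texts, max_tokens):
    """Split texts into (kept, dropped) depending on whether they fit the token budget."""
    kept, dropped = [], []
    for text in texts:
        if count_tokens(text) <= max_tokens:
            kept.append(text)
        else:
            dropped.append(text)
    return kept, dropped


# Toy inputs (not taken from the dataset): one short entry, one long entry.
sample_texts = [
    "Title: A. Overview: short synopsis. Genres: Action",
    "Title: B. Overview: " + "word " * 20 + "Genres: Drama",
]
kept, dropped = filter_by_token_budget(sample_texts, max_tokens=10)
print(len(kept), len(dropped))
```

Returning the dropped texts as well makes it easy to inspect what was excluded, which the notebook's in-place filter does not show.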

In [6]:
anime.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis,combined_info,n_tokens
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever...",Title: Cowboy Bebop. Overview: In the year 207...,245
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ...",Title: Cowboy Bebop: Tengoku no Tobira. Overvi...,199
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0...",Title: Trigun. Overview: Vash the Stampede is ...,252
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...,Title: Witch Hunter Robin. Overview: ches are ...,125
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...,Title: Bouken Ou Beet. Overview: It is the dar...,188


In [9]:
import openai
from openai.embeddings_utils import get_embedding

openai.api_type = "azure"
openai.api_key = "xxx"
openai.api_base = "xxx"
openai.api_version = "2023-11-18"

In [None]:
# set the environment variables needed for openai package to know to reach out to azure
import os

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "xxx"
os.environ["OPENAI_API_KEY"] = "your AzureOpenAI key"
os.environ["OPENAI_API_VERSION"] = "2023-11-18"

In [8]:
anime["embedding"] = anime.combined_info.apply(lambda x: get_embedding(x, engine=embedding_model))
anime.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis,combined_info,n_tokens,embedding
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever...",Title: Cowboy Bebop. Overview: In the year 207...,245,"[0.00921056978404522, -0.012633174657821655, 0..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ...",Title: Cowboy Bebop: Tengoku no Tobira. Overvi...,199,"[-0.008109764195978642, -0.028518257662653923,..."
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0...",Title: Trigun. Overview: Vash the Stampede is ...,252,"[0.0019446373917162418, -0.001545737381093204,..."
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...,Title: Witch Hunter Robin. Overview: ches are ...,125,"[-0.014938411302864552, 0.007340028416365385, ..."
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...,Title: Bouken Ou Beet. Overview: It is the dar...,188,"[0.010889030061662197, 0.0069219209253787994, ..."
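
The embedding cell calls the model once per row, which can be slow and costly if the cell is re-run. One possible mitigation, sketched here under assumptions, is to memoize embeddings by input text. `fake_embed` is a hypothetical placeholder that derives a small vector from a hash; a real setup would pass a function that calls the embedding API instead.

```python
import hashlib


def fake_embed(text: str) -> list:
    """Hypothetical embedding function: derives 4 floats from a hash instead of calling an API."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:4]]


class EmbeddingCache:
    """Memoize embeddings by text so each distinct input is embedded only once."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}
        self.calls = 0  # number of times embed_fn was actually invoked

    def get(self, text):
        if text not in self.store:
            self.calls += 1
            self.store[text] = self.embed_fn(text)
        return self.store[text]


cache = EmbeddingCache(fake_embed)
vectors = [cache.get(t) for t in ["Title: A", "Title: B", "Title: A"]]
print(cache.calls)  # 2: the repeated text is served from the cache
```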


In [9]:
# rename to the column names that LanceDB and the LangChain wrapper look for by default
anime.rename(columns={'embedding': 'vector', 'combined_info': 'text'}, inplace=True)
anime.to_pickle('data/anime.pkl')

## Start working with LLMs

In [2]:
from langchain.vectorstores import LanceDB

In [3]:
import pandas as pd

anime = pd.read_pickle('data/anime.pkl')

anime.head(2)

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis,text,n_tokens,vector
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever...",Title: Cowboy Bebop. Overview: In the year 207...,245,"[0.00921056978404522, -0.012633174657821655, 0..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ...",Title: Cowboy Bebop: Tengoku no Tobira. Overvi...,199,"[-0.008109764195978642, -0.028518257662653923,..."


In [4]:
anime['text'][0]

'Title: Cowboy Bebop. Overview: In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now uninhabitable surface of planet Earth behind. The Inter Solar System Police attempts to keep peace in the galaxy, aided in part by outlaw bounty hunters, referred to as "Cowboys." The ragtag team aboard the spaceship Bebop are two such individuals. Mellow and carefree Spike Spiegel is balanced by his boisterous, pragmatic partner Jet Black as the pair makes a living chasing bounties and collecting rewards. Thrown off course by the addition of new members that they meet in their travels—Ein, a genetically engineered, highly intelligent Welsh Corgi; femme fatale Faye Valentine, an enigmatic trickster with memory loss; and the strange computer whiz kid Edward Wong—the crew embarks on thrilling adventures that unravel each member\'s dark and mysterious past little by little. Well-balanced with high density action and light-hearted comedy, Cowboy Bebo

In [5]:
import lancedb

uri = "dataset/sample-anime-lancedb"
db = lancedb.connect(uri)
table = db.create_table("anime", anime)

In [14]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import LanceDB
from langchain.chains import RetrievalQA
import os

embeddings = OpenAIEmbeddings(engine="text-embedding-ada-002", openai_api_key=openai.api_key)

docsearch = LanceDB(connection = table, embedding = embeddings)


                    engine was transferred to model_kwargs.
                    Please confirm that engine is what you intended.


In [16]:
query = "I'm looking for an animated action movie. What could you suggest to me?"
docs = docsearch.similarity_search(query, k=1)
docs
# docs[0].page_content

[Document(page_content='Title: Marin X. Overview: Korean animated movie mixing Mecha with anti-communist themes. Genres: Action, Adventure, Mecha, Sci-Fi, Shounen', metadata={'MAL_ID': 16784, 'Name': 'Marin X', 'Score': 'Unknown', 'Genres': 'Action, Adventure, Mecha, Sci-Fi, Shounen', 'sypnopsis': 'Korean animated movie mixing Mecha with anti-communist themes.', 'n_tokens': 35, 'vector': array([-0.01003191,  0.00912034,  0.0071506 , ...,  0.01442947,
        -0.00128262, -0.00605121], dtype=float32), '_distance': 0.3988204002380371})]
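
The `_distance` field in the metadata above reflects how far the stored vector is from the query vector; the search returns the closest rows. A minimal brute-force sketch of that idea follows, assuming Euclidean distance and toy two-dimensional vectors. The metric and index actually used by the vector store may differ, and the names `l2_distance`, `top_k`, and `toy_rows` are illustrative only.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, rows, k=1):
    """rows is a list of (text, vector) pairs; return the k nearest as (distance, text)."""
    scored = sorted((l2_distance(query_vec, vec), text) for text, vec in rows)
    return scored[:k]


# Toy data, not real embeddings.
toy_rows = [
    ("Title: Space Western", [1.0, 0.0]),
    ("Title: Mecha Movie", [0.0, 1.0]),
    ("Title: Slice of Life", [-1.0, 0.0]),
]
nearest = top_k([0.1, 0.9], toy_rows, k=1)
print(nearest[0][1])  # Title: Mecha Movie
```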

In [22]:
# Import Azure OpenAI
from langchain.llms import AzureOpenAI

# reuse the key and API version configured on the openai module
qa = RetrievalQA.from_chain_type(llm=AzureOpenAI(deployment_name="text-davinci-003", 
model_name="text-davinci-003", openai_api_key=openai.api_key, openai_api_version=openai.api_version), 
chain_type="stuff", retriever=docsearch.as_retriever(), return_source_documents=True)

query = "I'm looking for an action anime. What could you suggest to me?"
result = qa({"query": query})
result['result']

' You could watch Kaitouranma The Animation, Kikaider 01 The Animation, SMAnime, or De:vadasy.'
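
With `chain_type="stuff"`, the retrieved documents are concatenated into a single context and placed into one prompt for the model. The sketch below illustrates that step in plain Python. The template wording, the helper name `build_stuff_prompt`, and the toy documents are assumptions for illustration, not LangChain's exact internals.

```python
def build_stuff_prompt(question, documents, template):
    """Join all retrieved documents into one context block and fill the template."""
    context = "\n\n".join(documents)
    return template.format(context=context, question=question)


# Illustrative template; the default used by the library may be worded differently.
toy_template = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)
# Toy documents standing in for the retriever's results.
toy_docs = [
    "Title: Example A. Overview: A swordsman protects a town. Genres: Action",
    "Title: Example B. Overview: A robot fights crime. Genres: Action, Sci-Fi",
]
prompt = build_stuff_prompt("Can you suggest an action anime?", toy_docs, toy_template)
print(prompt)
```

Because every retrieved document goes into the same prompt, the number of documents the retriever returns directly affects prompt length.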

In [23]:
result['source_documents'][0]

Document(page_content="Title: Kaitouranma The Animation. Overview: Shinjuro may be a little young to be the master of a swordsmanship school, but fate didn't give him a choice. Now his position and martial arts skill land him in the middle of serious trouble when the rogue samurai Mikage faces off against the Tokugawa government, and Shinjuro and the small group of swordsmen who train at the school must now protect the town from harm. However, the group won't be challenged until the Tokugawa house calls in a favor, and the young swordsman is forced to duel Mikage himself. (Source: AniDB) Genres: Action, Adventure, Martial Arts, Samurai", metadata={'MAL_ID': 2483, 'Name': 'Kaitouranma The Animation', 'Score': '5.99', 'Genres': 'Action, Adventure, Martial Arts, Samurai', 'sypnopsis': "Shinjuro may be a little young to be the master of a swordsmanship school, but fate didn't give him a choice. Now his position and martial arts skill land him in the middle of serious trouble when the rogue

In [25]:
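# Intended pre-filter: keep only titles whose genre list contains 'Action'.
# NOTE: the result below still includes titles without the Action genre (e.g. 'Arashi no Yoru ni'),
# so passing the DataFrame through search_kwargs does not appear to restrict retrieval.
df_filtered = anime[anime['Genres'].apply(lambda x: 'Action' in x)]
qa = RetrievalQA.from_chain_type(llm=AzureOpenAI(deployment_name="text-davinci-003", 
model_name="text-davinci-003", openai_api_key=openai.api_key, openai_api_version=openai.api_version), chain_type="stuff", 
    retriever=docsearch.as_retriever(search_kwargs={'data': df_filtered}), return_source_documents=True)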

query = "I'm looking for an anime with animals and an adventurous plot."
result = qa({"query": query})
result

{'query': "I'm looking for an anime with animals and an adventurous plot.",
 'result': ' Arashi no Yoru ni, Juuippiki no Neko to Ahoudori, Ookami wa Ookamida, and Daisetsusan no Yuusha Kibaou all feature animals and an adventurous plot.',
 'source_documents': [Document(page_content='Title: Arashi no Yoru ni. Overview: story about a goat and a wolf who become friends on a stormy night, and how they overcome differences and hardships. (Source: ANN) Genres: Adventure, Comedy, Drama, Fantasy', metadata={'MAL_ID': 1961, 'Name': 'Arashi no Yoru ni', 'Score': '7.7', 'Genres': 'Adventure, Comedy, Drama, Fantasy', 'sypnopsis': 'story about a goat and a wolf who become friends on a stormy night, and how they overcome differences and hardships. (Source: ANN)', 'n_tokens': 50, 'vector': array([-0.02380494, -0.01213636,  0.00808658, ...,  0.01916177,
          0.00842009,  0.00714668], dtype=float32), '_distance': 0.326760470867157}),
  Document(page_content='Title: Juuippiki no Neko to Ahoudori. O
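
The first source document above lists the genres Adventure, Comedy, Drama, Fantasy, so the Action filter was evidently not applied during retrieval. One way to make such a constraint explicit is to filter candidates on metadata first and rank only the survivors by distance. The following is a plain-Python sketch of that pattern on toy rows; `has_genre` and `filtered_search` are hypothetical helpers, not part of LanceDB or LangChain.

```python
def has_genre(genres: str, wanted: str) -> bool:
    """Exact match against the comma-separated genre list (avoids substring false positives)."""
    return wanted in [g.strip() for g in genres.split(",")]


def filtered_search(query_vec, rows, wanted_genre, k=2):
    """Keep rows with the wanted genre, then return the names of the k nearest by squared L2."""
    candidates = [r for r in rows if has_genre(r["genres"], wanted_genre)]

    def squared_distance(row):
        return sum((x - y) ** 2 for x, y in zip(query_vec, row["vector"]))

    return [r["name"] for r in sorted(candidates, key=squared_distance)[:k]]


# Toy rows, not taken from the dataset.
toy_rows = [
    {"name": "Toy A", "genres": "Adventure, Comedy", "vector": [0.0, 1.0]},
    {"name": "Toy B", "genres": "Action, Adventure", "vector": [0.2, 0.8]},
    {"name": "Toy C", "genres": "Action", "vector": [1.0, 0.0]},
]
# Toy A is the closest vector overall, but it is excluded by the genre filter.
print(filtered_search([0.0, 1.0], toy_rows, "Action"))  # ['Toy B', 'Toy C']
```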

## Prompt engineering

In [32]:
from langchain.prompts import PromptTemplate

template = """You are a movie recommender system that helps users find anime that match their preferences. 
Use the following pieces of context to answer the question at the end. 
For each question, suggest three anime, with a short description of the plot and the reason why the user might like it.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Your response:"""


PROMPT = PromptTemplate(
    template=template, input_variables=["context", "question"])

chain_type_kwargs = {"prompt": PROMPT}

# reuse the key and API version configured on the openai module
llm = AzureOpenAI(deployment_name="text-davinci-003", 
model_name="text-davinci-003", openai_api_key=openai.api_key, openai_api_version=openai.api_version)

qa = RetrievalQA.from_chain_type(llm=llm, 
    chain_type="stuff", 
    retriever=docsearch.as_retriever(),
    return_source_documents=True, 
    chain_type_kwargs=chain_type_kwargs)

query = "I'm looking for an action anime with animals, any suggestions?"
result = qa({'query':query})
print(result['result'])



1. Urikupen Kyuujo-tai: This adventure comedy follows a team of brave young animals that rescues others in peril. With a dog, a boar, a deer, a koala, a mouse, a seagull, and a lion, this show is sure to please those looking for an action anime featuring animals.

2. Nekketsu Jinmen Inu: Life Is Movie: This parody follows a passionate human-faced dog NEET/would be detective in his adventures. Fans of action anime with animals are sure to be engaged by this mystery story.

3. Daisetsusan no Yuusha Kibaou: The main character of this drama is Fang, who was born to a hunting dog and a circus-runaway European wolf. Fang returns from the circus to face his foe, a giant brown bear which killed his family, making this story a great pick for those looking for an action anime with animals.


In [39]:
from langchain.prompts import PromptTemplate

template_prefix = """You are a movie recommender system that helps users find anime that match their preferences. 
Use the following pieces of context to answer the question at the end. 
For each question, take into account the context and the personal information provided by the user.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}"""

user_info = """This is what we know about the user, and you can use this information to better tune your research:
Age: {age}
Gender: {gender}"""

template_suffix = """Question: {question}
Your response:"""

# fill in only the user fields; {context} and {question} stay as placeholders for the chain
user_info = user_info.format(age = 18, gender = 'female')

COMBINED_PROMPT = template_prefix +'\n'+ user_info +'\n'+ template_suffix
print(COMBINED_PROMPT)


You are a movie recommender system that helps users find anime that match their preferences. 
Use the following pieces of context to answer the question at the end. 
For each question, take into account the context and the personal information provided by the user.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}
This is what we know about the user, and you can use this information to better tune your research:
Age: 18
Gender: female
Question: {question}
Your response:
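
The printed prompt shows the key idea: user attributes are filled in immediately, while `{context}` and `{question}` remain as placeholders for a later formatting step. A small sketch of a reusable helper follows. The function name `build_user_prompt` and the shortened prefix are assumptions for illustration, and the brace escaping is an added precaution so that user-supplied values containing `{` or `}` do not break the second formatting pass.

```python
def escape_braces(value) -> str:
    """Double any braces so the value survives a later str.format call unchanged."""
    return str(value).replace("{", "{{").replace("}", "}}")


def build_user_prompt(age, gender) -> str:
    """Fill in user fields now; leave {context} and {question} for the chain to fill later."""
    prefix = "Use the following pieces of context to answer the question at the end.\n\n{context}"
    user_info = "Age: {age}\nGender: {gender}".format(
        age=escape_braces(age), gender=escape_braces(gender)
    )
    suffix = "Question: {question}\nYour response:"
    return "\n".join([prefix, user_info, suffix])


combined = build_user_prompt(18, "female")
print(combined)

# Second pass, standing in for what the chain does at query time.
final_prompt = combined.format(context="Title: Example. Genres: Action", question="Any suggestions?")
print(final_prompt)
```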


In [40]:
PROMPT = PromptTemplate(
    template=COMBINED_PROMPT, input_variables=["context", "question"])

chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(llm=llm, 
    chain_type="stuff", 
    retriever=docsearch.as_retriever(),
    return_source_documents=True, 
    chain_type_kwargs=chain_type_kwargs)

query = "I'm looking for an action anime with animals, any suggestions?"
result = qa({'query':query})
print(result['result'])



 Based on the information you provided, I suggest Urikupen Kyuujo-tai. It is an adventure, comedy and kids anime about four young animals, a rabbit, squirrel, bear and penguin, who form a team to rescue others in peril. It is a limited broadcast show on some American local TV stations for the Japanese community. Another suggestion is Daisetsusan no Yuusha Kibaou, an adventure and drama anime about a wolf born to a hunting dog and European wolf. It follows Fang as he returns from the circus to face a giant brown bear that killed his family. Lastly, I suggest Ookami wa Ookamida, a fantasy anime about village animals struggling against a tyrannical wolf.


In [41]:
result['source_documents']

[Document(page_content='Title: Urikupen Kyuujo-tai. Overview: Four young animals, rabbit Seitaro Usagi, squirrel Risu, bear Kuma, and penguin Penguin (U-Ri-Ku-Pen), are part of a team of brave young animals that rescues others in peril. Other team members included a dog, a boar, a deer, a koala, a mouse, a seagull, and a lion. The animals would win a prize for completing a mission, as would the viewers, who were encouraged to write in and guess which of the creatures would save the world by each Friday (a single mission stretched over a week of TV). Created by Mitsuru Kaneko, this show was given a limited broadcast on some American local TV stations for the Japanese community. (Source: The Anime Encyclopedia) Genres: Adventure, Comedy, Kids', metadata={'MAL_ID': 4598, 'Name': 'Urikupen Kyuujo-tai', 'Score': 'Unknown', 'Genres': 'Adventure, Comedy, Kids', 'sypnopsis': 'Four young animals, rabbit Seitaro Usagi, squirrel Risu, bear Kuma, and penguin Penguin (U-Ri-Ku-Pen), are part of a te