### Enhancing Recommendation Systems with Large Language Models (RAG - LangChain - OpenAI)

#### Import Libraries

In [7]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain streamlit lancedb openai pandas



In [8]:
import pandas as pd
import tiktoken
import lancedb
from openai import OpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI

from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

from langchain_community.callbacks import get_openai_callback
from langchain_community.vectorstores import LanceDB 

#### Data preprocessing

In [9]:
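import streamlit as st  # st.secrets is used below, so Streamlit must be imported first

# Read the key from Streamlit's secrets store (.streamlit/secrets.toml)
openai_api_key = st.secrets["OPENAI_API_KEY"]
client = OpenAI(api_key=openai_api_key)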
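
`st.secrets` only resolves when a Streamlit secrets file is available, which is not always the case inside a notebook. As a minimal sketch of an alternative, the helper below reads the key from an environment variable instead. The function name `load_api_key` is hypothetical and not part of the original notebook.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from an environment mapping, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key
```

With the variable exported in the shell, `openai_api_key = load_api_key()` could stand in for the `st.secrets` lookup.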

In [11]:
df = pd.read_csv('anime_with_synopsis.csv')

In [12]:
df.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ..."
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0..."
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...


In [13]:
df = df.dropna()

In [15]:
df['combined_info'] = df.apply(lambda row: f"Title: {row['Name']}. Overview: {row['sypnopsis']} Genres: {row['Genres']}", axis = 1)

In [21]:
df['combined_info'][0]

'Title: Cowboy Bebop. Overview: In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now uninhabitable surface of planet Earth behind. The Inter Solar System Police attempts to keep peace in the galaxy, aided in part by outlaw bounty hunters, referred to as "Cowboys." The ragtag team aboard the spaceship Bebop are two such individuals. Mellow and carefree Spike Spiegel is balanced by his boisterous, pragmatic partner Jet Black as the pair makes a living chasing bounties and collecting rewards. Thrown off course by the addition of new members that they meet in their travels—Ein, a genetically engineered, highly intelligent Welsh Corgi; femme fatale Faye Valentine, an enigmatic trickster with memory loss; and the strange computer whiz kid Edward Wong—the crew embarks on thrilling adventures that unravel each member\'s dark and mysterious past little by little. Well-balanced with high density action and light-hearted comedy, Cowboy Bebo

#### Embeddings (OpenAIEmbeddings)

In [28]:
embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"  # this is the encoding for text-embedding-ada-002
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191

encoding = tiktoken.get_encoding(embedding_encoding)

# Omit descriptions that are too long to embed
df["n_tokens"] = df.combined_info.apply(lambda x: len(encoding.encode(x)))
df = df[df.n_tokens <= max_tokens]
len(df)

269

In [29]:
df.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis,combined_info,n_tokens
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever...",Title: Cowboy Bebop. Overview: In the year 207...,245
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ...",Title: Cowboy Bebop: Tengoku no Tobira. Overvi...,199
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0...",Title: Trigun. Overview: Vash the Stampede is ...,252
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...,Title: Witch Hunter Robin. Overview: ches are ...,125
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...,Title: Bouken Ou Beet. Overview: It is the dar...,188


In [30]:
def get_embedding(text, model=embedding_model):
    # Replace newlines with spaces before sending the text to the API
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# One API request per row, using the model selected above
df["embedding"] = df.combined_info.apply(lambda x: get_embedding(x, model=embedding_model))
df.head()

Unnamed: 0,MAL_ID,Name,Score,Genres,sypnopsis,combined_info,n_tokens,embedding
0,1,Cowboy Bebop,8.78,"Action, Adventure, Comedy, Drama, Sci-Fi, Space","In the year 2071, humanity has colonized sever...",Title: Cowboy Bebop. Overview: In the year 207...,245,"[0.011228950694203377, -0.04166668280959129, -..."
1,5,Cowboy Bebop: Tengoku no Tobira,8.39,"Action, Drama, Mystery, Sci-Fi, Space","other day, another bounty—such is the life of ...",Title: Cowboy Bebop: Tengoku no Tobira. Overvi...,199,"[0.005144437309354544, -0.03935664892196655, -..."
2,6,Trigun,8.24,"Action, Sci-Fi, Adventure, Comedy, Drama, Shounen","Vash the Stampede is the man with a $$60,000,0...",Title: Trigun. Overview: Vash the Stampede is ...,252,"[-0.04247792065143585, -0.027609333395957947, ..."
3,7,Witch Hunter Robin,7.27,"Action, Mystery, Police, Supernatural, Drama, ...",ches are individuals with special powers like ...,Title: Witch Hunter Robin. Overview: ches are ...,125,"[0.011457948014140129, -0.031876180320978165, ..."
4,8,Bouken Ou Beet,6.98,"Adventure, Fantasy, Shounen, Supernatural",It is the dark century and the people are suff...,Title: Bouken Ou Beet. Overview: It is the dar...,188,"[-0.01856212317943573, -0.048905372619628906, ..."
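
The cell above sends one request per row. The embeddings endpoint also accepts a list of inputs, so a batched variant can cut the number of requests. The following is a sketch only: `embed_in_batches` and its `embed_batch` parameter are hypothetical names, and the batch size of 100 is an arbitrary assumption rather than a documented limit.

```python
def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed `texts` in chunks, calling `embed_batch` once per chunk.

    `embed_batch` is any callable that takes a list of strings and returns
    one vector per string, in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors = []
    for start in range(0, len(texts), batch_size):
        # Flatten newlines, mirroring what get_embedding does per row
        chunk = [t.replace("\n", " ") for t in texts[start:start + batch_size]]
        result = embed_batch(chunk)
        if len(result) != len(chunk):
            raise ValueError("embed_batch returned a different number of vectors")
        vectors.extend(result)
    return vectors
```

In this notebook, `embed_batch` could wrap the existing client, for example `lambda chunk: [d.embedding for d in client.embeddings.create(input=chunk, model=embedding_model).data]`, assuming the response preserves input order.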


In [31]:
df.rename(columns={'embedding': 'vector'}, inplace=True)
df.rename(columns={'combined_info': 'text'}, inplace= True)
df.to_pickle('RecommnedationSystemRAG.pkl')

#### Working with LLMs

In [58]:
# `df` is the preprocessed DataFrame from the previous steps.
# Build a new DataFrame with the structure LanceDB expects.
lancedb_data = pd.DataFrame({
    'id': df['MAL_ID'],
    'text': df['text'],  # This was previously 'combined_info'
    'vector': df['vector'].tolist(),  # Ensure this is a list of lists
    'metadata': df.apply(lambda row: {
        'Name': row['Name'],
        'Score': row['Score'],
        'Genres': row['Genres'],
        'sypnopsis': row['sypnopsis']
    }, axis=1).tolist()
})

In [60]:
# Establish a connection to the LanceDB database
uri = "sample-anime-lancedb"
db = lancedb.connect(uri)

# Create the table; mode="overwrite" replaces it if it already exists,
# so the cell can be re-run without raising an error
table = db.create_table("anime", data=lancedb_data, mode="overwrite")

In [61]:
# embeddings = OpenAIEmbeddings(engine="text-embedding-ada-002")
embeddings = OpenAIEmbeddings(
    deployment="SL-document_embedder",
    model="text-embedding-ada-002",
    show_progress_bar=True,
    openai_api_key = openai_api_key
)

In [62]:
# Pass the LanceDBConnection object to LanceDB in LangChain
docsearch = LanceDB(connection=db, embedding=embeddings, table_name="anime")

# Initialize the language model
llm = ChatOpenAI(
    model_name="gpt-4",
    temperature=0,
    api_key=openai_api_key
)
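
Conceptually, the vector store ranks stored rows by how close their vectors are to the embedded query and returns the closest few. The brute-force sketch below illustrates that idea with cosine similarity on toy data. It is an illustration only: LanceDB's own distance metric and indexing may differ, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, rows, k=4):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, row["vector"]), row["text"]) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy rows shaped like the 'text' and 'vector' columns stored above
rows = [
    {"text": "A", "vector": [1.0, 0.0]},
    {"text": "B", "vector": [0.0, 1.0]},
    {"text": "C", "vector": [1.0, 1.0]},
]
best = top_k([1.0, 0.1], rows, k=2)
```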

#### Prompt engineering

In [63]:
# define custom prompt
template = """You are a recommender system that helps users find anime that match their preferences.
Use the following pieces of context to answer the question at the end.
For each question, suggest three anime, with a short description of the plot and the reason why the user might like it.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

{context}

Question: {question}
Your response:"""
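
To make the role of the two placeholders concrete, the sketch below fills a shortened copy of the template by hand, roughly as a "stuff"-style chain does: the retrieved texts are joined into one `{context}` string and the user query becomes `{question}`. The helper `stuff_prompt` and the `"\n\n"` separator are assumptions for illustration, not LangChain's actual implementation.

```python
# Shortened stand-in for the template above, with the same two placeholders
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Your response:"
)


def stuff_prompt(template, documents, question, separator="\n\n"):
    """Join all retrieved texts into one context block and fill the template."""
    context = separator.join(documents)
    return template.format(context=context, question=question)


prompt = stuff_prompt(
    TEMPLATE,
    ["Title: Trigun. Genres: Action", "Title: Akira. Genres: Action"],
    "I'm looking for an action anime.",
)
```

Because every retrieved document lands in a single prompt, the number and length of retrieved texts directly affect the token count of each request.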


In [64]:
PROMPT = PromptTemplate(
    template=template, input_variables=["context", "question"])

chain_type_kwargs = {"prompt": PROMPT}

qa_chain = RetrievalQA.from_chain_type(llm=llm,
                                       chain_type="stuff",
                                       retriever=docsearch.as_retriever(),
                                       return_source_documents=True,
                                       chain_type_kwargs=chain_type_kwargs)

query = "I'm looking for an action anime. What could you suggest to me?"

# Run the query; the callback records token usage for the call in `cb`
with get_openai_callback() as cb:
    result = qa_chain.invoke({"query": query})

print(result['result'])


1. "Bakuretsu Tenshi": This anime is set in a future where crime is rampant and citizens are allowed to carry firearms for self-defense. The story revolves around Kyohei Tachibana, a culinary student who gets involved with a group of mercenaries. This anime is filled with action, adventure, and a bit of comedy. If you enjoy stories with a mix of crime, action, and a bit of humor, this might be a good choice for you.

2. "Akira": This is a classic action anime set in a dystopian future. The story follows Shoutarou Kaneda, a leader of a biker gang, whose friend develops psychic abilities after an accident. The government tries to quarantine his friend to prevent further destruction. This anime is known for its intense action scenes, complex storyline, and its exploration of political and philosophical themes. If you enjoy action-packed stories with deep themes, you might like this anime.

3. "Full Metal Panic? Fumoffu": This anime combines action and comedy in a school setting. The story
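
Since the chain was built with `return_source_documents=True`, the result also carries the retrieved documents under `source_documents`, which helps check whether a recommendation is grounded in the retrieved context. The sketch below lists a short snippet per source. `FakeDocument` is a hypothetical stand-in that mimics only the `page_content` and `metadata` attributes, and `list_sources` is not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes the helper relies on."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def list_sources(result, max_chars=60):
    """Return one numbered, truncated snippet per retrieved source document."""
    lines = []
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        snippet = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(f"{i}. {snippet}")
    return lines


sample = {
    "result": "placeholder answer",
    "source_documents": [
        FakeDocument("Title: Akira. Overview: placeholder text"),
        FakeDocument("Title: Trigun. Overview: placeholder text"),
    ],
}
sources = list_sources(sample, max_chars=12)
```

With the real chain output, `list_sources(result)` could be printed next to `result['result']`.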

#### Streamlit app interface

In [65]:
pip install streamlit

Note: you may need to restart the kernel to use updated packages.


In [66]:
import streamlit as st

def main():
    st.title("Anime RAG Recommendation System")
    st.write("Welcome to the recommender! Tell me what kind of anime you're looking for.")

    user_input = st.text_input("What kind of anime are you interested in?")

    if st.button("Get Recommendations"):
        if user_input:
            with st.spinner("Generating recommendations..."):
                result = qa_chain.invoke({"query": user_input})
                st.write(result['result'])
        else:
            st.warning("Please enter your anime preferences.")
            
if __name__ == "__main__":
    main()

2024-08-18 17:34:13.908 
  command:

    streamlit run /Applications/anaconda3/lib/python3.11/site-packages/ipykernel_launcher.py [ARGUMENTS]
2024-08-18 17:34:13.909 Session state does not function when running a script without `streamlit run`
