In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.0


In [3]:
!cd /content/drive/MyDrive/Deepak/AI/AIMakerSpace/LLM-Ops-Cohort-1-main/Week-1/Thursday/

In [4]:
from dotenv import load_dotenv

# Load variables from .env file
load_dotenv('/content/drive/MyDrive/Deepak/AI/AIMakerSpace/.env')

True

# Questioning Barbie and Oppenheimer Through the Use of Agents

In the following notebook we will build an application that queries both the Barbie and Oppenheimer movies Wikipedia pages, as well as their reviews.

The main focus of this notebook is to showcase a brief introduction to Agents.

## Build 🏗️

There are 3 main tasks in this notebook:

1. Contruct a Barbie retriever
2. Construct an Oppenheimer retriever
3. Combine the two and allow users to query both resources from a single input through the use of Agents

## Ship 🚢

Based on Tuesday's session - construct a Chainlit (or Gradio) application that allows users to interface with the application.

## Share 🚀

Make a social media post about your final application.

### Dependencies

As always, let's start with some dependencies!

In [5]:
!pip install -q -U langchain openai

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.5/1.5 MB[0m [31m15.0 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m73.6/73.6 kB[0m [31m8.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.4/49.4 kB[0m [31m5.4 MB/s[0m eta [36m0:00:00[0m
[?25h

In [6]:
import getpass
import os
#openai_api_key = getpass.getpass("Enter your OpenAI API Key: ")
#os.environ["OPENAI_API_KEY"] = openai_api_key

### LLM

We will be leveraging OpenAI's `gpt-3.5-turbo` throughout the notebook, and we can keep it consistent throughout!

In [7]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo-0613", temperature = 0)

### Data Collection and Transformation

We'll be leveraging the `WikipediaLoader` tool to collect information from Wikipedia.

Be sure to set the `doc_content_chars_max` parameter so that you capture the *entire* Wikipedia article content.

In [8]:
!pip install -q -U wikipedia

  Preparing metadata (setup.py) ... [?25l[?25hdone
  Building wheel for wikipedia (setup.py) ... [?25l[?25hdone


In [9]:
from langchain.document_loaders import WikipediaLoader, CSVLoader
dir_path= "/content/drive/MyDrive/Deepak/AI/AIMakerSpace/LLM-Ops-Cohort-1-main/Week-1/Thursday"
barbie_wikipedia_docs = WikipediaLoader(
    query="Barbie (film)",
    load_max_docs= 1000, # YOUR CODE HERE,
    doc_content_chars_max=4000,### YOUR CODE HERE
    ).load()

barbie_csv_docs = CSVLoader(
    file_path=f"{dir_path}/barbie_data/barbie.csv",### YOUR CODE HERE,
    source_column="Review",### YOUR CODE HERE
    ).load()

Since we'll be using same format source documentation separated by topic, we can save ourselves some extra effort and set up our splitters once.

We're going to leverage the `RecursiveCharacterTextSplitter` again, this time paying close attention to the format our Wikipedia articles and reviews are in so we can be sure to chunk them appropritately.

> HINT: You can pass a list of separators when you intialize your `RecursiveTextSplitter`! They are acted on in order of element 0 -> element len(list).

RELEVANT DOCS:
- [`RecursiveCharacterTextSplitter`](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html#langchain.text_splitter.RecursiveCharacterTextSplitter)

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

wikipedia_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000, ### YOUR CODE HERE,
    chunk_overlap =50, ### YOUR CODE HERE,
    length_function = len,### YOUR CODE HERE,
    is_separator_regex= False,
    separators = ["\n==","\n", " "]### YOUR CODE HERE # keep headings, then paragraphs, then sentences
)

csv_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000, ### YOUR CODE HERE,
    chunk_overlap = 50,### YOUR CODE HERE,
    length_function = len,### YOUR CODE HERE,
    is_separator_regex= False,
    separators = ["\n", " "]### YOUR CODE HERE # keep paragraphs, then sentences
)

chunked_barbie_wikipedia_docs = wikipedia_text_splitter.transform_documents(barbie_wikipedia_docs) ### YOUR CODE HERE
chunked_barbie_csv_docs = csv_text_splitter.transform_documents(barbie_csv_docs)### YOUR CODE HERE

#### Retrieval and Embedding Strategy

We've already discussed the useful application of `CacheBackedEmbeddings`, so let's do it again!

RELEVANT DOCS:
- [`CacheBackedEmbeddings`](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain-embeddings-cache-cachebackedembeddings)

In [11]:
!pip install -q -U rank_bm25 tiktoken faiss-cpu

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m23.5 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m17.6/17.6 MB[0m [31m75.2 MB/s[0m eta [36m0:00:00[0m
[?25h

In [12]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# set up cached embeddings store
store = LocalFileStore("./cache/")### YOUR CODE HERE

core_embeddings_model = OpenAIEmbeddings()### YOUR CODE HERE

embedder = CacheBackedEmbeddings.from_bytes_store(core_embeddings_model, store, namespace=core_embeddings_model.model)### YOUR CODE HERE

We'll implement a `FAISS` vectorstore, and create a retriever from it.

In [13]:
barbie_csv_faiss_retriever = FAISS.from_documents(chunked_barbie_csv_docs, embedder).as_retriever()### YOUR CODE HERE


There are a number of excellent options to retrieve documents - we'll be looking at an additional example today, which is called the `EnsembleRetriever`.

The method this is using is outlined in [this paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).

The brief explanation is:

1. We collect results from two different retrieval methods over the same corpus
2. We apply a reranking algorithm to rerank our source documents to be the *most relevant* without losing specific or potentially low-ranked information rich documents
3. We feed the top-k results into the LLM with our query as context.

> HINT: Your weight list should be of type `List[float]` and the `sum(List[float])` should be `1`.

In [14]:
# set up BM25 retriever
barbie_wikipedia_bm25_retriever = BM25Retriever.from_documents(
    chunked_barbie_wikipedia_docs ### YOUR CODE HERE
)
barbie_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
barbie_wikipedia_faiss_store = FAISS.from_documents(
    chunked_barbie_wikipedia_docs, ### YOUR CODE HERE,
    embedder, ### YOUR CODE HERE
)
barbie_wikipedia_faiss_retriever = barbie_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
barbie_ensemble_retriever = EnsembleRetriever(
    retrievers=[barbie_wikipedia_bm25_retriever, barbie_wikipedia_faiss_retriever],### YOUR CODE HERE,
    weights= [0.25, 0.75]### YOUR CODE HERE # should sum to 1
)

#### Retrieval Agent

We can create a simple conversational retrieval Agent by using the built-ins provided by LangChain!

> HINT: Be sure to provide good natural language descriptions of what the tool should be used for to get the best results.

RELEVANT DOCS:
- [`create_retriever_tool`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool.html#langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool)

In [15]:
from langchain.agents.agent_toolkits import create_retriever_tool

barbie_wikipedia_retrieval_tool = create_retriever_tool(
    barbie_ensemble_retriever, ### YOUR CODE HERE
    "Wikipedia",### YOUR CODE HERE
    "retrieves information related to plot, history, cast etc. of the Barbie movie", ### YOUR CODE HERE
)

barbie_csv_retrieval_tool = create_retriever_tool(
    barbie_csv_faiss_retriever, ### YOUR CODE HERE
    "Reviews",### YOUR CODE HERE
    "retrieves public reviews of the Barbie movie",### YOUR CODE HERE
)

barbie_retriever_tools = [barbie_wikipedia_retrieval_tool, barbie_csv_retrieval_tool]### YOUR CODE HERE

Now that we've created our tools, we can combined them into an agent!

RELEVANT DOCS:
- [`create_conversational_retrieval_agent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent.html#langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent)

In [16]:
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent

barbie_retriever_agent_executor = create_conversational_retrieval_agent(llm, barbie_retriever_tools, verbose=True)# YOUR CODE HERE

In [17]:
barbie_retriever_agent_executor({"input" : "Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `Reviews` with `Barbie`


[0m[33;1m[1;3m[Document(page_content="for details and her ability to create a dreamlike world. Barbieland is a feminist utopia in which Barbies serve as presidents, Supreme Court members, doctors, physicists, Nobel prize winners, and cancer cures. The dolls are not structurally different from our world, but they are represented by political fantasy and decade-dependent what-ifs. Barbie has gone to space, could vote, and own property before many human women, and her appearance has changed to mirror society's beauty politics. The representational doll has become an extension of political fantasy, reflecting the beauty politics of society. Gerwig features a cast of Barbies, including Issa Rae, Emma Mackey, Alexandra Shipp, and Hari Nef. Stereotypical Barbie (Margot Robbie) is the protagonist, a blonde-haired, blue-eyed Barbie created by Ruth Handler. Her Ken counterpart, Ryan Gosling, is p

{'input': 'Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?', additional_kwargs={}, example=False),
  AIMessage(content='', additional_kwargs={'function_call': {'name': 'Reviews', 'arguments': '{\n  "__arg1": "Barbie"\n}'}}, example=False),
  FunctionMessage(content='[Document(page_content="for details and her ability to create a dreamlike world. Barbieland is a feminist utopia in which Barbies serve as presidents, Supreme Court members, doctors, physicists, Nobel prize winners, and cancer cures. The dolls are not structurally different from our world, but they are represented by political fantasy and decade-dependent what-ifs. Barbie has gone to space, could vote, and own property before many human women, and her appearance has changed to mirro

In [18]:
barbie_retriever_agent_executor({"input" : "What is a very quick summary of the plot of the Barbie movie?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `Wikipedia` with `Barbie`


[0m[36;1m[1;3m[Document(page_content='In 1997, Danish-Norwegian band Aqua released "Barbie Girl", a song about two fashion dolls manufactured by Mattel named Barbie and Ken. The song was a massive success, thrusting the band into worldwide fame. Per Stephen Thomas Erlewine, "Barbie Girl" was an "inexplicable pop culture phenomenon", and Bree Player of Marie Claire Australia wrote it helped solidify the Barbie doll as a cultural icon. However, Mattel reacted negatively to "Barbie Girl" and sued Aqua\'s US label, MCA Records, claiming the band violated their trademark and the lyrics\' sexually suggestive nature stained the company\'s reputation as a child-friendly brand.Since the beginning of her career, rapper Nicki Minaj has made the Barbie doll a significant part of her public persona. A widely known aspect of her artistry is her use of alter egos, one of whom is named Harajuku Barbi

{'input': 'What is a very quick summary of the plot of the Barbie movie?',
 'chat_history': [AIMessage(content="Based on the reviews, it seems that people had mixed opinions about the movie Barbie. Some viewers found the movie to be philosophical and appreciated its deeper themes and exploration of existential questions. They enjoyed the film's blend of fun and emotional depth, as well as its sharp and intelligent comments on social dynamics and gender roles. These viewers found the movie to be more than just a lighthearted comedy and appreciated its thought-provoking elements.\n\nHowever, there were also viewers who did not appreciate the philosophical aspects of the movie. They felt that the movie was dragged into the cultural war and used as a puppet by political militancy. These viewers may have found the philosophical elements to be unnecessary or distracting from the overall enjoyment of the film.\n\nOverall, it seems that the movie Barbie incorporates philosophical elements thro

### Oppenheimer Retrieval System

We're going to repourpose some of what we created previously, but this time we'll explore a different multi-source retrieval system.

In [19]:
oppenheimer_wikipedia_docs = WikipediaLoader(query="Oppenheimer", load_max_docs=1).load()### YOUR CODE HERE

oppenheimer_csv_docs = CSVLoader(file_path=f"{dir_path}/oppenheimer_data/oppenheimer.csv", source_column="Review_Url").load() ### YOUR CODE HERE

In [21]:
chunked_opp_wikipedia_docs = wikipedia_text_splitter.transform_documents(oppenheimer_wikipedia_docs) ### YOUR CODE HERE
chunked_opp_csv_docs = csv_text_splitter.transform_documents(oppenheimer_csv_docs) ### YOUR CODE HERE

In [22]:
opp_csv_faiss_retriever = FAISS.from_documents(chunked_opp_csv_docs, embedder).as_retriever() ### YOUR CODE HERE

# set up BM25 retriever
opp_wikipedia_bm25_retriever = BM25Retriever.from_documents(chunked_opp_wikipedia_docs)### YOUR CODE HERE
opp_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
opp_wikipedia_faiss_store = FAISS.from_documents(chunked_opp_wikipedia_docs, embedder)### YOUR CODE HERE
opp_wikipedia_faiss_retriever = opp_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
opp_ensemble_retriever = EnsembleRetriever(retrievers=[opp_wikipedia_bm25_retriever,opp_wikipedia_faiss_retriever], weights=[0.5, 0.5]) ### YOUR CODE HERE

#### Multi-source chain

We're going to allow the LLM to decide which information is most -> least valuable.

The way we'll do this is with LangChain's rather powerful "Expression Language"!

> HINT: You can leverage [this](https://python.langchain.com/docs/use_cases/question_answering/how_to/multiple_retrieval) resource if you get stuck - but experiment with different prompts/formats.

In [23]:
from langchain.prompts import ChatPromptTemplate

system_message = """Use the information from the below two sources to answer any questions.

Source 1: public user reviews about the Oppenheimer movie
<source1>
{source1}
</source1>

Source 2: the wikipedia page for the Oppenheimer movie including the plot summary, cast, and production information
<source2>
{source2}
</source2>
"""

prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])

In [24]:
oppenheimer_multisource_chain = {
    "source1": (lambda x: x["question"]) | opp_ensemble_retriever,
    "source2": (lambda x: x["question"]) | opp_csv_faiss_retriever,
    "question": lambda x: x["question"],
} | prompt | llm

In [25]:
oppenheimer_multisource_chain.invoke({"question" : "What did people think of the Oppenheimer movie?"})

AIMessage(content="Based on the public user reviews, opinions about the Oppenheimer movie are generally positive. Reviewers praised the film for its compelling storytelling, engaging plot, and the ability to make a complex character like J. Robert Oppenheimer easy to understand. The actors' performances, particularly Robert Downey's, were highly praised. The cinematography was also appreciated for its beautiful craftsmanship. Some reviewers mentioned that they had read the book prior to watching the film and still enjoyed it. However, there were also a few criticisms. One reviewer felt that the film lacked emotion and failed to capture the relationships and charisma of Oppenheimer as portrayed in the book. Another reviewer mentioned that the film lacked heart despite its visually stunning imagery and decent performances. Overall, the majority of reviewers found the Oppenheimer movie to be an enjoyable and thought-provoking experience.", additional_kwargs={}, example=False)

# Agent Creation

Now we can finally start building our Agent!

The first thing we'll need to do is provide our Agent a Toolbelt. (list of tools). Much like Batman, our LLM-powered Agent can use these tools as it sees fit.

While the examples we're constructing in this notebook are straightforward for brevity and simplicities sake - there is no limit to what you can build with Agents, as we'll see as we progress through the program.

So, let's begin by setting up our Tools!

You'll notice that we have to set up a function to allow our `OppenheimerInfo` tool to interface with the Agent - this is due to it have a specific required input. Creating custom tools is a pattern that you'll want to grow acustomed to as you use LangChain more and more.

In [26]:
from langchain.agents import Tool

def query_oppenheimer(input):
    return oppenheimer_multisource_chain.invoke({"question" : input})

tools = [
    Tool(
        name = "BarbieTool", ### YOUR CODE HERE,
        func= barbie_retriever_agent_executor.invoke, ### YOUR CODE HERE,
        description="good for anwering Barbie related questions." ### YOUR CODE HERE
    ),
    Tool(
        name = "OppenheimerTool", ### YOUR CODE HERE,
        func= query_oppenheimer, ### YOUR CODE HERE,
        description= "good for anwering Oppenheimer related questions."### YOUR CODE HERE
    ),
]

Now that we've set up our Agents toolbelt, let's set up the LLM that will be powering it!

I would suggest playing around with these prompts - and experiments to find what works best for you.

RELEVANT DOCS:
- [`ZeroShotAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.mrkl.base.ZeroShotAgent.html#langchain-agents-mrkl-base-zeroshotagent)

In [27]:
from langchain.agents import ZeroShotAgent, AgentExecutor

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools, ### YOUR CODE HERE
    prefix = prefix, ### YOUR CODE HERE
    suffix = suffix, ### YOUR CODE HERE
    input_variables=["input", "agent_scratchpad"] ### YOUR CODE HERE
)

In [28]:
from langchain import LLMChain

llm_chain = LLMChain(llm=llm, prompt=prompt)### YOUR CODE HERE

All that's left to do now is create our `ZeroShotAgent` and our `AgentExecutor`, which are the "reasoner" and "actor" halfs of the `ReAct` method of Agent implementation.

Read all about the `ReAct` framework [here](https://react-lm.github.io/)

In [29]:
barbenheimer_agent = ZeroShotAgent(
    llm_chain=llm_chain,### YOUR AGENTS,
    tools=tools,### YOUR AGENTS,
    verbose=True ### YOUR AGENTS
    )

barbenheimer_agent_chain = AgentExecutor.from_agent_and_tools(
    agent= barbenheimer_agent, ### YOUR AGENTS,
    tools=tools, ### YOUR AGENTS,
    verbose= True ### YOUR AGENTS
    )

## Conclusion

All that is left to do now, is feed inputs to your Agent and watch it work!

Remember to use the `{"input" : "YOUR QUERY HERE"}` format when prompting the Agent.

In [30]:
barbenheimer_agent_chain.invoke({"input" : "What did people like about the Barbie movie?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I'm not sure what people liked about the Barbie movie. I should use the BarbieTool to find out.
Action: BarbieTool
Action Input: "What did people like about the Barbie movie?"[0m

[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Invoking: `Reviews` with `Barbie`


[0m[33;1m[1;3m[Document(page_content="for details and her ability to create a dreamlike world. Barbieland is a feminist utopia in which Barbies serve as presidents, Supreme Court members, doctors, physicists, Nobel prize winners, and cancer cures. The dolls are not structurally different from our world, but they are represented by political fantasy and decade-dependent what-ifs. Barbie has gone to space, could vote, and own property before many human women, and her appearance has changed to mirror society's beauty politics. The representational doll has become an extension of political fantasy, reflecting the beauty politics of society. Gerwi

InvalidRequestError: ignored

In [31]:
barbenheimer_agent_chain.run({"input" : "What did people like about the Oppenheimer movie?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I don't have enough information to answer this question.
Action: OppenheimerTool
Action Input: "Oppenheimer movie"[0m
Observation: [33;1m[1;3mcontent="The Oppenheimer movie is a biopic about American scientist J. Robert Oppenheimer. It follows his crucial role in developing the atomic bomb during World War II. The movie explores the ethical aspects of creating the bomb, as well as Oppenheimer's personal life and political pressures. The film has received positive reviews, with viewers praising its dialogue, character drama, and captivating storytelling. Christopher Nolan directed the movie, and Cillian Murphy portrays J. Robert Oppenheimer. The cinematography is breathtaking, transporting the audience to the World War II era. The supporting cast's performances are also highly praised. Overall, the Oppenheimer movie is considered a masterpiece and an awe-inspiring theatrical experience." additional_kwargs={} exampl

'People liked the Oppenheimer movie because of its captivating storytelling, dialogue, character drama, breathtaking cinematography, and the performances of the cast. It is considered a masterpiece and an awe-inspiring theatrical experience.'

In [32]:
barbenheimer_agent_chain.run({"input" : "Did the movies Barbie and Oppenheimer share similar themes or ideas?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: I need to determine if the movies Barbie and Oppenheimer share similar themes or ideas.
Action: Use BarbieTool to answer the question about Barbie.
Action Input: "Do the Barbie movies have any similar themes or ideas as the Oppenheimer movie?"[0m
Observation: Use BarbieTool to answer the question about Barbie. is not a valid tool, try one of [BarbieTool, OppenheimerTool].
Thought:[32;1m[1;3mI cannot use BarbieTool to answer this question. I should try using OppenheimerTool instead.
Action: Use OppenheimerTool to answer the question about Oppenheimer.
Action Input: "Do the Oppenheimer movies have any similar themes or ideas as the Barbie movie?"[0m
Observation: Use OppenheimerTool to answer the question about Oppenheimer. is not a valid tool, try one of [BarbieTool, OppenheimerTool].
Thought:[32;1m[1;3mI cannot use OppenheimerTool to answer this question either. I don't have the necessary tools to determine if t

"I don't know if the movies Barbie and Oppenheimer share similar themes or ideas."

## Next Steps

It's time to build a Chainlit (or Gradio) application and host it on Hugging Face Spaces! :ship: