# Creating a more robust RAQA system using LlamaIndex

We'll be putting together a system for querying both qualitative and quantitative data using LlamaIndex.

To stick to a theme, we'll continue to use BarbenHeimer data as our base - but this can, and should, be extended to other topics/domains.

# Build 🏗️
There are 3 main tasks in this notebook:

- Create a qualitative VectorStore query engine
- Create a quantitative NLtoSQL query engine
- Combine the two using LlamaIndex's OpenAI agent framework.

# Ship 🚢
Create and host a Gradio or Chainlit application to serve your project on Hugging Face Spaces.

# Share 🚀
Make a social media post about your final application and tag @AIMakerspace

## Set Up Google Colab to Work With Google Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.0


In [3]:
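# `!cd` runs in a throwaway subshell, so use the `%cd` magic to change the notebook's working directory
%cd /content/drive/MyDrive/Deepak/AIMakerSpace/LLM-Ops-Cohort-1-main/Week-2/Tuesday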

In [5]:
from dotenv import load_dotenv

# Load variables from .env file
load_dotenv('/content/drive/MyDrive/Deepak/AIMakerSpace/.env')

True

### A note on terminology:

You'll notice quite a few similarities between LangChain and LlamaIndex. In some ways, LlamaIndex can be thought of as an extension of LangChain, but it renames several concepts. Let's spend a few moments disambiguating the terminology.

- `QueryEngine` -> `RetrievalQA`:
  -  `QueryEngine` is just LlamaIndex's way of indicating something is an LLM "chain" on top of a retrieval system
- `OpenAIAgent` vs. `ZeroShotAgent`:
  - The two agents have the same fundamental pattern: Decide which of a list of tools to use to answer a user's query.
  - `OpenAIAgent` (LlamaIndex's primary agent) does not need an agent executor because it leverages OpenAI's [function calling API](https://openai.com/blog/function-calling-and-other-api-updates), which lets the agent interface "directly" with the tools instead of operating through an intermediary application process.
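
To make the "direct" tool interface a little more concrete, here is a minimal sketch of the dispatch step. It assumes the model's reply names a function and carries its arguments as a JSON string. The tools (`count_words`, `shout`), the `TOOLS` registry, and `fake_model_reply` are all hypothetical stand-ins; no OpenAI call is made, and this is not LlamaIndex's actual implementation.

```python
import json


def count_words(text: str) -> int:
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())


def shout(text: str) -> str:
    """Hypothetical tool: upper-case the input."""
    return text.upper()


# Registry the agent consults: tool name -> Python callable.
TOOLS = {"count_words": count_words, "shout": shout}


def dispatch(function_call: dict):
    """Run the tool named in a function-call style message."""
    fn = TOOLS[function_call["name"]]
    # Arguments arrive as a JSON-encoded string, so decode them first.
    kwargs = json.loads(function_call["arguments"])
    return fn(**kwargs)


# Stand-in for what a function-calling model might return (assumed shape).
fake_model_reply = {
    "name": "count_words",
    "arguments": '{"text": "life in plastic"}',
}
result = dispatch(fake_model_reply)
print(result)
```

The point is that the model only chooses a name and arguments; the agent code still does the actual calling.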

There is, however, a much larger terminological difference when it comes to discussing data.

##### Nodes vs. Documents

As you're aware from the previous weeks' assignments, there's an idea of `documents` in NLP, which refers to text objects that exist within a corpus of documents.

LlamaIndex takes this a step further and reclassifies `documents` as `nodes`. Confusingly, it refers to the `Source Document` as simply `Documents`.

The `Document` -> `node` structure is almost exactly equivalent to the `Source Document` -> `Document` structure found in LangChain, but the new terminology brings some clarity when working with different structured indices.

We won't be leveraging those structured indices today, but we will be leveraging a "benefit" of the `node` structure that exists as a default in LlamaIndex: the ability to quickly filter nodes based on their metadata.

![image](https://i.imgur.com/B1QDjs5.png)
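
As a rough illustration of the `Document` -> `node` relationship, the sketch below splits toy documents into nodes that each carry a copy of their parent's metadata, then filters on that metadata. `ToyDocument`, `ToyNode`, and `to_nodes` are hypothetical names invented for this example, and the sentence-based split is an assumption; LlamaIndex's own classes and parsers work differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDocument:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyNode:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def to_nodes(doc: ToyDocument, sentences_per_node: int = 1) -> List[ToyNode]:
    """Split a document into nodes; each node gets a copy of the document's metadata."""
    sentences = [s.strip() for s in doc.text.split(".") if s.strip()]
    nodes = []
    for i in range(0, len(sentences), sentences_per_node):
        chunk = ". ".join(sentences[i:i + sentences_per_node]) + "."
        nodes.append(ToyNode(text=chunk, metadata=dict(doc.metadata)))
    return nodes


docs = [
    ToyDocument("Barbie lives in Barbieland. Ken lives there too.", {"title": "Barbie (film)"}),
    ToyDocument("Oppenheimer led Los Alamos.", {"title": "Oppenheimer (film)"}),
]
all_nodes = [node for doc in docs for node in to_nodes(doc)]

# Because every node carries metadata, filtering is a simple comparison.
barbie_nodes = [n for n in all_nodes if n.metadata["title"] == "Barbie (film)"]
print(len(all_nodes), len(barbie_nodes))
```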

### BOILERPLATE

This is only relevant when running the code in a Jupyter Notebook.

In [6]:
import nest_asyncio

nest_asyncio.apply()

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

### Primary Dependencies and Context Setting

#### Dependencies and OpenAI API key setting

First of all, we'll need our primary libraries - and to set up our OpenAI API key.

In [11]:
!pip install -U -q openai==0.27.8 llama-index==0.8.6 nltk==3.8.1

In [12]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

import openai
openai.api_key = os.environ["OPENAI_API_KEY"]

#### Context Setting

Now, LlamaIndex has the ability to set `ServiceContext`. You can think of this as a config file of sorts. The basic idea here is that we use this to establish some core properties and then can pass it to various services.

While we could set this up as a global context, we're going to leave it as `ServiceContext` so we can see where it's applied.

We'll set a few significant contexts:

- `chunk_size` - this is what it says on the tin
- `llm` - this is where we can set what model we wish to use as our primary LLM when we're making `QueryEngine`s and more
- `embed_model` - this will help us keep our embedding model consistent across use cases


We'll also create some resources we're going to keep consistent across all of our indices today.

- `text_splitter` - This is what we'll use to split our text, feel free to experiment here
- `SimpleNodeParser` - This is what will work in tandem with the `text_splitter` to parse our full sized documents into nodes.
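
To get a feel for what `chunk_size` controls, here is a minimal sketch of token-based splitting. It treats whitespace-separated words as tokens, which is an assumption made to keep the example dependency-free; a real splitter would count tokens with a model tokenizer. `split_by_tokens` and `chunk_overlap` are names chosen for this sketch.

```python
from typing import List


def split_by_tokens(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into chunks of at most chunk_size whitespace tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven"
chunks = split_by_tokens(sample, chunk_size=3, chunk_overlap=1)
print(chunks)
```

Smaller chunks give more precise retrieval hits but less context per hit, so it's worth experimenting with the value.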

In [13]:
from llama_index import ServiceContext
from llama_index.node_parser.simple import SimpleNodeParser
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.llms import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(openai_api_key=os.environ["OPENAI_API_KEY"])
chunk_size = 1000
llm = OpenAI(
    temperature=0,
    model="gpt-3.5-turbo-0613",
    streaming=True
)

# Shared configuration passed to each index and query engine below.
service_context = ServiceContext.from_defaults(
    llm=llm,
    chunk_size=chunk_size,
    embed_model=embed_model
)

text_splitter = TokenTextSplitter(
    chunk_size=chunk_size
)

node_parser = SimpleNodeParser(
    text_splitter=text_splitter
)

### BarbenHeimer Wikipedia Retrieval Tool

Now we can get to work creating our semantic `QueryEngine`!

We'll follow a similar pattern as we did with LangChain here - and the first step (as always) is to get dependencies.

In [10]:
!pip install -U -q chromadb==0.4.6 tiktoken==0.4.0 sentence-transformers==2.2.2 pydantic==1.10.11

In [14]:
from llama_index import VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
import chromadb

#### ChromaDB

We'll be using [ChromaDB](https://www.trychroma.com/) as our `VectorStore` today!

It works in a similar fashion to tools like Pinecone, Weaviate, and more - but it's locally hosted and will serve our purposes fine.

You'll also notice the return of `OpenAIEmbedding()`, which is the embeddings model we'll be leveraging. Of course, this is using the `ada` model under the hood - and already comes equipped with in-memory caching.

You'll notice we can pass our `service_context` into our `VectorStoreIndex`!

In [15]:
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("wikipedia_barbie_opp")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
wiki_vector_index = VectorStoreIndex([], storage_context=storage_context, service_context=service_context)

In [16]:
!pip install -U -q wikipedia


This is essentially the same as the LangChain example: we'll pull information straight from Wikipedia using the built-in `WikipediaReader`.

Setting `auto_suggest=False` ensures we run into fewer auto-correct based errors.

In [17]:
from llama_index.readers.wikipedia import WikipediaReader

movie_list = ["Barbie (film)", "Oppenheimer (film)"]

wiki_docs = WikipediaReader().load_data(pages=movie_list, auto_suggest=False)

In [18]:
print(wiki_docs)



#### Node Construction

Now we will loop through our documents and metadata and construct nodes (associated with particular metadata for easy filtration later).

We're using the `node_parser` we created at the top of the Notebook.

In [19]:
for movie, wiki_doc in zip(movie_list, wiki_docs):
    # Parse each Wikipedia page into nodes.
    nodes = node_parser.get_nodes_from_documents([wiki_doc])
    for node in nodes:
        # Tag every node with its film title so it can be filtered later.
        node.metadata = {"title": movie}
    wiki_vector_index.insert_nodes(nodes)

#### Auto Retriever Functional Tool

This tool will leverage OpenAI's function calling endpoint to select the correct metadata filter and query the filtered index - only looking at nodes with the desired metadata.

A simplified diagram: ![image](https://i.imgur.com/AICDPav.png)
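
The "filter first, then search" idea can be sketched without any vector database. The example below keeps only nodes whose metadata matches every filter exactly, then ranks the survivors by cosine similarity. The `NODES` list, its two-dimensional embeddings, and `filtered_top_k` are made up for illustration; they are not part of LlamaIndex or ChromaDB.

```python
import math
from typing import Dict, List, Tuple

# Toy nodes: (text, metadata, embedding). The 2-d vectors are invented for this sketch.
NODES: List[Tuple[str, Dict[str, str], List[float]]] = [
    ("Barbie plot", {"title": "Barbie (film)"}, [1.0, 0.0]),
    ("Barbie cast", {"title": "Barbie (film)"}, [0.6, 0.8]),
    ("Oppenheimer plot", {"title": "Oppenheimer (film)"}, [1.0, 0.1]),
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def filtered_top_k(query_vec: List[float], filters: Dict[str, str], top_k: int) -> List[str]:
    """Keep nodes matching every filter exactly, then rank them by similarity."""
    candidates = [n for n in NODES if all(n[1].get(k) == v for k, v in filters.items())]
    ranked = sorted(candidates, key=lambda n: cosine(query_vec, n[2]), reverse=True)
    return [text for text, _, _ in ranked[:top_k]]


hits = filtered_top_k([1.0, 0.0], {"title": "Barbie (film)"}, top_k=2)
print(hits)
```

Note that the Oppenheimer node is nearly identical to the query vector, yet it never appears in the results because the filter removes it before ranking.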

First, we need to create our `VectorStoreInfo` object, which will hold all the relevant metadata we need for each component (in this case, title metadata).

Notice that the allowed values are written out as a list inside the description text.

In [20]:
from llama_index.tools import FunctionTool
from llama_index.vector_stores.types import (
    VectorStoreInfo,
    MetadataInfo,
    ExactMatchFilter,
    MetadataFilters,
)
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine

from typing import List, Tuple, Any
from pydantic import BaseModel, Field

top_k = 3

vector_store_info = VectorStoreInfo(
    content_info="semantic information about movies",
    metadata_info=[MetadataInfo(
        name="title",
        type="str",
        description="title of the movie, one of [Barbie (film), Oppenheimer (film)]",
    )]
)

Now we'll create our base Pydantic object that we can use to ensure compatibility with our application layer. This verifies that the response from the OpenAI endpoint conforms to this schema.

In [21]:
class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(
        ..., description="List of metadata filter field names"
    )
    filter_value_list: List[str] = Field(
        ...,
        description=(
            "List of metadata filter field values (corresponding to names specified in filter_key_list)"
        )
    )

Now we can build the function that will be exposed to the function calling endpoint.

>The `docstring` is important to the functionality of the application.
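
One way to see why the docstring matters: a tool wrapper can read a function's name, docstring, and parameters at runtime and use them to describe the tool to the model. The sketch below does this with the standard `inspect` module. `example_tool` and `describe` are hypothetical names, and this is only an illustration of the idea, not a copy of how `FunctionTool` builds its metadata.

```python
import inspect
from typing import List


def example_tool(query: str, filter_key_list: List[str], filter_value_list: List[str]) -> str:
    """Search a store, restricted to nodes matching the given metadata filters."""
    return "stub"


def describe(fn) -> dict:
    """Build a minimal tool description from a function's name, docstring, and parameters."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": list(signature.parameters),
    }


spec = describe(example_tool)
print(spec)
```

If the docstring is missing or vague, the model has much less to go on when deciding whether and how to call the tool.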

In [22]:
def auto_retrieve_fn(
    query: str, filter_key_list: List[str], filter_value_list: List[str]
):
    """Auto retrieval function.

    Performs auto-retrieval from a vector database, and then applies a set of filters.

    """
    query = query or "Query"

    exact_match_filters = [
        ExactMatchFilter(key=k, value=v)
        for k, v in zip(filter_key_list, filter_value_list)
    ]
    retriever = VectorIndexRetriever(
        wiki_vector_index, filters=MetadataFilters(filters=exact_match_filters), top_k=top_k
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)

    response = query_engine.query(query)
    return str(response)

Now we need to wrap our system in a tool in order to integrate it into the larger application.

Source Code Here:
- [`FunctionTool`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/tools/function_tool.py#L21)

In [23]:
from llama_index.tools.utils import create_schema_from_function
description = f"""\
Use this tool to look up semantic information about films.
The vector database schema is given below:
{vector_store_info.json()}
"""

auto_retrieve_tool = FunctionTool.from_defaults(
    fn=auto_retrieve_fn,
    name="auto-retrieve-tool",
    description=description,
    fn_schema=AutoRetrieveModel,
)

All that's left to do is attach the tool to an OpenAIAgent and let it rip!

Source Code Here:
- [`OpenAIAgent`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/agent/openai_agent.py#L361)

In [24]:
from llama_index.agent import OpenAIAgent

agent = OpenAIAgent.from_tools(
    [auto_retrieve_tool], llm=llm, verbose=True
)

In [25]:
response = agent.chat("Tell me what happens (briefly) in the Barbie movie.")
print(str(response))

=== Calling Function ===
Calling function: auto-retrieve-tool with args: {
  "query": "Barbie movie",
  "filter_key_list": ["title"],
  "filter_value_list": ["Barbie (film)"]
}
Got output: The Barbie movie is a 2023 American fantasy comedy film directed by Greta Gerwig. It is the first live-action Barbie film and is based on the Barbie fashion dolls by Mattel. The film follows Barbie and Ken on a journey of self-discovery after Barbie experiences an existential crisis. It features an ensemble cast, including Margot Robbie, Ryan Gosling, America Ferrera, Kate McKinnon, Helen Mirren, Issa Rae, Simu Liu, Michael Cera, Rhea Perlman, and Will Ferrell. The film premiered in July 2023 and has received critical acclaim, becoming one of the highest-grossing films of the year.
The Barbie movie is a 2023 American fantasy comedy film directed by Greta Gerwig. It follows Barbie and Ken on a journey of self-discovery after Barbie experiences an existential crisis. The film features an ensemble cast 

### BarbenHeimer SQL Tool

We'll walk through the steps of creating a natural language to SQL system in the following section.

> NOTICE: This system does not parse inputs or intermediate calls to ensure that only safe SQL queries are run. Do not use it in a production environment without adding guardrails on both sides of the application.
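
As one example of what a guardrail could look like, here is a deliberately coarse sketch that only lets a single `SELECT` statement through. `is_read_only` and the `FORBIDDEN` keyword list are assumptions made for this example. Keyword checks like this can reject harmless queries and are not a complete defense; a read-only database connection would be a stronger safeguard.

```python
import re

# Keywords that suggest a write or schema change (an illustrative, incomplete list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma)\b", re.IGNORECASE
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT and has no write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement was supplied
        return False
    if not statement.lower().startswith("select"):
        return False
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT AVG(Rating) FROM barbie_table"))
print(is_read_only("DROP TABLE barbie_table"))
```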

In [26]:
!pip install -q -U sqlalchemy pandas

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 2.0.3 which is incompatible.

The next few steps should be largely straightforward. We'll want to:

1. Read our `.csv` files into `pd.DataFrame` objects
2. Create an in-memory `sqlite`-powered `sqlalchemy` engine
3. Write our `pd.DataFrame` objects to the SQL engine as tables
4. Create a `SQLDatabase` object through LlamaIndex
5. Use that to create an `NLSQLTableQueryEngine`, then wrap it in a `QueryEngineTool` that the agent can call!

If you get stuck, please consult the documentation.
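
If the data flow in the first three steps feels abstract, the sketch below walks through the same shape using only the standard library: `csv` stands in for pandas and `sqlite3` stands in for the SQLAlchemy engine. The `toy_reviews` table and its three rows are made up for illustration and are not taken from the review files.

```python
import csv
import io
import sqlite3

# Made-up rows standing in for a reviews CSV file.
csv_text = "Author,Rating\nalice,6.0\nbob,8.0\ncarol,10.0\n"

# Step 1: read the CSV into Python objects.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Step 2: create an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Step 3: write the rows into a table.
conn.execute("CREATE TABLE toy_reviews (Author TEXT, Rating REAL)")
conn.executemany(
    "INSERT INTO toy_reviews (Author, Rating) VALUES (?, ?)",
    [(r["Author"], float(r["Rating"])) for r in rows],
)

# Once the data is in SQL, questions like "what is the average rating?" become queries.
average = conn.execute("SELECT AVG(Rating) FROM toy_reviews").fetchone()[0]
print(average)
```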

#### Read `.csv` Into Pandas

In [27]:
import pandas as pd
base_path = "/content/drive/MyDrive/Deepak/AIMakerSpace/LLM-Ops-Cohort-1-main/Week-2/Tuesday"
barbie_df = pd.read_csv(f"{base_path}/barbie_data/barbie.csv")
oppenheimer_df = pd.read_csv(f"{base_path}/oppenheimer_data/oppenheimer.csv")

In [28]:
barbie_df.head()

| Unnamed: 0.1 | Unnamed: 0 | Review_Date | Author | Rating | Review_Title | Review | Review_Url |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 21 July 2023 | LoveofLegacy | 6.0 | Beautiful film, but so preachy\n | Margot does the best with what she's given, bu... | /review/rw9199947/?ref_=tt_urv |
| 1 | 1 | 22 July 2023 | imseeg | 7.0 | 3 reasons FOR seeing it and 1 reason AGAINST.\n | The first reason to go see it: | /review/rw9199947/?ref_=tt_urv |
| 2 | 2 | 22 July 2023 | Natcat87 | 6.0 | Too heavy handed\n | As a woman that grew up with Barbie, I was ver... | /review/rw9199947/?ref_=tt_urv |
| 3 | 3 | 31 July 2023 | ramair350 | 10.0 | As a guy I felt some discomfort, and that's o... | As much as it pains me to give a movie called ... | /review/rw9199947/?ref_=tt_urv |
| 4 | 4 | 24 July 2023 | heatherhilgers | 9.0 | A Technicolor Dream\n | Wow, this movie was a love letter to cinema. F... | /review/rw9199947/?ref_=tt_urv |


In [29]:
oppenheimer_df.head()

| Unnamed: 0.1 | Unnamed: 0 | Review_Date | Author | Rating | Review_Title | Review | Review_Url |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 19 July 2023 | Orlando_Gardner | 9.0 | Murphy is exceptional\n | You'll have to have your wits about you and yo... | /review/rw9199470/?ref_=tt_urv |
| 1 | 1 | 20 July 2023 | Jeremy_Urquhart | 8.0 | A challenging watch to be sure, but a worthwh... | One of the most anticipated films of the year ... | /review/rw9202448/?ref_=tt_urv |
| 2 | 2 | 20 July 2023 | MrDHWong | 10.0 | A brilliantly layered examination of a man th... | "Oppenheimer" is a biographical thriller film ... | /review/rw9202246/?ref_=tt_urv |
| 3 | 3 | 20 July 2023 | and_mikkelsen | 10.0 | Nolan delivers a powerfull biopic that shows ... | This movie is just... wow! I don't think I hav... | /review/rw9202357/?ref_=tt_urv |
| 4 | 4 | 26 July 2023 | Geekofriendly | 8.0 | Nolan touches greatness, falls just slightly ... | I was familiar with the Manhattan project and ... | /review/rw9216587/?ref_=tt_urv |


#### Create SQLAlchemy engine with SQLite

In [30]:
from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///:memory:")

#### Convert `pd.DataFrame` to SQL tables

In [31]:
barbie_df.to_sql(
    name="barbie_table",  # name of table
    con=engine  # engine
)

125

In [32]:
oppenheimer_df.to_sql(
    name="oppenheimer_table",  # name of table
    con=engine  # engine
)

150

#### Construct a `SQLDatabase` index

Source Code Here:
- [`SQLDatabase`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/langchain_helpers/sql_wrapper.py#L9)

In [33]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(
    engine=engine,
    include_tables=["barbie_table", "oppenheimer_table"]
)

#### Create the NLSQLTableQueryEngine interface for all added SQL tables

Source Code Here:
- [`NLSQLTableQueryEngine`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/indices/struct_store/sql_query.py#L75C1-L75C1)

In [34]:
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["barbie_table", "oppenheimer_table"]
)

#### Wrap It All Up in a `QueryEngineTool`

You'll want to ensure you have a descriptive...description.

An example is provided here:

```
"Useful for translating a natural language query into a SQL query over a table containing: "
"barbie, containing information related to reviews of the Barbie movie"
"oppenheimer, containing information related to reviews of the Oppenheimer movie"
```
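
One thing to watch if you paste quoted strings like these into Python: adjacent string literals are joined with nothing in between, so lines can run together. The sketch below shows the effect and one way to keep each line separate. The shortened wording and the variable names `implicit` and `explicit` are just for illustration.

```python
# Adjacent literals are concatenated with no separator between them.
implicit = (
    "Useful for querying reviews: "
    "barbie, reviews of the Barbie movie"
    "oppenheimer, reviews of the Oppenheimer movie"
)

# Joining with newlines keeps each table description on its own line.
explicit = "\n".join([
    "Useful for querying reviews:",
    "barbie, reviews of the Barbie movie",
    "oppenheimer, reviews of the Oppenheimer movie",
])

print("movieoppenheimer" in implicit)
print(explicit.splitlines())
```

Since the agent relies on this description to pick a tool, it's worth printing the final string once to check that it reads the way you intended.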

Source Code Here:

- [`QueryEngineTool`](https://github.com/jerryjliu/llama_index/blob/d24767b0812ac56104497d8f59095eccbe9f2b08/llama_index/tools/query_engine.py#L13)

In [35]:
from llama_index.tools.query_engine import QueryEngineTool

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    name="sql_query_tool",
    description=(
        """Useful for translating a natural language query into a SQL query over a table containing:
        barbie, containing information related to reviews of the Barbie movie
        oppenheimer, containing information related to reviews of the Oppenheimer movie
        """
    ),
)

In [36]:
agent = OpenAIAgent.from_tools(
    [sql_tool], llm=llm, verbose=True
)

In [37]:
response = agent.chat("What is the average rating of the two films?")

=== Calling Function ===
Calling function: sql_query_tool with args: {
  "input": "SELECT AVG(rating) AS average_rating FROM barbie UNION SELECT AVG(rating) AS average_rating FROM oppenheimer"
}
Got output: The average rating for Barbie is 7.36, while the average rating for Oppenheimer is 8.35.


In [38]:
print(str(response))

The average rating for the Barbie film is 7.36, while the average rating for the Oppenheimer film is 8.35.


### Combining The Tools Together

Now, we can simply add our tools into the `OpenAIAgent`, and off we go!

In [39]:
barbenheimer_agent = OpenAIAgent.from_tools(
    [auto_retrieve_tool, sql_tool], llm=llm, verbose=True
)

In [40]:
response = barbenheimer_agent.chat("What is the lowest rating of the two films - and can you summarize what the reviewer said?")

=== Calling Function ===
Calling function: sql_query_tool with args: {
  "input": "SELECT MIN(rating) AS lowest_rating FROM barbie UNION SELECT MIN(rating) AS lowest_rating FROM oppenheimer"
}
Got output: The lowest rating among the Barbie and Oppenheimer products is 3.0.
=== Calling Function ===
Calling function: sql_query_tool with args: {
  "input": "SELECT review FROM barbie WHERE rating = 3.0 UNION SELECT review FROM oppenheimer WHERE rating = 3.0"
}
Got output: Some of the reviews for movies with a rating of 3.0 include: "And the only entertaining part is two hours in when someone 'passes gas'", "Boring movie, which is mostly political court room scenes", "First of all- they left out the science and the interesting problem solving when inventing the a-bomb", and "While the production value, cinematography and acting were what you would expect from Margo Robbie and the supporting cast. I felt the movie fell short in terms of the easily impressionable. Come to find out my daughter 

In [41]:
print(str(response))

The lowest rating among the two films is 3.0. Here are some summarized reviews from the reviewers:

- "And the only entertaining part is two hours in when someone 'passes gas'"
- "Boring movie, which is mostly political court room scenes"
- "First of all- they left out the science and the interesting problem solving when inventing the a-bomb"
- "While the production value, cinematography and acting were what you would expect from Margo Robbie and the supporting cast. I felt the movie fell short in terms of the easily impressionable. Come to find out my daughter is better suited to take on the world that I thought she was. Proud day for me as a father."


In [47]:
response = barbenheimer_agent.chat("How many times do the Barbie reviews mention 'Ken'? Tell me what happens (briefly) in the Barbie movie.?")

=== Calling Function ===
Calling function: sql_query_tool with args: {
  "input": "SELECT COUNT(*) AS ken_mentions FROM barbie WHERE review LIKE '%Ken%'"
}
Got output: There are 30 mentions of Ken in the reviews for Barbie.
=== Calling Function ===
Calling function: auto-retrieve-tool with args: {
  "query": "Barbie movie summary",
  "filter_key_list": ["title"],
  "filter_value_list": ["Barbie (film)"]
}
Got output: The Barbie film is a live-action fantasy comedy directed by Greta Gerwig. It follows the journey of Barbie and Ken as they navigate an existential crisis and embark on a journey of self-discovery. The film takes place in Barbieland, a matriarchal society where different variations of Barbies and Kens reside. Barbie and Ken encounter various challenges and ultimately work together to challenge societal expectations and rectify the faults of their previous society. The film received critical acclaim and became the second highest-grossing film of 2023.


In [48]:
print(str(response))

In the Barbie movie, Ken is mentioned 30 times in the reviews. The movie is a live-action fantasy comedy directed by Greta Gerwig. It follows the story of Barbie and Ken as they navigate an existential crisis and embark on a journey of self-discovery. The movie is set in Barbieland, a matriarchal society where different variations of Barbies and Kens reside. Throughout the film, Barbie and Ken face various challenges and work together to challenge societal expectations and address the flaws of their previous society. The movie received critical acclaim and was the second highest-grossing film of 2023.
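
A caveat worth keeping in mind when reading a number like this: `COUNT(*)` with a `LIKE` filter counts matching rows, not individual mentions, and SQLite's `LIKE` is case-insensitive for ASCII by default, so words that merely contain "ken" also match. The sketch below shows the difference on a made-up `toy_reviews` table; the rows are invented and are not taken from the review data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_reviews (Review TEXT)")
conn.executemany(
    "INSERT INTO toy_reviews (Review) VALUES (?)",
    [
        ("Ken steals the show. Ken sings. Ken is great.",),  # three mentions, one row
        ("I was taken aback by the ending.",),               # no mention, but 'taken' contains 'ken'
        ("Barbie carries the film.",),                       # no mention
    ],
)

# What a LIKE-based COUNT(*) measures: rows whose text matches the pattern.
matching_rows = conn.execute(
    "SELECT COUNT(*) FROM toy_reviews WHERE Review LIKE '%Ken%'"
).fetchone()[0]

# Case-sensitive occurrence count done in Python for comparison.
reviews = [row[0] for row in conn.execute("SELECT Review FROM toy_reviews")]
occurrences = sum(text.count("Ken") for text in reviews)

print(matching_rows, occurrences)
```

Here the query reports two matching rows even though "Ken" appears three times, all in a single review.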


In [44]:
response = barbenheimer_agent.chat("How many times do the Barbie reviews mention 'Ken', and what is a summary of his character in the Barbie movie?")

=== Calling Function ===
Calling function: sql_query_tool with args: {
  "input": "SELECT COUNT(*) AS ken_mentions FROM barbie WHERE review LIKE '%Ken%'"
}
Got output: There are 30 mentions of Ken in the reviews for Barbie.
=== Calling Function ===
Calling function: auto-retrieve-tool with args: {
  "query": "Ken character summary in Barbie movie",
  "filter_key_list": ["title"],
  "filter_value_list": ["Barbie (film)"]
}
Got output: Ken is a character in the Barbie movie. He is portrayed as having low self-esteem and seeking approval from Barbie. The film explores the negative consequences of hierarchical power structures, with Ken being depicted as part of an underclass compared to the ruling Barbies. Ken has a power ballad in the film, which is seen as a moment where the movie transcends traditional expectations for a Barbie film.


In [46]:
print(str(response))

In the Barbie movie, Ken is mentioned 30 times in the reviews. He is portrayed as a character with low self-esteem who seeks approval from Barbie. The film explores the negative consequences of hierarchical power structures, with Ken being depicted as part of an underclass compared to the ruling Barbies. Ken's character is highlighted in a power ballad, which is seen as a moment where the movie goes beyond traditional expectations for a Barbie film.
