## Advanced Machine Learning HW4-4
### Implement a simple question-answering (QA) system using the Chroma vector database. Provide a brief demonstration of its functionality. 
* Student ID: B103040047
* Name: 周安

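### ChromaDB functionality
The QA system relies on three ChromaDB features: an embedding function, storage of the documents, and similarity search over their embeddings.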
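The three pieces fit together as follows. This is a minimal pure-Python sketch of the same pipeline, not ChromaDB code. `toy_embed` is a hypothetical bag-of-words embedding over a tiny made-up vocabulary (this notebook uses a sentence-transformer model instead), and the "store" is a plain list. Search ranks documents by squared L2 distance, which, as far as I know, is also the metric Chroma reports by default.

```python
import math
from collections import Counter

# Hypothetical toy embedding: word counts over a tiny fixed vocabulary, scaled to length 1.
VOCAB = ["love", "friendship", "monster", "crime", "family", "story"]


def toy_embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


# "Store": each entry keeps the id, the original text and its embedding.
store: list[tuple[str, str, list[float]]] = []


def add(doc_id: str, text: str) -> None:
    store.append((doc_id, text, toy_embed(text)))


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query(text: str, n_results: int = 2) -> list[tuple[str, float]]:
    """Similarity search: embed the query, rank stored documents by distance (smallest first)."""
    q = toy_embed(text)
    scored = [(doc_id, squared_l2(q, emb)) for doc_id, _, emb in store]
    return sorted(scored, key=lambda pair: pair[1])[:n_results]


add("d1", "a story of love and friendship")
add("d2", "a monster hunts a family")
add("d3", "a crime story")
print(query("friendship story"))  # d1 first, then d3
```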

### Create a local DB in this directory and a collection (similar to a table in a relational DB)

In [26]:
import chromadb
# client = chromadb.Client() # In-memory DB
client = chromadb.PersistentClient(path="HW4_4_db/")

In [172]:
client.list_collections()

[Collection(name=IMDb_Movie_Reviews)]

### To clean the collection, delete it and re-initialize it
If you also want to free the unused space on disk, run `VACUUM` on the whole DB (see the last cell of this notebook).
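The example below shows why `VACUUM` is needed. It is a minimal illustration that uses Python's built-in `sqlite3` on a throwaway file, not on the Chroma DB. With SQLite's `auto_vacuum` turned off (set explicitly here), deleting rows moves their pages to a free list but does not shrink the file. `VACUUM` rebuilds the file without those pages. Whether Chroma's own `chroma.sqlite3` behaves the same depends on how it is configured, and this sketch does not check that.

```python
import os
import sqlite3
import tempfile

# Illustration on a throwaway SQLite file, not on the Chroma DB.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.sqlite3")
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA auto_vacuum = NONE")  # must be set before any table exists
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO docs (body) VALUES (?)", [("x" * 1000,) for _ in range(2000)])
    conn.commit()
    size_full = os.path.getsize(path)

    conn.execute("DELETE FROM docs")
    conn.commit()
    size_after_delete = os.path.getsize(path)  # freed pages stay inside the file

    conn.execute("VACUUM")  # rebuilds the file without the free pages
    size_after_vacuum = os.path.getsize(path)
    conn.close()

print(size_full, size_after_delete, size_after_vacuum)
```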

In [None]:
# client.delete_collection(name="IMDb_Movie_Reviews")
# client.list_collections()
# client.delete_collection(name="IMDb_Movie_Reviews2")
# client.list_collections()


### Create Embedding function

In [66]:
from chromadb.utils import embedding_functions
from sentence_transformers import SentenceTransformer

# Download the model (cached after the first run) and check that it encodes a sentence
# embedding_model = SentenceTransformer("all-MiniLM-L6-v2")  # smaller alternative
embedding_model = SentenceTransformer("paraphrase-mpnet-base-v2")
embedding = embedding_model.encode("This is a test sentence.")
# print(embedding[:100])

# Embedding function that Chroma applies to documents when adding and to query texts when querying
sentence_ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="paraphrase-mpnet-base-v2"
)

### Get the collection

In [67]:
collection = client.get_or_create_collection("IMDb_Movie_Reviews", embedding_function=sentence_ef)
client.list_collections()
print("Number of records in this collection:", collection.count())

Number of records in this collection: 0


### Details of the created collection
#### IMDb_Movie_Reviews: includes movie title, review, and description
* dataset resource: [IMDb Movie Reviews Genres Description and Emotions](https://www.kaggle.com/datasets/fahadrehman07/movie-reviews-and-emotion-dataset/data)
---

### Preprocess the dataset.
#### Column descriptions:
* Ratings: The score that critics assigned to the film.
* Reviews: Written reviews of the film that share the critic's thoughts and observations.
* movie_name: The title of the reviewed film.
* Resenhas: Translations of the reviews into other languages.
* genres: The genres the film belongs to.
* Description: A short written summary of the film.
* emotion: The emotional tone expressed by the description.
  
#### Plan:
* Remove the `Resenhas` column (and the `Unnamed: 0` index column).
* Sample 1000 rows to add to the DB.
* Split each entry into several documents, one per text field (`Description`, `Reviews`, `emotion`).
* Clean each document:
    * Convert the string to lowercase.
    * Collapse redundant whitespace.
    * Remove 'See full summary' and everything after it.

In [68]:
import pandas as pd


# Load the CSV file
df = pd.read_csv("Movies_Reviews_modified_version1.csv")

# Drop unnecessary columns
df = df.drop(columns=['Unnamed: 0', 'Resenhas'])

# Randomly sample 1000 rows (returned in shuffled order); fix random_state for reproducibility
df_shuffled = df.sample(n=1000, random_state=45)

# Display the first 5 rows of the sample
df_shuffled.head()

| (index) | Ratings | Reviews | movie_name | genres | Description | emotion |
|---|---|---|---|---|---|---|
| 15696 | 10.0 | Saw a advance screening of this on Friday nigh... | Wedding Crashers | ['Comedy', 'Romance'] | John Beckwith and Jeremy Grey, a pair of commi... | sadness |
| 9014 | 2.0 | Even with the awesome Harvey Keitel this movie... | The Last Man | ['Comedy'] | After a tour of duty in the Philippines, Major... | sadness |
| 17350 | 7.0 | I was fortunate enough to see this movie on pr... | The Guardian | ['Action', 'Adventure', 'Drama'] | The Guardian is a love story about an Olympic ... | sadness |
| 7997 | 4.0 | When a novel as renowned as Charles Dickens' G... | Obsession | ['Drama', 'Thriller', 'Mystery'] | A returning vet attending college falls in lov... | joy |
| 21018 | 5.0 | I went into "Julie & Julia" with big expectati... | Julie & Julia | ['Romance', 'Drama'] | Julia Child's story of her start in the cookin... | anticipation |


#### Goal of preprocessing the input documents for the DB:
The code below breaks each row of the DataFrame (`df`) into the components that a Chroma DB collection expects:

* Documents: Each row is split into a list of text documents, one per field in `field_in_documents` (`Description`, `Reviews`, `emotion`).

* Metadatas: One metadata dictionary per document, giving extra context about it: the movie name, the document type (the source field), the genre, the rating, and the `doc_id` of the row it came from.

* IDs: A unique identifier for each document, of the form `doc{row index}_{field}`, used to track the document in the database and to retrieve or delete it later.
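The three components are parallel lists: position *i* in each list describes the same document. The sketch below shows their shape for one row, using sample values taken from later outputs. It also defines a hypothetical `check_batch` helper, which is not part of ChromaDB. The helper verifies the invariants the loading code relies on: equal lengths, unique IDs, and flat scalar metadata values. Scalar values are the conservative subset of metadata types that Chroma accepts, as far as I know; the notebook stores `genre` and `rating` as strings.

```python
def check_batch(documents: list, metadatas: list, ids: list) -> None:
    """Hypothetical pre-flight check for the three parallel lists passed to collection.add()."""
    # The lists are matched by position, so they must have the same length.
    if not (len(documents) == len(metadatas) == len(ids)):
        raise ValueError("documents, metadatas and ids must have the same length")
    # Each document needs its own id.
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    # Keep metadata values to flat scalars (str, int, float, bool).
    for meta in metadatas:
        for key, value in meta.items():
            if not isinstance(value, (str, int, float, bool)):
                raise TypeError(f"metadata field {key!r} has non-scalar value {value!r}")


# One DataFrame row -> three documents, three metadata dicts, three ids
fields = ["Description", "Reviews", "emotion"]
sample_documents = ["a slacker competes with a repeat winner ...", "the movie is about competition ...", "anger"]
sample_metadatas = [
    {"name": "Employee of the Month", "type": f, "rating": "5.0", "doc_id": "doc17370"} for f in fields
]
sample_ids = [f"doc17370_{f}" for f in fields]
check_batch(sample_documents, sample_metadatas, sample_ids)  # passes silently
```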

In [74]:
import re

# Define the order of columns to be processed
field_in_documents = ["Description", "Reviews", "emotion"]

def clean_text(text):
    """
    Clean and normalize a text string:
    - Convert to lowercase
    - Remove 'See full summary' and everything after it
    - Remove extra whitespace and leading/trailing spaces
    """
    text = str(text).lower()  # Convert text to lowercase
    text = re.sub(r"see full summary.*", "", text)  # Remove 'See full summary' and following content
    text = re.sub(r"\s+", " ", text).strip()  # Replace multiple spaces with one, and trim
    return text

def preprocess_row(row, doc_id):
    """
    Process a single dataframe row:
    - Clean each text field
    - Prepare corresponding metadata for each field
    - Assign unique IDs for each piece of content
    Return:
    - List of cleaned data (per field)
    - List of metadata dictionaries (per field)
    - List of document IDs (per field)
    """
    data = []
    metadata = []
    id_str = []

    for col_name in field_in_documents:
        cleaned_text = clean_text(row[col_name])  # Clean text content
        data.append(cleaned_text)

        # Prepare metadata for this document piece
        meta_dict = {
            # "source": "internal",
            "name": row["movie_name"],
            "type": col_name,
            "genre": row["genres"],
            "rating": str(row["Ratings"]),
            "doc_id": f"doc{doc_id}"
        }

        metadata.append(meta_dict)
        id_str.append(f"doc{doc_id}_{col_name}")  # Generate unique ID for this piece

    return data, metadata, id_str

In [75]:
len(df_shuffled)

1000

### Add the data to the DB

In [77]:
documents = []
metadatas = []
ids = []
total_cases = len(df_shuffled)
batch_rows = 32  # number of DataFrame rows sent to Chroma per add() call
# `index` is the original (shuffled, non-consecutive) DataFrame label and serves as the doc id;
# `pos` is the 0-based position in the sample and decides when a batch is flushed
for pos, (index, row) in enumerate(df_shuffled.iterrows()):
    data, metadata, id_str = preprocess_row(row, index)
    # prepare the db input
    documents.extend(data)
    metadatas.extend(metadata)
    ids.extend(id_str)
    # load into db every 32 rows, plus the final partial batch
    if (pos + 1) % batch_rows == 0 or pos + 1 == total_cases:
        # add into collection
        collection.add(
            documents=documents,
            metadatas=metadatas,
            ids=ids
        )
        # clear the data for next round
        documents = []
        metadatas = []
        ids = []
print("Number of records in this collection:", collection.count())

Number of records in this collection: 3000
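The same batching can also be written so that it does not depend on the shuffled DataFrame labels at all. The sketch below uses a hypothetical helper, `batched`, that slices a flat list into fixed-size chunks (Python 3.12+ ships a similar `itertools.batched`). Each chunk would then be passed to `collection.add` together with the matching chunks of `metadatas` and `ids`.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 1000 rows x 3 fields = 3000 documents; 32 rows per batch = 96 documents per batch
all_ids = [f"doc{i}_Reviews" for i in range(3000)]
batch_sizes = [len(chunk) for chunk in batched(all_ids, 96)]
print(len(batch_sizes), batch_sizes[-1])  # 32 24

# Hypothetical usage with the notebook's flat lists (not executed here):
# for docs, metas, chunk_ids in zip(batched(documents, 96), batched(metadatas, 96), batched(ids, 96)):
#     collection.add(documents=docs, metadatas=metas, ids=chunk_ids)
```

Slicing by position guarantees that the last, partial batch is always sent, whatever labels the sampled rows carry.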


In [114]:
def pretty_print_results(results):
    """
    Format and print query results in a readable way.
    """
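    # collection.query returns one inner list per query text; only the first query ([0]) is printed
    if results and results['documents'] and results['documents'][0]:
        print(f"Found {len(results['documents'][0])} matching results:\n")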
        
        for i, doc in enumerate(results['documents'][0]):
            # Extract metadata for the current document (assuming it's a list of metadata dictionaries)
            metadata = results['metadatas'][0][i]  # Metadata is inside a list
            
            print(f"Result {i+1}:")
            print(f"  Movie Name: {metadata.get('name', 'N/A')}")
            print(f"  \033[96mType: {metadata.get('type', 'N/A')}\033[0m")
            print(f"  Genre: {metadata.get('genre', 'N/A')}")
            print(f"  Rating: {metadata.get('rating', 'N/A')}")
            print(f"  doc_id: {metadata.get('doc_id', 'N/A')}")
            print(f"  \033[93mDocument: {doc}\033[0m")
            print(f"  Distance: {results['distances'][0][i]:.4f}")  # Print similarity distance
            print('-' * 50)
    else:
        print("No results found.")
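`collection.query` accepts a list of `query_texts`, and its result holds one inner list per query text. `pretty_print_results` only reads index `[0]`, which means only the first query is printed. If several questions are sent at once, a helper along these lines can flatten every query's hits into one record each. The helper is hypothetical and not part of ChromaDB, and the sample result below is made up for illustration.

```python
def flatten_results(results: dict) -> list[dict]:
    """Flatten a query result into one record per hit.

    Assumes the layout used by pretty_print_results: each of "documents",
    "metadatas" and "distances" holds one inner list per query text.
    """
    records = []
    for q_idx, docs in enumerate(results.get("documents") or []):
        for rank, doc in enumerate(docs):
            records.append({
                "query": q_idx,    # which query text this hit belongs to
                "rank": rank + 1,  # 1 = closest match for that query
                "document": doc,
                "metadata": results["metadatas"][q_idx][rank],
                "distance": results["distances"][q_idx][rank],
            })
    return records


# Made-up result for two query texts (illustrative values, not real output)
fake_results = {
    "documents": [["a slacker competes ...", "anger"], ["france, 1625: ..."]],
    "metadatas": [[{"type": "Description"}, {"type": "emotion"}], [{"type": "Description"}]],
    "distances": [[1.0414, 1.2178], [0.7673]],
}
for rec in flatten_results(fake_results):
    print(rec["query"], rec["rank"], rec["metadata"]["type"], rec["distance"])
```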


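## Querying the DB
---
1000 rows of the dataset were loaded into the DB. The dataset has the following fields: **"movie_name", "genres", "Description", "Reviews", "emotion", "Ratings"**.  
Embedding vectors are computed for only three of these fields ("Description", "Reviews", "emotion"), and these are the vectors compared with the query.  
Questions can target any of these fields; the system answers with the 3 (or more, set via `n_results`) documents most relevant to the question.  
`pretty_print_results` shows each selected answer with its `metadata` ("movie_name", "document type", "genre", "rating", "doc_id"), the document text (**the answer**), and the similarity distance.  
The detailed usage is shown below: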
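Before the examples, a note on reading the `Distance` column: it is a distance, so smaller means more similar. Assume the collection uses Chroma's default `l2` space (squared Euclidean distance) and that the embeddings are unit-normalized; the second point has not been verified for `paraphrase-mpnet-base-v2`. Under those assumptions, the two measures are linked by `||a - b||^2 = 2 - 2*cos(a, b)`, so a distance `d` corresponds to a cosine similarity of `1 - d/2`. The sketch checks this identity on two small vectors.

```python
import math


def squared_l2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_from_squared_l2(d):
    """Only valid when both embeddings are unit-normalized (an assumption here)."""
    return 1 - d / 2


a, b = unit([1.0, 2.0, 3.0]), unit([3.0, 1.0, 0.5])
print(squared_l2(a, b), 2 - 2 * cosine(a, b))  # equal up to rounding for unit vectors
# e.g. the first Distance of the first query, if the vectors were unit-normalized
print(cosine_from_squared_l2(0.9575))
```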

In [203]:
results = collection.query(
    query_texts=["Which movie descriptions are closest in meaning to 'A heartwarming story of friendship'?"],
    n_results=3,
    where={"type": "Description"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: One Day
  Type: Description
  Genre: ['Drama', 'Romance']
  Rating: 9.0
  doc_id: doc24973
  Document: it is a meditative love story about two people who are drawn to each other at a time in their lives when circumstances seem to forbid them to ever be together. it is not a story of a ...
  Distance: 0.9575
--------------------------------------------------
Result 2:
  Movie Name: Love
  Type: Description
  Genre: ['Drama', 'Science Fiction']
  Rating: 4.0
  doc_id: doc28498
  Document: love explores the relationship between friends, family, and lovers. this story begins with the origins of a blossoming friendship between two 10-year-old girls. unfortunately, the childhood...
  Distance: 0.9703
--------------------------------------------------
Result 3:
  Movie Name: Love
  Type: Description
  Genre: []
  Rating: 9.0
  doc_id: doc20136
  Document: love explores the relationship between

In [204]:
results = collection.query(
    query_texts=["Which movie talks about the monster?"],
    n_results=3,
    where={"type": "Reviews"}  # only reviews; for both types use {"type": {"$in": ["Description", "Reviews"]}}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: The Monster
  Type: Reviews
  Genre: ['Comedy', 'Horror', 'Science Fiction']
  Rating: 3.0
  doc_id: doc20796
  Document: i remember three years ago watching the trailer for this and being a little excited. it looked like an interesting creature feature, something we just don't have enough of these days.telling the story of a mother and daughter who are involved in a car crash in the middle of nowhere. when helps comes they learn that there is something sinister stalking them from the darkness.okay, solid concept what did they do with it? not much actually, in fact arguably the threat from the beastie isn't even the primary theme of the film. mother and daughter have issues, they don't get along due to the mothers poor life choices and the constant flashbacks dominate the film.i wanted a creature feature not a lifetime melodrama! sadly alike said melodramas we have serious character issues, the daughter is frustratingly a

In [205]:
results = collection.query(
    query_texts=["Which movie talks about the crime activity?"],
    n_results=3,
    where={"type": "Reviews"}  # only reviews; for both types use {"type": {"$in": ["Description", "Reviews"]}}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Joyride
  Type: Reviews
  Genre: ['Comedy', 'Drama', 'Thriller']
  Rating: 3.0
  doc_id: doc28826
  Document: silly and ludicrous thriller (if one could say so!) about two young friends who robbered a beautiful car just to find out a corpse in the trunk. a ordinary premise that never hit the point. there's only one curiosity in this misfire: old adam (batman) west living a pimp.believe me, don't lose your precious time. i give this a 3 (three).
  Distance: 0.9638
--------------------------------------------------
Result 2:
  Movie Name: Public Enemies
  Type: Reviews
  Genre: ['Drama', 'Crime']
  Rating: 7.0
  doc_id: doc40766
  Document: in the words of john dillinger (johnny depp) spoken to his love interest billie frechette (marion cotillard, a good year), "i like baseball, movies, good clothes, fast cars... and you. what else do you need to know?" like a one legged pirate, "public enemies", helmed by

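A filter such as `{"type": "Description", "type": "Reviews"}` does not ask for two document types. Python keeps only the last value of a repeated key, so it is just `{"type": "Reviews"}`, and every result shown in these two queries is a review. To accept either type, Chroma's `where` filters offer `$in` and `$or`, as far as I understand the filter syntax. The `matches` function below is a hypothetical, simplified stand-in for that filtering logic, not Chroma's implementation. It is used only to show which documents each filter would admit.

```python
# A repeated key in a dict literal keeps only its last value.
dup_filter = {"type": "Description", "type": "Reviews"}
print(dup_filter)  # {'type': 'Reviews'}

# Filters that really accept either document type (Chroma where-filter syntax, as I understand it)
either_in = {"type": {"$in": ["Description", "Reviews"]}}
either_or = {"$or": [{"type": "Description"}, {"type": "Reviews"}]}


def matches(meta: dict, where: dict) -> bool:
    """Hypothetical, simplified evaluator for plain equality, $in and $or filters."""
    if "$or" in where:
        return any(matches(meta, sub) for sub in where["$or"])
    for key, cond in where.items():
        if isinstance(cond, dict) and "$in" in cond:
            if meta.get(key) not in cond["$in"]:
                return False
        elif meta.get(key) != cond:
            return False
    return True


metas = [{"type": t} for t in ("Description", "Reviews", "emotion")]
print([matches(m, dup_filter) for m in metas])  # [False, True, False]
print([matches(m, either_in) for m in metas])   # [True, True, False]
print([matches(m, either_or) for m in metas])   # [True, True, False]
```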
In [206]:
results = collection.query(
    query_texts=["Find movies with happy emotions and family themes."],
    n_results=3,
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Date Movie
  Type: Description
  Genre: ['Comedy']
  Rating: 10.0
  doc_id: doc16796
  Document: spoof of romantic comedies which focuses on a man, his crush, his parents, and her father.
  Distance: 0.9387
--------------------------------------------------
Result 2:
  Movie Name: The Legend of Zorro
  Type: Reviews
  Genre: ['Action', 'Adventure']
  Rating: 9.0
  doc_id: doc16179
  Document: i am a mom of 5 children, so when i get a night out to see a movie i don't want to see a dud. when it is a date with your husband it is even tougher. it can't be too much of a chic flick and i don't want to see something too violent. it isn't easy to find a movie with some action (dad) some romance (mom) and some comedy (both). this movie did it. now i am not saying it should get some academy award but i sure wish there were more movies like this. it was suspenseful without being gory. it was romantic and sexy witho

In [207]:
results = collection.query(
    query_texts=["Tell me about movies that involve artificial intelligence."],
    n_results=3,
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Proteus
  Type: Reviews
  Genre: ['Thriller', 'Horror', 'Science Fiction']
  Rating: 4.0
  doc_id: doc38738
  Document: this movie has all the hallmarks of a *great* stupid sci-fi flick... stupid scientists, dumb hi-tech gadgets, shape-shifting aliens who seduce their prey, expository videos labeled "watch me"... the list goes on. don't expect much from the plot or special effects... watch it for the cliches...
  Distance: 1.0605
--------------------------------------------------
Result 2:
  Movie Name: The Pallbearer
  Type: Reviews
  Genre: ['Comedy', 'Romance']
  Rating: 2.0
  doc_id: doc1404
  Document: no comment - stupid movie, acting average or worse... screenplay - no sense at all... skip it!
  Distance: 1.1314
--------------------------------------------------
Result 3:
  Movie Name: Eden
  Type: Reviews
  Genre: ['Drama']
  Rating: 5.0
  doc_id: doc28198
  Document: plot loses

In [225]:
results = collection.query(
    query_texts=["Tell me the positive and suggestive review of movie whose rating is 6.0"],
    n_results=3,
    where={"rating":"6.0"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Joyride
  Type: Reviews
  Genre: ['Comedy', 'Drama', 'Thriller']
  Rating: 6.0
  doc_id: doc28828
  Document: i really cannot believe the amount of negativity being directed at this movie! i really enjoyed it! amy hathaway is hot and fun to watch. she sort of reminds me of alicia silverstone in excess baggage. i was transfixed watching toby mcguire and benicio del toro, as i have been fans of both of theirs for years.. the movie actually has quite a bit of tension and suspense, and it works, because you really feel for these kids, and the situation they are stuck in. sure, there are some definite continuity errors, and the acting is far from perfect, at times, but somehow these don't detract from the storyline and the overall feel of the movie. i watched half of the movie last night, and was hooked enough to come back and finish it,today. a strong 5, but maybe even a 6, in my book. definitely worth a look, if you're into

In [226]:
results = collection.query(
    query_texts=["Show me the emotion or description."],
    n_results=3,
    where={"name": "The Three Musketeers"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: The Three Musketeers
  Type: emotion
  Genre: ['Action', 'Adventure', 'Romance']
  Rating: 4.0
  doc_id: doc19226
  Document: anger
  Distance: 1.1996
--------------------------------------------------
Result 2:
  Movie Name: The Three Musketeers
  Type: emotion
  Genre: ['Action', 'Adventure', 'Comedy']
  Rating: 7.0
  doc_id: doc1019
  Document: anger
  Distance: 1.1996
--------------------------------------------------
Result 3:
  Movie Name: The Three Musketeers
  Type: emotion
  Genre: ['Action', 'Adventure', 'Comedy', 'Drama']
  Rating: 7.0
  doc_id: doc15026
  Document: anger
  Distance: 1.1996
--------------------------------------------------


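Note: the query asks for the emotion and description, and the `where` filter restricts the search to the movie "Employee of the Month". The top results are its `emotion` documents, followed by a review.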

In [210]:
results = collection.query(
    query_texts=["Show me the emotion and description."],
    n_results=3,
    where={"name": "Employee of the Month"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Employee of the Month
  Type: emotion
  Genre: ['Comedy', 'Romance']
  Rating: 7.0
  doc_id: doc17374
  Document: anger
  Distance: 1.2178
--------------------------------------------------
Result 2:
  Movie Name: Employee of the Month
  Type: emotion
  Genre: ['Comedy', 'Romance']
  Rating: 5.0
  doc_id: doc17370
  Document: anger
  Distance: 1.2178
--------------------------------------------------
Result 3:
  Movie Name: Employee of the Month
  Type: Reviews
  Genre: ['Comedy', 'Romance']
  Rating: 5.0
  doc_id: doc17370
  Document: the movie is about competition between two guys for the employee of the month. plus to sleep with jessica simpson.jessica simpson is a horrible actress. she looks blank; she does not show any expression at all. she's just in the movie for her cleavage show, but it is not worth it.dax shepard is real good and brings up a few laughs with efren ramirez. apar

Note: here the movie name is written into the query text instead of a `where` filter, and the matching descriptions are still retrieved.

In [211]:
results = collection.query(
    query_texts=["Show me the description about 'Employee of the Month'."],
    n_results=3,
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Employee of the Month
  Type: Description
  Genre: ['Comedy', 'Romance']
  Rating: 5.0
  doc_id: doc17370
  Document: a slacker competes with a repeat winner for the "employee of the month" title at work, in order to gain the affections of a new female employee.
  Distance: 1.0414
--------------------------------------------------
Result 2:
  Movie Name: Employee of the Month
  Type: Description
  Genre: ['Comedy', 'Romance']
  Rating: 7.0
  doc_id: doc17374
  Document: a slacker competes with a repeat winner for the "employee of the month" title at work, in order to gain the affections of a new female employee.
  Distance: 1.0414
--------------------------------------------------
Result 3:
  Movie Name: Safety Not Guaranteed
  Type: Description
  Genre: ['Comedy', 'Romance', 'Science Fiction', 'Drama']
  Rating: 2.0
  doc_id: doc26524
  Document: three magazine employees head out on an

Note: the following example does not restrict the search to the comedy genre, yet it still finds comedy-related results.

In [212]:
results = collection.query(
    query_texts=["Show the interesting plot in comedy films"],
    n_results=3,
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Just Married
  Type: Reviews
  Genre: ['Comedy']
  Rating: 2.0
  doc_id: doc9964
  Document: there are several laugh-out-loud moments in this film, and it is only because of these that it doesn't merit a 1/10 rating.the film is crude and crass. it tries to be very pc in its attitude to sex  lots of talk, the presence of sex aids, the newly-wed wife who has been a player but is still (supposedly) sweet and charming  but these efforts just add to the tedium and the unappealing nature of the two lead characters.the story plods along, ashton kutcher (the male lead) is irritating and brittany murphy is short. short of comedic talent and physically short; the disparity in the heights of the two leads is yet another distraction from the "action".by their nature, romantic comedies are predictable when it comes to the overall plot; boy meets girl, they have difficulties getting together, but all is well at the end. that's ok, bu

In [229]:
results = collection.query(
    query_texts=["Show the emotion in 'First Knight'."],
    n_results=3,
    where={"name":"First Knight"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: First Knight
  Type: Description
  Genre: ['Action', 'Adventure', 'Drama', 'Romance']
  Rating: 3.0
  doc_id: doc353
  Document: lancelot falls in love with guinevere, who is due to be married to king arthur. meanwhile, a violent warlord tries to seize power from arthur and his knights of the round table.
  Distance: 1.3957
--------------------------------------------------
Result 2:
  Movie Name: First Knight
  Type: Reviews
  Genre: ['Action', 'Adventure', 'Drama', 'Romance']
  Rating: 3.0
  doc_id: doc353
  Document: how on earth did sean connery, richard gere, julie ormand, ben cross, sir john gielgud, and other actors of note, ever get roped into making this awful atrocious movie? the word "hokum" is undoubtedly a compliment in this case. surely it wasn't for money? nobody's that greedy or hard up. or did they all think it was a good idea at the time? if so, they were badly mistaken.the supposed nor

In [214]:
results = collection.query(
    query_texts=["Main location in 'The Three Musketeers'."],
    n_results=3,
    where={"name":"The Three Musketeers"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: The Three Musketeers
  Type: Description
  Genre: ['Action', 'Adventure', 'Romance']
  Rating: 10.0
  doc_id: doc19305
  Document: france, 1625: young d'artagnan heads to paris to join the musketeers but the evil cardinal has disbanded them - save 3. he meets the 3, athos, porthos and aramis, and joins them on their quest to save the king and country.
  Distance: 0.7673
--------------------------------------------------
Result 2:
  Movie Name: The Three Musketeers
  Type: Description
  Genre: ['Action', 'Adventure', 'Romance']
  Rating: 8.0
  doc_id: doc19263
  Document: france, 1625: young d'artagnan heads to paris to join the musketeers but the evil cardinal has disbanded them - save 3. he meets the 3, athos, porthos and aramis, and joins them on their quest to save the king and country.
  Distance: 0.7673
--------------------------------------------------
Result 3:
  Movie Name: The Three Musketee

In [215]:
results = collection.query(
    query_texts=["Main location in 'Rio'."],
    n_results=3,
    where={"name":"Rio"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Rio
  Type: Reviews
  Genre: ['Crime']
  Rating: 9.0
  doc_id: doc35613
  Document: in rio de janeiro, the macaw baby blu is captured by dealers and smuggled to the united states of america. while driving through moose lake, minnesotta, the truck that is transporting blu has a minor accident that drops the box where he is trapped on the road. the girl linda finds the bird and raises him with love. fifteen years later, blu is a domesticated and intelligent bird that does not fly and lives a comfortable life with the bookshop owner linda. out of the blue, the clumsy brazilian ornithologist tulio visits linda and explains that blu is the last male of his species alive and he has a female called jewel in rio de janeiro. he invites linda to travel with blu to rio de janeiro to mate with jewel and save their species.linda travels with blu and tulio to rio de janeiro and they leave blue and jewel in a large cage in the institute

In [216]:
results = collection.query(
    query_texts=["Main character name in 'The Quick and the Dead'."],
    n_results=3,
    where={"name":"The Quick and the Dead"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: The Quick and the Dead
  Type: Reviews
  Genre: ['Action', 'Drama', 'Western']
  Rating: 7.0
  doc_id: doc20613
  Document: sharon stone is another gun in the old west town it seems that she is here to pay off an old score that has haunted her since she was a child she becomes swept up in a deadly quick-draw contest where anybody can challenge anybody in the windy dusty streets the fighters must not draw until the clock makes the first chime of the hour whoever is standing after the draw is the winner the prize is $123,000 the lawless town of redemption is ruled by a despicable ironfisted gunman called john herod who takes a lot to scare him hackman plays pretty well the kind people hate he is, here, a fearless, sadistic, cold-blooded killer in charge of everything, who decides who lives or who dies herod wants a preacher in the tournament  even if he has to beat, kick, and knock him to the ground to force him back into

In [217]:
results = collection.query(
    query_texts=["Key subjects in 'Little Women'."],
    n_results=3,
    where={"name":"Little Women"}
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Little Women
  Type: Reviews
  Genre: ['Drama', 'Family', 'Romance']
  Rating: 8.0
  doc_id: doc11144
  Document: i saw little women tonight courtesy of a preview at my local theater. it was cleverly written by greta gerwig especially her use of flashbacks to tell the traditional story interspersed with a later one instead of a strictly linear retelling. the entire cast was outstanding with saoirse ronan leading the way. definitely worth paying for!
  Distance: 0.8596
--------------------------------------------------
Result 2:
  Movie Name: Little Women
  Type: Reviews
  Genre: ['Drama', 'Family', 'Romance']
  Rating: 9.0
  doc_id: doc11146
  Document: the newest adaptation of louisa may alcott's classic novel, "little women," was the best version i have ever seen. directed by greta gerwig, the film stars saoirse ronan as jo march, emma watson as meg, florence pugh as amy, eliza scanlen as beth, and lau

In [218]:
results = collection.query(
    query_texts=["Which reviews has the most intense feeling?"],
    n_results=5,
)

pretty_print_results(results)

Found 5 matching results:

Result 1:
  Movie Name: Pride and Prejudice and Zombies
  Type: Reviews
  Genre: ['Romance', 'Horror', 'Comedy', 'Thriller']
  Rating: 9.0
  doc_id: doc40492
  Document: my disclaimer is this: i tend to rate a bit high because i rate almost purely on how much i enjoyed a film; i'm as far from a critic or 'movie snob' as one could possibly be. my rating reflects pure enjoyment and if it's worth spending the money to see at the theater. in my opinion, it is!the movie is truly pride and prejudice...and zombies; it follows the basic outline of the p&p novel, even some of the same dialog, but throws in zombies. personally, i loved the book and i loved the keira knightly film and i will admit it took me about 10 minutes or so to adjust to what i was seeing; but i quickly became immersed. the action is very well done and if you're a fan of women being in charge in an action movie then you will really like this!if you're a fan of matt smith then this is

In [219]:
results = collection.query(
    query_texts=["what movie is about future?"],
    n_results=3,
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Safety Not Guaranteed
  Type: Reviews
  Genre: ['Comedy', 'Romance', 'Science Fiction', 'Drama']
  Rating: 10.0
  doc_id: doc26540
  Document: with a budget of perhaps 10 box tops from some cocoa puffs, this film managed to be the most entertaining thing i've seen all year. the entire cast fits together and play off each other in a delightful way and the ending is great without being sappy sweet or maudlin.i don't want to get into the plot, but aubrey plaza's debut as a lead actress is right on target and i see much success for her in the future in a lot more movies. mark duplass and jake johnson are excellent, and that new kid karan soni...well, you'll be seeing more of him in the future is my guess.this is a film i would see again and again, each time it appears in the future. i would take a date to it, i would take kids to it, i would go by myself.i would see it in a box, i would see it with a fox, i would see it at yo

In [220]:
results = collection.query(
    query_texts=["Show some discussions in romantic comedies."],
    n_results=3,
    where={"genre": "['Romance']"}  # exact string match: only movies whose genre list is exactly ['Romance']
)

pretty_print_results(results)

Found 3 matching results:

Result 1:
  Movie Name: Hero
  Type: Reviews
  Genre: ['Romance']
  Rating: 5.0
  doc_id: doc39923
  Document: in this romantic action film, the daughter of an inspector general (played by athiya shetty) and the son of a notorious criminal (played by sooraj pancholi) fall in love. naturally, there is no easy path for such a romance.from the clips and descriptions, i expected a much more romantic and emotional film, and was disappointed with the actual plot. shetty and pancholi, both in debut roles, had good chemistry. pancholi was excellent, but i didn't feel that shetty gave a good performance. her unappealing character made a dramatic and unbelievable turn around from spoiled brat to sensitive adult, showing a weakness in story writing. the fight scenes were well-done, but the choreography and music were mediocre (with the exception of the fantastic "main hoon hero tera"). honestly, the best part of this average movie was during the credits, w

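The `genre` metadata is the raw string from the CSV (for example `"['Comedy', 'Romance']"`). That makes `where={"genre": "['Romance']"}` an exact string comparison. It only matches films whose whole genre list is `['Romance']`, and it excludes films tagged `['Comedy', 'Romance']`, even though the question asks about romantic comedies. The sketch below shows the difference. It also shows one hypothetical alternative schema, not used in this notebook, that stores one boolean metadata key per genre (e.g. `genre_Romance: True`). With that schema, a filter such as `{"genre_Romance": True}` would match every film tagged Romance.

```python
import ast

# Genre strings as they are stored in the metadata (taken from the outputs above)
stored = ["['Romance']", "['Comedy', 'Romance']", "['Drama', 'Family', 'Romance']", "[]"]

# Equality on the raw string: only an exact match passes
exact = [g for g in stored if g == "['Romance']"]

# Parsing the string back into a list allows a membership test instead
contains = [g for g in stored if "Romance" in ast.literal_eval(g)]


def genre_flags(genres_str: str) -> dict:
    """Hypothetical schema: one boolean metadata key per genre."""
    return {f"genre_{g}": True for g in ast.literal_eval(genres_str)}


print(exact)  # ["['Romance']"]
print(len(contains))  # 3
print(genre_flags("['Comedy', 'Romance']"))  # {'genre_Comedy': True, 'genre_Romance': True}
```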
### Delete every record in the collection (optional)

In [221]:
# Delete a single record by its id
def delete_record(record_id, coll=collection):
    coll.delete(
        ids=[record_id]
    )

In [222]:
# print("Before deleting; Number of records in this collection:", collection.count())
# for index, row in df_shuffled.iterrows():
#     for col_name in field_in_documents:
#         delete_record("doc" + str(index) + "_" + col_name, collection)
# print("After deleting; Number of records in this collection:", collection.count())

#### Free unused disk space in the DB

In [223]:
# import sqlite3

# conn = sqlite3.connect("HW4_4_db/chroma.sqlite3")
# cur = conn.cursor()

# cur.execute("VACUUM")
# conn.commit()
# conn.close()