# Retrieve & Re-Rank Demo over the dataset

This example demonstrates the Retrieve & Re-Rank setup and lets you search over a dataset.
You can input a query with a movie title and plot. The script then uses semantic search to find relevant results in the dataset.

For semantic search, we use `SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')` and retrieve five suggestions as results that are relevant to the input query.

Next, we use a more powerful CrossEncoder (`cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')`) that
scores the query against each retrieved passage for relevance. The cross-encoder further boosts performance,
especially when you search over a corpus on which the bi-encoder was not trained.
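The two-stage idea described above can be sketched without any model at all. The following is a minimal illustration, not the notebook's pipeline: `retrieve_and_rerank`, `word_overlap`, and `overlap_ratio` are hypothetical names, and the two toy scorers merely stand in for a cheap retriever and a more precise re-ranker.

```python
from typing import Callable, List, Tuple


def retrieve_and_rerank(
    query: str,
    corpus: List[str],
    cheap_score: Callable[[str, str], float],
    precise_score: Callable[[str, str], float],
    n_candidates: int = 3,
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (document index, re-rank score) pairs for the best top_k documents."""
    # Stage 1: score every document with the cheap scorer and keep the best few.
    candidates = sorted(
        range(len(corpus)),
        key=lambda i: cheap_score(query, corpus[i]),
        reverse=True,
    )[:n_candidates]
    # Stage 2: re-score only the surviving candidates with the precise scorer.
    rescored = [(i, precise_score(query, corpus[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


def word_overlap(query: str, doc: str) -> float:
    # Toy stand-in for the retriever: number of shared lowercase words.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    # Toy stand-in for the re-ranker: shared words divided by all distinct words.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if (q | d) else 0.0


corpus = [
    "a cowboy falls in love on a ranch",
    "a documentary about life in the arctic",
    "a long film about a cowboy and a sheriff and a bank robbery in a small town",
    "a queen rules egypt",
]
results = retrieve_and_rerank("cowboy love ranch", corpus, word_overlap, overlap_ratio)
for idx, score in results:
    print(idx, round(score, 3), corpus[idx])
```

The point of the split is cost: the cheap scorer touches every document, while the expensive scorer only sees `n_candidates` of them.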


In [None]:
import pandas as pd #importing the pandas module for loading the dataset into pandas dataframe

# Loading the CSV file into a DataFrame
df = pd.read_csv('/content/wiki_movie_plots_deduped.csv', nrows=1000)

# Display the first five rows of the DataFrame
print(df.head())

   Release Year                             Title Origin/Ethnicity  \
0          1901            Kansas Saloon Smashers         American   
1          1901     Love by the Light of the Moon         American   
2          1901           The Martyred Presidents         American   
3          1901  Terrible Teddy, the Grizzly King         American   
4          1902            Jack and the Beanstalk         American   

                             Director Cast    Genre  \
0                             Unknown  NaN  unknown   
1                             Unknown  NaN  unknown   
2                             Unknown  NaN  unknown   
3                             Unknown  NaN  unknown   
4  George S. Fleming, Edwin S. Porter  NaN  unknown   

                                           Wiki Page  \
0  https://en.wikipedia.org/wiki/Kansas_Saloon_Sm...   
1  https://en.wikipedia.org/wiki/Love_by_the_Ligh...   
2  https://en.wikipedia.org/wiki/The_Martyred_Pre...   
3  https://en.wikipedia.

In [None]:
import pandas as pd #loading pandas module for displaying with help of data frame

#Reading the first 1000 rows and only 'Title' and 'Plot' columns. We need those only so that we can query them and get the results
df = pd.read_csv('/content/wiki_movie_plots_deduped.csv', usecols=['Title', 'Plot'], nrows=1000)
df.head()

Unnamed: 0,Title,Plot
0,Kansas Saloon Smashers,"A bartender is working at a saloon, serving dr..."
1,Love by the Light of the Moon,"The moon, painted with a smiling face hangs ov..."
2,The Martyred Presidents,"The film, just over a minute long, is composed..."
3,"Terrible Teddy, the Grizzly King",Lasting just 61 seconds and consisting of two ...
4,Jack and the Beanstalk,The earliest known adaptation of the classic f...


As in the previous part, Semantic Search, we have now loaded the first 1000 entries from the CSV file into the DataFrame. We need only the Title and Plot columns for the operations that follow, so we loaded them in the format "Sno. Title Plot".

First, let us save the above DataFrame as a separate dataset so that it can be passed to the model and converted into embeddings. We require only two columns, but the original file has several other columns with different data types, which would be awkward to pass to the model. So let us create another dataset with only the required columns, Title and Plot. We will then pass it to a model to create embeddings and run semantic search using cosine similarity.
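To make the cosine-similarity step concrete, here is a small dependency-free sketch. The vectors are made-up two-dimensional stand-ins for real embeddings, and `cosine_similarity` and `top_k_by_cosine` are hypothetical helper names, not functions from any library used in this notebook.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 5
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings: document 0 points the same way as the query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [2.0, 0.0]
top = top_k_by_cosine(query_vec, doc_vecs, k=2)
print(top)
```

Cosine similarity ignores vector length, which is why the query `[2.0, 0.0]` matches document `[1.0, 0.0]` perfectly even though their magnitudes differ.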

In [None]:
import pandas as pd

# Specifying the path of the new CSV file with two columns and first thousand entries
output_file = "first_1000_entries_dataset.csv"

# Saving the DataFrame to a new CSV file
df.to_csv(output_file, index=False)

print("DataFrame saved to", output_file)

DataFrame saved to first_1000_entries_dataset.csv


Now we have a dataset with 1000 entries and only two columns, Title and Plot. The next step is to convert this dataset into embeddings.

To do that, we need to install the required libraries and import the methods needed to implement the model.

In [None]:
!pip install -U sentence-transformers rank_bm25

Collecting sentence-transformers
  Downloading sentence_transformers-2.6.1-py3-none-any.whl (163 kB)
Collecting rank_bm25
  Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.1.105

In [None]:
!pip install pandas



Now we have installed all the libraries. Let us start coding!

In part 1, we used semantic search to get the top five relevant results, with movie titles and plots, for an input query describing a movie plot.

Now let us use the BM25 model, which ranks documents by how well their terms match the query, as the retrieval step. The code below applies BM25 and returns the top 5 most relevant search results for the plot given as the input query.

In [None]:
#Python script for applying "BM25" logic
import pandas as pd #importing modules and libraries
import numpy as np #needed for np.argsort in bm25_search
from rank_bm25 import BM25Okapi

#loading csv dataset
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining 'Title' and 'Plot' into a single string for each entry
df['combined'] = df['Title'] + ": " + df['Plot']
texts = df['combined'].tolist()

#tokenizing the texts for BM25 model
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#BM25 search function
def bm25_search(query, N=5):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]  # retrieve top 5 relevant results
    return [(texts[idx], idx) for idx in top_indices]

#reading the query interactively at runtime
query = input("Enter your query here: ")

#retrieving results
top_results = bm25_search(query)

#printing the results
for result, idx in top_results:
    title, plot = result.split(": ", 1)
    print(f'{title}: "{plot}", Index: {idx}\n')

Enter your query here: “Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions
I Do: "The Boy meets and marries The Girl. A year later, the two walk down the street with a baby carriage carrying a bottle instead of a baby when they run into The Girl's brother who asks the couple to do him a favor and babysit his children. They accept and the remainder of the short consists of gags showcasing the difficulties of babysitting children. At the very end, The Boy discovers some knitted baby clothes in a drawer (implying that The Girl is pregnant).", Index: 386

Tarzan of the Apes: "John and Alice Clayton, Lord and Lady Greystoke (True Boardman and Kathleen Kirkham), are passengers on the Fuwalda, a ship bound for Africa. When the vessel is taken over by mutineers the sailor Binns (George B. French) saves them from being murdered, but they are marooned on the tropical coast. After their deaths their infant son is adopted by Kala, an ape, who raises him as her o

We used the BM25 method here to retrieve relevant results. It is one of the most widely used ranking functions for search.

Compared to plain term-frequency scoring, BM25 saturates the contribution of each term: once a term has been repeated often enough, further repetitions barely change the relevance score.

It also weights terms by their rarity across the documents and normalizes for document length, so that long documents are not favored simply for containing more words.
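The three ingredients described above (term-frequency saturation, term rarity, and length normalization) can be seen in a compact sketch of a BM25-style scorer. This is an illustration only: `bm25_scores` is a hypothetical function, the defaults `k1=1.5` and `b=0.75` are common choices assumed here, and the IDF uses the variant with `+ 1` inside the logarithm, so the numbers will not necessarily match what `rank_bm25` returns. The corpus is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query_tokens: List[str],
    tokenized_corpus: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score every document in a non-empty corpus against the query tokens."""
    n_docs = len(tokenized_corpus)
    avg_len = sum(len(doc) for doc in tokenized_corpus) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in tokenized_corpus:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized_corpus:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rarity: terms found in few documents get a larger weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Saturation: this fraction approaches (k1 + 1) as tf grows.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    "western western western western".split(),
    "western romance on the ranch".split(),
    "a comedy about an office".split(),
]
scores = bm25_scores(["western", "romance"], corpus)
print(scores)
```

In this toy corpus the first document repeats "western" four times, yet the second document scores higher because it also contains the rarer term "romance"; repeating a term cannot push its contribution past `idf * (k1 + 1)`.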

In [None]:
#Python script for applying "BM25" logic
import pandas as pd #importing modules and libraries
import numpy as np #needed for np.argsort in bm25_search
from rank_bm25 import BM25Okapi

#loading csv dataset
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining 'Title' and 'Plot' into a single string for each entry
df['combined'] = df['Title'] + ": " + df['Plot']
texts = df['combined'].tolist()

#tokenizing the texts for BM25 model
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#BM25 search function
def bm25_search(query, N=5):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]  # retrieve top 5 relevant results
    return [(texts[idx], idx) for idx in top_indices]

#reading the query interactively at runtime
query = input("Enter your query here: ")

#retrieving results
top_results = bm25_search(query)

#printing the results
for result, idx in top_results:
    title, plot = result.split(": ", 1)
    print(f'{title}: "{plot}", Index: {idx}\n')

Enter your query here: Western romance
The Call of the Wild: "A white girl (Florence Lawrence) rejects a proposal from an Indian brave (Charles Inslee) in this early one-reel Western melodrama. Despite the rejection, the Indian still comes to the girl's defense when she is abducted by his warring tribe. In her first year in films, Florence Lawrence was already the most popular among the Biograph Company's anonymous stock company players. By 1909, she was known the world over as "The Biograph Girl."", Index: 19

Wild and Woolly: "As described in a film magazine review,[1] Jeff Hillington (Fairbanks), son of railroad magnate Collis J. Hillington (Bytell), tires of the East and longs for the wild and woolly West. He has his apartment and office fixed up in his understanding of the accepted Western style, which he has gleaned from dime novels. A delegation from Bitter Creek comes to New York City seeking financial backing for the construction of a spur line, and go to Collis to explain the

In [None]:
#Python script for applying "BM25" logic
import pandas as pd #importing modules and libraries
import numpy as np #needed for np.argsort in bm25_search
from rank_bm25 import BM25Okapi

#loading csv dataset
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining 'Title' and 'Plot' into a single string for each entry
df['combined'] = df['Title'] + ": " + df['Plot']
texts = df['combined'].tolist()

#tokenizing the texts for BM25 model
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#BM25 search function
def bm25_search(query, N=5):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]  # retrieve top 5 relevant results
    return [(texts[idx], idx) for idx in top_indices]

#reading the query interactively at runtime
query = input("Enter your query here: ")

#retrieving results
top_results = bm25_search(query)

#printing the results
for result, idx in top_results:
    title, plot = result.split(": ", 1)
    print(f'{title}: "{plot}", Index: {idx}\n')

Enter your query here: "Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo
Sahara: "Silent film femme fatale, Louise Glaum, portrays the role of Mignon, a Parisian music hall celebrity. Mignon marries a young American civil engineer, John Stanley, portrayed by Matt Moore. Stanley is transferred to Egypt to work on an engineering project in the Sahara. Mignon and her son, portrayed by Pat Moore, join Stanley in the desert.[3][4] Unhappy with life in the desert, Mignon leaves Stanley and her son in the desert and moves to Cairo with the wealthy Baron Alexis, portrayed by Edwin Stevens. Mignon lives in Baron Alexis' palace while Stanley goes blind and becomes addicted to the drug hasheesh. Mignon later encounters Stanley and her son, who have become beggars in the streets of Cairo.[3][4] Mignon returns to the desert to care for her husband, and the two are reconciled.", Index: 293

He Who G

In [None]:
#Python script for applying "BM25" logic
import pandas as pd #importing modules and libraries
import numpy as np #needed for np.argsort in bm25_search
from rank_bm25 import BM25Okapi

#loading csv dataset
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining 'Title' and 'Plot' into a single string for each entry
df['combined'] = df['Title'] + ": " + df['Plot']
texts = df['combined'].tolist()

#tokenizing the texts for BM25 model
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#BM25 search function
def bm25_search(query, N=5):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]  # retrieve top 5 relevant results
    return [(texts[idx], idx) for idx in top_indices]

#reading the query interactively at runtime
query = input("Enter your query here: ")

#retrieving results
top_results = bm25_search(query)

#printing the results
for result, idx in top_results:
    title, plot = result.split(": ", 1)
    print(f'{title}: "{plot}", Index: {idx}\n')

Enter your query here: Comedy film, office disguises, boss's daughter, elopement
The Boy Friend: "Comedy about a small-town girl unhappy with her family, and a boy trying to please her by throwing a big party.", Index: 583

Mabel's Blunder: "Mabel's Blunder tells the tale of a young woman who is secretly engaged to the boss's son.[1] The young man's sister comes to visit at their office, and a jealous Mabel, not knowing who the visiting woman is, dresses up as a (male) chauffeur to spy on them.", Index: 87

Bucking Broadway: "As described in a film magazine,[3] Cheyenne Harry (Carey), one of the cowboys on a ranch in Wyoming, falls in love with Helen (Malone), his boss's daughter. She decides to elope to the city with Captain Thornton (Pegg), a wealthy visitor to the ranch from New York. Cheyenne and Helen's father (Wells) are downhearted. Cheyenne, devastated by the loss of his fiance, decides to go to the city to rescue her, and finds Thorton giving a dinner party in a hotel about to

In [None]:
#Python script for applying "BM25" logic
import pandas as pd #importing modules and libraries
import numpy as np #needed for np.argsort in bm25_search
from rank_bm25 import BM25Okapi

#loading csv dataset
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining 'Title' and 'Plot' into a single string for each entry
df['combined'] = df['Title'] + ": " + df['Plot']
texts = df['combined'].tolist()

#tokenizing the texts for BM25 model
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#BM25 search function
def bm25_search(query, N=5):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]  # retrieve top 5 relevant results
    return [(texts[idx], idx) for idx in top_indices]

#reading the query interactively at runtime
query = input("Enter your query here: ")

#retrieving results
top_results = bm25_search(query)

#printing the results
for result, idx in top_results:
    title, plot = result.split(": ", 1)
    print(f'{title}: "{plot}", Index: {idx}\n')

Enter your query here: Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria
Cleopatra: "Because the film has been lost, the following summary is reconstructed from a description in a contemporary film magazine.
Cleopatra (Bara), the Siren of Egypt, by a clever ruse reaches Caesar (Leiber) and he falls victim to her charms. They plan to rule the world together, but then Caesar falls. Cleopatra's life is desired by the church, as the wanton woman's rule has become intolerable. Pharon (Roscoe), a high priest, is given a sacred dagger to take her life. He gives her his love instead and, when she is in need of some money, leads her to the tomb of his ancestors, where she tears the treasure from the breast of the mummy. With this wealth she goes to Rome to meet Antony (Hall). He leaves the affairs of state and travels to Alexandria with her, where they revel. Antony is recalled to Rome and married to Octavia (Bl

In [None]:
#Python script for applying "BM25" logic
import pandas as pd #importing modules and libraries
import numpy as np #needed for np.argsort in bm25_search
from rank_bm25 import BM25Okapi

#loading csv dataset
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining 'Title' and 'Plot' into a single string for each entry
df['combined'] = df['Title'] + ": " + df['Plot']
texts = df['combined'].tolist()

#tokenizing the texts for BM25 model
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#BM25 search function
def bm25_search(query, N=5):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]  # retrieve top 5 relevant results
    return [(texts[idx], idx) for idx in top_indices]

#reading the query interactively at runtime
query = input("Enter your query here: ")

#retrieving results
top_results = bm25_search(query)

#printing the results
for result, idx in top_results:
    title, plot = result.split(": ", 1)
    print(f'{title}: "{plot}", Index: {idx}\n')

Enter your query here: "Denis Gage Deane-Tanner
Captain Alvarez: "A melodrama about an American who becomes a revolutionary leader battling evil government spies in Argentina. William Desmond Taylor portrays the title role, and Denis Gage Deane-Tanner, Taylor's younger brother, is thought to have played the small role of a blacksmith.", Index: 67

One Week: "The story involves two newlyweds, Keaton and Seely, who receive a build-it-yourself house as a wedding gift. The house can be built, supposedly, in "one week". A rejected suitor secretly re-numbers packing crates. The movie recounts Keaton's struggle to assemble the house according to this new "arrangement". The end result is depicted in the picture. As if this were not enough, Keaton finds he has built his house on the wrong site and has to move it. The movie reaches its tense climax when the house becomes stuck on railroad tracks. Keaton and Seely try to move it out the way of an oncoming train, which eventually passes on the nei

**Let us now calculate Recall@k and MRR.**

**Calculating the Recall values**

Recall is defined as the fraction of all actually relevant results for the query that were shown. Mathematically, this is given by:

Recall@k = true positives@k / (true positives@k + false negatives@k)

In the calculations below, the relevant results found in the top 5 are treated as the full set of relevant results for each query.
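Recall at a cutoff k divides the number of relevant results found in the top k by the total number of relevant results. A minimal sketch of that computation follows; `recall_at_k` is a hypothetical helper, and when `total_relevant` is not given it assumes, as a simplification, that every relevant document already appears somewhere in the judged list.

```python
from typing import List, Optional


def recall_at_k(
    relevance: List[bool], k: int, total_relevant: Optional[int] = None
) -> float:
    """relevance[i] is True if the result at rank i + 1 was judged relevant."""
    if total_relevant is None:
        # Assumption: all relevant documents are inside the judged list.
        total_relevant = sum(relevance)
    if total_relevant == 0:
        # No relevant documents to find, so nothing was recalled.
        return 0.0
    true_positives = sum(relevance[:k])
    return true_positives / total_relevant


# Judgments for the "Western romance" query: the first four results are relevant.
western = [True, True, True, True, False]
recalls = [recall_at_k(western, k) for k in range(1, 6)]
print(recalls)  # [0.25, 0.5, 0.75, 1.0, 1.0]
```

If the true number of relevant documents in the corpus is known, pass it as `total_relevant`; the values above are upper bounds otherwise.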

1. Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

The top 5 results for this query are: I Do, Tarzan of the Apes, The Scar of Shame, Kiki, Hell Harbor

Out of these five results,
I Do- IRRELEVANT
Tarzan of the Apes- IRRELEVANT
The Scar of Shame- IRRELEVANT
Kiki- IRRELEVANT
Hell Harbor- IRRELEVANT
Out of five results, all are irrelevant      (1,2,3,4,5- irrelevant)

Since no relevant result was retrieved, true positives@k = 0 for every k:
Recall@1 = Recall@2 = Recall@3 = Recall@4 = Recall@5 = 0

2. Western romance

The top 5 results for this query are: The Call of the Wild, Wild and Woolly, Romance, Four Songs, The Forbidden City

Out of these five results,
The Call of the Wild- RELEVANT
Wild and Woolly- RELEVANT
Romance- RELEVANT
Four Songs- RELEVANT
The Forbidden City- IRRELEVANT
Out of five results, first four are relevant    (1,2,3,4- relevant, 5-irrelevant)

Recall@1 = 1/(1+3) = 1/4 = 0.25
Recall@2 = 2/(2+2) = 2/4 = 0.50
Recall@3 = 3/(3+1) = 3/4 = 0.75
Recall@4 = 4/(4+0) = 4/4 = 1
We can stop as we got 1 at fourth step


3. Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo

The top 5 results for this query are: Sahara, He Who Gets Slapped, Broken Hearts of the Hollywood, Peacock Alley, True Heart Susie

Out of these five results,
Sahara- RELEVANT
He Who Gets Slapped- RELEVANT
Broken Hearts of the Hollywood- IRRELEVANT
Peacock Alley- IRRELEVANT
True Heart Susie- IRRELEVANT
Out of five results, two are relevant  (1,2- relevant,3,4,5 -irrelevant)

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 2/(2+0) = 2/2 = 1
We can stop as we got 1 at the second step


4. Comedy film, office disguises, boss's daughter, elopement

The top 5 results for this query are: The Boy Friend, Mabel's Blunder, Bucking Broadway, Cruel Cruel Love, A Busy Day

Out of these five results,
The Boy Friend- IRRELEVANT
Mabel's Blunder- RELEVANT
Bucking Broadway- RELEVANT
Cruel Cruel Love- IRRELEVANT
A Busy Day- IRRELEVANT
Out of five results, two are relevant   (2,3-relevant, 1,4,5-irrelevant)

Recall@1 = 0/(0+2) = 0/2 = 0
Recall@2 = 1/(1+1) = 1/2 = 0.5
Recall@3 = 2/(2+0) = 2/2 = 1    
Since we got 1 at third step, we do not need to continue.

5. Lost film, Cleopatra charms Caesar, plots world rule, treasures from
mummy, revels with Antony, tragic end with serpent in Alexandria.

The top 5 results for this query are: Cleopatra, Mama's Affair, Peter's Pan, Madame X, The Equisite Thief

Out of these five results.
Cleopatra- RELEVANT
Mama's Affair- IRRELEVANT
Peter's Pan- IRRELEVANT
Madame X- IRRELEVANT
The Equisite Thief- IRRELEVANT  
Out of 5 results, only one is relevant (1-relevant, 2,3,4,5-irrelevant)

Recall@1 = 1/(1+0) = 1/1 = 1
Since we got 1 at first step, we do not need to continue.

6. Denis Gage Deane-Tanner

The top 5 results for this query are: Captain Alvarez, One Week, Old Lady 31, Number, Please?, Now or Never

Out of these five results,
Captain Alvarez- RELEVANT
One Week- IRRELEVANT
Old Lady 31- IRRELEVANT
Number, Please?- IRRELEVANT
Now or Never- IRRELEVANT
Out of five results, only one is relevant (1-relevant, 2,3,4,5-irrelevant)

Recall@1 = 1/(1+0) = 1/1 = 1
Since we got 1 at first step, we do not need to continue.


**Calculating the MRR**

Mean Reciprocal Rank (MRR) is useful when we want our system to return the most relevant item and want that item to be at a high position. Mathematically, this is given by:

MRR = (1/|Q|) * (sum over all queries q in Q of 1/rank_q), where rank_q is the position of the first relevant result for query q

To calculate MRR, we first calculate the reciprocal rank of each query. It is simply the reciprocal of the rank of the first relevant result, and its value ranges from 0 to 1.
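The reciprocal-rank step can be sketched in a few lines. The helper names `reciprocal_rank` and `mean_reciprocal_rank` are hypothetical, the relevance labels are the manual judgments listed for the six queries in this section, and a query with no relevant result in its list is assigned a reciprocal rank of 0, which is a common convention assumed here.

```python
from typing import List


def reciprocal_rank(relevance: List[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for position, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(judgments: List[List[bool]]) -> float:
    """Average of the per-query reciprocal ranks."""
    return sum(reciprocal_rank(r) for r in judgments) / len(judgments)


# Top-5 relevance judgments for the six queries, in the order they are discussed.
judgments = [
    [False, False, False, False, False],  # 1. Arctic documentaries
    [True, True, True, True, False],      # 2. Western romance
    [True, True, False, False, False],    # 3. Parisian star in Egypt
    [False, True, True, False, False],    # 4. Office comedy
    [True, False, False, False, False],   # 5. Cleopatra
    [True, False, False, False, False],   # 6. Denis Gage Deane-Tanner
]

for number, labels in enumerate(judgments, start=1):
    print(f"Query {number}: reciprocal rank = {reciprocal_rank(labels)}")
print("Mean over all six queries:", mean_reciprocal_rank(judgments))
print("Mean over queries 2 to 6:", mean_reciprocal_rank(judgments[1:]))
```

Because each reciprocal rank lies between 0 and 1, their mean must also lie between 0 and 1, which is a quick sanity check on any hand calculation.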

1. Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

The top 5 results for this query are: I Do, Tarzan of the Apes, The Scar of Shame, Kiki, Hell Harbor

Out of these five results,
I Do- IRRELEVANT
Tarzan of the Apes- IRRELEVANT
The Scar of Shame- IRRELEVANT
Kiki- IRRELEVANT
Hell Harbor- IRRELEVANT
Out of five results, all are irrelevant      (1,2,3,4,5- irrelevant)

For this query, no relevant item appears in the top 5, so the reciprocal rank is taken as 0.

So, in the mean computed at the end of this section we consider only the five queries 2, 3, 4, 5 and 6; query 1 has a reciprocal rank of 0.

2. Western romance

The top 5 results for this query are: The Call of the Wild, Wild and Woolly, Romance, Four Songs, The Forbidden City

Out of these five results,
The Call of the Wild- RELEVANT
Wild and Woolly- RELEVANT
Romance- RELEVANT
Four Songs- RELEVANT
The Forbidden City- IRRELEVANT
Out of five results, first four are relevant    (1,2,3,4- relevant, 5-irrelevant)


For this query, the reciprocal rank is 1/1 = 1 (as the first relevant item is at position 1).


3. Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo

The top 5 results for this query are: Sahara, He Who Gets Slapped, Broken Hearts of the Hollywood, Peacock Alley, True Heart Susie

Out of these five results,
Sahara- RELEVANT
He Who Gets Slapped- RELEVANT
Broken Hearts of the Hollywood- IRRELEVANT
Peacock Alley- IRRELEVANT
True Heart Susie- IRRELEVANT
Out of five results, two are relevant  (1,2- relevant,3,4,5 -irrelevant)

For this query, the reciprocal rank is 1/1 = 1 (as the first relevant item is at position 1).


4. Comedy film, office disguises, boss's daughter, elopement

The top 5 results for this query are: The Boy Friend, Mabel's Blunder, Bucking Broadway, Cruel Cruel Love, A Busy Day

Out of these five results,
The Boy Friend- IRRELEVANT
Mabel's Blunder- RELEVANT
Bucking Broadway- RELEVANT
Cruel Cruel Love- IRRELEVANT
A Busy Day- IRRELEVANT
Out of five results, two are relevant   (2,3-relevant, 1,4,5-irrelevant)

For this query, the reciprocal rank is 1/2 = 0.5 (as the first relevant item is at position 2).


5. Lost film, Cleopatra charms Caesar, plots world rule, treasures from
mummy, revels with Antony, tragic end with serpent in Alexandria.

The top 5 results for this query are: Cleopatra, Mama's Affair, Peter's Pan, Madame X, The Equisite Thief

Out of these five results.
Cleopatra- RELEVANT
Mama's Affair- IRRELEVANT
Peter's Pan- IRRELEVANT
Madame X- IRRELEVANT
The Equisite Thief- IRRELEVANT  
Out of 5 results, only one is relevant (1-relevant, 2,3,4,5-irrelevant)

For this query, the reciprocal rank is 1/1 = 1 (as the first relevant item is at position 1).

6. Denis Gage Deane-Tanner

The top 5 results for this query are: Captain Alvarez, One Week, Old Lady 31, Number, Please?, Now or Never

Out of these five results,
Captain Alvarez- RELEVANT
One Week- IRRELEVANT
Old Lady 31- IRRELEVANT
Number, Please?- IRRELEVANT
Now or Never- IRRELEVANT
Out of five results, only one is relevant (1-relevant, 2,3,4,5-irrelevant)

For this query, the reciprocal rank is 1/1 = 1 (as the first relevant item is at position 1).

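After calculating the individual reciprocal ranks (RR), we take their mean to get the MRR. Here we have 5 queries (queries 2 to 6), and the reciprocal rank of query 4 is 1/2 = 0.5, so

MRR = [RR(Query2) + RR(Query3) + RR(Query4) + RR(Query5) + RR(Query6)] / Total queries
= (1 + 1 + 0.5 + 1 + 1)/5
= 4.5/5 = 0.9

The Mean Reciprocal Rank for these five queries is 0.9. If query 1 is also included with a reciprocal rank of 0, the mean over all six queries is 4.5/6 = 0.75.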

Now we will add re-ranking on top of BM25 retrieval to get more relevant results. In the code above, we applied the BM25 model on its own. Now we retrieve a larger set of candidates with BM25 and re-rank them with a cross-encoder.

In [None]:
#python script for using semantic search along with BM25 and combining with Re-Ranker
#importing all the necessary libraries and modules
import pandas as pd
from sentence_transformers import SentenceTransformer, CrossEncoder
from rank_bm25 import BM25Okapi
import numpy as np

#loading the cleaned dataset with the first thousand entries and the columns "Title" and "Plot"
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

#combining elements of plot and title data into 'Combined' for performing operations easily
df['combined'] = df['Title'] + ": " + df['Plot']
#converting the 'combined' column to a list for easy further processing
texts = df['combined'].tolist()

#we use the bi-encoder to encode all the data from the dataset so that it can be used for semantic search
#we use the cross-encoder to re-rank the candidate list and improve its quality
#this retrieve and re-rank step is the addition in this second part; the first part used only semantic search
model_name = 'nq-distilbert-base-v1'
bi_encoder = SentenceTransformer(model_name)
cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

#encoding the documents using the bi-encoder
document_embeddings = bi_encoder.encode(texts)

#setting up the BM25 model for Initial Retrieval. BM25 is popular ranking function used to get the relevance of the documents.
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

#defining the bm25 search, rerank, and combined search-and-rerank functions
def bm25_search(query, N=100):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    top_indices = np.argsort(doc_scores)[::-1][:N]
    return top_indices, np.sort(doc_scores)[::-1][:N]

def rerank_with_cross_encoder(query, candidate_idxs):
    pairs = [[query, texts[idx]] for idx in candidate_idxs]
    scores = cross_encoder_model.predict(pairs)
    ranked_idxs = [x for _, x in sorted(zip(scores, candidate_idxs), key=lambda pair: pair[0], reverse=True)]
    return ranked_idxs[:5]  # Return the top 5 results

def search_and_rerank(query):
    candidates_idx, _ = bm25_search(query)
    top_5_idx = rerank_with_cross_encoder(query, candidates_idx)
    return [(texts[idx], idx) for idx in top_5_idx]  # Return texts and indices

#reading the query interactively from the user
query = input("Enter your query here: ")

#performing search and rerank. Getting only five relevant results
top_5_results = search_and_rerank(query)

#initializing an empty dictionary to store the formatted results
formatted_results = {}

for result, idx in top_5_results:
    title, plot = result.split(": ", 1)
    formatted_results[title] = {"Plot": plot, "Index": idx}

#printing the top five most relevant search results with their title, plots and indexes.
for title, info in formatted_results.items():
    print(f' {title}: "{info["Plot"]}", Index:{info["Index"]}\n')

Enter your query here: “Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions
 Mamba: "The film takes place in Neu Posen, German East Africa sometime before the First World War. "Mamba" is the name given to a South African snake. The reptile of this adventure is Auguste Bolte (played by Jean Hersholt), who is constantly reminding those with whom he has a chance to converse that he can buy anything. He neglects his appearance and does not even bother to shave or brush his hair. The German officers hold themselves aloof from him and the only individual he has an opportunity to talk to at length is his valet-secretary, a Cockney, who feeds his master with flattery. One afternoon Bolte recalls that he has received a letter asking for 200,000 marks from Count von Linden. The Count is in Germany and in a footnote it is written that Bolte might marry von Linden's daughter, Helen. The white people of the post have as little to do with Bolte as possible and the 

In [None]:
# Retrieve candidates with BM25, then re-rank them with a cross-encoder
# Import the necessary libraries and modules
import pandas as pd
from sentence_transformers import SentenceTransformer, CrossEncoder
from rank_bm25 import BM25Okapi
import numpy as np

# Load the cleaned dataset: the first thousand entries, with columns "Title" and "Plot"
df = pd.read_csv('/content/first_1000_entries_dataset.csv')

# Combine title and plot into a 'combined' column so each movie is one searchable text
df['combined'] = df['Title'] + ": " + df['Plot']
# Convert the 'combined' column to a list for further processing
texts = df['combined'].tolist()

# The bi-encoder encodes the whole dataset for semantic search
# The cross-encoder re-ranks the retrieved candidates to improve result quality
# Re-ranking is what this second part adds; the first part used semantic search only
model_name = 'nq-distilbert-base-v1'
bi_encoder = SentenceTransformer(model_name)
cross_encoder_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

# Encode the documents with the bi-encoder
# (these embeddings are not used by the BM25 retrieval below)
document_embeddings = bi_encoder.encode(texts)

# Set up BM25 for the initial retrieval. BM25 is a popular ranking function that scores documents by term overlap with the query
tokenized_corpus = [doc.split(" ") for doc in texts]
bm25 = BM25Okapi(tokenized_corpus)

# Define the BM25 search, re-rank, and combined search functions
def bm25_search(query, N=100):
    query_tokens = query.split(" ")
    doc_scores = bm25.get_scores(query_tokens)
    # Indices of the N highest-scoring documents, best first
    top_indices = np.argsort(doc_scores)[::-1][:N]
    return top_indices, doc_scores[top_indices]

def rerank_with_cross_encoder(query, candidate_idxs):
    pairs = [[query, texts[idx]] for idx in candidate_idxs]
    scores = cross_encoder_model.predict(pairs)
    # Sort candidate indices by cross-encoder score, highest first
    ranked_idxs = [idx for _, idx in sorted(zip(scores, candidate_idxs), key=lambda pair: pair[0], reverse=True)]
    return ranked_idxs[:5]  # Return the top 5 results

def search_and_rerank(query):
    candidates_idx, _ = bm25_search(query)
    top_5_idx = rerank_with_cross_encoder(query, candidates_idx)
    return [(texts[idx], idx) for idx in top_5_idx]  # Return texts and indices

# Read the query from the user
query = input("Enter your query here: ")

# Search and re-rank, keeping only the five most relevant results
top_5_results = search_and_rerank(query)

# Store the formatted results in a dictionary keyed by title
formatted_results = {}

for _, idx in top_5_results:
    # Take title and plot from the DataFrame: splitting the combined text on ": " breaks titles that contain a colon
    title, plot = df['Title'].iloc[idx], df['Plot'].iloc[idx]
    formatted_results[title] = {"Plot": plot, "Index": idx}

# Print the five most relevant results with their titles, plots, and indices
for title, info in formatted_results.items():
    print(f' {title}: "{info["Plot"]}", Index:{info["Index"]}\n')

Enter your query here: Western romance
 The General: "Western & Atlantic Railroad train engineer Johnnie Gray (Keaton) is in Marietta, Georgia to see one of the two loves of his life, his fiancée Annabelle Lee (Marion Mack)—the other being his locomotive, The General—when the American Civil War breaks out. He hurries to be first in line to enlist in the Confederate Army, but is rejected because he is too valuable in his present job; unfortunately, Johnnie is not told this reason and is forcibly ejected. On leaving, he runs into Annabelle's father and brother, who beckon to him to join them in line, but he sadly walks away, giving them the impression that he does not want to enlist. Annabelle coldly informs Johnnie that she will not speak to him again until he is in uniform.
A year passes, and Annabelle receives word that her father has been wounded. She travels north on the W&ARR with The General pulling the train to see him but still wants nothing to do with Johnnie. When the train ma

Enter your query here: Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo
 Sahara: "Silent film femme fatale, Louise Glaum, portrays the role of Mignon, a Parisian music hall celebrity. Mignon marries a young American civil engineer, John Stanley, portrayed by Matt Moore. Stanley is transferred to Egypt to work on an engineering project in the Sahara. Mignon and her son, portrayed by Pat Moore, join Stanley in the desert.[3][4] Unhappy with life in the desert, Mignon leaves Stanley and her son in the desert and moves to Cairo with the wealthy Baron Alexis, portrayed by Edwin Stevens. Mignon lives in Baron Alexis' palace while Stanley goes blind and becomes addicted to the drug hasheesh. Mignon later encounters Stanley and her son, who have become beggars in the streets of Cairo.[3][4] Mignon returns to the desert to care for her husband, and the two are reconciled.", Index:293

 A Woman 

Enter your query here: "Comedy film, office disguises, boss's daughter, elopement
 Ask Father: "Lloyd is a serious young middle-class guy on the make, who wants to marry the boss’ daughter. The problem is getting in to see the boss so that he can ask for her hand in marriage; the office is guarded by a bunch of comic, clumsy flunkies who throw everyone out who tries to get in. When Lloyd gets into the boss’ office, the latter uses trap doors and conveyor belts to expel him; Lloyd then goes to the costume company next door, tries to get in wearing drag (no success), and then in medieval armor – that works, since he bangs everyone over the head with his club, but then he finds out that the daughter has eloped with another suitor. Lloyd decides to be sensible and he settles for the cute switchboard operator (Daniels) instead. The film includes a brief wall climbing sequence. Light-hearted, short, fast-paced.", Index:253

 Bucking Broadway: "As described in a film magazine,[3] Cheyenne Har

Enter your query here: Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.
 Cleopatra: "Because the film has been lost, the following summary is reconstructed from a description in a contemporary film magazine.
Cleopatra (Bara), the Siren of Egypt, by a clever ruse reaches Caesar (Leiber) and he falls victim to her charms. They plan to rule the world together, but then Caesar falls. Cleopatra's life is desired by the church, as the wanton woman's rule has become intolerable. Pharon (Roscoe), a high priest, is given a sacred dagger to take her life. He gives her his love instead and, when she is in need of some money, leads her to the tomb of his ancestors, where she tears the treasure from the breast of the mummy. With this wealth she goes to Rome to meet Antony (Hall). He leaves the affairs of state and travels to Alexandria with her, where they revel. Antony is recalled to Rome and married to Octavia (

Enter your query here: Denis Gage Deane-Tanner
 Captain Alvarez: "A melodrama about an American who becomes a revolutionary leader battling evil government spies in Argentina. William Desmond Taylor portrays the title role, and Denis Gage Deane-Tanner, Taylor's younger brother, is thought to have played the small role of a blacksmith.", Index:67

 Silk Husbands and Calico Wives: "As described in a film magazine,[4] Deane Kendall (Peters), a country boy who has succeeded in being admitted to the bar, finds few clients in the small village of Harmony. When there is a sensational case involving a man being tried for the murder of his wife's lover, Edith Beecher (Alden), court stenographer and Deane's sweetheart, manages to arrange for Deane to defend the husband. Deane's masterful defense frees the man and Deane wins a position with a city law firm. Deane marries Edith and they move to the city. Deane makes rapid progress but Edith remains a "home body." Society girl Georgia Wilson (Novak
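
The BM25 retrieval step used in the cells above can be illustrated in plain Python. The sketch below is a textbook BM25 scorer, not the internals of `rank_bm25`: the function name `bm25_scores`, the toy corpus, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_corpus, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    n_docs = len(tokenized_corpus)
    avg_len = sum(len(doc) for doc in tokenized_corpus) / n_docs
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for doc in tokenized_corpus for term in set(doc))
    scores = []
    for doc in tokenized_corpus:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in term_freq:
                continue
            # Non-negative IDF variant (an assumption; rank_bm25 handles IDF differently)
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalised through the b term
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical three-document corpus, tokenized on spaces like the cells above
toy_corpus = [
    "a silent film about a desert engineer",
    "a comedy about an office and the boss",
    "a western romance on a railroad",
]
tokenized = [doc.split(" ") for doc in toy_corpus]
scores = bm25_scores("western romance".split(" "), tokenized)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, toy_corpus[best])
```

Because the tokenizer is a plain split on spaces, matching is case-sensitive and punctuation stays attached to words, which is also true of the retrieval cells above.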

**Let us now calculate Recall@k and MRR.**

# Calculating the Recall values

Recall measures how many of the relevant results for a query were actually retrieved. Mathematically, this is given by:

Recall@k = true positives@k / (true positives@k + false negatives@k)

Here the set of relevant results is taken to be the results judged relevant among the top 5, so a false negative at k is a relevant result ranked below position k.

1. Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

The top 5 results for this query are: Mamba, The Four Horsemen of the Apocalypse, Tarzan of the Apes, Atlantis, Chang: A Drama of the Wilderness

Out of these five results,
Mamba- RELEVANT
The Four Horsemen of the Apocalypse- IRRELEVANT
Tarzan of the Apes- IRRELEVANT
Atlantis- RELEVANT
Chang: A Drama of the Wilderness- RELEVANT
Out of five results, three are relevant (1,4,5- relevant and 2,3 irrelevant)

Recall@1 = 1/(1+2) = 1/3 = 0.33
Recall@2 = 1/(1+2) = 1/3 = 0.33
Recall@3 = 1/(1+2) = 1/3 = 0.33
Recall@4 = 2/(2+1) = 2/3 = 0.67
Recall@5 = 3/(3+0) = 3/3 = 1

2. Western romance

The top 5 results for this query are: The General, Romance, The Sheik, Wild and Woolly, All Quiet on the Western Front

Out of these five results,
The General- RELEVANT
Romance- RELEVANT
The Sheik- RELEVANT
Wild and Woolly- IRRELEVANT
All Quiet on the Western Front- IRRELEVANT   (1,2,3- relevant, 4,5 irrelevant)

Recall@1 = 1/(1+2) = 1/3 = 0.33
Recall@2 = 2/(2+1) = 2/3 = 0.67
Recall@3 = 3/(3+0) = 3/3 = 1
Since recall reaches 1 at the third step, we do not need to continue: Recall@4 and Recall@5 are also 1.

3. Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo

The top 5 results for this query are: Sahara, A Woman of Affairs, He Who Gets Slapped, Foolish Wives, Kiki

Out of these five results,
Sahara- RELEVANT
A Woman of Affairs- IRRELEVANT
He Who Gets Slapped- IRRELEVANT
Foolish Wives- IRRELEVANT
Kiki- RELEVANT                        (1,5- relevant, 2,3,4-irrelevant)

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 1/(1+1) = 1/2 = 0.5
Recall@3 = 1/(1+1) = 1/2 = 0.5
Recall@4 = 1/(1+1) = 1/2 = 0.5
Recall@5 = 2/(2+0) = 2/2 = 1


4. Comedy film, office disguises, boss's daughter, elopement

The top 5 results for this query are: Ask Father, Bucking Broadway, Mabel's Blunder, His Wedding Night, Show Boat

Out of these five results,
Ask Father- RELEVANT
Bucking Broadway- RELEVANT
Mabel's Blunder- IRRELEVANT
His Wedding Night- IRRELEVANT
Show Boat- IRRELEVANT                    (1,2-relevant, 3,4,5-irrelevant)

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 2/(2+0) = 2/2 = 1
Since recall reaches 1 at the second step, we do not need to continue.

5. Lost film, Cleopatra charms Caesar, plots world rule, treasures from
mummy, revels with Antony, tragic end with serpent in Alexandria.

The top 5 results for this query are: Cleopatra, Mama's Affair, The Hunchback of Notre Dame, Three Ages, What the Daisy Said

Out of these five results,
Cleopatra- RELEVANT
Mama's Affair- IRRELEVANT
The Hunchback of Notre Dame- IRRELEVANT
Three Ages- IRRELEVANT
What the Daisy Said- IRRELEVANT  (1 -relevant, 2,3,4,5 -irrelevant)

Recall@1 = 1/(1+0) = 1/1 = 1
Since recall reaches 1 at the first step, we do not need to continue.

6. Denis Gage Deane-Tanner

The top 5 results for this query are: Captain Alvarez, Silk Husbands and Calico Wives, Hangman's House, The Law of Men, The Delicious Little Devil

Out of these five results,
Captain Alvarez- RELEVANT
Silk Husbands and Calico Wives- IRRELEVANT
Hangman's House- IRRELEVANT
The Law of Men- RELEVANT
The Delicious Little Devil- IRRELEVANT           (1,4 -relevant, 2,3,5 -irrelevant)

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 1/(1+1) = 1/2 = 0.5
Recall@3 = 1/(1+1) = 1/2 = 0.5
Recall@4 = 2/(2+0) = 2/2 = 1
Since recall reaches 1 at the fourth step, we do not need to continue.
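
The hand calculations above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `recall_at_k` and the `judgments` dictionary are introduced here for illustration, the 0/1 flags are the relevance judgments listed above, and it assumes (as the hand calculations do) that the only relevant results are those judged relevant within the top five.

```python
def recall_at_k(relevance, k):
    """Recall@k for a ranked list of 0/1 relevance flags, best result first."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0  # no relevant results at all, so nothing can be recalled
    # Relevant results found in the top k, divided by all relevant results
    return sum(relevance[:k]) / total_relevant

# Relevance judgments for the six queries, in ranked order (1 = relevant)
judgments = {
    "Query 1": [1, 0, 0, 1, 1],
    "Query 2": [1, 1, 1, 0, 0],
    "Query 3": [1, 0, 0, 0, 1],
    "Query 4": [1, 1, 0, 0, 0],
    "Query 5": [1, 0, 0, 0, 0],
    "Query 6": [1, 0, 0, 1, 0],
}

for name, flags in judgments.items():
    values = [round(recall_at_k(flags, k), 2) for k in range(1, 6)]
    print(name, values)
```

With a full set of relevance labels for the corpus, `total_relevant` would instead count every relevant document, including those that never reach the top five.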


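**Calculating the MRR**

Mean Reciprocal Rank (MRR) is useful when we want the system to return the most relevant item as high in the ranking as possible. Mathematically, this is given by:

MRR = (1/|Q|) × [1/rank(1) + 1/rank(2) + ... + 1/rank(|Q|)]

where |Q| is the number of queries and rank(i) is the position of the first relevant result for query i.

To calculate MRR, we first calculate the reciprocal rank (RR) of each query. It is simply the reciprocal of the rank of the first relevant result, so its value ranges from 0 to 1.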


1. Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

The top 5 results for this query are: Mamba, The Four Horsemen of the Apocalypse, Tarzan of the Apes, Atlantis, Chang: A Drama of the Wilderness

Out of these five results,
Mamba- RELEVANT
The Four Horsemen of the Apocalypse- IRRELEVANT
Tarzan of the Apes- IRRELEVANT
Atlantis- RELEVANT
Chang: A Drama of the Wilderness- RELEVANT
Out of five results, three are relevant (1,4,5- relevant and 2,3 irrelevant)

For this query, the reciprocal rank is 1/1 = 1, since the first relevant item is at position 1.

2. Western romance

The top 5 results for this query are: The General, Romance, The Sheik, Wild and Woolly, All Quiet on the Western Front

Out of these five results,
The General- RELEVANT
Romance- RELEVANT
The Sheik- RELEVANT
Wild and Woolly- IRRELEVANT
All Quiet on the Western Front- IRRELEVANT   (1,2,3- relevant, 4,5 irrelevant)

For this query, the reciprocal rank is 1/1 = 1, since the first relevant item is at position 1.


3. Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo

The top 5 results for this query are: Sahara, A Woman of Affairs, He Who Gets Slapped, Foolish Wives, Kiki

Out of these five results,
Sahara- RELEVANT
A Woman of Affairs- IRRELEVANT
He Who Gets Slapped- IRRELEVANT
Foolish Wives- IRRELEVANT
Kiki- RELEVANT                        (1,5- relevant, 2,3,4-irrelevant)

For this query, the reciprocal rank is 1/1 = 1, since the first relevant item is at position 1.


4. Comedy film, office disguises, boss's daughter, elopement

The top 5 results for this query are: Ask Father, Bucking Broadway, Mabel's Blunder, His Wedding Night, Show Boat

Out of these five results,
Ask Father- RELEVANT
Bucking Broadway- RELEVANT
Mabel's Blunder- IRRELEVANT
His Wedding Night- IRRELEVANT
Show Boat- IRRELEVANT                    (1,2-relevant, 3,4,5-irrelevant)

For this query, the reciprocal rank is 1/1 = 1, since the first relevant item is at position 1.


5. Lost film, Cleopatra charms Caesar, plots world rule, treasures from
mummy, revels with Antony, tragic end with serpent in Alexandria.

The top 5 results for this query are: Cleopatra, 4 Devils, Disraeli, Bound in Morocco, Souls for Sale

Out of these five results,
Cleopatra- RELEVANT
4 Devils- IRRELEVANT
Disraeli- IRRELEVANT
Bound in Morocco- RELEVANT
Souls for Sale- IRRELEVANT       (1,4 -relevant, 2,3,5 -irrelevant)

For this query, the reciprocal rank is 1/1 = 1, since the first relevant item is at position 1.

6. Denis Gage Deane-Tanner

The top 5 results for this query are: Captain Alvarez, Silk Husbands and Calico Wives, Hangman's House, The Law of Men, The Delicious Little Devil

Out of these five results,
Captain Alvarez- RELEVANT
Silk Husbands and Calico Wives- IRRELEVANT
Hangman's House- IRRELEVANT
The Law of Men- RELEVANT
The Delicious Little Devil- IRRELEVANT    (1,4 -relevant, 2,3,5 -irrelevant)

For this query, the reciprocal rank is 1/1 = 1, since the first relevant item is at position 1.

After calculating the reciprocal rank (RR) of each query, we take their mean to get the MRR. Here we have 6 queries, so

MRR = [RR(Query1) + RR(Query2) + RR(Query3) + RR(Query4) + RR(Query5) + RR(Query6)] / total queries
= (1+1+1+1+1+1)/6
= 1

The Mean Reciprocal Rank for this data is 1.
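
The same result can be checked in code. This is a minimal sketch: the function names `reciprocal_rank` and `mean_reciprocal_rank` are introduced here for illustration, and the 0/1 flags are the relevance judgments listed in this MRR section for the six queries.

```python
def reciprocal_rank(relevance):
    """1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, flag in enumerate(relevance, start=1):
        if flag:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(all_relevance):
    """Average of the reciprocal ranks over all queries."""
    return sum(reciprocal_rank(flags) for flags in all_relevance) / len(all_relevance)

# Relevance flags for the six queries, in ranked order (1 = relevant)
mrr_judgments = [
    [1, 0, 0, 1, 1],  # Query 1
    [1, 1, 1, 0, 0],  # Query 2
    [1, 0, 0, 0, 1],  # Query 3
    [1, 1, 0, 0, 0],  # Query 4
    [1, 0, 0, 1, 0],  # Query 5
    [1, 0, 0, 1, 0],  # Query 6
]

mrr = mean_reciprocal_rank(mrr_judgments)
print(mrr)
```

Every query here has a relevant result at position 1, so each reciprocal rank is 1. A query whose first relevant result sat at position 3 would contribute 1/3 instead and pull the mean below 1.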