**Semantic Search - Relevant Search Results Retrieval from a dataset**

This example demonstrates the setup for retrieving relevant search results from a dataset using a query.

You can input a query with the movie plot or title or both. The script then uses semantic search to find relevant results among the first 1000 entries of the Wikipedia movie plots dataset (a small subset that fits comfortably in RAM).

As the model, we use "all-MiniLM-L6-v2", which is the model loaded in the code cells below.

It is a compact sentence-embedding model from the sentence-transformers library, trained on a large collection of sentence pairs. For the passages, we encode the movie title together with the plot text.

In [None]:
!pip install kaggle



In [None]:
import pandas as pd #importing the pandas module for loading the dataset into pandas dataframe

# Loading the CSV file into a DataFrame
df = pd.read_csv('/content/wiki_movie_plots_deduped.csv', nrows=1000)

# Display the first five rows of the DataFrame (head() returns five rows by default)
print(df.head())

   Release Year                             Title Origin/Ethnicity  \
0          1901            Kansas Saloon Smashers         American   
1          1901     Love by the Light of the Moon         American   
2          1901           The Martyred Presidents         American   
3          1901  Terrible Teddy, the Grizzly King         American   
4          1902            Jack and the Beanstalk         American   

                             Director Cast    Genre  \
0                             Unknown  NaN  unknown   
1                             Unknown  NaN  unknown   
2                             Unknown  NaN  unknown   
3                             Unknown  NaN  unknown   
4  George S. Fleming, Edwin S. Porter  NaN  unknown   

                                           Wiki Page  \
0  https://en.wikipedia.org/wiki/Kansas_Saloon_Sm...   
1  https://en.wikipedia.org/wiki/Love_by_the_Ligh...   
2  https://en.wikipedia.org/wiki/The_Martyred_Pre...   
3  https://en.wikipedia.

In [None]:
import pandas as pd #loading pandas module for displaying with help of data frame

#Reading the first 1000 rows and only 'Title' and 'Plot' columns. We need those only so that we can query them and get the results
df = pd.read_csv('/content/wiki_movie_plots_deduped.csv', usecols=['Title', 'Plot'], nrows=1000)
df.head()

Unnamed: 0,Title,Plot
0,Kansas Saloon Smashers,"A bartender is working at a saloon, serving dr..."
1,Love by the Light of the Moon,"The moon, painted with a smiling face hangs ov..."
2,The Martyred Presidents,"The film, just over a minute long, is composed..."
3,"Terrible Teddy, the Grizzly King",Lasting just 61 seconds and consisting of two ...
4,Jack and the Beanstalk,The earliest known adaptation of the classic f...


We have now loaded the first 1000 entries from the CSV file into the DataFrame. We need only the Title and Plot columns, so each row has the form "index, Title, Plot".

First let us start by installing the libraries.

In [None]:
!pip install sentence-transformers torch

Collecting sentence-transformers
  Downloading sentence_transformers-2.6.1-py3-none-any.whl (163 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)

Now that the required libraries are installed, we are going to use sentence-transformers to build a basic semantic search engine.

In [None]:
!pip install -U sentence-transformers



In [None]:
import pandas as pd

# Specifying the path of the new CSV file with two columns and first thousand entries
output_file = "first_1000_entries_dataset.csv"

# Saving the DataFrame to a new CSV file
df.to_csv(output_file, index=False)

print("DataFrame saved to", output_file)

DataFrame saved to first_1000_entries_dataset.csv


I saved the DataFrame, with its thousand rows and the two required columns, to a new file named "first_1000_entries_dataset.csv" so that the later cells can load it directly and fetch the top five relevant results.

In [None]:
from sentence_transformers import SentenceTransformer #bi-encoder for sentence embeddings
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import torch

if not torch.cuda.is_available(): #checking for the GPU availability
  print("Warning: No GPU found. Please add GPU to your notebook")

# Load the reduced dataset (first 1000 rows, 'Title' and 'Plot' columns only)
df = pd.read_csv("/content/first_1000_entries_dataset.csv")

df['text'] = df['Title'] + ' ' + df['Plot'] #Concatenating the title and plot columns into a single column

# We use the Bi-Encoder to encode all passages, so that we can use it with semantic search
# "all-MiniLM-L6-v2" is a sentence-transformers model based on the MiniLM architecture from Microsoft Research
model_name = 'all-MiniLM-L6-v2'
bi_encoder = SentenceTransformer(model_name)

# Encode every "title + plot" text of the corpus into an embedding
corpus_embeddings = bi_encoder.encode(df['text'].tolist())

query = input("Enter your query: ")  #taking the query as input at runtime
query_embedding = bi_encoder.encode(query)

# Calculate cosine similarity between query and corpus embeddings
similarity_scores = cosine_similarity([query_embedding], corpus_embeddings)[0]

# Retrieve top N most similar results (argsort is ascending, so take the last N and reverse)
N = 5
top_indices = similarity_scores.argsort()[-N:][::-1]
top_results = df.iloc[top_indices]

print(top_results)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Enter your query: Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions
                                Title  \
438               Nanook of the North   
431                  The Frozen North   
83    In the Land of the Head Hunters   
822                   Masked Emotions   
626  Chang: A Drama of the Wilderness   

                                                  Plot  \
438  The documentary follows the lives of an Inuk, ...   
431  The film opens near the "last stop on the subw...   
83   The following plot synopsis was published in c...   
822  Set on the Maine coast, a young sloop skipper ...   
626  In the directors' own words, Chang is a "melod...   

                                                  text  
438  Nanook of the North The documentary follows th...  
431  The Frozen North The film opens near the "last...  
83   In the Land of the Head Hunters The following ...  
822  Masked Emotions Set on the Maine coast, a youn...  
626  Chang: A 

We are going with the "all-MiniLM-L6-v2" model, a sentence-transformers model based on the MiniLM architecture from Microsoft Research. The advantage of this model is that it has already been fine-tuned to produce sentence embeddings suited to retrieving relevant search results. It is also much smaller than a standard BERT model, which keeps encoding fast.

First we loaded all the required modules. Then we check whether a GPU is available. If no GPU is available, the code prints the message "Warning: No GPU found. Please add GPU to your notebook".

Then I loaded the cleaned dataset (with only the two required columns, Title and Plot) containing the first thousand entries. We then merged the two columns into a single column, 'text', so that queries can be matched against both title and plot.

We used a bi-encoder, which encodes the query and each passage independently into vectors, with the "all-MiniLM-L6-v2" model to convert the whole corpus (here, the corpus is the whole dataset) into embeddings, which are stored in the corpus_embeddings array.

Then we take the query as input at runtime and convert it into an embedding in the same way, storing it in query_embedding.

Then the similarity between the query vector and each corpus vector is calculated with the cosine similarity method. The scores are then sorted in descending order: the higher the similarity score, the higher the relevance. The Title, Plot and text columns of the top 5 rows are selected via top_indices and stored in the top_results variable, which is printed at the end.
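The ranking step described above can be illustrated without loading any model. The following is a minimal sketch in plain Python that uses tiny made-up vectors in place of real embeddings; `cosine` and `top_n` are hypothetical helper names chosen for this illustration, not functions from sentence-transformers or scikit-learn.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n(query_vec, corpus_vecs, n=5):
    """Return (index, score) pairs for the n most similar corpus vectors."""
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    # Sort by score, highest first, and keep the first n entries
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Toy 3-dimensional "embeddings" (made up for illustration only)
corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
query = [1.0, 0.1, 0.0]

for index, score in top_n(query, corpus, n=2):
    print(index, round(score, 3))
```

The notebook cells do the same thing with `cosine_similarity` and `argsort` on 1000 real embeddings; only the scale differs.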





In [None]:
# Reuse bi_encoder, corpus_embeddings, df and cosine_similarity from the first search cell; only the query changes
query = input("Enter your query: ")  #taking the query as input at runtime
query_embedding = bi_encoder.encode(query)

# Calculate cosine similarity between query and corpus embeddings
similarity_scores = cosine_similarity([query_embedding], corpus_embeddings)[0]

# Retrieve top N most similar results
N = 5
top_indices = similarity_scores.argsort()[-N:][::-1]
top_results = df.iloc[top_indices]

print(top_results)

Enter your query: “Western romance
                         Title  \
347                    Romance   
158           Bucking Broadway   
189            Wild and Woolly   
292  A Romance of Happy Valley   
489      The Enchanted Cottage   

                                                  Plot  \
347  As described in a film publication,[2] a youth...   
158  As described in a film magazine,[3] Cheyenne H...   
189  As described in a film magazine review,[1] Jef...   
292  As described in a film magazine,[3] the senior...   
489  Crippled by the war, Oliver Bashforth (Richard...   

                                                  text  
347  Romance As described in a film publication,[2]...  
158  Bucking Broadway As described in a film magazi...  
189  Wild and Woolly As described in a film magazin...  
292  A Romance of Happy Valley As described in a fi...  
489  The Enchanted Cottage Crippled by the war, Oli...  


In [None]:
# Reuse bi_encoder, corpus_embeddings, df and cosine_similarity from the first search cell; only the query changes
query = input("Enter your query: ")  #taking the query as input at runtime
query_embedding = bi_encoder.encode(query)

# Calculate cosine similarity between query and corpus embeddings
similarity_scores = cosine_similarity([query_embedding], corpus_embeddings)[0]

# Retrieve top N most similar results
N = 5
top_indices = similarity_scores.argsort()[-N:][::-1]
top_results = df.iloc[top_indices]

print(top_results)

Enter your query: Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo
                              Title  \
293                          Sahara   
821            Married in Hollywood   
977                     Mothers Cry   
30   The House with Closed Shutters   
65                       A Busy Day   

                                                  Plot  \
293  Silent film femme fatale, Louise Glaum, portra...   
821  A showgirl, part of a troupe, tours Europe whe...   
977  The film is focused on the life of widowed mot...   
30   During the American Civil War a young soldier ...   
65   In A Busy Day, a wife (played by an energetic ...   

                                                  text  
293  Sahara Silent film femme fatale, Louise Glaum,...  
821  Married in Hollywood A showgirl, part of a tro...  
977  Mothers Cry The film is focused on the life of...  
30   The House with

In [None]:
# Reuse bi_encoder, corpus_embeddings, df and cosine_similarity from the first search cell; only the query changes
query = input("Enter your query: ")  #taking the query as input at runtime
query_embedding = bi_encoder.encode(query)

# Calculate cosine similarity between query and corpus embeddings
similarity_scores = cosine_similarity([query_embedding], corpus_embeddings)[0]

# Retrieve top N most similar results
N = 5
top_indices = similarity_scores.argsort()[-N:][::-1]
top_results = df.iloc[top_indices]

print(top_results)

Enter your query: Comedy film, office disguises, boss's daughter, elopement.
                              Title  \
253                      Ask Father   
68              Caught in a Cabaret   
441                         Pay Day   
65                       A Busy Day   
192  Amarilly of Clothes-Line Alley   

                                                  Plot  \
253  Lloyd is a serious young middle-class guy on t...   
68   Chaplin plays a waiter who fakes being a Greek...   
441  Chaplin plays a laborer on a house constructio...   
65   In A Busy Day, a wife (played by an energetic ...   
192  Set in San Francisco during the early 1900s, t...   

                                                  text  
253  Ask Father Lloyd is a serious young middle-cla...  
68   Caught in a Cabaret Chaplin plays a waiter who...  
441  Pay Day Chaplin plays a laborer on a house con...  
65   A Busy Day In A Busy Day, a wife (played by an...  
192  Amarilly of Clothes-Line Alley Set in San Fran...

In [None]:
# Reuse bi_encoder, corpus_embeddings, df and cosine_similarity from the first search cell; only the query changes
query = input("Enter your query: ")  #taking the query as input at runtime
query_embedding = bi_encoder.encode(query)

# Calculate cosine similarity between query and corpus embeddings
similarity_scores = cosine_similarity([query_embedding], corpus_embeddings)[0]

# Retrieve top N most similar results
N = 5
top_indices = similarity_scores.argsort()[-N:][::-1]
top_results = df.iloc[top_indices]

print(top_results)

Enter your query: Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria
                Title                                               Plot  \
162         Cleopatra  Because the film has been lost, the following ...   
668          4 Devils  The plot concerns four orphans (Janet Gaynor, ...   
377          Disraeli  As described in a film magazine,[3] Disraeli (...   
199  Bound in Morocco  As described in a film magazine,[3] George Tra...   
476    Souls for Sale  Remember "Mem" Steddon (Eleanor Boardman) marr...   

                                                  text  
162  Cleopatra Because the film has been lost, the ...  
668  4 Devils The plot concerns four orphans (Janet...  
377  Disraeli As described in a film magazine,[3] D...  
199  Bound in Morocco As described in a film magazi...  
476  Souls for Sale Remember "Mem" Steddon (Eleanor...  


In [None]:
# Reuse bi_encoder, corpus_embeddings, df and cosine_similarity from the first search cell; only the query changes
query = input("Enter your query: ")  #taking the query as input at runtime
query_embedding = bi_encoder.encode(query)

# Calculate cosine similarity between query and corpus embeddings
similarity_scores = cosine_similarity([query_embedding], corpus_embeddings)[0]

# Retrieve top N most similar results
N = 5
top_indices = similarity_scores.argsort()[-N:][::-1]
top_results = df.iloc[top_indices]

print(top_results)

Enter your query: Denis Gage Deane-Tanner
                      Title  \
67          Captain Alvarez   
942         Hold Everything   
654                 Rookies   
979  Near the Rainbow's End   
966      A Man from Wyoming   

                                                  Plot  \
67   A melodrama about an American who becomes a re...   
942  Brown plays Gink Schiner, a third-rate fighter...   
654  During World War I an entertainer named Greg L...   
979  Rancher Tug Wilson (Alfred Hewston) discovers ...   
966  After the United States enters World War I in ...   

                                                  text  
67   Captain Alvarez A melodrama about an American ...  
942  Hold Everything Brown plays Gink Schiner, a th...  
654  Rookies During World War I an entertainer name...  
979  Near the Rainbow's End Rancher Tug Wilson (Alf...  
966  A Man from Wyoming After the United States ent...  


Let us now calculate the values of Recall@k and MRR.

**Calculating the Recall values**

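Recall is the fraction of all actually relevant results for a query that appear in the returned results. In the calculations below, the set of all relevant results for a query is taken to be the relevant items found in its top 5. Mathematically, this is given by:

Recall@k = true positives@k / (true positives@k + false negatives@k)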
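The formula can also be checked in code. Below is a small sketch, assuming relevance judgments are given as a list of booleans in ranked order; `recall_at_k` is a hypothetical helper written for this illustration, and by default it makes the same simplifying assumption as the hand calculations, namely that every relevant item appears somewhere in the ranked list.

```python
def recall_at_k(relevance, k, total_relevant=None):
    """Recall@k for one ranked list.

    relevance: list of booleans, one per ranked result (True = relevant).
    total_relevant: number of relevant items in the whole dataset; if omitted,
    it is assumed to equal the number of relevant items in the ranked list.
    """
    if total_relevant is None:
        total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    # True positives@k are the relevant items among the first k results
    return sum(relevance[:k]) / total_relevant

# Example ranking in which results 1, 2 and 5 are relevant
judged = [True, True, False, False, True]
for k in range(1, 6):
    print(f"Recall@{k} = {recall_at_k(judged, k):.2f}")
```

Passing an explicit `total_relevant` would give a stricter estimate when relevant items are known to exist outside the top 5.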

1. Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

The top 5 results for this query are: Nanook of the North, The Frozen North, In the Land of the Head Hunters, Masked Emotions, Chang: A Drama of the Wilderness   

Out of these five results,
Nanook of the North- RELEVANT
The Frozen North- RELEVANT
In the Land of the Head Hunters- IRRELEVANT
Masked Emotions- IRRELEVANT
Chang: A Drama of the Wilderness-RELEVANT
Out of five results, three are relevant (1,2,5- relevant and 3,4- irrelevant)

Recall@1 = 1/(1+2) = 1/3 = 0.33
Recall@2 = 2/(2+1) = 2/3 = 0.67
Recall@3 = 2/(2+1) = 2/3 = 0.67
Recall@4 = 2/(2+1) = 2/3 = 0.67
Recall@5 = 3/(3+0) = 3/3 = 1

2. Western romance

The top 5 results for this query are: Romance, Bucking Broadway, Wild and Woolly, A Romance of Happy Valley, The Enchanted Cottage

Out of these five results,
Romance- RELEVANT
Bucking Broadway- RELEVANT   
Wild and Woolly- IRRELEVANT
A Romance of Happy Valley- IRRELEVANT
The Enchanted Cottage- RELEVANT  (1,2,5- relevant, 3,4- irrelevant)

Recall@1 = 1/(1+2) = 1/3 = 0.33
Recall@2 = 2/(2+1) = 2/3 = 0.67
Recall@3 = 2/(2+1) = 2/3 = 0.67
Recall@4 = 2/(2+1) = 2/3 = 0.67
Recall@5 = 3/(3+0) = 3/3 = 1

3. Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo

The top 5 results for this query are: Sahara, Married in Hollywood, Mothers Cry, The House with Closed Shutters, A Busy Day

Out of these five results,
Sahara- RELEVANT
Married in Hollywood- IRRELEVANT
Mothers Cry- IRRELEVANT
The House with Closed Shutters- IRRELEVANT
A Busy Day- IRRELEVANT (1- relevant, 2,3,4,5- irrelevant)

Recall@1 = 1/(1+0) = 1/1 = 1
Since recall already reaches 1 at the first step, it stays at 1 for larger k and we do not need to continue.

4. Comedy film, office disguises, boss's daughter, elopement

The top 5 results for this query are: Ask Father, Caught in a Cabaret, Pay Day, A Busy Day, Amarilly of Clothes-Line Alley   

Out of these five results,
Ask Father- RELEVANT
Caught in a Cabaret- RELEVANT
Pay Day- IRRELEVANT
A Busy Day- IRRELEVANT
Amarilly of Clothes-Line Alley- IRRELEVANT

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 2/(2+0) = 2/2 = 1
Since recall already reaches 1 at the second step, it stays at 1 for larger k and we do not need to continue.

5. Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.

The top 5 results for this query are: Cleopatra, 4 Devils, Disraeli, Bound in Morocco, Souls for Sale

Out of these five results,
Cleopatra- RELEVANT
4 Devils- IRRELEVANT
Disraeli- IRRELEVANT
Bound in Morocco- RELEVANT
Souls for Sale- IRRELEVANT

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 1/(1+1) = 1/2 = 0.5
Recall@3 = 1/(1+1) = 1/2 = 0.5
Recall@4 = 2/(2+0) = 2/2 = 1
Since recall already reaches 1 at the fourth step, it stays at 1 for larger k and we do not need to continue.

6. Denis Gage Deane-Tanner

The top 5 results for this query are: Captain Alvarez, Hold Everything, Rookies, Near the Rainbow's End, A Man from Wyoming

Out of these five results,
Captain Alvarez- RELEVANT
Hold Everything- IRRELEVANT
Rookies- IRRELEVANT
Near the Rainbow's End- RELEVANT    
A Man from Wyoming- IRRELEVANT

Recall@1 = 1/(1+1) = 1/2 = 0.5
Recall@2 = 1/(1+1) = 1/2 = 0.5
Recall@3 = 1/(1+1) = 1/2 = 0.5
Recall@4 = 2/(2+0) = 2/2 = 1
Since recall already reaches 1 at the fourth step, it stays at 1 for larger k and we do not need to continue.

**Calculating the MRR**

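Mean Reciprocal Rank (MRR) is useful when we want our system to return the best relevant item and want that item to be at a higher position. Mathematically, this is given by:

MRR = (1/|Q|) * [1/rank(1) + 1/rank(2) + ... + 1/rank(|Q|)]

where |Q| is the number of queries and rank(i) is the position of the first relevant result for query i.

To calculate MRR, we first calculate the reciprocal rank of each query. It is simply the reciprocal of the rank of the first relevant result, and its value ranges from 0 to 1.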


1. Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

The top 5 results for this query are: Nanook of the North, The Frozen North, In the Land of the Head Hunters, Masked Emotions, Chang: A Drama of the Wilderness   

Out of these five results,
Nanook of the North- RELEVANT
The Frozen North- RELEVANT
In the Land of the Head Hunters- IRRELEVANT
Masked Emotions- IRRELEVANT
Chang: A Drama of the Wilderness-RELEVANT
Out of five results, three are relevant (1,2,5- relevant and 3,4- irrelevant)

For this query, the reciprocal rank is 1/1 = 1 (the first relevant item is at position 1).

2. Western romance

The top 5 results for this query are: Romance, Bucking Broadway, Wild and Woolly, A Romance of Happy Valley, The Enchanted Cottage

Out of these five results,
Romance- RELEVANT
Bucking Broadway- RELEVANT   
Wild and Woolly- IRRELEVANT
A Romance of Happy Valley- IRRELEVANT
The Enchanted Cottage- RELEVANT  (1,2,5- relevant, 3,4- irrelevant)

For this query, the reciprocal rank is 1/1 = 1 (the first relevant item is at position 1).


3. Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo

The top 5 results for this query are: Sahara, Married in Hollywood, Mothers Cry, The House with Closed Shutters, A Busy Day

Out of these five results,
Sahara- RELEVANT
Married in Hollywood- IRRELEVANT
Mothers Cry- IRRELEVANT
The House with Closed Shutters- IRRELEVANT
A Busy Day- IRRELEVANT (1- relevant, 2,3,4,5- irrelevant)

For this query, the reciprocal rank is 1/1 = 1 (the first relevant item is at position 1).


4. Comedy film, office disguises, boss's daughter, elopement

The top 5 results for this query are: Ask Father, Caught in a Cabaret, Pay Day, A Busy Day, Amarilly of Clothes-Line Alley   

Out of these five results,
Ask Father- RELEVANT
Caught in a Cabaret- RELEVANT
Pay Day- IRRELEVANT
A Busy Day- IRRELEVANT
Amarilly of Clothes-Line Alley- IRRELEVANT

For this query, the reciprocal rank is 1/1 = 1 (the first relevant item is at position 1).


5. Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.

The top 5 results for this query are: Cleopatra, 4 Devils, Disraeli, Bound in Morocco, Souls for Sale

Out of these five results,
Cleopatra- RELEVANT
4 Devils- IRRELEVANT
Disraeli- IRRELEVANT
Bound in Morocco- RELEVANT
Souls for Sale- IRRELEVANT

For this query, the reciprocal rank is 1/1 = 1 (the first relevant item is at position 1).


6. Denis Gage Deane-Tanner

The top 5 results for this query are: Captain Alvarez, Hold Everything, Rookies, Near the Rainbow's End, A Man from Wyoming

Out of these five results,
Captain Alvarez- RELEVANT
Hold Everything- IRRELEVANT
Rookies- IRRELEVANT
Near the Rainbow's End- RELEVANT    
A Man from Wyoming- IRRELEVANT

For this query, the reciprocal rank is 1/1 = 1 (the first relevant item is at position 1).

After calculating the individual reciprocal ranks (RR), we take their mean to get the MRR for the whole set of queries. Here we have 6 queries, so

MRR = [RR(Query1) + RR(Query2) + RR(Query3) + RR(Query4) + RR(Query5) + RR(Query6)] / total queries
= (1+1+1+1+1+1)/6
= 1

The MRR for this data is 1.
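The same result can be reproduced in code. The sketch below assumes the relevance judgments listed above are written out as lists of booleans in ranked order; `reciprocal_rank` and `mean_reciprocal_rank` are hypothetical helper names used only for this illustration.

```python
def reciprocal_rank(relevance):
    """Return 1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(all_relevance):
    """Average the reciprocal ranks over all queries."""
    return sum(reciprocal_rank(r) for r in all_relevance) / len(all_relevance)

# Relevance judgments for the six queries, in the order they were evaluated
judgments = [
    [True, True, False, False, True],    # query 1
    [True, True, False, False, True],    # query 2
    [True, False, False, False, False],  # query 3
    [True, True, False, False, False],   # query 4
    [True, False, False, True, False],   # query 5
    [True, False, False, True, False],   # query 6
]

print(mean_reciprocal_rank(judgments))
```

Because every query has a relevant result at position 1, each reciprocal rank is 1 and so is their mean; a query whose first relevant result sat at position 3 would contribute 1/3 instead.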