<a href="https://colab.research.google.com/github/ZacharySoo01/I320D_TextMining-NLP_FinalProject/blob/main/compare_and_evaluate_different_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Loading data

In [None]:
import pandas as pd

arxiv_df = pd.read_csv("arxiv_results.csv")
arxiv_df.head()

Unnamed: 0,id,title,summary
0,1,BLINK: Multimodal Large Language Models Can Se...,"We introduce Blink, a new benchmark for multim..."
1,2,"Reka Core, Flash, and Edge: A Series of Powerf...","We introduce Reka Core, Flash, and Edge, a ser..."
2,3,When LLMs are Unfit Use FastFit: Fast and Effe...,"We present FastFit, a method, and a Python pac..."
3,4,Large Language Models in Targeted Sentiment An...,In this paper we investigate the use of decode...
4,5,Reuse Your Rewards: Reward Model Transfer for ...,Aligning language models (LMs) based on human-...


In [None]:
print(arxiv_df.shape)

(10000, 3)


# Preprocessing text

In [None]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import words
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('words')

# get a set of stopwords from NLTK
stops = set(stopwords.words('english'))
# get a set of words from the english dictionary
en_dict = set(words.words())


def pre_process_text(text):
  """Lowercase, tokenize, lemmatize, then drop stopwords and non-dictionary tokens."""
  # 1) Lowercasing
  text = text.lower()

  # 2) Tokenize the text
  tokens = word_tokenize(text)

  # 3) Lemmatize the text
  wnl = WordNetLemmatizer()
  lemmatized_words = [wnl.lemmatize(token) for token in tokens]

  # 4) Filter out non-words and stopwords
  filtered_text = [token for token in lemmatized_words if token not in stops and token in en_dict]

  # 5) Join the surviving tokens back into a single string
  return " ".join(filtered_text)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.
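The effect of the filtering step can be seen on a small scale with the sketch below. It is a simplified stand-in, not the notebook's pipeline: `TOY_STOPWORDS` and `TOY_DICTIONARY` are hypothetical miniature replacements for the NLTK stopword list and word list, a regular expression replaces `word_tokenize`, and lemmatization is skipped. The point it illustrates is that any token missing from the dictionary is silently dropped.

```python
import re

# Hypothetical miniature stand-ins for the NLTK stopword list and English
# word list; the notebook itself loads the real ones through nltk.download.
TOY_STOPWORDS = {"a", "for", "the", "of", "and", "in"}
TOY_DICTIONARY = {"neural", "network", "sentiment", "analysis", "for", "the"}


def toy_pre_process(text):
    """Lowercase, tokenize, then keep dictionary words that are not stopwords."""
    # Regex tokenization is an assumption made to keep the sketch dependency-free.
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in TOY_STOPWORDS and t in TOY_DICTIONARY]
    return " ".join(kept)


# "architectures" is not in the toy dictionary and "for" is a stopword,
# so both disappear from the result.
print(toy_pre_process("Neural Network Architectures for Sentiment Analysis"))
```

With the real word list the same thing happens to acronyms and model names, which matters for a corpus of arXiv abstracts.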


In [None]:
original_titles_list = arxiv_df["title"].tolist()
print(original_titles_list[:5])

['BLINK: Multimodal Large Language Models Can See but Not Perceive', 'Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models', 'When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes', 'Large Language Models in Targeted Sentiment Analysis', 'Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment']


In [None]:
summaries_list = arxiv_df["summary"].tolist()
print(summaries_list[:5])

['We introduce Blink, a new benchmark for multimodal language models (LLMs)that focuses on core visual perception abilities not found in otherevaluations. Most of the Blink tasks can be solved by humans "within a blink"(e.g., relative depth estimation, visual correspondence, forensics detection,and multi-view reasoning). However, we find these perception-demanding taskscast significant challenges for current multimodal LLMs because they resistmediation through natural language. Blink reformats 14 classic computer visiontasks into 3,807 multiple-choice questions, paired with single or multipleimages and visual prompting. While humans get 95.70% accuracy on average, Blinkis surprisingly challenging for existing multimodal LLMs: even thebest-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only13.17% and 7.63% higher than random guessing, indicating that such perceptionabilities have not "emerged" yet in recent multimodal LLMs. Our analysis alsohighlights that special

In [None]:
text_list = [pre_process_text(title+' '+summary) for title, summary in zip(original_titles_list, summaries_list)]
print(text_list[:5])

['blink multimodal large language model see perceive introduce blink new multimodal language model focus core visual perception ability found blink task human within blink relative depth estimation visual correspondence detection reasoning however find significant challenge current multimodal natural language blink classic computer question paired single visual human get accuracy average surprisingly multimodal even achieve accuracy higher random guessing yet recent multimodal analysis specialist model could solve problem much better suggesting potential pathway future improvement believe blink community help multimodal catch perception', 'core flash edge series powerful multimodal language model introduce core flash edge series powerful model trained scratch model able reason text image video audio input technical detail training model result show edge flash also outperform many much model value respective compute class meanwhile model core approach best frontier model evaluation blin

Three hand-written example search queries, used to test and qualitatively evaluate the search results produced by each embedding method.

In [None]:
# three example queries
original_queries_list = ["Semantic parsing techniques for natural language understanding", "Neural network architectures for sentiment analysis", "Named entity recognition models for medical text"]
queries_list = [pre_process_text(query) for query in original_queries_list]

# BERT

In [None]:
!pip install sentence_transformers


In [None]:
from sentence_transformers import SentenceTransformer

# This model averages (mean-pools) its token embeddings to produce a single vector per sentence
bert_model = SentenceTransformer('bert-base-nli-mean-tokens')

text_embeddings = bert_model.encode(text_list)
query_embeddings = bert_model.encode(queries_list)
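
Mean pooling, as named in the comment above, can be sketched in a few lines. This is an illustration under simplifying assumptions, not the library's implementation: the token vectors are hypothetical 3-dimensional values (the real model produces 768-dimensional ones), and details such as ignoring padding tokens are left out.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    # Average each dimension independently across all tokens.
    return [sum(vec[d] for vec in token_vectors) / count for d in range(dim)]


# Hypothetical embeddings for a three-token sentence.
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
]
sentence_vector = mean_pool(token_vectors)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

Whatever the sentence length, the result has the same dimensionality as a single token vector, which is what makes documents and queries directly comparable.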


In [None]:
import numpy as np
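def cosine_distance_based_similarity(vector1, vector2):
    dot_product = np.dot(vector1, vector2)
    norm_vector1 = np.linalg.norm(vector1)
    norm_vector2 = np.linalg.norm(vector2)
    # An all-zero vector (e.g. text with no in-vocabulary words) has no direction;
    # score it as 0.0 instead of dividing by zero and returning NaN.
    if norm_vector1 == 0 or norm_vector2 == 0:
        return 0.0
    return dot_product / (norm_vector1 * norm_vector2)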

def print_search(queries, titles):
    summary_list = arxiv_df["summary"].tolist()
    for idx, queryVector in enumerate(queries):
        similarity_scores = {}
        for i, title_vector in enumerate(titles):
          sim = cosine_distance_based_similarity(title_vector, queryVector)
          similarity_scores[i] = sim

        # Sort by similarity score in descending order (most similar first)
        ranked_texts = sorted(similarity_scores.items(), key=lambda x: x[1], reverse=True)
        print(f"Query: {original_queries_list[idx]}")
        print("----------------------------------------")

        # Print the top 5 most similar texts.
        for ranked_texts_idx, score in ranked_texts[:5]:
            print(f"Title: {original_titles_list[ranked_texts_idx]}")
            print(f"Summary: {summary_list[ranked_texts_idx]}")
            # The score is shown for reference only
            print(f"Score: {score}")
            print("----------------------------------------")
        print()
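
The ranking logic in `print_search` reduces to two steps: score every document against the query with cosine similarity, then sort descending and keep the top results. The sketch below reproduces that on hypothetical 2-dimensional vectors using only the standard library. The names `cosine_similarity` and `top_k` are illustrative, and returning 0.0 for a zero vector is a choice made for this sketch.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def top_k(query_vec, doc_vecs, k=5):
    """Return the k (index, score) pairs with the highest similarity to the query."""
    scores = [(i, cosine_similarity(doc, query_vec)) for i, doc in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical document vectors; the last one is all zeros.
docs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
query = [1.0, 0.0]
ranking = top_k(query, docs, k=2)
print(ranking)
```

Document 0 points in the same direction as the query and ranks first; document 1 sits at 45 degrees and ranks second.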


In [None]:
print_search(query_embeddings, text_embeddings)

Query: Semantic parsing techniques for natural language understanding
----------------------------------------
Title: FaBERT: Pre-training BERT on Persian Blogs
Summary: We introduce FaBERT, a Persian BERT-base model pre-trained on the HmBlogscorpus, encompassing both informal and formal Persian texts. FaBERT is designedto excel in traditional Natural Language Understanding (NLU) tasks, addressingthe intricacies of diverse sentence structures and linguistic styles prevalentin the Persian language. In our comprehensive evaluation of FaBERT on 12datasets in various downstream tasks, encompassing Sentiment Analysis (SA),Named Entity Recognition (NER), Natural Language Inference (NLI), QuestionAnswering (QA), and Question Paraphrasing (QP), it consistently demonstratedimproved performance, all achieved within a compact model size. The findingshighlight the importance of utilizing diverse and cleaned corpora, such asHmBlogs, to enhance the performance of language models like BERT in Persian

# TF-IDF

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()
tfidf_vectorizer.fit(text_list)
transformed_text = tfidf_vectorizer.transform(text_list).toarray()
transformed_queries = tfidf_vectorizer.transform(queries_list).toarray()
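
To make the TF-IDF weighting concrete, here is a minimal standard-library sketch of the formula that `TfidfVectorizer` applies with its default settings: raw term counts, a smoothed inverse document frequency `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The function `tfidf_vectors` and the three toy documents are hypothetical, and whitespace splitting stands in for scikit-learn's own tokenizer.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF rows) for whitespace-tokenized docs."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, matching scikit-learn's default (smooth_idf=True).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in raw))
        rows.append([x / norm if norm else 0.0 for x in raw])
    return vocab, rows


docs = ["language model evaluation", "language model training", "semantic parsing"]
vocab, vectors = tfidf_vectors(docs)
for term, weight in zip(vocab, vectors[0]):
    print(f"{term}: {weight:.3f}")
```

In the first document, "evaluation" appears in only one document and so receives a larger weight than "language" or "model", which appear in two.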

In [None]:
print_search(transformed_queries, transformed_text)

Query: Semantic parsing techniques for natural language understanding
----------------------------------------
Title: The Era of Semantic Decoding
Summary: Recent work demonstrated great promise in the idea of orchestratingcollaborations between LLMs, human input, and various tools to address theinherent limitations of LLMs. We propose a novel perspective called semanticdecoding, which frames these collaborative processes as optimization proceduresin semantic space. Specifically, we conceptualize LLMs as semantic processorsthat manipulate meaningful pieces of information that we call semantic tokens(known thoughts). LLMs are among a large pool of other semantic processors,including humans and tools, such as search engines or code executors.Collectively, semantic processors engage in dynamic exchanges of semantictokens to progressively construct high-utility outputs. We refer to theseorchestrated interactions among semantic processors, optimizing and searchingin semantic space, as seman

# Word2Vec

In [None]:
import gensim.downloader as api

# Download a pre-trained word2vec (trained on Google News data)
w2v_model = api.load("word2vec-google-news-300")



In [None]:
def w2v_average_word_embeddings(sentence):
    words = sentence.split()
    word_vectors = [w2v_model[word] for word in words if word in w2v_model]
    if not word_vectors:
        return np.zeros(w2v_model.vector_size)
    return np.mean(word_vectors, axis=0)
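
The averaging function can be tried without downloading the Google News model. The sketch below uses a hypothetical three-word embedding table, `TOY_EMBEDDINGS`, in place of `w2v_model`, and shows both behaviors of the function above: out-of-vocabulary words are skipped, and a sentence with no known words falls back to a zero vector.

```python
# Hypothetical 3-dimensional embeddings standing in for the pre-trained model.
TOY_EMBEDDINGS = {
    "semantic": [1.0, 0.0, 0.0],
    "parsing": [0.0, 1.0, 0.0],
    "language": [1.0, 1.0, 2.0],
}
VECTOR_SIZE = 3


def average_embedding(sentence, embeddings=TOY_EMBEDDINGS, size=VECTOR_SIZE):
    """Average the vectors of all known words; return zeros if none are known."""
    vectors = [embeddings[word] for word in sentence.split() if word in embeddings]
    if not vectors:
        return [0.0] * size
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(size)]


print(average_embedding("semantic parsing slfnet"))  # [0.5, 0.5, 0.0]
print(average_embedding("slfnet"))                   # [0.0, 0.0, 0.0]
```

The zero-vector fallback has a consequence downstream: a zero vector has norm 0, so its cosine similarity with any other vector is undefined unless the similarity function handles that case explicitly.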

In [None]:
# Embed the documents (title + summary) and the queries
transformed_text = [w2v_average_word_embeddings(doc) for doc in text_list]
transformed_queries = [w2v_average_word_embeddings(query) for query in queries_list]

In [None]:
print_search(transformed_queries, transformed_text)

Query: Semantic parsing techniques for natural language understanding
----------------------------------------
Title: Universal Syntactic Structures: Modeling Syntax for Various Natural Languages
Summary: We aim to provide an explanation for how the human brain might connect wordsfor sentence formation. A novel approach to modeling syntactic representationis introduced, potentially showing the existence of universal syntacticstructures for all natural languages. As the discovery of DNA's double helixstructure shed light on the inner workings of genetics, we wish to introduce abasic understanding of how language might work in the human brain. It could bethe brain's way of encoding and decoding knowledge. It also brings some insightinto theories in linguistics, psychology, and cognitive science. After lookinginto the logic behind universal syntactic structures and the methodology of themodeling technique, we attempt to analyze corpora that showcase universality inthe language process of 

# ALL-MPNET

This is a Sentence Transformers model, like the BERT model above, but built on MPNet. We include this more advanced model to compare its results with BERT's.

In [None]:
from sentence_transformers import SentenceTransformer

# Load the all-mpnet-base-v2 sentence-embedding model
all_mpnet_model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
text_embeddings = all_mpnet_model.encode(text_list)
query_embeddings = all_mpnet_model.encode(queries_list)


In [None]:
print_search(query_embeddings, text_embeddings)

Query: Semantic parsing techniques for natural language understanding
----------------------------------------
Title: SLFNet: Generating Semantic Logic Forms from Natural Language Using Semantic Probability Graphs
Summary: Building natural language interfaces typically uses a semantic parser toparse the user's natural language and convert it into structured\textbf{S}emantic \textbf{L}ogic \textbf{F}orms (SLFs). The mainstream approachis to adopt a sequence-to-sequence framework, which requires that naturallanguage commands and SLFs must be represented serially. Since a single naturallanguage may have multiple SLFs or multiple natural language commands may havethe same SLF, training a sequence-to-sequence model is sensitive to the choiceamong them, a phenomenon recorded as "order matters". To solve this problem, wepropose a novel neural network, SLFNet, which firstly incorporates dependentsyntactic information as prior knowledge and can capture the long-rangeinteractions between context

# Evaluation
### The evaluation function

In [None]:
# load labeled data from csv
labeled_df = pd.read_csv('labeled_data.csv')
original_queries_list = labeled_df['query'].tolist()
processed_queries_list = [pre_process_text(text) for text in original_queries_list]

# Use the text from the test data
test_data = pd.read_csv('test_data.csv')
text_list = test_data['title'].tolist()
processed_titles_list = [pre_process_text(text) for text in text_list]

# evaluation function
def evaluate_results(query_vectors, text_vectors, k=5):
  """Inputs:
  * query_vectors: list of vectorized queries
  * text_vectors: list of vectorized text
  * k: number of top-ranked documents retrieved per query
  Output: (mean precision, mean recall) over all queries"""

  # hold calculated values
  precisions_list = []
  recalls_list = []

  # loop through queries
  for q_i, query_vec in enumerate(query_vectors):
    similarity_scores = {}

    # calculate similarity score for each text_vec
    for t_i, text_vec in enumerate(text_vectors):
      similarity_scores[t_i] = cosine_distance_based_similarity(text_vec, query_vec)

    # sort by descending similarity and keep the ids of the top k documents
    ranked_texts = sorted(similarity_scores.items(), key=lambda x: x[1], reverse=True)
    matched_ids = [doc_id for doc_id, _ in ranked_texts[:k]]
    print("Matched doc ids:", matched_ids)

    # begin evaluation calculations

    # gather the relevant (positive) doc ids: every column after the first
    positives_list = labeled_df.iloc[q_i][1:].tolist()
    print("Actual doc ids:", positives_list)

    # count true positives
    true_positives = sum(1 for m_id in matched_ids if m_id in positives_list)

    # Precision = relevant docs retrieved / total docs retrieved
    precisions_list.append(true_positives / len(matched_ids))

    # Recall = relevant docs retrieved / all relevant docs for the query
    recalls_list.append(true_positives / len(positives_list))

  return (np.mean(precisions_list), np.mean(recalls_list))
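
Precision and recall differ only in their denominators: precision divides the number of hits by the number of documents retrieved, while recall divides it by the number of relevant documents. The sketch below works this through for a single query; `precision_recall_at_k` is an illustrative helper, not part of the notebook. The first pair of id lists is copied from the first query of the BERT run printed in the next section, and the second pair is hypothetical.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Precision and recall for one query, using only the top-k retrieved ids."""
    top_k = retrieved_ids[:k]
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Ids from the first BERT query: only doc 1 is both retrieved and relevant.
matched = [10, 38, 47, 53, 1]
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
p, r = precision_recall_at_k(matched, actual)
print(p, r)  # 0.2 0.2

# Hypothetical case with 10 relevant docs: every retrieved doc is a hit,
# yet only half of the relevant docs were found.
p2, r2 = precision_recall_at_k([1, 2, 3, 4, 5], list(range(1, 11)))
print(p2, r2)  # 1.0 0.5
```

Because every query here has exactly five labeled documents and five are retrieved, the two metrics coincide in this notebook; they diverge as soon as the counts differ, as the second case shows.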

### BERT

In [None]:
text_embeddings = bert_model.encode(processed_titles_list)
query_embeddings = bert_model.encode(processed_queries_list)

bert_precision, bert_recall = evaluate_results(query_embeddings, text_embeddings)
print(f'BERT precision: {bert_precision} and recall: {bert_recall}')

Matched doc ids: [10, 38, 47, 53, 1]
Actual doc ids: [1.0, 2.0, 3.0, 4.0, 5.0]
Matched doc ids: [7, 8, 17, 73, 72]
Actual doc ids: [6.0, 7.0, 8.0, 9.0, 10.0]
Matched doc ids: [9, 15, 62, 33, 49]
Actual doc ids: [11.0, 12.0, 13.0, 14.0, 15.0]
Matched doc ids: [9, 15, 62, 49, 33]
Actual doc ids: [16.0, 17.0, 18.0, 19.0, 20.0]
Matched doc ids: [24, 23, 22, 21, 31]
Actual doc ids: [21.0, 22.0, 23.0, 24.0, 25.0]
Matched doc ids: [28, 52, 79, 17, 47]
Actual doc ids: [26.0, 27.0, 28.0, 29.0, 30.0]
Matched doc ids: [30, 73, 63, 7, 95]
Actual doc ids: [31.0, 32.0, 33.0, 34.0, 35.0]
Matched doc ids: [95, 98, 7, 73, 63]
Actual doc ids: [36.0, 37.0, 38.0, 39.0, 40.0]
Matched doc ids: [21, 42, 44, 24, 11]
Actual doc ids: [41.0, 42.0, 43.0, 44.0, 45.0]
Matched doc ids: [90, 77, 37, 64, 12]
Actual doc ids: [40.0, 46.0, 47.0, 48.0, 49.0]
Matched doc ids: [8, 7, 17, 47, 61]
Actual doc ids: [50.0, 51.0, 52.0, 53.0, 54.0]
Matched doc ids: [15, 56, 55, 9, 40]
Actual doc ids: [55.0, 56.0, 57.0, 58.0, 59.0]

### TF-IDF

In [None]:
text_embeddings = tfidf_vectorizer.transform(processed_titles_list).toarray()
query_embeddings = tfidf_vectorizer.transform(processed_queries_list).toarray()

tfidf_precision, tfidf_recall = evaluate_results(query_embeddings, text_embeddings)
print(f'TF-IDF precision: {tfidf_precision} and recall: {tfidf_recall}')

Matched doc ids: [0, 3, 75, 1, 10]
Actual doc ids: [1.0, 2.0, 3.0, 4.0, 5.0]
Matched doc ids: [7, 8, 5, 9, 17]
Actual doc ids: [6.0, 7.0, 8.0, 9.0, 10.0]
Matched doc ids: [37, 16, 15, 49, 20]
Actual doc ids: [11.0, 12.0, 13.0, 14.0, 15.0]
Matched doc ids: [17, 71, 73, 52, 36]
Actual doc ids: [16.0, 17.0, 18.0, 19.0, 20.0]
Matched doc ids: [24, 22, 23, 21, 20]
Actual doc ids: [21.0, 22.0, 23.0, 24.0, 25.0]
Matched doc ids: [87, 25, 74, 28, 88]
Actual doc ids: [26.0, 27.0, 28.0, 29.0, 30.0]
Matched doc ids: [34, 73, 42, 31, 30]
Actual doc ids: [31.0, 32.0, 33.0, 34.0, 35.0]
Matched doc ids: [65, 96, 94, 64, 34]
Actual doc ids: [36.0, 37.0, 38.0, 39.0, 40.0]
Matched doc ids: [42, 40, 44, 0, 1]
Actual doc ids: [41.0, 42.0, 43.0, 44.0, 45.0]
Matched doc ids: [68, 65, 45, 89, 93]
Actual doc ids: [40.0, 46.0, 47.0, 48.0, 49.0]
Matched doc ids: [7, 8, 5, 9, 68]
Actual doc ids: [50.0, 51.0, 52.0, 53.0, 54.0]
Matched doc ids: [16, 15, 0, 49, 20]
Actual doc ids: [55.0, 56.0, 57.0, 58.0, 59.0]
Mat



### Word2Vec

In [None]:
text_embeddings = [w2v_average_word_embeddings(doc) for doc in processed_titles_list]
query_embeddings = [w2v_average_word_embeddings(doc) for doc in processed_queries_list]

w2v_precision, w2v_recall = evaluate_results(query_embeddings, text_embeddings)
print(f'word2vec precision: {w2v_precision} and recall: {w2v_recall}')

Matched doc ids: [3, 82, 10, 0, 75]
Actual doc ids: [1.0, 2.0, 3.0, 4.0, 5.0]
Matched doc ids: [8, 7, 17, 18, 73]
Actual doc ids: [6.0, 7.0, 8.0, 9.0, 10.0]
Matched doc ids: [37, 9, 16, 15, 62]
Actual doc ids: [11.0, 12.0, 13.0, 14.0, 15.0]
Matched doc ids: [71, 9, 36, 31, 16]
Actual doc ids: [16.0, 17.0, 18.0, 19.0, 20.0]
Matched doc ids: [24, 22, 23, 21, 93]
Actual doc ids: [21.0, 22.0, 23.0, 24.0, 25.0]
Matched doc ids: [28, 79, 88, 87, 74]
Actual doc ids: [26.0, 27.0, 28.0, 29.0, 30.0]
Matched doc ids: [34, 42, 33, 31, 73]
Actual doc ids: [31.0, 32.0, 33.0, 34.0, 35.0]
Matched doc ids: [98, 96, 65, 42, 54]
Actual doc ids: [36.0, 37.0, 38.0, 39.0, 40.0]
Matched doc ids: [42, 44, 58, 94, 22]
Actual doc ids: [41.0, 42.0, 43.0, 44.0, 45.0]
Matched doc ids: [65, 68, 31, 45, 83]
Actual doc ids: [40.0, 46.0, 47.0, 48.0, 49.0]
Matched doc ids: [47, 5, 81, 36, 9]
Actual doc ids: [50.0, 51.0, 52.0, 53.0, 54.0]
Matched doc ids: [54, 40, 20, 57, 56]
Actual doc ids: [55.0, 56.0, 57.0, 58.0, 59.



### ALL-MPNET

In [None]:
# Note: text_list holds the raw test titles here; the other models encode processed_titles_list
text_embeddings = all_mpnet_model.encode(text_list)
query_embeddings = all_mpnet_model.encode(processed_queries_list)
allmpnet_precision, allmpnet_recall = evaluate_results(query_embeddings, text_embeddings)
print(f'allmpnet precision: {allmpnet_precision} and recall: {allmpnet_recall}')

Matched doc ids: [0, 1, 3, 4, 12]
Actual doc ids: [1.0, 2.0, 3.0, 4.0, 5.0]
Matched doc ids: [17, 61, 43, 63, 7]
Actual doc ids: [6.0, 7.0, 8.0, 9.0, 10.0]
Matched doc ids: [33, 62, 16, 37, 49]
Actual doc ids: [11.0, 12.0, 13.0, 14.0, 15.0]
Matched doc ids: [62, 17, 9, 36, 15]
Actual doc ids: [16.0, 17.0, 18.0, 19.0, 20.0]
Matched doc ids: [22, 24, 20, 57, 21]
Actual doc ids: [21.0, 22.0, 23.0, 24.0, 25.0]
Matched doc ids: [25, 74, 12, 62, 17]
Actual doc ids: [26.0, 27.0, 28.0, 29.0, 30.0]
Matched doc ids: [34, 43, 30, 32, 63]
Actual doc ids: [31.0, 32.0, 33.0, 34.0, 35.0]
Matched doc ids: [59, 97, 43, 7, 42]
Actual doc ids: [36.0, 37.0, 38.0, 39.0, 40.0]
Matched doc ids: [40, 42, 44, 22, 48]
Actual doc ids: [41.0, 42.0, 43.0, 44.0, 45.0]
Matched doc ids: [45, 68, 65, 90, 91]
Actual doc ids: [40.0, 46.0, 47.0, 48.0, 49.0]
Matched doc ids: [52, 7, 47, 45, 63]
Actual doc ids: [50.0, 51.0, 52.0, 53.0, 54.0]
Matched doc ids: [56, 54, 40, 31, 20]
Actual doc ids: [55.0, 56.0, 57.0, 58.0, 59.