# Retriever tunner

A simple tool for comparing and tuning retriever performance against a desired target ranking.
The goal is to provide a simple metric for how close a given retriever comes to the 'ideal' ranking, which could be generated, for example,
by a more expensive or slower method, or by one that does not exist in practice.
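
To make "close to the ideal" concrete, here is a minimal sketch of one common way to compare two rankings: the share of the target's top-k items that a candidate retriever also places in its top-k. This is an illustration only, not the `rdm` metric that `RetrieverTunner` reports, whose definition is not shown in this notebook. The function name `overlap_at_k` and the document ids are hypothetical.

```python
def overlap_at_k(target, candidate, k):
    """Fraction of the target's top-k items that also appear in the candidate's top-k.

    Illustrative helper only (hypothetical name); NOT RetrieverTunner's metric.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    target_top = set(target[:k])
    candidate_top = set(candidate[:k])
    return len(target_top & candidate_top) / k


# Made-up rankings for a single query, best match first.
target_ranking = ["doc_a", "doc_b", "doc_c", "doc_d"]
candidate_ranking = ["doc_b", "doc_a", "doc_d", "doc_c"]

for k in (1, 2, 3):
    print(k, overlap_at_k(target_ranking, candidate_ranking, k))
```

A set-based overlap ignores the order within the top-k, so it is deliberately coarse; a position-weighted score would penalise the swap of `doc_a` and `doc_b` as well.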

In [1]:
import sys
sys.path.append('../')
from python_modules.retriever_tunner import RetrieverTunner

### 1. Downloading test data

In [2]:
import nltk
from nltk.corpus import brown

# Download the necessary datasets from NLTK
nltk.download('brown')
nltk.download('punkt')

# Load the Brown Corpus as plain text
brown_corpus_text = ' '.join(brown.words())

# Split the corpus into sentences
sentences = nltk.sent_tokenize(brown_corpus_text)

# Display the number of sentences and first few sentences as a sample
print(f"Number of sentences: {len(sentences)}")
print("First few sentences:")
for sentence in sentences[:5]:
    print(sentence)

[nltk_data] Downloading package brown to
[nltk_data]     /Users/insani_dei/nltk_data...
[nltk_data]   Package brown is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/insani_dei/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Number of sentences: 56601
First few sentences:
The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election produced `` no evidence '' that any irregularities took place .
The jury further said in term-end presentments that the City Executive Committee , which had over-all charge of the election , `` deserves the praise and thanks of the City of Atlanta '' for the manner in which the election was conducted .
The September-October term jury had been charged by Fulton Superior Court Judge Durwood Pye to investigate reports of possible `` irregularities '' in the hard-fought primary which was won by Mayor-nominate Ivan Allen Jr. .
`` Only a relative handful of such reports was received '' , the jury said , `` considering the widespread interest in the election , the number of voters and the size of this city '' .
The jury said it did find that many of Georgia's registration and election laws `` are outmoded or inadequate and often ambiguous '' .


### 2. Initialize RetrieverTunner

In [3]:
rt = RetrieverTunner(
    # optional/ required
    search_values_list = sentences[0:2000],
    target_ranking_name = 'all-mpnet-base-v2',
    embedding_model_names = ['paraphrase-multilingual-mpnet-base-v2',
                             'all-mpnet-base-v2', 
                             'multi-qa-mpnet-base-dot-v1', 
                             'all-MiniLM-L6-v2'],
    # optional
    similarity_search_h_params = {'processing_type' : 'parallel',
                                  'max_workers' : 8,
                                  'tbatch_size' : 1000},
    n_random_queries = 100,
    seed = 23,
    metrics_params = {'n_results' : [1,3,5,10],   
                      'ceilings' : [10],
                      'prep_types' : ['correction', 'ceiling'],
                      'weights_ratio' : 0.6,
                      'weights_sum' : 1,
                      'inverted' : True},
    # for plotting
    plots_params = {'top_n' : 3,
                    'text_lim' : 10,
                    'alpha' : 0.5,
                    'save_comp_plot' : False}
    )


### 3. Construct rankings

In [4]:
rt.construct_rankings(
    # optional
    queries = rt.queries,
    queries_filters = None,
    search_values_dicts = None,
    search_values_list = rt.search_values_list,
    model_names = rt.embedding_model_names,
    handlers = rt.sim_search_handlers
)

### 4. Make scores

In [5]:
rt.make_scores_dict(
    # optional
    target_ranking = rt.ranking_dicts[rt.target_ranking_name],
    compared_rankings = {ranking : rt.ranking_dicts[ranking]
                         for ranking in rt.embedding_model_names
                         if ranking != rt.target_ranking_name},
    n_results = [1,2],
    ceilings = [],
    prep_types = ['correction'],
    weights_ratio = 0.8, # weight skewed to right
    weights_sum = rt.metrics_params['weights_sum'],
    inverted = True)

{'all-mpnet-base-v2|paraphrase-multilingual-mpnet-base-v2': {'rdm|1|100|correction': 1.0,
  'rdm|2|100|correction': 0.5072,
  'rdm|100|100|correction': 0.0},
 'all-mpnet-base-v2|multi-qa-mpnet-base-dot-v1': {'rdm|1|100|correction': 0.99,
  'rdm|2|100|correction': 0.5348,
  'rdm|100|100|correction': 0.0},
 'all-mpnet-base-v2|all-MiniLM-L6-v2': {'rdm|1|100|correction': 0.99,
  'rdm|2|100|correction': 0.5136000000000001,
  'rdm|100|100|correction': 0.0}}
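
The nested dictionary above is easier to compare across models once it is flattened into rows. The sketch below splits the pipe-delimited keys seen in the output; it assumes the outer key is `target|compared` and the inner key starts with the metric name, followed by the `n_results` value and ending with the `prep_types` entry. The meaning of the third inner field (`100` here) is not documented in this notebook, so it is kept under the placeholder name `field_3`. The helper `flatten_scores` is hypothetical and not part of `RetrieverTunner`; the sample values are copied from the output above.

```python
def flatten_scores(scores):
    """Turn {'target|compared': {'metric|n|x|prep': value}} into a list of row dicts.

    Hypothetical helper; the field interpretation is an assumption based on the key layout.
    """
    rows = []
    for pair_key, metrics in scores.items():
        target, compared = pair_key.split("|")
        for metric_key, value in metrics.items():
            metric, n_results, field_3, prep_type = metric_key.split("|")
            rows.append({
                "target": target,
                "compared": compared,
                "metric": metric,
                "n_results": int(n_results),
                "field_3": int(field_3),  # meaning not documented here
                "prep_type": prep_type,
                "score": value,
            })
    return rows


# Two entries copied from the output above, so the block runs on its own.
scores = {
    "all-mpnet-base-v2|paraphrase-multilingual-mpnet-base-v2": {
        "rdm|1|100|correction": 1.0,
        "rdm|2|100|correction": 0.5072,
    },
    "all-mpnet-base-v2|multi-qa-mpnet-base-dot-v1": {
        "rdm|1|100|correction": 0.99,
        "rdm|2|100|correction": 0.5348,
    },
}

rows = flatten_scores(scores)
for row in rows:
    if row["n_results"] == 2:
        print(row["compared"], row["score"])
```

In the notebook itself, the return value of `rt.make_scores_dict(...)` could be passed in place of the literal `scores` dictionary, provided no model name contains a `|` character.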

### 5. Plot rankings

In [10]:
rt.show_model_comparison_plot(
    # optional
    ranking_dicts = rt.ranking_dicts,
    target_model = rt.target_ranking_name,
    compared_model = 'paraphrase-multilingual-mpnet-base-v2',
    top_n = 3, 
    alpha = 0.5,
    text_lim = rt.plots_params['text_lim']
)

In [8]:
rt.show_model_comparison_plots(
    # optional
    ranking_dicts = rt.ranking_dicts,
    target_model = rt.target_ranking_name,
    compared_models = [model_name for model_name in rt.embedding_model_names
                       if model_name != rt.target_ranking_name],
    top_n = 3,
    alpha = 0.5,
    text_lim = rt.plots_params['text_lim']
)