# Evaluation Example
This notebook shows how to evaluate a model, using a simple TF-IDF model built with scikit-learn as the example.

## TF-IDF Test model

In [1]:
import scipy
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from src import init_data
from arqmath_code.topic_file_reader import Topic
from arqmath_code.Entities.Post import Post
from typing import List
import re

topic_reader, data_reader = init_data(task=1)

reading users
reading comments
reading votes
reading post links
reading posts


In [2]:
def title_tf_idf_model(query: Topic, answers: List[Post]) -> List[tuple[int, float]]:
    question_titles: List[str] = [question.title for question in answers]
    training_set: List[str] = question_titles.copy()
    training_set.append(query.title)

    vectorizer: TfidfVectorizer = TfidfVectorizer()
    vectorizer.fit(training_set)
    query_vector: scipy.sparse.csr_matrix = vectorizer.transform([query.title])
    word_term_matrix: scipy.sparse.csr_matrix = vectorizer.transform(question_titles)
    cos_sims: np.ndarray = cosine_similarity(query_vector, word_term_matrix)
    # keep the 1000 highest-scoring (answer index, cosine similarity) pairs
    ranking: List[tuple[int, float]] = sorted(zip(range(cos_sims.shape[1]), cos_sims[0]), key=lambda pair: pair[1], reverse=True)[:1000]
    return ranking

def binary_tag_retrieval(query: Topic) -> List[Post]:
    return list(set([question for tag in query.lst_tags for question in data_reader.get_question_of_tag(tag=tag)]))

def clean_post(query: Topic, answers: List[Post]) -> tuple[Topic, List[Post]]:
    query.title = re.sub(r"</?(p|span)[^>]*>", "", query.title)
    query.question = re.sub(r"</?(p|span)[^>]*>", "", query.question)
    for answer in answers:
        answer.body = re.sub(r"</?(p|span)[^>]*>", "", answer.body)
        answer.title = re.sub(r"</?(p|span)[^>]*>", "", answer.title)
    return query, answers

In [3]:
# retrieve Ranking
test_topic: Topic = topic_reader.get_topic('A.301')
answers: List[Post] = binary_tag_retrieval(test_topic)
test_topic, answers = clean_post(test_topic, answers=answers)
ranking = title_tf_idf_model(query=test_topic, answers=answers)
ranking

[(40502, 0.8749861678860322),
 (59962, 0.8262133443675792),
 (28084, 0.8246180178007355),
 (94484, 0.8221006334223673),
 (44481, 0.8181286823398406),
 (87008, 0.7704054386121624),
 (100209, 0.7704054386121624),
 (105547, 0.7704054386121624),
 (110239, 0.7674474189913771),
 (76714, 0.7664136092802841),
 (110758, 0.7503087157141877),
 (35260, 0.7322690388655665),
 (34986, 0.725126911527534),
 (74469, 0.7161845097524295),
 (92965, 0.7161845097524295),
 (26018, 0.7128283385502027),
 (23315, 0.7112627388135478),
 (46247, 0.7112627388135478),
 (46565, 0.7112627388135478),
 (67917, 0.7112627388135478),
 (81100, 0.7112627388135478),
 (40998, 0.7101750636604436),
 (57963, 0.7101750636604436),
 (72476, 0.7101750636604436),
 (87614, 0.7101750636604436),
 (109360, 0.7090624264136349),
 (11653, 0.6970979250604656),
 (75375, 0.6916667713548297),
 (18771, 0.6877624437990192),
 (75978, 0.6703310972396699),
 (25812, 0.6659733488818732),
 (5343, 0.6620726344466945),
 (9737, 0.6620726344466945),
 (7804, 

## ARQmath evaluation format
To evaluate the results of a retrieval pipeline, they have to be saved in the ARQmath TSV format, which is explained further in [ArqMath2020-EvaluationProtocols.V1.1.pdf](documentation/ArqMath2020-EvaluationProtocols.V1.1.pdf):
```
Query_Id Post_Id Rank Score Run_Number
```
As shown below, I suggest building a pandas DataFrame and then saving it as a .tsv file in ./results (this folder is listed in .gitignore and is therefore not present in the repository), so that the folder structure looks like this:
![../documentation/resutls_folders.png](../documentation/resutls_folders.png)

In [4]:
import pandas as pd

In [5]:
df_dict = {
    "Query_Id": [test_topic.topic_id for i in range(len(ranking))],
    "Post_Id": [answers[answer_idx].post_id for answer_idx, score in ranking],
    "Rank": [i for i in range(len(ranking))],
    "Score": [score for answer_idx, score in ranking],
    "Run_Number": [0 for i in range(len(ranking))]
}
df = pd.DataFrame(df_dict)
df

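|     | Query_Id | Post_Id | Rank | Score | Run_Number |
| --- | --- | --- | --- | --- | --- |
| 0 | A.301 | 1497800 | 0 | 0.874986 | 0 |
| 1 | A.301 | 1106437 | 1 | 0.826213 | 0 |
| 2 | A.301 | 1662575 | 2 | 0.824618 | 0 |
| 3 | A.301 | 2441344 | 3 | 0.822101 | 0 |
| 4 | A.301 | 1371489 | 4 | 0.818129 | 0 |
| ... | ... | ... | ... | ... | ... |
| 995 | A.301 | 437039 | 995 | 0.347314 | 0 |
| 996 | A.301 | 2663914 | 996 | 0.347215 | 0 |
| 997 | A.301 | 2438994 | 997 | 0.346995 | 0 |
| 998 | A.301 | 1940985 | 998 | 0.346833 | 0 |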


In [6]:
df.to_csv(path_or_buf="../results/model_results/tf-idf-test.tsv", sep='\t', index=False)
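
Before converting the file, it can help to check that it really has the five expected columns. The following is a minimal sketch using only the standard library. `validate_run_tsv` is a hypothetical helper (not part of `arqmath_code`), and it assumes the file keeps the header row that `df.to_csv` writes by default.

```python
import csv
import io

# Column names as used in the DataFrame above.
EXPECTED_COLUMNS = ["Query_Id", "Post_Id", "Rank", "Score", "Run_Number"]


def validate_run_tsv(handle) -> int:
    """Return the number of data rows; raise ValueError if the file is malformed.

    Hypothetical helper for illustration only.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    n_rows = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(row)}")
        int(row[2])    # Rank must parse as an integer
        float(row[3])  # Score must parse as a number
        n_rows += 1
    return n_rows


# Small in-memory sample so the sketch runs without the dataset.
sample = "Query_Id\tPost_Id\tRank\tScore\tRun_Number\nA.301\t1497800\t0\t0.874986\t0\n"
print(validate_run_tsv(io.StringIO(sample)))  # prints 1
```

To check the file written above, open it with `open("../results/model_results/tf-idf-test.tsv", newline="")` and pass the handle to `validate_run_tsv`.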

## Evaluating Results

In [7]:
from arqmath_code.evaluation.task1 import arqmath_to_prime_task1
from arqmath_code.evaluation.task1 import task1_get_results

As a first step, the results have to be converted to TREC format:

In [33]:
qrel_dictionary = arqmath_to_prime_task1.read_qrel_to_dictionary("../arqmath_dataset/evaluation/Task 1/Qrel Files/qrel_task1_2022_all.tsv")
arqmath_to_prime_task1.convert_result_files_to_trec(submission_dir="../results/model_results/", qrel_result_dic=qrel_dictionary, prim_dir="../results/ARQmath_prim/", trec_dir="../results/ARQmath_trec/")

In the second step, the actual performance metrics can be computed. This requires trec_eval to be installed. On macOS, it can be installed via Homebrew.
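
A quick way to confirm that the tool is reachable before running the evaluation is to look it up on the PATH. This is a small sketch. `find_tool` is a hypothetical helper, and it assumes the executable is named `trec_eval`, as in the call below.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the absolute path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


# Hypothetical pre-flight check: report a missing tool instead of failing later.
if find_tool("trec_eval") is None:
    print("trec_eval was not found on PATH; install it before evaluating")
```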

In [34]:
number_topics = 78.0
task1_get_results.get_result(trec_eval_tool="trec_eval", qre_file_path="../arqmath_dataset/evaluation/Task 1/Qrel Files/qrel_task1_2022_all.tsv", prim_result_dir="../results/ARQmath_prim/", evaluation_result_file="../results/test1.tsv", number_topics=number_topics)

trec_eval.get_results: Cannot read results file '../results/ARQmath_prim/prime_tf-idf-test.tsv'
trec_eval: Quit in file '../results/ARQmath_prim/prime_tf-idf-test.tsv'


CalledProcessError: Command '['trec_eval', '../arqmath_dataset/evaluation/Task 1/Qrel Files/qrel_task1_2022_all.tsv', '../results/ARQmath_prim/prime_tf-idf-test.tsv', '-l2', '-m', 'ndcg', '-q']' returned non-zero exit status 2.

Don't be fooled by the output above. The scripts work. However, if your IR model returns no Post_Id that appears in the QREL file for the topic, the prime file will be empty and the error above will be thrown.
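
To spot this situation before calling trec_eval, one option is to count how many retrieved posts are judged at all. The sketch below rests on an assumption: that the QREL file is tab-separated in the usual TREC layout (topic id, iteration, post id, relevance). Both functions are hypothetical helpers and the sample values are made-up toy data.

```python
from typing import Iterable, Set, Tuple


def judged_pairs(qrel_lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Collect (topic_id, post_id) pairs from QREL lines.

    Assumes tab-separated lines: topic id, iteration, post id, relevance.
    """
    pairs = set()
    for line in qrel_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            pairs.add((fields[0], fields[2]))
    return pairs


def count_judged_results(run_rows, judged: Set[Tuple[str, str]]) -> int:
    """Count run rows (query_id, post_id) that have a judgement in the QREL."""
    return sum(1 for query_id, post_id in run_rows if (query_id, str(post_id)) in judged)


# Toy data, invented for illustration only.
qrel_sample = ["A.301\t0\t1001\t2\n", "A.301\t0\t1002\t0\n"]
run_sample = [("A.301", 1001), ("A.301", 9999)]

judged = judged_pairs(qrel_sample)
print(count_judged_results(run_sample, judged))  # prints 1
```

If the count is 0 for a run, the prime file for that run is expected to be empty, which matches the error shown above.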