# Pairwise Evaluation

In this notebook, pointwise models are combined with pairwise algorithms, which rerank the top results of the pointwise ranking.
To start an experiment, define it using the following parameters:

<b>name</b>: Name of the experiment <br>
<b>model</b>: The model to use (Possible choices are nbg, lr, svm, dt, rf, ada, gb) <br>
<b>pca</b>: Number of PCA components for dimensionality reduction (0 disables PCA) <br>
<b>search_space</b>: Values to search over during Bayesian optimization (Optional) <br>
<b>trials</b>: Number of hyperparameter optimization trials (Optional) <br>
<b>pairwise_model</b>: Pairwise model used for fine-tuning (Possible choice: ranknet) <br>
<b>pairwise_top_k</b>: Number of top-ranked results that the pairwise model reranks
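
The pairwise stage only touches the head of the pointwise ranking. The two-stage idea can be sketched as follows, assuming the pointwise model yields one score per document and the pairwise-trained model can rescore a single document (RankNet learns a per-document scoring function from pairs). `rerank_top_k` and the toy scores are hypothetical and not part of the `Pipeline` API.

```python
from typing import Callable, Dict, List


def rerank_top_k(
    pointwise_scores: Dict[str, float],
    rerank_score: Callable[[str], float],
    k: int,
) -> List[str]:
    """Hypothetical helper: rerank the k best pointwise results."""
    # Stage 1: order every candidate by its pointwise score, best first.
    ranked = sorted(pointwise_scores, key=pointwise_scores.get, reverse=True)
    head, tail = ranked[:k], ranked[k:]
    # Stage 2: rescore only the top k with the pairwise-trained model.
    reranked_head = sorted(head, key=rerank_score, reverse=True)
    # Candidates below the cutoff keep their pointwise order.
    return reranked_head + tail


scores = {"d1": 0.9, "d2": 0.8, "d3": 0.4, "d4": 0.1}
ranknet_scores = {"d1": 0.2, "d2": 1.5, "d3": 0.7, "d4": 9.0}
print(rerank_top_k(scores, ranknet_scores.get, k=3))  # ['d2', 'd3', 'd1', 'd4']
```

Note that `d4` stays last despite its high second-stage score: with `k=3` it never enters the reranked head, which is the trade-off `pairwise_top_k` controls.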

### Imports

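```python
import os
import sys

# Add the parent of the current working directory (the notebook's folder
# when run from Jupyter) to the import path so that `src` can be imported.
sys.path.append(os.path.dirname(os.path.abspath("")))
```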

```python
from src.pipeline import Pipeline
```



```python
# Point the pipeline at the preprocessed collection, query splits,
# relevance judgments (qrels) and feature tables.
pipeline = Pipeline(
    collection='data/processed/30_5000_1000_collection.pkl',
    queries='data/processed/30_5000_1000_queries.pkl',
    queries_val='data/processed/30_5000_1000_queries_val.pkl',
    queries_test='data/processed/30_5000_1000_queries_test.pkl',
    features='data/processed/30_5000_1000_features.pkl',
    qrels_val='data/processed/30_5000_1000_qrels_val.pkl',
    qrels_test='data/processed/30_5000_1000_qrels_test.pkl',
    features_test='data/processed/30_5000_1000_features_test.pkl',
    features_val='data/processed/30_5000_1000_features_val.pkl',
)
```

```python
pipeline.features
```

| Unnamed: 0 | qID | pID | y | w2v_cosine | w2v_euclidean | w2v_manhattan | w2v_tfidf_cosine | w2v_tfidf_euclidean | w2v_tfidf_manhattan | tfidf_cosine | ... | polarity_doc | subjectivity_query | polarity_query | bm25 | doc_nouns | doc_adjectives | doc_verbs | query_nouns | query_adjectives | query_verbs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 603195 | 7050012 | 1 | 0.972107 | 144.641830 | 1124.871630 | 0.938781 | 2.765727 | 22.236694 | 0.537439 | ... | 0.000000 | 0.00 | 0.00 | -24.655536 | 23 | 6 | 4 | 3 | 1 | 1 |
| 1 | 474183 | 325505 | 1 | 0.971866 | 131.960266 | 1033.670312 | 0.985675 | 1.360485 | 11.347487 | 0.745907 | ... | 0.450000 | 0.00 | 0.00 | -33.129796 | 18 | 9 | 3 | 4 | 0 | 0 |
| 2 | 320545 | 1751825 | 1 | 0.947701 | 94.900002 | 756.378183 | 0.959522 | 2.236971 | 17.352688 | 0.409509 | ... | 0.500000 | 0.20 | 0.20 | -16.699603 | 20 | 2 | 14 | 2 | 1 | 1 |
| 3 | 89798 | 5069949 | 1 | 0.972710 | 161.470459 | 1273.643564 | 0.933304 | 1.714253 | 13.493497 | 0.541627 | ... | 0.066667 | 0.25 | 0.00 | -27.678576 | 25 | 10 | 5 | 3 | 1 | 0 |
| 4 | 1054603 | 2869106 | 1 | 0.965680 | 155.648453 | 1216.564726 | 0.941391 | 1.799412 | 14.369308 | 0.438115 | ... | 0.000000 | 0.00 | 0.00 | -28.497519 | 20 | 9 | 6 | 2 | 2 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 128401 | 6127598 | 0 | 0.796978 | 85.670822 | 678.466760 | 0.555981 | 3.027138 | 24.841764 | 0.185056 | ... | -0.520833 | 0.00 | 0.00 | -8.866170 | 16 | 6 | 13 | 2 | 1 | 0 |
| 4996 | 1044540 | 4616118 | 0 | 0.922095 | 157.044754 | 1238.354322 | 0.603788 | 2.167866 | 17.812756 | 0.140057 | ... | 0.156250 | 0.00 | 0.00 | -7.852468 | 25 | 9 | 16 | 0 | 0 | 1 |
| 4997 | 486146 | 1137390 | 0 | 0.946438 | 125.126984 | 972.330644 | 0.882998 | 4.161341 | 34.815641 | 0.314505 | ... | -0.100000 | 0.10 | 0.00 | -15.909103 | 12 | 1 | 10 | 2 | 0 | 2 |
| 4998 | 532697 | 5161847 | 0 | 0.938939 | 99.808395 | 790.453814 | 0.893834 | 1.977307 | 16.122506 | 0.344173 | ... | 0.284375 | 0.00 | 0.00 | -16.617979 | 18 | 8 | 9 | 3 | 1 | 0 |
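
The feature table is pointwise: each row describes one (`qID`, `pID`) pair with a label `y` (0 or 1 in the rows shown). A pairwise learner instead trains on pairs of documents for the same query. One way to derive such pairs can be sketched as follows, assuming rows sharing a `qID` are candidates for the same query and `y == 1` marks a relevant document. `make_pairs`, the row layout and the toy rows are hypothetical; the pipeline may sample pairs differently.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical row layout: (qID, pID, y, feature vector).
Row = Tuple[int, int, int, Sequence[float]]


def make_pairs(rows: List[Row]) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Hypothetical helper: (preferred, other) feature pairs within each query."""
    by_query: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_query[row[0]].append(row)

    pairs = []
    for docs in by_query.values():
        relevant = [r for r in docs if r[2] == 1]
        non_relevant = [r for r in docs if r[2] == 0]
        # Every relevant document should outrank every non-relevant one
        # of the same query; queries lacking either label yield no pairs.
        for pos, neg in product(relevant, non_relevant):
            pairs.append((pos[3], neg[3]))
    return pairs


rows = [
    (1, 10, 1, [0.9, 0.1]),
    (1, 11, 0, [0.3, 0.5]),
    (1, 12, 0, [0.2, 0.4]),
    (2, 20, 0, [0.5, 0.5]),
]
print(make_pairs(rows))  # [([0.9, 0.1], [0.3, 0.5]), ([0.9, 0.1], [0.2, 0.4])]
```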


### RankNet
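
RankNet scores each document with a learned function s = f(x) but trains it on pairs: for documents i and j of the same query it models P(i ranked above j) = sigmoid(sigma * (s_i - s_j)) and minimizes the cross-entropy against the known preference. The loss and its gradient can be sketched as follows, with a linear f standing in for the neural network (an assumption made for brevity). `score`, `ranknet_loss` and `train` are hypothetical helpers, unrelated to the model stored at `models_path`.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (preferred doc, other doc)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    # Hypothetical linear stand-in for RankNet's neural scorer: s = w . x
    return sum(wi * xi for wi, xi in zip(w, x))


def ranknet_loss(s_i: float, s_j: float, sigma: float = 1.0) -> float:
    # Cross-entropy against the target "i ranks above j" (P_ij = 1):
    # C = -log sigmoid(sigma * (s_i - s_j)) = log(1 + exp(-sigma * (s_i - s_j)))
    return math.log1p(math.exp(-sigma * (s_i - s_j)))


def train(pairs: List[Pair], dim: int, lr: float = 0.1,
          epochs: int = 100, sigma: float = 1.0) -> List[float]:
    # Plain stochastic gradient descent over the preference pairs.
    w = [0.0] * dim
    for _ in range(epochs):
        for x_i, x_j in pairs:
            diff = score(w, x_i) - score(w, x_j)
            # dC/d(s_i - s_j) = -sigma / (1 + exp(sigma * (s_i - s_j)))
            grad_diff = -sigma / (1.0 + math.exp(sigma * diff))
            # Chain rule: d(s_i - s_j)/dw_k = x_i[k] - x_j[k]
            for k in range(dim):
                w[k] -= lr * grad_diff * (x_i[k] - x_j[k])
    return w


pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.2], [0.1, 0.9])]
w = train(pairs, dim=2)
print(ranknet_loss(0.0, 0.0))  # untrained (equal scores): log(2) ≈ 0.693
print(ranknet_loss(score(w, pairs[0][0]), score(w, pairs[0][1])))  # lower after training
```

Because only score differences enter the loss, the trained scorer can still be applied to one document at a time at reranking time.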

```python
pipeline.evaluate(
    name='nb_pair',
    model='nbg',
    pca=26,
    pairwise_model='ranknet',
    pairwise_top_k=80,
    models_path='models/ranknet_26pca.pth',
)
```

MRR: 0.8304409171075837
nDCG: 0.837206527358909
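
MRR and nDCG are standard ranking metrics. How they are usually computed can be sketched as follows, assuming binary relevance (the `y` labels), the common log2(rank + 1) discount for nDCG, no rank cutoff unless `k` is given, and a score of 0 for queries without any relevant result. The pipeline's own cutoff and conventions are not shown in this notebook, so the helper names here are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def reciprocal_rank(rels: Sequence[int]) -> float:
    # rels: relevance labels of one query's results, in ranked order.
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0  # no relevant document retrieved


def dcg(rels: Sequence[int], k: Optional[int] = None) -> float:
    # Discounted cumulative gain with a log2(rank + 1) discount; rels[:None] keeps all.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(rels[:k], start=1))


def ndcg(rels: Sequence[int], k: Optional[int] = None) -> float:
    # Normalize by the DCG of the ideal (relevance-sorted) ordering.
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Two toy queries: the first relevant result sits at rank 2 and rank 1.
runs = [[0, 1, 0], [1, 0, 0]]
print(mean([reciprocal_rank(r) for r in runs]))  # 0.75
print(mean([ndcg(r) for r in runs]))  # ≈ 0.8155
```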
