# Experiments Overview

### Model Overview

Let's first take a look at the Parabel model.

* It is a label tree with probability "distributions", not hard classifiers, at each node.
    * Practically, this means we deal in probabilities, not hard labels.
* The label representation is the one implemented in utils.generate_parabel_representations.
* The partitioning algorithm is the one implemented in partition.ParabelPartitioner.
    * Hyperparameter: M = leaf size (maximum number of labels per leaf)
* Each internal/leaf classifier is a binary-relevance set of linear SVMs.
    * SVMs are trained with squared hinge loss (SHL) or log loss (LL), plus L2 regularization.
    * Hyperparameter: loss = SHL vs LL
    * Hyperparameter: C = misclassification penalty
* Prediction is done by beam search over the tree.
    * Hyperparameter: P = beam width (number of candidate solutions kept)
* Finally, Parabel is also used as an ensemble.
    * Hyperparameter: T = number of trees
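
The beam-search and ensemble steps listed above can be sketched as follows. This is a minimal illustration, not the repository's implementation: `Node`, `beam_search`, and `ensemble_average` are hypothetical names, each node is assumed to carry the probability of being reached from its parent for one test point, and averaging label scores across trees is an assumption about how the ensemble is combined.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `prob` is P(node | parent) for one test point."""
    prob: float
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None  # set only on label entries below a leaf


def beam_search(root: Node, beam_width: int) -> Dict[str, float]:
    """Score labels by path probability, keeping `beam_width` nodes per level."""
    frontier = [(root.prob, root)]
    scores: Dict[str, float] = {}
    while frontier:
        next_level = []
        for path_prob, node in frontier:
            for child in node.children:
                p = path_prob * child.prob
                if child.label is not None:
                    scores[child.label] = p          # reached a label
                else:
                    next_level.append((p, child))    # internal node, keep exploring
        # Prune: only the most probable internal nodes survive to the next level.
        frontier = heapq.nlargest(beam_width, next_level, key=lambda t: t[0])
    return scores


def ensemble_average(per_tree: List[Dict[str, float]]) -> Dict[str, float]:
    """Average label scores over T trees; labels a tree never reached count as 0."""
    labels = set().union(*per_tree)
    return {l: sum(s.get(l, 0.0) for s in per_tree) / len(per_tree) for l in labels}


# Toy tree: two leaves with two labels each.
root = Node(1.0, [
    Node(0.9, [Node(0.8, label="a"), Node(0.1, label="b")]),
    Node(0.2, [Node(0.7, label="c"), Node(0.3, label="d")]),
])
narrow = beam_search(root, beam_width=1)  # only the left leaf is explored
wide = beam_search(root, beam_width=2)    # both leaves are explored
```

With `beam_width=1` the right leaf is pruned, so labels `c` and `d` receive no score at all; a wider beam trades prediction time for recall.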
    

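Regarding the classifiers, we will use sklearn.svm.LinearSVC, which wraps liblinear and should hopefully give results close to the original implementation. Note that LinearSVC itself only offers the hinge and squared hinge losses, so the log-loss (LL) setting needs a different liblinear-backed estimator, such as sklearn.linear_model.LogisticRegression with the liblinear solver.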

### Experiment 2

The paper reports P@1, P@3, and P@5 on the following (open) datasets:

- [ ] Eurlex-4k
- [ ] WikiLSHTC-325k
- [ ] Amazon-3M
- [ ] Wiki500k
- [ ] Amazon-670k

The hyperparameters used are

* M=100, P=10
* (loss, C, T) is one of (SHL, 1, 1), (SHL, 1, 3), (LL, 10, 3)
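
The three settings above can be written out explicitly, which is convenient when looping over runs. This is only a sketch: the `ParabelConfig` class and its field names are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParabelConfig:
    """Hypothetical container for one hyperparameter setting."""
    leaf_size: int    # M
    beam_width: int   # P
    loss: str         # "SHL" (squared hinge loss) or "LL" (log loss)
    C: float          # misclassification penalty
    num_trees: int    # T


# M and P are shared; only (loss, C, T) varies across the three runs.
CONFIGS = [
    ParabelConfig(leaf_size=100, beam_width=10, loss=loss, C=C, num_trees=T)
    for loss, C, T in [("SHL", 1, 1), ("SHL", 1, 3), ("LL", 10, 3)]
]
```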


### Experiment 1


The Extreme Classification (XML) repository additionally reports P@1, P@3, and P@5 on the following small datasets:

- [x] Mediamill
- [ ] Delicious
- [ ] Bibtex

Since hyperparameters are not reported, we will use the following values

* M=5
* P=10
* loss = SHL, C=1
* T=1 


In [1]:
%%HTML
<style>
td,th {
  font-size: 16px
}
</style>

The performance reported for Parabel (in %) is:


| Dataset | P@1 | P@3 | P@5 |
|---------|-----|-----|-----|
|Mediamill|83.91|67.12|52.99|
|Delicious|67.44|61.83|56.75|
|Eurlex-4k|81.73|68.78|57.44|
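
For reference, P@k in the table above is the fraction of the k highest-scored labels that are relevant, averaged over test points and expressed as a percentage; the results file loaded later in this notebook stores the same quantity as a fraction in [0, 1]. A minimal pure-Python sketch, where the function names and the dict/set input format are illustrative assumptions rather than the repository's evaluation code:

```python
def precision_at_k(scores, relevant, k):
    """P@k for one test point.

    scores:   dict mapping label -> predicted score
    relevant: set of ground-truth labels
    """
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(label in relevant for label in top_k) / k


def mean_precision_at_k(all_scores, all_relevant, k):
    """Average P@k over all test points."""
    values = [precision_at_k(s, r, k) for s, r in zip(all_scores, all_relevant)]
    return sum(values) / len(values)


# Toy example with two test points.
all_scores = [{"a": 0.9, "b": 0.5, "c": 0.4, "d": 0.1}, {"x": 0.6, "y": 0.3}]
all_relevant = [{"a", "c"}, {"y"}]
p1 = mean_precision_at_k(all_scores, all_relevant, k=1)
```

Multiplying the result by 100 gives numbers on the same scale as the table.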

## Initialization

In [2]:
import sys
sys.path.append("..")

In [3]:
import pandas as pd

In [4]:
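def average_over_repetitions(results):
    """Average all metrics over repetitions for each (num_trees, dataset) pair."""
    mean_results = results.groupby(["num_trees", "dataset"]).mean(numeric_only=True)
    # rep_num is only a bookkeeping index, so its mean carries no information.
    mean_results = mean_results.drop(columns=["rep_num"])
    return mean_results.reset_index()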

In [31]:
results = pd.read_csv("../experiments/prabhu_results.csv")

In [32]:
results

| | p@1 | p@3 | p@5 | ranking_loss | coverage_error | avg_prec_score | ndcg@1 | ndcg@3 | ndcg@5 | dcg@1 | dcg@3 | dcg@5 | dataset | split | rep_num | num_trees |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.551491 | 0.327899 | 0.242783 | 0.189450 | 50.363419 | 0.500738 | 0.551491 | 0.501579 | 0.521157 | 0.551491 | 0.801902 | 0.896626 | bibtex | 0.0 | 0.0 | 1.0 |
| 1 | 0.568588 | 0.347515 | 0.259801 | 0.122763 | 34.896223 | 0.526970 | 0.568588 | 0.526592 | 0.550226 | 0.568588 | 0.842321 | 0.947737 | bibtex | 0.0 | 0.0 | 2.0 |
| 2 | 0.574155 | 0.353214 | 0.260199 | 0.110035 | 31.765805 | 0.535899 | 0.574155 | 0.534970 | 0.555603 | 0.574155 | 0.855162 | 0.954925 | bibtex | 0.0 | 0.0 | 3.0 |
| 3 | 0.550298 | 0.330285 | 0.242624 | 0.185564 | 49.004771 | 0.502397 | 0.550298 | 0.503217 | 0.519961 | 0.550298 | 0.805248 | 0.896617 | bibtex | 0.0 | 1.0 | 1.0 |
| 4 | 0.562624 | 0.343804 | 0.255348 | 0.137608 | 38.144732 | 0.519425 | 0.562624 | 0.519753 | 0.541129 | 0.562624 | 0.834188 | 0.935210 | bibtex | 0.0 | 1.0 | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 145 | 0.831733 | 0.669480 | 0.525306 | 0.089976 | 32.884931 | 0.714298 | 0.831733 | 0.748293 | 0.714287 | 0.831733 | 1.513889 | 1.768072 | mediamill | 2.0 | 3.0 | 2.0 |
| 146 | 0.837463 | 0.669325 | 0.529549 | 0.088493 | 31.395927 | 0.718075 | 0.837463 | 0.749311 | 0.717949 | 0.837463 | 1.516501 | 1.779330 | mediamill | 2.0 | 3.0 | 3.0 |
| 147 | 0.797661 | 0.660704 | 0.505405 | 0.128862 | 41.123742 | 0.682836 | 0.797661 | 0.732505 | 0.689343 | 0.797661 | 1.488028 | 1.712349 | mediamill | 2.0 | 4.0 | 1.0 |
| 148 | 0.837463 | 0.673791 | 0.531470 | 0.084843 | 31.080223 | 0.722578 | 0.837463 | 0.752333 | 0.720141 | 0.837463 | 1.523473 | 1.784902 | mediamill | 2.0 | 4.0 | 2.0 |


In [34]:
mean_results=average_over_repetitions(results)
filtered=mean_results.filter(["p@1","p@3","p@5","ndcg@1","ndcg@3","ndcg@5","dcg@1","dcg@3","dcg@5","num_trees","dataset"])
mediamill_results=filtered[filtered["dataset"]=="mediamill"]
delicious_results=filtered[filtered["dataset"]=="delicious"]
bibtex_results=filtered[filtered["dataset"]=="bibtex"]
eurlex_results=filtered[filtered["dataset"]=="eurlex"]

### Mediamill

In [21]:
display(mediamill_results)

| | p@1 | p@3 | p@5 | ndcg@1 | ndcg@3 | ndcg@5 | dcg@1 | dcg@3 | dcg@5 | num_trees | dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.80843 | 0.658875 | 0.507601 | 0.80843 | 0.733224 | 0.693589 | 0.80843 | 1.487838 | 1.718968 | 1.0 | mediamill |
| 5 | 0.823804 | 0.66625 | 0.522637 | 0.823804 | 0.7427 | 0.709566 | 0.823804 | 1.505785 | 1.758438 | 2.0 | mediamill |
| 8 | 0.820634 | 0.669477 | 0.526332 | 0.820634 | 0.744593 | 0.712395 | 0.820634 | 1.509789 | 1.766082 | 3.0 | mediamill |


**Results are close to the reported values: with T=3, each P@k is within about 2 percentage points.**

### Delicious

In [22]:
display(delicious_results)

| | p@1 | p@3 | p@5 | ndcg@1 | ndcg@3 | ndcg@5 | dcg@1 | dcg@3 | dcg@5 | num_trees | dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.621622 | 0.567389 | 0.524149 | 0.621622 | 0.58036 | 0.548087 | 0.621622 | 1.235881 | 1.612156 | 1.0 | delicious |
| 4 | 0.643203 | 0.586422 | 0.540689 | 0.643203 | 0.600071 | 0.565936 | 0.643203 | 1.277935 | 1.664711 | 2.0 | delicious |
| 7 | 0.648687 | 0.592493 | 0.545658 | 0.648687 | 0.606083 | 0.571185 | 0.648687 | 1.290747 | 1.680148 | 3.0 | delicious |


**With T=3, the P@k measures are about 2 to 3 percentage points (roughly 4% relative) below the reported values.** Possible reasons:

* We chose the leaf size M=5 ad hoc.
* The original paper applies weight clipping (essentially a regularization trick), which we did not implement.
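
To make the size of the Delicious gap concrete, the reported numbers can be compared against the T=3 row above. The values below are copied from the tables in this notebook; the helper name `gap` is illustrative only.

```python
# Reported P@k for Delicious, in percent.
reported = {"p@1": 67.44, "p@3": 61.83, "p@5": 56.75}
# Our averaged results for num_trees=3, stored as fractions.
obtained = {"p@1": 0.648687, "p@3": 0.592493, "p@5": 0.545658}


def gap(reported_pct, obtained_frac):
    """Return (absolute gap in percentage points, relative gap in %)."""
    absolute = reported_pct - 100 * obtained_frac
    relative = 100 * absolute / reported_pct
    return absolute, relative


gaps = {k: gap(reported[k], obtained[k]) for k in reported}
for k, (absolute, relative) in gaps.items():
    print(f"{k}: {absolute:.2f} points below reported ({relative:.1f}% relative)")
```

Keeping absolute and relative gaps separate avoids ambiguity when a difference is quoted simply as "x%".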

### Bibtex

In [36]:
display(bibtex_results)

| | p@1 | p@3 | p@5 | ndcg@1 | ndcg@3 | ndcg@5 | dcg@1 | dcg@3 | dcg@5 | num_trees | dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.555176 | 0.332158 | 0.245042 | 0.555176 | 0.507202 | 0.527429 | 0.555176 | 0.810629 | 0.904631 | 1.0 | bibtex |
| 4 | 0.569463 | 0.345942 | 0.255968 | 0.569463 | 0.525591 | 0.54743 | 0.569463 | 0.840336 | 0.93995 | 2.0 | bibtex |
| 8 | 0.579642 | 0.354478 | 0.261848 | 0.579642 | 0.537892 | 0.560146 | 0.579642 | 0.859565 | 0.960747 | 3.0 | bibtex |


**Results are off by about 10% from reported.**

### Eurlex

In [35]:
display(eurlex_results)

| | p@1 | p@3 | p@5 | ndcg@1 | ndcg@3 | ndcg@5 | dcg@1 | dcg@3 | dcg@5 | num_trees | dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.663691 | 0.529815 | 0.430654 | 0.663787 | 0.563047 | 0.508288 | 0.663787 | 1.193999 | 1.426257 | 1.0 | eurlex |
| 6 | 0.700814 | 0.571139 | 0.47057 | 0.700866 | 0.603868 | 0.550209 | 0.700866 | 1.280629 | 1.543641 | 2.0 | eurlex |
| 10 | 0.727067 | 0.591459 | 0.486931 | 0.727049 | 0.625575 | 0.569746 | 0.727049 | 1.326516 | 1.598203 | 3.0 | eurlex |


**Results are about 9 to 10 percentage points below the reported values, even with T=3.**