# Listwise Regression

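In this notebook, we examine the performance of listwise regression in two variants, multidimensional and dimensionwise, each without hyperparameter search and with Bayes search. Every run is scored by the average Spearman correlation over 5 validation folds. First, we load the required dependencies and data.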

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
import pandas as pd
from category_encoders.binary import BinaryEncoder
from category_encoders.one_hot import OneHotEncoder
from category_encoders.ordinal import OrdinalEncoder
from category_encoders.target_encoder import TargetEncoder
from datetime import timedelta
from sklearn.compose import ColumnTransformer
from time import time

from src import configuration as config
from src.features.encoder_utils import NoY
from src.pipeline.pipeline_factory import PipelineFactory, ModelType, EvaluationType
from src.pipeline.pipeline_transformers import *


# load the data
train_df = config.load_traindata_for_pointwise()
pipelineFactory = PipelineFactory()

## 1) Listwise Multidimensional Regression

In [2]:
start = time()

pipeline = pipelineFactory.create_pipeline(
    train_df,
    ModelType.LISTWISE_MULTIDIMENSIONAL_REGRESSION_NO_SEARCH,
    verbose_level=1,
    evaluation=EvaluationType.CROSS_VALIDATION,
    n_folds=5,
    workers=1,
    target="rank"
)
pipeline.run()

runtime = int(time() - start)
print('\nruntime: ' + str(timedelta(seconds=runtime)) + ' [' + str(runtime) + 's]')

Creating pipeline ...
Starting pipeline using method: EvaluationType.CROSS_VALIDATION


100%|██████████| 5/5 [00:37<00:00,  7.52s/it]

Finished running the pipeline
Evaluation metrics:
    validation_average_spearman_fold_0: 0.7693
    validation_average_spearman_fold_1: 0.7382
    validation_average_spearman_fold_2: 0.7188
    validation_average_spearman_fold_3: 0.7493
    validation_average_spearman_fold_4: 0.7273
    average of all folds: 0.7406 [std=0.0176]

runtime: 0:00:40 [40s]
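
The metric reported above, `validation_average_spearman`, is presumably the Spearman rank correlation between predicted and true rankings, averaged over the lists in a validation fold; the pipeline's own implementation is not shown in this notebook. A minimal sketch under that assumption, using the closed-form formula for rankings without ties (the helper names and the toy rankings are hypothetical):

```python
from statistics import fmean


def spearman_no_ties(true_ranks, predicted_ranks):
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(true_ranks)
    squared_diffs = sum((t - p) ** 2 for t, p in zip(true_ranks, predicted_ranks))
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


def average_spearman(lists):
    """Mean rho over (true_ranks, predicted_ranks) pairs, one pair per list."""
    return fmean(spearman_no_ties(t, p) for t, p in lists)


# Hypothetical toy data: two lists with four ranked items each.
toy_lists = [
    ([1, 2, 3, 4], [1, 2, 3, 4]),  # identical ranking: rho = 1.0
    ([1, 2, 3, 4], [2, 1, 3, 4]),  # one adjacent swap: rho = 0.8
]
print(round(average_spearman(toy_lists), 4))
```

A value of 1 means the predicted order matches the true order exactly, 0 means no monotonic relationship, and -1 means the order is fully reversed. Rankings with ties would need the general definition (Pearson correlation of the rank vectors) instead of this shortcut.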

### 1.1) Tuning with Bayes Search

In [3]:
start = time()

# number of optimization rounds = n_iter / n_points (here: 200 / 4 = 50 rounds)
n_iter = 200 # total number of parameter combinations to evaluate - our default: 200
n_points = 4 # number of parameter combinations evaluated per optimization round - our default: 4
cv = 4 # number of cross-validation folds, i.e. fits per parameter combination - our default: 4
n_jobs = -1 # number of fits to run in parallel (only fits within one round can run in parallel) - our default: -1

pipeline = pipelineFactory.create_pipeline(
    train_df,
    ModelType.LISTWISE_MULTIDIMENSIONAL_REGRESSION_BAYES_SEARCH,
    verbose_level=1,
    target="rank",
    bayes_n_iter=n_iter,
    bayes_n_points=n_points,
    bayes_cv=cv,
    bayes_n_jobs=n_jobs
)

pipeline.run()

runtime = int(time() - start)
print('\nruntime: ' + str(timedelta(seconds=runtime)) + ' [' + str(runtime) + 's]')

Creating pipeline ...
Starting pipeline using method: EvaluationType.BAYES_SEARCH
Performing bayes search




Fitting 4 folds for each of 4 candidates, totalling 16 fits
[... the same line is printed for each of the 50 optimization rounds ...]
best score: 0.712702011104151
best params:
    dataset_transformer__encoder: OneHotEncoder()
    dataset_transformer__expected_pca_variance: 0.25
    dataset_transformer__nan_ratio_feature_drop_threshold: 0.5
    estimator__max_depth: 50
    estimator__max_features: None
    estimator__min_samples_leaf: 1
    estimator__min_samples_split: 2
    general_transformer__model_encoder: OneHotEncoder()
    general_transformer__scoring_encoder: OrdinalEncoder()
    general_transformer__tuning_encoder: OneHotEncoder()
Training pipeline with best parameters...
Evaluating pipeline with best parameters...


100%|██████████| 5/5 [00:20<00:00,  4.15s/it]

Finished running the pipeline
Evaluation metrics:
    validation_average_spearman_fold_0: 0.7223 [std=0.]
    validation_average_spearman_fold_1: 0.694 [std=0.]
    validation_average_spearman_fold_2: 0.7307 [std=0.]
    validation_average_spearman_fold_3: 0.7466 [std=0.]
    validation_average_spearman_fold_4: 0.7167 [std=0.]
    average_spearman (5-fold): 0.7221 [std=0.0173]

runtime: 0:43:57 [2637s]
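
The search log above reports 16 fits per round over 50 rounds, which follows directly from the search settings. The arithmetic as a small sketch (`n_iter`, `n_points`, and `cv` mirror the cell above; `n_rounds`, `fits_per_round`, and `total_search_fits` are names introduced here for illustration):

```python
n_iter = 200   # parameter combinations evaluated in total
n_points = 4   # combinations evaluated per optimization round
cv = 4         # cross-validation folds per combination

n_rounds = n_iter // n_points                   # 50 optimization rounds
fits_per_round = n_points * cv                  # 16 fits, as reported in each log line
total_search_fits = n_rounds * fits_per_round   # 800 fits over the whole search

print(n_rounds, fits_per_round, total_search_fits)
```

Since only the fits within one round can run in parallel, at most 16 fits run concurrently with these settings, and the 50 rounds themselves run one after another.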

## 2) Listwise Dimensionwise Regression

In [4]:
start = time()

pipeline = pipelineFactory.create_pipeline(
    train_df,
    ModelType.LISTWISE_DIMENSIONWISE_REGRESSION_NO_SEARCH,
    verbose_level=1,
    evaluation=EvaluationType.CROSS_VALIDATION,
    n_folds=5,
    workers=1,
    target="rank"
)
pipeline.run()

runtime = int(time() - start)
print('\nruntime: ' + str(timedelta(seconds=runtime)) + ' [' + str(runtime) + 's]')

Creating pipeline ...
Starting pipeline using method: EvaluationType.CROSS_VALIDATION


100%|██████████| 5/5 [00:25<00:00,  5.09s/it]

Finished running the pipeline
Evaluation metrics:
    validation_average_spearman_fold_0: 0.6331
    validation_average_spearman_fold_1: 0.6211
    validation_average_spearman_fold_2: 0.6168
    validation_average_spearman_fold_3: 0.6169
    validation_average_spearman_fold_4: 0.6082
    average of all folds: 0.6192 [std=0.0081]

runtime: 0:00:26 [26s]

### 2.1) Tuning with Bayes Search

In [5]:
start = time()

# number of optimization rounds = n_iter / n_points (here: 200 / 4 = 50 rounds)
n_iter = 200 # total number of parameter combinations to evaluate - our default: 200
n_points = 4 # number of parameter combinations evaluated per optimization round - our default: 4
cv = 4 # number of cross-validation folds, i.e. fits per parameter combination - our default: 4
n_jobs = -1 # number of fits to run in parallel (only fits within one round can run in parallel) - our default: -1

pipeline = pipelineFactory.create_pipeline(
    train_df,
    ModelType.LISTWISE_DIMENSIONWISE_REGRESSION_BAYES_SEARCH,
    verbose_level=1,
    target="rank",
    bayes_n_iter=n_iter,
    bayes_n_points=n_points,
    bayes_cv=cv,
    bayes_n_jobs=n_jobs
)

pipeline.run()

runtime = int(time() - start)
print('\nruntime: ' + str(timedelta(seconds=runtime)) + ' [' + str(runtime) + 's]')

Creating pipeline ...
Starting pipeline using method: EvaluationType.BAYES_SEARCH
Performing bayes search




Fitting 4 folds for each of 4 candidates, totalling 16 fits
[... the same line is printed for each of the 50 optimization rounds ...]
best score: 0.6256797269490846
best params:
    dataset_transformer__encoder: OneHotEncoder()
    dataset_transformer__expected_pca_variance: 0.542020095013966
    dataset_transformer__nan_ratio_feature_drop_threshold: 0.25
    estimator__estimator__max_depth: 250
    estimator__estimator__max_features: None
    estimator__estimator__min_samples_leaf: 1
    estimator__estimator__min_samples_split: 2
    general_transformer__model_encoder: OneHotEncoder()
    general_transformer__scoring_encoder: OrdinalEncoder()
    general_transformer__tuning_encoder: OrdinalEncoder()
Training pipeline with best parameters...
Evaluating pipeline with best parameters...


100%|██████████| 5/5 [00:17<00:00,  3.59s/it]

Finished running the pipeline
Evaluation metrics:
    validation_average_spearman_fold_0: 0.6513 [std=0.]
    validation_average_spearman_fold_1: 0.6415 [std=0.]
    validation_average_spearman_fold_2: 0.623 [std=0.]
    validation_average_spearman_fold_3: 0.6133 [std=0.]
    validation_average_spearman_fold_4: 0.6112 [std=0.]
    average_spearman (5-fold): 0.6281 [std=0.0158]

runtime: 0:41:09 [2469s]
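
To compare the four runs side by side, the fold scores printed above can be summarized in one place. The sketch below copies those scores by hand and recomputes each mean and standard deviation. The reported std values are reproduced by the population standard deviation (`statistics.pstdev`), which suggests, but does not confirm, that the pipeline uses that convention. The dictionary keys are labels introduced here.

```python
from statistics import fmean, pstdev

# Fold scores copied from the evaluation output of each run.
fold_scores = {
    "multidimensional, no search": [0.7693, 0.7382, 0.7188, 0.7493, 0.7273],
    "multidimensional, Bayes search": [0.7223, 0.694, 0.7307, 0.7466, 0.7167],
    "dimensionwise, no search": [0.6331, 0.6211, 0.6168, 0.6169, 0.6082],
    "dimensionwise, Bayes search": [0.6513, 0.6415, 0.623, 0.6133, 0.6112],
}

# Mean and population standard deviation per run, rounded like the output.
summary = {
    name: (round(fmean(scores), 4), round(pstdev(scores), 4))
    for name, scores in fold_scores.items()
}

for name, (mean, std) in summary.items():
    print(f"{name:32s} mean={mean:.4f} std={std:.4f}")
```

On these numbers, the multidimensional variant scores higher than the dimensionwise one in both settings. The Bayes search result is lower than the run without search for the multidimensional model (0.7221 vs. 0.7406) and slightly higher for the dimensionwise model (0.6281 vs. 0.6192); both differences are of the same order as the fold-to-fold standard deviation.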