# Recommender Systems 2023/24

### Practice - Hyperparameter optimization with Optuna

### Hyperparameter optimization is essential to achieve the best recommendation quality!!

## How does it work
* Split the data into training, validation and test sets. Ensure that the distribution of the three sets is similar (e.g., split the validation data with the same approach used for the test data; if you know the test data contains cold items or users, ensure the validation data does too).
* Choose a recommender model, identify its hyperparameters and choose a value range and distribution for each.
* Select a hyperparameter configuration and fit your model using the training data. Evaluate it using the validation data.
* Repeat the previous step by exploring many possible hyperparameter configurations. Several exploration strategies are possible.
* Select the hyperparameter configuration with the best recommendation quality on the validation data. Use that configuration to fit the model on the union of training and validation data. Evaluate the final model on the test data and report that result (or submit to the course challenge).

Using directly or indirectly any information on the composition of the test data at any stage of the training or optimization process will result in *information leakage* and cause you to overestimate the quality of your model.
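
As a minimal sketch of the first step, the snippet below applies two consecutive random holdout splits to a toy list of interactions. The helper `split_in_two` and the toy data are hypothetical and only illustrate the idea; they are not the course splitting function. Two consecutive 80/20 splits leave 64% of the data for training, 16% for validation and 20% for testing.

```python
import random

def split_in_two(interactions, train_percentage, rng):
    """Randomly split a list of interactions into two disjoint lists (hypothetical helper)."""
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    n_first = int(round(len(shuffled) * train_percentage))
    return shuffled[:n_first], shuffled[n_first:]

rng = random.Random(42)

# Toy data: 1000 (user, item) pairs, purely illustrative
all_interactions = [(user, item) for user in range(50) for item in range(20)]

# First hold out the test data, then split what is left into training and validation
train_validation, test = split_in_two(all_interactions, 0.8, rng)
train, validation = split_in_two(train_validation, 0.8, rng)

print(len(train), len(validation), len(test))
```

The test set is held out first and never touched again until the final evaluation, which is what prevents information leakage.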



## Hyperparameter optimization strategies

### Grid-search
For each hyperparameter define a list of possible values, then explore *all* possible combinations. For example, in a KNN using Tversky similarity you could choose these values:
* number of neighbors: [10, 50, 100, 150, 200, 250]
* shrink term: [0, 10, 20, 50, 100]
* alpha: [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5]
* beta: [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5]

You end up with *1920* possible hyperparameter configurations. Furthermore, you cannot easily fine-tune. Say that you found the optimal number of neighbors is 200, but in that area of the search space you were using a step of 50. What if there is an even better result at 215? You have to define a *new* hyperparameter range around the optimal one you found and use a smaller step, e.g., [180, 190, 200, 210, 220] and so on. 

Overall, grid-search is very sensitive to the range and distribution you choose, very rigid, and often impractically slow because the number of cases grows combinatorially. It has been known for more than 20 years that it is generally a bad idea.
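
The combinatorial growth is easy to verify. This small sketch enumerates the grid from the example above with `itertools.product`; the dictionary keys are illustrative names, not the arguments of any specific recommender.

```python
from itertools import product

# Value lists from the grid-search example (key names are illustrative)
grid = {
    "number_of_neighbors": [10, 50, 100, 150, 200, 250],
    "shrink": [0, 10, 20, 50, 100],
    "alpha": [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5],
    "beta": [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5],
}

# Every combination of one value per hyperparameter is a configuration to evaluate
configurations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configurations))  # 6 * 5 * 8 * 8 = 1920
```

Adding a single extra value to any one list adds hundreds of new configurations, each of which requires fitting and evaluating a model.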

### Random-search
In a random search you define the range and distribution of possible values for each hyperparameter, and then you draw every hyperparameter configuration you explore at random from them.
For example, in a KNN using Tversky similarity you could choose these values:
* number of neighbors: uniform random from 10 to 500
* shrink term: uniform random from 0 to 500
* alpha: uniform random from 0.1 to 1.5
* beta: uniform random from 0.1 to 1.5

Since each configuration is drawn independently of the others, random search is very parallelizable. It is also effective (definitely better than grid-search).
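
A minimal sketch of how configurations could be drawn for the example above, using only the Python standard library. The function `sample_configuration` and the key names are hypothetical; fitting and evaluating the model for each configuration is left out.

```python
import random

def sample_configuration(rng):
    """Draw one configuration from the ranges of the example (hypothetical helper)."""
    return {
        "number_of_neighbors": rng.randint(10, 500),  # integer, both ends included
        "shrink": rng.randint(0, 500),
        "alpha": rng.uniform(0.1, 1.5),
        "beta": rng.uniform(0.1, 1.5),
    }

rng = random.Random(0)

# Each draw is independent of the others, so the configurations
# could be evaluated in parallel on different machines
configurations = [sample_configuration(rng) for _ in range(50)]

print(configurations[0])
```

Unlike grid-search, the number of configurations is a budget you choose and does not depend on the number of hyperparameters.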


### Bayesian-search
A more advanced strategy that uses a Gaussian process to model how the hyperparameters, and the interdependencies between them, affect the result. It starts with a random-search phase in which the search space is explored, followed by a second phase that uses the Gaussian process to choose the next hyperparameter configuration. In this way it combines exploration with exploitation.

Bayesian-search is less parallelizable (each new configuration is chosen based on the results of the previous ones, so the process is sequential) but has a good exploration-exploitation tradeoff. It is the strategy used in our research activity.


### Other strategies
There is no shortage of hyperparameter optimization strategies, for example the Tree-Structured Parzen Estimator with Optuna (in this notebook), as well as techniques that use previous optimization runs to attempt transfer learning between models and datasets. All those are beyond the scope of the course.



### In the course repository you will find a BayesianSearch object in the HyperparameterTuning folder. That is a simple wrapper of another library and its purpose is to provide a very simple way to tune some of the most common hyperparameters. 




## Hyperparameter sensitivity

One of the most important problems you encounter when doing hyperparameter tuning is how to select the range and distribution of the hyperparameters. There is no universal rule and some of those decisions are based on experience. Generally the suggestions we can give you are:
* Keep the search space on the larger side: it is better to have more room to maneuver than less.
* Hyperparameters like the number of neighbors, the shrink term, the number of latent dimensions of matrix factorization models etc. work well with a uniform random distribution.
* Hyperparameters like the learning rate and the l1 and l2 regularization terms work better with a log-uniform distribution, as the training tends to be affected by the order of magnitude rather than the absolute value. You need to be able to explore a range that may go from 10^-9 to 10^-3, but it is less important to choose the specific value.
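
To see why the distribution matters, the sketch below draws samples between 10^-9 and 10^-3 in both ways and counts how many fall below 10^-6. It uses only the standard library; the helper `fraction_below` is hypothetical. With Optuna, the log-uniform behaviour is requested with `suggest_float(..., log=True)`, as done in the SVD++ example later in this notebook.

```python
import random

rng = random.Random(0)
n_samples = 10_000

# Uniform: almost every sample has an order of magnitude of 10^-4 or 10^-3
uniform_samples = [rng.uniform(1e-9, 1e-3) for _ in range(n_samples)]

# Log-uniform: sample the exponent uniformly, then exponentiate
log_uniform_samples = [10 ** rng.uniform(-9, -3) for _ in range(n_samples)]

def fraction_below(samples, threshold):
    """Fraction of samples strictly below the threshold (hypothetical helper)."""
    return sum(s < threshold for s in samples) / len(samples)

print("uniform:    ", fraction_below(uniform_samples, 1e-6))
print("log-uniform:", fraction_below(log_uniform_samples, 1e-6))
```

With the uniform distribution only about 0.1% of the samples are expected below 10^-6, so the lower three orders of magnitude are almost never explored, while with the log-uniform one about half of the samples fall there.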

### In the course repository you will find a script called run_hyperparameter_search that contains a list of commonly used hyperparameters with the corresponding range and distribution.

### Hint: If you see that the optimization yields a hyperparameter that has a value at either end of the range (min or max) this may indicate that in that scenario you need to expand the search space.
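
This check is simple to automate. The helper `is_near_boundary` below is a hypothetical sketch, and both the 5% tolerance and the example values are arbitrary choices made for illustration.

```python
def is_near_boundary(value, low, high, tolerance=0.05):
    """Return True if value lies within `tolerance` (a fraction of the range) of either end."""
    margin = (high - low) * tolerance
    return value <= low + margin or value >= high - margin

# Illustrative search space and best configuration (not results of a real run)
search_space = {"topK": (5, 1000), "shrink": (0, 1000)}
best_params = {"topK": 990, "shrink": 489}

for name, (low, high) in search_space.items():
    if is_near_boundary(best_params[name], low, high):
        print(f"{name}: best value {best_params[name]} is close to the edge "
              f"of [{low}, {high}], consider widening the range")
```

For hyperparameters sampled with a log-uniform distribution the same check would be better done on the logarithm of the values.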



## Early-stopping

A special hyperparameter is the number of epochs you should train a machine learning model for.
It is possible to treat this number as any other hyperparameter, but that is not a very effective strategy. A common alternative is *early-stopping*. It works as follows:
* Select a maximum number of epochs, say 500
* Train the model for a certain number of epochs, say 5
* Evaluate the recommendation quality of the model on the validation data. Create a clone of the model.
* Continue training and evaluate the model periodically. Every time you find a better recommendation quality update the model clone, which will then represent the "best" model you have.
* If the recommendation quality does not improve for a certain number of consecutive validation steps, say 5, stop the training. If you reach the maximum number of epochs, stop the training.
* Use the "best" model clone to generate the recommendations on the validation data.
* When you train the model on the union of training and validation data, use the optimal number of epochs selected by the early-stopping at the previous step.

Early-stopping usually saves a lot of computational time and fine-tunes the optimal number of epochs, unless the validation step takes a very long time. In those cases, it can be better to either apply early-stopping to the loss function of the algorithm (which is however a different problem than recommendation, so the model with the best loss is not guaranteed to be the one with the best recommendation quality) or just select the number of epochs as any other hyperparameter.
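
The procedure can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the course framework: `train_with_early_stopping`, `train_one_epoch` and `evaluate` are hypothetical names, and the toy "model" is just a counter whose validation quality peaks after 40 epochs.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              max_epochs=500, validation_every_n=5,
                              lower_validations_allowed=5):
    """Train until the validation quality stops improving; return the best clone."""
    best_quality, best_model, best_epoch = None, None, 0
    validations_without_improvement = 0

    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)

        # Only evaluate periodically, as validation can be expensive
        if epoch % validation_every_n != 0:
            continue

        quality = evaluate(model)
        if best_quality is None or quality > best_quality:
            # New best result: update the clone and reset the patience counter
            best_quality, best_epoch = quality, epoch
            best_model = copy.deepcopy(model)
            validations_without_improvement = 0
        else:
            validations_without_improvement += 1
            if validations_without_improvement >= lower_validations_allowed:
                break

    return best_model, best_epoch, best_quality

# Toy model: a counter of the epochs trained, with quality peaking at 40 epochs
def train_one_epoch(model):
    model["epochs_trained"] += 1

def evaluate(model):
    return -(model["epochs_trained"] - 40) ** 2

model = {"epochs_trained": 0}
best_model, best_epoch, best_quality = train_with_early_stopping(
    model, train_one_epoch, evaluate)

print(best_epoch, model["epochs_trained"])
```

In this toy run the training stops well before the maximum of 500 epochs, and `best_epoch` is the value you would reuse when fitting on the union of training and validation data.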


In [1]:
from Data_manager.split_functions.split_train_validation_random_holdout import split_train_in_two_percentage_global_sample
from Data_manager.Movielens.Movielens1MReader import Movielens1MReader

data_reader = Movielens1MReader()
data_loaded = data_reader.load_data()

URM_all = data_loaded.get_URM_all()
ICM_all = data_loaded.get_ICM_from_name("ICM_genres")

Movielens1M: Verifying data consistency...
Movielens1M: Verifying data consistency... Passed!
DataReader: current dataset is: Movielens1M
	Number of items: 3883
	Number of users: 6040
	Number of interactions in URM_all: 1000209
	Value range in URM_all: 1.00-5.00
	Interaction density: 4.26E-02
	Interactions per user:
		 Min: 2.00E+01
		 Avg: 1.66E+02
		 Max: 2.31E+03
	Interactions per item:
		 Min: 0.00E+00
		 Avg: 2.58E+02
		 Max: 3.43E+03
	Gini Index: 0.53

	ICM name: ICM_genres, Value range: 1.00 / 1.00, Num features: 18, feature occurrences: 6408, density 9.17E-02
	ICM name: ICM_year, Value range: 1.92E+03 / 2.00E+03, Num features: 1, feature occurrences: 3883, density 1.00E+00


	UCM name: UCM_all, Value range: 1.00 / 1.00, Num features: 3469, feature occurrences: 24160, density 1.15E-03


### How do we perform hyperparameter optimization?
* Split the data into three *disjoint* sets: training, validation and testing data
* Define a set of hyperparameters with the range and distribution
* Explore hyperparameter space and select those with the best recommendation quality on the *validation* data (including the number of epochs for ML algorithms)
* Given the best hyperparameters, fit the model again using the union of training and validation data.
* Evaluate this last model on the testing data.

### Step 1: Split the data and create the evaluator objects

In [2]:
from Evaluation.Evaluator import EvaluatorHoldout

URM_train_validation, URM_test = split_train_in_two_percentage_global_sample(URM_all, train_percentage = 0.8)
URM_train, URM_validation = split_train_in_two_percentage_global_sample(URM_train_validation, train_percentage = 0.8)

evaluator_validation = EvaluatorHoldout(URM_validation, cutoff_list=[10])
evaluator_test = EvaluatorHoldout(URM_test, cutoff_list=[10])

EvaluatorHoldout: Ignoring 16 ( 0.3%) Users that have less than 1 test interactions
EvaluatorHoldout: Ignoring 3 ( 0.0%) Users that have less than 1 test interactions


### Step 2: Define the objective function that will be run by the optimizer

The objective function should:
* Use the optuna trial instance to acquire the hyperparameter values required
* Fit the model
* Evaluate on the validation and return the result of a desired metric

In [3]:
import optuna
import pandas as pd
from Recommenders.KNN.ItemKNNCFRecommender import ItemKNNCFRecommender

def objective_function(optuna_trial):
    
    recommender_instance = ItemKNNCFRecommender(URM_train)
    recommender_instance.fit(topK = optuna_trial.suggest_int("topK", 5, 1000),
                             shrink = optuna_trial.suggest_int("shrink", 0, 1000),
                             similarity = "cosine",
                             normalize = optuna_trial.suggest_categorical("normalize", [True, False])
                            )
    
    result_df, _ = evaluator_validation.evaluateRecommender(recommender_instance)
    
    return result_df.loc[10]["MAP"]

### Step 3: Create an object/function that will be called after every call of the objective function to log the results in a dataframe for easier readability

In [4]:
class SaveResults(object):
    
    def __init__(self):
        self.results_df = pd.DataFrame()
    
    def __call__(self, optuna_study, optuna_trial):
        hyperparam_dict = optuna_trial.params.copy()
        hyperparam_dict["result"] = optuna_trial.values[0]
        
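        # DataFrame.append was removed in pandas 2.0, pd.concat works on all versions
        self.results_df = pd.concat([self.results_df, pd.DataFrame([hyperparam_dict])], ignore_index=True)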

### Step 4: Create an optuna Study and run

In [5]:
optuna_study = optuna.create_study(direction="maximize")
        
save_results = SaveResults()
        
optuna_study.optimize(objective_function,
                      callbacks=[save_results],
                      n_trials = 10)

[I 2023-11-22 22:24:05,453] A new study created in memory with name: no-name-40ec47b9-9c95-4bea-ae89-86d4d5a7ab50


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4174.47 column/sec. Elapsed time 0.93 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.82 sec. Users per second: 1035


[I 2023-11-22 22:24:12,354] Trial 0 finished with value: 0.07160745905267855 and parameters: {'topK': 431, 'shrink': 582, 'normalize': False}. Best is trial 0 with value: 0.07160745905267855.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 3755.17 column/sec. Elapsed time 1.03 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.76 sec. Users per second: 1045


[I 2023-11-22 22:24:19,274] Trial 1 finished with value: 0.07204786673412213 and parameters: {'topK': 372, 'shrink': 28, 'normalize': False}. Best is trial 1 with value: 0.07204786673412213.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4052.44 column/sec. Elapsed time 0.96 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.62 sec. Users per second: 1072


[I 2023-11-22 22:24:25,973] Trial 2 finished with value: 0.07099528473407989 and parameters: {'topK': 478, 'shrink': 26, 'normalize': False}. Best is trial 1 with value: 0.07204786673412213.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4601.66 column/sec. Elapsed time 0.84 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 6.22 sec. Users per second: 969


[I 2023-11-22 22:24:33,215] Trial 3 finished with value: 0.09818792949893541 and parameters: {'topK': 875, 'shrink': 597, 'normalize': True}. Best is trial 3 with value: 0.09818792949893541.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4603.63 column/sec. Elapsed time 0.84 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 4.86 sec. Users per second: 1241


[I 2023-11-22 22:24:39,019] Trial 4 finished with value: 0.07210546702080602 and parameters: {'topK': 366, 'shrink': 306, 'normalize': False}. Best is trial 3 with value: 0.09818792949893541.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4781.91 column/sec. Elapsed time 0.81 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.77 sec. Users per second: 1044


[I 2023-11-22 22:24:45,761] Trial 5 finished with value: 0.06986371972427771 and parameters: {'topK': 739, 'shrink': 585, 'normalize': False}. Best is trial 3 with value: 0.09818792949893541.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4812.19 column/sec. Elapsed time 0.81 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.86 sec. Users per second: 1028


[I 2023-11-22 22:24:52,605] Trial 6 finished with value: 0.09804362576150451 and parameters: {'topK': 852, 'shrink': 913, 'normalize': True}. Best is trial 3 with value: 0.09818792949893541.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4896.62 column/sec. Elapsed time 0.79 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 3.59 sec. Users per second: 1677


[I 2023-11-22 22:24:57,058] Trial 7 finished with value: 0.07484855551339645 and parameters: {'topK': 23, 'shrink': 869, 'normalize': False}. Best is trial 3 with value: 0.09818792949893541.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4858.90 column/sec. Elapsed time 0.80 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 4.10 sec. Users per second: 1471


[I 2023-11-22 22:25:02,034] Trial 8 finished with value: 0.10446944365395547 and parameters: {'topK': 142, 'shrink': 489, 'normalize': True}. Best is trial 8 with value: 0.10446944365395547.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4751.41 column/sec. Elapsed time 0.82 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 4.87 sec. Users per second: 1236


[I 2023-11-22 22:25:07,830] Trial 9 finished with value: 0.07245227423638793 and parameters: {'topK': 343, 'shrink': 828, 'normalize': False}. Best is trial 8 with value: 0.10446944365395547.


### Step 5: Given the best hyperparameters, train on the union of URM_train and URM_validation, then evaluate on the test data

In [6]:
pruned_trials = [t for t in optuna_study.trials if t.state == optuna.trial.TrialState.PRUNED]
complete_trials = [t for t in optuna_study.trials if t.state == optuna.trial.TrialState.COMPLETE]

print("Study statistics: ")
print("  Number of finished trials: ", len(optuna_study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
print("  Value Validation: ", optuna_study.best_trial.value)


Study statistics: 
  Number of finished trials:  10
  Number of pruned trials:  0
  Number of complete trials:  10
Best trial:
  Value Validation:  0.10446944365395547


In [7]:
optuna_study.best_trial

FrozenTrial(number=8, state=TrialState.COMPLETE, values=[0.10446944365395547], datetime_start=datetime.datetime(2023, 11, 22, 22, 24, 57, 63501), datetime_complete=datetime.datetime(2023, 11, 22, 22, 25, 2, 33826), params={'topK': 142, 'shrink': 489, 'normalize': True}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'topK': IntDistribution(high=1000, log=False, low=5, step=1), 'shrink': IntDistribution(high=1000, log=False, low=0, step=1), 'normalize': CategoricalDistribution(choices=(True, False))}, trial_id=8, value=None)

In [8]:
optuna_study.best_trial.params

{'topK': 142, 'shrink': 489, 'normalize': True}

In [9]:
save_results.results_df

,normalize,result,shrink,topK
0,0.0,0.071607,582.0,431.0
1,0.0,0.072048,28.0,372.0
2,0.0,0.070995,26.0,478.0
3,1.0,0.098188,597.0,875.0
4,0.0,0.072105,306.0,366.0
5,0.0,0.069864,585.0,739.0
6,1.0,0.098044,913.0,852.0
7,0.0,0.074849,869.0,23.0
8,1.0,0.104469,489.0,142.0
9,0.0,0.072452,828.0,343.0


In [10]:
recommender_instance = ItemKNNCFRecommender(URM_train + URM_validation)
recommender_instance.fit(**optuna_study.best_trial.params)

result_df, _ = evaluator_test.evaluateRecommender(recommender_instance)
result_df

ItemKNNCFRecommender: URM Detected 209 ( 5.4%) items with no interactions.
Similarity column 3883 (100.0%), 3501.20 column/sec. Elapsed time 1.11 sec
EvaluatorHoldout: Processed 6037 (100.0%) in 4.39 sec. Users per second: 1374


cutoff,PRECISION,PRECISION_RECALL_MIN_DEN,RECALL,MAP,MAP_MIN_DEN,MRR,NDCG,F1,HIT_RATE,ARHR_ALL_HITS,...,COVERAGE_USER,COVERAGE_USER_HIT,USERS_IN_GT,DIVERSITY_GINI,SHANNON_ENTROPY,RATIO_DIVERSITY_HERFINDAHL,RATIO_DIVERSITY_GINI,RATIO_SHANNON_ENTROPY,RATIO_AVERAGE_POPULARITY,RATIO_NOVELTY
10,0.332135,0.352922,0.143949,0.238688,0.248658,0.603195,0.297081,0.200849,0.875435,1.136934,...,0.999503,0.875,0.999503,0.057569,8.161886,0.994893,0.164759,0.754885,1.716156,0.065786


### Optuna supports various tools such as a database to store the results which can be used for nice visualizations, try them!


Optuna also supports conditional hyperparameters, i.e., hyperparameters that are required only for certain values of another hyperparameter. In the following function several similarity heuristics are optimized, each with its own additional hyperparameters.

In [11]:
def objective_function_KNN_similarities(optuna_trial):
    
    recommender_instance = ItemKNNCFRecommender(URM_train)
    similarity = optuna_trial.suggest_categorical("similarity", ['cosine', 'dice', 'jaccard', 'asymmetric', 'tversky', 'euclidean'])
    
    full_hyperp = {"similarity": similarity,
                   "topK": optuna_trial.suggest_int("topK", 5, 1000),
                   "shrink": optuna_trial.suggest_int("shrink", 0, 1000),
                  }
    
    if similarity == "asymmetric":
        full_hyperp["asymmetric_alpha"] = optuna_trial.suggest_float("asymmetric_alpha", 0, 2, log=False)
        full_hyperp["normalize"] = True     

    elif similarity == "tversky":
        full_hyperp["tversky_alpha"] = optuna_trial.suggest_float("tversky_alpha", 0, 2, log=False)
        full_hyperp["tversky_beta"] = optuna_trial.suggest_float("tversky_beta", 0, 2, log=False)
        full_hyperp["normalize"] = True 

    elif similarity == "euclidean":
        full_hyperp["normalize_avg_row"] = optuna_trial.suggest_categorical("normalize_avg_row", [True, False])
        full_hyperp["similarity_from_distance_mode"] = optuna_trial.suggest_categorical("similarity_from_distance_mode", ["lin", "log", "exp"])
        full_hyperp["normalize"] = optuna_trial.suggest_categorical("normalize", [True, False])
        
    
    recommender_instance.fit(**full_hyperp)
    
    result_df, _ = evaluator_validation.evaluateRecommender(recommender_instance)
    
    return result_df.loc[10]["MAP"]

In [12]:
optuna_study = optuna.create_study(direction="maximize")
        
save_results = SaveResults()
        
optuna_study.optimize(objective_function_KNN_similarities,
                      callbacks=[save_results],
                      n_trials = 10)

[I 2023-11-22 22:25:13,558] A new study created in memory with name: no-name-63611310-da5b-4cb9-a1c4-a160b0dfd3f7


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4890.52 column/sec. Elapsed time 0.79 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 3.31 sec. Users per second: 1818


[I 2023-11-22 22:25:17,771] Trial 0 finished with value: 0.09518016558106211 and parameters: {'similarity': 'tversky', 'topK': 22, 'shrink': 200, 'tversky_alpha': 1.110833799073885, 'tversky_beta': 1.6059773085428688}. Best is trial 0 with value: 0.09518016558106211.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4503.62 column/sec. Elapsed time 0.86 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 6.09 sec. Users per second: 988


[I 2023-11-22 22:25:24,933] Trial 1 finished with value: 0.09481197564240376 and parameters: {'similarity': 'dice', 'topK': 761, 'shrink': 203}. Best is trial 0 with value: 0.09518016558106211.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4166.03 column/sec. Elapsed time 0.93 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.75 sec. Users per second: 1048


[I 2023-11-22 22:25:31,814] Trial 2 finished with value: 0.09307549563650164 and parameters: {'similarity': 'tversky', 'topK': 589, 'shrink': 474, 'tversky_alpha': 0.7055294001664003, 'tversky_beta': 1.1046547106671623}. Best is trial 0 with value: 0.09518016558106211.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.


  item_similarity = 1/(np.exp(item_distance) + self.shrink + 1e-9)


Similarity column 3883 (100.0%), 1240.67 column/sec. Elapsed time 3.13 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 3.73 sec. Users per second: 1614


[I 2023-11-22 22:25:38,728] Trial 3 finished with value: 0.08062787869685295 and parameters: {'similarity': 'euclidean', 'topK': 115, 'shrink': 789, 'normalize_avg_row': False, 'similarity_from_distance_mode': 'exp', 'normalize': True}. Best is trial 0 with value: 0.09518016558106211.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4306.40 column/sec. Elapsed time 0.90 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 5.82 sec. Users per second: 1035


[I 2023-11-22 22:25:45,676] Trial 4 finished with value: 0.0783178463395102 and parameters: {'similarity': 'tversky', 'topK': 815, 'shrink': 717, 'tversky_alpha': 0.124237137867107, 'tversky_beta': 1.10397433841753}. Best is trial 0 with value: 0.09518016558106211.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 3639.97 column/sec. Elapsed time 1.07 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 6.13 sec. Users per second: 983


[I 2023-11-22 22:25:53,127] Trial 5 finished with value: 0.092917397868842 and parameters: {'similarity': 'dice', 'topK': 963, 'shrink': 410}. Best is trial 0 with value: 0.09518016558106211.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4891.65 column/sec. Elapsed time 0.79 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 6.41 sec. Users per second: 940


[I 2023-11-22 22:26:00,512] Trial 6 finished with value: 0.09821019493454777 and parameters: {'similarity': 'cosine', 'topK': 957, 'shrink': 485}. Best is trial 6 with value: 0.09821019493454777.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 494.75 column/sec. Elapsed time 7.85 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 4.44 sec. Users per second: 1356


[I 2023-11-22 22:26:12,941] Trial 7 finished with value: 0.05769171330550828 and parameters: {'similarity': 'euclidean', 'topK': 933, 'shrink': 816, 'normalize_avg_row': True, 'similarity_from_distance_mode': 'exp', 'normalize': True}. Best is trial 6 with value: 0.09821019493454777.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4850.95 column/sec. Elapsed time 0.80 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 3.73 sec. Users per second: 1613


[I 2023-11-22 22:26:17,549] Trial 8 finished with value: 0.1028505883871917 and parameters: {'similarity': 'cosine', 'topK': 104, 'shrink': 170}. Best is trial 8 with value: 0.1028505883871917.


ItemKNNCFRecommender: URM Detected 230 ( 5.9%) items with no interactions.
Similarity column 3883 (100.0%), 4953.89 column/sec. Elapsed time 0.78 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 4.04 sec. Users per second: 1493


[I 2023-11-22 22:26:22,448] Trial 9 finished with value: 0.10452612170366156 and parameters: {'similarity': 'cosine', 'topK': 213, 'shrink': 179}. Best is trial 9 with value: 0.10452612170366156.


In [13]:
save_results.results_df

,result,shrink,similarity,topK,tversky_alpha,tversky_beta,normalize,normalize_avg_row,similarity_from_distance_mode
0,0.09518,200.0,tversky,22.0,1.110834,1.605977,,,
1,0.094812,203.0,dice,761.0,,,,,
2,0.093075,474.0,tversky,589.0,0.705529,1.104655,,,
3,0.080628,789.0,euclidean,115.0,,,1.0,0.0,exp
4,0.078318,717.0,tversky,815.0,0.124237,1.103974,,,
5,0.092917,410.0,dice,963.0,,,,,
6,0.09821,485.0,cosine,957.0,,,,,
7,0.057692,816.0,euclidean,933.0,,,1.0,1.0,exp
8,0.102851,170.0,cosine,104.0,,,,,
9,0.104526,179.0,cosine,213.0,,,,,


In [14]:
optuna_study.best_trial.params

{'similarity': 'cosine', 'topK': 213, 'shrink': 179}

### An example with early-stopping, for SVD++

When training machine learning models that require iterative training for a certain number of epochs one also has to select this value. There are two strategies:
* Consider the number of epochs as another hyperparameter to be determined by our optimizer
* Fix a maximum number of epochs and periodically evaluate the model during the training process. If the recommendation quality does not improve after a certain number of epochs (20, 50 ...), stop the training and select the number of epochs that gave the best result. This is called early-stopping.

In [44]:
from Recommenders.MatrixFactorization.Cython.MatrixFactorization_Cython import MatrixFactorization_SVDpp_Cython

class SaveResults(object):
    
    def __init__(self):
        self.results_df = pd.DataFrame()
    
    def __call__(self, optuna_study, optuna_trial):
        hyperparam_dict = optuna_trial.params.copy()
        hyperparam_dict["result"] = optuna_trial.values[0]
        
        # Retrieve the optimal number of epochs from the "user attributes" of the trial
        hyperparam_dict["epochs"] = optuna_trial.user_attrs["epochs"]
        
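        # DataFrame.append was removed in pandas 2.0, pd.concat works on all versions
        self.results_df = pd.concat([self.results_df, pd.DataFrame([hyperparam_dict])], ignore_index=True)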
        
        
def objective_function_funksvd(optuna_trial):

    # Earlystopping hyperparameters available in the framework
    full_hyperp = {"validation_every_n": 5,
                   "stop_on_validation": True,
                   "evaluator_object": evaluator_validation,
                   "lower_validations_allowed": 5,   # Higher values will result in a more "patient" earlystopping
                   "validation_metric": "MAP",
                   
                   # MAX number of epochs (usually 500)
                   "epochs": 500,
                  }
                          
        
    recommender_instance = MatrixFactorization_SVDpp_Cython(URM_train)
    recommender_instance.fit(num_factors = optuna_trial.suggest_int("num_factors", 1, 200),
                             sgd_mode = optuna_trial.suggest_categorical("sgd_mode", ["sgd", "adagrad", "adam"]),
                             batch_size = optuna_trial.suggest_categorical("batch_size", [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]),
                             item_reg = optuna_trial.suggest_float("item_reg", 1e-5, 1e-2, log=True),
                             user_reg =  optuna_trial.suggest_float("user_reg", 1e-5, 1e-2, log=True),
                             learning_rate = optuna_trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True),
                             ** full_hyperp)
    
    # Add the number of epochs selected by earlystopping as a "user attribute" of the optuna trial
    epochs = recommender_instance.get_early_stopping_final_epochs_dict()["epochs"]
    optuna_trial.set_user_attr("epochs", epochs) 
        
    result_df, _ = evaluator_validation.evaluateRecommender(recommender_instance)
    
    return result_df.loc[10]["MAP"]

In [43]:
optuna_study = optuna.create_study(direction="maximize")
        
save_results = SaveResults()
        
optuna_study.optimize(objective_function_funksvd,
                      callbacks=[save_results],
                      n_trials = 10)

[I 2023-11-22 22:48:09,767] A new study created in memory with name: no-name-4145021a-2bd2-4dac-8650-6506bbf820bf


MatrixFactorization_SVDpp_Cython_Recommender: URM Detected 230 ( 5.9%) items with no interactions.
SVD++: Processed 640256 (100.0%) in 16.88 sec. MSE loss 1.45E+01. Sample per second: 37922
SVD++: Epoch 1 of 2. Elapsed time 16.04 sec
SVD++: Processed 640256 (100.0%) in 16.96 sec. MSE loss 1.41E+01. Sample per second: 37749
SVD++: Epoch 2 of 2. Elapsed time 32.12 sec
SVD++: Terminating at epoch 2. Elapsed time 32.12 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 3.33 sec. Users per second: 1808


[I 2023-11-22 22:48:45,308] Trial 0 finished with value: 0.002186887371150318 and parameters: {'num_factors': 147, 'sgd_mode': 'adagrad', 'batch_size': 128, 'item_reg': 7.29063316872942e-05, 'user_reg': 0.0005238842020210994, 'learning_rate': 0.00045976338958413803}. Best is trial 0 with value: 0.002186887371150318.


MatrixFactorization_SVDpp_Cython_Recommender: URM Detected 230 ( 5.9%) items with no interactions.
SVD++: Processed 640256 (100.0%) in 2.63 sec. MSE loss 1.63E+00. Sample per second: 243803
SVD++: Epoch 1 of 2. Elapsed time 2.28 sec
SVD++: Processed 640256 (100.0%) in 2.93 sec. MSE loss 7.43E-01. Sample per second: 218731
SVD++: Epoch 2 of 2. Elapsed time 4.58 sec
SVD++: Terminating at epoch 2. Elapsed time 4.58 sec
EvaluatorHoldout: Processed 6024 (100.0%) in 3.22 sec. Users per second: 1870


[I 2023-11-22 22:48:53,163] Trial 1 finished with value: 0.002453150361516894 and parameters: {'num_factors': 19, 'sgd_mode': 'adagrad', 'batch_size': 128, 'item_reg': 0.0005867918689560966, 'user_reg': 0.0012967358663248814, 'learning_rate': 0.043988020410102274}. Best is trial 1 with value: 0.002453150361516894.


In [45]:
optuna_study.best_trial.params

{'num_factors': 19,
 'sgd_mode': 'adagrad',
 'batch_size': 128,
 'item_reg': 0.0005867918689560966,
 'user_reg': 0.0012967358663248814,
 'learning_rate': 0.043988020410102274}

In [46]:
optuna_study.best_trial

FrozenTrial(number=1, state=TrialState.COMPLETE, values=[0.002453150361516894], datetime_start=datetime.datetime(2023, 11, 22, 22, 48, 45, 315537), datetime_complete=datetime.datetime(2023, 11, 22, 22, 48, 53, 162309), params={'num_factors': 19, 'sgd_mode': 'adagrad', 'batch_size': 128, 'item_reg': 0.0005867918689560966, 'user_reg': 0.0012967358663248814, 'learning_rate': 0.043988020410102274}, user_attrs={'epochs': 0}, system_attrs={}, intermediate_values={}, distributions={'num_factors': IntDistribution(high=200, log=False, low=1, step=1), 'sgd_mode': CategoricalDistribution(choices=('sgd', 'adagrad', 'adam')), 'batch_size': CategoricalDistribution(choices=(1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)), 'item_reg': FloatDistribution(high=0.01, log=True, low=1e-05, step=None), 'user_reg': FloatDistribution(high=0.01, log=True, low=1e-05, step=None), 'learning_rate': FloatDistribution(high=0.1, log=True, low=0.0001, step=None)}, trial_id=1, value=None)

In [47]:
save_results.results_df

,batch_size,epochs,item_reg,learning_rate,num_factors,result,sgd_mode,user_reg
0,128.0,0.0,7.3e-05,0.00046,147.0,0.002187,adagrad,0.000524
1,128.0,0.0,0.000587,0.043988,19.0,0.002453,adagrad,0.001297
