# Demo
In this demo, we show how easy it is to select the best-suited model for different scenarios with LensKit-Auto.
A LensKit developer only has to call a single function to select, tune, and ensemble LensKit algorithms.

The demo is divided into two parts:

1. In the first part, we select and tune a Top-N recommender from all of LensKit's
algorithms on the Movielens 100k dataset.
2. In the second part, we tune and ensemble the BiasedMatrixFactorization predictor on the Movielens 100k
dataset.

## Loading Data
First, we load the Movielens 100k dataset into a pandas DataFrame.

```python
from lenskit.datasets import ML100K

# read in the Movielens 100k dataset as a pandas dataframe
ml100k = ML100K('../data/ml-100k')
ratings = ml100k.ratings
ratings.head()
```

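|   | user | item | rating | timestamp |
|---|------|------|--------|-----------|
| 0 | 196  | 242  | 3.0    | 881250949 |
| 1 | 186  | 302  | 3.0    | 891717742 |
| 2 | 22   | 377  | 1.0    | 878887116 |
| 3 | 244  | 51   | 2.0    | 880606923 |
| 4 | 166  | 346  | 1.0    | 886397596 |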


## Splitting Data
For this demo, we use a row-based holdout split: 25% of the rows go into the test set and the remaining 75% into the
training set. A holdout split is not ideal; in a real experiment, we would rather use a cross-fold split. For this
demo, however, a holdout split keeps the code simple, since we do not have to compute the mean error over all folds.

```python
# perform holdout validation split
test = ratings.sample(frac=0.25, random_state=42)
train = ratings.drop(test.index)
```
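
For comparison, the cross-fold alternative partitions the rows into k folds, uses each fold once as the test set, and averages the error over all folds. LensKit's own `lenskit.crossfold` module provides such splits (for example `partition_rows` and `partition_users`); the plain-Python sketch below only illustrates the idea on row indices, and the helper names `kfold_row_splits` and `mean_error_over_folds` are hypothetical.

```python
import random

def kfold_row_splits(n_rows, k, seed=42):
    """Yield (train_idx, test_idx) pairs so that every row lands in exactly one test fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # deal the shuffled row indices round-robin into k folds of near-equal size
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

def mean_error_over_folds(n_rows, k, evaluate, seed=42):
    """Average the error returned by evaluate(train_idx, test_idx) over all k folds."""
    errors = [evaluate(train_idx, test_idx) for train_idx, test_idx in kfold_row_splits(n_rows, k, seed)]
    return sum(errors) / len(errors)

# toy usage: 100 rows, 5 folds; the "error" here is just the test-fold fraction
splits = list(kfold_row_splits(100, 5))
print([len(test_idx) for _, test_idx in splits])  # [20, 20, 20, 20, 20]
print(mean_error_over_folds(100, 5, lambda train_idx, test_idx: len(test_idx) / 100))
```

The extra loop over folds and the final averaging are exactly the bookkeeping that the holdout split lets this demo skip.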

## 1. Select and Tune a Recommender Model From LensKit
In the first part of our experiment, we want to find the best-performing recommender model on the Movielens 100k
dataset. This model should be selected from all of LensKit's algorithms and tuned on the *NDCG@10* metric.

A LensKit developer simply calls the *get_best_recommender_model()* function to select and optimize a LensKit
model.

Note: To keep the demo quick to run, we reduced the search time from one hour to two minutes. Two minutes are
enough to demonstrate how LensKit-Auto works; in a real use case, we would allow more time for the
optimization process.

```python
from lkauto.lkauto import get_best_recommender_model

# call get_best_recommender_model to automatically select and tune the best-performing LensKit algorithm
optimized_model, configuration = get_best_recommender_model(train=train, time_limit_in_sec=120)
```

```
2023-02-22 09:10:41,441 INFO ---Starting LensKit-Auto---
2023-02-22 09:10:41,442 INFO 	 optimization_time: 		 120 seconds
2023-02-22 09:10:41,442 INFO 	 num_evaluations: 			 500
2023-02-22 09:10:41,442 INFO 	 optimization_metric: 		 ndcg@10
2023-02-22 09:10:41,442 INFO 	 optimization_strategie: 	 bayesian
2023-02-22 09:10:41,443 INFO --Start Preprocessing--
2023-02-22 09:10:41,445 INFO --End Preprocessing--
2023-02-22 09:10:41,445 INFO --Start Bayesian Optimization--
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Numba is using threading layer omp - consider TBB
BLAS using multiple threads - can cause oversubscription
found 2 potential runtime problems - see https://boi.st/lkpy-perf
2023-02-22 09:10:52,835 INFO Run ID: 1 | ItemItem | ndcg@10: 0.17453138349348526
2023-02-22 09:10:56,469 INFO Run ID: 2 | UserUser | ndcg@10: 0.1644463000108411
2023-02-22 09:11:18,025 INFO Run ID: 3 | FunkSVD | ndcg@10: 0.00011664492811850631
2023-02-22 09:
```
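
The log lists two budgets, a time limit (`optimization_time`) and a number of evaluations (`num_evaluations`), and one `Run ID` line per evaluated configuration. A search loop that respects both budgets can be sketched as follows. LensKit-Auto reports a Bayesian optimization strategy; this sketch swaps in plain random search to stay short, and `time_budgeted_search`, the toy search space, and the toy objective are hypothetical stand-ins, not LensKit-Auto's API.

```python
import random
import time

def time_budgeted_search(sample_config, evaluate, time_limit_in_sec, max_evals, seed=42):
    """Random search: evaluate sampled configurations until either budget runs out.

    Higher scores are better (as for NDCG@10). Returns (best_config, best_score, n_evals).
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit_in_sec
    best_config, best_score, n_evals = None, float("-inf"), 0
    while n_evals < max_evals and time.monotonic() < deadline:
        config = sample_config(rng)
        score = evaluate(config)
        n_evals += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, n_evals

# hypothetical two-algorithm space with a single numeric hyperparameter "x"
def sample_config(rng):
    return {"algo": rng.choice(["ItemItem", "UserUser"]), "x": rng.uniform(0.0, 1.0)}

# toy objective standing in for NDCG@10 on a validation split
def evaluate(config):
    penalty = 0.05 if config["algo"] == "UserUser" else 0.0
    return 1.0 - (config["x"] - 0.3) ** 2 - penalty

best_config, best_score, n_evals = time_budgeted_search(
    sample_config, evaluate, time_limit_in_sec=5, max_evals=50)
print(n_evals, best_config)
```

Because the algorithm name is just another sampled value, model selection and hyperparameter tuning happen in the same loop.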

After this step, the LensKit developer can use the optimized model like any other LensKit model.
The following lines are adapted from the *Running the Evaluation* part of the
[LensKit Getting Started Chapter](https://lkpy.readthedocs.io/en/stable/GettingStarted.html).

First, we wrap our optimized model in a LensKit Recommender. Then, we fit it on the training set and generate
10 recommendations for every user in the test set.

```python
from lenskit import batch, topn, util
from lenskit.algorithms import Recommender

# initialize LensKit Recommender object
fittable = Recommender.adapt(optimized_model)
# fit the optimized model
fittable.fit(train)
users = test.user.unique()
# now we run the recommender
recs = batch.recommend(fittable, users, 10)
```
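
When the model is a plain rating predictor, `Recommender.adapt` wraps it into a top-N recommender that, by default, only ranks items the user has not rated in the training data. The core ranking step can be sketched in plain Python as below; `top_n_unrated` and the toy scores are hypothetical, and LensKit's actual implementation operates on pandas objects.

```python
import heapq

def top_n_unrated(scores, rated_items, n=10):
    """Return the n highest-scoring items that the user has not rated yet."""
    candidates = ((score, item) for item, score in scores.items() if item not in rated_items)
    return [item for _, item in heapq.nlargest(n, candidates)]

# hypothetical predicted scores for one user; item 1 is already in the user's training data
scores = {1: 4.9, 2: 3.5, 3: 4.2, 4: 2.0}
print(top_n_unrated(scores, rated_items={1}, n=2))  # [3, 2]
```

`batch.recommend` repeats this per user, which is why every list in the evaluation below contains 10 items.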

In the last step of the evaluation, we use the LensKit Top-N RecListAnalysis object to compute the *NDCG@10* metric
for every user.

```python
# initialize the RecListAnalysis object for computing NDCG@10
rla = topn.RecListAnalysis()
# add the NDCG metric to the analysis
rla.add_metric(topn.ndcg)
# compute NDCG for each user's top-10 list
results = rla.compute(recs, test)
# show the NDCG scores per user
results.head()
```

| user | nrecs | ndcg     |
|------|-------|----------|
| 877  | 10    | 0.147429 |
| 815  | 10    | 0.11737  |
| 94   | 10    | 0.220307 |
| 416  | 10    | 0.238576 |
| 500  | 10    | 0.177129 |
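
Each `ndcg` value above is computed from one user's top-10 list. The sketch below shows binary-relevance NDCG@k with the common log2(rank + 1) discount, plus the average over users that is typically reported as a single score. It is an illustration under those assumptions: LensKit's `topn.ndcg` may differ in details such as the exact discount, and the helper names and toy user and item IDs are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with the log2(rank + 1) discount."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))

def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG@k: a recommended item counts if it is in the user's test data."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    ideal_dcg = dcg([1.0] * min(len(relevant), k))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def ndcg_per_user(recs, test_items, k=10):
    """recs: user -> ranked item list; test_items: user -> set of held-out items."""
    return {user: ndcg_at_k(items, test_items.get(user, set()), k) for user, items in recs.items()}

# toy data: user 1 hits two of three recommendations, user 2 hits none
recs = {1: [10, 20, 30], 2: [40, 50, 60]}
test_items = {1: {10, 30}, 2: {70}}
scores = ndcg_per_user(recs, test_items)
mean_ndcg = sum(scores.values()) / len(scores)
print(scores, mean_ndcg)
```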


## 2. Tune and Ensemble a Predictor Model From LensKit
In the second part of this demo, we tune and ensemble the *BiasedMatrixFactorization* algorithm with
LensKit-Auto. In contrast to the first part, we do not select an algorithm from all LensKit algorithms;
instead, we tune a single predictor algorithm on the *RMSE* metric. Furthermore, we ensemble the
best-performing models to boost performance.
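
One common way to ensemble the best models is greedy ensemble selection: starting from an empty ensemble, repeatedly add (with replacement) the model whose inclusion gives the lowest validation RMSE, then weight each model by how often it was picked. We have not confirmed that this is exactly what LensKit-Auto does; the function names, model names, and toy predictions below are hypothetical.

```python
import math

def rmse(predictions, truth):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth))

def greedy_ensemble_selection(model_preds, truth, ensemble_size):
    """Greedily pick models (with replacement) to minimize the RMSE of their averaged predictions.

    model_preds maps a model name to its predictions on a validation set.
    Returns the per-model weights and the validation RMSE of the final ensemble.
    """
    chosen = []
    running_sum = [0.0] * len(truth)
    best_err = float("inf")
    for _ in range(ensemble_size):
        best_name, best_err = None, float("inf")
        for name, preds in model_preds.items():
            # average of the already chosen models plus this candidate
            candidate = [(s + p) / (len(chosen) + 1) for s, p in zip(running_sum, preds)]
            err = rmse(candidate, truth)
            if err < best_err:
                best_name, best_err = name, err
        chosen.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, model_preds[best_name])]
    weights = {name: chosen.count(name) / len(chosen) for name in dict.fromkeys(chosen)}
    return weights, best_err

# toy validation data: each model alone has RMSE 1.0, but their errors cancel out
truth = [3.0, 4.0]
model_preds = {"mf_a": [2.0, 5.0], "mf_b": [4.0, 3.0]}
weights, err = greedy_ensemble_selection(model_preds, truth, ensemble_size=2)
print(weights, err)  # {'mf_a': 0.5, 'mf_b': 0.5} 0.0
```

The toy data shows why ensembling can help: two models with the same individual error can still reduce the error when their mistakes point in opposite directions.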

For this part of the demo, we need to create a configuration space that contains only the *BiasedMatrixFactorization*
algorithm.

Note: As in the first part, we reduced the search time from one hour to two minutes to keep the demo quick to run.
In a real use case, we would allow more time for the optimization process.

```python
from ConfigSpace import Constant
from lkauto.algorithms.als import BiasedMF

# initialize BiasedMF ConfigurationSpace
cs = BiasedMF.get_default_configspace()
# declare that the BiasedMF algorithm is the only algorithm contained in the configuration space
cs.add_hyperparameters([Constant("algo", "ALSBiasedMF")])
# set a random seed for reproducible results
cs.seed(42)
```
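
The `Constant("algo", "ALSBiasedMF")` line suggests that LensKit-Auto chooses the algorithm through a hyperparameter named `algo`, with each algorithm's own hyperparameters active only when that algorithm is selected. Under that assumption, fixing `algo` restricts the search to BiasedMF's hyperparameters. The plain-Python sketch below illustrates the idea; the search space, its ranges, and `sample_configuration` are hypothetical and not LensKit-Auto's defaults.

```python
import random

# hypothetical conditional search space: algorithm -> {hyperparameter: (low, high)}
SEARCH_SPACE = {
    "ALSBiasedMF": {"features": (5, 200), "iterations": (5, 50)},
    "ItemItem": {"nnbrs": (1, 100)},
}

def sample_configuration(space, rng, fixed_algo=None):
    """Sample one configuration; only the chosen algorithm's hyperparameters are active."""
    algo = fixed_algo if fixed_algo is not None else rng.choice(sorted(space))
    config = {"algo": algo}
    for name, (low, high) in space[algo].items():
        config[name] = rng.randint(low, high)
    return config

rng = random.Random(42)
# fixing "algo" plays the role of the Constant hyperparameter above
samples = [sample_configuration(SEARCH_SPACE, rng, fixed_algo="ALSBiasedMF") for _ in range(5)]
print(samples[0])
```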

After creating the configuration space for the *BiasedMatrixFactorization* algorithm, we call
*get_best_prediction_model()* to automatically tune and ensemble BiasedMatrixFactorization models.

```python
from lkauto.lkauto import get_best_prediction_model

# provide the BiasedMF ConfigurationSpace to the get_best_prediction_model function
optimized_model, configuration = get_best_prediction_model(train=train, cs=cs, time_limit_in_sec=120)
```

```
2023-02-22 09:02:05,537 INFO ---Starting LensKit-Auto---
2023-02-22 09:02:05,539 INFO 	 optimization_time: 		 120 seconds
2023-02-22 09:02:05,541 INFO 	 num_evaluations: 			 500
2023-02-22 09:02:05,541 INFO 	 optimization_metric: 		 rmse
2023-02-22 09:02:05,542 INFO 	 optimization_strategie: 	 bayesian
2023-02-22 09:02:05,543 INFO --Start Preprocessing--
2023-02-22 09:02:05,550 INFO --End Preprocessing--
2023-02-22 09:02:05,551 INFO --Start Bayesian Optimization--
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
Numba is using threading layer omp - consider TBB
BLAS using multiple threads - can cause oversubscription
found 2 potential runtime problems - see https://boi.st/lkpy-perf
2023-02-22 09:02:12,521 INFO Run ID: 1 | ALSBiasedMF | rmse: 0.9393121280796051
2023-02-22 09:02:19,757 INFO Run ID: 2 | ALSBiasedMF | rmse: 0.9631079867170781
2023-02-22 09:02:24,235 INFO Run ID: 3 | ALSBiasedMF | rmse: 0.951654284405548
2023-02-22 09:02:25,69
```

Once we have the optimized and ensembled *BiasedMatrixFactorization* models, we can use the ensemble like any other LensKit
predictor to get predictions.

```python
# fit the optimized model
optimized_model.fit(train)
# predict using the optimized model returned by LensKit-Auto
preds = optimized_model.predict(test)
```

In the last step of this demo, we calculate the *RMSE* value for our optimized model.
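
RMSE is the square root of the mean squared difference between predicted and true ratings over the test pairs. Writing $T$ for the set of (user, item) pairs in the test set, $r_{ui}$ for the true rating, and $\hat{r}_{ui}$ for the predicted rating (notation introduced here for illustration), the standard definition is:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left( \hat{r}_{ui} - r_{ui} \right)^2}
$$

Lower values are better, and the error is expressed in the same units as the ratings themselves.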

```python
from lenskit.metrics.predict import rmse

# print the RMSE value
print("RMSE: {}".format(rmse(predictions=preds, truth=test['rating'])))
```

```
RMSE: 0.9077549336005521
```
