# Model selection and prediction

In this notebook we perform model selection and make predictions.

# TOC

* [Confusion warning](#Confusion-warning)

## Loading the data

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt

In [None]:
import pickle
import numpy as np
import pandas as pd
import tensorflow as tf
from pathlib import Path
from sklearn import model_selection
from sklearn import metrics

In [None]:
# Set the random seeds
seed = 42
np.random.seed(seed)
tf.set_random_seed(seed)

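# Further possible fixes for non-determinism
# https://github.com/keras-team/keras/issues/2280#issuecomment-306959926
# https://github.com/keras-team/keras/issues/2280#issuecomment-366542480
import os
from keras import backend as k

# NOTE: PYTHONHASHSEED only affects hash randomization if it is set before the
# Python interpreter starts; setting it here only affects child processes
os.environ['PYTHONHASHSEED'] = '0'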
sess = tf.Session(graph=tf.get_default_graph())

# Limit operation to 1 thread for deterministic results.
# NOTE: This will slow down the operation
# session_conf = tf.ConfigProto(
#     intra_op_parallelism_threads=1,
#     inter_op_parallelism_threads=1)
# sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

k.set_session(sess)

In [None]:
generated_data = Path('.').absolute().joinpath('generated_data')

train = pd.read_hdf(generated_data.joinpath('train.hdf'), key='train')
target = pd.read_hdf(generated_data.joinpath('target.hdf'), key='target')
test = pd.read_hdf(generated_data.joinpath('test.hdf'), key='test')

In [None]:
train.head()

In [None]:
target.head()

In [None]:
test.head()

# Mean prediction

Before we start, we make some rudimentary baseline predictions.

The file `sample_submission.csv.gz` contains the constant prediction `0.5`, which gives a score of `1.23646` on the Kaggle leaderboard.

Furthermore, we know that for RMSE the optimal constant prediction is the target mean (of the ground truth).
Since we do not know the ground truth, we instead probe the leaderboard with the target mean of the training set.
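A quick numerical check of the claim that, for RMSE, the best constant prediction is the target mean. The counts below are synthetic Poisson draws (an assumption for illustration only, not the competition data); we scan a grid of constants and compare the minimizer with the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "ground truth": non-negative counts, for illustration only
y = rng.poisson(lam=0.3, size=10_000).astype(float)


def rmse_constant(y, c):
    """RMSE obtained when predicting the constant c for every sample."""
    return np.sqrt(np.mean((y - c) ** 2))


# Scan a fine grid of constants and keep the one with the lowest RMSE
grid = np.linspace(0, 2, 2001)
scores = np.array([rmse_constant(y, c) for c in grid])
best_c = grid[np.argmin(scores)]

print(f'best constant: {best_c:.3f}, target mean: {y.mean():.3f}')
```

The minimizer agrees with the mean up to the grid resolution, because $\frac{1}{T}\sum_t (y_t - c)^2$ is a parabola in $c$ with its vertex at the mean.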

In [None]:
mean_prediction = test.loc[:, ['ID']]
mean_prediction.loc[:, 'item_cnt_month'] = target.loc[:, 'target'].mean()

# Set ID as index
mean_prediction.set_index('ID', inplace=True)

mean_prediction.to_csv('mean_prediction.csv')
mean_prediction.head()

The mean prediction gave a score of `4.48306`, which is worse than our initial submission.
This suggests that the optimal constant lies below the training-set target mean, i.e. on average lower predictions are preferred over higher ones.

We can use this to probe the leaderboard further. Since the constant equal to the ground-truth target mean gives the lowest score, we can check whether `0.5` is a minimum (at least for the public test set).

In [None]:
prediction_04 = mean_prediction.copy()
prediction_04.loc[:, 'item_cnt_month'] = 0.4
prediction_04.to_csv('prediction_04.csv')
prediction_04.head()

This improved the score to `1.22295`.

We could continue to probe the leaderboard like this to locate the minimum, which indicates what the mean of our predictions should be close to. However, we are only probing the public part of the test set, so this technique must be used with care.
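Since the RMSE of a constant prediction $c$ satisfies $s(c)^2 = \sigma^2 + (c - \mu)^2$, where $\mu$ and $\sigma^2$ are the mean and (population) variance of the scored ground truth, two probes are in principle enough to solve for $\mu$: subtracting the two equations gives $\mu = \frac{(c_1^2 - c_2^2) - (s_1^2 - s_2^2)}{2(c_1 - c_2)}$. A minimal sketch, assuming the leaderboard reports the plain RMSE over the public split (the rounding of the reported scores adds some error):

```python
import math


def mean_from_two_probes(c1, s1, c2, s2):
    """Recover the ground-truth mean from two constant-prediction RMSE scores.

    Uses s(c)**2 = var + (c - mu)**2, so
    s1**2 - s2**2 = (c1**2 - c2**2) - 2 * mu * (c1 - c2).
    """
    return ((c1 ** 2 - c2 ** 2) - (s1 ** 2 - s2 ** 2)) / (2 * (c1 - c2))


def std_from_probe(c, s, mu):
    """Recover the ground-truth standard deviation from one probe and the mean."""
    return math.sqrt(s ** 2 - (c - mu) ** 2)


# Public leaderboard scores reported above for the constants 0.5 and 0.4
mu = mean_from_two_probes(0.5, 1.23646, 0.4, 1.22295)
sigma = std_from_probe(0.5, 1.23646, mu)
print(f'estimated public mean: {mu:.3f}, estimated std: {sigma:.3f}')
```

With the two scores above, the estimated mean comes out at roughly $0.28$. It only describes the public part of the test set, so it should guide, not replace, local validation.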

# Data preparation

Here we prepare the data for training and prediction.

## Remove first months

We remove the first months, because for these months the lagged features are effectively NaN (which we replaced with $0$ in [1_train_test_generation.ipynb](1_train_test_generation.ipynb)).

In [None]:
highest_lag = max(set(int(col.split('_lag_')[-1]) for col in train.columns if '_lag_' in col))

train = train.loc[train.loc[:, 'date_block_num'] >= highest_lag]
target = target.loc[target.loc[:, 'date_block_num'] >= highest_lag]

## Downcasting

To save memory, we downcast the numeric types (by default they are loaded as 64-bit `float64`/`int64`).

In [None]:
def downcast_dtypes(df):
    """
    Downcasts float64 to float32 and int64 to int32
    
    Parameters
    ----------
    df : DataFrame
        The data frame to downcast
    
    Returns
    -------
    df : DataFrame
        A new, downcasted data frame
    """
    
    # Select columns to downcast
    float_cols = [c for c in df.columns if df.loc[:, c].dtype == 'float64']
    int_cols = [c for c in df.columns if df.loc[:, c].dtype == 'int64']
    
    # Build a column -> dtype mapping and downcast in one go
    # (DataFrame.astype returns a new frame with the requested dtypes)
    dtypes = {c: np.float32 for c in float_cols}
    dtypes.update({c: np.int32 for c in int_cols})
    
    return df.astype(dtypes)

In [None]:
train = downcast_dtypes(train)
target = downcast_dtypes(target)
test = downcast_dtypes(test)
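Downcasting `int64` to `int32` silently wraps values outside the `int32` range, and `float32` keeps only about seven significant digits. A hedged sketch of a guard that checks whether integer values fit before downcasting (the helper name `fits_in_dtype` is hypothetical, not part of pandas or NumPy):

```python
import numpy as np


def fits_in_dtype(values, dtype):
    """Return True if all integer values lie within the range of `dtype`."""
    info = np.iinfo(dtype)
    values = np.asarray(values)
    if values.size == 0:
        return True
    return bool(values.min() >= info.min and values.max() <= info.max)


small = np.array([0, 33, 1000], dtype=np.int64)
large = np.array([0, 2 ** 40], dtype=np.int64)

print(fits_in_dtype(small, np.int32))  # safe to downcast
print(fits_in_dtype(large, np.int32))  # would overflow

# An unchecked downcast does not raise; the large value simply wraps around
print(large.astype(np.int32))
```

Running such a check on the `int_cols` of a frame before calling `downcast_dtypes` would catch the overflow case.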

Copy the data for debugging purposes.

In [None]:
train_original = train.copy()
target_original = target.copy()
test_original = test.copy()

## Remove superfluous columns

In [None]:
all(test.index == test.loc[:, 'ID'])

We note that the `ID` (which we only need for the test prediction) is also stored in the index, so we can drop the column.

In [None]:
drop_cols = ['ID', 'date_block_num', 'item_id', 'shop_id']

train.drop(drop_cols, axis=1, inplace=True)
target.drop(drop_cols, axis=1, inplace=True)
test.drop(drop_cols, axis=1, inplace=True)

# Make scorer

It appears that RMSE is not available as a scorer out of the box (only MSE is), so we define it ourselves.
This also gives us the opportunity to clip the predictions to $[0,20]$.
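Clipping can only help as long as the ground truth itself lies in $[0, 20]$ (an assumption here): for any $y \in [0, 20]$, $|\mathrm{clip}(p, 0, 20) - y| \le |p - y|$, so clipping never increases the RMSE. A small NumPy check on synthetic values:

```python
import numpy as np

rng = np.random.default_rng(42)
y_true = rng.uniform(0, 20, size=1_000)          # assumed to lie in [0, 20]
y_pred = y_true + rng.normal(0, 5, size=1_000)   # noisy predictions, may leave the range


def rmse_np(a, b):
    """Plain NumPy RMSE, used only for this illustration."""
    return np.sqrt(np.mean((a - b) ** 2))


raw = rmse_np(y_true, y_pred)
clipped = rmse_np(y_true, np.clip(y_pred, 0, 20))
print(f'raw RMSE: {raw:.3f}, clipped RMSE: {clipped:.3f}')
```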

In [None]:
def rmse(ground_truth, predictions):
    r"""
    Returns the root mean squared error of the predictions
    
    The root mean squared error is defined by:
    $\sqrt {\frac {\sum _{t=1}^{T}({\hat {y}}_{t}-y_{t})^{2}}{T}}$
    
    Parameters
    ----------
    ground_truth : array, shape (n_samples,)
        The correct prediction
    predictions : array, shape (n_samples,)
        The predictions
        
    Returns
    -------
    rmse : float
        The root mean squared error
    """
    return np.sqrt(metrics.mean_squared_error(ground_truth, predictions))

In [None]:
def rmse_clip(ground_truth, predictions):
    r"""
    Returns the root mean squared error of the predictions
    
    The root mean squared error is defined by:
    $\sqrt {\frac {\sum _{t=1}^{T}({\hat {y}}_{t}-y_{t})^{2}}{T}}$
    
    Note
    ----
    This version clips the predictions to [0, 20]
    
    Parameters
    ----------
    ground_truth : array, shape (n_samples,)
        The correct prediction
    predictions : array, shape (n_samples,)
        The predictions
        
    Returns
    -------
    rmse : float
        The root mean squared error
    """
    return np.sqrt(metrics.mean_squared_error(ground_truth, np.clip(predictions, 0, 20)))

In [None]:
rmse_scorer = metrics.make_scorer(rmse, greater_is_better=False)
rmse_scorer_clip = metrics.make_scorer(rmse_clip, greater_is_better=False)

# Confusion warning

**NOTE**: The scorer in `GridSearchCV` can be utterly confusing. 

The scores returned by `GridSearchCV` for loss functions are negative, because `GridSearchCV` by convention maximizes its score. Loss functions like MSE therefore have to be negated, which is what `make_scorer(..., greater_is_better=False)` does.

See [here](https://stackoverflow.com/questions/21050110/sklearn-gridsearchcv-with-pipeline)
and [here](https://stackoverflow.com/questions/21443865/scikit-learn-cross-validation-negative-values-with-mean-squared-error).
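Conceptually, `make_scorer(..., greater_is_better=False)` multiplies the metric by $-1$ so that "higher is better" holds during the search. The toy class below is a hypothetical stand-in written for illustration, not scikit-learn's actual implementation (the real scorer is called as `scorer(estimator, X, y)`):

```python
import math


class ToyScorer:
    """Minimal stand-in for a scorer built by metrics.make_scorer (illustration only)."""

    def __init__(self, score_func, greater_is_better=True):
        self.score_func = score_func
        self.sign = 1 if greater_is_better else -1

    def __call__(self, predict, X, y):
        # Apply the sign so that a larger returned value always means "better"
        return self.sign * self.score_func(y, [predict(x) for x in X])


def rmse_py(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def good_model(x):
    return x + 1  # perfect fit


def bad_model(x):
    return x      # off by one everywhere


X, y = [0, 1, 2, 3], [1, 2, 3, 4]
loss_scorer = ToyScorer(rmse_py, greater_is_better=False)
scores = {'good': loss_scorer(good_model, X, y), 'bad': loss_scorer(bad_model, X, y)}
print(scores)                       # negated RMSE values
print(max(scores, key=scores.get))  # the search keeps the highest score
```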

For clarity, let's run an experiment.
We fit a linear regression model on data whose line passes through (0, 1) rather than through the origin.
Thus we know that including the intercept is better than not.
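Since the example target is exactly $y = x + 1$, we can anticipate the outcome: with the intercept the fit is exact (RMSE $\approx 0$), while the best line through the origin leaves a positive error. A NumPy least-squares sketch of the two in-sample fits (no cross-validation, so the numbers differ from what `GridSearchCV` reports):

```python
import numpy as np

x = np.arange(100, dtype=float)
y = x + 1  # same relation as example_train / example_target


def lstsq_rmse(design, y):
    """Fit ordinary least squares and return the in-sample RMSE."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sqrt(np.mean((design @ coef - y) ** 2))


with_intercept = lstsq_rmse(np.column_stack([np.ones_like(x), x]), y)
without_intercept = lstsq_rmse(x[:, None], y)
print(f'with intercept: {with_intercept:.4f}, without: {without_intercept:.4f}')
```

A scorer built with `greater_is_better=True` would therefore rank `fit_intercept=False` first, which is the wrong choice; negating the loss restores the correct ranking.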

In [None]:
from sklearn.linear_model import LinearRegression

In [None]:
example_train = pd.DataFrame(np.array(range(100)))
example_target = pd.DataFrame(np.array(range(1, 101)))

parameters = {'fit_intercept': (True, False)}

In [None]:
rmse_scorer_greater = metrics.make_scorer(rmse, greater_is_better=True)
grid_lin_greater = model_selection.GridSearchCV(LinearRegression(), 
                                                parameters,
                                                scoring=rmse_scorer_greater,
                                                return_train_score=False)
grid_lin_greater.fit(example_train, example_target)
greater_best_model = grid_lin_greater.best_estimator_
greater_best_score = grid_lin_greater.best_score_
greater_mean_score = grid_lin_greater.cv_results_['mean_test_score']

In [None]:
rmse_scorer_lesser = metrics.make_scorer(rmse, greater_is_better=False)
grid_lin_lesser = model_selection.GridSearchCV(LinearRegression(), 
                                               parameters,
                                               scoring=rmse_scorer_lesser,
                                               return_train_score=False)
grid_lin_lesser.fit(example_train, example_target)
lesser_best_model = grid_lin_lesser.best_estimator_
lesser_best_score = grid_lin_lesser.best_score_
lesser_mean_score = grid_lin_lesser.cv_results_['mean_test_score']

In [None]:
print(f'With greater_is_better=True the best score is {greater_best_score:.2f} '
      f'(mean scores per candidate: {greater_mean_score}) '
      f'with the model\n{greater_best_model}')

In [None]:
print(f'With greater_is_better=False the best score is {lesser_best_score:.2f} '
      f'(mean scores per candidate: {lesser_mean_score}) '
      f'with the model\n{lesser_best_model}')

# Validation generation

As we want to predict the next month, we know that the train/test split is by time (we want to predict month $34$).

In addition, in [0_EDA.ipynb](0_EDA.ipynb) we saw that different bands of item IDs were removed (i.e. non-random rows were removed from the training set).

As a rule of thumb, the validation split should mimic the train/test split. The time component is fairly straightforward. Whether it also makes sense to hold out bands of item IDs is testable, but due to time constraints we only split by time here.

**NOTE**: An extensive grid search takes a long time on this dataset. Ideally we would use many splits, but since we have a large number of samples we only use two time-ordered splits here (`TimeSeriesSplit` requires at least two).

In [None]:
cv_generator = model_selection.TimeSeriesSplit(n_splits=2)
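`TimeSeriesSplit` splits on row order, so it assumes the rows are sorted by `date_block_num`, and its folds need not line up with month boundaries. A hedged alternative is to build the folds from the month column directly: `GridSearchCV` also accepts an iterable of `(train_indices, validation_indices)` pairs as `cv`. The helper `month_holdout_splits` below is hypothetical; since `date_block_num` is dropped from `train` above, the months would have to come from `train_original`, which has the same row order:

```python
import numpy as np


def month_holdout_splits(months, validation_months):
    """Return (train_idx, val_idx) pairs: train on all earlier months, validate on one month.

    Indices are positional, matching how GridSearchCV indexes the training data.
    """
    months = np.asarray(months)
    splits = []
    for month in validation_months:
        train_idx = np.flatnonzero(months < month)
        val_idx = np.flatnonzero(months == month)
        splits.append((train_idx, val_idx))
    return splits


# Toy example: two rows per month for months 0-3
months = np.repeat([0, 1, 2, 3], 2)
for train_idx, val_idx in month_holdout_splits(months, validation_months=[2, 3]):
    print(train_idx, val_idx)
```

In the notebook this could be used as, e.g., `month_holdout_splits(train_original['date_block_num'].values, [32, 33])` in place of `TimeSeriesSplit`.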

# Hyperparameter optimization

In [None]:
from keras.models import load_model

def save_grid(model_grid, file_path):
    """
    Saves the model grid
    
    Parameters
    ----------
    model_grid : GridSearchCV
        The model grid to save
    file_path : Path
        The file to save to
    """
    
    with file_path.open('wb') as f:
        pickle.dump(model_grid, f, pickle.HIGHEST_PROTOCOL)
        print(f'Saved fitted model grid to {file_path}')
    
def load_grid(file_path):
    """
    Loads the model grid
    
    Parameters
    ----------
    file_path : Path
        The file to load
        
    Returns
    -------
    model_grid : GridSearchCV
        The model grid
    """
    
    with file_path.open('rb') as f:
        model_grid = pickle.load(f)
        print(f'Loaded fitted model grid from {file_path}')
    return model_grid
        
def save_keras_grid(model_grid, file_path):
    """
    Saves the model grid and the best model separately
    
    Parameters
    ----------
    model_grid : GridSearchCV
        The model grid to save
    file_path : Path
        The file to save the grid to
    """
    
    # Temporarily remove the best estimator, as the Keras model may not be picklable
    best_estimator = model_grid.best_estimator_
    model_grid.best_estimator_ = None
    
    # Pickle the grid without the model
    try:
        with file_path.open('wb') as f:
            pickle.dump(model_grid, f, pickle.HIGHEST_PROTOCOL)
            print(f'Saved fitted model grid to {file_path}')
    finally:
        # Restore the best estimator so the caller can still use it
        model_grid.best_estimator_ = best_estimator
        
    # Save the Keras model separately in HDF5 format
    model_name = file_path.name.split('.')[0] + '.h5'
    model_path = file_path.parent.joinpath(model_name)
    best_estimator.model.save(str(model_path))
    print(f'Saved best model to {model_path}')
    
def load_keras_grid(file_path):
    """
    Loads the model grid and the best model separately
    
    Parameters
    ----------
    file_path : Path
        The file to load the grid from
        
    Returns
    -------
    model_grid : GridSearchCV
        The model grid, with the best Keras model restored as best_estimator_
    """
    
    with file_path.open('rb') as f:
        model_grid = pickle.load(f)
        print(f'Loaded fitted model grid from {file_path}')
        
    # Load the model separately
    # NOTE: This is the raw Keras model, not a KerasRegressor wrapper,
    # so its predict() returns an array of shape (n_samples, 1)
    model_name = file_path.name.split('.')[0] + '.h5'
    model_path = file_path.parent.joinpath(model_name)
    model_grid.best_estimator_ = load_model(str(model_path), 
                                            custom_objects={'rmse_keras_clip': rmse_keras_clip})
    print(f'Loaded best estimator from {model_path}')
    
    return model_grid

In [None]:
def cross_validate_skl(name,
                       estimator,
                       parameters,
                       train,
                       target,
                       scorer,
                       cv_generator,
                       save_dir,
                       overwrite=False
                      ):
    """
    Performs cross validation on a scikit learn estimator.
    
    The function will search for saved models and load them unless overwrite is True
    
    Notes
    -----
    Some models appear to have problems with pickling,
    therefore different methods for saving and loading have been implemented
    
    Parameters
    ----------
    name : str
        Name to add to the model name for saving and loading
    estimator : estimator object
        The estimator to perform the cross validation on
    parameters : dict
        Parameters to tune
    train : array-like
        The training set
    target : array-like
        The target
    scorer : scorer object
        The scorer to use in the cross validation
    cv_generator : cv-generator object
        An object to use for the train-validation split in the cross validation
    save_dir : Path or str
        Directory to save the model to
    overwrite : bool
        If True, refit the grid and overwrite any existing saved model
        
    Returns
    -------
    best_estimator : estimator object
        The best estimator found by the search
    train_score : array
        The training score
    validation_score : array
        The validation score
    """

    model_name = str(estimator.__class__).split('.')[-1][:-2]
    file_path = Path(save_dir).joinpath(f'{model_name}_{name}.pkl')
    
    if file_path.is_file() and not overwrite: 
        if 'Keras' in file_path.name:
            model_grid = load_keras_grid(file_path)
        else:
            model_grid = load_grid(file_path)
    else:
        model_grid = model_selection.GridSearchCV(estimator, 
                                                  parameters,
                                                  scoring=scorer,
                                                  cv=cv_generator,
                                                  verbose=3,
                                                  return_train_score=True)
        model_grid.fit(train, target)

        if 'Keras' in file_path.name:
            save_keras_grid(model_grid, file_path)
        else:
            save_grid(model_grid, file_path)
    
    best_estimator = model_grid.best_estimator_
    
    train_score = model_grid.cv_results_["mean_train_score"]
    validation_score = model_grid.cv_results_["mean_test_score"]
    
    return best_estimator, train_score, validation_score

In [None]:
def plot_train_validation(train_scores, validation_scores, parameter):
    """
    Plots the training and validation curve as a function of the parameter
    
    Parameters
    ----------
    train_scores : array-like
        The scores obtained from the training set
    validation_scores : array-like
        The scores obtained from the validation set
    parameter : dict
        Dictionary of the tuned parameter on the form
        >>> {parameter: np.array}
        Only the first key is plotted
        
    """
    
    # Select the first key
    key = next(iter(parameter.keys()))
    parameter_vals = parameter[key]
    
    fig, ax = plt.subplots()
        
    ax.plot(parameter_vals, train_scores, label='Train')
    ax.plot(parameter_vals, validation_scores, label='Validation')
    ax.set_xlabel(key)
    ax.set_ylabel('Error')
    ax.grid(True)
    ax.legend(loc='best', fancybox=True, framealpha=0.5)

Remember to negate the scores before passing them to `plot_train_validation`, so that the plot shows errors.

## Linear regression

There are no real hyperparameters to tune in linear regression (other than choosing whether to fit the intercept or not).

In [None]:
lin_reg, lin_train_score, lin_validation_score = \
    cross_validate_skl('fit_intercept',
                       LinearRegression(), 
                       {'fit_intercept': (True,)}, 
                       train, 
                       target, 
                       rmse_scorer_clip, 
                       cv_generator, 
                       generated_data)

As explained in the [Confusion warning](#Confusion-warning) section, we need to negate the scores.

In [None]:
print(f'Train score: {-lin_train_score[0]:.3f}')
print(f'Validation score: {-lin_validation_score[0]:.3f}')

Although the RMSE is quite high, the linear model could still add information as a base model in a stacked ensemble.

## KNN

In [None]:
from sklearn.neighbors import KNeighborsRegressor

In [None]:
knn_reg, knn_train_score, knn_validation_score = \
    cross_validate_skl('k_1-4',
                       KNeighborsRegressor(), 
                       {'n_neighbors': (1, 2, 3, 4), 'n_jobs': (-1,)}, 
                       train, 
                       target, 
                       rmse_scorer_clip, 
                       cv_generator, 
                       generated_data)

## Regression tree

In [None]:
from sklearn.ensemble import ExtraTreesRegressor

We would like to benchmark our results against [this notebook](https://www.kaggle.com/the1owl/playing-in-the-sandbox/notebook), which achieves an RMSE of around $0.27$.

In [None]:
params = {'n_estimators': (25,),
          'n_jobs': (-1,),
          'max_depth': (15,),
          'random_state': (18,),
         }

xt_reg, xt_train_score, xt_validation_score = \
    cross_validate_skl('bench',
                       ExtraTreesRegressor(), 
                       params, 
                       train, 
                       target, 
                       rmse_scorer_clip, 
                       cv_generator, 
                       generated_data)

In [None]:
train.head()

In [None]:
idf_cols = [col for col in train_original.columns if 'lag' not in col and 'tf_idf' not in col and 'name' not in col]
own_train = train_original.loc[:, idf_cols]

In [None]:
own_train.head()

In [None]:
# Hold out the last training month (33) for validation
# (.drop returns new frames, avoiding in-place changes on slices)
x1 = own_train.loc[own_train['date_block_num'] < 33].drop(drop_cols, axis=1)
y1 = target_original.loc[target_original['date_block_num'] < 33].drop(drop_cols, axis=1)

x2 = own_train.loc[own_train['date_block_num'] == 33].drop(drop_cols, axis=1)
y2 = target_original.loc[target_original['date_block_num'] == 33].drop(drop_cols, axis=1)

In [None]:
reg_own = ExtraTreesRegressor(n_estimators=25, n_jobs=-1, max_depth=15, random_state=18)
reg_own.fit(x1,y1)

In [None]:
print('RMSE:', np.sqrt(metrics.mean_squared_error(y2.clip(0.,20.),reg_own.predict(x2).clip(0.,20.))))

In [None]:
own_train.columns

In [None]:
drop_cols

In [None]:
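# Hold out the last training month (33) for validation,
# this time keeping the ID columns as features (but not in the target)
x1 = own_train.loc[own_train['date_block_num'] < 33]
y1 = target_original.loc[target_original['date_block_num'] < 33, ['target']]

x2 = own_train.loc[own_train['date_block_num'] == 33]
y2 = target_original.loc[target_original['date_block_num'] == 33, ['target']]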

In [None]:
reg_own = ExtraTreesRegressor(n_estimators=25, n_jobs=-1, max_depth=15, random_state=18)
reg_own.fit(x1,y1)

In [None]:
print('RMSE:', np.sqrt(metrics.mean_squared_error(y2.clip(0.,20.),reg_own.predict(x2).clip(0.,20.))))

## Gradient boosting decision tree

In [None]:
from xgboost import XGBRegressor

As with the other estimators, the `xgboost` estimator has several knobs to turn when searching for the optimal estimator.

To start with, we have:

Better fitting (increase to reduce underfitting)
* max_depth
* subsample
* colsample_bytree
* colsample_bylevel
* eta 
* num_round

Impedes fitting (increase to reduce overfitting)
* min_child_weight
* lambda
* alpha

We start with the `max_depth` parameter to investigate the performance.
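The names in the lists above are the native `xgboost` parameter names; the scikit-learn wrapper `XGBRegressor` exposes some of them under different keyword names (`eta` is `learning_rate`, `num_round` is `n_estimators`, `lambda` is `reg_lambda`, `alpha` is `reg_alpha`). A small sketch that translates a grid written with native names into wrapper names (the helper and the grid values are illustrative only):

```python
NATIVE_TO_SKLEARN = {
    'eta': 'learning_rate',
    'num_round': 'n_estimators',
    'lambda': 'reg_lambda',
    'alpha': 'reg_alpha',
}


def to_sklearn_grid(native_grid):
    """Rename native xgboost parameter names to XGBRegressor keyword names."""
    return {NATIVE_TO_SKLEARN.get(name, name): values for name, values in native_grid.items()}


native_grid = {'max_depth': (3, 4, 5), 'eta': (0.1, 0.3), 'lambda': (1.0,), 'subsample': (0.8, 1.0)}
print(to_sklearn_grid(native_grid))
```

Names that are identical in both APIs (such as `max_depth` or `subsample`) pass through unchanged.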

In [None]:
# https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn

params = {'max_depth': (3, 4, 5),
          'n_jobs': (-1,),
          'random_state': (seed,),
         }

xg_reg, xg_train_score, xg_validation_score = \
    cross_validate_skl('depth_3-5',
                       XGBRegressor(), 
                       params, 
                       train, 
                       target, 
                       rmse_scorer_clip, 
                       cv_generator, 
                       generated_data)

## Neural network

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Activation
from keras.wrappers.scikit_learn import KerasRegressor
from keras import backend as K

**NOTE**: We also make a custom RMSE for Keras.

In [None]:
def rmse_keras_clip(y_true, y_pred):
    r"""
    Returns the root mean squared error of the predictions
    
    The root mean squared error is defined by:
    $\sqrt {\frac {\sum _{t=1}^{T}({\hat {y}}_{t}-y_{t})^{2}}{T}}$
    
    Note
    ----
    This version clips the predictions to [0, 20].
    Clipping inside the loss gives zero gradient for predictions
    outside [0, 20].
    
    Parameters
    ----------
    y_true : tensor, shape (n_samples, 1)
        The correct prediction (the ground truth)
    y_pred : tensor, shape (n_samples, 1)
        The predictions
        
    Returns
    -------
    rmse : tensor (scalar)
        The root mean squared error over the batch
    """
    # Average over the whole batch before taking the root; averaging over
    # axis=-1 only would give a per-sample |error| (an MAE after batch averaging)
    return K.sqrt(K.mean(K.square(tf.clip_by_value(y_pred, 0, 20) - y_true)))
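A caution on the reduction axis when writing such a loss: Keras computes the loss per sample and then averages over the batch. For a single-output regression the last axis has length 1, so `sqrt(mean(square(error), axis=-1))` per sample is just $|\hat{y} - y|$, and the batch average becomes the mean absolute error rather than the RMSE. The NumPy sketch below mimics the two reductions (NumPy stands in for the Keras backend here):

```python
import numpy as np

y_true = np.array([[0.0], [1.0], [4.0], [2.0]])   # shape (batch, 1)
y_pred = np.array([[0.5], [3.0], [1.0], [2.0]])

# Per-sample root over the last axis, then averaged over the batch
per_sample = np.sqrt(np.mean(np.square(y_pred - y_true), axis=-1))
batch_of_per_sample = per_sample.mean()

# Root of the mean over the whole batch
batch_rmse = np.sqrt(np.mean(np.square(y_pred - y_true)))
mae = np.mean(np.abs(y_pred - y_true))

print(batch_of_per_sample, mae, batch_rmse)
```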

**NOTE**: RNNs might be a better fit for this task, but for simplicity we use plain multilayer perceptrons.

In [None]:
def build_mlp(input_dim, optimizer, hidden_layers=1, nodes=32, dropout=0):
    """
    Returns a compiled keras model
    
    Parameters
    ----------
    input_dim : int
        The input dimension
    optimizer : str
        The optimizer to use
    hidden_layers : int
        The number of hidden layers
    nodes : int or array-like, shape (hidden_layers)
        Nodes for all the layers.
        If array-like, each element corresponds to the nodes in the hidden layer
        If int, all hidden layers will have the same number of nodes
    dropout : float or array-like, shape (hidden_layers)
        Dropout for all the layers.
        If array-like, each element corresponds to the dropout value after each hidden layer
        If float, all hidden layers will have the same dropout value
    """
    
    if isinstance(nodes, int):
        nodes = [nodes] * hidden_layers
    if isinstance(dropout, (int, float)):
        dropout = [dropout] * hidden_layers
    
    model = Sequential()

    # Hidden layers: Dense -> ReLU -> Dropout
    for i, (node, drop) in enumerate(zip(nodes, dropout)):
        if i == 0:
            model.add(Dense(node, input_dim=input_dim))
        else:
            model.add(Dense(node))
        model.add(Activation('relu'))
        model.add(Dropout(drop))

    # Add the final layer
    model.add(Dense(1))
    # NOTE: We use identity as we are dealing with a regression problem
    model.add(Activation('linear'))

    model.compile(loss=rmse_keras_clip,
                  optimizer=optimizer,
                  metrics=[rmse_keras_clip])
    
    # summary() prints the table itself and returns None
    model.summary()

    return model
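Why each hidden `Dense` layer needs a nonlinear activation: two stacked affine layers without an activation in between compose to a single affine map, $(xW_1 + b_1)W_2 + b_2 = x(W_1W_2) + (b_1W_2 + b_2)$, so the extra layer adds parameters but no expressive power. A NumPy check with random weights (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))                        # 5 samples, 8 features
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)

# Two Dense layers with linear activations (row-vector convention: x @ W + b)
two_layers = (x @ W1 + b1) @ W2 + b2

# The equivalent single affine layer
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

# With a ReLU in between, the composition is no longer affine in general
with_relu = np.maximum(x @ W1 + b1, 0) @ W2 + b2
print(np.allclose(two_layers, one_layer), np.allclose(with_relu, one_layer))
```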

For the neural networks, it makes sense to investigate:

Better fitting (increase to reduce underfitting)
* Number of neurons per layer
* Number of layers
* Adam/Adadelta/Adagrad/... (observed to lead to more overfitting)
* Batch size

Impedes fitting (increase to reduce overfitting)
* L2/L1 for weights
* Dropout/Dropconnect
* Static dropconnect

We start by optimizing a network with one hidden layer.

In [None]:
mlp_model = KerasRegressor(build_fn=build_mlp, verbose=0)

In [None]:
params = {'input_dim': (train.shape[1],),
          'optimizer': ('adadelta',),
          'hidden_layers' : (1,),
          'nodes': (2,),
          'dropout': (0.3,),
          'batch_size': (32,),
          'epochs': (10,)
         }

map_reg, map_train_score, map_validation_score = \
    cross_validate_skl('hl_1_n_16-64_do_03',
                       mlp_model, 
                       params, 
                       train, 
                       target, 
                       rmse_scorer, 
                       cv_generator, 
                       generated_data, overwrite=True)

## Ensembling

In [None]:
lin_pred = lin_reg.predict(test)
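One simple ensembling scheme is a linear blend of two models' predictions, $\hat{y} = \alpha \hat{y}_1 + (1 - \alpha)\hat{y}_2$, with $\alpha$ chosen on a hold-out set (e.g. month 33) rather than on the training data. A minimal sketch with synthetic predictions (all data and names here are hypothetical stand-ins, not outputs of the models above):

```python
import numpy as np


def best_blend_weight(pred_a, pred_b, y_true, grid=np.linspace(0, 1, 101)):
    """Grid-search alpha for alpha * pred_a + (1 - alpha) * pred_b on a hold-out set."""
    scores = [np.sqrt(np.mean((np.clip(a * pred_a + (1 - a) * pred_b, 0, 20) - y_true) ** 2))
              for a in grid]
    best = int(np.argmin(scores))
    return grid[best], scores[best]


rng = np.random.default_rng(7)
y_holdout = rng.poisson(1.0, size=2_000).astype(float)
pred_lin = y_holdout + rng.normal(0, 1.0, size=y_holdout.size)   # stand-in for a linear model
pred_gbdt = y_holdout + rng.normal(0, 0.5, size=y_holdout.size)  # stand-in for a GBDT

alpha, score = best_blend_weight(pred_lin, pred_gbdt, y_holdout)
print(f'alpha={alpha:.2f}, hold-out RMSE={score:.3f}')
```

Because the grid includes $\alpha = 0$ and $\alpha = 1$, the blend is never worse on the hold-out set than either model alone; whether that carries over to the test set still has to be validated.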

Should be a different notebook?

If time: Submit target mean and see what score we get

# TODO

If there is a mismatch between the submission score and the local validation score, check whether:

* There is too little data in the public leaderboard
* We overfitted
* We chose the correct splitting strategy
* Train and test come from different distributions

Set the random seed

Report the submission sample, report its score

Report the score of the optimal constant (for RMSE this is the target mean)

Clarity

- The clear step-by-step instruction on how to produce the final submit file is provided
- Code has comments where it is needed and meaningful function names



Validation

- Type of public/private split is identified (leaderboard probing)

Data leakages

- Data is investigated for data leakages and investigation process is described
- Found data leakages are utilized

Metrics optimization

- Correct metric is optimized

Advanced Features I: mean encodings

- Mean-encoding is applied
- Mean-encoding is set up correctly, i.e. KFold or expanding scheme are utilized correctly

Advanced Features II

- At least one feature from this topic is introduced

Hyperparameter tuning

- Parameters of models are roughly optimal

Ensembles

- Ensembling is utilized (linear combination counts)
- Validation with ensembling scheme is set up correctly, i.e. KFold or Holdout is utilized
- Models from different classes are utilized (at least two from the following: KNN, linear models, RF, GBDT, NN)

Feature preprocessing and generation with respect to models

- Several simple features are generated
- For non-tree-based models preprocessing is used or the absence of it is explained

Feature extraction from text and images

- Features from text are extracted
- Special preprocessing for text is utilized (TF-IDF, stemming, Levenshtein distance...)

EDA

- Several interesting observations about data are discovered and explained
- Target distribution is visualized, time trend is assessed

Validation

- Type of train/test split is identified and used for validation
- Type of public/private split is identified
