# Random Search Implementation

In this exercise we will implement the random search algorithm from scratch. 

The goal is to understand how hyperparameter tuning is performed.

You will receive a dictionary of {parameter_name:values} to be used with a decision tree classifier.
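
As a rough illustration, such a dictionary could look like the sketch below. The values mirror the grid used later in this exercise; the helper `count_combinations` is a hypothetical name introduced only to show how many distinct combinations the full grid contains.

```python
from math import prod

# Search space for a decision tree classifier: one list of candidate values per hyperparameter
param_grid = {
    "max_depth": [3, 5, 7, None],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 3, 4, 5],
}

def count_combinations(grid):
    """Return the number of distinct combinations in the full grid (hypothetical helper)."""
    return prod(len(values) for values in grid.values())

print(count_combinations(param_grid))  # 4 * 2 * 4 = 32
```

Random search only visits a sample of these combinations, which is what makes it cheaper than an exhaustive grid search.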

Your algorithm needs to randomly choose a combination of parameters and use it to train the decision tree.
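
One possible way to draw a single combination is sketched below, using only the standard library. The function name `sample_combination` and the seed are assumptions made for the example, not part of the exercise.

```python
import random

grid = {
    "max_depth": [3, 5, 7, None],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 3, 4, 5],
}

def sample_combination(param_grid, rng=random):
    """Pick one value per hyperparameter, independently and uniformly (hypothetical helper)."""
    return {name: rng.choice(values) for name, values in param_grid.items()}

# A seeded generator makes the draw reproducible
combo = sample_combination(grid, random.Random(0))
print(combo)
```

The resulting dictionary can be unpacked into the estimator, for example with `model.set_params(**combo)`.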

The algorithm will try n random combinations, where n is a parameter given by the user, and keep the one that scores best.
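
The loop over n combinations can be sketched independently of any model. In the sketch below, `random_search_skeleton` and `toy_score` are hypothetical names; `toy_score` is a stand-in for "train the tree and measure its accuracy", so the structure of the search is visible without scikit-learn.

```python
import random

def random_search_skeleton(param_grid, evaluate, n_iter, seed=None):
    """Try n_iter random combinations and return the best one with its score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        score = evaluate(params)
        # Keep the combination only if it improves on the best score so far
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy stand-in for model training: prefers max_depth close to 5 and the gini criterion
def toy_score(params):
    return -abs(params["max_depth"] - 5) + (1 if params["criterion"] == "gini" else 0)

grid = {"max_depth": [3, 5, 7], "criterion": ["gini", "entropy"]}
best, score = random_search_skeleton(grid, toy_score, n_iter=50, seed=0)
print(best, score)
```

Because combinations are drawn with replacement, the same combination may be evaluated more than once. That is acceptable for a first implementation.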

You can use the following code to help your implementation or you can write your code from scratch.

Needless to say, you should not use the "RandomizedSearchCV" from scikit-learn.

Hands on!

In [52]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import datasets
from sklearn.linear_model import LogisticRegression


data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
print("Training set: ", X_train.shape, y_train.shape)
print("Testing set: ", X_test.shape, y_test.shape)

def random_search(model, param_grid, X, y, n_iter=100):
    """
    Perform random search
    :param model: Model to be used
    :param param_grid: Dictionary containing hyperparameters and their possible values
    :param X: Data (n_samples, n_features)
    :param y: Target (n_samples,)
    :param n_iter: Number of iterations
    :return: Best hyperparameters
    """
    # Hold out part of (X, y) for validation instead of relying on global variables
    X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    M_precision = -1.0
    M_parametres = None
    for _ in range(n_iter):
        # Randomly draw one value per hyperparameter from the dictionary
        parametres = {
            name: values[np.random.randint(len(values))]
            for name, values in param_grid.items()
        }

        # Apply the randomly chosen hyperparameters to the model
        model.set_params(**parametres)

        # Train the model on the fitting split
        model.fit(X_fit, y_fit)

        # Predict on the validation split to score this combination
        y_pred = model.predict(X_val)

        # Compute the accuracy of this combination
        precision = accuracy_score(y_val, y_pred)

        # Keep the combination only if it beats the best accuracy so far
        if precision > M_precision:
            M_precision = precision
            M_parametres = parametres
    return M_parametres

# Define model and hyperparameters
model = DecisionTreeClassifier()
param_grid = {
    "max_depth": [3, 5, 7, None],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 3, 4, 5],
}

# Perform random search
# Only the training data is passed in, so the test set stays untouched during tuning
Param_final = random_search(model, param_grid, X_train, y_train, n_iter=100)

# Train the model with the best hyperparameters found
# These two lines refit the model so that the optimal parameters are displayed in the output
model.set_params(**Param_final)
model.fit(X_train, y_train)


Training set:  (455, 30) (455,)
Testing set:  (114, 30) (114,)
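
Once the search has returned a combination, a natural follow-up is to measure the accuracy of the refitted model on the held-out test set. The sketch below is self-contained and assumes a hypothetical `best_params` dictionary; in practice this would be the dictionary returned by `random_search`. The fixed `random_state` on the tree is also an assumption, added for reproducibility.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical result of the search, hard-coded here for illustration only
best_params = {"max_depth": 5, "criterion": "gini", "min_samples_split": 2}

# Refit on the full training set with the chosen hyperparameters
final_model = DecisionTreeClassifier(random_state=0, **best_params)
final_model.fit(X_train, y_train)

# Score on data that played no part in fitting the final model
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```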
