# Hyperparameter Tuning

When building machine learning models, we often do not know in advance which architecture to choose or which parameter values will give the best-performing model. Because there are usually far too many combinations to try by hand, we typically let the computer perform this search automatically.

The parameters we use to define a model's structure and training behavior are called **hyperparameters**. The process of letting the computer automatically select the best values for these parameters is called **hyperparameter tuning**.

Selecting the best **hyperparameters** helps answer model design questions such as:

- How many trees should you use in a random forest classifier?
- How many layers should your neural network have?
- How many neurons should each layer of your neural network have?
- What should the learning rate be for gradient descent?
- What should **K** be in your **KNN** model?
- What should the maximum depth of a decision tree be?
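In scikit-learn, each of these design questions maps to a constructor argument that is fixed before training starts. The sketch below assumes scikit-learn's `RandomForestClassifier`, `MLPClassifier`, `KNeighborsClassifier` and `DecisionTreeClassifier`; the values are arbitrary placeholders, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Each hyperparameter is chosen up front, before the model sees any data.
# The values are arbitrary placeholders that only show where each knob lives.
forest = RandomForestClassifier(n_estimators=100)   # number of trees
mlp = MLPClassifier(hidden_layer_sizes=(16, 8),     # 2 hidden layers with 16 and 8 neurons
                    learning_rate_init=0.001)       # initial learning rate of the optimizer
knn = KNeighborsClassifier(n_neighbors=5)           # the K in KNN
tree = DecisionTreeClassifier(max_depth=3)          # maximum depth of the tree

# get_params() lists the hyperparameters an estimator was configured with.
print(knn.get_params()["n_neighbors"])   # 5
print(tree.get_params()["max_depth"])    # 3
```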

## Hyperparameters and Model Parameters

**Hyperparameters** are not **model parameters**. Model parameters are learned during training, and they specify how to transform the input data into the desired output. Hyperparameters, in contrast, are set before training and define how the model is structured and how it is trained.

For example, in a neural network the weights and biases are the **model parameters**, while the total number of layers and the number of neurons in each layer, which define the structure of the network, are **hyperparameters**.
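The distinction can be seen directly on a small scikit-learn `MLPClassifier` trained on the iris data. This is a minimal sketch: the single hidden layer of 8 neurons, `max_iter=2000` and `random_state=0` are arbitrary choices for illustration. The hyperparameters are what we pass in; the weights (`coefs_`) and biases (`intercepts_`) only exist after `fit()`.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training (one hidden layer of 8 neurons).
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)

with warnings.catch_warnings():
    # A small network may stop before fully converging; that is not the point here.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp.fit(X, y)

# Model parameters: learned during fit(). There is one weight matrix and one
# bias vector per connection between consecutive layers.
for i, (W, b) in enumerate(zip(mlp.coefs_, mlp.intercepts_)):
    print(f"layer {i}: weights {W.shape}, biases {b.shape}")
# layer 0: weights (4, 8), biases (8,)   <- 4 input features -> 8 hidden neurons
# layer 1: weights (8, 3), biases (3,)   <- 8 hidden neurons -> 3 classes
```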

There is usually no way to calculate the best hyperparameter configuration directly. Instead, we find it through experimentation: trial and error with the aim of figuring out what works best for our problem. A tuning run typically involves the following steps:

- Define a model
- Set the range of possible values for all hyperparameters
- Define a method for sampling hyperparameter values
- Define an evaluation criterion to judge model performance
- Define a cross-validation method
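The five steps above can be written out by hand. This is a minimal sketch, assuming a KNN classifier on the iris dataset and an arbitrary search space over `n_neighbors` and `weights`; it is essentially what `GridSearchCV` automates later in this notebook.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 1. Define a model: a KNN classifier, configured per candidate below.
# 2. Set the range of possible values for each hyperparameter (arbitrary here).
search_space = {"n_neighbors": [1, 3, 5, 7, 9], "weights": ["uniform", "distance"]}

# 3. Sampling method: try every combination (an exhaustive grid).
candidates = [dict(zip(search_space, values)) for values in product(*search_space.values())]

# 4. Evaluation criterion: accuracy.
# 5. Cross-validation method: stratified 5-fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = []
for params in candidates:
    scores = cross_val_score(KNeighborsClassifier(**params), X, y, cv=cv, scoring="accuracy")
    results.append((scores.mean(), params))

# Keep the candidate with the highest mean cross-validated accuracy.
best_score, best_params = max(results, key=lambda r: r[0])
print(best_params, round(best_score, 3))
```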

## Data Splitting And Model Evaluation

The main reason for splitting data into different sets is that, in machine learning, we want the model to learn from examples and generalize well to unseen data. Training and testing a model on the same dataset does not give an accurate picture of its accuracy and performance, because the model can simply memorize the examples it was trained on.

To avoid this, we split the data into training, validation and testing sets. Withholding data in this way prevents **data leakage**, where the data used to evaluate the model has also influenced its training (in the extreme case, the model is trained and tested on the same data).

### Training And Testing Split

When we are only training and testing a model, a simple train/test split works well, because we withhold some data the model has not seen and use it for evaluation. With **hyperparameter tuning**, however, we also need to evaluate each candidate configuration's ability to generalize to unseen data while we are still choosing between candidates.

If we use the test data for this evaluation, we end up indirectly fitting the model to the test data as well, since the hyperparameters are chosen to score well on it. This is another form of **data leakage**.

### Training, Validation And Testing Split

To avoid this kind of data leakage, we need an additional dataset, the validation set, on which to evaluate the candidate models during tuning, before the **optimized** model is finally evaluated once on the test dataset.


- **Training dataset**: optimizes the model's parameters
- **Validation dataset**: optimizes the model's architecture (its hyperparameters)
- **Testing dataset**: evaluates the optimized model
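The three roles can be sketched with two calls to `train_test_split`. This assumes a 60/20/20 split and uses a decision tree's `max_depth` as the single hyperparameter being tuned; both are arbitrary choices for illustration. The test set is touched only once, after the hyperparameter has been chosen on the validation set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# First carve off the test set (20%), then split the remainder into
# training (75% of it) and validation (25% of it): 60/20/20 overall.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30

# The training set fits the model parameters; the validation set picks the hyperparameter.
val_scores = {}
for depth in [1, 2, 3, 4, 5]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    val_scores[depth] = model.score(X_val, y_val)
best_depth = max(val_scores, key=val_scores.get)

# The test set is used exactly once, on the configuration chosen via validation.
final_model = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
print(best_depth, test_accuracy)
```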

There are also more advanced techniques that achieve the same goal, such as **K-Fold** cross-validation, which repeatedly splits the training data into training and validation folds so that the model can be tuned and evaluated without introducing data leakage.
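A minimal sketch of how K-Fold rotates the validation role, assuming scikit-learn's `StratifiedKFold` with 5 shuffled folds (the same splitter used later, minus the shuffling). The test set is split off first and never enters the folds.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True)
# Hold out the test set first; K-Fold only ever sees the training portion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
seen = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(X_train, y_train)):
    # Each fold trains on 4/5 of the training data and validates on the other 1/5.
    print(f"fold {fold}: train={len(train_idx)}, validate={len(val_idx)}, "
          f"class counts in validation={np.bincount(y_train[val_idx])}")
    seen.extend(val_idx)

# Every training sample is used for validation exactly once across the folds.
print(np.array_equal(np.sort(seen), np.arange(len(X_train))))  # True
```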

## Hyperparameter Search Methods

Hyperparameter tuning is the process of sampling candidate model architectures from the space of possible hyperparameter values and picking the best one. There are a few common ways to perform this search:

1. Grid Search
2. Random Search
3. Bayesian Optimization
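The difference between the first two methods can be sketched with the standard library alone. This assumes a small discrete space (a subset of the random forest grid used later in this notebook): grid search enumerates the full Cartesian product, while random search evaluates only a fixed budget of randomly drawn combinations. Real random search can also draw from continuous distributions. Bayesian optimization is not sketched here; it replaces the random draw with a choice guided by a surrogate model of the scores observed so far.

```python
import itertools
import random

# A small discrete search space (a subset of the grid used later).
space = {
    "criterion": ["gini", "entropy"],
    "n_estimators": [10, 100, 150, 200],
    "min_samples_leaf": [1, 2, 4, 6],
}

# Grid search: every combination in the Cartesian product.
grid = [dict(zip(space, combo)) for combo in itertools.product(*space.values())]

# Random search: only a fixed budget of distinct combinations, drawn at random.
rng = random.Random(0)
budget = 5
random_candidates = rng.sample(grid, k=budget)

print(len(grid))               # 32  (2 * 4 * 4)
print(len(random_candidates))  # 5
```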

## Grid Search Using Sklearn Python

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
```

```python
iris_dataset = load_iris()
iris_dataset
```

```text
{'data': array([[5.1, 3.5, 1.4, 0.2],
        [4.9, 3. , 1.4, 0.2],
        [4.7, 3.2, 1.3, 0.2],
        [4.6, 3.1, 1.5, 0.2],
        [5. , 3.6, 1.4, 0.2],
        [5.4, 3.9, 1.7, 0.4],
        [4.6, 3.4, 1.4, 0.3],
        [5. , 3.4, 1.5, 0.2],
        [4.4, 2.9, 1.4, 0.2],
        [4.9, 3.1, 1.5, 0.1],
        [5.4, 3.7, 1.5, 0.2],
        [4.8, 3.4, 1.6, 0.2],
        [4.8, 3. , 1.4, 0.1],
        [4.3, 3. , 1.1, 0.1],
        [5.8, 4. , 1.2, 0.2],
        [5.7, 4.4, 1.5, 0.4],
        [5.4, 3.9, 1.3, 0.4],
        [5.1, 3.5, 1.4, 0.3],
        [5.7, 3.8, 1.7, 0.3],
        [5.1, 3.8, 1.5, 0.3],
        [5.4, 3.4, 1.7, 0.2],
        [5.1, 3.7, 1.5, 0.4],
        [4.6, 3.6, 1. , 0.2],
        [5.1, 3.3, 1.7, 0.5],
        [4.8, 3.4, 1.9, 0.2],
        [5. , 3. , 1.6, 0.2],
        [5. , 3.4, 1.6, 0.4],
        [5.2, 3.5, 1.5, 0.2],
        [5.2, 3.4, 1.4, 0.2],
        [4.7, 3.2, 1.6, 0.2],
        [4.8, 3.1, 1.6, 0.2],
        [5.4, 3.4, 1.5, 0.4],
        [5.2, 4.1, 1.5, 0.1],
```

```python
df = pd.DataFrame(iris_dataset.data, columns=iris_dataset.feature_names)
df.head()
```

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


```python
df["target"] = iris_dataset.target
```

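```python
df.head()
```

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |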
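As an alternative to building the DataFrame by hand, scikit-learn 0.23 and newer can return the iris data as pandas objects directly. A minimal sketch, assuming that version or later: with `as_frame=True`, the `.frame` attribute holds the four feature columns plus a `target` column, equivalent to the `df` built above.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects; .frame holds the features plus "target".
iris = load_iris(as_frame=True)
df = iris.frame
print(df.shape)          # (150, 5)
print(list(df.columns))
```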


### X-y split

```python
X = df.drop("target", axis = 1)
y = df["target"]
```

```python
X.head()
```

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


```python
y.head()
```

```text
0    0
1    0
2    0
3    0
4    0
Name: target, dtype: int64
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=90)
```

```python
X_train.shape
```

```text
(120, 4)
```

```python
X_test.shape
```

```text
(30, 4)
```
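The split above does not pass `stratify`, so the class proportions in the training and test sets depend on the random draw. An optional tweak, sketched here with scikit-learn's `stratify` argument and the same `random_state=90`, keeps the three iris classes perfectly balanced in both splits.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)

# stratify=y keeps the class proportions identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=90, stratify=y
)
print(y_train.value_counts().sort_index().tolist())  # [40, 40, 40]
print(y_test.value_counts().sort_index().tolist())   # [10, 10, 10]
```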

### Define A Model

```python
rfc = RandomForestClassifier()
```

### Set the range of possible values for all hyperparameters

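```python
parameters = {'criterion': ['gini', 'entropy'],
              'n_estimators': [10, 100, 150, 200],
              'min_samples_leaf': [1, 2, 4, 6],
              # 'auto' was deprecated in scikit-learn 1.1 and removed in 1.3
              # (for classifiers it meant the same as 'sqrt'); drop it on newer versions.
              'max_features': ['auto', 'sqrt', 'log2']
             }
```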
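It helps to know how much work this grid implies before running it. A minimal sketch using scikit-learn's `ParameterGrid`, which only enumerates combinations and does not validate the values, so `'auto'` is harmless here: the grid has 2 × 4 × 4 × 3 = 96 candidates, and with 5-fold cross-validation that is 480 model fits, plus one final refit of the best candidate.

```python
from sklearn.model_selection import ParameterGrid

parameters = {'criterion': ['gini', 'entropy'],
              'n_estimators': [10, 100, 150, 200],
              'min_samples_leaf': [1, 2, 4, 6],
              'max_features': ['auto', 'sqrt', 'log2']}

n_candidates = len(ParameterGrid(parameters))  # number of combinations
n_splits = 5                                   # folds used by StratifiedKFold below
print(n_candidates)             # 96
print(n_candidates * n_splits)  # 480 fits during the search
```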

```python
# 5 folds; each fold keeps the class proportions of the training data.
kfold = StratifiedKFold(n_splits = 5)
```

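```python
# Evaluate every combination in `parameters` by mean accuracy across the 5 folds,
# using 2 parallel jobs.
clf = GridSearchCV(rfc, parameters, scoring = 'accuracy', n_jobs = 2, cv=kfold)
```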

```python
clf.fit(X_train, y_train)
```

```text
GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=None, shuffle=False),
             estimator=RandomForestClassifier(), n_jobs=2,
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_features': ['auto', 'sqrt', 'log2'],
                         'min_samples_leaf': [1, 2, 4, 6],
                         'n_estimators': [10, 100, 150, 200]},
             scoring='accuracy')
```

```python
clf.get_params()
```

```text
{'cv': StratifiedKFold(n_splits=5, random_state=None, shuffle=False),
 'error_score': nan,
 'estimator__bootstrap': True,
 'estimator__ccp_alpha': 0.0,
 'estimator__class_weight': None,
 'estimator__criterion': 'gini',
 'estimator__max_depth': None,
 'estimator__max_features': 'auto',
 'estimator__max_leaf_nodes': None,
 'estimator__max_samples': None,
 'estimator__min_impurity_decrease': 0.0,
 'estimator__min_samples_leaf': 1,
 'estimator__min_samples_split': 2,
 'estimator__min_weight_fraction_leaf': 0.0,
 'estimator__n_estimators': 100,
 'estimator__n_jobs': None,
 'estimator__oob_score': False,
 'estimator__random_state': None,
 'estimator__verbose': 0,
 'estimator__warm_start': False,
 'estimator': RandomForestClassifier(),
 'n_jobs': 2,
 'param_grid': {'criterion': ['gini', 'entropy'],
  'n_estimators': [10, 100, 150, 200],
  'min_samples_leaf': [1, 2, 4, 6],
  'max_features': ['auto', 'sqrt', 'log2']},
 'pre_dispatch': '2*n_jobs',
 'refit': True,
 'return_train_score': False,
```

```python
clf.best_params_
```

```text
{'criterion': 'gini',
 'max_features': 'sqrt',
 'min_samples_leaf': 1,
 'n_estimators': 10}
```

```python
clf.best_score_
```

```text
0.9666666666666666
```
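`best_score_` is only the top row of a larger table. A minimal sketch of inspecting every candidate through `cv_results_`, using a deliberately small, hypothetical grid so it runs quickly; the column names (`params`, `mean_test_score`, `std_test_score`, `rank_test_score`) are the ones scikit-learn produces for a single scoring metric.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=90)

# A deliberately small grid (4 candidates) so the sketch runs quickly.
small_grid = {"n_estimators": [10, 50], "min_samples_leaf": [1, 4]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), small_grid,
    scoring="accuracy", cv=StratifiedKFold(n_splits=5),
)
search.fit(X_train, y_train)

# One row per candidate; the row with rank_test_score == 1 gives best_score_.
results = pd.DataFrame(search.cv_results_)
print(results[["params", "mean_test_score", "std_test_score", "rank_test_score"]]
      .sort_values("rank_test_score"))
```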

### Make a prediction with the best parameters found

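```python
# Pass a one-row DataFrame (double brackets) rather than a bare NumPy array,
# so the feature names match those seen during fit and no warning is raised.
clf.predict(X.iloc[[0]])
```

```text
array([0])
```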
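Calling `predict` on the search object works because, with the default `refit=True`, `GridSearchCV` retrains the best candidate on the whole training set and stores it as `best_estimator_`. A minimal sketch with a small, hypothetical one-parameter grid confirms that the two give identical predictions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=90)

# refit=True is the default, so the best candidate is retrained on all of X_train.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"n_estimators": [10, 50]}, cv=5)
search.fit(X_train, y_train)

best_model = search.best_estimator_
print(best_model)

# search.predict(...) delegates to best_estimator_.predict(...).
same = np.array_equal(search.predict(X_test), best_model.predict(X_test))
print(same)  # True
```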

### Model Test Score

```python
clf.score(X_test, y_test)
```

```text
0.9666666666666667
```

## Random Search Using Sklearn Python

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
```

```python
iris_dataset = load_iris()
```

```python
df = pd.DataFrame(data = iris_dataset.data, columns=iris_dataset.feature_names)
df.head()
```

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


```python
df["target"] = iris_dataset.target
df.head()
```

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


```python
X = df.drop("target", axis=1)
y = df["target"]
```

```python
# No random_state here, so the split (and the scores below) will vary between runs.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

```python
X_train.shape
```

```text
(120, 4)
```

```python
X_test.shape
```

```text
(30, 4)
```

### Define A Model

```python
ran_forest = RandomForestClassifier()
```

### Set the range of possible values for all hyperparameters

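```python
parameters = {'criterion': ['gini', 'entropy'],
              'n_estimators': [10, 100, 150, 200],
              'min_samples_leaf': [1, 2, 4, 6],
              # 'auto' was deprecated in scikit-learn 1.1 and removed in 1.3
              # (for classifiers it meant the same as 'sqrt'); drop it on newer versions.
              'max_features': ['auto', 'sqrt', 'log2']
             }
```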

```python
kfold = StratifiedKFold(n_splits = 5)
```

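```python
# Samples n_iter=10 combinations (the default) instead of trying all of them.
# No random_state is set, so the sampled candidates differ between runs.
clf = RandomizedSearchCV(ran_forest, parameters, scoring = 'accuracy', n_jobs = 2, cv=kfold)
```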
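Random search becomes more useful when integer or continuous hyperparameters are given as distributions rather than short lists, so that each draw can land on a value a grid would never contain. A minimal sketch, assuming SciPy's `randint` (SciPy is a scikit-learn dependency) and arbitrary ranges; `random_state` on the search makes the sampled candidates reproducible.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Integer ranges as distributions; lists are still allowed and sampled uniformly.
param_distributions = {
    "n_estimators": randint(10, 101),    # any integer in [10, 100]
    "min_samples_leaf": randint(1, 7),   # any integer in [1, 6]
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=10,           # number of sampled candidates
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5),
    random_state=0,      # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```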

```python
clf.fit(X_train, y_train)
```

```text
RandomizedSearchCV(cv=StratifiedKFold(n_splits=5, random_state=None, shuffle=False),
                   estimator=RandomForestClassifier(), n_jobs=2,
                   param_distributions={'criterion': ['gini', 'entropy'],
                                        'max_features': ['auto', 'sqrt',
                                                         'log2'],
                                        'min_samples_leaf': [1, 2, 4, 6],
                                        'n_estimators': [10, 100, 150, 200]},
                   scoring='accuracy')
```

```python
clf.get_params()
```

```text
{'cv': StratifiedKFold(n_splits=5, random_state=None, shuffle=False),
 'error_score': nan,
 'estimator__bootstrap': True,
 'estimator__ccp_alpha': 0.0,
 'estimator__class_weight': None,
 'estimator__criterion': 'gini',
 'estimator__max_depth': None,
 'estimator__max_features': 'auto',
 'estimator__max_leaf_nodes': None,
 'estimator__max_samples': None,
 'estimator__min_impurity_decrease': 0.0,
 'estimator__min_samples_leaf': 1,
 'estimator__min_samples_split': 2,
 'estimator__min_weight_fraction_leaf': 0.0,
 'estimator__n_estimators': 100,
 'estimator__n_jobs': None,
 'estimator__oob_score': False,
 'estimator__random_state': None,
 'estimator__verbose': 0,
 'estimator__warm_start': False,
 'estimator': RandomForestClassifier(),
 'n_iter': 10,
 'n_jobs': 2,
 'param_distributions': {'criterion': ['gini', 'entropy'],
  'n_estimators': [10, 100, 150, 200],
  'min_samples_leaf': [1, 2, 4, 6],
  'max_features': ['auto', 'sqrt', 'log2']},
 'pre_dispatch': '2*n_jobs',
 'random_state': None,
```

```python
clf.best_params_
```

```text
{'n_estimators': 10,
 'min_samples_leaf': 2,
 'max_features': 'log2',
 'criterion': 'entropy'}
```

```python
clf.best_score_
```

```text
0.9666666666666666
```

```python
clf.score(X_test, y_test)
```

```text
0.9333333333333333
```

## References

- [Jeremy Jordan, *Hyperparameter tuning for machine learning models*](https://www.jeremyjordan.me/hyperparameter-tuning/)