<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="figures/MLPG-Book-Cover-Small.png"><br>

This notebook contains an excerpt from the **`Machine Learning Project Guidelines - For Beginners`** book written by *Balasubramanian Chandran*; the content is available [on GitHub](https://github.com/BalaChandranGH/Books/ML-Project-Guidelines).

<br>
<!--NAVIGATION-->

<[ [Stage-6: Model Training](12.00-mlpg-Stage-6-Model-Training.ipynb) | [Contents and Acronyms](00.00-mlpg-Contents-and-Acronyms.ipynb) | [Stage-8: Model Evaluation](14.00-mlpg-Stage-8-Model-Evaluation.ipynb) ]>

# 13. Stage-7: Model Refinement

The ML team owns full responsibility for this stage, which comprises two main activities:
  1. Based on the metrics generated at Stage 6, the models are compared and initial selections are made
  2. The selected models are refined to improve their performance using _`Training datasets`_

## 13.1. Hyperparameters optimization
### 13.1.1. Differences between Model Parameters and Model Hyperparameters

![](figures/MLPG-DiffParamsHyperparams.png)

### 13.1.2. Hyperparameters-Tuning for Classification Algorithms
`A model is a hypothesis and its hyperparameters allow us to tailor the hypothesis (i.e., the behavior of the algorithm) to a specific dataset.`
* The more hyperparameters of an algorithm that one needs to tune, the slower the tuning process is. Therefore, it is desirable to select a minimum subset of model hyperparameters to search or tune
* Not all model hyperparameters are equally important. Some hyperparameters have an outsized effect on the behavior, and in turn, the performance of an ML algorithm
* As an ML practitioner, one must know which hyperparameters to focus on to get a good result quickly

The following table summarizes hyperparameter-tuning suggestions for seven classification algorithms. Note that the lists of algorithms and hyperparameters are not exhaustive; they are only examples.

![](figures/MLPG-HyperparamsTuning.png)

### 13.1.3. Hyperparameters Optimization with Random Search and Grid Search
#### 13.1.3.1. Scikit-Learn API for Hyperparameter Optimization
* Scikit-learn provides the _`RandomizedSearchCV`_ class for random search and the _`GridSearchCV`_ class for grid search
* Both techniques evaluate models for a given hyperparameter vector using cross-validation, hence the “CV” suffix of each class name
* Both classes require two arguments: the `Model` to optimize and the `Search space`. The search space is defined as a `dictionary` whose keys are the names of the model's hyperparameter arguments and whose values are lists of discrete values (grid search) or, for a random search, lists or distributions of values to sample from
* Use _cross-validation_ objects for model evaluation:
  - Use _`RepeatedStratifiedKFold`_ class for the **Classification** tasks
  - Use _`RepeatedKFold`_ class for the **Regression** tasks
* Both hyperparameter optimization classes also provide a _`scoring`_ argument that takes a string naming the metric to optimize. The search always maximizes this score, so larger values must mean better models
  - For classification, this is a positive measure, such as _`accuracy`_, where better models produce larger scores
  - For regression, this is a negated error measure, such as _`neg_mean_absolute_error`_ (the mean absolute error multiplied by -1), so that values closer to zero represent less prediction error by the model
* The search can be run in parallel by setting the _`n_jobs`_ argument to the number of CPU cores to use, e.g. 8, or to -1 to automatically use all of the cores in your system
* Once defined, the search is performed by calling the _`fit()`_ function and providing a dataset used to train and evaluate model hyperparameter combinations using cross-validation
* Running the search may take minutes or hours, depending on the size of the search space and the speed of your hardware. You’ll often want to tailor the search to how much time you have rather than the possibility of what could be searched
* At the end of the search, you can access all of the results via attributes on the class. Perhaps the most important attributes are the **best score** observed and the **hyperparameters** that achieved the best score
* Once you know the set of hyperparameters that achieve the best result, you can define a new model, set the value of each hyperparameter, and then fit the model on all available training data. This model can then be used to make predictions on new data
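* _`GridSearchCV`_ evaluates every combination in the grid. It is thorough within the grid, but its cost grows multiplicatively with each hyperparameter or value added, so it suits small, discrete search spaces
* _`RandomizedSearchCV`_ evaluates a fixed number of sampled combinations (set with _`n_iter`_), so its cost is easy to control. It can sample from continuous distributions and often finds comparably good settings in fewer evaluations when only a few hyperparameters matter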
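* In addition to the hyperparameters mentioned in the above table, the _`gamma`_ hyperparameter (the kernel coefficient for the `rbf`, `poly`, and `sigmoid` kernels) can also be tuned for SVMs. The _`alpha`_ hyperparameter (regularization strength) applies when a linear SVM is trained with _`SGDClassifier`_ using the hinge loss; _`SVC`_ itself has no _`alpha`_ argument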
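
The cross-validation objects and the sign convention of the `scoring` argument described above can be sketched as follows. This is a minimal illustration, not part of the original notebook: the synthetic datasets, the choice of `LogisticRegression` and `Ridge`, and all variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import (
    RepeatedKFold,
    RepeatedStratifiedKFold,
    cross_val_score,
)

# Synthetic data, assumed here purely for illustration
X_clf, y_clf = make_classification(n_samples=200, n_features=10, random_state=1)
X_reg, y_reg = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=1)

# Classification: stratified folds keep the class balance in every split
cv_clf = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)
# Regression: plain repeated k-fold
cv_reg = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)

# "accuracy" is a positive measure: larger is better
acc_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_clf, y_clf, scoring="accuracy", cv=cv_clf
)

# "neg_mean_absolute_error" is the MAE multiplied by -1: closer to zero is better
mae_scores = cross_val_score(
    Ridge(), X_reg, y_reg, scoring="neg_mean_absolute_error", cv=cv_reg
)

print("Mean accuracy: %.3f" % acc_scores.mean())
print("Mean MAE: %.3f" % -mae_scores.mean())
```

Each call returns one score per fold (5 splits repeated 2 times gives 10 scores). The same `cv` objects and `scoring` strings can be passed to `GridSearchCV` and `RandomizedSearchCV`.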

#### 13.1.3.2. Random Search for Classification (sample code)
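The original code cell for this section is not included in this excerpt. The following is a minimal sketch of a random search for a classification task, under stated assumptions: the synthetic dataset, the `LogisticRegression` model, the search space, and `n_iter=20` are illustrative choices, not the author's settings.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold

# Synthetic binary classification data (assumed for illustration)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=1)

# Model to tune
model = LogisticRegression(max_iter=1000)

# Search space: a discrete list and a log-uniform distribution to sample from
space = {
    "solver": ["lbfgs", "liblinear"],
    "C": loguniform(1e-5, 100),
}

# Cross-validation object for classification
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)

# n_iter sets how many random combinations are evaluated;
# n_jobs=1 runs serially, use -1 to run on all CPU cores
search = RandomizedSearchCV(
    model, space, n_iter=20, scoring="accuracy", cv=cv, n_jobs=1, random_state=1
)
result = search.fit(X, y)

print("Best score: %.3f" % result.best_score_)
print("Best hyperparameters:", result.best_params_)
```

The best cross-validated score and the hyperparameters that achieved it are available as `best_score_` and `best_params_` after `fit()` completes.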

#### 13.1.3.3. Grid Search for Classification (sample code)
* Using the grid search is much like using the random search for classification
* The main difference is that the search space must be a discrete grid to be searched. This means that instead of using a log-uniform distribution for C, we can specify discrete values on a log scale
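
The original code cell for this section is not included in this excerpt. A minimal sketch of the grid-search variant is shown here, with the same illustrative assumptions as a random search would use (synthetic data, `LogisticRegression`), except that `C` is given as discrete values on a log scale.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Synthetic binary classification data (assumed for illustration)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=1)

model = LogisticRegression(max_iter=1000)

# Search space: every value is a discrete list; C is spaced on a log scale
space = {
    "solver": ["lbfgs", "liblinear"],
    "C": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100],
}

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)

# Every combination in the grid is evaluated: 2 solvers x 8 values of C = 16
search = GridSearchCV(model, space, scoring="accuracy", cv=cv, n_jobs=1)
result = search.fit(X, y)

print("Best score: %.3f" % result.best_score_)
print("Best hyperparameters:", result.best_params_)
```

Unlike the random search, there is no `n_iter` argument: the number of candidates is fixed by the size of the grid.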

#### 13.1.3.4. Random Search for Regression (sample code)
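The original code cell for this section is not included in this excerpt. The sketch below shows one way a random search for a regression task might look. The synthetic dataset, the `Ridge` model, and the search space are assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV, RepeatedKFold

# Synthetic regression data (assumed for illustration)
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=1
)

model = Ridge()

# Search space: regularization strength sampled on a log scale
space = {
    "alpha": loguniform(1e-5, 100),
    "fit_intercept": [True, False],
}

# Cross-validation object for regression
cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)

# The score is the negated MAE, so the search maximizes it toward zero
search = RandomizedSearchCV(
    model,
    space,
    n_iter=20,
    scoring="neg_mean_absolute_error",
    cv=cv,
    n_jobs=1,
    random_state=1,
)
result = search.fit(X, y)

# Flip the sign to report the MAE as a positive error
print("Best MAE: %.3f" % -result.best_score_)
print("Best hyperparameters:", result.best_params_)
```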

#### 13.1.3.5. Grid Search for Regression (sample code)
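The original code cell for this section is not included in this excerpt. As a minimal sketch, a grid search for a regression task could be written as follows; the synthetic dataset, the `Ridge` model, and the grid values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic regression data (assumed for illustration)
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=10.0, random_state=1
)

model = Ridge()

# Search space: discrete alpha values on a log scale
space = {
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100],
    "fit_intercept": [True, False],
}

cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=1)

# All 8 x 2 = 16 combinations are evaluated with the negated MAE as the score
search = GridSearchCV(
    model, space, scoring="neg_mean_absolute_error", cv=cv, n_jobs=1
)
result = search.fit(X, y)

print("Best MAE: %.3f" % -result.best_score_)
print("Best hyperparameters:", result.best_params_)
```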

**IMPORTANT NOTE**: _To prevent DATA LEAKAGE, split the data into training & testing datasets first, then apply hyperparameter tuning to the training dataset only, using cross-validation. In other words, DO NOT apply algorithm fine-tuning techniques before splitting the datasets into training & testing datasets, and do not use the testing dataset during tuning; it is reserved for evaluation in Stage 8._
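
One way to keep the testing dataset out of the tuning process is to split first and run the search on the training portion only. The sketch below assumes a synthetic dataset, an `SVC` model with an `rbf` kernel, and a small grid over `C` and `gamma`; these choices and the variable names are illustrative, not the author's.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    train_test_split,
)
from sklearn.svm import SVC

# Synthetic binary classification data (assumed for illustration)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=1)

# 1. Split BEFORE any tuning; the test set is not touched during the search
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

# 2. Tune with cross-validation on the training data only
space = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=1)
search = GridSearchCV(SVC(kernel="rbf"), space, scoring="accuracy", cv=cv, n_jobs=1)
search.fit(X_train, y_train)

# 3. Define a new model with the best hyperparameters and fit it on the training data
final_model = SVC(kernel="rbf", **search.best_params_)
final_model.fit(X_train, y_train)

# 4. The held-out test set is used once, for the final evaluation
test_accuracy = final_model.score(X_test, y_test)
print("Best hyperparameters:", search.best_params_)
print("Test accuracy: %.3f" % test_accuracy)
```

Because the search never sees `X_test`, the final accuracy is an estimate on data that played no role in choosing the hyperparameters.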

## 13.2. Deliverables from Stage-7
* Selected & refined models ready for evaluation using test datasets
* Evaluation Metrics

## 13.3. Notebook development tips
