# Hyper-Heuristic A.I. Workshop

Note for Event 1 by [Contributors](#Contributors)

## Contents

- [Coefficient of Determination](#Coefficient-of-Determination)

- [RMSLE](#RMSLE)

- [Overfitting](#Overfitting)

- [Hyperparameter](#Hyperparameter)

- [Bootstrap Aggregating](#Bootstrap-Aggregating)

- [Out-Of-Bag Error](#Out-Of-Bag-Error)

## Coefficient of Determination

#### Definition

- Denoted $R^2$


- Measures the proportion of the variance in the observed data that the model explains

#### Formula

- $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$


- $SS_{tot} = \sum_{i=1}^{n}(y_i-\bar{y})^2$


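- $SS_{reg} = \sum_{i=1}^{n}(p_i-\bar{y})^2$ (explained sum of squares; not needed to compute $R^2$, but for least-squares regression with an intercept $SS_{tot} = SS_{reg} + SS_{res}$)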


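- $SS_{res} = \sum_{i=1}^{n}(y_i-p_i)^2$

    - $y_i$ is the observed value

    - $p_i$ is the predicted value

    - $\bar{y}$ is the mean of the observed values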
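
The formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `r_squared` and the sample numbers are made up for the example.

```python
def r_squared(actual, predicted):
    """Return (SS_tot, SS_res, R^2) for two equal-length sequences."""
    mean_actual = sum(actual) / len(actual)
    # Total sum of squares: spread of the observations around their mean
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    # Residual sum of squares: what the predictions fail to explain
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return ss_tot, ss_res, 1 - ss_res / ss_tot


y = [3.0, 5.0, 7.0, 9.0]   # made-up observations
p = [2.5, 5.5, 7.0, 9.5]   # made-up predictions
ss_tot, ss_res, score = r_squared(y, p)
print(ss_tot, ss_res, score)
```

Here $SS_{tot} = 20$ and $SS_{res} = 0.75$, so $R^2 = 1 - 0.75/20 = 0.9625$.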


#### The range of $R^2$

- $R^2 \in (-\infty, 1]$


- If $SS_{res} > SS_{tot}$, then $R^2$ becomes negative


- $R^2 < 0$ means the model performs worse than always predicting the mean value $\bar{y}$
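
A small sketch of the three regimes of the range, using made-up data: a perfect fit, a model that always predicts the mean, and a deliberately bad model whose residuals exceed the total spread. The helper `r_squared` is a name chosen for this illustration.

```python
def r_squared(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = r_squared(y, y)                                # SS_res = 0
mean_only = r_squared(y, [3.0] * 5)                      # SS_res = SS_tot
reversed_fit = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])   # SS_res > SS_tot

print(perfect, mean_only, reversed_fit)  # 1.0 0.0 -3.0
```

The last case has $SS_{res} = 40$ against $SS_{tot} = 10$, which gives $R^2 = 1 - 4 = -3$.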

    
#### How to check that a higher $R^2$ is not caused by overfitting?

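- Adjusted $R^2$ (penalizes adding predictors that do not improve the fit)

- AIC (Akaike information criterion)

- BIC (Bayesian information criterion)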
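
As an illustration of the first option, the sketch below uses the standard textbook definition of adjusted $R^2$, $1 - (1 - R^2)\frac{n - 1}{n - k - 1}$, where $n$ is the number of samples and $k$ the number of predictors. The formula is not stated in these notes, and the function and argument names are assumptions made for the example.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2: shrinks R^2 as more predictors are used."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Same raw R^2 and sample size, but different numbers of predictors
few_features = adjusted_r_squared(0.90, n_samples=50, n_features=5)
many_features = adjusted_r_squared(0.90, n_samples=50, n_features=20)
print(few_features, many_features)
```

With the same raw $R^2$, the model with more predictors gets the lower adjusted score, which is the behaviour that makes it useful as an overfitting check.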

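#### How to use $R^2$ in business interpretation?

- To judge whether the model is good enough: $R^2$ is the share of the variation in the target that the model accounts for (e.g. $R^2 = 0.8$ means the model explains 80% of the variance)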

## RMSLE

#### RMSE

- RMSE $= \sqrt{MSE}$


- MSE $= \frac{1}{n}\sum_{t=1}^{n}(obs_t - pre_t)^2$


#### How to compute RMSLE?

- RMSLE stands for Root Mean Squared Logarithmic Error: it is RMSE computed on $\log(x + 1)$ of the predictions and the actual values


- RMSLE $= \sqrt{ \frac{1}{n} \sum_{i=1}^{n}{( \log(p_i + 1) - \log(a_i + 1) )^2} }$

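    - $n$ is the total number of observations in the data set
    
    - $p_i$ is the predicted value for observation $i$
    
    - $a_i$ is the actual value for observation $i$

    - $\log$ is conventionally the natural logarithm; adding 1 keeps the expression defined when a value is 0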
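
A minimal sketch of both metrics in plain Python, assuming the natural logarithm (via `math.log1p`, which computes $\log(1 + x)$). The function names and sample values are illustrative only.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def rmsle(actual, predicted):
    """RMSE applied to log(1 + x) of both inputs."""
    log_actual = [math.log1p(a) for a in actual]
    log_predicted = [math.log1p(p) for p in predicted]
    return rmse(log_actual, log_predicted)


a = [10.0, 200.0, 3000.0]   # made-up actual values
p = [12.0, 180.0, 3300.0]   # made-up predictions
print(rmse(a, p), rmsle(a, p))
```

Writing `rmsle` on top of `rmse` makes the relationship between the two explicit: only the inputs change, not the error computation.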

#### Why use $\log$?

- It compresses large values, so targets on very different scales become comparable


- It rescales the outputs so that the metric is less sensitive to small absolute differences (e.g. differences of tens or thousands when the outputs are in the millions)


- It makes the metric focus on the relative (ratio) error rather than the absolute difference
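
The ratio point can be checked numerically. The sketch below compares two made-up predictions that are both 10% too high, one on a small target and one on a large target; the helper name is an assumption for the example.

```python
import math


def squared_log_error(actual, predicted):
    return (math.log1p(predicted) - math.log1p(actual)) ** 2


# Both predictions overshoot by 10%, on very different scales
small_log = squared_log_error(100, 110)
large_log = squared_log_error(1_000_000, 1_100_000)

# The plain squared errors differ by eight orders of magnitude
small_abs = (110 - 100) ** 2
large_abs = (1_100_000 - 1_000_000) ** 2

print(small_log, large_log)
print(small_abs, large_abs)
```

The two squared log errors are nearly equal, while the plain squared errors are 100 and $10^{10}$, so under RMSE the large target would dominate the score.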

## Overfitting

- Low training error, but high testing error


- Random Forest helps reduce the overfitting that a single decision tree is prone to
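
A toy sketch of the symptom, using only the standard library. A 1-nearest-neighbour model stands in here for a fully grown decision tree (an assumption made to keep the example short): both can memorise the training set. The data is synthetic, generated as a noisy line.

```python
import random

random.seed(0)


def make_data(n):
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]   # noisy line
    return xs, ys


def predict_1nn(train_x, train_y, x):
    # Return the target of the closest training point (pure memorisation)
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(xs, ys, train_x, train_y):
    errors = [(y - predict_1nn(train_x, train_y, x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


train_x, train_y = make_data(30)
test_x, test_y = make_data(30)

train_mse = mse(train_x, train_y, train_x, train_y)   # each point finds itself
test_mse = mse(test_x, test_y, train_x, train_y)      # unseen points
print(train_mse, test_mse)
```

The training error is exactly zero because every training point is its own nearest neighbour, while the error on fresh points is not: the model has fitted the noise.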

## Hyperparameter

- Is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training.


- Cannot be learned directly from the data in the standard model training process and needs to be predefined.


- Can be chosen by trying different values, training a model for each, and keeping the values that perform best on held-out validation data


- Some examples of hyperparameters:


    - k in k-nearest neighbors

    - Number of clusters in a k-means clustering

    - Number of leaves or depth of a decision tree
    
    - Number of hidden layers in a deep neural network

    - Learning rate (in many models)


- We can tune the hyperparameters repeatedly to find the best-performing values.
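
The tuning loop described above can be sketched with k-nearest neighbours, where `k` is the hyperparameter. This is a minimal illustration on synthetic data: the candidate values, the train/validation split and all function names are assumptions made for the example.

```python
import random

random.seed(1)


def make_data(n):
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x, k):
    # Average the targets of the k closest training points
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def validation_mse(k, train, valid):
    train_x, train_y = train
    valid_x, valid_y = valid
    errors = [(y - knn_predict(train_x, train_y, x, k)) ** 2
              for x, y in zip(valid_x, valid_y)]
    return sum(errors) / len(errors)


train, valid = make_data(60), make_data(30)

# k is fixed before "training"; try several values and score each one
candidates = [1, 3, 5, 9, 15]
scores = {k: validation_mse(k, train, valid) for k in candidates}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

The validation set is kept separate from the training data so that the chosen `k` is not simply the one that memorises the training points best.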

## Bootstrap Aggregating

- Also called Bagging


- Is a machine learning ensemble algorithm designed to improve the stability and accuracy of machine learning algorithms used in classification and regression.


- Can also be used to reduce variance and help avoid overfitting


- Helps reduce the impact of outliers on the model, since many trees are trained on bootstrap samples and their predictions are combined
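
A minimal sketch of the two steps, bootstrapping and aggregating, in plain Python. To keep it short, a 1-nearest-neighbour predictor is used as the base learner instead of a decision tree; that substitution, the synthetic data and the number of models are all assumptions of the example.

```python
import random

random.seed(2)


def bootstrap_sample(xs, ys):
    # Draw len(xs) points with replacement
    idx = random.choices(range(len(xs)), k=len(xs))
    return [xs[i] for i in idx], [ys[i] for i in idx]


def predict_1nn(train_x, train_y, x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def bagged_predict(models, x):
    # Aggregate by averaging the predictions of all base learners
    preds = [predict_1nn(mx, my, x) for mx, my in models]
    return sum(preds) / len(preds)


xs = [float(i) for i in range(20)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

# Each "model" is a 1-NN learner defined by its own bootstrap sample
models = [bootstrap_sample(xs, ys) for _ in range(25)]
prediction = bagged_predict(models, 7.5)
print(prediction)
```

For classification the averaging step would typically be replaced by a majority vote.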

## Out-Of-Bag Error

- Also called OOB error


- A method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models that use bootstrap aggregating (bagging) to sub-sample the training data


- Computed as the mean prediction error on each training sample $x_i$, using only the trees that did not have $x_i$ in their bootstrap sample


- The OOB error can serve as a validation estimate for a bagged model, so a separate cross-validation run is often unnecessary.


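- In a random forest model, the OOB score ($R^2$ on the out-of-bag samples) is usually lower than the $R^2$ on the training set, because each OOB prediction comes only from trees that never saw that sample.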
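
The definition above can be sketched in plain Python. As a simplification, the base learner is a 1-nearest-neighbour predictor rather than a tree, and the error is reported as a mean squared error; both choices, along with the synthetic data, are assumptions of this example.

```python
import random

random.seed(3)

n = 30
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

# Each bag is a bootstrap sample, stored as a list of training indices
n_models = 50
bags = [random.choices(range(n), k=n) for _ in range(n_models)]


def predict_1nn(bag, x):
    nearest = min(bag, key=lambda i: abs(xs[i] - x))
    return ys[nearest]


squared_errors = []
for i in range(n):
    # Only use the models whose bootstrap sample did NOT contain sample i
    oob_bags = [bag for bag in bags if i not in bag]
    if not oob_bags:
        continue   # sample i appeared in every bag; it has no OOB prediction
    pred = sum(predict_1nn(bag, xs[i]) for bag in oob_bags) / len(oob_bags)
    squared_errors.append((ys[i] - pred) ** 2)

oob_mse = sum(squared_errors) / len(squared_errors)
print(oob_mse)
```

Each sample is left out of roughly a third of the bootstrap samples, so every point is scored by models that never trained on it. In scikit-learn, `RandomForestRegressor(oob_score=True)` exposes a comparable estimate as `oob_score_`, reported as $R^2$ rather than as an error.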

## Contributors

*The list is sorted by last name, first name, and nickname.*