# Machine Learning Intro

Machine learning is the scientific study of algorithms and statistical models that perform a specific task effectively without using explicit instructions, relying on patterns learned from data instead. Machine learning algorithms are broadly divided into supervised and unsupervised algorithms.

## Supervised Learning 
In supervised learning, the target (label) is known for every training example, and the model learns to predict it from the input features.

- **Classification**: the target variable is categorical.
- **Regression**: the target variable is continuous.
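
To make the distinction concrete, here is a minimal sketch in plain Python with made-up toy data. Both tasks learn from labeled examples: the regression part fits a straight line by least squares to a continuous target, and the classification part uses a nearest-centroid rule for a categorical target. The function names (`fit_line`, `fit_centroids`, `predict_class`) and the data are hypothetical, and these two simple models were chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope

def fit_centroids(xs, labels):
    """Return the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_class(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Regression: the target is a continuous number.
intercept, slope = fit_line([1, 2, 3, 4], [3.0, 5.0, 7.0, 9.0])
print(intercept, slope)  # 1.0 2.0

# Classification: the target is a category.
centroids = fit_centroids([1.0, 1.5, 8.0, 9.0], ["small", "small", "large", "large"])
print(predict_class(centroids, 2.0))  # small
```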


## Unsupervised Learning 
In unsupervised learning, there is no known target; the model has to discover structure in the data (such as groups or associations) on its own.

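- **Clustering**: grouping similar observations, e.g. customer segmentation.
- **Association**: finding items or events that frequently occur together, e.g. market basket analysis.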
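
As an illustration of clustering, the sketch below runs k-means (one common clustering algorithm; these notes do not prescribe a specific one) on hypothetical one-dimensional customer-spend values. No target is supplied: the groups emerge from the data alone. The data, the starting centroids and the helper name `kmeans_1d` are assumptions made for the example.

```python
def kmeans_1d(values, centroids, n_iter=10):
    """Lloyd's k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(n_iter):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical annual spend per customer; note that no labels are given.
spend = [120, 150, 130, 900, 950, 1000, 400, 420]
centroids, clusters = kmeans_1d(spend, centroids=[100, 500, 1000])
print(clusters)  # [[120, 150, 130], [400, 420], [900, 950, 1000]]
```

With these starting centroids the loop settles into low-, medium- and high-spend segments after the first pass; k-means in general depends on its initialization, so real use would try several starts.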

![image.png](attachment:image.png)

# Hyper-parameter Optimization

The objective of a learning algorithm is to **find a function that minimizes error** over a dataset. Hyper-parameters are not learned directly by the model: they are specified outside the training procedure, they **control the flexibility** of the model, and they are therefore **important to prevent over-fitting**.

The **process of finding the best hyper-parameters** for a given dataset is called hyper-parameter optimization. Its objective is to **minimize the generalization error**, where generalization is the ability of an algorithm to perform well on inputs it did not see during training. The search for the best hyper-parameters involves a **hyper-parameter space, a sampling method, a cross-validation scheme and a performance metric**.
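
One way to write the search down, using our own notation rather than anything from the source: let $\Lambda$ be the hyper-parameter space, $\mathcal{A}_\lambda$ the learning algorithm trained with setting $\lambda$, $\big(D_{\text{train}}^{(i)}, D_{\text{val}}^{(i)}\big)$ the $k$ train/validation splits produced by the cross-validation scheme, and $\mathcal{L}$ the error measured by the performance metric. The sampling method decides which $\lambda \in \Lambda$ are actually evaluated.

$$
\lambda^{*} = \underset{\lambda \in \Lambda}{\arg\min} \; \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(\mathcal{A}_{\lambda}\left(D_{\text{train}}^{(i)}\right),\; D_{\text{val}}^{(i)}\right)
$$

The averaged validation error acts as an estimate of the generalization error for each candidate $\lambda$.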

- Challenges 
- Search algorithms 
- Cross validation

![image.png](attachment:image.png)

## Challenges

There is no formula that gives the best hyper-parameters for a dataset, so different combinations of hyper-parameters have to be evaluated. A few hyper-parameters have a large effect on performance, while most have little effect. It is therefore important to identify the hyper-parameters that drive the performance of a given algorithm and to concentrate the optimization on them.

The following hyper-parameters were found to have a large effect on the performance of the respective machine learning algorithms:

**Decision Tree**
- **Max depth** – The maximum depth of the tree. 
- **Min samples leaf** – The minimum number of samples required to be at a leaf node. 

**Random Forest**
- **N estimators** – The number of trees in the forest.
- **Max depth** – The maximum depth of each tree.
- **Min samples leaf** – The minimum number of samples required to be at a leaf node.

**Gradient Boosting**
- **Loss** – The loss function to be optimized: 'deviance' (renamed 'log_loss' in recent scikit-learn releases) refers to logistic regression, and with 'exponential' gradient boosting recovers the AdaBoost algorithm.
- **N estimators** – The number of boosting stages to perform. Gradient boosting is fairly ***robust to over-fitting***, so a large number usually results in better performance.
- **Max depth** – The maximum depth of each tree.
- **Min samples leaf** – The minimum number of samples required to be at a leaf node.

**K-Nearest Neighbors**
- **N neighbors** – The number of neighbors used by default for k-neighbors queries.

**Artificial Neural Network**
- **Activation** – Activation function for the hidden layers (default is 'relu').
- **Solver** – The solver for weight optimization (default is 'adam').
- **Hidden layer sizes** – The ith element represents the number of neurons in the ith hidden layer.
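
The parameter descriptions above appear to follow scikit-learn's estimator documentation (`DecisionTreeClassifier`, `RandomForestClassifier`, `GradientBoostingClassifier`, `KNeighborsClassifier`, `MLPClassifier`). Assuming that, a search space can be written as a dictionary keyed by the parameter names, which is the shape `GridSearchCV` accepts for `param_grid`. The candidate values below are illustrative assumptions, not recommendations, and the outer keys are only labels for this sketch. The helper `n_combinations` (hypothetical) shows how quickly the number of settings a full grid must evaluate grows.

```python
from math import prod

# Illustrative search spaces keyed by scikit-learn parameter names.
# The candidate values are assumptions for this example, not tuned recommendations.
param_grids = {
    "DecisionTree": {
        "max_depth": [3, 5, 10, None],  # None: no depth limit
        "min_samples_leaf": [1, 5, 20],
    },
    "RandomForest": {
        "n_estimators": [100, 300],
        "max_depth": [5, 10, None],
        "min_samples_leaf": [1, 5],
    },
    "GradientBoosting": {
        "loss": ["log_loss", "exponential"],  # 'log_loss' was called 'deviance' in older releases
        "n_estimators": [100, 500],
        "max_depth": [2, 3, 5],
        "min_samples_leaf": [1, 10],
    },
    "KNN": {
        "n_neighbors": [1, 5, 15, 31],
    },
    "NeuralNetwork": {
        "activation": ["relu", "tanh"],
        "solver": ["adam", "sgd"],
        "hidden_layer_sizes": [(50,), (100,), (50, 50)],
    },
}

def n_combinations(grid):
    """Number of settings a full grid search evaluates: the product of the list lengths."""
    return prod(len(values) for values in grid.values())

for name, grid in param_grids.items():
    print(name, n_combinations(grid))
# DecisionTree 12
# RandomForest 12
# GradientBoosting 24
# KNN 4
# NeuralNetwork 12
```

Each extra hyper-parameter multiplies the count, which is why it pays to restrict the search to the few hyper-parameters that matter.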

## Search Algorithms 

- **Manual Search**: Used to identify regions of promising hyper-parameters that delimit a subsequent Grid Search, and to become familiar with the hyper-parameters and their effect on the model. However, it lacks reproducibility and does not explore the entire hyper-parameter space.

- **Grid Search**: An exhaustive search through a **specified subset of the hyper-parameter space** of a learning algorithm, examining all possible combinations. However, it is computationally expensive, the candidate values must be chosen manually, and it is **not ideal for continuous hyper-parameters**.

- **Random Search**: Hyper-parameter values are **selected by random draws from a uniform distribution**. Only a subset of combinations is examined, and the user decides how many combinations to try. However, in low-dimensional spaces its gain in efficiency over Grid Search is small.
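
The sketch below contrasts the two automated strategies in plain Python under the same budget of nine evaluations. `validation_score` is a hypothetical stand-in for a cross-validated score, and `learning_rate` and `max_depth` are example hyper-parameter names; none of this is a specific library's API. Grid search evaluates every combination of hand-picked values, while random search draws each setting independently, sampling the continuous `learning_rate` uniformly from a range instead of from a few fixed points.

```python
import itertools
import random

def validation_score(learning_rate, max_depth):
    """Hypothetical stand-in for a cross-validated score (higher is better).
    It peaks at learning_rate=0.07, max_depth=4; purely illustrative."""
    return -100 * (learning_rate - 0.07) ** 2 - 0.01 * (max_depth - 4) ** 2

def grid_search(grid, score):
    """Evaluate every combination of the listed values; return (best_params, best_score, n_evaluated)."""
    names = list(grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        n_evaluated += 1
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score, n_evaluated

def random_search(space, score, n_iter, seed=0):
    """Evaluate n_iter random settings: a (low, high) tuple is sampled uniformly
    as a continuous range, a list is sampled uniformly as discrete choices."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {
            name: rng.uniform(*spec) if isinstance(spec, tuple) else rng.choice(spec)
            for name, spec in space.items()
        }
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score, n_iter

# Grid search: 3 x 3 = 9 hand-picked combinations, all evaluated.
grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 6]}
print(grid_search(grid, validation_score))

# Random search: the same budget of 9 draws, learning_rate from a continuous range.
space = {"learning_rate": (0.01, 1.0), "max_depth": [2, 3, 4, 5, 6]}
print(random_search(space, validation_score, n_iter=9))
```

The grid can only return one of its three `learning_rate` values (here 0.1, the closest to the hypothetical optimum), whereas random search may land anywhere in the range; which strategy wins on a given budget depends on the seed and on how many hyper-parameters actually matter. scikit-learn offers `GridSearchCV` and `RandomizedSearchCV` for these two strategies with cross-validation built in.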


![image.png](attachment:image.png)

## Cross Validation 

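- The training set is divided into k folds.
- The model is trained on k-1 folds and tested on the remaining fold.
- The process is repeated k times, so that each fold is used exactly once as the test fold.
- The k test scores are averaged to give the final performance estimate.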

- Cross-validation can be used to select the best hyper-parameters, select the best performing model and estimate the generalization error of a given model. 

- **Stratified k-fold** cross-validation is useful when the dataset is imbalanced. Each fold has a similar proportion of observations from each class, and the test sets do not overlap.
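
The fold logic described above can be sketched in plain Python. `kfold_indices` builds contiguous, non-overlapping test folds, and `stratified_kfold_indices` deals each class's indices round-robin across the folds so that every test fold keeps roughly the overall class mix. Both are simplified illustrations in the spirit of scikit-learn's `KFold` and `StratifiedKFold`, not their actual implementations; the majority-class "model" and the label list are hypothetical.

```python
from collections import defaultdict

def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, non-overlapping test folds."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not (start <= i < start + size)]
        yield train, test
        start += size

def stratified_kfold_indices(labels, k):
    """Yield (train, test) index lists where each class is dealt round-robin
    across the k folds, so every test fold keeps roughly the overall class mix."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % k].append(i)
    for fold in folds:
        test = sorted(fold)
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test

def majority_class_accuracy(labels, train, test):
    """Toy model: predict the most common training label, return test-fold accuracy."""
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test) / len(test)

# Imbalanced toy labels, sorted by class: 8 negatives followed by 2 positives.
labels = [0] * 8 + [1] * 2

plain = [majority_class_accuracy(labels, tr, te) for tr, te in kfold_indices(len(labels), k=2)]
strat = [majority_class_accuracy(labels, tr, te) for tr, te in stratified_kfold_indices(labels, k=2)]
print(plain)  # [1.0, 0.6]
print(strat)  # [0.8, 0.8]
# Final cross-validated estimate: the average of the per-fold scores.
print(sum(plain) / len(plain), sum(strat) / len(strat))
```

On these class-sorted labels, plain k-fold puts no positives in the first test fold and both positives in the second, so per-fold accuracy swings between 1.0 and 0.6; the stratified folds each hold one positive and both score 0.8. The averages coincide here (0.8), but the stratified per-fold estimates are far less variable.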

The performance of a machine learning model should be consistent across different datasets. When a model performs well on the training set but poorly on unseen (for example, live) data, it over-fits the training data.

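- Under-fit: High Bias (the model is too simple and performs poorly even on the training data)
- Over-fit: High Variance (the model fits noise in the training data and performs poorly on new data)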
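
The `n_neighbors` hyper-parameter of k-nearest neighbors makes this trade-off easy to see. In the hypothetical toy example below, the assumed true rule is "class 0 below x = 4.5, class 1 above", and two training labels (at x = 3 and x = 7) are deliberately flipped to act as noise. Comparing training accuracy with accuracy on held-out points shows over-fitting at k = 1 and under-fitting when k equals the size of the training set. The helpers `knn_predict` and `accuracy` are written for this example only.

```python
def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points nearest to x.
    Equal distances keep training order because sorted() is stable."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    # Ties in the vote go to the smallest label value.
    return max(sorted(set(votes)), key=votes.count)

def accuracy(train_x, train_y, xs, ys, k):
    """Fraction of points in (xs, ys) that k-NN, fitted on the training data, labels correctly."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical data: assumed true rule is y = 0 for x < 4.5, else 1;
# the labels at x = 3 and x = 7 are flipped on purpose to act as noise.
train_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
train_y = [0, 0, 1, 0, 1, 1, 0, 1, 1]
val_x = [1.5, 2.9, 6.9, 7.9]
val_y = [0, 0, 1, 1]

for k in (1, 3, 9):
    train_acc = accuracy(train_x, train_y, train_x, train_y, k)
    val_acc = accuracy(train_x, train_y, val_x, val_y, k)
    print(k, round(train_acc, 3), round(val_acc, 3))
# 1 1.0 0.5
# 3 0.667 1.0
# 9 0.556 0.5
```

With k = 1 the model memorizes the noisy labels (perfect training accuracy, 0.5 on validation): high variance. With k = 9 it always predicts the overall majority class, scoring poorly on both sets: high bias. k = 3 sits in between; on only four validation points this illustrates the trend rather than measuring it.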

# Reference

- https://analyticsindiamag.com/ai-trends/study-notes-on-machine-learning-pipeline-feature-engineering-feature-selection-and-hyper-parameters-optimization/

- https://analyticsindiamag.com/ai-trends/common-feature-engineering-techniques-to-tackle-real-world-data/