# Fundamentals of Machine Learning

## Four branches of Machine Learning

### Supervised Learning

It consists of learning to map input data to known targets (also called annotations), given a set of examples (often annotated by humans).

Examples:
   * Optical Character Recognition.
   * Speech Recognition.
   * Image Classification.
   * Language Translation.
   * Sequence Generation.
   * Syntax tree prediction.
   * Object detection.
   * Image segmentation.

### Unsupervised Learning

This consists of finding interesting transformations of the input data without the help of any targets.
Unsupervised learning is often a necessary step in better understanding a dataset before attempting to solve a supervised-learning problem. 
Dimensionality reduction and clustering are well-known categories of unsupervised learning.


### Self Supervised Learning

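Self-supervised learning is supervised learning without human-annotated labels: there is no human in the loop. The labels are still there, but they are generated from the input data itself, typically with a heuristic algorithm. Autoencoders, where the target is the unmodified input, are a well-known example.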

### Reinforcement Learning

In reinforcement learning, an agent receives information about its environment and learns to choose actions that will maximize some reward.

## Evaluating machine learning models

Evaluating a model always boils down to splitting the available data into three sets: training, validation, and test. You train on the training data and evaluate your model on the validation data. Once your model is ready for prime time, you test it one final time on the test data.

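Hyperparameters are tuned on the validation set, not on the test set: tuning them against the test set would overfit the model to that set, and the final test score would no longer be a reliable estimate of performance on new data.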

### Simple hold out validation
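Set apart some fraction of your data as your test set. Train on the remaining data, and evaluate on the test set. Because hyperparameters should not be tuned on the test set, reserve a validation set from the remaining data as well.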
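A minimal sketch of a hold-out split, using only the standard library. The function name `hold_out_split` and the `test_fraction` and `seed` parameters are illustrative assumptions, not part of any particular library.

```python
import random

def hold_out_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = hold_out_split(range(10), test_fraction=0.2)
print(len(train), len(test))
```

The same function can be applied a second time to `train` to carve out a validation set.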

### K fold Validation 
Split the data into K partitions of equal size. For each partition i, train a model on the remaining K-1 partitions and evaluate it on partition i. The final score is the average of the K scores obtained.
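The procedure can be sketched as follows. `k_fold_indices` and `k_fold_score` are hypothetical helper names, and `train_and_evaluate` stands in for whatever code trains a fresh model on the training indices and returns its score on the validation indices.

```python
def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # If num_samples is not divisible by k, the leftover samples at the end
    # are never used for validation, only for training.
    fold_size = num_samples // k
    indices = list(range(num_samples))
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        yield indices[:start] + indices[stop:], indices[start:stop]

def k_fold_score(num_samples, k, train_and_evaluate):
    """Average the validation scores obtained over the k folds."""
    scores = [train_and_evaluate(train_idx, val_idx)
              for train_idx, val_idx in k_fold_indices(num_samples, k)]
    return sum(scores) / len(scores)

for train_idx, val_idx in k_fold_indices(6, 3):
    print(train_idx, val_idx)
```

A new model should be created inside `train_and_evaluate` for every fold, so that no fold benefits from weights trained on its own validation partition.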

### Data representativeness 
Both training set and test set should be representative of the data at hand.

### The arrow of time
If the model is trying to predict the future given the past, then you should not randomly shuffle the data before splitting it, because it will create a temporal leak.
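A minimal sketch of a time-respecting split: sort by timestamp and keep the most recent records for testing. The record layout (timestamp in the first position) and the name `chronological_split` are assumptions made for illustration.

```python
def chronological_split(records, test_fraction=0.2, timestamp=lambda r: r[0]):
    """Split records so that every test record is later than every training record."""
    ordered = sorted(records, key=timestamp)
    n_test = int(len(ordered) * test_fraction)
    split = len(ordered) - n_test
    return ordered[:split], ordered[split:]

# Toy records of the form (timestamp, value), deliberately out of order.
records = [(3, "c"), (1, "a"), (5, "e"), (2, "b"), (4, "d")]
train, test = chronological_split(records, test_fraction=0.4)
print(train)
print(test)
```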

### Redundancy in data
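Make sure your training set and validation set are disjoint. If some data points appear more than once in the dataset, shuffling and splitting can place copies of the same point in both sets, so that you end up evaluating on part of your training data.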
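One simple way to guard against this, assuming exact duplicates and hashable samples, is to drop repeats before splitting. The helper name `deduplicate` is hypothetical, and near-duplicates would need a problem-specific check.

```python
def deduplicate(samples):
    """Return the samples with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for sample in samples:
        if sample not in seen:
            seen.add(sample)
            unique.append(sample)
    return unique

data = [("cat", 0), ("dog", 1), ("cat", 0), ("bird", 2)]
unique_data = deduplicate(data)
print(unique_data)
```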

## Data preprocessing

### Vectorization
All inputs and targets in a neural network must be tensors of floating-point data (or, in specific cases, tensors of integers). Converting the input data into tensors is called data vectorization.
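As one illustration, sequences of integer word indices can be vectorized with multi-hot encoding. This is a sketch in plain Python lists; in practice the result would be stored in a tensor type from a numerical library. The function name `multi_hot` and the toy input are assumptions.

```python
def multi_hot(sequences, dimension):
    """Encode each sequence of integer indices as a float vector of 0.0s and 1.0s."""
    vectors = [[0.0] * dimension for _ in sequences]
    for row, sequence in zip(vectors, sequences):
        for index in sequence:
            row[index] = 1.0  # mark every index that occurs in the sequence
    return vectors

encoded = multi_hot([[0, 2], [1]], dimension=4)
print(encoded)  # [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
```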


### Value Normalization
To make learning easier for the network, the data should have the following characteristics:
* Take small values: most values should be in the 0-1 range.
* Be homogeneous: all features should take values in roughly the same range.
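A common way to meet both conditions is standardization, which rescales each feature independently to a mean of 0 and a standard deviation of 1 (so values are small and comparable, though not strictly within 0-1). The sketch below uses the standard library; the name `standardize_columns` and the toy data are assumptions.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Rescale every feature (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(column) for column in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in rows]

# Two features on very different scales.
data = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize_columns(data)
print(scaled)
```

When a validation or test set is involved, the means and standard deviations should be computed on the training data only and then reused for the other sets.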

### Feature Engineering

Feature engineering is the process of using your own knowledge about the data and about the machine-learning algorithm at hand (in this case, a neural network) to make the algorithm work better by applying hardcoded (non-learned) transformations to the data before it goes into the model.
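As a hypothetical illustration, suppose the task is to read the time from a picture of a clock. Instead of feeding raw pixels to the model, one could first extract the coordinates of each hand's tip and convert them to an angle, which is a far simpler input to learn from. The function `hand_angle` and its coordinate convention (origin at the clock center, y pointing up) are assumptions of this sketch.

```python
import math

def hand_angle(x, y):
    """Clockwise angle in degrees from 12 o'clock for a hand tip at (x, y)."""
    # atan2(x, y) measures the angle from the positive y-axis, clockwise.
    return math.degrees(math.atan2(x, y)) % 360.0

print(hand_angle(0.0, 1.0))   # hand pointing at 12
print(hand_angle(1.0, 0.0))   # hand pointing at 3
```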

## Overfitting and Underfitting

Optimization refers to the process of adjusting a model to get the best performance possible on the training data.
Generalization refers to how well the trained model performs on data it has never seen before.

At the beginning of training, optimization and generalization are correlated: the lower the loss on training data, the lower the loss on test data. While this is happening, the model is said to be underfit. There is still progress to be made; the network hasn't yet modeled all relevant patterns in the training data. But after a certain number of iterations on the training data, generalization stops improving, and validation metrics stall and then begin to degrade: the model is starting to overfit. That is, it's beginning to learn patterns that are specific to the training data but that are misleading or irrelevant when it comes to new data.
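One way to locate the turning point is to record the validation loss after every epoch and look for its minimum. The sketch below assumes such a list is available; the loss values are invented purely for illustration, and `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Return the 0-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

# Made-up validation losses: improving at first, then degrading (overfitting).
val_losses = [0.90, 0.60, 0.45, 0.40, 0.43, 0.50]
print(best_epoch(val_losses))  # 3
```

Epochs before that index correspond to the underfit regime; epochs after it are where the model starts to overfit.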

### Regularization
If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent patterns, which have a better chance of generalizing well.
The process of fighting overfitting this way is called regularization.

#### Reducing the network's size
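The simplest way to prevent overfitting is to reduce the size of the model, that is, the number of learnable parameters (determined by the number of layers and the number of units per layer).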
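To get a feel for how layer sizes affect model size, the sketch below counts the parameters of a stack of fully connected layers, assuming each layer has a weight matrix plus one bias per unit. The layer sizes are hypothetical examples.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases for fully connected layers.

    `layer_sizes` lists the input dimension followed by the units of each layer.
    """
    return sum(n_in * n_out + n_out  # weight matrix + bias vector
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

larger = dense_parameter_count([10000, 16, 16, 1])
smaller = dense_parameter_count([10000, 4, 4, 1])
print(larger, smaller)  # 160305 40029
```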

#### Adding weight regularization
Weight regularization reduces overfitting by constraining the weights of the network to take small values, which improves the performance of the model on new data. It's done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors:
* L1 regularization: the cost added is proportional to the absolute value of the weight coefficients.
* L2 regularization: the cost added is proportional to the square of the value of the weight coefficients. It's also called weight decay.
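The two penalties can be sketched in a few lines. The function names and the `rate` parameter (the proportionality factor) are illustrative assumptions; deep learning frameworks provide their own built-in equivalents.

```python
def l1_penalty(weights, rate):
    """Cost proportional to the sum of absolute weight values."""
    return rate * sum(abs(w) for w in weights)

def l2_penalty(weights, rate):
    """Cost proportional to the sum of squared weight values."""
    return rate * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Add the weight penalties to the loss computed on the data."""
    return data_loss + l1_penalty(weights, l1) + l2_penalty(weights, l2)

weights = [1.0, -2.0, 3.0]
print(regularized_loss(0.5, weights, l2=0.1))
```

Because the penalty is only added during training, the training loss of a regularized network will typically be higher than its loss at test time.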

#### Adding dropout
Dropout is one of the most effective and most commonly used regularization techniques for neural networks. Dropout, applied to a layer, consists of randomly dropping out (setting to zero) a number of output features of the layer during training. The dropout rate is the fraction of the features that are zeroed out; it is generally set between 0.2 and 0.5.
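A minimal sketch of the idea on a plain list of activations. It uses the "inverted dropout" variant, which scales the surviving values up during training so that nothing needs to change at test time; the function name `dropout` and the `rng` parameter are assumptions of this sketch.

```python
import random

def dropout(activations, rate, rng=random):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep_probability = 1.0 - rate  # rate must be strictly below 1.0
    return [a / keep_probability if rng.random() >= rate else 0.0
            for a in activations]

layer_output = [0.2, 0.5, 1.3, 0.8, 1.1]
print(dropout(layer_output, rate=0.5, rng=random.Random(0)))
```

At test time no units are dropped: the layer's output is used as-is.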