# Structuring Machine Learning Projects

### Machine Learning Strategy

![](ml-development-cycle.png)

#### Ideas to improve the system after training

1. Collect more data
2. Collect more diverse data
3. Train longer with gradient descent
4. Try a different optimization algorithm
5. Try a bigger or smaller network
6. Try dropout or L2 regularization

Choosing the right idea is critical!

### Chain of assumptions in ML

1. Fit the training set well on the cost function

    > Train a bigger network or switch to a better optimization algorithm

2. Fit the dev set well on the cost function

    > Regularization, or a bigger training set

3. Fit the test set well on the cost function

    > A bigger dev set (doing well on dev but not on test usually means you have overfit the dev set)

4. Perform well in the real world

    > Change either the dev set or the cost function
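The chain is meant to be worked through in order, with a separate set of knobs for each step (what the course calls orthogonalization). A minimal sketch of that idea, assuming a hypothetical dict of pass/fail flags per step; `CHAIN` and `next_knobs` are illustrative names, not part of any library.

```python
# Knobs from the chain of assumptions, in order (labels paraphrase the list above).
CHAIN = [
    ("fit training set", ["bigger network", "better optimization algorithm"]),
    ("fit dev set", ["regularization", "bigger training set"]),
    ("fit test set", ["bigger dev set"]),
    ("perform well in real world", ["change dev set or cost function"]),
]


def next_knobs(passed):
    """Return the first step in CHAIN that has not passed, with its knobs.

    `passed` maps step names to True/False; missing steps count as failing.
    Returns (None, []) when every step passes.
    """
    for step, knobs in CHAIN:
        if not passed.get(step, False):
            return step, knobs
    return None, []


# Hypothetical status: the training set fits well, the dev set does not.
print(next_knobs({"fit training set": True, "fit dev set": False}))
```

Only the knobs for the first failing step are returned, because tuning a later step makes little sense while an earlier assumption still fails.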
    

### Single number evaluation metric

![](single.png)
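A common way to collapse two metrics into a single number is the F1 score, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). A minimal sketch; the classifier names and their precision/recall values are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifiers: name -> (precision, recall)
classifiers = {"A": (0.95, 0.90), "B": (0.98, 0.85)}

scores = {name: f1_score(p, r) for name, (p, r) in classifiers.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Here A wins (F1 of about 0.924 vs about 0.910) even though B has the higher precision, which is exactly the kind of tie-break a single number makes quick.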

### Satisficing and Optimizing metric

![](hqdefault.jpg)

If you have N metrics that you care about, it is sometimes reasonable to pick one of them as the *optimizing* metric, on which you want to do as well as possible, and treat the remaining N - 1 as *satisficing* metrics. A satisficing metric only has to reach some threshold (e.g. a running time of at most 100 milliseconds); once it does, you don't care how much better than the threshold it is.
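A minimal sketch of the selection rule with one optimizing metric (accuracy) and one satisficing metric (running time, threshold 100 ms). The candidate names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float         # optimizing metric: as high as possible
    running_time_ms: float  # satisficing metric: only has to meet the threshold


def pick_best(candidates, max_running_time_ms=100.0):
    """Maximize accuracy among the candidates whose running time meets the threshold."""
    feasible = [c for c in candidates if c.running_time_ms <= max_running_time_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


# Hypothetical classifiers.
candidates = [
    Candidate("A", accuracy=0.90, running_time_ms=80.0),
    Candidate("B", accuracy=0.92, running_time_ms=95.0),
    Candidate("C", accuracy=0.95, running_time_ms=1500.0),
]
print(pick_best(candidates).name)  # B: C is more accurate but too slow
```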

### Train/dev/test set distributions

Choose a dev set and a test set that reflect the data you expect to get in the future and consider important to do well on.

In particular, the dev set and the test set should come from the same distribution.

Setting up the dev set, together with the evaluation metric, defines the target you want to aim at. By drawing the dev set and the test set from the same distribution, you make sure your machine learning team is aiming at the target you actually hope it will hit.
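One way to get dev and test sets from the same distribution is to pool the data from every source, shuffle it, and only then split. A minimal sketch; the region names, sizes, and `split_dev_test` helper are hypothetical.

```python
import random


def split_dev_test(examples, dev_fraction=0.5, seed=0):
    """Pool examples from all sources, shuffle, then cut into dev and test,
    so both sets are drawn from the same distribution."""
    pooled = list(examples)
    random.Random(seed).shuffle(pooled)
    cut = int(len(pooled) * dev_fraction)
    return pooled[:cut], pooled[cut:]


# Hypothetical data: (region, example_id) pairs from four regions.
regions = ["US", "UK", "India", "China"]
data = [(region, i) for region in regions for i in range(100)]

dev, test = split_dev_test(data)
# Both sets now contain examples from every region, instead of, say,
# the dev set holding only US/UK data and the test set only India/China data.
print(sorted({r for r, _ in dev}), sorted({r for r, _ in test}))
```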

### Size of the dev and test sets

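#### Train/dev/test split

Conventionally, 70/30% (train/test) or 60/20/20% (train/dev/test).

However, the split can vary with the size of the dataset.

E.g. with a dataset of 1,000,000 examples, a 98/1/1% split is enough: that still leaves 10,000 examples each for the dev and test sets.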
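A small helper that turns split fractions into example counts, to make the arithmetic above concrete. The function name and the choice to give any rounding remainder to the training set are assumptions for illustration.

```python
def split_sizes(n, fractions=(0.98, 0.01, 0.01)):
    """Turn train/dev/test fractions into example counts; train gets any remainder."""
    train_frac, dev_frac, test_frac = fractions
    if abs(train_frac + dev_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_dev = round(n * dev_frac)
    n_test = round(n * test_frac)
    return n - n_dev - n_test, n_dev, n_test


print(split_sizes(1_000_000))               # (980000, 10000, 10000)
print(split_sizes(1_000, (0.6, 0.2, 0.2)))  # (600, 200, 200)
```

With a million examples, 1% is still 10,000 dev and 10,000 test examples, which is usually plenty to compare models.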

### Improving model performance

1. Fit the training set well
    > Achieve low avoidable bias

2. Training set performance generalises well to the dev/test set
    > Variance is not too bad
    

#### Reducing (avoidable) bias and variance

Human-level error → training error: **avoidable bias**. To reduce it:

- Train a bigger model
- Train longer, or use a better optimization algorithm (momentum, RMSprop, Adam, etc.)
- Better NN architecture / hyperparameter search (e.g. RNN, CNN)

Training error → dev error: **variance**. To reduce it:

- More data (generalises better)
- Regularization: L2, dropout, data augmentation
- Better NN architecture / hyperparameter search (e.g. RNN, CNN)
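A minimal sketch of deciding which gap to work on, using human-level error as a proxy for Bayes error. The `diagnose` helper, the tie-breaking rule, and the error values are hypothetical; the tactic lists are copied from the notes.

```python
# Tactics copied from the lists above.
TACTICS = {
    "avoidable bias": [
        "train bigger model",
        "train longer / better optimization algorithm",
        "better NN architecture / hyperparameter search",
    ],
    "variance": [
        "more data",
        "regularization (L2, dropout, data augmentation)",
        "better NN architecture / hyperparameter search",
    ],
}


def diagnose(human_error, train_error, dev_error):
    """Return (problem, gap) for whichever gap is larger; ties go to avoidable bias."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    if avoidable_bias >= variance:
        return "avoidable bias", avoidable_bias
    return "variance", variance


# Hypothetical errors in %: the same model, judged against two human-level estimates.
for human in (1.0, 7.5):
    problem, gap = diagnose(human, train_error=8.0, dev_error=10.0)
    print(f"human={human}%: focus on {problem} (gap {gap:.1f}%) -> {TACTICS[problem]}")
```

The same training and dev errors lead to different priorities depending on the human-level estimate: at 1% the 7% avoidable bias dominates, at 7.5% the 2% variance does.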


## Error Analysis

### Ceiling Analysis

Examine the dev set examples that the model got wrong (e.g. in a classification problem).

Go through the incorrect cases, group them by possible improvement (the cause of the error), and record the percentage of errors in each category. Include incorrectly labeled examples as a column to analyse.

Work on the problems that have significant improvement potential, thus increasing the overall accuracy of the model.
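A minimal sketch of the tally, assuming the error-analysis spreadsheet is a list with one set of categories per misclassified dev example (an example can fall in several). The categories, rows, and 10% overall dev error are hypothetical.

```python
from collections import Counter


def category_percentages(rows):
    """Percentage of misclassified examples that fall in each category.

    An example can belong to several categories, so percentages may sum to more than 100.
    """
    counts = Counter(category for row in rows for category in row)
    return {category: 100.0 * count / len(rows) for category, count in counts.most_common()}


def best_case_error(dev_error, percentage):
    """Dev error if every error in one category were fixed: the ceiling on improvement."""
    return dev_error * (1 - percentage / 100.0)


# Hypothetical spreadsheet: one set of categories per misclassified dev example.
rows = [
    {"dog"}, {"great cat"}, {"blurry"}, {"blurry", "great cat"},
    {"incorrect label"}, {"blurry"}, {"dog", "blurry"}, {"blurry"},
]
dev_error = 10.0  # hypothetical overall dev error, in %

for category, pct in category_percentages(rows).items():
    print(f"{category:16s} {pct:5.1f}% of errors -> dev error at best {best_case_error(dev_error, pct):.2f}%")
```

Fixing every blurry image could bring the dev error down to 3.75% at best, while fixing every dog error could only reach 7.5%, which points the team at the blurry images first.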

### Build your first system quickly, then iterate

## Training and testing on different distributions

### Option 1

1. Take images from both distributions and randomly shuffle them into train, dev, and test sets.

#### Advantage

The train, dev, and test sets all come from the same distribution, which makes them easier to manage.

#### Huge disadvantage

A large part of the dev set can come from a distribution that is not your target distribution, so you end up optimizing for the wrong target.

### Option 2

Training set = both distributions  
dev set and test set = target distribution
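A minimal sketch of building an Option 2 split. The sizes (200,000 web images from the other distribution, 10,000 target-distribution images, 5,000 of them put into training) and the `option2_split` helper are hypothetical.

```python
import random


def option2_split(other_data, target_data, n_target_train, seed=0):
    """Option 2: training = all other-distribution data + some target data;
    dev and test = the remaining target data, split in half."""
    target = list(target_data)
    random.Random(seed).shuffle(target)
    train = list(other_data) + target[:n_target_train]
    rest = target[n_target_train:]
    half = len(rest) // 2
    return train, rest[:half], rest[half:]


# Hypothetical sizes: 200,000 web images (other distribution) and
# 10,000 images from the target distribution (e.g. a mobile app).
web = [("web", i) for i in range(200_000)]
app = [("app", i) for i in range(10_000)]
train, dev, test = option2_split(web, app, n_target_train=5_000)
print(len(train), len(dev), len(test))  # 205000 2500 2500

# For contrast, Option 1 (shuffling everything together) would give a dev set
# in which only about 10,000 / 210,000, roughly 4.8%, are target images.
```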

### Bias and Variance with mismatched data distributions

- Human error → Train error: **avoidable bias**
- Train error → Train-val error: **variance problem**
- Train-val error → Val error: **data mismatch problem**
- Val error → Test error: **degree of overfitting to the val set**

#### Dataset

- Train set
- Train-val set: same distribution as the train set, but not used for training
- Val set
- Test set

#### Variance Problem 
##### Example

Train error      = 1%  
Train-val error  = 9%  
Val error        = 10%  

#### Data Mismatch Problem 
##### Example

Train error      = 1%  
Train-val error  = 1.5%  
Val error        = 10%  

#### Bias Problem (Avoidable) 
##### Example

Human error      = 0%  
Train error      = 10%  
Train-val error  = 11%  
Val error        = 12%
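A minimal sketch that splits the errors into the gaps listed above and names the largest one. The helper names are hypothetical, and human error is assumed to be 0% in the variance and data mismatch examples, which the notes do not state.

```python
def error_gaps(human, train, train_val, val, test=None):
    """Split the errors (in %) into the gaps between consecutive levels."""
    gaps = {
        "avoidable bias": train - human,
        "variance": train_val - train,
        "data mismatch": val - train_val,
    }
    if test is not None:
        gaps["overfitting to val set"] = test - val
    return gaps


def biggest_problem(gaps):
    """Name of the largest gap."""
    return max(gaps, key=gaps.get)


# The three examples above (human error assumed to be 0% in the first two).
examples = {
    "variance example": (0.0, 1.0, 9.0, 10.0),
    "data mismatch example": (0.0, 1.0, 1.5, 10.0),
    "bias example": (0.0, 10.0, 11.0, 12.0),
}
for name, errors in examples.items():
    gaps = error_gaps(*errors)
    print(name, gaps, "->", biggest_problem(gaps))
```

The train-val set is what makes the variance and data mismatch cases distinguishable: without it, both examples would just show a 1% train error against a 10% val error.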


### Addressing data mismatch

1. Carry out manual error analysis to understand the differences between the training set and the dev/test sets. To avoid overfitting the test set, you should technically look only at the dev set, not the test set, when doing error analysis.

2. Once you have insight into the nature of the dev set errors, or into how the dev set differs from or is harder than the training set, make the training data more similar to the dev/test sets (or collect more data similar to the dev/test sets).

###### Artificial Data Synthesis
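Be cautious, and bear in mind whether you might be accidentally simulating data from only a tiny subset of the space of all possible examples. For example, if you add the same hour of car noise to 10,000 hours of clean speech, the network may overfit to that one hour of noise.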
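Adding noise recorded in the target environment (e.g. car noise) to clean recordings is one form of artificial data synthesis. A minimal sketch in plain Python, assuming audio is a list of float samples; `mix`, `synthesize`, the signal-to-noise-ratio scaling, and the signals below are illustrative choices, not a prescribed method.

```python
import math
import random


def mix(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the result has the given signal-to-noise ratio (dB).

    If the noise clip is shorter than the clean clip it is tiled (repeated).
    """
    n = len(clean)
    tiled = [noise[i % len(noise)] for i in range(n)]
    p_clean = sum(x * x for x in clean) / n
    p_noise = sum(z * z for z in tiled) / n
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * z for c, z in zip(clean, tiled)]


def synthesize(clean_clips, noise_clips, snr_db=10.0, seed=0):
    """Pair each clean clip with a randomly chosen noise clip.

    With only a handful of noise clips, every synthetic example reuses the same
    few noises: the 'tiny subset' risk described above.
    """
    rng = random.Random(seed)
    return [mix(clean, rng.choice(noise_clips), snr_db) for clean in clean_clips]


# Hypothetical signals: a 440 Hz tone sampled at 16 kHz and Gaussian "car noise".
rng = random.Random(1)
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
car_noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy = synthesize([tone], [car_noise], snr_db=10.0)[0]
```

Note that `mix` tiles a short noise clip, so a single 4,000-sample clip is repeated four times inside one example; to the human ear synthetic data like this can sound fine while still covering only a sliver of real car noise.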

### Transfer Learning

Take the last output layer of the neural network and delete it, along with the weights feeding into it. Then create a new set of randomly initialized weights just for a new last layer, and use that as the output.

![](TL.jpeg)
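A minimal sketch of the layer swap in plain Python, assuming a network is a list of hypothetical dense-layer dicts (`W` of shape n_out x n_in, `b` of length n_out). `init_layer`, `replace_output_layer`, the He-style initialization, and the layer sizes are illustrative assumptions, not any library's API.

```python
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical dense layer: weights W (n_out x n_in) drawn with He-style
    scaling (std = sqrt(2 / n_in)) and zero biases."""
    std = (2.0 / n_in) ** 0.5
    return {
        "W": [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def replace_output_layer(layers, n_new_outputs, seed=0):
    """Drop the last layer (the weights feeding the old output) and append a new,
    randomly initialized output layer. Earlier layers are reused as-is."""
    rng = random.Random(seed)
    n_in = len(layers[-1]["W"][0])  # input size of the old output layer
    return layers[:-1] + [init_layer(n_in, n_new_outputs, rng)]


# Hypothetical "pre-trained" network: 100 -> 32 -> 16 -> 10 outputs.
rng = random.Random(42)
pretrained = [init_layer(100, 32, rng), init_layer(32, 16, rng), init_layer(16, 10, rng)]

# New task with 3 classes: keep the first two layers, replace the output layer.
new_net = replace_output_layer(pretrained, n_new_outputs=3)
print([len(layer["W"]) for layer in new_net])  # [32, 16, 3]
```

With little data for the new task, you would typically retrain only the new last layer; with more data, you can retrain more (or all) of the layers.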

#### Pre-training vs fine-tuning

You have a model m.

##### Pre-training

You have a dataset A on which you train m.  
You have a dataset B. Before you start training the model on B, you initialize some of the parameters of m with those of the model trained on A.

##### Fine-tuning

You train m on B.

![](transfer.png)
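A toy sketch of the two phases, assuming m is a two-feature logistic-regression model trained with plain full-batch gradient descent in pure Python. The datasets, labelling rules, learning rate, and epoch counts are all hypothetical. Here every parameter is copied from A; in a deeper network you would typically copy only some of the layers.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, data):
    """Mean binary cross-entropy of the logistic-regression model (w, b) on data."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def train(w, b, data, lr=0.5, epochs=200):
    """Full-batch gradient descent on log_loss, starting from (w, b)."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_data(n, rule, rng):
    """Hypothetical 2-D dataset with points in [-1, 1]^2 labelled by `rule`."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if rule(x) else 0))
    return data


rng = random.Random(0)
data_a = make_data(500, lambda x: x[0] + x[1] > 0, rng)        # large dataset A
data_b = make_data(20, lambda x: x[0] + 0.8 * x[1] > 0.1, rng)  # small, related dataset B

# Pre-training: train m on A from scratch.
w_a, b_a = train([0.0, 0.0], 0.0, data_a)
# Fine-tuning: start from A's parameters, then train on B.
w_b, b_b = train(w_a, b_a, data_b, epochs=50)

print(f"loss on B before fine-tuning: {log_loss(w_a, b_a, data_b):.3f}")
print(f"loss on B after fine-tuning:  {log_loss(w_b, b_b, data_b):.3f}")
```

Starting B's training from A's parameters rather than from zeros is the whole point: the small dataset B only has to adjust a model that already solves a closely related task.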

### End-to-end DL

![](end.jpg)



![](pros.png)