# Structuring Machine Learning projects

## Table of Contents

* [1. Machine Learning Strategy](#chapter1)
    * [1.1 Introduction](#section_1_1)
    * [1.2 Setting up your goal](#section_1_2)
    * [1.3 Comparing to Human-level Performance](#section_1_3)
    * [1.4 Improving the Model Performance](#section_1_4)
* [2. Machine Learning Strategy - Part 2](#chapter2)
    * [2.1 Error Analysis](#section_2_1)
    * [2.2 Mismatched Training and Dev/test set](#section_2_2)
    * [2.3 Learning from Multiple tasks](#section_2_3)
    * [2.4 End-to-End Deep learning](#section_2_4)

# 1. Machine Learning Strategy <a class="anchor" id="chapter1"></a>

## 1.1 Introduction <a class="anchor" id="section_1_1"></a>

**How to structure a Machine Learning project?**

What is a machine learning strategy?

Let's say we are working on a cat-recognition application. After working on it for some time, our system reaches 90% accuracy, but we want to improve it further.

We have several ideas to improve our system, such as:
- Collect more data
- Collect a more diverse training set
- Train the algorithm longer with gradient descent
- Try dropout
- Try Adam
...


When we try to improve a deep learning system, we often have many ideas we could try. The problem is that if we choose poorly, we can spend a lot of time testing a poor idea that brings no real improvement in the end.


We therefore need strategies: ways of analyzing a machine learning problem that point us toward the most promising ideas to try.


<center><img src="images/10-ML strategy/introduction.PNG" width ="600px"></center>

**Orthogonalization**

<table>
    <thead>
        <tr>
            <th>Goal</th>
            <th>Knob to tune</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Fit Training set well on cost function</td>
            <td>Bigger Network, Adam ...</td>
        </tr>
        <tr>
            <td>Fit dev set well on cost function</td>
            <td>Regularization, Bigger Training set ...</td>
        </tr>
         <tr>
            <td>Fit Test set well on cost function</td>
            <td>Bigger dev set</td>
        </tr>
         <tr>
            <td>Perform well in real world</td>
            <td>Change dev set or cost function</td>
        </tr>
    </tbody>
</table>

In machine learning, it helps to be able to look at your system and say exactly which piece of it is wrong: it does not do well on the training set, it does not do well on the dev set, it does not do well on the test set, or it does well on the test set but not in the real world. Once you have figured out exactly what is wrong, you want exactly one knob, or a specific set of knobs, that solves just that problem without affecting the others. This is what orthogonalization means.

## 1.2 Setting up your goal <a class="anchor" id="section_1_2"></a>

**<u>Single Number Evaluation Metric</u>**

When a team starts a machine learning project, it is often recommended to set up a single real-number evaluation metric for the problem.


<u>Example :</u>
<center><img src="images/10-ML strategy/example-metric.PNG" width ="300px"></center>

In this example we have two cat classifiers, A and B. One way to evaluate them is to look at their precision and recall.

According to these two metrics, classifier A has better recall than B, but B has better precision than A, so it is not clear which classifier is better.

If we are trying many ideas and many different hyperparameters, we want to quickly try out several classifiers and pick the best ones. With two evaluation metrics, it is hard to make that choice quickly.

So in this example, rather than using precision and recall separately, we should use a single metric that combines them. The standard way to combine precision and recall in machine learning is the F1 score, their harmonic mean: F1 = 2 · P · R / (P + R), where P is precision and R is recall.

<center><img src="images/10-ML strategy/example-metric2.PNG" width ="300px"></center>

A single evaluation metric lets us quickly tell whether classifier A or classifier B is better, which speeds up the iterative process of improving our machine learning algorithm.
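A minimal Python sketch of choosing between classifiers with a single metric. The precision and recall values below are hypothetical illustrations (not read from the figure), and `f1_score` is a small helper written for this example rather than a library function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall values for two classifiers (illustrative only).
classifiers = {
    "A": {"precision": 0.95, "recall": 0.90},
    "B": {"precision": 0.98, "recall": 0.85},
}

# One number per classifier makes the comparison a simple max().
scores = {name: f1_score(m["precision"], m["recall"]) for name, m in classifiers.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: F1 = {score:.3f}")
print("best:", best)
```

With a single number per classifier, ranking dozens of experiments is just a sort.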

**<u>Train/Dev/Test distributions</u>**

1. Your dev and test sets should come from the same distribution.

2. Choose a dev set and test set that reflect the data you expect to get in the future and consider important to do well on.
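As an illustration of the first guideline, here is a minimal sketch assuming the data was collected from several regions (the region names and counts are hypothetical): pool all the examples, shuffle them, and only then carve out the dev and test sets, so both are drawn from the same mix. Splitting by region instead (for example, dev from some regions and test from others) would give the two sets different distributions.

```python
import random


def make_dev_test(examples, dev_size, test_size, seed=0):
    """Shuffle the pooled examples, then take dev and test from the same shuffled pool."""
    pool = list(examples)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(pool)
    dev = pool[:dev_size]
    test = pool[dev_size:dev_size + test_size]
    return dev, test


# Hypothetical data: 250 images from each of four regions, as (region, index) pairs.
regions = ["US", "UK", "India", "China"]
examples = [(region, i) for region in regions for i in range(250)]

dev, test = make_dev_test(examples, dev_size=200, test_size=200)

# Both splits should contain a similar mix of regions.
for name, split in (("dev", dev), ("test", test)):
    counts = {r: sum(1 for reg, _ in split if reg == r) for r in regions}
    print(name, counts)
```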

**<u>Size of the Dev and Test sets</u>**

Traditional way of splitting a dataset (small datasets):

- 70% Train, 30% Test
- 60% Train, 20% dev, 20% test

Big datasets (for example, a million examples or more):

- 98% train set, 1% dev set, 1% test set
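The arithmetic behind the big-dataset split: 1% of a large dataset can still be a sizable dev or test set. `split_sizes` is a hypothetical helper written for this sketch, and the dataset sizes are arbitrary examples.

```python
from typing import Tuple


def split_sizes(n_examples: int, train_frac: float, dev_frac: float,
                test_frac: float) -> Tuple[int, int, int]:
    """Return (train, dev, test) example counts for the given fractions."""
    if abs(train_frac + dev_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_dev = round(n_examples * dev_frac)
    n_test = round(n_examples * test_frac)
    n_train = n_examples - n_dev - n_test  # any rounding remainder goes to training
    return n_train, n_dev, n_test


print(split_sizes(10_000, 0.6, 0.2, 0.2))        # small dataset, traditional split
print(split_sizes(1_000_000, 0.98, 0.01, 0.01))  # big dataset
```

With a million examples, the 1% dev and test sets still hold 10,000 examples each.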
 

Size of the test set:
- Set your test set to be big enough to give high confidence in the overall performance of your system.
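One way to make "big enough" concrete is to look at how precisely the test set pins down the error rate. This minimal sketch assumes independent test examples and uses the normal approximation to the binomial distribution for an approximate 95% confidence interval; the 10% error rate and the test set sizes are arbitrary examples.

```python
import math


def error_ci_halfwidth(error_rate: float, n_test: int, z: float = 1.96) -> float:
    """Approximate half-width of a 95% normal-approximation CI for an error rate."""
    return z * math.sqrt(error_rate * (1 - error_rate) / n_test)


# The interval shrinks roughly as 1 / sqrt(n_test).
for n in (100, 1_000, 10_000, 100_000):
    hw = error_ci_halfwidth(0.10, n)
    print(f"n_test={n:>7}: 10% error +/- {100 * hw:.2f} percentage points")
```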


## 1.3 Comparing to Human-level Performance <a class="anchor" id="section_1_3"></a>

**<u>Human-level performance</u>**

As long as machine learning performs worse than humans, you can:

- Get labeled data from humans.
- Gain insight from manual error analysis: why did a person get this right?
- Do a better analysis of bias and variance.

**<u>Avoidable Bias</u>**

Let's take a cat classifier as an example:

- 1st example: sharp images, where a human can easily tell whether there is a cat in the picture.
- 2nd example: blurry images, where even a human cannot reliably tell whether there is a cat in the picture.

<center>
<table>
    <thead>
        <tr>
            <th></th>
            <th>1st example</th>
            <th>2nd example</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Human (≈ Bayes) error</td>
            <td>1 %</td>
            <td>7.5 %</td>
        </tr>
         <tr>
            <td>Training Error</td>
            <td>8 %</td>
            <td>8 %</td>
        </tr>
         <tr>
            <td>Dev Error</td>
            <td>10 %</td>
            <td>10 %</td>
        </tr>
    </tbody>    
</table>
</center>

- We call the difference between Bayes error (or its approximation) and the training error the <b>avoidable bias</b>.
- We call the difference between the training error and the dev error the <b>variance</b>.

As we can see, in the first example the avoidable bias is 7% and the variance is 2%. In the second example the avoidable bias is 0.5% and the variance is 2%.

<center>
<table>
    <thead>
        <tr>
            <th></th>
            <th>1st example</th>
            <th>2nd example</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Avoidable bias</td>
            <td>7 %</td>
            <td>0.5 %</td>
        </tr>
        <tr>
            <td>Variance</td>
            <td>2 %</td>
            <td>2 %</td>
        </tr>
        <tr>
            <td>Solution</td>
            <td>Focus on bias</td>
            <td>Focus on variance</td>
        </tr>
    </tbody>    
</table>
</center>





- Human-level error is a proxy for Bayes error.

In practice, we want to keep improving training performance until it gets down to Bayes error; doing better than Bayes error on the training set is only possible by overfitting.

In the first (left) example, there is much more potential in reducing the avoidable bias.

In the second (right) example, there is much more potential in reducing the variance.
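The decision rule above can be sketched in a few lines of Python, using the error rates (in %) from the two examples in the table. `diagnose` is a hypothetical helper for illustration; it treats human-level error as the proxy for Bayes error and simply compares the two gaps.

```python
def diagnose(human_error: float, train_error: float, dev_error: float) -> dict:
    """Compare avoidable bias and variance; all errors are in percent."""
    avoidable_bias = train_error - human_error  # human error used as a proxy for Bayes error
    variance = dev_error - train_error
    focus = "bias" if avoidable_bias > variance else "variance"
    return {"avoidable_bias": avoidable_bias, "variance": variance, "focus": focus}


print(diagnose(1.0, 8.0, 10.0))  # 1st example: sharp images
print(diagnose(7.5, 8.0, 10.0))  # 2nd example: blurry images
```

The same training and dev errors lead to opposite decisions; only the estimate of Bayes error changes.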

**<u>Understanding Human-level performance</u>**

<center><img src="images/10-ML strategy/example-image-classification.PNG" width ="500px"></center>

<br>
<center>
<table>
    <thead>
        <tr>
            <th></th>
            <th>Case 1</th>
            <th>Case 2</th>
            <th>Case 3</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Human (≈ Bayes) error</td>
            <td>1% / 0.7% / 0.5%</td>
            <td>1% / 0.7% / 0.5%</td>
            <td>0.5%</td>
        </tr>
         <tr>
            <td>Training Error</td>
            <td>5 %</td>
            <td>1 %</td>
            <td>0.7 %</td>
        </tr>
        <tr>
            <td>Dev Error</td>
            <td>6 %</td>
            <td>5 %</td>
            <td>0.8 %</td>
        </tr>
        <tr>
            <td>Focus on</td>
            <td>Bias</td>
            <td>Variance</td>
            <td>Much harder to choose</td>
        </tr>
    </tbody>    
</table>
</center>

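In the first column, whichever human-level estimate we use (0.5% to 1%), the avoidable bias (4% to 4.5%) is much larger than the variance (1%), so we focus on bias. In the second column, the avoidable bias is at most 0.5% while the variance is 4%, so we focus on variance. In the third column, the avoidable bias (0.2%) and the variance (0.1%) are both tiny, and the right choice depends on how precisely we know Bayes error. As you approach human-level performance, it becomes much harder to tease apart the bias and variance effects.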


To recap, an estimate of human-level performance gives you an estimate of Bayes error. This lets you decide more quickly whether to focus on reducing the bias or the variance of your algorithm.

## 1.4 Improving the Model Performance <a class="anchor" id="section_1_4"></a>

To improve model performance, we reduce the avoidable bias, the variance, or both, depending on which one is larger.


**Reducing Bias (avoidable bias)**

Reducing the gap between training error and human-level error:

- Train a bigger model
- Train longer / use better optimization algorithms (Momentum, RMSprop, Adam)
- NN architecture / hyperparameter search

**Reducing Variance**

Reducing the gap between training error and dev error:

- More data: getting more data to train on can help the model generalize better to dev set data
- Regularization (L2, dropout, data augmentation)
- NN architecture / hyperparameter search



# 2. Machine Learning Strategy - Part 2<a class="anchor" id="chapter2"></a>

## 2.1 Error Analysis<a class="anchor" id="section_2_1"></a>

<u>Carrying out error analysis:</u>

Collect the misclassified examples from the dev set and analyze them in a spreadsheet.


<center><img src="images/10-ML strategy/error-analysis.PNG" width ="500px"></center>

Look at the misclassified examples, both false positives and false negatives, and count how many errors fall into each category. During this process you may discover new categories of errors: for example, if you notice that many errors involve Instagram or Snapchat filters that confuse your classifier, you can add a category for them. Counting the fraction of misclassified examples in each category often helps you decide which problems to prioritize.
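The spreadsheet tally can be sketched in Python. The misclassified examples and their tags below are hypothetical; each example may carry several tags (like ticking several columns in one spreadsheet row), so the percentages can sum to more than 100%.

```python
from collections import Counter

# Hypothetical hand-tagged dev-set errors: one entry per misclassified example.
misclassified = [
    {"id": 1, "tags": {"dog"}},
    {"id": 2, "tags": {"blurry"}},
    {"id": 3, "tags": {"blurry", "filter"}},
    {"id": 4, "tags": {"great_cat"}},
    {"id": 5, "tags": {"filter"}},
    {"id": 6, "tags": {"blurry"}},
]

# Count how many misclassified examples carry each tag.
counts = Counter(tag for ex in misclassified for tag in ex["tags"])
total = len(misclassified)

# Most frequent categories first: these are the candidates to prioritize.
for tag, n in counts.most_common():
    print(f"{tag:<10} {n}/{total} = {100 * n / total:.0f}% of errors")
```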


## 2.2 Mismatched Training and Dev/test set<a class="anchor" id="section_2_2"></a>

## 2.3 Learning from Multiple tasks<a class="anchor" id="section_2_3"></a>

## 2.4 End-to-End Deep learning<a class="anchor" id="section_2_4"></a>