# 1. Error Analysis

## 1.1 Carrying out error analysis

**Goal:**
- To carry out error analysis, take **a set of misclassified examples** from your dev set, covering both **false positives** and **false negatives**, and count how many errors fall into each of several categories.
- While looking through the examples you may think of new categories. For instance, if you notice that many errors involve Instagram or Snapchat filters, add a category for them.
- Counting the fraction of errors in each category helps you prioritize what to work on, or gives you ideas for new directions.
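The counting step can be sketched in a few lines of Python. This is a minimal sketch: the category names and tagged examples are hypothetical, and `tally_error_categories` is an illustrative helper, not part of any library. One example can fall into several categories, so the fractions need not sum to 100%.

```python
from collections import Counter

# Hypothetical log from reviewing misclassified dev examples by hand:
# each error is tagged with one or more categories.
reviewed_errors = [
    {"id": 1, "categories": {"dog"}},
    {"id": 2, "categories": {"great_cat", "blurry"}},
    {"id": 3, "categories": {"blurry"}},
    {"id": 4, "categories": {"instagram_filter"}},
    {"id": 5, "categories": {"blurry", "instagram_filter"}},
]


def tally_error_categories(errors):
    """Return {category: fraction of reviewed errors in that category},
    ordered from most to least frequent."""
    counts = Counter(cat for e in errors for cat in e["categories"])
    n = len(errors)
    return {cat: count / n for cat, count in counts.most_common()}


for cat, frac in tally_error_categories(reviewed_errors).items():
    print(f"{cat:>16}: {frac:.0%}")
```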

![](./imgs/error-analysis1.png)

## 1.2 Cleaning up incorrectly labeled data

**Incorrectly Labeled Samples:**

- DL algorithms are quite robust to random label errors in the training set, but less robust to systematic errors.

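- **Error analysis**: when reviewing dev set errors, also count the ones caused by incorrect labels. If incorrect labels account for a large fraction of the errors, it is worth fixing the labels in the dev/test sets.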
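A rough way to decide whether fixing labels is worth it is to turn the fraction of reviewed errors that are mislabeled into an absolute share of the dev error. The sketch below assumes you hand-reviewed a sample of misclassified dev examples; `label_error_share` and the numbers in the usage lines are hypothetical.

```python
def label_error_share(dev_error, n_reviewed, n_mislabeled):
    """Split the overall dev error into the part caused by incorrect labels
    and the part caused by everything else (both as absolute error rates).

    dev_error:    overall dev-set error, e.g. 0.10 for 10%.
    n_reviewed:   number of misclassified dev examples inspected by hand.
    n_mislabeled: how many of those had an incorrect ground-truth label.
    """
    due_to_labels = dev_error * n_mislabeled / n_reviewed
    return due_to_labels, dev_error - due_to_labels


# Two hypothetical situations: 6/100 vs 30/100 reviewed errors are mislabeled.
for dev_error, n_bad in [(0.10, 6), (0.02, 30)]:
    labels, other = label_error_share(dev_error, n_reviewed=100, n_mislabeled=n_bad)
    print(f"dev error {dev_error:.0%}: labels {labels:.1%}, other causes {other:.1%}")
```

In the first situation incorrect labels explain only 0.6% of a 10% error, so other error sources matter more. In the second they explain 0.6% of a 2% error (30% of all errors), so fixing them is more likely to pay off.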


**Correcting incorrect dev/test set examples**:

- Apply the same correction process to both the dev and test sets so that they continue to come from the same distribution

- consider examining examples your algorithm got right as well as ones it got wrong

- Train and dev/test data may then come from slightly different distributions, which is usually acceptable


## 1.3 Build your first system quickly, then iterate

**Steps**:
- Set up dev/test set and metric
- Build initial system quickly
- Use Bias/Variance analysis & Error analysis to prioritize next steps

# 2. Mismatched Training and Dev/Test Set

## 2.1 Training and testing on different distributions

**Scenario: mobile app image classification**
- Target: low-resolution images uploaded through a mobile app; only a small amount is available
- Also available: a large amount of high-resolution images crawled from the web

![](./imgs/mistached-train-test-distribution.png)

**Option 1**: shuffle web and mobile images together, then split into train/dev/test
- Advantage: train, dev, and test sets all come from the same distribution
- Disadvantage: most dev/test images are high-resolution web images, not the mobile images we actually care about
- **Rejected**

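**Option 2**: train on all web images plus some mobile images; dev and test sets contain only mobile images
- Advantage: the dev/test sets target the distribution we care about (mobile app images)
- Disadvantage: the training distribution differs from the dev/test distribution
- **Accepted**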
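Option 2 can be sketched as a simple split. The sizes below (200,000 web images, 10,000 mobile images, 2,500 mobile images each for dev and test) are hypothetical, as is the helper `option2_split`; images are represented by placeholder tuples.

```python
import random


def option2_split(web_images, mobile_images, n_dev, n_test, seed=0):
    """Option 2: dev/test come only from the target (mobile) distribution;
    the remaining mobile images are added to the web images for training."""
    rng = random.Random(seed)
    mobile = list(mobile_images)
    rng.shuffle(mobile)
    dev = mobile[:n_dev]
    test = mobile[n_dev:n_dev + n_test]
    train = list(web_images) + mobile[n_dev + n_test:]
    rng.shuffle(train)
    return train, dev, test


# Hypothetical dataset sizes; each image is a placeholder (source, index) tuple.
web = [("web", i) for i in range(200_000)]
mobile = [("mobile", i) for i in range(10_000)]
train, dev, test = option2_split(web, mobile, n_dev=2_500, n_test=2_500)
print(len(train), len(dev), len(test))  # 205000 2500 2500
```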

**Another scenario**:

![](./imgs/mistached-train-test-distribution1.png)

## 2.2 Bias and Variance with mismatched data and distributions

**Error analysis in mismatched data:**
- Def: **Train-dev set**: a subset randomly held out from the training data. It has the same distribution as the training set but is not used for training.
- Assume human-level error ≈ 0%


1. train error = 1%, train-dev error = 9%, dev error = 10% 
    - high variance, overfitting the train set
2. train error = 1%, train-dev error = 1.5%, dev error = 10%
    - data mismatch issue
3. train error = 10%, train-dev error = 11%, dev error = 12%
    - avoidable bias, underfitting the train set
4. train error = 10%, train-dev error = 11%, dev error = 20%
    - avoidable bias + data mismatch
    
![](./imgs/mistached-train-test-error-analysis.PNG)


![](./imgs/mistached-train-test-error-analysis-exp.PNG)
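The four cases above can be reproduced by comparing successive gaps: human-level vs. train error (avoidable bias), train vs. train-dev error (variance), and train-dev vs. dev error (data mismatch). This is a rough heuristic sketch: the 2% threshold for calling a gap "large" is an arbitrary assumption, and `diagnose` is a hypothetical helper.

```python
def diagnose(human, train, train_dev, dev, threshold=0.02):
    """Split the error into gaps and flag the ones larger than `threshold`.

    All errors are fractions (0.01 == 1%). The threshold is an arbitrary
    cut-off for this sketch, not a standard value.
    """
    gaps = {
        "avoidable bias": train - human,    # human-level vs. train
        "variance": train_dev - train,      # train vs. train-dev
        "data mismatch": dev - train_dev,   # train-dev vs. dev
    }
    problems = [name for name, gap in gaps.items() if gap > threshold]
    return gaps, problems


# The four cases from the list above, with human-level error = 0%.
cases = [
    (0.01, 0.09, 0.10),
    (0.01, 0.015, 0.10),
    (0.10, 0.11, 0.12),
    (0.10, 0.11, 0.20),
]
for train, train_dev, dev in cases:
    _, problems = diagnose(0.0, train, train_dev, dev)
    print(f"train={train:.1%} train-dev={train_dev:.1%} dev={dev:.1%}: {problems}")
```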


## 2.3 Addressing data mismatch 

**Attempts:**
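- Carry out manual error analysis to understand how the training and dev/test sets differ
- Make the training data more similar to the dev/test sets, or collect more data similar to the dev/test sets
    - **Artificial data synthesis**, e.g. adding car noise to clean speech recordings
        - Potential drawback: if you synthesize from only a small subset of all possible examples (e.g. one hour of car noise reused across many hours of speech), the network may overfit to that subset.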
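A minimal sketch of artificial data synthesis for the speech example: a short noise clip is scaled to a chosen signal-to-noise ratio and added to a clean recording. The sine wave standing in for speech, the Gaussian noise standing in for car noise, the 16 kHz sample rate, and the helper `mix_at_snr` are all assumptions for illustration. The 0.1 s noise clip is tiled ten times, which is exactly the kind of reuse that can make a network overfit to a small slice of the noise space.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` at the requested signal-to-noise ratio (dB).

    If the noise clip is shorter than the clean clip it is tiled (repeated),
    which is the kind of reuse that can lead to overfitting on that noise.
    """
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in tiled) / len(tiled)
    # Choose scale so that p_clean / (scale**2 * p_noise) == 10**(snr_db / 10).
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, tiled)]


rate = 16_000  # assumed sample rate (Hz)
clean = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]  # 1 s stand-in "speech"
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(rate // 10)]  # 0.1 s stand-in "car noise"
mixed = mix_at_snr(clean, noise, snr_db=10.0)
```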

# 3. Learning from Multiple Tasks 

## 3.1 Transfer learning

**Def:**
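- Pre-training: train a neural network on task A (e.g. image recognition on a large dataset), then reuse it for another task
- Fine-tuning: retrain the weights on the target task B; with little data, retrain only the last layer(s), with more data, retrain all layers
- **Transfer works because the early layers usually learn low-level image features such as points, edges, and corners, which are useful for many vision tasks.**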
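A minimal numpy sketch of the small-data case: the early layer is frozen and only a new output layer is trained for task B. The "pre-trained" weights `W1` are random here, an assumption made so the example is self-contained; in practice they would come from a network trained on task A. The helper names, layer sizes, and toy task-B labels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the early layer of a network pre-trained on task A.
# Assumption: random weights, so the sketch runs without a real model.
W1 = rng.normal(size=(20, 8)) / np.sqrt(20)
b1 = np.zeros(8)


def features(X):
    """Frozen pre-trained layer (ReLU); its weights are never updated."""
    return np.maximum(0.0, X @ W1 + b1)


def log_loss(X, y, w, b):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ w + b)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def fine_tune_head(X, y, lr=0.1, epochs=300):
    """Train only a new logistic output layer for task B by gradient descent."""
    H = features(X)
    w, b = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(H @ w + b)))
        err = p - y  # gradient of the log loss w.r.t. the logits
        w -= lr * H.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


# Small, hypothetical task-B dataset (same input size as task A).
X_b = rng.normal(size=(50, 20))
y_b = (X_b[:, 0] > 0).astype(float)
W1_before = W1.copy()
w, b = fine_tune_head(X_b, y_b)
print(log_loss(X_b, y_b, np.zeros(8), 0.0), log_loss(X_b, y_b, w, b))
```

With more task-B data, the same idea extends to also updating `W1`, i.e. fine-tuning all layers.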

**When transfer learning makes sense:**
- Task A and task B have the same type of input x.
- You have a lot more data for task A (the one you transfer from) than for task B (the one you transfer to).
- **Low level features** from A could be helpful for learning B.

## 3.2 Multi-task learning

**Def:**
- Unlike softmax regression, which assigns a single label to each example, multi-task learning lets one example have several labels at once.
- e.g., self-driving car detection: each image has a 4-dimensional label vector (pedestrian, car, stop sign, traffic light)
- We could train four separate networks, one per label, but if the tasks share the same low-level features, a single network trained on all four tasks can do better.
    - If some images are missing labels for certain tasks, we simply omit those entries from the cost function.

![](./imgs/multi-task-learning.PNG)
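The cost with missing labels can be sketched as per-task logistic losses summed over the labeled entries only and averaged over examples. In this numpy sketch missing labels are encoded as `np.nan`, which is an assumption; `multitask_loss` is a hypothetical helper.

```python
import numpy as np


def multitask_loss(y_hat, y):
    """Multi-task logistic loss that skips missing labels.

    y_hat: predicted probabilities, shape (m, n_tasks).
    y:     labels in {0, 1}, shape (m, n_tasks); np.nan marks a missing label.
    """
    mask = ~np.isnan(y)                    # True where a label exists
    p = np.clip(y_hat, 1e-12, 1 - 1e-12)
    y_filled = np.where(mask, y, 0.0)      # placeholder value, masked out below
    per_entry = -(y_filled * np.log(p) + (1 - y_filled) * np.log(1 - p))
    return np.sum(per_entry * mask) / y.shape[0]  # average over examples


# Columns: pedestrian, car, stop sign, traffic light (stop-sign label missing).
y = np.array([[1.0, 0.0, np.nan, 1.0]])
y_hat = np.array([[0.9, 0.2, 0.5, 0.8]])
print(multitask_loss(y_hat, y))
```

Changing the prediction for the missing stop-sign entry does not change the loss, so the network gets no gradient from labels it does not have.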

**When multi-task learning makes sense:**
- Training on a set of tasks that could benefit from having shared lower-level features
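- Usually: the amount of data you have for each task is quite similar
    - the data from all the other tasks combined then far exceeds the data for any single task, which boosts the training data available to each task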
- Can train a big enough neural network to do well on all the tasks

    ![](./imgs/multi-task-learning-ad.PNG)

# 4. End-to-end Deep Learning

## 4.1 What is end-to-end deep learning?

**Def:**
- An end-to-end model learns the whole mapping from the original input (x) to the final output (y) with a single network.
- It converts the input data directly into an output prediction, bypassing the intermediate steps of a traditional pipeline.
 
    ![](./imgs/end2end-exp1.PNG)
    
    ![](./imgs/end2end-exp2.PNG)
    
    ![](./imgs/end2end-exp3.PNG)

## 4.2 Whether to use end-to-end deep learning?

**Pros:**
- Let the data speak
- Less hand-designing of components needed

**Cons:**
- May need large amount of data
- Excludes potentially useful hand-designed components

**Applying end-to-end deep learning:**
- **key question**: do you have sufficient data to learn a function of the complexity needed to map x to y? 
- Still limited for some complex tasks such as self-driving cars, where there may not be enough data to learn the full mapping from x to y directly.

-----------------

# Quiz

**1. "Based on table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think?"**

- The algorithm does better on the distribution of data it trained on, but you don't know whether that is because it trained on that distribution or because that distribution really is easier. To get a better sense, measure human-level error separately on both distributions.
- In other words, if human-level error is much lower on one distribution, that distribution is easier.

---

# Assignments

No assignment this week.