# Structure Machine Learning Projects


## Error Analysis

## Mismatched training and dev/test set

### Training and testing on different distributions
**cat app example**
- data from weblogs, e.g., 200,000
- data from mobile app, e.g. 10,000
- goal: accurate detection of cats in mobile app
  
Option 1: shuffle all data and split into train/dev/test (say, 205000, 2500, 2500)
- pro: simple, train/dev/test come from same distribution
- cons: most of the data come from weblogs, so the dev/test sets are not representative of mobile app data (on average only about 119 of the 2,500 dev examples would come from the mobile app)

Option 2: 
- because the goal is to detect cats in the mobile app, the dev/test sets should contain only mobile app data
- training set: 205,000 (200,000 from weblogs and 5,000 from the mobile app)
- dev/test sets: 2,500 mobile app examples each
- pro: the dev/test sets are representative of mobile app data
- cons: the training distribution differs from the dev/test distribution
  - can be addressed using techniques such as XX
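
The Option 2 split can be sketched with index arrays. This is only a minimal illustration: the function name `split_option_2`, the convention that the first 200,000 indices are weblog examples, and the fixed seed are assumptions made for the example, not part of the original notes.

```python
import numpy as np

def split_option_2(n_web=200_000, n_mobile=10_000, n_mobile_train=5_000, seed=0):
    """Return index arrays (train, dev, test) for the Option 2 split.

    Assumption: indices 0..n_web-1 are weblog examples and the
    remaining n_mobile indices are mobile app examples.
    """
    rng = np.random.default_rng(seed)
    web_idx = np.arange(n_web)
    mobile_idx = n_web + rng.permutation(n_mobile)  # shuffled mobile indices

    # Half of the mobile data joins the training set ...
    mobile_train = mobile_idx[:n_mobile_train]
    # ... and the rest is divided evenly between dev and test.
    mobile_rest = mobile_idx[n_mobile_train:]
    half = len(mobile_rest) // 2

    train = rng.permutation(np.concatenate([web_idx, mobile_train]))
    dev = mobile_rest[:half]
    test = mobile_rest[half:]
    return train, dev, test

train, dev, test = split_option_2()
print(len(train), len(dev), len(test))  # 205000 2500 2500
```

The dev and test sets here contain only mobile app indices, which is the point of Option 2: the evaluation target matches the data the product will actually see.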

### Bias and Variance with mismatched data distribution

- assume human-level error is about 0% (a proxy for the optimal error)
- training error = 1%
- dev error = 10%

**Diagnostics**:
- if the training and dev sets come from the same distribution, the model has high variance (overfitting)
- if the training and dev sets come from different distributions, the gap does not necessarily indicate a high-variance problem
  - the dev set may simply contain images that are much harder to classify than those in the training set
  - two things changed between the training error and the dev error:
    - the algorithm was trained on the training data but has never seen the dev data (the variance part)
    - the distribution of the dev set differs from that of the training set (the data mismatch part)

**solution**:
- further split the training set into a train set and a train-dev set, so that the two have the same distribution
  - train set is used to train models
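  - train-dev set is held out from training and used to measure how well the model generalizes to unseen data from the training distribution
  - dev set is used to evaluate models on the target distribution and to tune hyperparameters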
- error analysis
  - `variance problem`: train-dev error - training error
    - training error: 1%
    - train-dev error: 9%
    - dev error: 10%
  - `data mismatch problem`: dev error - train-dev error
    - training error: 1%
    - train-dev error: 1.5%
    - dev error: 10%
  - `avoidable bias + data mismatch problem`: training error is far above human error, and dev error is far above train-dev error
    - human error: 0%
    - training error: 10%
    - train-dev error: 11%
    - dev error: 20%
  - `overfitting to the dev set`: test error - dev error
    - dev error: 1%
    - test error: 10%
    - a bigger dev set may help
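
The gaps above are simple differences between consecutive error levels, so they can be computed mechanically. The sketch below is illustrative only: the function name `diagnose` and the dictionary keys are made up for this example, and the `avoidable_bias` entry (training error minus human-level error) is added for completeness.

```python
def diagnose(human, train, train_dev, dev, test=None):
    """Break a list of errors (in %) into the gaps used in error analysis."""
    gaps = {
        "avoidable_bias": train - human,   # training error vs. human-level error
        "variance": train_dev - train,     # same distribution, unseen data
        "data_mismatch": dev - train_dev,  # unseen data, different distribution
    }
    if test is not None:
        gaps["overfit_to_dev"] = test - dev
    return gaps

# Variance example: the largest gap is between training and train-dev error.
print(diagnose(human=0, train=1, train_dev=9, dev=10))
# {'avoidable_bias': 1, 'variance': 8, 'data_mismatch': 1}

# Data mismatch example: the largest gap is between train-dev and dev error.
print(diagnose(human=0, train=1, train_dev=1.5, dev=10))
# {'avoidable_bias': 1, 'variance': 0.5, 'data_mismatch': 8.5}
```

The largest gap suggests where to spend effort first; a negative gap is possible and usually means the later data set is easier than the earlier one.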

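`better dev/test error`: dev and test error can be lower than train-dev error when the dev/test data are easier than the training data; check that the dev set is representative of the test set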
- human-level error: 4%
- training error: 7%
- train-dev error: 10%
- dev error: 6%
- test error: 6%

| error | general speech data | rearview mirror speech data |
|---|---|---|
| human-level error | 4% | 6% |
| training error | 7% | 6% |
| train-dev error | 10% | 6% |


### Addressing data mismatch
- carry out manual error analysis to understand the differences between the training and dev/test sets
  - look at the dev set, not the test set, to avoid data leakage
- make the training data more similar to the dev/test sets, or collect more data similar to the dev/test sets
  - artificial data synthesis
    - add noise to existing data
  - e.g., in the cat app example, add more mobile app data to the training set
  - car object detection
    - car images from the real world (e.g., many different cars)
    - car images generated with computer graphics (e.g., only 20 unique cars)
    - if trained on such simulated images, the model may overfit to the small set of simulated cars and not generalize well to the real world
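
As a rough illustration of "add noise to existing data", the sketch below adds Gaussian noise to a 1-D signal at an approximate signal-to-noise ratio. The function name `add_noise`, the `snr_db` parameter, and the choice of Gaussian noise are all assumptions for the example; real pipelines often mix in recorded background noise instead.

```python
import numpy as np

def add_noise(signal, snr_db=10.0, seed=0):
    """Return a copy of `signal` with Gaussian noise at roughly `snr_db` dB SNR."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=np.float64)
    signal_power = np.mean(signal ** 2)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Hypothetical "clean" example: one period of a sine wave.
clean = np.sin(np.linspace(0, 2 * np.pi, 1000))
noisy = add_noise(clean, snr_db=10.0)
```

The same caution as in the car example applies: if every synthesized example is drawn from a small pool of variations, the model may overfit to that pool.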

## Learning from multiple tasks

### Transfer learning

### Multi-task learning

object detection example: classification and regression
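
One common way to train a single model on both tasks is to add a classification loss and a regression loss into one objective. The sketch below assumes a softmax cross-entropy for the class and a mean squared error for the bounding box; the function name `multi_task_loss` and the `box_weight` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def multi_task_loss(class_logits, class_label, box_pred, box_true, box_weight=1.0):
    """Combined loss for one example: classification plus box regression."""
    class_logits = np.asarray(class_logits, dtype=np.float64)
    box_pred = np.asarray(box_pred, dtype=np.float64)
    box_true = np.asarray(box_true, dtype=np.float64)

    # Classification task: numerically stable softmax cross-entropy.
    shifted = class_logits - np.max(class_logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    cls_loss = -log_probs[class_label]

    # Regression task: mean squared error on the box coordinates.
    reg_loss = np.mean((box_pred - box_true) ** 2)

    return cls_loss + box_weight * reg_loss

# Hypothetical example: 3 classes, box given as (x, y, width, height).
loss = multi_task_loss([2.0, 0.5, -1.0], 0, [0.4, 0.5, 0.2, 0.3], [0.5, 0.5, 0.2, 0.2])
```

The weight between the two terms is a design choice; `box_weight=1.0` is just a placeholder.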

## End-to-end deep learning

speech recognition example
- audio -> features -> phonemes -> words -> transcript

an end-to-end model learns all of these stages at once, mapping audio directly to the transcript