# Error Analysis for Cat Classification Model

## Overview
Error analysis is a critical step in improving the performance of our cat classification model. By examining the errors made by our algorithm, we can gain valuable insights into what aspects need refinement.


## Example
   ### Current Model Performance
   - The model currently has a 10% error rate on the development set.

   ### Discovery
   - A review of mislabeled images revealed that some errors are due to the model confusing dogs with cats.

   ### Proposed Error Analysis Approach

   1. **Sample Analysis**
      - Randomly select 100 mislabeled images from the development set.
      - Manually count how many of these are dogs.

   2. **Decision Criteria**
      - If a significant portion (e.g., 50%) of the errors are due to dog images being misclassified as cats, prioritizing improvements in this area could substantially reduce the error rate.

   3. **Outcome**
      - If only a small fraction (e.g., 5 out of 100) are dog images, fixing the dog problem could lower the error rate from 10% to 9.5% at best, so it may not be the most effective use of resources.
      - If a large fraction (e.g., 50 out of 100) are dog images, focusing on distinguishing dogs from cats could cut the error rate by up to half, from 10% to 5% (this upper bound is the ceiling on the improvement).

   4. **Next Steps**
      - Based on the findings from the error analysis, develop a plan to enhance the model's ability to differentiate between cats and dogs.
      - Implement the improvements and re-evaluate the model's performance.
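
The ceiling arithmetic in the outcome step can be checked with a few lines of Python. This is a minimal sketch, not part of the original notes; the function name `error_ceiling` and its parameters are hypothetical, and the numbers are the ones used in the example above.

```python
def error_ceiling(dev_error, n_category, n_sampled):
    """Return (best-case error reduction, best-case remaining error).

    dev_error  -- overall dev set error as a fraction, e.g. 0.10
    n_category -- sampled errors that belong to the category (e.g. dogs)
    n_sampled  -- number of mislabeled examples that were inspected
    """
    share = n_category / n_sampled      # fraction of errors in this category
    reduction = dev_error * share       # best case: every such error is fixed
    return reduction, dev_error - reduction


# The two outcomes discussed above: 5 dogs vs. 50 dogs out of 100 errors.
for dogs in (5, 50):
    reduction, remaining = error_ceiling(0.10, dogs, 100)
    print(f"{dogs}/100 dogs: error drops by at most {reduction:.1%}, to {remaining:.1%}")
```

The result is only an upper bound: it assumes every error in the category gets fixed and nothing else gets worse.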


# Cleaning up Incorrectly Labeled Data

- Deep learning algorithms are fairly robust to random labeling errors in the training set, but less robust to systematic ones. Either way, correct the labels when it is practical to do so.

- To check whether incorrectly labeled examples in the dev/test set matter, add an "Incorrectly labeled" column to the error analysis table and tally it along with the other categories. For example:

### Error Analysis Table

| Image | Dog | Great Cat | Blurry | Incorrectly labeled | Comments |
|-------|-----|-----------|--------|---------------------|----------|
| ...   |     |           |        |                     |          |
| 98    |     |           |        | ✓                   | Labeler missed cat in background |
| 99    |     |           | ✓      |                     |          |
| 100   |     |           |        | ✓                   | Drawing of a cat; Not a real cat. |
| % of total | 8% | 43% | 61% | 6% | |
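
The "% of total" row can be produced by a simple tally. The sketch below assumes the review notes are kept as a list of records with a `tags` field; that layout, the tag names, and the five sample rows are hypothetical and are not taken from the table above.

```python
from collections import Counter

# Hypothetical review notes: one record per mislabeled dev image, listing
# every category that applies (an image can fall into several categories).
reviewed = [
    {"image": 1, "tags": ["blurry"]},
    {"image": 2, "tags": ["great_cat", "blurry"]},
    {"image": 3, "tags": ["dog"]},
    {"image": 4, "tags": ["incorrectly_labeled"]},
    {"image": 5, "tags": []},
]


def tally(rows):
    """Return the fraction of reviewed images that carry each tag."""
    counts = Counter(tag for row in rows for tag in row["tags"])
    return {tag: count / len(rows) for tag, count in counts.items()}


for tag, share in sorted(tally(reviewed).items()):
    print(f"{tag}: {share:.0%}")
```

Because one image can carry several tags, the percentages do not have to add up to 100%.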


- Overall dev set error: 10%

- Errors due to incorrect labels: 0.6%

- Errors due to other causes: 9.4%

The goal of the dev set is to help you select between two classifiers A & B.


- In this example, incorrect labels account for only 0.6% of the 10% dev set error, while other causes account for 9.4%, so you should focus on the 9.4% before spending time on fixing the labels.
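
The 0.6% and 9.4% figures follow from the 6% "Incorrectly labeled" share in the table and the 10% overall dev set error. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def split_dev_error(dev_error, mislabeled_share):
    """Split the overall dev error into (label-caused, other-caused) parts.

    dev_error        -- overall dev set error as a fraction, e.g. 0.10
    mislabeled_share -- fraction of the reviewed errors caused by bad labels
    """
    label_error = dev_error * mislabeled_share
    return label_error, dev_error - label_error


label_error, other_error = split_dev_error(0.10, 0.06)
print(f"incorrect labels: {label_error:.1%}, other causes: {other_error:.1%}")
```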


# Build your first system quickly, then iterate


- The recommended steps for starting a deep learning project:
  
  - Set up the dev/test sets and the metric
  
  - Build an initial system quickly
  
  - Use bias/variance analysis and error analysis to prioritize the next steps

# Training and testing on different distributions


- Many teams build deep learning applications whose training set comes from a different distribution than the dev/test sets, because deep learning is data-hungry and extra training data often has to come from other sources.

- There are two strategies to consider when the training set distribution differs from the dev/test set distribution.
  
  - Option one (not recommended): shuffle all the data together and randomly split it into training and dev/test sets.
    
    - Advantage: all the sets now come from the same distribution.
    
    - Disadvantage: the real-world distribution that originally made up the dev/test sets becomes a smaller share of the new dev/test sets, which might not be the target you want to optimize for.
  
  - Option two: take some of the dev/test set examples and add them to the training set.
    
    - Advantage: the dev/test sets still aim at the distribution you care about.
    
    - Disadvantage: the training and dev/test distributions are now different, but this setup gives better performance in the long run.
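
Option two can be sketched as follows. This is an illustration under stated assumptions, not a prescribed recipe: `source_data` stands for the plentiful data from the other distribution, `target_data` for the data from the distribution you care about, and the function name, the sizes, and the even dev/test split are all hypothetical choices.

```python
import random


def split_option_two(source_data, target_data, n_target_in_train, seed=0):
    """Train on all source data plus some target data;
    draw dev and test only from the remaining target data."""
    rng = random.Random(seed)
    target = list(target_data)
    rng.shuffle(target)

    train = list(source_data) + target[:n_target_in_train]
    rng.shuffle(train)

    held_out = target[n_target_in_train:]
    half = len(held_out) // 2
    dev, test = held_out[:half], held_out[half:]
    return train, dev, test


# Hypothetical sizes: 1000 source examples, 100 target examples.
source = [("source", i) for i in range(1000)]
target = [("target", i) for i in range(100)]
train, dev, test = split_option_two(source, target, n_target_in_train=50)
print(len(train), len(dev), len(test))
```

The dev and test sets contain only target examples, so the metric keeps measuring performance on the distribution you care about.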


# Bias and Variance with mismatched data distributions


- Bias and variance analysis changes when the training set and the dev/test sets come from different distributions.


- Example: in the cat classification example, suppose you have reached these results:
  
  - Human error: 0%
  
  - Train error: 1%
  
  - Dev error: 10%

- You might conclude that this is a variance problem, but because the distributions differ you can't tell for sure: the training set may simply have been easier than the dev set.


- To resolve this, we create a new set called the train-dev set: a random subset of the training set (so it has the same distribution) that is held out and not used for training. Suppose we then get:
  
  - Human error: 0%
  
  - Train error: 1%
  
  - Train-dev error: 9%
  
  - Dev error: 10%
    
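    - Now we can be confident that this is a high variance problem, because the error jumps from 1% to 9% on unseen data from the same distribution as the training set.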
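
Carving out a train-dev set is a plain random hold-out from the training data. A minimal sketch; the function name and the 10% fraction are hypothetical choices, not values from the notes:

```python
import random


def carve_train_dev(training_set, fraction=0.1, seed=0):
    """Hold out a random `fraction` of the training set as the train-dev set.

    Returns (train, train_dev). The model is trained on `train` only,
    so `train_dev` is unseen data from the training distribution.
    """
    rng = random.Random(seed)
    shuffled = list(training_set)
    rng.shuffle(shuffled)
    n_train_dev = int(len(shuffled) * fraction)
    return shuffled[n_train_dev:], shuffled[:n_train_dev]


remaining_train, train_dev = carve_train_dev(range(100), fraction=0.1)
print(len(remaining_train), len(train_dev))
```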


- Suppose we have a different situation:
  
  - Human error: 0%
  
  - Train error: 1%
  
  - Train-dev error: 1.5%
  
  - Dev error: 10%
    
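    - In this case we have a data mismatch problem: the model generalizes well to unseen data from the training distribution (1.5%) but not to the dev distribution (10%).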

## Conclusions:

1. Human-level error (proxy for Bayes error)

2. Train error
   - Calculate $\text{avoidable bias} = \text{training error} - \text{human-level error}$
     - If the difference is large, you have an avoidable bias problem and should use a strategy for reducing bias.

3. Train-dev error
   - Calculate $\text{variance} = \text{train-dev error} - \text{training error}$
     - If the difference is large, you have a high variance problem and should use a strategy for reducing variance.

4. Dev error
   - Calculate $\text{data mismatch} = \text{dev error} - \text{train-dev error}$
     - If the dev error is much larger than the train-dev error, you have a data mismatch problem.

5. Test error
   - Calculate $\text{degree of overfitting to the dev set} = \text{test error} - \text{dev error}$
     - If the difference is large (and positive), you may need a bigger dev set. The dev and test sets come from the same distribution, so the only way for the model to do much better on the dev set than on the test set is if it has overfit the dev set.
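
The four gaps above can be computed side by side. The sketch below uses hypothetical function names, and picking the single largest gap is a simplification for illustration; in practice you would look at all the gaps together.

```python
def diagnose(human, train, train_dev, dev, test=None):
    """Return the gaps between consecutive error levels (all as fractions)."""
    gaps = {
        "avoidable bias": train - human,
        "variance": train_dev - train,
        "data mismatch": dev - train_dev,
    }
    if test is not None:
        gaps["overfitting to dev set"] = test - dev
    return gaps


def largest_gap(gaps):
    """Name of the biggest gap: a rough pointer to what to work on first."""
    return max(gaps, key=gaps.get)


# The two scenarios from the examples in this section.
high_variance = diagnose(human=0.0, train=0.01, train_dev=0.09, dev=0.10)
mismatch = diagnose(human=0.0, train=0.01, train_dev=0.015, dev=0.10)
print(largest_gap(high_variance))
print(largest_gap(mismatch))
```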

Unfortunately, there aren't many systematic ways to deal with data mismatch. The next section lists some things to try.


# Addressing data mismatch

- There is no completely systematic solution to data mismatch, but there are some things you could try.

1. Carry out manual error analysis to try to understand the differences between the training set and the dev/test sets.
2. Make the training data more similar to the dev/test sets, or collect more data similar to the dev/test sets.


- If your goal is to make the training data more similar to your dev set, one technique you can use is *artificial data synthesis*, which also gives you more training data.
  
  - Combine some of your training data with something that moves it toward the dev/test set distribution.
    
    - Examples:
      
      a. Combine normal audio with car noise to synthesize examples of audio with car noise.
      
      b. Generate images of cars using 3D graphics for a car classification task.
  
  - Be cautious: you might accidentally be simulating data from only a tiny subset of the space of all possible examples, and your NN might overfit to that generated data (for example, one particular car noise clip or one particular set of 3D car designs).
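
The audio example can be sketched as sample-wise mixing. This is an illustration only: it assumes both clips are lists of float samples at the same sample rate, and the function name and the `noise_gain` value are hypothetical. Drawing a random window from a long noise recording is one simple way to avoid reusing the same short noise snippet, in line with the caution above.

```python
import random


def mix_with_noise(clean, noise, noise_gain=0.3, rng=random):
    """Overlay a randomly chosen window of `noise` onto `clean`.

    clean, noise -- lists of float audio samples (same sample rate assumed)
    noise_gain   -- scale factor applied to the noise before adding it
    rng          -- the random module or a random.Random instance
    """
    if len(noise) < len(clean):
        raise ValueError("noise clip must be at least as long as the clean clip")
    start = rng.randrange(len(noise) - len(clean) + 1)
    window = noise[start:start + len(clean)]
    return [c + noise_gain * n for c, n in zip(clean, window)]


# Toy signals standing in for real recordings.
clean_clip = [0.0, 0.5, -0.5, 0.25]
car_noise = [0.1] * 20
noisy_clip = mix_with_noise(clean_clip, car_noise, rng=random.Random(0))
print(noisy_clip)
```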


# Multi-task Learning

## Definition

- Multi-task learning trains a single neural network to handle multiple tasks at once.

- The tasks help each other learn by sharing features.

- It is appropriate when sufficient data is available for each task and when the shared features are useful across all the tasks.
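
One common way to train a single network on several tasks is to sum a per-task loss over all tasks. The sketch below assumes each task is a binary label predicted from one raw score (logit) per task; that setup and the function name are assumptions made for illustration and are not stated in the notes.

```python
import math


def multitask_loss(logits, labels):
    """Mean over examples of the summed per-task binary cross-entropy.

    logits -- list of per-example lists, one raw score per task
    labels -- same shape, each entry 0 or 1 (several tasks may be 1 at once)
    """
    total = 0.0
    for example_logits, example_labels in zip(logits, labels):
        for z, y in zip(example_logits, example_labels):
            # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)],
            # where s = sigmoid(z).
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# One example, two tasks, both logits at 0 (predicted probability 0.5 each).
print(multitask_loss([[0.0, 0.0]], [[1, 0]]))
```

Because every task contributes to the same loss, gradients from all tasks update the shared layers, which is how the tasks end up sharing features.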


## Comparison of Transfer and Multitask Learning

- Transfer learning is used more often in practice.

- Multi-task learning is less widespread than transfer learning, but it remains effective, especially when training large networks.


![Image](./image/Multi-Tasking.png)
