# Comparing human-level performance

## Why human-level performance?

Bayes optimal error: 
- The best possible error achievable by any function mapping X to Y (*theoretical*)
- <img src="./images/struct_01.png" alt="Drawing" style="width: 550px;"/>
- Human-level performance is often not far from Bayes optimal error, as the graph shows.
    - One reason progress slows after surpassing human-level performance is that humans are already very good at many tasks. Human-level performance is often close to Bayes error, so there is little room left to improve.
    - A second reason is that, so long as ML is worse than humans, you have tools for improving it that become harder to use once it surpasses human-level performance. You can:
        1. Get labeled data from humans.
        2. Gain insight from manual error analysis.
        3. Better analyze bias/variance.

## Avoidable bias

Example 1:

| Dataset | Error | 
| --- | --- | 
| Humans | 1% |
| Training | 8% | 
| Development | 10% | 


- **Solution**: Focus on reducing bias. The gap between training error (8%) and human error (1%) is 7%, much larger than the 2% gap between training and dev error. High bias means the model underfits: it does not perform well even on the training set, so the problem carries over to every dataset.

Example 2:

| Dataset | Error | 
| --- | --- | 
| Humans | 7.5% |
| Training | 8% | 
| Development | 10% | 

- **Solution**: The training error is still 8% (as in the previous example), but human error is 7.5%, so the avoidable bias is only 0.5%. Bias is therefore not the main problem here. Instead, we need to focus on lowering the dev error (*we need to focus on variance*). High variance means the model overfits the training set, so it does not generalize well to data outside the training set.
    - Fix: use regularization
    - There is more to gain from reducing the dev error from 10% to 8% (2% variance) than from reducing the training error from 8% to 7.5% (0.5% avoidable bias).


Both examples have the same training and dev errors, yet the right strategy differs. Human error sets the reference point, and that reference point tells us whether to work on bias or on variance.

**Avoidable bias**: the difference between training error and Bayes error (approximated by human-level error). We should reduce it as much as we can, but training error cannot go below Bayes error unless the model overfits. Avoidable bias shows how much improvement is actually possible: a training error of 8% may already be good if human-level error (our estimate of Bayes error) is 7.5%.
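The comparison in both examples can be sketched as a small function. This is a minimal sketch, assuming errors are given in percent and human-level error stands in for Bayes error. The name `diagnose` and the rule "work on whichever gap is larger" are illustrative assumptions, not a fixed procedure.

```python
def diagnose(human_error: float, train_error: float, dev_error: float) -> dict:
    """Compare avoidable bias and variance (all errors in percent).

    human_error is used as a proxy for Bayes error.
    """
    # Gap between training error and (approximate) Bayes error.
    avoidable_bias = round(train_error - human_error, 4)
    # Gap between dev error and training error.
    variance = round(dev_error - train_error, 4)
    # Hypothetical decision rule: work on whichever gap is larger.
    focus = "bias" if avoidable_bias > variance else "variance"
    return {"avoidable_bias": avoidable_bias, "variance": variance, "focus": focus}


print(diagnose(1.0, 8.0, 10.0))  # Example 1
# → {'avoidable_bias': 7.0, 'variance': 2.0, 'focus': 'bias'}
print(diagnose(7.5, 8.0, 10.0))  # Example 2
# → {'avoidable_bias': 0.5, 'variance': 2.0, 'focus': 'variance'}
```

The same training and dev errors lead to opposite recommendations once the human reference changes.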


## Understanding human-level performance

Human-level error as a proxy for Bayes error

Medical image classification example:

| Dataset | Error | 
| --- | --- | 
| Typical human | 3% |
| Typical doctor | 1% | 
| Experienced doctor | 0.7% | 
| Team of experienced doctors | 0.5% | 


What is "human-level" error?
- If you want a proxy for Bayes error, use the best human performance (0.5%, the team of experienced doctors), since Bayes error must be less than or equal to 0.5%.
- In some contexts, another definition, such as a typical doctor (1%), may be more appropriate.
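To use the table as a Bayes error proxy, take the lowest error any human reference achieves, because Bayes error can be no higher than that. This is a minimal sketch. The dictionary only restates the table above, and the helper name `bayes_error_upper_bound` is hypothetical.

```python
# Error rates (%) from the medical image classification table above.
HUMAN_ERRORS = {
    "typical human": 3.0,
    "typical doctor": 1.0,
    "experienced doctor": 0.7,
    "team of experienced doctors": 0.5,
}


def bayes_error_upper_bound(human_errors: dict) -> tuple:
    """Return the best-performing human reference and its error.

    Bayes error is at most this value, so it is the tightest proxy available.
    """
    best = min(human_errors, key=human_errors.get)
    return best, human_errors[best]


print(bayes_error_upper_bound(HUMAN_ERRORS))
# → ('team of experienced doctors', 0.5)
```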

Error analysis example:

| Dataset | Error | 
| --- | --- | 
| Human | 1%, 0.7%, 0.5% |
| Training error | 5% | 
| Dev error | 6% | 

- Human (proxy for Bayes error)
    - The gap between human and training error is the avoidable bias (4% to 4.5%, depending on the definition).
- Training error
    - The gap between training and dev error is the variance (1%).
- Dev error
- The avoidable bias is larger than the variance.
- Whichever definition of human-level error you choose, you should focus on reducing the avoidable bias.
- **Focus on bias reduction techniques.**

Error analysis example:

| Dataset | Error | 
| --- | --- | 
| Human | 1%, 0.7%, 0.5% |
| Training error | 1% | 
| Dev error | 5% | 

- Here the variance (4%) is much larger than the avoidable bias (0% to 0.5%), so we need to focus on decreasing the dev error (variance).
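In both error analysis examples, the conclusion does not depend on which human-level definition you pick. You can check this by trying each candidate value. This is a minimal sketch, assuming the same "larger gap wins" rule. The helper name `focus_for` is hypothetical.

```python
HUMAN_LEVEL_CANDIDATES = [1.0, 0.7, 0.5]  # percent, from the tables above


def focus_for(human_error: float, train_error: float, dev_error: float) -> str:
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    # Hypothetical rule: work on whichever gap is larger.
    return "bias" if avoidable_bias > variance else "variance"


# (training error, dev error) pairs from the two error analysis tables.
for train_error, dev_error in [(5.0, 6.0), (1.0, 5.0)]:
    focuses = {focus_for(h, train_error, dev_error) for h in HUMAN_LEVEL_CANDIDATES}
    print(f"train={train_error}%, dev={dev_error}% -> {focuses}")
# → train=5.0%, dev=6.0% -> {'bias'}
# → train=1.0%, dev=5.0% -> {'variance'}
```

Each set holds a single value, so every definition of human-level error points to the same focus.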

If the human, training, and dev errors are close, the definition of human-level error matters. In that case, use the most accurate estimate of Bayes error (in our case, 0.5%).

**Summary of bias/variance with human-level performance**
- Human-level error
    - Proxy for Bayes error
    - The gap between human-level error and training error is the avoidable bias
- Training error
    - The gap between training error and dev error is the variance
- Dev error
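In symbols, the summary is two differences, with human-level error standing in for Bayes error. Below, they are worked on the first error analysis example, taking human-level error as 0.5%.

```latex
\begin{aligned}
\text{avoidable bias} &= \text{training error} - \text{human-level error} = 5\% - 0.5\% = 4.5\% \\
\text{variance} &= \text{dev error} - \text{training error} = 6\% - 5\% = 1\%
\end{aligned}
```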

## Surpassing human-level performance

Example:

| Dataset | Error | 
| --- | --- | 
| Team of humans | 0.5% |
| One human | 1% | 
| Training error | 0.6% | 
| Dev error | 0.8% | 

- Avoidable bias is 0.1% (use the team of humans, not one human, as the proxy for Bayes error).
- Variance is 0.2%.

Example:

| Dataset | Error | 
| --- | --- | 
| Team of humans | 0.5% |
| One human | 1% | 
| Training error | 0.3% | 
| Dev error | 0.4% | 

- Notice how the training error (0.3%) is already better than the team of humans (0.5%)
    - Thus, we have to rethink our approach: we no longer have a good estimate of Bayes error, so it is unclear whether to focus on bias or variance.
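One way to see why the second example is harder is to subtract each human reference from the training error. This is a minimal sketch with a hypothetical helper. The check only shows when a human reference stops being informative; it does not reveal the true Bayes error.

```python
def avoidable_bias_estimates(train_error: float, human_refs: dict) -> dict:
    """Training error minus each human reference error (percent).

    A negative value means the model already beats that reference, so the
    reference no longer tells us how much avoidable bias remains.
    """
    return {name: round(train_error - err, 4) for name, err in human_refs.items()}


refs = {"team of humans": 0.5, "one human": 1.0}
print(avoidable_bias_estimates(0.6, refs))  # first example
# → {'team of humans': 0.1, 'one human': -0.4}
print(avoidable_bias_estimates(0.3, refs))  # second example
# → {'team of humans': -0.2, 'one human': -0.7}
```

In the first example, the team of humans still gives a usable 0.1% estimate, while one human gives a negative value. That negative value is a sign that one human is the wrong proxy. In the second example, both values are negative, so neither reference bounds the remaining avoidable bias.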


Before this lecture, we assumed that humans (experts) outperform computers, which is usually the case for tasks such as image recognition. However, what happens when the training error is better than the experts' error?
- **Should the human error or the training error be considered the Bayes error?**

Problems where ML significantly surpasses human-level performance:
- These are structured-data problems rather than natural perception problems (humans tend to do well on perception problems such as image and speech recognition):
    - Online advertising
    - Product recommendation
    - Logistics (predicting transit time)
    - Loan approvals

- ML now also outperforms humans on some natural perception tasks:
    - Speech recognition
    - Some image recognition
    - Some medical tasks (e.g., detecting cancer)

## Improving your model performance

The two fundamental assumptions of supervised learning
1. You can fit the training set pretty well. 
2. The training set performance generalizes pretty well to the dev/test set 

Reducing (avoidable) bias and variance
1. Difference between human-level and training error (*avoidable bias*)
    - Train a bigger model
        - A bigger model has more capacity to fit the training data.
    - Train longer / use better optimization algorithms
        - Momentum, RMSprop, Adam
    - NN architecture/hyperparameter search (e.g., RNN, CNN)
2. Difference between training error and dev error (*variance*)
    - More data
        - More training data helps the algorithm generalize to the dev set instead of fitting only the training set.
    - Regularization
        - L2, dropout, data augmentation
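The two lists can be combined with the bias/variance comparison into a small lookup. This is a minimal sketch. The dictionary only restates the tactics above, and both the `suggest_tactics` name and the "larger gap first" rule are assumptions.

```python
# Tactics restated from the two lists above.
TACTICS = {
    "avoidable bias": [
        "train a bigger model",
        "train longer / use better optimization (momentum, RMSprop, Adam)",
        "NN architecture/hyperparameter search",
    ],
    "variance": [
        "get more data",
        "regularization (L2, dropout, data augmentation)",
    ],
}


def suggest_tactics(human_error: float, train_error: float, dev_error: float):
    """Pick the larger gap (in percent) and return it with the matching tactics."""
    avoidable_bias = train_error - human_error
    variance = dev_error - train_error
    # Hypothetical rule: address the larger gap first.
    problem = "avoidable bias" if avoidable_bias > variance else "variance"
    return problem, TACTICS[problem]


problem, tactics = suggest_tactics(1.0, 8.0, 10.0)
print(problem)
for tactic in tactics:
    print(" -", tactic)
```

In practice, you would re-run the comparison after each change, since fixing one gap can shift which one is larger.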