# The `Bayes Theorem` (Optional Challenge)

In [None]:
import pandas as pd
import numpy as np

Do you remember this theorem from the lecture?

* The Bayes Theorem lets you compute `a conditional probability`.
* It is widely used in Machine Learning to `update your knowledge` as new data arrives.
* Despite its simple formula, it can `highlight unexpected insights`.

🧑🏻‍🏫 What is the `Bayes Theorem`? According to [Brilliant](https://brilliant.org/wiki/bayes-theorem/):

> Bayes' theorem is a formula that describes how to update the probabilities of hypotheses (A) when given evidence (Data).


🧮 The formula is the following:

$$ \mathbb{P}(A | Data) =  \mathbb{P}(A) \times \frac{\mathbb{P}(Data | A) }{\mathbb{P}(Data)}$$

## 0) Challenge context: should we play sports outside, given the weather conditions?

* In this challenge, we'll compute each term of this formula step by step.

* We have a dataset with `weather conditions` (Rainy, Sunny, Overcast) and `play` (Yes, No), indicating whether a game was played under those weather conditions.

In [None]:
weather_data_example = ['Sunny', 'Overcast', 'Rainy', 'Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Sunny',
                        'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

play_data_example = ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No']

data = {'weather': weather_data_example, 'play': play_data_example}

df = pd.DataFrame(data = data)
df

🚀 Let's compute $ \large P(play \mid weather) = P(play) \times \frac{P(weather \mid play)}{P(weather)} $

## 1) Warm-up: understanding the numbers with a `frequency table`

✍️ Grab a pen + a piece of paper and complete the **`frequency table`**:

| Weather  | Played | No | Total |
|----------|------|----|-------|
| Sunny    |     |   |      |
| Overcast |     |   |      |
| Rainy    |     |   |      |
| Total    |     |   |   14  |

<details>
    <summary>Answer here</summary>
    
| Weather  | Played | No | Total |
|----------|------|----|-------|
| Sunny    | 3    | 2  | 5     |
| Overcast | 4    | 0  | 4     |
| Rainy    | 2    | 3  | 5     |
| Total    | 9    | 5  | 14    |     
</details>
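To double-check your pen-and-paper table, one option is pandas' `pd.crosstab`, which counts each (weather, play) pair; `margins=True` appends the row and column totals. This is a minimal sketch that rebuilds the example lists so it runs on its own. Note that pandas sorts rows and columns alphabetically (`No` before `Yes`), so the layout differs slightly from the table above.

```python
import pandas as pd

# Same example data as in the notebook
weather_data_example = ['Sunny', 'Overcast', 'Rainy', 'Sunny', 'Sunny', 'Overcast', 'Rainy',
                        'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play_data_example = ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No',
                     'No', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No']
df = pd.DataFrame({'weather': weather_data_example, 'play': play_data_example})

# Count each (weather, play) combination; margins=True adds a "Total" row and column
frequency_table = pd.crosstab(df['weather'], df['play'], margins=True, margins_name='Total')
print(frequency_table)
```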

## 2) Prior probability: $ \mathbb{P}(play)$

🤔 What is the probability that a game is played, estimated from the data ❓

Look at the numbers in your previous table.

<details>
    <summary>Answer</summary>
    
| Weather  | Played | No | Total |
|----------|------|----|-------|
| Total    | 9    | 5  | 14    |     
    
$ \mathbb{P}(play = Yes) = \frac{9}{14} \approx 64.29 \% $
</details>
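As a quick cross-check of the prior (not a replacement for writing `prior_probability` yourself), pandas' `Series.value_counts(normalize=True)` returns the share of each value in a column, which is the empirical P(play). A minimal sketch, rebuilding the example list so it runs on its own:

```python
import pandas as pd

play_data_example = ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No',
                     'No', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No']

# Relative frequency of each outcome: count of the value / total number of rows
prior = pd.Series(play_data_example).value_counts(normalize=True)
print(prior.round(4))  # Yes ≈ 0.6429, No ≈ 0.3571
```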

👩🏻‍💻 Code the `prior_probability` function to compute the result.

In [None]:
def prior_probability(played: str, play_data: list) -> float:
    '''
    Returns P(play = played), the fraction of entries in play_data equal to `played`
    '''
    pass  # YOUR CODE HERE
    
# 👀 Run the following to test your code.
# If nothing shows, your function works. Otherwise, inspect your code to fix it!
assert(round(prior_probability("Yes", play_data_example),4) == 0.6429)

☝️ FYI: these unusual annotations
```python
def prior_probability(played: str, play_data: list) -> float:
```
are called **type hints**.

They are optional in Python and tell the reader which types of arguments the function expects and which type it returns. Python itself does not check them when the code runs.

Some Python tools (static type checkers such as `mypy`, or runtime validation libraries) do check these types and report an error when they are violated.
Using type hints is good practice in large projects, to make sure nothing breaks silently.
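To see that type hints are informational only, here is a minimal sketch with a hypothetical helper, `count_value` (not part of the challenge). Python stores the hints in the function's `__annotations__` attribute, but it does not stop you from passing an `int` where a `str` is annotated.

```python
def count_value(value: str, data: list) -> int:
    """Hypothetical helper: count how many items in data are equal to value."""
    return sum(1 for item in data if item == value)

# The hints are stored on the function object...
print(count_value.__annotations__)

# ...but they are not enforced: an int is accepted although `value` is annotated as str
print(count_value(3, [1, 3, 3]))  # 2
```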



## 3) Likelihood: $ \mathbb{P}(weather \mid play)$

🤔 What is the probability that the weather is rainy, given that a game was NOT played ❓

Look at the numbers in your previous table.

<details>
    <summary>Answer</summary>
    
| Weather  | No | 
|----------|----|
| Sunny    | 2  | 
| Overcast | 0  | 
| Rainy    | 3  | 
| Total    | 5  |         
    
$ \mathbb{P}(weather = Rainy \mid play = No) = \frac{3}{5} = 60 \% $
</details>
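The same kind of cross-check works for the likelihood. With `normalize='columns'`, `pd.crosstab` divides each count by its column total, so each column holds the distribution of the weather for one value of `play`, i.e. P(weather | play). A minimal sketch, again rebuilding the example lists:

```python
import pandas as pd

weather_data_example = ['Sunny', 'Overcast', 'Rainy', 'Sunny', 'Sunny', 'Overcast', 'Rainy',
                        'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play_data_example = ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No',
                     'No', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No']

# Each cell = count(weather, play) / count(play), so every column sums to 1
likelihood_table = pd.crosstab(pd.Series(weather_data_example, name='weather'),
                               pd.Series(play_data_example, name='play'),
                               normalize='columns')
print(likelihood_table.round(2))
print(likelihood_table.loc['Rainy', 'No'])  # P(Rainy | No) = 3/5
```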

In [None]:
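def likelihood(weather: str, played: str, weather_data: list, play_data: list) -> float:
    '''
    TO DO: return P(weather | play = played), i.e. the fraction of
    games with play == `played` that took place in `weather`
    '''
    pass  # YOUR CODE HERE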

# 👀 Run the following to test your code.
# If nothing shows, your function works. Otherwise, inspect your code to fix it!
assert(likelihood("Rainy", "No", weather_data_example, play_data_example) == 0.60)

## 4) Posterior probability: $ \large P(play \mid weather) = P(play) \times \frac{P(weather \mid play)}{P(weather)} $

🔥 We can finally compute the posterior probability as: 

$$\large \text{posterior probability} = \text{prior probability} \times \text{likelihood} \times \beta $$ 

where $ \large \beta = \frac{1}{P(weather)} $ is the _normalization factor_.
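As a worked instance, take sunny weather and a game played. Reading each term off the frequency table from section 1, the prior is $P(Yes) = \frac{9}{14}$, the likelihood is $P(Sunny \mid Yes) = \frac{3}{9}$ (3 of the 9 played games were on sunny days), and $P(Sunny) = \frac{5}{14}$:

$$ \large P(Yes \mid Sunny) = \frac{9}{14} \times \frac{3/9}{5/14} = \frac{3}{14} \times \frac{14}{5} = \frac{3}{5} = 0.6 $$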
 
<details>
    <summary>Expected results</summary>

Remember the table you completed earlier?

| Weather  | Played | No | Total |
|----------|------|----|-------|
| Sunny    | 3    | 2  | 5     |
| Overcast | 4    | 0  | 4     |
| Rainy    | 2    | 3  | 5     |
| Total    | 9    | 5  | 14    |   
    
Based on this table, we can compute $ \mathbb{P}(played | weather) $
    
| Weather  | Proba(Played\|Weather) | Proba(No\|Weather) |
|----------|----------------------|--------------------|
| Sunny    | 0.6                  | 0.4                |
| Overcast | 1.0                  | 0.0                |
| Rainy    | 0.4                  | 0.6                |
    
</details>
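Once `posterior_probability` works, you can compare it with a direct count. With `normalize='index'`, `pd.crosstab` divides each count by its row total, which gives P(play | weather) without going through Bayes' formula. Both routes agree here because every term of the formula is estimated from the same counts: for example, $\frac{9}{14} \times \frac{3/9}{5/14}$ reduces to $\frac{3}{5}$, the share of sunny days with a game. A minimal sketch:

```python
import pandas as pd

weather_data_example = ['Sunny', 'Overcast', 'Rainy', 'Sunny', 'Sunny', 'Overcast', 'Rainy',
                        'Rainy', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
play_data_example = ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No',
                     'No', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No']

# Each cell = count(weather, play) / count(weather), so every row sums to 1
posterior_table = pd.crosstab(pd.Series(weather_data_example, name='weather'),
                              pd.Series(play_data_example, name='play'),
                              normalize='index')
print(posterior_table.round(2))
```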

In [None]:
def posterior_probability(played: str, weather: str, weather_data: list, play_data: list) -> float:
    '''TO DO: return P(play = played | weather)
    '''
    pass  # YOUR CODE HERE

In [None]:
# 👀 Run the following cell to test your code
# round() avoids false failures caused by floating-point error (e.g. 0.6000000000000001)
assert(round(posterior_probability("Yes", "Sunny", weather_data_example, play_data_example), 2) == 0.6)
assert(round(posterior_probability("No", "Sunny", weather_data_example, play_data_example), 2) == 0.4)
assert(round(posterior_probability("Yes", "Overcast", weather_data_example, play_data_example), 2) == 1.0)
assert(round(posterior_probability("No", "Overcast", weather_data_example, play_data_example), 2) == 0.0)
assert(round(posterior_probability("Yes", "Rainy", weather_data_example, play_data_example), 2) == 0.4)
assert(round(posterior_probability("No", "Rainy", weather_data_example, play_data_example), 2) == 0.6)

## 5) Taking a step back to understand what you've done

Using what you've learned in this challenge, can you answer these questions?

1. _"Matches are more likely to be played than not if the weather is sunny"_ 👉 Is this statement correct?
2. If you know for sure that it will rain during the next game 🤔, what is your best estimate of the probability that the game will be canceled?

In [None]:
# YOUR CODE HERE

🏁 Congrats, you now have a better idea of how the `Bayes formula` works!

💾 Do not forget to `git add/commit/push` your notebook

📆 We will revisit this concept in the following modules:
* `Decision Science - Inferential Statistics`
* `Machine Learning - Performance Metrics - Confusion Matrix`
* `Natural Language Processing`


▶️ If you are curious or impatient, you can already watch the [15-minute YouTube video from 3Blue1Brown](https://www.youtube.com/watch?v=HZGCoVF3YvM) mentioned in the lecture.