# `Probability`

Calculating probabilities in different scenarios. In probability it is important to analyse the problem first; from there we can evaluate and select an appropriate solution.


***

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

## <font color='red'>Example 1</font>
Rolling a 10-sided die

![ven.png](attachment:ven.png)

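Determine the scenario first:
- Not mutually exclusive events (Not MEE): A and B share outcomes
- Independent events: each roll of the die is independent of the others (events on the same roll that are not MEE are not automatically independent)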

In [2]:
A = {1,2,3,4,5,6,7}
B = {5,6,7,8,9,10}
C = set()  # no outcomes outside A and B; note that {} would create a dict, not a set
A_and_B = A.intersection(B)
Total = len(A) + len(B) + len(C) - len(A_and_B)

In [3]:
# Check whether the events are mutually exclusive
# an empty set means MEE, otherwise Not MEE
A_and_B

{5, 6, 7}

### Single Event
p(A), p(B), p(A_and_B)

In [4]:
A = len(A) / Total
B = len(B) / Total
A_and_B = len(A_and_B) / Total
print(A, B, A_and_B)

0.7 0.6 0.3


### Two Events
p(A U B) probability of A or B


In [5]:
# p (A U B ) = p(A) + p(B) - p(A_and_B)
A_or_B = A + B - A_and_B
A_or_B

0.9999999999999998
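The result above is 1 up to floating-point rounding. As a minimal sketch (the names `sample_space`, `event_a`, `event_b` and `prob` are illustrative and not part of the notebook), the same addition rule can be evaluated exactly with `fractions.Fraction` and cross-checked by counting the union directly:

```python
from fractions import Fraction

# Sample space and events for one roll of a fair 10-sided die
sample_space = set(range(1, 11))
event_a = {1, 2, 3, 4, 5, 6, 7}
event_b = {5, 6, 7, 8, 9, 10}


def prob(event):
    """Exact probability of an event, assuming equally likely faces."""
    return Fraction(len(event), len(sample_space))


# Addition rule: p(A U B) = p(A) + p(B) - p(A and B)
p_union_rule = prob(event_a) + prob(event_b) - prob(event_a & event_b)

# Direct count of the union, as a cross-check
p_union_count = prob(event_a | event_b)

print(p_union_rule, p_union_count)  # 1 1
```

Exact fractions avoid results such as 0.9999999999999998, which come from binary floating-point arithmetic rather than from the probability itself.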

p(A ⋂ B) probability of A intersection B

In [6]:
# p(A) * p(B) is the probability that two independent events both happen

# Here the first roll gives a value in A and the second roll gives a value in B
# (1st roll combined with 2nd roll)
# reminder: p(A_and_B) above is for a single roll (the value is in both A and B),
# while p(A) * p(B) covers two separate, independent rolls
#  p(A ⋂ B) = p(A) * p(B)   (valid only for independent events)
A_inter_B = A * B
A_inter_B

0.42

p(A Δ B)

In [7]:
# the probability of A or B, excluding the case where both happen (A_inter_B)
# p(A Δ B) = P(A) + P(B) - 2 * P(A∩B)
# with independent rolls P(A∩B) = P(A) * P(B), so
# p(A Δ B) = P(A) + P(B) - 2 * P(A) * P(B)
(A + B) - (2 * (A * B)) 

0.45999999999999985
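The product rule above treats A and B as outcomes of two separate rolls. A small sketch, assuming a fair die and using illustrative names (`rolls`, `p_both`, `p_exactly_one`, `p_single`), enumerates all 100 two-roll outcomes to check both values, and contrasts them with the symmetric difference of the two sets on a single roll:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 11)
event_a = {1, 2, 3, 4, 5, 6, 7}
event_b = {5, 6, 7, 8, 9, 10}

# All 100 equally likely (first roll, second roll) pairs
rolls = list(product(faces, repeat=2))

# First roll lands in A and second roll lands in B
both = sum(1 for r1, r2 in rolls if r1 in event_a and r2 in event_b)

# Exactly one of the two conditions holds
exactly_one = sum(1 for r1, r2 in rolls if (r1 in event_a) != (r2 in event_b))

p_both = Fraction(both, len(rolls))
p_exactly_one = Fraction(exactly_one, len(rolls))
print(p_both, p_exactly_one)  # 21/50 23/50

# Single roll: outcomes in A or B but not in both
p_single = Fraction(len(event_a ^ event_b), 10)
print(p_single)  # 7/10
```

The two-roll values match 0.42 and 0.46 above, while the single-roll symmetric difference is 0.7, so it matters which experiment the formula is applied to.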

## <font color='red'>Example 2</font>
Rolling a 10-sided die

![ven2.png](attachment:ven2.png)

Determine the scenario first:
- Mutually exclusive event (MEE)
- Technically, each roll of a die is an `Independent Event`.

In [8]:
A = {1,2,3,4,5,6}
B = {8,9,10}
C = {7}
A_and_B = A.intersection(B)
S = len(A) + len(B) + len(C) - len(A_and_B)

In [9]:
# Check whether the events are mutually exclusive
# an empty set means MEE, otherwise Not MEE
A_and_B

set()

### Single Event
• p(A), p(B), p(C), p(A_and_B)

In [10]:
# divide by S, the sample-space size computed for this example
A = len(A) / S
B = len(B) / S
C = len(C) / S
A_and_B = len(A_and_B) / S
print(A, B, C,  A_and_B)

0.6 0.3 0.1 0.0


### Two Events
• p(A U B) probability of A or B


In [11]:
# p (A U B ) = p(A) + p(B) - p(A_and_B)
A_or_B = A + B - A_and_B
A_or_B

0.8999999999999999

• p(A ⋂ B) probability of A intersection B

Events A intersection B means that both events happen simultaneously. Since two mutually exclusive events A and B cannot occur together, the probability of both happening on the same roll is 0.

Hence, the probability of A intersection B is 0 if A and B are mutually exclusive.

We cannot use the formula p(A ⋂ B) = p(A) * p(B) for mutually exclusive events on the same roll; it applies only to independent events.

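### Conditional probability in MEE
• On a single roll, p(A|B) = 0 and p(B|A) = 0: if one of two mutually exclusive events has occurred, the other cannot.

• Across separate rolls, p(A|B) = p(A)

• Across separate rolls, p(B|A) = p(B)

Separate rolls are independent, so the result of one roll does not affect the next. This holds because of independence, not because the events are mutually exclusive.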
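As a hedged illustration of conditional probability on a single roll (the helper `conditional` and the variable names are assumptions made for this sketch), p(A|B) can be computed by counting the outcomes of B that also lie in A:

```python
from fractions import Fraction

event_a = {1, 2, 3, 4, 5, 6}
event_b = {8, 9, 10}


def conditional(target, given):
    """p(target | given) for one roll with equally likely faces."""
    return Fraction(len(target & given), len(given))


# Same roll: B has occurred, so no outcome of A is possible
p_a_given_b = conditional(event_a, event_b)
print(p_a_given_b)  # 0

# Unconditional probability of A, which is also p(A on one roll | B on another roll)
p_a = Fraction(len(event_a), 10)
print(p_a)  # 3/5
```

On the same roll the conditional probability of a mutually exclusive event is 0; it equals p(A) only when the two events come from separate, independent rolls.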

### Compound probability (MEE, Independent)
Multiple events happen one after another (separate rolls)

• p(AB) and p(BA)

`Order doesn't matter in this probability`

In [12]:
# p(AB)
# the probability of two events in sequence
# first roll is event A, then second roll is event B
#
print(A * B)
# p(BA)
print(B * A)

0.18
0.18


• p(AAB) and p(ABA)

`Order matters in this probability`, though both orders have the same probability.

In [13]:
# Three events 
# first roll is event A, 2nd roll is event A again then the last roll is event B
print(A * A * B)
#
# first roll is event A, 2nd roll is event B then the last roll is event A
print(A * B * A)

0.108
0.108


• Probability of two A and one B

In [14]:
#
# list all possible orderings
# [ p(AAB), p(ABA), p(BAA) ]
#
p_AAB = A*A*B 
p_ABA = A*B*A
p_BAA = B*A*A

# adding probabilities
# p(AAB, ABA, BAA)
p_AAB + p_ABA + p_BAA

0.324
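Listing the orderings by hand works for three rolls but grows quickly. A minimal sketch, assuming the per-roll probabilities 0.6, 0.3 and 0.1 from this example (the names `p`, `total` and `shortcut` are illustrative), enumerates every three-roll sequence and compares the sum with the counting shortcut "number of orderings times the probability of one ordering":

```python
from itertools import product
from math import comb, isclose

# Per-roll probabilities of the three mutually exclusive events
p = {"A": 0.6, "B": 0.3, "C": 0.1}

# Sum the probability of every ordered sequence with two A and one B
total = 0.0
for seq in product("ABC", repeat=3):
    if seq.count("A") == 2 and seq.count("B") == 1:
        total += p[seq[0]] * p[seq[1]] * p[seq[2]]

# Shortcut: choose which 2 of the 3 rolls are A, the remaining roll is B
shortcut = comb(3, 2) * p["A"] ** 2 * p["B"]

print(isclose(total, shortcut))  # True
```

Both approaches give approximately 0.324, the value obtained above by adding p(AAB), p(ABA) and p(BAA).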

## <font color='red'>Example 3</font>
A standard 52-card deck

We need some domain knowledge about playing cards for this scenario.
First, let's understand what a 52-card deck consists of:
1. There are 4 suits (clubs, spades, hearts, and diamonds)
2. 13 cards in each suit
3. 26 cards are black and 26 cards are red


![52.jpg](attachment:52.jpg)

### `Question #1`

What is the probability of selecting a queen?

- p(queen)
- Single event

Ignore the misspelling of "queen" in the diagram.
![ven4.png](attachment:ven4.png)

In [15]:
# Analyse the question first
# Single Event
# For a single event we don't need to determine whether it is MEE or not MEE, or dependent or independent.
# This is just the probability of one event occurring.
#
# There are 4 queens in a 52-card deck, therefore
# p(queen)
4/52

0.07692307692307693

We have a 7.69% probability of selecting a queen, regardless of suit and color.

### `Question #2`

What is the probability of selecting a black card that is greater than 3 but is less than or equal to 9? 
- p(black >3 and <=9)
- Single Event
- We are looking for the probability of a black card that is greater than 3 and less than or equal to 9.
- x = [4,5,6,7,8,9]

In [16]:
# There are 2 black suits (spades and clubs)
# each suit has 6 possible outcomes
# 6 * 2 = 12
# then divide by the total number of cards
12/52

0.23076923076923078

There is a 23% probability of selecting a black card ranging from 4 to 9.
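Both single-event answers can be checked by building the deck and counting. This is a sketch under the assumption of a standard deck; `ranks`, `suits`, `deck` and the other names are illustrative only:

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = {"clubs": "black", "spades": "black", "hearts": "red", "diamonds": "red"}

# Every (rank, suit) pair: 13 * 4 = 52 cards
deck = list(product(ranks, suits))

# Question 1: any queen
queens = sum(1 for rank, _ in deck if rank == "Q")
p_queen = Fraction(queens, len(deck))

# Question 2: black card with rank 4 to 9
wanted = {"4", "5", "6", "7", "8", "9"}
black_4_to_9 = sum(
    1 for rank, suit in deck if suits[suit] == "black" and rank in wanted
)
p_black_4_to_9 = Fraction(black_4_to_9, len(deck))

print(p_queen, p_black_4_to_9)  # 1/13 3/13
```

The fractions 1/13 and 3/13 are the reduced forms of 4/52 and 12/52 used above.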

### `Question #3`

What is the probability of selecting a `red king` on the first draw then a `diamond` card on the second draw? `Without replacing the cards.`

Analyse the problem first:

- p(red_king * diamond)
- Compound probability
- Multiple events
- Conditional probability
- `Not MEE`
- `Dependent Events`
- There are 2 red kings (the king of hearts and the king of diamonds), so we need to find the probability of each path first. In the labels below, `p(x | diamond)` stands for the probability of drawing x first and then a diamond, that is p(x) * p(diamond | x).
    - p(red_king_heart | diamond)
    - p(red_king_diamond | diamond)

`Formula:`

p(red_king_heart | diamond) + p(red_king_diamond | diamond)

Below is a diagram for reference.

![tree.png](attachment:tree.png)

`1. p(red_king_heart | diamond)`

In [17]:
# p(1/52 * 13/51)
# there is 1 king of hearts and 13 diamond cards
# first draw has 52 cards, then the second draw has only 51 cards
# (no replacement, and given that we drew the king of hearts in the 1st event)
#
A = 1/52
B = 13/51
first_scenario_king_heart = (A * B) 
first_scenario_king_heart

0.004901960784313725

`2. p(red_king_diamond | diamond)`

In [18]:
# p(1/52 * 12/51)
# there is 1 king of diamonds and 13 diamond cards
# given that the first draw is the king of diamonds, only 12 diamond cards remain for the second draw
# first draw has 52 cards, then the second draw has only 51 cards (dependent event)
# (no replacement, and given that we drew the king of diamonds in the 1st event)
#
A = 1/52
B = 12/51
second_scenario_king_diamond =(A * B) 
second_scenario_king_diamond

0.004524886877828055

In [19]:
# Adding the two probabilities
p_redking_diamond = first_scenario_king_heart + second_scenario_king_diamond
print(p_redking_diamond)

0.009426847662141781


We have almost a 1% probability of selecting a red king first and then any diamond on the second draw.
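The two-path calculation can be cross-checked by brute force. The sketch below (illustrative names, standard deck assumed) enumerates every ordered pair of distinct cards, which models drawing twice without replacement, and counts the pairs where a red king is followed by a diamond:

```python
from fractions import Fraction
from itertools import permutations, product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "spades", "hearts", "diamonds"]
deck = list(product(ranks, suits))


def is_red_king(card):
    rank, suit = card
    return rank == "K" and suit in ("hearts", "diamonds")


# Ordered pairs of distinct cards: 52 * 51 = 2652 equally likely draws
draws = list(permutations(deck, 2))

favourable = sum(
    1 for first, second in draws
    if is_red_king(first) and second[1] == "diamonds"
)

p_red_king_then_diamond = Fraction(favourable, len(draws))
print(favourable, len(draws))  # 25 2652
```

The 25 favourable pairs are the 13 diamonds that can follow the king of hearts plus the 12 diamonds that can follow the king of diamonds, which matches 13/2652 + 12/2652 above.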

***

# Probability using contingency table

## <font color='red'>Example 4</font>

In [20]:
import pandas as pd
b = pd.read_csv('birds.csv')

`Note:` 

This scenario is `not mutually exclusive` (there is an intersection, so both events can occur at the same time). Whether gender and eye color are `independent` has to be checked against the counts: independence requires p(A and B) = p(A) * p(B).

In [21]:
b.head(3)

Unnamed: 0,Gender,Eyes_color
0,male,brown
1,male,brown
2,male,brown


In [22]:
ct = pd.crosstab(b.Gender, b.Eyes_color,
                 margins=True, 
#                  normalize=True
                )
ct

| Gender / Eyes_color | blue | brown | All |
| --- | --- | --- | --- |
| female | 10 | 100 | 110 |
| male | 20 | 70 | 90 |
| All | 30 | 170 | 200 |


#### Single Event

In [23]:
# probability of female
# p(female)
110 / 200

0.55

In [24]:
# probability of male with brown eyes
# p(male_and_brown)
70/200

0.35

#### Conditional Probability


In [25]:
# probability of female given brown eyes
# how many of the brown-eyed birds are female
# p(female | brown)
100/170

0.5882352941176471

<img src="cond1.png"
     align="left" 
     width="500" />

There is a 58.82% chance of selecting a female bird given that the bird has brown eyes: out of the 170 brown-eyed birds, 58.82% are female.

In [26]:
# probability of male given blue eyes
# how many of the blue-eyed birds are male
# p(male | blue)
# just use the general formula

# p (male | blue) = p(male and blue)
#                 ---------------------
#                      p(blue)
20/30

0.6666666666666666

In [27]:
# probability of blue eyes given female
# how many of the female birds have blue eyes
# p(blue | female)
10 / 110

0.09090909090909091

Out of all the females, 9% have blue eyes.
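The conditional probabilities can also be derived from the four cell counts instead of typing each fraction. This is a minimal sketch in plain Python (the `counts` dictionary and the helpers `marginal` and `p_given` are assumptions for illustration, with the counts copied from the table above); it also checks whether gender and eye color are independent in this data:

```python
from fractions import Fraction

# Cell counts copied from the contingency table: (gender, eye color) -> birds
counts = {
    ("female", "blue"): 10,
    ("female", "brown"): 100,
    ("male", "blue"): 20,
    ("male", "brown"): 70,
}
total = sum(counts.values())


def marginal(value, axis):
    """Row total (axis=0, gender) or column total (axis=1, eye color)."""
    return sum(n for key, n in counts.items() if key[axis] == value)


def p_given(gender, eyes):
    """p(gender | eyes): share of birds with this eye color that have this gender."""
    return Fraction(counts[(gender, eyes)], marginal(eyes, 1))


print(p_given("female", "brown"))  # 10/17
print(p_given("male", "blue"))     # 2/3

# Independence check: p(female and brown) versus p(female) * p(brown)
p_female = Fraction(marginal("female", 0), total)
p_brown = Fraction(marginal("brown", 1), total)
p_female_and_brown = Fraction(counts[("female", "brown")], total)
print(p_female_and_brown == p_female * p_brown)  # False
```

Because p(female and brown) = 1/2 differs from p(female) * p(brown) = 187/400, gender and eye color are not independent in this table. With pandas, `pd.crosstab` accepts `normalize='columns'` or `normalize='index'` to produce such conditional tables directly.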

## <font color='red'>Example 5</font>

In a math class of 30 students, 17 are boys and 13 are girls. On a unit test, 4 boys and 5 girls made an A grade. If a student is chosen randomly from the class, what is the probability of choosing a girl or an A grade?

Determine the scenario first
- Not MEE (not mutually exclusive): a student can be both a girl and an A-grade student
- Single event: one student is chosen, so the addition rule applies and independence is not needed

![ven3.png](attachment:ven3.png)

In [28]:
S = 30
B = 17
G = 13
A = 9 
G_and_A = 5

p(girl or A grade)

In [29]:
# p(girl or A) = p(G) + p(A) - p(G_and_A)
# 13 / 30  +   9/30  -  5/30  =  17/30
#
(G + A - G_and_A) / S

0.5666666666666667

The probability of choosing a girl or an A grade (the A grade can come from a boy or a girl) is simply 56.67%.

***

# Bayes Theorem sample

## <font color='red'>Example 6</font>

In [30]:
df = pd.read_csv('strep_throat.csv')
print(df.shape)
df.head(3)

(17, 2)


Unnamed: 0,Condition,Checkup_result
0,no_strep_thoat,negative
1,no_strep_thoat,negative
2,no_strep_thoat,negative


### Contingency table

In [31]:
#
# contingency table frequency
Contingency_table = pd.crosstab(df.Checkup_result, 
                                df.Condition,
                                margins=True,
#                                normalize=True
                               )
Contingency_table

# contingency table proportion
# Contingency_table = pd.crosstab(df.Checkup_result, 
#                                 df.Condition,
#                                 margins=True,
#                                 normalize=True
#                                )

| Checkup_result / Condition | no_strep_thoat | with_strep_thoat | All |
| --- | --- | --- | --- |
| negative | 6 | 1 | 7 |
| positive | 2 | 8 | 10 |
| All | 8 | 9 | 17 |


Let's assign short names to make the probabilities easier to write:

1. no_strep_throat = no_st
2. with_strep_throat = w_st
3. positive = +
4. negative = -

![strep_throat.jpg.png](attachment:strep_throat.jpg.png)

### Probability of w_st 
(positive and negative results)

In [32]:
# p(w_st)
w_st = 9/17
w_st

0.5294117647058824

In [33]:
# p(+ | w_st)

# general formula

# p(+ | w_st) = p( + and w_st)
#               ---------------
#                   p(w_st)

#             = p(8 / 17)  (cancelled out 17 to become 8/9)
#               ----------
#                 p(9/17)
#
pos_w_st = 8/9
pos_w_st

0.8888888888888888

In [34]:
# p(- | w_st)

neg_w_st = 1/9
neg_w_st

0.1111111111111111

In [35]:
# verify
pos_w_st + neg_w_st

1.0

### Probability of no_st 
(positive and negative results)

In [36]:
# p(no_st)
no_st = 8/17
no_st

0.47058823529411764

In [37]:
# p( + | no_st)
pos_no_st = 2/8
pos_no_st

0.25

In [38]:
# p( - | no_st)
neg_no_st = 6/8
neg_no_st

0.75

### Compound probability

In [39]:
# p(w_st and +) = p(w_st) * p(+ | w_st)
w_st * pos_w_st

0.47058823529411764

#### Bayes Theorem

With the contingency table above, Bayes' theorem is straightforward because all the data is shown. What if some of the data is unknown, such as p(+)? Can we still find the probability p(w_st | +)?

![strep_throat_crop.png](attachment:strep_throat_crop.png)

These are the data available to us:
1. p(w_st) = .529
2. p(+ | w_st) = .888

In [40]:
# bayes formula:
# p(B|A) = p(A|B) * p(B)
#         --------------
#              p(A)

# A is the value we don't know. p(+)
# B is w_st

# p( w_st | + ) = (.888) * (.529)
#                -----------------
#                    p(+) we don't know

p(+) is not given, so we build it with the law of total probability: p(+) = p(+ | w_st) * p(w_st) + p(+ | no_st) * p(no_st).

Since `9/17` of the cases are w_st, the remaining `no_st must be 8/17`. Let's say p(+|no_st) is 2/8 (.25). Note: this is an assumed value; it could be any other value such as 3/8 or 4/8, and I'm using 2/8 for simplicity.

![strep_throa3.jpg](attachment:strep_throa3.jpg)

I'm only interested in the probability of a positive result regardless of w_st or no_st.

In [41]:
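#  p(+) = p(+ | w_st) * p(w_st) + p(+ | no_st) * p(no_st)
#       = (.888 * .529) + (.25 * .47), rounded to .47 + .118
.47 + .118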

0.588

I will take .588 as p(+)

In [42]:
# p( w_st | + ) = (.888) * (.529)
#                -----------------
#                    .588

#               =      .47
#                  -------------
#                      .588
.47/.588

0.7993197278911565

p(w_st|+) = 80%

The probability of having strep throat given a positive result is 80%.
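The same calculation can be wrapped in a small function. This is a sketch, not a library API: `bayes` and its parameter names are hypothetical, and p(+ | no_st) = 2/8 is the assumed value discussed above. Exact fractions avoid the rounding of .888, .529 and .118:

```python
from fractions import Fraction


def bayes(prior, likelihood, false_positive_rate):
    """Return p(H | E) from p(H), p(E | H) and p(E | not H)."""
    # Law of total probability for the evidence p(E)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


p_w_st_given_pos = bayes(
    prior=Fraction(9, 17),               # p(w_st)
    likelihood=Fraction(8, 9),           # p(+ | w_st)
    false_positive_rate=Fraction(2, 8),  # p(+ | no_st), assumed
)
print(p_w_st_given_pos)  # 4/5
```

The exact result is 4/5 = 0.8; the 0.7993 obtained above differs only because the inputs were rounded to three decimals.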

# Bayes Theorem Part 2

- On any given day, there is a 20% chance it rains in a certain area. There is an 80% chance it doesn’t rain.
- Given it rains, there is a 65% chance the train in the area runs late. There is a 35% chance the train still runs on time despite the rain.
- Given it doesn’t rain, there is a 95% chance the train in the area runs on time. There is a 5% chance it still runs late despite the lack of rain.

![bayes3.png](attachment:bayes3.png)

`Questions`
1. What is the probability of on time `p(On Time)`? That is, the probability that the train won't be late.
2. What is the probability of rain given on time `p(Rain | On Time)`? That is, the probability that it rained given that we know the train arrived on time (let's say we are far away from the location and don't know the weather in that area; we only got the report that the train arrived on time).

`Contingency Table`

![bayes4.png](attachment:bayes4.png)

In [43]:
# p(On time)
# p(Rain, On Time) = .07
# p(No Rain, On time) = .76
.07 + .76

0.8300000000000001

p(On Time) follows from the law of total probability: sum the probability of On Time over both cases (Rain and No Rain).

The probability that the train arrives on time is 83%.

Let's now calculate the probability that it rained given the report that the train arrived on time (let's say we don't have access to the weather in that particular area).

In [44]:
# p(Rain | On time)

# Bayes Theorem

# p(B|A) = p(A|B) * p(B)
#         --------------
#              p(A)

# A is the value we don't know. p(On time)
# B is p(Rain)

# p(Rain | On time) = (.35) * (.20)
#                -----------------
#                       (.83)

#                  =    .07
#                    ---------
#                       .83

.07/.83
print((.07/.83) * 100)

8.433734939759036


- There is an 8.43% probability that it rained, given that we know the train arrived on time.
- Therefore we can also say there is a 91.57% probability that it did not rain when the train arrived on time.
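As a closing sketch, the whole train example can be rebuilt from the three given percentages. The `joint` dictionary and the variable names are illustrative assumptions; the joint table is filled in with the multiplication rule and both questions are then answered from it:

```python
from fractions import Fraction

# Given values from the problem statement
p_rain = Fraction(20, 100)
p_late_given_rain = Fraction(65, 100)
p_on_time_given_no_rain = Fraction(95, 100)

# Joint probabilities: p(weather and train status) = p(weather) * p(status | weather)
joint = {
    ("rain", "on_time"): p_rain * (1 - p_late_given_rain),
    ("rain", "late"): p_rain * p_late_given_rain,
    ("no_rain", "on_time"): (1 - p_rain) * p_on_time_given_no_rain,
    ("no_rain", "late"): (1 - p_rain) * (1 - p_on_time_given_no_rain),
}

# Question 1: p(On Time), summed over both weather cases
p_on_time = joint[("rain", "on_time")] + joint[("no_rain", "on_time")]

# Question 2: p(Rain | On Time) = p(Rain and On Time) / p(On Time)
p_rain_given_on_time = joint[("rain", "on_time")] / p_on_time

print(p_on_time, p_rain_given_on_time)  # 83/100 7/83
```

Here 7/83 is approximately 0.0843, matching the 8.43% above, and the four joint probabilities sum to 1, which is a quick sanity check on the table.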