# Bayes's Theorem

$$ P(A|B) = \frac{P(A)P(B|A)}{P(B)} $$

## The Cookie Problem

* $B_1$ (Bowl 1): 30 vanilla cookies ($V$) and 10 chocolate cookies ($C$)
* $B_2$ (Bowl 2): 20 vanilla cookies ($V$) and 20 chocolate cookies ($C$)

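$P(B_1|V)$ : the probability that the bowl is $B_1$, given that a vanilla cookie was drawn from a randomly chosen bowl

$P(V|B_1)$ : the probability of drawing a vanilla cookie from $B_1$ (3/4)

$P(V|B_2)$ : the probability of drawing a vanilla cookie from $B_2$ (1/2)

$P(B_1)$ : the probability of choosing $B_1$ (1/2)

$P(B_2)$ : the probability of choosing $B_2$ (1/2)

$P(V)$ : the total probability of drawing a vanilla cookie, $P(V) = P(B_1)~P(V|B_1) ~+~ P(B_2)~P(V|B_2) = (1/2)~(3/4) ~+~ (1/2)~(1/2) = 5/8$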


Therefore $P(B_1|V) = \frac{P(B_1)P(V|B_1)}{P(V)} = (1/2)~(3/4)~/~(5/8) = 3/5$
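The arithmetic above can be double-checked with exact fractions. This is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
from fractions import Fraction

# Priors and likelihoods as stated in the text above.
prior_b1 = Fraction(1, 2)
prior_b2 = Fraction(1, 2)
like_v_b1 = Fraction(3, 4)
like_v_b2 = Fraction(1, 2)

# Total probability of drawing a vanilla cookie.
prob_v = prior_b1 * like_v_b1 + prior_b2 * like_v_b2

# Bayes's theorem for Bowl 1 given vanilla.
posterior_b1 = prior_b1 * like_v_b1 / prob_v

print(prob_v)        # 5/8
print(posterior_b1)  # 3/5
```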


## Diachronic Bayes

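1. $P(H)$ : the prior, the probability of the hypothesis $H$ before seeing the data
2. $P(D|H)$ : the likelihood, the probability of the data $D$ under hypothesis $H$
3. $P(H)P(D|H)$ : the prior times the likelihood
4. $P(D) = \sum_i P(H_i)~P(D|H_i)$ : the total probability of the data $D$
5. $P(H|D)$ : the posterior, the probability of the hypothesis after seeing the data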
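The five quantities above map directly onto a few lines of code. The following is a minimal sketch using plain dictionaries; the function name `bayes_update` and the dictionary layout are assumptions made for illustration, not something the notebook defines.

```python
from fractions import Fraction

def bayes_update(priors, likelihoods):
    """Return (posteriors, prob_data) for hypotheses keyed by name.

    Hypothetical helper: both arguments are dicts with the same keys.
    """
    # Step 3: prior times likelihood for each hypothesis.
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    # Step 4: total probability of the data.
    prob_data = sum(unnorm.values())
    # Step 5: normalize to obtain the posterior.
    posteriors = {h: u / prob_data for h, u in unnorm.items()}
    return posteriors, prob_data

# Usage with the cookie problem numbers from the previous section.
priors = {'Bowl 1': Fraction(1, 2), 'Bowl 2': Fraction(1, 2)}
likelihoods = {'Bowl 1': Fraction(3, 4), 'Bowl 2': Fraction(1, 2)}
posteriors, prob_data = bayes_update(priors, likelihoods)
print(prob_data)             # 5/8
print(posteriors['Bowl 1'])  # 3/5
```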

## Bayes Tables

$P(H)$

In [2]:
import pandas as pd

table = pd.DataFrame(index=['Bowl 1', 'Bowl 2'])
table['prior'] = 1/2, 1/2  # prior; assumed to be 1/2 for each bowl
table

Unnamed: 0,prior
Bowl 1,0.5
Bowl 2,0.5


$P(D|H)$ 

In [3]:
table['likelihood'] = 3/4, 1/2  # likelihood of vanilla for each bowl
table

Unnamed: 0,prior,likelihood
Bowl 1,0.5,0.75
Bowl 2,0.5,0.5


 $P(H)P(D|H)$

In [4]:
table['unnorm'] = table['prior'] * table['likelihood']
table

Unnamed: 0,prior,likelihood,unnorm
Bowl 1,0.5,0.75,0.375
Bowl 2,0.5,0.5,0.25


$P(D) = \sum_i P(H_i)~P(D|H_i)$

In [5]:
prob_data = table['unnorm'].sum()
prob_data

0.625

 $P(H|D) = \frac{P(H)~P(D|H)}{P(D)}$

In [6]:
table['posterior'] = table['unnorm'] / prob_data # normalization
table

Unnamed: 0,prior,likelihood,unnorm,posterior
Bowl 1,0.5,0.75,0.375,0.6
Bowl 2,0.5,0.5,0.25,0.4


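## The Dice Problem

One of three dice (6-sided, 8-sided, 12-sided) is chosen at random and rolled, and the result is 1. What is the probability that the 6-sided die was chosen?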

In [7]:
from fractions import Fraction

table2 = pd.DataFrame(index=[6, 8, 12])
table2['prior'] = Fraction(1, 3)
table2

Unnamed: 0,prior
6,1/3
8,1/3
12,1/3


In [8]:
table2['likelihood'] = [Fraction(1, 6), Fraction(1, 8), Fraction(1, 12)]
table2

Unnamed: 0,prior,likelihood
6,1/3,1/6
8,1/3,1/8
12,1/3,1/12


In [9]:
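def update(table):
    """Fill in 'unnorm' and 'posterior' in place; return P(D)."""
    table['unnorm'] = table['prior'] * table['likelihood']
    prob_data = table['unnorm'].sum()  # total probability of the data
    table['posterior'] = table['unnorm'] / prob_data  # normalize
    return prob_data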

In [10]:
prob_data = update(table2)
prob_data

Fraction(1, 8)

In [11]:
table2

Unnamed: 0,prior,likelihood,unnorm,posterior
6,1/3,1/6,1/18,4/9
8,1/3,1/8,1/24,1/3
12,1/3,1/12,1/36,2/9
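As a rough cross-check of the 4/9 posterior for the 6-sided die, the experiment can be simulated. This is a sketch only: the function name, trial count, and seed are arbitrary choices, and the estimate will only approximate the exact value.

```python
import random

def estimate_six_sided(trials=200_000, seed=0):
    """Estimate P(6-sided die | roll is 1) by simulation."""
    rng = random.Random(seed)
    rolled_one = 0
    six_and_one = 0
    for _ in range(trials):
        sides = rng.choice([6, 8, 12])   # pick a die uniformly at random
        roll = rng.randint(1, sides)     # roll it
        if roll == 1:
            rolled_one += 1
            if sides == 6:
                six_and_one += 1
    return six_and_one / rolled_one

estimate = estimate_six_sided()
print(estimate)  # should be close to 4/9
```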


## The Monty Hall Problem

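A car is hidden behind one of three doors (Door 1, 2, or 3), and goats are behind the other two. You win the car if you pick the door that hides it. After you pick a door, the host (Monty) opens one of the other two doors, always one without the car, and offers you the chance to switch. Suppose you pick Door 1 and Monty opens Door 3, showing that the car is not there. Is it better to stay with Door 1 or to switch to Door 2?

$P(D_1)$, $P(D_2)$, $P(D_3)$ : the prior probability that the car is behind each door is 1/3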

In [12]:
table3 = pd.DataFrame(index=['door 1', 'door 2', 'door 3'])
table3['prior'] = Fraction(1, 3)
table3

Unnamed: 0,prior
door 1,1/3
door 2,1/3
door 3,1/3


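* $P(M_3|D_1)$ : if the car is behind Door 1, Monty opens either Door 2 or Door 3, so the probability that he opens Door 3 is 1/2
* $P(M_3|D_2)$ : if the car is behind Door 2, Monty must open Door 3, so the probability is 1
* $P(M_3|D_3)$ : if the car is behind Door 3, Monty cannot open Door 3, so the probability is 0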

In [13]:
table3['likelihood'] = [Fraction(1, 2), 1, 0]
table3

Unnamed: 0,prior,likelihood
door 1,1/3,1/2
door 2,1/3,1
door 3,1/3,0


$P(D_i)P(M_3|D_i)$

In [14]:
table3['unnorm'] = table3['prior'] * table3['likelihood']
table3

Unnamed: 0,prior,likelihood,unnorm
door 1,1/3,1/2,1/6
door 2,1/3,1,1/3
door 3,1/3,0,0


$P(M_3) = \sum_i P(D_i)P(M_3|D_i)$

In [15]:
prob_data = table3['unnorm'].sum()
prob_data

Fraction(1, 2)

$P(D_i|M_3) = \frac{P(D_i)P(M_3|D_i)}{P(M_3)}$

In [16]:
table3['posterior'] = table3['unnorm'] / prob_data
table3

Unnamed: 0,prior,likelihood,unnorm,posterior
door 1,1/3,1/2,1/6,1/3
door 2,1/3,1,1/3,2/3
door 3,1/3,0,0,0


Door 2 has a posterior probability of 2/3, which is higher than the 1/3 for Door 1, so switching is the better choice.

In [17]:
prob_data = update(table3)
prob_data

Fraction(1, 2)

In [18]:
table3

Unnamed: 0,prior,likelihood,unnorm,posterior
door 1,1/3,1/2,1/6,1/3
door 2,1/3,1,1/3,2/3
door 3,1/3,0,0,0
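The posterior table can also be sanity-checked by simulation. The sketch below assumes that you always pick Door 1 and that Monty chooses at random when both remaining doors hide goats; the function name, trial count, and seed are illustrative.

```python
import random

def simulate_monty(trials=200_000, seed=0):
    """Estimate P(car behind Door 2 | you pick Door 1, Monty opens Door 3)."""
    rng = random.Random(seed)
    opened_3 = 0
    car_behind_2 = 0
    for _ in range(trials):
        car = rng.choice([1, 2, 3])
        # Monty opens a door that is neither your pick (Door 1) nor the car.
        if car == 1:
            monty = rng.choice([2, 3])
        elif car == 2:
            monty = 3
        else:
            monty = 2
        if monty == 3:
            opened_3 += 1
            if car == 2:
                car_behind_2 += 1
    return car_behind_2 / opened_3

switch_wins = simulate_monty()
print(switch_wins)  # should be close to 2/3
```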


## Exercises

**Exercise:** Suppose you have two coins in a box.
One is a normal coin with heads on one side and tails on the other, and one is a trick coin with heads on both sides.  You choose a coin at random and see that one of the sides is heads.
What is the probability that you chose the trick coin?

In [19]:
coin = pd.DataFrame(index=['normal', 'trick'])
coin['prior'] = 1/2, 1/2
coin

Unnamed: 0,prior
normal,0.5
trick,0.5


In [20]:
coin['likelihood'] = 1/2, 1
coin

Unnamed: 0,prior,likelihood
normal,0.5,0.5
trick,0.5,1.0


In [21]:
coin['unnorm'] = coin['prior'] * coin['likelihood']
coin

Unnamed: 0,prior,likelihood,unnorm
normal,0.5,0.5,0.25
trick,0.5,1.0,0.5


In [22]:
prob_data = coin['unnorm'].sum()
prob_data

0.75

In [23]:
coin['posterior'] = coin['unnorm'] / prob_data
coin

Unnamed: 0,prior,likelihood,unnorm,posterior
normal,0.5,0.5,0.25,0.333333
trick,0.5,1.0,0.5,0.666667
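The same answer falls out of counting equally likely outcomes, which may make the likelihoods of 1/2 and 1 more concrete. This is a minimal sketch that treats each (coin, visible side) pair as one outcome; the names are illustrative.

```python
from fractions import Fraction

# Pick a coin at random, then see one of its two sides at random:
# four equally likely (coin, visible side) outcomes.
outcomes = [
    ('normal', 'heads'),
    ('normal', 'tails'),
    ('trick', 'heads'),
    ('trick', 'heads'),
]

# Condition on seeing heads, then count how often the coin is the trick coin.
heads = [coin for coin, side in outcomes if side == 'heads']
prob_trick = Fraction(heads.count('trick'), len(heads))
print(prob_trick)  # 2/3
```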


**Exercise:** Suppose you meet someone and learn that they have two children.
You ask if either child is a girl and they say yes.
What is the probability that both children are girls?

Hint: Start with four equally likely hypotheses.

In [34]:
s = pd.DataFrame(index=['bb', 'bg', 'gb', 'gg'])
s['prior'] = 1/4
s

Unnamed: 0,prior
bb,0.25
bg,0.25
gb,0.25
gg,0.25


In [35]:
s['likelihood'] = 0, 1, 1, 1
s['unnorm'] = s['prior'] * s['likelihood']
prob_data = s['unnorm'].sum()
s['posterior'] = s['unnorm'] / prob_data
s

Unnamed: 0,prior,likelihood,unnorm,posterior
bb,0.25,0,0.0,0.0
bg,0.25,1,0.25,0.333333
gb,0.25,1,0.25,0.333333
gg,0.25,1,0.25,0.333333
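The table gives a posterior of 1/3 for two girls. A direct enumeration of the four equally likely families, sketched below with illustrative names, arrives at the same value.

```python
from fractions import Fraction
from itertools import product

# All four equally likely families: (first child, second child).
families = list(product('bg', repeat=2))

# Condition on "at least one child is a girl".
at_least_one_girl = [f for f in families if 'g' in f]
both_girls = [f for f in at_least_one_girl if f == ('g', 'g')]

prob_both_girls = Fraction(len(both_girls), len(at_least_one_girl))
print(prob_both_girls)  # 1/3
```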


**Exercise:** There are many variations of the [Monty Hall problem](https://en.wikipedia.org/wiki/Monty_Hall_problem).  
For example, suppose Monty always chooses Door 2 if he can, and
only chooses Door 3 if he has to (because the car is behind Door 2).



If you choose Door 1 and Monty opens Door 2, what is the probability the car is behind Door 3?


The prior probability that the car is behind Door 1, 2, or 3 is 1/3 each.

In [25]:
monty1 = pd.DataFrame(index=['D1', 'D2', 'D3'])
monty1['prior'] = 1/3, 1/3, 1/3
monty1

Unnamed: 0,prior
D1,0.333333
D2,0.333333
D3,0.333333


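Probability that Monty opens Door 2, given that you chose Door 1:
* If the car is behind Door 1, Monty always opens Door 2, so the probability is 1
* If the car is behind Door 2, Monty cannot open Door 2, so the probability is 0
* If the car is behind Door 3, Monty always opens Door 2, so the probability is 1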

In [26]:
monty1['likelihood'] = 1, 0, 1
monty1

Unnamed: 0,prior,likelihood
D1,0.333333,1
D2,0.333333,0
D3,0.333333,1


In [27]:
monty1['unnorm'] = monty1['prior'] * monty1['likelihood']
monty1

Unnamed: 0,prior,likelihood,unnorm
D1,0.333333,1,0.333333
D2,0.333333,0,0.0
D3,0.333333,1,0.333333


In [28]:
prob_data = monty1['unnorm'].sum()

In [29]:
monty1['posterior'] = monty1['unnorm'] / prob_data
monty1

Unnamed: 0,prior,likelihood,unnorm,posterior
D1,0.333333,1,0.333333,0.5
D2,0.333333,0,0.0,0.0
D3,0.333333,1,0.333333,0.5



If you choose Door 1 and Monty opens Door 3, what is the probability the car is behind Door 2?

In [30]:
monty2 = pd.DataFrame(index=['D1', 'D2', 'D3'])
monty2['prior'] = 1/3, 1/3, 1/3

In [31]:
monty2['likelihood'] = 0, 1, 0
monty2

Unnamed: 0,prior,likelihood
D1,0.333333,0
D2,0.333333,1
D3,0.333333,0


In [32]:
monty2['unnorm'] = monty2['prior'] * monty2['likelihood']
prob_data = monty2['unnorm'].sum()
monty2['posterior'] = monty2['unnorm'] / prob_data
monty2

Unnamed: 0,prior,likelihood,unnorm,posterior
D1,0.333333,0,0.0,0.0
D2,0.333333,1,0.333333,1.0
D3,0.333333,0,0.0,0.0
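Both cases of this variant can be handled by one small function with exact fractions. This is a sketch under the stated rule (you pick Door 1, Monty opens Door 2 whenever he can); the function name `variant_posterior` and the lookup table are assumptions made for illustration.

```python
from fractions import Fraction

def variant_posterior(opened):
    """Posterior over the car's door, given which door Monty opened."""
    prior = Fraction(1, 3)
    # Door Monty opens for each car location: Door 2 unless the car is there.
    monty_opens = {1: 2, 2: 3, 3: 2}
    # The likelihood is 1 if Monty would open that door, else 0.
    unnorm = {
        car: prior * (1 if monty_opens[car] == opened else 0)
        for car in (1, 2, 3)
    }
    total = sum(unnorm.values())
    return {car: u / total for car, u in unnorm.items()}

print(variant_posterior(2)[3])  # 1/2
print(variant_posterior(3)[2])  # 1
```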


**Exercise:** M&M's are small candy-coated chocolates that come in a variety of colors.  
Mars, Inc., which makes M&M's, changes the mixture of colors from time to time.
In 1995, they introduced blue M&M's.  

* In 1994, the color mix in a bag of plain M&M's was 30\% Brown, 20\% Yellow, 20\% Red, 10\% Green, 10\% Orange, 10\% Tan.  

* In 1996, it was 24\% Blue , 20\% Green, 16\% Orange, 14\% Yellow, 13\% Red, 13\% Brown.

Suppose a friend of mine has two bags of M&M's, and he tells me
that one is from 1994 and one from 1996.  He won't tell me which is
which, but he gives me one M&M from each bag.  One is yellow and
one is green.  What is the probability that the yellow one came
from the 1994 bag?

Hint: The trick to this question is to define the hypotheses and the data carefully.

# Hypotheses:
# A: yellow from 94, green from 96
# B: yellow from 96, green from 94

P(1994) 1/2
P(1996) 1/2
P(Y|1994) 0.2
P(Y|1996) 0.14
P(G|1994) 0.1
P(G|1996) 0.2
P(Y) = P(1994) P(Y|1994) + P(1996) P(Y|1996) = 0.17
P(G) = P(1994) P(G|1994) + P(1996) P(G|1996) = 0.15

In [44]:
MnM = pd.DataFrame(index=['Y94&G96', 'G94&Y96'])
MnM['prior'] = 1/2, 1/2
MnM['likelihood'] = 0.2*0.2, 0.1*0.14
MnM['unnorm'] = MnM['prior'] * MnM['likelihood']
prob_data = MnM['unnorm'].sum()
MnM['posterior'] = MnM['unnorm'] / prob_data
MnM

Unnamed: 0,prior,likelihood,unnorm,posterior
Y94&G96,0.5,0.04,0.02,0.740741
G94&Y96,0.5,0.014,0.007,0.259259
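Repeating the calculation with exact fractions shows that the posterior of 0.740741 is 20/27. This is a minimal sketch with illustrative variable names; the percentages are the ones given in the exercise.

```python
from fractions import Fraction

prior = Fraction(1, 2)

# Hypothesis A: yellow from 1994 (20%), green from 1996 (20%).
like_a = Fraction(20, 100) * Fraction(20, 100)
# Hypothesis B: green from 1994 (10%), yellow from 1996 (14%).
like_b = Fraction(10, 100) * Fraction(14, 100)

unnorm_a = prior * like_a
unnorm_b = prior * like_b
posterior_a = unnorm_a / (unnorm_a + unnorm_b)

print(posterior_a)         # 20/27
print(float(posterior_a))
```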
