### Bayes' theorem

>Used to solve inverse-probability problems.

>ex) Given $P(B|A), P(A), P(B)$, find $P(A|B)$.

$$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

- P(A) : prior
- P(A|B) : posterior; once B is observed, the prior P(A) is updated to the posterior
- P(B|A) : likelihood
- P(B) : normalizing constant (it does not depend on A; it only scales the posterior so that it sums to 1)
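
As a quick numeric illustration of the formula, here is a minimal Python sketch. The function name `bayes_posterior` and the input numbers are illustrative assumptions; they do not come from these notes.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


# Illustrative (made-up) numbers: P(B|A) = 0.8, P(A) = 0.1, P(B) = 0.2
p_a_given_b = bayes_posterior(likelihood=0.8, prior=0.1, evidence=0.2)
print(round(p_a_given_b, 4))
```

Observing B raises the probability of A from the prior 0.1 to a posterior of about 0.4, because B is four times more likely under A than it is overall.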

---

### Normalization

When the events $A_i$ satisfy the following conditions:
- they are mutually exclusive

$$ A_i \cap A_j = \emptyset \quad (i \neq j)$$
- their union is the entire sample space

$$ A_1 \cup A_2 \cup \cdots = \Omega$$

then, by the law of total probability,

$$P(A_1|B) = \dfrac{P(B|A_1)P(A_1)}{P(B)} = \dfrac{P(B|A_1)P(A_1)}{\sum_i P(A_i, B)}= \dfrac{P(B|A_1)P(A_1)}{\sum_i P(B|A_i)P(A_i)} $$
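
The normalization over a partition can be sketched in a few lines of Python. This is only an illustration: the helper name `posterior_over_partition` and the three-event numbers are assumptions made up for the example.

```python
def posterior_over_partition(priors, likelihoods):
    """Posterior P(A_i|B) for each event of a partition A_1, ..., A_n.

    priors[i] is P(A_i) and likelihoods[i] is P(B|A_i).
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors of a partition must sum to 1")
    # P(B|A_i) P(A_i) = P(A_i, B)
    joints = [lik * p for lik, p in zip(likelihoods, priors)]
    # Law of total probability: P(B) = sum_i P(A_i, B)
    evidence = sum(joints)
    return [j / evidence for j in joints]


# Illustrative (made-up) three-event partition
posteriors = posterior_over_partition(priors=[0.5, 0.3, 0.2],
                                      likelihoods=[0.1, 0.4, 0.7])
print([round(p, 4) for p in posteriors])
```

Because every posterior is divided by the same sum, the results always add up to 1, which is why $P(B)$ is called the normalizing constant.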


When $A_1 = A$ and $A_2 = A^C$,

$$\begin{eqnarray}
P(A|B) 
&=& \dfrac{P(B|A)P(A)}{P(B)} \\
&=& \dfrac{P(B|A)P(A)}{P(B,A) + P(B,A^C)} \\
&=& \dfrac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^C)P(A^C)} \\
&=& \dfrac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^C)(1 - P(A))} 
\end{eqnarray}$$

---

### Example: the medical test

Events
> D : the patient has the disease

> S : the test result is positive

> S|D : a positive test result given that the patient has the disease

> D|S : the patient has the disease given a positive test result


Question:

Given $P(S|D) = 0.99, \;\; P(D) = 0.002,\;\; P(S|D^C) = 0.05$, what is $P(D|S)$?


Answer:

$$\begin{eqnarray}
P(D|S) 
&=& \dfrac{P(S|D)P(D)}{P(S)} \\
&=& \dfrac{P(S|D)P(D)}{P(S,D) + P(S,D^C)} \\
&=& \dfrac{P(S|D)P(D)}{P(S|D)P(D) + P(S|D^C)P(D^C)} \\
&=& \dfrac{P(S|D)P(D)}{P(S|D)P(D) + P(S|D^C)(1-P(D))} \\
&=& \dfrac{0.99 \cdot 0.002}{0.99 \cdot 0.002 + 0.05 \cdot (1 - 0.002)} \\
&\approx& 0.0382
\end{eqnarray}$$
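
The arithmetic of this example can be checked with plain Python. The sketch below uses the binary form of Bayes' theorem; the function name `binary_posterior` is a hypothetical helper, and the inputs are the three probabilities given in the question.

```python
def binary_posterior(p_s_given_d, p_d, p_s_given_not_d):
    """P(D|S) for a binary hypothesis D, using the expanded denominator."""
    numerator = p_s_given_d * p_d
    denominator = numerator + p_s_given_not_d * (1 - p_d)
    return numerator / denominator


p_d_given_s = binary_posterior(p_s_given_d=0.99, p_d=0.002, p_s_given_not_d=0.05)
print(f"{p_d_given_s:.4f}")  # 0.0382
```

Even with a 99% detection rate, a positive result implies only about a 3.8% chance of disease, because the disease is rare and false positives from the healthy majority dominate the denominator.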


---

## pgmpy

In [1]:
import numpy as np
from pgmpy.factors.discrete import TabularCPD

# Signature: TabularCPD(variable, variable_card, values, evidence=None, evidence_card=None)

In [13]:
# Prior P(D): D_0 = no disease (0.998), D_1 = disease (0.002)
cpd_D = TabularCPD('D', 2, [[1-0.002, 0.002]])
print(cpd_D)

╒═════╤═══════╕
│ D_0 │ 0.998 │
├─────┼───────┤
│ D_1 │ 0.002 │
╘═════╧═══════╛


In [27]:
# Likelihood P(S|D): columns are D_0, D_1; rows are S_0 (negative), S_1 (positive)
cpd_SD = TabularCPD('S', 2, np.array([[0.95, 0.01], [0.05, 0.99]]),
                    evidence=['D'], evidence_card=[2])
print(cpd_SD)

╒═════╤══════╤══════╕
│ D   │ D_0  │ D_1  │
├─────┼──────┼──────┤
│ S_0 │ 0.95 │ 0.01 │
├─────┼──────┼──────┤
│ S_1 │ 0.05 │ 0.99 │
╘═════╧══════╧══════╛


#### A BayesianModel is needed to run inference

In [28]:
from pgmpy.models import BayesianModel

In [29]:
model = BayesianModel([('D', 'S')])
model.add_cpds(cpd_D, cpd_SD)
model.check_model()

True

In [30]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior = infer.query(['D'], evidence={'S': 1})
print(posterior['D'])

╒═════╤══════════╕
│ D   │   phi(D) │
╞═════╪══════════╡
│ D_0 │   0.9618 │
├─────┼──────────┤
│ D_1 │   0.0382 │
╘═════╧══════════╛


---

In [38]:
# Same test, but with the false-positive rate P(S|D^C) lowered from 0.05 to 0.0005
cpd_SD = TabularCPD('S', 2, np.array([[0.9995, 0.01], [0.0005, 0.99]]),
                    evidence=['D'], evidence_card=[2])
print(cpd_SD)

╒═════╤════════╤══════╕
│ D   │ D_0    │ D_1  │
├─────┼────────┼──────┤
│ S_0 │ 0.9995 │ 0.01 │
├─────┼────────┼──────┤
│ S_1 │ 0.0005 │ 0.99 │
╘═════╧════════╧══════╛


In [39]:
model = BayesianModel([('D', 'S')])
model.add_cpds(cpd_D, cpd_SD)
model.check_model()

True

In [40]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior = infer.query(['D'], evidence={'S': 1})
print(posterior['D'])

╒═════╤══════════╕
│ D   │   phi(D) │
╞═════╪══════════╡
│ D_0 │   0.2013 │
├─────┼──────────┤
│ D_1 │   0.7987 │
╘═════╧══════════╛
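
The two runs above differ only in $P(S|D^C)$ (0.05 versus 0.0005). The following sketch, written without pgmpy, sweeps that false-positive rate to show how strongly it drives the posterior. The helper name, the parameter names `sensitivity` and `prevalence`, and the intermediate value 0.005 are assumptions added for illustration.

```python
def posterior_given_positive(sensitivity, prevalence, false_positive_rate):
    """P(D|S) from P(S|D), P(D) and P(S|D^C)."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# 0.05 and 0.0005 are the two rates used above; 0.005 is an extra midpoint
results = {
    fpr: posterior_given_positive(0.99, 0.002, fpr)
    for fpr in (0.05, 0.005, 0.0005)
}
for fpr, post in results.items():
    print(f"P(S|D^C) = {fpr:<6} -> P(D|S) = {post:.4f}")
```

The end points reproduce the pgmpy results (about 0.0382 and 0.7987), so for a rare disease the false-positive rate matters far more than the detection rate.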


---

### Furthermore

When an additional event C is observed (with B already given),

$$P(A|B,C) = \dfrac{P(C|A,B)P(A|B)}{P(C|B)}$$

For comparison, the ordinary form without conditioning on B:

$$P(A|C) = \dfrac{P(C|A)P(A)}{P(C)}$$

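(Proof) Expand the joint probability $P(A,B,C)$ in two ways:

$$P(A,B,C) = P(A|B,C)P(B,C) = P(A|B,C)P(C|B)P(B) $$

$$P(A,B,C) = P(C|A,B)P(A,B) = P(C|A,B)P(A|B)P(B) $$

Equating the two expansions,

$$P(A|B,C)P(C|B)P(B) = P(C|A,B)P(A|B)P(B)$$

and dividing both sides by $P(C|B)P(B)$ gives

$$P(A|B,C) = \dfrac{P(C|A,B)P(A|B)}{P(C|B)}$$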
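
The identity can also be checked numerically. The sketch below builds a hypothetical joint distribution over three binary events from arbitrary weights (an assumption for illustration only) and compares both sides of the formula.

```python
from itertools import product

# Hypothetical joint distribution P(A, B, C) over three binary events,
# built from arbitrary positive weights and normalized to sum to 1.
weights = dict(zip(product((0, 1), repeat=3), (3, 1, 4, 1, 5, 9, 2, 6)))
total = sum(weights.values())
joint = {abc: w / total for abc, w in weights.items()}


def prob(a=None, b=None, c=None):
    """Probability of the given values; None means the variable is summed out."""
    return sum(p for (x, y, z), p in joint.items()
               if (a is None or x == a)
               and (b is None or y == b)
               and (c is None or z == c))


# Left-hand side: P(A=1 | B=1, C=1)
lhs = prob(a=1, b=1, c=1) / prob(b=1, c=1)

# Right-hand side: P(C=1 | A=1, B=1) * P(A=1 | B=1) / P(C=1 | B=1)
p_c_given_ab = prob(a=1, b=1, c=1) / prob(a=1, b=1)
p_a_given_b = prob(a=1, b=1) / prob(b=1)
p_c_given_b = prob(b=1, c=1) / prob(b=1)
rhs = p_c_given_ab * p_a_given_b / p_c_given_b

print(abs(lhs - rhs) < 1e-12)
```

Any other positive weights would work equally well, since the identity holds for every joint distribution in which the conditioning events have nonzero probability.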

---

In [89]:
from pgmpy.factors.discrete import TabularCPD

In [90]:
from pgmpy.models import BayesianModel

In [91]:
# Prior P(Upper5): P(Upper5 = 1) = 18/244
cpd_S = TabularCPD('Upper5', 2, [[1-18/244, 18/244]])
print(cpd_S)

╒══════════╤═══════════╕
│ Upper5_0 │ 0.92623   │
├──────────┼───────────┤
│ Upper5_1 │ 0.0737705 │
╘══════════╧═══════════╛


In [118]:
# Likelihood P(Smoke|Upper5): columns are Upper5_0, Upper5_1
cpd_US = TabularCPD('Smoke', 2, np.array([[1-87/226, 1-1/3], [87/226, 1/3]]),
                    evidence=['Upper5'], evidence_card=[2])
print(cpd_US)

╒═════════╤═════════════════════╤════════════════════╕
│ Upper5  │ Upper5_0            │ Upper5_1           │
├─────────┼─────────────────────┼────────────────────┤
│ Smoke_0 │ 0.6150442477876106  │ 0.6666666666666667 │
├─────────┼─────────────────────┼────────────────────┤
│ Smoke_1 │ 0.38495575221238937 │ 0.3333333333333333 │
╘═════════╧═════════════════════╧════════════════════╛


In [119]:
model = BayesianModel([('Upper5', 'Smoke')])
model.add_cpds(cpd_S, cpd_US)
model.check_model()

True

In [120]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior = infer.query(['Upper5'], evidence={'Smoke': 0})  # given Smoke = 0 (non-smoker)
print(posterior['Upper5'])  # P(Upper5 = 1) = 7.95%
posterior = infer.query(['Upper5'], evidence={'Smoke': 1})  # given Smoke = 1 (smoker)
print(posterior['Upper5'])  # P(Upper5 = 1) = 6.45%

╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9205 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0795 │
╘══════════╧═══════════════╛
╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9355 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0645 │
╘══════════╧═══════════════╛


---

In [134]:
cpd_US = TabularCPD('Lunch', 2, np.array([[1-65/226, 1-1/6], [65/226, 1/6]]),
                                    evidence=['Upper5'], evidence_card=[2])
print(cpd_US)

╒═════════╤═════════════════════╤═════════════════════╕
│ Upper5  │ Upper5_0            │ Upper5_1            │
├─────────┼─────────────────────┼─────────────────────┤
│ Lunch_0 │ 0.7123893805309734  │ 0.8333333333333334  │
├─────────┼─────────────────────┼─────────────────────┤
│ Lunch_1 │ 0.28761061946902655 │ 0.16666666666666666 │
╘═════════╧═════════════════════╧═════════════════════╛


In [135]:
model = BayesianModel([('Upper5', 'Lunch')])
model.add_cpds(cpd_S, cpd_US)
model.check_model()

True

In [136]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior = infer.query(['Upper5'], evidence={'Lunch': 0})  # given Lunch = 0 (dinner)
print(posterior['Upper5'])  # P(Upper5 = 1) = 8.52%
posterior = infer.query(['Upper5'], evidence={'Lunch': 1})  # given Lunch = 1 (lunch)
print(posterior['Upper5'])  # P(Upper5 = 1) = 4.41%


╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9148 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0852 │
╘══════════╧═══════════════╛
╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9559 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0441 │
╘══════════╧═══════════════╛


---

Lunch & Smoke (combined into one binary variable, LS)

In [153]:
cpd_US = TabularCPD('LS', 2, np.array([[1-3/45, 1-0.01/23], [3/45, 0.01/23]]),
                                    evidence=['Upper5'], evidence_card=[2])
print(cpd_US)

╒════════╤═════════════════════╤═══════════════════════╕
│ Upper5 │ Upper5_0            │ Upper5_1              │
├────────┼─────────────────────┼───────────────────────┤
│ LS_0   │ 0.9333333333333333  │ 0.9995652173913043    │
├────────┼─────────────────────┼───────────────────────┤
│ LS_1   │ 0.06666666666666667 │ 0.0004347826086956522 │
╘════════╧═════════════════════╧═══════════════════════╛


In [154]:
model = BayesianModel([('Upper5', 'LS')])
model.add_cpds(cpd_S, cpd_US)
model.check_model()

True

In [155]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior = infer.query(['Upper5'], evidence={'LS': 0})  # given LS = 0 (not both Smoke and Lunch)
print(posterior['Upper5'])  # P(Upper5 = 1) = 7.86%
posterior = infer.query(['Upper5'], evidence={'LS': 1})  # given LS = 1 (Smoke and Lunch)
print(posterior['Upper5'])  # P(Upper5 = 1) = 0.05%

╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9214 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0786 │
╘══════════╧═══════════════╛
╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9995 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0005 │
╘══════════╧═══════════════╛


---

## Multiple CPDs

In [1]:
import numpy as np
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

In [2]:
model = BayesianModel([('Upper5', 'Smoke'), ('Upper5', 'Lunch'),
                       ('Upper5', 'Weekday'), ('Upper5', 'sex'),
                       ('Upper5', 'size')])

In [3]:
cpd_a = TabularCPD('Upper5', 2, [[1-18/244, 18/244]])

In [4]:
cpd_s = TabularCPD('Smoke', 2, np.array([[1-87/226, 1-1/3], [87/226, 1/3]]),
                                    evidence=['Upper5'], evidence_card=[2])

In [5]:
cpd_L = TabularCPD('Lunch', 2, np.array([[1-65/226, 1-1/6], [65/226, 1/6]]),
                                    evidence=['Upper5'], evidence_card=[2])

In [6]:
cpd_d = TabularCPD('Weekday', 2, np.array([[1-78/226, 1-1/6], [78/226, 1/6]]),
                                    evidence=['Upper5'], evidence_card=[2])

In [7]:
cpd_sex = TabularCPD('sex', 2, np.array([[1-83/226, 1-2/9], [83/226, 2/9]]),
                                    evidence=['Upper5'], evidence_card=[2])

In [8]:
cpd_size = TabularCPD('size', 2, np.array([[1-154/226, 1-1/3], [154/226, 1/3]]),
                                    evidence=['Upper5'], evidence_card=[2])

In [9]:
model.add_cpds(cpd_a, cpd_s, cpd_L, cpd_d, cpd_sex, cpd_size)

In [10]:
copy_model = model.copy()
copy_model.nodes()

NodeView(('Upper5', 'Smoke', 'Lunch', 'Weekday', 'sex', 'size'))

In [11]:
copy_model.edges()

OutEdgeView([('Upper5', 'Smoke'), ('Upper5', 'Lunch'), ('Upper5', 'Weekday'), ('Upper5', 'sex'), ('Upper5', 'size')])

In [12]:
copy_model.get_cpds()

[<TabularCPD representing P(Upper5:2) at 0x7ff77d57a2b0>,
 <TabularCPD representing P(Smoke:2 | Upper5:2) at 0x7ff77d57a048>,
 <TabularCPD representing P(Lunch:2 | Upper5:2) at 0x7ff77d577f60>,
 <TabularCPD representing P(Weekday:2 | Upper5:2) at 0x7ff77d577710>,
 <TabularCPD representing P(sex:2 | Upper5:2) at 0x7ff77d577668>,
 <TabularCPD representing P(size:2 | Upper5:2) at 0x7ff77d5775c0>]

In [14]:
from pgmpy.inference import VariableElimination

infer = VariableElimination(model)
posterior = infer.query(['Upper5'], 
                    evidence={'Lunch': 0, 'Smoke':0,'Weekday':0, 'sex':0, 'size':0})
print(posterior['Upper5'])
posterior = infer.query(['Upper5'], 
                    evidence={'Lunch': 1, 'Smoke':1, 'Weekday':1, 'sex':1, 'size':1})
print(posterior['Upper5'])

╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.7516 │
├──────────┼───────────────┤
│ Upper5_1 │        0.2484 │
╘══════════╧═══════════════╛
╒══════════╤═══════════════╕
│ Upper5   │   phi(Upper5) │
╞══════════╪═══════════════╡
│ Upper5_0 │        0.9943 │
├──────────┼───────────────┤
│ Upper5_1 │        0.0057 │
╘══════════╧═══════════════╛
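
Because every evidence variable has Upper5 as its only parent, the model has a naive Bayes structure, and the posterior can be reproduced by hand as a product of per-variable likelihoods. The sketch below does this without pgmpy. The names `p_one` and `posterior_upper5` are hypothetical helpers; the probabilities are copied from the CPDs defined above.

```python
from math import prod

# P(Upper5 = 1), taken from cpd_a
prior_upper5 = 18 / 244

# (P(X=1 | Upper5=0), P(X=1 | Upper5=1)) for each evidence variable,
# copied from the CPDs defined in the cells above
p_one = {
    'Smoke':   (87 / 226, 1 / 3),
    'Lunch':   (65 / 226, 1 / 6),
    'Weekday': (78 / 226, 1 / 6),
    'sex':     (83 / 226, 2 / 9),
    'size':    (154 / 226, 1 / 3),
}


def posterior_upper5(evidence):
    """P(Upper5 = 1 | evidence), with the evidence variables treated as
    conditionally independent given Upper5 (the structure of the model)."""
    def likelihood(upper5):
        return prod(
            p_one[name][upper5] if value == 1 else 1 - p_one[name][upper5]
            for name, value in evidence.items()
        )
    joint_1 = likelihood(1) * prior_upper5
    joint_0 = likelihood(0) * (1 - prior_upper5)
    return joint_1 / (joint_0 + joint_1)


all_zero = posterior_upper5({name: 0 for name in p_one})
all_one = posterior_upper5({name: 1 for name in p_one})
print(f"{all_zero:.4f} {all_one:.4f}")  # 0.2484 0.0057
```

The two values agree with the `phi(Upper5)` tables printed by `VariableElimination`, which suggests that for this structure the query reduces to multiplying the likelihoods by the prior and normalizing.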
