# Probabilities

Complementary probabilities:

$
P(A) = p \Rightarrow P(\lnot A) = 1 - p
$

Independence:

$
X \bot Y:  P(X \cap Y) = P(X) P(Y) \\
X \bot Y:  P(X \cup Y) = P(X) + P(Y) - P(X) P(Y) \\
$

$P(X)$ is the marginal probability and $P(X,Y)$ is the joint probability. $X \cap Y$ means that events X and Y both happen; $X \cup Y$ means that at least one of X or Y happens.
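As a quick check of the independence relations, here is a minimal sketch that enumerates two fair dice. The events X ("first die is even") and Y ("second die shows 5 or 6") are illustrative assumptions, not part of the notes above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

x = lambda o: o[0] % 2 == 0   # X: first die is even (assumed example event)
y = lambda o: o[1] > 4        # Y: second die shows 5 or 6 (assumed example event)

p_x, p_y = prob(x), prob(y)
p_and = prob(lambda o: x(o) and y(o))   # P(X ∩ Y)
p_or = prob(lambda o: x(o) or y(o))     # P(X ∪ Y)

print(p_and == p_x * p_y)               # True
print(p_or == p_x + p_y - p_x * p_y)    # True
print(p_x, p_y, p_and, p_or)            # 1/2 1/3 1/6 2/3
```

Using `Fraction` keeps the comparison exact, so the equalities are not affected by floating-point rounding.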

Dependence:

$
P(Y) = \sum_{i} P(Y|X=i) P(X=i) \\
P(\lnot Y|X) = 1 - P(Y|X)
$

$
P(Y, X) = P(Y|X) \cdot P(X)
$

Note that the latter can be rearranged and generalized to several variables:

$
P(A,B,C|D,E) = \large \frac{P(A,B,C,D,E)}{P(D,E)}
$

In a Bayes network, A, B and C are the query variables and D and E are the evidence. The conditional probability of the query variables given the evidence is the ratio of the joint probability of all variables to the joint probability of the evidence.

If X and Y are independent:

$
P(Y|X) = P(Y)
$

and so you recover the joint probability relationship for independent events:

$
P(Y, X) = P(Y) \cdot P(X)
$

Bayes rule:

$
P(B|A) = \large\frac{P(A|B) P(B)}{P(A)}
$

$P(B|A)$ is the *posterior* probability. $P(A|B)$ is the *likelihood*. $P(B)$ is the *prior*. $P(A)$ is the *marginal likelihood*.

With three events, Bayes rule can be written in two different ways:

$
P(B|A,C) = \large\frac{P(A,C|B) \cdot P(B)}{P(A,C)}
$

or, if C conditions all events:

$
P(B|A,C) = \large\frac{P(A|B,C) \cdot P(B|C)}{P(A|C)}
$
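Both forms give the same number. The sketch below checks this on a small joint distribution over three binary events; the eight weights are made-up values chosen only so that they sum to 1, and `prob` is a hypothetical helper.

```python
from itertools import product

# Hypothetical joint distribution P(A, B, C) over binary events.
# Keys are (a, b, c) tuples; the weights are arbitrary and sum to 1.
weights = [0.10, 0.05, 0.20, 0.15, 0.05, 0.25, 0.08, 0.12]
joint = dict(zip(product([0, 1], repeat=3), weights))
NAMES = ("a", "b", "c")

def prob(**fixed):
    """Marginal probability of the fixed values, e.g. prob(a=1, c=1)."""
    return sum(
        p for outcome, p in joint.items()
        if all(outcome[NAMES.index(k)] == v for k, v in fixed.items())
    )

# Direct definition: P(B|A,C) = P(A,B,C) / P(A,C)
direct = prob(a=1, b=1, c=1) / prob(a=1, c=1)

# Form 1: P(A,C|B) P(B) / P(A,C)
p_ac_given_b = prob(a=1, b=1, c=1) / prob(b=1)
form1 = p_ac_given_b * prob(b=1) / prob(a=1, c=1)

# Form 2: P(A|B,C) P(B|C) / P(A|C)
p_a_given_bc = prob(a=1, b=1, c=1) / prob(b=1, c=1)
p_b_given_c = prob(b=1, c=1) / prob(c=1)
p_a_given_c = prob(a=1, c=1) / prob(c=1)
form2 = p_a_given_bc * p_b_given_c / p_a_given_c

print(abs(direct - form1) < 1e-12, abs(direct - form2) < 1e-12)
```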

# Example of Bayes rule

## Problem definition

**C**: having cancer.<br>
**+**: testing positive for cancer.<br>
**-**: testing negative for cancer ($\lnot +$).<br>

In the population, only 1% has cancer:

$
P(C) = 0.01 \\
P(\lnot C) = 0.99
$

The test for cancer behaves as follows:

$
P(+|C) = 0.9 \\
P(-|C) = 0.1
$

$P(+|C)$ is the **true positive rate** or **sensitivity**

and

$
P(-|\lnot C) = 0.8 \\
P(+|\lnot C) = 0.2
$

$P(-|\lnot C)$ is the **true negative rate** or **specificity**

## Probability of having cancer if tested positive

Bayes rule lets us answer the following question: how likely is it that you have cancer when you test positive? This is $P(C|+)$.

$
P(C|+) = \large\frac{P(+|C) P(C)}{P(+)}
$

$P(C|+)$ is the posterior probability. $P(C)$ is the *prior* probability. $P(+|C)$ is the likelihood. $P(+)$ is the marginal likelihood.

By the law of total probability:

$
P(+) = P(+|C) P(C) + P(+|\lnot C) P(\lnot C)
$

then

$
P(C|+) = \large\frac{P(+|C) P(C)}{P(+)} = \large\frac{P(+|C) P(C)}{P(+|C) P(C) + P(+|\lnot C) P(\lnot C)}
$

Substituting the actual values:

$
P(C|+) = \large\frac{P(+|C) P(C)}{P(+|C) P(C) + P(+|\lnot C) P(\lnot C)} = \large\frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.2 \cdot 0.99} \approx 0.043
$

This value is pretty low. This is because the test has a 20% chance of giving a false positive while only 1% of the population has cancer. If we reduce the false positive rate down to 0.0001 (one in ten thousand):

$
P(C|+) = \large\frac{P(+|C) P(C)}{P(+|C) P(C) + P(+|\lnot C) P(\lnot C)} = \large\frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.0001 \cdot 0.99} \approx 0.99
$
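The two computations above can be reproduced with a small function. This is a minimal sketch; the function and parameter names are illustrative choices.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(C|+) from P(C), P(+|C) and P(+|not C) using Bayes rule."""
    # Total probability: P(+) = P(+|C) P(C) + P(+|not C) P(not C)
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Test with a 20% false positive rate.
print(round(posterior(0.01, 0.9, 0.2), 4))     # 0.0435
# Test with a 0.0001 false positive rate.
print(round(posterior(0.01, 0.9, 0.0001), 4))  # 0.9891
```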

## Probability of having cancer if tested positive twice

Now let's assume that we are tested twice (with the same test). What is the probability of having cancer if both tests are positive? This probability is $P(C|+,+)$.

The Bayes rule leads to:

$
P(C|+,+) = \large \frac{P(+,+|C) \cdot P(C)}{P(+,+)}
$

$P(+,+)$ might be tricky to calculate, so we introduce the following trick:

$
P(C|+,+) = \large \frac{P(+,+|C) \cdot P(C)}{P(+,+)} \\
P(\lnot C|+,+) = \large \frac{P(+,+| \lnot C) \cdot P(\lnot C)}{P(+,+)}
$

We know that these two probabilities $P(C|+,+)$ and $P(\lnot C|+,+)$ must sum to 1, so we only calculate the numerators, written $P'$:

$
P'(C|+,+) = P(+,+|C) \cdot P(C) = P(+|C) \cdot P(+|C) \cdot P(C) = 0.9 \cdot 0.9 \cdot 0.01 = 0.0081 \\
P'(\lnot C|+,+) = P(+,+|\lnot C) \cdot P(\lnot C) = P(+|\lnot C) \cdot P(+|\lnot C) \cdot P(\lnot C) = 0.2 \cdot 0.2 \cdot 0.99 = 0.0396
$

and so:

$
P(C|+,+) = \large \frac{P'(C|+,+)}{P'(C|+,+) + P'(\lnot C|+,+)} \approx 0.1698 \\
P(\lnot C|+,+) = \large \frac{P'(\lnot C|+,+)}{P'(C|+,+) + P'(\lnot C|+,+)} \approx 0.8302
$

So the probability of having cancer knowing that two tests are positive is almost 17%.
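The same calculation as a short script, a sketch using the values of the problem definition (variable names are illustrative):

```python
p_c = 0.01               # P(C)
p_pos_given_c = 0.9      # P(+|C)
p_pos_given_not_c = 0.2  # P(+|not C)

# Unnormalized terms: the two tests are multiplied together because they are
# treated as independent once the cancer status is known.
unnorm_c = p_pos_given_c * p_pos_given_c * p_c
unnorm_not_c = p_pos_given_not_c * p_pos_given_not_c * (1 - p_c)

# The sum of the unnormalized terms is P(+,+), the normalizer.
normalizer = unnorm_c + unnorm_not_c

post_c = unnorm_c / normalizer
post_not_c = unnorm_not_c / normalizer
print(round(post_c, 4), round(post_not_c, 4))  # 0.1698 0.8302
```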

## Probability of having cancer if tested positive and negative (once each)

We now assume that one test is positive and the other is negative. What is the probability of having cancer, written $P(C|+,-)$? By analogy with the previous case we can write:

$
P'(C|+,-) = P(+,-|C) \cdot P(C) = P(+|C) \cdot P(-|C) \cdot P(C) = 0.9 \cdot 0.1 \cdot 0.01 = 0.0009 \\
P'(\lnot C|+,-) = P(+,-|\lnot C) \cdot P(\lnot C) = P(+|\lnot C) \cdot P(-|\lnot C) \cdot P(\lnot C) = 0.2 \cdot 0.8 \cdot 0.99 = 0.1584
$

and so:

$
P(C|+,-) = \large \frac{P'(C|+,-)}{P'(C|+,-) + P'(\lnot C|+,-)} \approx 0.0056 \\
P(\lnot C|+,-) = \large \frac{P'(\lnot C|+,-)}{P'(C|+,-) + P'(\lnot C|+,-)} \approx 0.9944
$

So the probability of having cancer knowing that one test is positive while the other is negative is about 0.56%.
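Both two-test cases follow the same pattern, so they can be handled by one function that multiplies in one factor per test result. This is a sketch under the assumption that the results are independent given the cancer status; `posterior_after_tests` and the `'+'`/`'-'` string encoding are hypothetical choices for illustration.

```python
def posterior_after_tests(outcomes, prior=0.01, sensitivity=0.9, specificity=0.8):
    """Return P(C | outcomes) where outcomes is a string of '+' and '-'."""
    unnorm_c, unnorm_not_c = prior, 1 - prior
    for outcome in outcomes:
        if outcome == "+":
            unnorm_c *= sensitivity          # P(+|C)
            unnorm_not_c *= 1 - specificity  # P(+|not C)
        elif outcome == "-":
            unnorm_c *= 1 - sensitivity      # P(-|C)
            unnorm_not_c *= specificity      # P(-|not C)
        else:
            raise ValueError(f"unknown test outcome: {outcome!r}")
    # Normalize at the very end.
    return unnorm_c / (unnorm_c + unnorm_not_c)

print(round(posterior_after_tests("+-"), 4))  # 0.0056
print(round(posterior_after_tests("++"), 4))  # 0.1698
```

With an empty string the function returns the prior, which is a convenient sanity check.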

## Conditional independence

In the two previous examples we have assumed:

$
P(+,-|C) = P(+|C) \cdot P(-|C) \\
P(+,+|C) = P(+|C) \cdot P(+|C)
$

This is because, given C, the two test results are independent (the same factorization was used given $\lnot C$). We say that + and - are conditionally independent:

$
+ \bot - | C
$

But this does not mean that they are independent in general, because the cause (having cancer or not) affects both. **Conditional independence does not imply absolute independence**.
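A numerical check makes this concrete. If two positive results were absolutely independent, $P(+,+)$ would equal $P(+) \cdot P(+)$. The sketch below computes both with the values of the problem definition (variable names are illustrative):

```python
p_c, sens, fpr = 0.01, 0.9, 0.2  # P(C), P(+|C), P(+|not C)

# Marginal probability of one positive test (total probability).
p_pos = sens * p_c + fpr * (1 - p_c)

# Joint probability of two positive tests, using conditional independence
# inside each branch (cancer / no cancer).
p_pos_pos = sens**2 * p_c + fpr**2 * (1 - p_c)

print(round(p_pos_pos, 4))   # 0.0477
print(round(p_pos ** 2, 4))  # 0.0428
```

The two values differ, so the two test results are not independent once the cancer status is unknown.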

## Probability of having a second test positive knowing the first one was positive

This probability will be written $P(+_2|+_1)$:

$
P(+_2|+_1) = P(+_2|+_1, C) \cdot P(C|+_1) + P(+_2|+_1, \lnot C) \cdot P(\lnot C|+_1)
$

Because $+_1$ and $+_2$ are conditionally independent given the cancer status, we can write:

$
P(+_2|+_1) = P(+_2|C) \cdot P(C|+_1) + P(+_2|\lnot C) \cdot P(\lnot C|+_1)
$

Said differently, the first test may have been positive either because we have cancer or despite not having it. This is the law of total probability rewritten under the condition $+_1$.


With a single test the probability would have been written:

$
P(+) = P(+|C) \cdot P(C) + P(+|\lnot C) \cdot P(\lnot C)
$

Substituting the actual values:

$
P(+_2|+_1) = P(+_2|C) \cdot P(C|+_1) + P(+_2|\lnot C) \cdot P(\lnot C|+_1) = 0.9 \cdot 0.043 + 0.2 \cdot 0.957 \approx 0.2301
$

The second test has about a 23% chance of being positive given that the first test was positive.
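The sketch below redoes this computation without rounding the intermediate posterior, and cross-checks it against the definition $P(+_2|+_1) = P(+_1,+_2) / P(+_1)$. Variable names are illustrative. The unrounded result is 0.2304, slightly above the 0.2301 obtained with the rounded value 0.043.

```python
p_c, sens, fpr = 0.01, 0.9, 0.2  # P(C), P(+|C), P(+|not C)

# Posterior after the first positive test: P(C|+1)
p_pos1 = sens * p_c + fpr * (1 - p_c)
p_c_given_pos1 = sens * p_c / p_pos1

# Total probability conditioned on +1
p_pos2_given_pos1 = sens * p_c_given_pos1 + fpr * (1 - p_c_given_pos1)

# Cross-check: P(+1,+2) / P(+1)
cross_check = (sens**2 * p_c + fpr**2 * (1 - p_c)) / p_pos1

print(round(p_pos2_given_pos1, 4))  # 0.2304
print(round(cross_check, 4))        # 0.2304
```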

# Cause and effect with Bayes rule

In the cancer example above, the cause is having cancer or not, and the effect is the test being positive or negative. Bayes rule lets us reason about the cause from the effect:

What is the probability of having cancer (cause) given a positive test (effect)?

$
P(C|+) = \large\frac{P(+|C) P(C)}{P(+)}
$

And this probability is dependent on the probability of the test being positive (effect) knowing the individual has cancer (cause). 

Bayes rule thus lets us reason from effect to cause and vice versa. It takes us from what we observe (the effect) to what we infer (the cause).

The Bayes theorem can be read as follows:

$
P(cause|effect) = \large\frac{P(effect|cause) P(cause)}{P(effect)}
$

$P(cause|effect)$ is the *posterior* probability (what we have inferred). $P(effect|cause)$ is the *likelihood*. $P(cause)$ is the *prior* (what we knew before). $P(effect)$ is the *marginal likelihood*.

# Deferred normalizer

The Bayes rule:

$
P(B|A) = \large\frac{P(A|B) P(B)}{P(A)}
$

The marginal likelihood $P(A)$ might be difficult to compute. We can introduce unnormalized conditional probabilities as follows:

$
P'(B|A) = P(A|B) \cdot P(B) \\
P'(\lnot B|A) = P(A|\lnot B) \cdot P(\lnot B)
$

Because these are two complementary events:

$
P(B|A) + P(\lnot B|A) = 1
$

And so:

$
P(B|A) = P'(B|A) \cdot \large \frac{1}{P'(B|A) + P'(\lnot B|A)} = \large \frac{P(A|B) \cdot P(B)}{P'(B|A) + P'(\lnot B|A)}
$
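In code, deferring the normalizer amounts to computing each likelihood-times-prior term and dividing by their sum at the end. The helper below is a minimal sketch; `normalize` is a hypothetical name, and the numbers reuse the cancer example with A as a positive test and B as having cancer.

```python
def normalize(unnormalized):
    """Turn unnormalized terms P'(h|A) = P(A|h) P(h) into posteriors P(h|A)."""
    total = sum(unnormalized.values())  # equals P(A) when the hypotheses are exhaustive
    if total <= 0:
        raise ValueError("unnormalized terms must sum to a positive value")
    return {h: value / total for h, value in unnormalized.items()}

unnormalized = {
    "B": 0.9 * 0.01,      # P(A|B) * P(B)
    "not B": 0.2 * 0.99,  # P(A|not B) * P(not B)
}
posteriors = normalize(unnormalized)
print(round(posteriors["B"], 4), round(posteriors["not B"], 4))  # 0.0435 0.9565
```

Because the function works on a dictionary, the same sketch applies to more than two mutually exclusive hypotheses, as long as they cover all cases.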

# References

MathJax [doc](https://math.meta.stackexchange.com/questions/5020/mathjax-basic-tutorial-and-quick-reference)