### Bayes' Theorem

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. One of its many applications is Bayesian inference, a particular approach to statistical inference. The probabilities in the theorem can be read under different interpretations of probability. Under the Bayesian interpretation, the theorem expresses how a degree of belief, expressed as a probability, should rationally change to account for related evidence.

Let's start with a simpler problem.

If two events are independent, i.e. if the occurrence of one event (A) is in no way related to the occurrence of another event (B), then the probability of both events occurring is:

$$
P(A, B) = P(A) \times P(B)
$$

For example, the probability of getting heads twice in a row with a fair coin is 0.25 ($0.5 \times 0.5$), and so is the probability that a family with two children has two daughters (assuming each child is equally likely to be a boy or a girl).
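
As a quick check of the product rule, the sketch below enumerates the four equally likely outcomes of two fair coin flips and compares the joint probability with the product of the two marginals. The helper `prob` and the variable names are illustrative and not part of the original notebook.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin flips: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under a uniform distribution."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_first_heads = prob(lambda o: o[0] == "H")
p_second_heads = prob(lambda o: o[1] == "H")
p_both_heads = prob(lambda o: o == ("H", "H"))

# For independent events the joint equals the product of the marginals
print(p_both_heads, p_first_heads * p_second_heads)
```

Both printed values are 1/4, which is the 0.25 from the example.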

If the events are dependent, then the probability of A occurring given that B occurred is:

$$
P (A|B) = \frac{P(A, B)}{P(B)} 
$$

This is often rewritten as:

$$
P(A, B) = P(A|B) \times P(B)
$$

In words: the probability that both A and B happen is the probability that A happens given that B happened (the conditional probability $P(A|B)$) multiplied by the probability that B happens ($P(B)$).

For example, imagine that we want to determine the probability of rain and a rainbow occurring together. A rainbow (A) occurs with probability 0.3 when it rains ($P(A|B)$), and the probability of rain (B) is 0.2 ($P(B)$). The probability of rain and a rainbow is

$$
P(A, B) = 0.3 \times 0.2 = 0.06
$$

Conversely, if we know that rain and a rainbow occur together in 6% of cases, and that it rains in 20% of cases, the conditional probability of a rainbow occurring when it rains is

$$
P (A | B) = \frac {P(A, B)}{P (B)} = \frac{0.06}{0.2} = 0.3
$$
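
The same arithmetic can be written as a few lines of Python. This is only a sketch of the rainbow example; the variable names are made up for illustration.

```python
p_rain = 0.2                # P(B)
p_rainbow_given_rain = 0.3  # P(A|B)

# Product rule: P(A, B) = P(A|B) * P(B)
p_rain_and_rainbow = p_rainbow_given_rain * p_rain

# The definition of conditional probability recovers P(A|B) from the joint
recovered = p_rain_and_rainbow / p_rain

print(round(p_rain_and_rainbow, 2), round(recovered, 2))
```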

The next example shows why Bayes' theorem can feel difficult (or maybe it is only difficult for me): when you apply it, the result can be counterintuitive.

One common tricky example involves a family with two children of unknown sex.
Assuming that:

1. Every child is equally likely to be a boy or a girl
2. The sex of the second child is independent of the sex of the first child

then the event "no girls" has probability 1/4, the event "one girl, one boy" has probability
1/2, and the event "two girls" has probability 1/4.

Now we can ask: what is the probability of the event "both children are girls" (B) given that "the older child is a girl" (G)? The event "B and G" (both children are girls and the older child is a girl) is just the event B, so by the definition of conditional probability:

$$
P(B|G) = \frac{P(B, G)}{P(G)} = \frac{P(B)}{P(G)} = \frac{0.25}{0.5} = \frac{1}{2}
$$

But we could also ask for the probability of the event "both children are girls" given the event "at least one of the children is a girl" (L). $P(L) = 0.75$, because the probability of two boys is 0.25 and L is the complement of that event. Surprisingly, the answer is different from before!
As before, the event "B and L" (both children are girls and at least one child is a girl) is just the event B. This means:

$$
P (B|L) = \frac{P (B, L)}{P (L)} = \frac{P (B)}{P (L)} = \frac{0.25}{0.75} = 1/3
$$

How can this be the case? If all you know is that at least one of the children is a
girl, then it is twice as likely that the family has one boy and one girl as that it has
two girls.
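
Both answers can also be checked exactly by listing the four equally likely families and counting. The following is a minimal sketch; the helper `conditional` and the predicate names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each family is (older_child, younger_child); all four are equally likely
families = list(product(["boy", "girl"], repeat=2))

def conditional(event, given):
    """P(event | given), computed by counting equally likely families."""
    matching_given = [f for f in families if given(f)]
    matching_both = [f for f in matching_given if event(f)]
    return Fraction(len(matching_both), len(matching_given))

def is_both_girls(family):
    return family == ("girl", "girl")

def is_older_girl(family):
    return family[0] == "girl"

def has_at_least_one_girl(family):
    return "girl" in family

p_given_older = conditional(is_both_girls, is_older_girl)
p_given_at_least_one = conditional(is_both_girls, has_at_least_one_girl)
print(p_given_older, p_given_at_least_one)
```

Knowing that the older child is a girl leaves two families (girl-boy, girl-girl), while knowing only that at least one child is a girl leaves three, which is where 1/2 and 1/3 come from.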

We can illustrate this with a simulation:

In [1]:
import random
def random_child():
    return random.choice(["boy", "girl"])

# Counters for the three events of interest
both_girls = 0
older_girl = 0
at_least_one_girl = 0

random.seed(0)

for _ in range(10000):
    younger_child = random_child()
    older_child = random_child()
    if older_child == "girl":
        older_girl += 1
    if older_child == "girl" and younger_child == "girl":
        both_girls += 1
    if older_child == "girl" or younger_child == "girl":
        at_least_one_girl += 1

# Conditional probabilities estimated as ratios of counts
print("P(both_girls | older_girl):", round(both_girls / older_girl, 3))
print("P(both_girls | at_least_one_girl):", round(both_girls / at_least_one_girl, 3))

P(both_girls | older_girl): 0.501
P(both_girls | at_least_one_girl): 0.331


Bayes' theorem builds on conditional probability.
We know that the conditional probability of A happening given that B happened is:

$$
P(A|B) = \frac{P(A, B)}{P(B)} \tag{1}
$$

Bayes' theorem says that the conditional probability of A given B can be computed not only by comparing the probability of both events occurring ($P(A, B)$) with the probability of event B ($P(B)$), but also as follows:

$$
P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \tag{2}
$$

This seems complicated but is essentially simple. The same definition of conditional probability, applied to B given A, reads:

$$
P(B|A) = \frac{P(A, B)}{P(A)} \tag{3}
$$

Equation (3) can be rearranged into $P(A, B) = P(B|A) \times P(A)$ and then substituted into (1):

$$
P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}
$$

which gives Bayes' theorem.

Expanding the denominator $P(B)$ gives the form that is often used in practice:

$$
P(A|B) = \frac{P(B|A) \times P(A)}{P(B|A) \times P(A) + P(B|\neg A) \times P(\neg A)}
$$

This last equation uses the fact that

$$
P(B) = P(B|A) \times P(A) + P(B|\neg A) \times P(\neg A)
$$

which means that the probability of event B is the sum of two terms: the probability of B given that A occurs, weighted by $P(A)$, and the probability of B given that A does not occur, weighted by $P(\neg A)$.
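
To make the expanded form concrete, here is a small sketch that evaluates it with exact fractions. The function name `bayes_posterior` and the three input numbers are hypothetical and chosen only for illustration; they do not come from the examples above.

```python
from fractions import Fraction

def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' theorem with the expanded denominator."""
    p_not_a = 1 - p_a
    # P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# Hypothetical inputs: P(A) = 1%, P(B|A) = 90%, P(B|not A) = 5%
posterior = bayes_posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(posterior, round(float(posterior), 4))
```

With these made-up inputs the posterior is 2/13, about 0.15: even though B is much more likely when A holds, A stays fairly unlikely because its prior is small.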

### Modeling with pomegranate

Imagine you have a system that sends a notification when bank fraud is detected. Bank fraud (event BF) happens with probability 0.001.
But a notification can also be triggered by a system error (event E), which happens with probability 0.002.

You went on vacation.

The notification (N) appears with probability 0.99 if both events happen, and with probability 0.92 if only bank fraud happens.
It appears with probability 0.15 if only an error happens, and it can go off for no reason with probability 0.0001.

Your colleague Johnatan (J) almost never calls you (he does not call 99.9% of the time), but he will call you if he sees the notification (he may be out of the house with probability 0.1).

Another colleague, Maggy (M), may call you just to talk with probability 0.1, and she is out of the house 30% of the time (but if she is home she will see the notification and call you).

What is the probability that Maggy will call you if bank fraud happens?
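
Before building the network, the question can be answered by hand, because Maggy depends on the fraud only through the notification. The sketch below assumes the same numbers the conditional probability tables in this notebook use (Maggy calls with probability 0.7 when the notification appears and 0.1 otherwise); the variable names are illustrative.

```python
# Probabilities taken from the problem statement
p_error = 0.002
p_notify_given_fraud = {True: 0.99, False: 0.92}  # keyed by whether an error also happened
p_maggy_calls = {True: 0.7, False: 0.1}           # keyed by whether the notification appeared

# P(N | BF): marginalize over the error, which is independent of the fraud
p_n_given_bf = (p_error * p_notify_given_fraud[True]
                + (1 - p_error) * p_notify_given_fraud[False])

# P(M | BF): marginalize over whether the notification appeared
p_m_given_bf = (p_n_given_bf * p_maggy_calls[True]
                + (1 - p_n_given_bf) * p_maggy_calls[False])

print(round(p_n_given_bf, 5), round(p_m_given_bf, 6))
```

Under these assumptions the notification appears with probability of about 0.92 when fraud happens, and Maggy calls with probability of about 0.65.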

In [2]:
import pomegranate as pg

In [53]:
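# Define the root events.
# Note: DiscreteDistribution, State and bake() belong to the pomegranate API from before version 1.0.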
bank_fraud = pg.DiscreteDistribution({
    'BF': 0.001,
    '-BF': 0.999
})

error = pg.DiscreteDistribution({
    'E': 0.002,
    '-E': 0.998
})

# ConditionalProbabilityTable models conditional probabilities: one row per combination
# of parent values and outcome. Columns here: bank_fraud, error, notification, probability.
notification = pg.ConditionalProbabilityTable(
    [
        ['BF', 'E', 'N', 0.99],
        ['BF', 'E', '-N', 0.01],
        
        ['BF', '-E', 'N', 0.92],
        ['BF', '-E', '-N', 0.08],
        
        ['-BF', 'E', 'N', 0.15],
        ['-BF', 'E', '-N', 0.85],
        
        ['-BF', '-E', 'N', 0.0001],
        ['-BF', '-E', '-N', 0.9999],
    ], 
        [bank_fraud, error]
)

johnatan = pg.ConditionalProbabilityTable(
    [
        ['N', 'J', 0.9],
        ['N', '-J', 0.1],
        
        ['-N', 'J', 0.001],
        ['-N', '-J', 0.999],
    ], [notification]
)

maggy = pg.ConditionalProbabilityTable(
    [
        ['N', 'M', 0.70],
        ['N', '-M', 0.30],
        
        ['-N', 'M', 0.1],
        ['-N', '-M', 0.9],
    ], [notification]
)

In [54]:
# define a Bayes Net
bank_fraud = pg.State(bank_fraud, 'BF')

error = pg.State(error, 'E')

notification = pg.State(notification, 'N')

johnatan = pg.State(johnatan, 'J')

maggy = pg.State(maggy, 'M')


model = pg.BayesianNetwork("Bank Fraud")

# Add the states to the network 
model.add_states(bank_fraud, error, notification, johnatan, maggy)

# Add the edges
model.add_edge(bank_fraud, notification)
model.add_edge(error, notification)

model.add_edge(notification, johnatan)
model.add_edge(notification, maggy)

model.bake()

In [80]:
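# Calculate the posterior distribution of every variable given the evidence (here: an error, E, was observed).
# Variables are ordered as added to the model (BF, E, N, J, M); unobserved ones are passed as None.
# Each returned distribution is normalized so that its probabilities sum to 1; observed values are returned as-is.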
for prob in model.predict_proba([[None, 'E', None, None, None]])[0]:
    print(40*'==')
    print(prob)
    print(40*'==')

{
    "class" : "Distribution",
    "dtype" : "str",
    "name" : "DiscreteDistribution",
    "parameters" : [
        {
            "BF" : 0.001000000000000443,
            "-BF" : 0.9989999999999996
        }
    ],
    "frozen" : false
}
E
{
    "class" : "Distribution",
    "dtype" : "str",
    "name" : "DiscreteDistribution",
    "parameters" : [
        {
            "-N" : 0.8491599999999995,
            "N" : 0.15084000000000058
        }
    ],
    "frozen" : false
}
{
    "class" : "Distribution",
    "dtype" : "str",
    "name" : "DiscreteDistribution",
    "parameters" : [
        {
            "J" : 0.1366051600000005,
            "-J" : 0.8633948399999994
        }
    ],
    "frozen" : false
}
{
    "class" : "Distribution",
    "dtype" : "str",
    "name" : "DiscreteDistribution",
    "parameters" : [
        {
            "M" : 0.19050400000000042,
            "-M" : 0.8094959999999995
        }
    ],
    "frozen" : false
}
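
The printed distributions can be reproduced by hand. Observing only E tells us nothing about BF (the two are independent as long as N is unobserved), so P(BF | E) stays at 0.001, and the remaining values follow by marginalizing step by step. This is a plain-Python sketch with illustrative variable names, not a pomegranate call.

```python
p_fraud = 0.001  # P(BF), unchanged by observing E alone

# P(N | E): average the rows of the notification table where an error happened
p_n_given_e = p_fraud * 0.99 + (1 - p_fraud) * 0.15

# P(J | E) and P(M | E): both colleagues depend on E only through N
p_j_given_e = p_n_given_e * 0.9 + (1 - p_n_given_e) * 0.001
p_m_given_e = p_n_given_e * 0.7 + (1 - p_n_given_e) * 0.1

print(round(p_n_given_e, 5), round(p_j_given_e, 8), round(p_m_given_e, 6))
```

Up to floating-point noise, these agree with the values 0.15084, 0.13660516 and 0.190504 in the output above.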


In [72]:
# Bayesian networks are frequently used to infer/impute the value of missing variables given the observed values.
# In other models there is typically either a single or a fixed set of missing variables, such as latent factors,
# that need to be imputed, so returning a fixed vector or matrix as the prediction makes sense.
# In the case of Bayesian networks we can make no such assumption,
# so the data passed in for prediction should be a matrix with None in place of
# the missing variables that need to be inferred. The return value is a filled-in matrix where the Nones
# have been replaced with the imputed values.
# When an error happens, the most probable outcome is that nothing else happens:
model.predict([[None, 'E', None, None, None]])

[array(['-BF', 'E', '-N', '-J', '-M'], dtype=object)]

In [81]:
model.predict([['BF', None, None, None, None]]) #when real bank fraud happens...

[array(['BF', '-E', 'N', 'J', 'M'], dtype=object)]

In [None]:
#... it is most probable that there is no error, that we will receive the notification, and that both colleagues will call us

In [82]:
# What if Maggy just called us?
# Maybe she just wants to talk a little bit, but if both Johnatan and Maggy call us, it probably means bank fraud.
model.predict([[None, None, None, None, 'M']])

[array(['-BF', '-E', '-N', '-J', 'M'], dtype=object)]

In [85]:
model.predict([[None, None, None, 'J', 'M']])

[array(['BF', '-E', 'N', 'J', 'M'], dtype=object)]
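
To see why the prediction flips from `-BF` to `BF` once both colleagues call, the posterior probability of fraud can be computed by brute force: sum the joint probability over all 32 assignments that agree with the evidence. This is a plain-Python sketch that does not use pomegranate; the function names `joint` and `posterior_fraud` are hypothetical, and the tables copy the numbers defined earlier in the notebook.

```python
from itertools import product

P_BF = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
# P(N=True | BF, E), keyed by (bf, e)
P_N = {(True, True): 0.99, (True, False): 0.92,
       (False, True): 0.15, (False, False): 0.0001}
P_J = {True: 0.9, False: 0.001}  # P(J=True | N)
P_M = {True: 0.7, False: 0.1}    # P(M=True | N)

def bernoulli(p_true, value):
    """Probability that a binary variable takes `value` when P(True) = p_true."""
    return p_true if value else 1 - p_true

def joint(bf, e, n, j, m):
    """Joint probability of one full assignment, following the network's factorization."""
    return (P_BF[bf] * P_E[e] * bernoulli(P_N[(bf, e)], n)
            * bernoulli(P_J[n], j) * bernoulli(P_M[n], m))

def posterior_fraud(**evidence):
    """P(BF=True | evidence), where evidence maps variable names to booleans."""
    names = ("bf", "e", "n", "j", "m")
    numerator = denominator = 0.0
    for values in product([True, False], repeat=5):
        state = dict(zip(names, values))
        # Skip assignments that contradict the observed values
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(**state)
        denominator += p
        if state["bf"]:
            numerator += p
    return numerator / denominator

p_fraud_given_m = posterior_fraud(m=True)
p_fraud_given_j_and_m = posterior_fraud(j=True, m=True)
print(round(p_fraud_given_m, 4), round(p_fraud_given_j_and_m, 4))
```

A call from Maggy alone leaves the probability of fraud below 1%, since she often calls just to talk. A call from both Johnatan and Maggy raises it to roughly 62%, which is consistent with the two predictions above.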