In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.insert(0,'../../modules')

In [2]:
import numpy as np
import factors

# Inference
In inference problems we want the distribution over a query variable $Y$ given the variables whose values we know, while some other variables remain unknown. We want:

$$
\begin{aligned}
  P(Y|X_\text{known}) &= \sum_{X_\text{unknown}}P(Y,X_\text{unknown}|X_\text{known}) \\
\end{aligned}
$$
So we condition on the known variables and then marginalize out the unknown variables we are not interested in. <br>
In Bayesian networks we can do exact inference with the factor representation.<br>
If a single factor represents the joint distribution, both steps are simple: conditioning keeps only the entries consistent with the known values and renormalizes them, and marginalizing sums over the unwanted variables, as in the formula above. Say you want to know whether you will need to buy tea from the shop on the way home, given who is at home (using 0 for False and 1 for True):

In [99]:
tea_factor = factors.Factor(["out of tea","mums home","dads home"],[2,2,2])
tea_factor.set([0,0,0],0.15)
tea_factor.set([0,0,1],0.05)
tea_factor.set([0,1,0],0.1)
tea_factor.set([0,1,1],0.35)
tea_factor.set([1,0,0],0.0)
tea_factor.set([1,0,1],0.1)
tea_factor.set([1,1,0],0.05)
tea_factor.set([1,1,1],0.2)
print(tea_factor)

out of tea  mums home  dads home  Values (10 dp)
0           0          0          0.15
0           0          1          0.05
0           1          0          0.1
0           1          1          0.35
1           0          0          0.0
1           0          1          0.1
1           1          0          0.05
1           1          1          0.2



Suppose you know that dad isn't home, but you aren't sure about mum.

In [104]:
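# Keep only the entries with "dads home" = 0 (the evidence)
set_dads_home_to_0 = factors.drop_variables(tea_factor,["dads home"],[0])
# Renormalize so the remaining entries sum to 1
conditioned_dads_home_0 = factors.condition(set_dads_home_to_0)
# Sum out "mums home", which is unknown and not of interest
marginalize_mums_home = factors.marginalize(conditioned_dads_home_0,["mums home"])
print(marginalize_mums_home)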

out of tea  Values (10 dp)
0           0.8333333333
1           0.1666666667
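
The same two steps can also be sketched with plain NumPy, without the `factors` module. This is only an illustration: the array `joint` is a hand-copied version of `tea_factor`, and the axis order `[out of tea, mums home, dads home]` is an assumption taken from the table printed above.

```python
import numpy as np

# Joint table indexed as [out of tea, mums home, dads home],
# copied by hand from the tea_factor table above (assumed axis order).
joint = np.array([[[0.15, 0.05], [0.10, 0.35]],
                  [[0.00, 0.10], [0.05, 0.20]]])

# Condition on dads home = 0: slice out the evidence, then renormalize.
sliced = joint[:, :, 0]               # unnormalized P(out of tea, mums home, dads home=0)
conditioned = sliced / sliced.sum()   # P(out of tea, mums home | dads home=0)

# Marginalize out mums home by summing over its axis.
posterior = conditioned.sum(axis=1)   # P(out of tea | dads home=0)
print(posterior)
```

The entries with dad not home sum to 0.15 + 0.1 + 0.0 + 0.05 = 0.3, so the posterior is (0.25/0.3, 0.05/0.3), which agrees with the values printed by the factor code.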



This is easy to check with rejection sampling (explained in the next notebook): draw samples from the joint and discard those that do not match the evidence.

In [105]:
probs = tea_factor.array.reshape(-1)  # joint probabilities as a flat vector
indexes = tea_factor.indexes          # variable settings for each flat row
total_not_out_of_tea = 0
total_out_of_tea = 0
for sample in range(10000):
    # Draw one row of the joint table according to its probability
    row = np.random.choice(np.arange(8), p=probs)
    setting = indexes[row]
    # Reject samples that contradict the evidence "dads home" = 0
    if setting[2] == 0:
        if setting[0] == 0:
            total_not_out_of_tea += 1
        else:
            total_out_of_tea += 1
total_accepted = total_not_out_of_tea + total_out_of_tea
print("estimated not out of tea prob:", total_not_out_of_tea / total_accepted)
print("estimated out of tea prob    :", total_out_of_tea / total_accepted)

estimated not out of tea prob: 0.8387644263408011
estimated out of tea prob    : 0.1612355736591989
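
The estimate differs from the exact answer by about 0.005, which is the size of error to expect from sampling. The sketch below is a vectorized variant of the same check that also reports a binomial standard error. It is an illustration under assumptions rather than part of the `factors` module: `joint` is a hand-copied version of `tea_factor`, the seed is arbitrary, and the standard error uses the exact answer 1/6 computed earlier.

```python
import numpy as np

# Joint table indexed as [out of tea, mums home, dads home],
# copied by hand from the tea_factor table above (assumed axis order).
joint = np.array([[[0.15, 0.05], [0.10, 0.35]],
                  [[0.00, 0.10], [0.05, 0.20]]])

rng = np.random.default_rng(0)  # arbitrary fixed seed so the run is repeatable
n_samples = 10_000

# Draw flat row indices from the joint, then unravel them into variable settings.
rows = rng.choice(joint.size, size=n_samples, p=joint.ravel())
out_of_tea, mums_home, dads_home = np.unravel_index(rows, joint.shape)

# Rejection step: keep only samples consistent with the evidence dads home = 0.
accepted = out_of_tea[dads_home == 0]
estimate = accepted.mean()  # fraction of accepted samples with out of tea = 1

# Binomial standard error of the estimate, using the exact answer 1/6.
p_exact = 1 / 6
std_error = np.sqrt(p_exact * (1 - p_exact) / accepted.size)
print("accepted samples:", accepted.size)
print("estimated P(out of tea=1 | dads home=0):", estimate)
print("standard error:", std_error)
```

Only about 30% of the samples survive the rejection step, since that is the probability that dad is not home. With roughly 3000 accepted samples the standard error is about 0.007, so a deviation of 0.005 from the exact value is unremarkable.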


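So exact inference on a single factor is easy. However, building the full joint table is very expensive when there are many variables, because its size grows exponentially with the number of variables ($2^n$ entries for $n$ binary variables). Fortunately, it is possible to condition and marginalize out variables directly on the factors of a decomposed joint, without ever building the full table.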

# The Sum-Product Algorithm
Say we have a joint probability which has been decomposed into a product of factors, e.g.: <br>