# Lab 4: Belief Propagation

This lab is built around the process of identifying the fault with a coffee machine using the least amount of effort.

Your task is to:
 1. Given the structure of a graphical model for the state of a coffee machine, learn the distributions from data.
 2. Implement belief propagation, so you can evaluate the probability of each failure given the available evidence.
 3. Use Bayesian decision theory to generate a diagnostic flow chart that will identify the problem as quickly as possible on average, taking account of how long it takes to perform each observation.

## Marking and Submission

These lab exercises are marked, and contribute to your final grade. For this lab exercise there are 3 places where you are expected to enter your own code, for 30 marks overall. Every place you have to add code is indicated by

`# **************************************************************** n marks`

with instructions above the code block.

Please submit your completed workbook using Moodle before 5pm on the 29th November 2017. The workbook you submit must be an `.ipynb` file, which is saved into the directory from which you're running Jupyter; alternatively you can download it from the menu above using `File -> Download As -> Notebook (.ipynb)`. Remember to save your work regularly (Save and checkpoint in the File menu, the icon of a floppy disk, or Ctrl-S); the version you submit should have all code blocks showing the results (if any) of execution below them. You will normally receive feedback within a week.

In [1]:
%matplotlib inline

import zipfile
import io
import csv

import numpy
import matplotlib.pyplot as plt

## Coffee Machine Data Set

You can download a zip file from Moodle that contains the data as a csv file. The code below loads the data directly from the zip file, so you don't have to unzip it.

Each row of the file contains a fully observed coffee machine, with the state of every random variable. The random variables are all binary, with False represented by 0 and True represented by 1. The variables are:

Failures (you're trying to detect these):  
&nbsp; &nbsp; 0. `he` - Have mains electricity  
&nbsp; &nbsp; 1. `fp` - Fried power supply unit  
&nbsp; &nbsp; 2. `fc` - Fried circuit board  
&nbsp; &nbsp; 3. `wr` - Water in reservoir  
&nbsp; &nbsp; 4. `dp` - Dead pump  
&nbsp; &nbsp; 5. `fh` - Fried heating element  
&nbsp; &nbsp; 6. `gs` - Group head gasket forms seal  

Mechanism (these are unobservable):  
&nbsp; &nbsp; 7. `pw` - Power supply unit works  
&nbsp; &nbsp; 8. `cb` - Circuit board works  
&nbsp; &nbsp; 9. `gw` - Get water out of group head  
&nbsp; &nbsp; 10. `hw` - Get hot water out of group head  
&nbsp; &nbsp; 11. `li` - Leaks during infusion  

Diagnostic (these are the tests the mechanic can run - observable):  
&nbsp; &nbsp; 12. `ls` - Room lights switch on  
&nbsp; &nbsp; 13. `vp` - A voltage is measured across power supply unit  
&nbsp; &nbsp; 14. `bs` - Burning smell  
&nbsp; &nbsp; 15. `lo` - Power light switches on  
&nbsp; &nbsp; 16. `hp` - Can hear pump  
&nbsp; &nbsp; 17. `me` - Makes espresso  
&nbsp; &nbsp; 18. `ta` - Makes a hot, tasty espresso  

In the lists above, the number is the column index in the provided data (`dm`) and the two-letter code is a suggested variable name.

If you are unfamiliar with espresso machines, here is a brief description of how one works:
> The user puts ground coffee into a portafilter (round container with a handle and two spouts at the bottom), tamps it (compacts the coffee down), and clamps the portafilter into the group head at the front of the machine. A gasket (rubber ring) forms a seal between the portafilter and group head. A button is pressed. Water is drawn from a reservoir by a pump into a boiler. In the boiler a heating element raises the water's temperature, before the pump pushes it through the group head and into the portafilter at high pressure. The water passes through the coffee grinds and makes a tasty, tasty espresso.

The graphical model showing how the variables are related is included on Moodle as `coffee machine.pdf`; here it is given as conditional probabilities:

 * $P(\texttt{have electricity})$
 * $P(\texttt{fried psu})$
 * $P(\texttt{fried circuit board})$
 * $P(\texttt{water in reservoir})$
 * $P(\texttt{dead pump})$
 * $P(\texttt{fried heating element})$
 * $P(\texttt{group head gasket forms seal})$
 
 * $P(\texttt{psu works}\enspace|\enspace\texttt{have electricity},\enspace\texttt{fried psu})$
 * $P(\texttt{circuit board works}\enspace|\enspace\texttt{psu works},\enspace\texttt{fried circuit board})$
 * $P(\texttt{get water}\enspace|\enspace\texttt{circuit board works},\enspace\texttt{water in reservoir},\enspace\texttt{dead pump})$
 * $P(\texttt{get hot water}\enspace|\enspace\texttt{get water},\enspace\texttt{fried heating element})$
 * $P(\texttt{leaks during infusion}\enspace|\enspace\texttt{get water},\enspace\texttt{group head gasket forms seal})$

 * $P(\texttt{lights switch on}\enspace|\enspace\texttt{have electricity})$
 * $P(\texttt{voltage across psu}\enspace|\enspace\texttt{psu works})$
 * $P(\texttt{burning smell}\enspace|\enspace\texttt{fried psu},\enspace\texttt{fried circuit board})$ 
 * $P(\texttt{power light on}\enspace|\enspace\texttt{circuit board works})$
 * $P(\texttt{can hear pump}\enspace|\enspace\texttt{circuit board works},\enspace\texttt{dead pump})$
 * $P(\texttt{makes espresso}\enspace|\enspace\texttt{get water},\enspace\texttt{group head gasket forms seal})$
 * $P(\texttt{tasty}\enspace|\enspace\texttt{makes espresso},\enspace\texttt{get hot water})$

Note that while the model is close to what you may guess, the probabilities are not absolute, to account for mistakes and unknown failures. For instance, the mechanic may make a mistake while brewing an espresso and erroneously conclude that the machine is broken when it is in fact awesome. The probabilities associated with each failure are not uniform.

In [2]:
# It may prove helpful to have a mapping between the suggested variable names and column indices...
nti = dict() # 'name to index'

nti['he'] = 0
nti['fp'] = 1
nti['fc'] = 2
nti['wr'] = 3
nti['dp'] = 4
nti['fh'] = 5
nti['gs'] = 6

nti['pw'] = 7
nti['cb'] = 8
nti['gw'] = 9
nti['hw'] = 10
nti['li'] = 11

nti['ls'] = 12
nti['vp'] = 13
nti['bs'] = 14
nti['lo'] = 15
nti['hp'] = 16
nti['me'] = 17
nti['ta'] = 18

# For convenience this code loads the data from the zip file,
# so you don't have to decompress it (takes a few seconds to run)...
with zipfile.ZipFile('coffee_machines.zip') as zf:
    with zf.open('coffee_machines.csv') as f:
        sf = io.TextIOWrapper(f)
        reader = csv.reader(sf)
        next(reader)
        dm = []
        for row in reader:
            dm.append([int(v) for v in row])
        dm = numpy.array(dm, dtype=numpy.int8)

print('Data: {} exemplars, {} features'.format(dm.shape[0], dm.shape[1]))
print('     Broken machines  =', dm.shape[0] - dm[:,nti['ta']].sum())
print('     Working machines =', dm[:,nti['ta']].sum())


Data: 262144 exemplars, 19 features
     Broken machines  = 113637
     Working machines = 148507


## 1. Learn Model

Below, a set of variables representing the conditional probability distributions has been defined. Each holds a Bernoulli trial for every combination of the conditioning variables, given as $P(\texttt{False}|...)$ in `[0,...]` and $P(\texttt{True}|...)$ in `[1,...]` (it may be easier to think of them as boring categorical distributions).

To fit a maximum likelihood model you should first use them as counters - loop over the data set and count how many of each combination exist. You must then normalise them so that the sum over axis 0 is always 1. There is an extra mark for doing the right thing and including a prior. You may want to know that the conjugate prior to the Bernoulli trial represented as $\left[P(\texttt{False}), P(\texttt{True})\right]$ is a Dirichlet distribution; a uniform prior would be a reasonable choice (it can be argued that the expected failure probability is low, and you should adjust the first 7 variables accordingly, but given the quantity of data available it's not going to matter).

Hint:
 * The use of `0=False` and `1=True` both in the dm table and in the conditional probability distributions is very deliberate.
 * Consider putting all of the variables into a list with extra information about them (such as indices from `nti`) to make your code neater.
 * If you make a mistake you could easily end up with NaN or infinity - which would break the next step. Print them out so you can check they are valid!

__(5 marks)__
 * 3 marks for counting all conditions.
 * 1 mark for normalising distributions.
 * 1 mark for adding a sensible prior.
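
The counting, prior and normalisation steps described above can be sketched compactly. This is only an illustration under stated assumptions, not the expected solution: `fit_cpt` and the `toy` data set are hypothetical names introduced here, and the prior is a uniform Dirichlet expressed as a pseudo-count added to every cell before normalising over axis 0.

```python
import numpy

def fit_cpt(data, child, parents, prior=1.0):
    """Estimate P(child | parents) from fully observed binary data.

    data    - 2D integer array, one row per exemplar, values 0 or 1.
    child   - column index of the child variable (becomes axis 0).
    parents - column indices of the parents (axes 1, 2, ...).
    prior   - pseudo-count added to every cell (uniform Dirichlet prior).
    """
    cols = [child] + list(parents)
    # Start every counter at the prior pseudo-count rather than zero.
    counts = numpy.full((2,) * len(cols), prior, dtype=float)
    # Using 0=False and 1=True means the data values index the table directly.
    numpy.add.at(counts, tuple(data[:, c] for c in cols), 1.0)
    # Normalise so that the sum over axis 0 is 1 for each parent combination.
    return counts / counts.sum(axis=0, keepdims=True)

# Hypothetical toy data: column 0 is the parent, column 1 the child.
toy = numpy.array([[0, 0], [0, 0], [1, 0], [1, 1]], dtype=numpy.int8)
cpt = fit_cpt(toy, child=1, parents=[0])
print(cpt)
```

Because the pseudo-count is strictly positive, a parent combination that never occurs in the data still yields a valid (uniform) distribution instead of a division by zero.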

In [3]:
# A set of variables that will ultimately represent conditional probability distributions.
# The naming convention is that P_he means P(he), or that P_pw_he_fp means P(pw|he,fp)...

P_he          = numpy.zeros(2)
P_fp          = numpy.zeros(2)
P_fc          = numpy.zeros(2)
P_wr          = numpy.zeros(2)
P_dp          = numpy.zeros(2)
P_fh          = numpy.zeros(2)
P_gs          = numpy.zeros(2)

P_pw_he_fp    = numpy.zeros((2,2,2))
P_cb_pw_fc    = numpy.zeros((2,2,2))
P_gw_cb_wr_dp = numpy.zeros((2,2,2,2))
P_hw_gw_fh    = numpy.zeros((2,2,2))
P_li_gw_gs    = numpy.zeros((2,2,2))

P_ls_he       = numpy.zeros((2,2))
P_vp_pw       = numpy.zeros((2,2))
P_bs_fp_fc    = numpy.zeros((2,2,2))
P_lo_cb       = numpy.zeros((2,2))
P_hp_cb_dp    = numpy.zeros((2,2,2))
P_me_gw_gs    = numpy.zeros((2,2,2))
P_ta_me_hw    = numpy.zeros((2,2,2))



# **************************************************************** 5 marks

rows = dm.shape[0]

for i in range(dm.shape[0]):
    # First seven
    if (dm[i,nti['he']] == 1):
        P_he[1] += 1/rows
    else:
        P_he[0] += 1/rows
    if (dm[i,nti['fp']] == 1):
        P_fp[1] += 1/rows
    else:
        P_fp[0] += 1/rows
    if (dm[i,nti['fc']] == 1):
        P_fc[1] += 1/rows
    else:
        P_fc[0] += 1/rows
    if (dm[i,nti['wr']] == 1):
        P_wr[1] += 1/rows
    else:
        P_wr[0] += 1/rows
    if (dm[i,nti['dp']] == 1):
        P_dp[1] += 1/rows
    else:
        P_dp[0] += 1/rows
    if (dm[i,nti['fh']] == 1):
        P_fh[1] += 1/rows
    else:
        P_fh[0] += 1/rows
    if (dm[i,nti['gs']] == 1):
        P_gs[1] += 1/rows
    else:
        P_gs[0] += 1/rows
    
    # P_pw_he_fp
    if (dm[i,nti['pw']] == 0 and dm[i,nti['he']] == 0 and dm[i,nti['fp']] == 0):
        P_pw_he_fp[0][0][0] += 1
    if (dm[i,nti['pw']] == 0 and dm[i,nti['he']] == 0 and dm[i,nti['fp']] == 1):
        P_pw_he_fp[0][0][1] += 1
    if (dm[i,nti['pw']] == 0 and dm[i,nti['he']] == 1 and dm[i,nti['fp']] == 0):
        P_pw_he_fp[0][1][0] += 1
    if (dm[i,nti['pw']] == 0 and dm[i,nti['he']] == 1 and dm[i,nti['fp']] == 1):
        P_pw_he_fp[0][1][1] += 1
    if (dm[i,nti['pw']] == 1 and dm[i,nti['he']] == 0 and dm[i,nti['fp']] == 0):
        P_pw_he_fp[1][0][0] += 1
    if (dm[i,nti['pw']] == 1 and dm[i,nti['he']] == 0 and dm[i,nti['fp']] == 1):
        P_pw_he_fp[1][0][1] += 1
    if (dm[i,nti['pw']] == 1 and dm[i,nti['he']] == 1 and dm[i,nti['fp']] == 0):
        P_pw_he_fp[1][1][0] += 1
    if (dm[i,nti['pw']] == 1 and dm[i,nti['he']] == 1 and dm[i,nti['fp']] == 1):
        P_pw_he_fp[1][1][1] += 1
        
    # P_cb_pw_fc
    if (dm[i,nti['cb']] == 0 and dm[i,nti['pw']] == 0 and dm[i,nti['fc']] == 0):
        P_cb_pw_fc[0][0][0] += 1
    if (dm[i,nti['cb']] == 0 and dm[i,nti['pw']] == 0 and dm[i,nti['fc']] == 1):
        P_cb_pw_fc[0][0][1] += 1
    if (dm[i,nti['cb']] == 0 and dm[i,nti['pw']] == 1 and dm[i,nti['fc']] == 0):
        P_cb_pw_fc[0][1][0] += 1
    if (dm[i,nti['cb']] == 0 and dm[i,nti['pw']] == 1 and dm[i,nti['fc']] == 1):
        P_cb_pw_fc[0][1][1] += 1
    if (dm[i,nti['cb']] == 1 and dm[i,nti['pw']] == 0 and dm[i,nti['fc']] == 0):
        P_cb_pw_fc[1][0][0] += 1
    if (dm[i,nti['cb']] == 1 and dm[i,nti['pw']] == 0 and dm[i,nti['fc']] == 1):
        P_cb_pw_fc[1][0][1] += 1
    if (dm[i,nti['cb']] == 1 and dm[i,nti['pw']] == 1 and dm[i,nti['fc']] == 0):
        P_cb_pw_fc[1][1][0] += 1
    if (dm[i,nti['cb']] == 1 and dm[i,nti['pw']] == 1 and dm[i,nti['fc']] == 1):
        P_cb_pw_fc[1][1][1] += 1
    
    # P_gw_cb_wr_dp
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[0][0][0][0] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[0][0][0][1] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[0][0][1][0] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[0][0][1][1] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[0][1][0][0] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[0][1][0][1] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[0][1][1][0] += 1
    if (dm[i,nti['gw']] == 0 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[0][1][1][1] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[1][0][0][0] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[1][0][0][1] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[1][0][1][0] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 0 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[1][0][1][1] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[1][1][0][0] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 0 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[1][1][0][1] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 0):
        P_gw_cb_wr_dp[1][1][1][0] += 1
    if (dm[i,nti['gw']] == 1 and dm[i,nti['cb']] == 1 and dm[i,nti['wr']] == 1 and dm[i,nti['dp']] == 1):
        P_gw_cb_wr_dp[1][1][1][1] += 1
    
    # P_hw_gw_fh
    if (dm[i,nti['hw']] == 0 and dm[i,nti['gw']] == 0 and dm[i,nti['fh']] == 0):
        P_hw_gw_fh[0][0][0] += 1
    if (dm[i,nti['hw']] == 0 and dm[i,nti['gw']] == 0 and dm[i,nti['fh']] == 1):
        P_hw_gw_fh[0][0][1] += 1
    if (dm[i,nti['hw']] == 0 and dm[i,nti['gw']] == 1 and dm[i,nti['fh']] == 0):
        P_hw_gw_fh[0][1][0] += 1
    if (dm[i,nti['hw']] == 0 and dm[i,nti['gw']] == 1 and dm[i,nti['fh']] == 1):
        P_hw_gw_fh[0][1][1] += 1
    if (dm[i,nti['hw']] == 1 and dm[i,nti['gw']] == 0 and dm[i,nti['fh']] == 0):
        P_hw_gw_fh[1][0][0] += 1
    if (dm[i,nti['hw']] == 1 and dm[i,nti['gw']] == 0 and dm[i,nti['fh']] == 1):
        P_hw_gw_fh[1][0][1] += 1
    if (dm[i,nti['hw']] == 1 and dm[i,nti['gw']] == 1 and dm[i,nti['fh']] == 0):
        P_hw_gw_fh[1][1][0] += 1
    if (dm[i,nti['hw']] == 1 and dm[i,nti['gw']] == 1 and dm[i,nti['fh']] == 1):
        P_hw_gw_fh[1][1][1] += 1
        
    # P_li_gw_gs
    if (dm[i,nti['li']] == 0 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 0):
        P_li_gw_gs[0][0][0] += 1
    if (dm[i,nti['li']] == 0 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 1):
        P_li_gw_gs[0][0][1] += 1
    if (dm[i,nti['li']] == 0 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 0):
        P_li_gw_gs[0][1][0] += 1
    if (dm[i,nti['li']] == 0 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 1):
        P_li_gw_gs[0][1][1] += 1
    if (dm[i,nti['li']] == 1 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 0):
        P_li_gw_gs[1][0][0] += 1
    if (dm[i,nti['li']] == 1 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 1):
        P_li_gw_gs[1][0][1] += 1
    if (dm[i,nti['li']] == 1 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 0):
        P_li_gw_gs[1][1][0] += 1
    if (dm[i,nti['li']] == 1 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 1):
        P_li_gw_gs[1][1][1] += 1
        
    # P_ls_he
    if (dm[i,nti['ls']] == 0 and dm[i,nti['he']] == 0):
        P_ls_he[0][0] += 1
    if (dm[i,nti['ls']] == 0 and dm[i,nti['he']] == 1):
        P_ls_he[0][1] += 1
    if (dm[i,nti['ls']] == 1 and dm[i,nti['he']] == 0):
        P_ls_he[1][0] += 1
    if (dm[i,nti['ls']] == 1 and dm[i,nti['he']] == 1):
        P_ls_he[1][1] += 1
        
    # P_vp_pw
    if (dm[i,nti['vp']] == 0 and dm[i,nti['pw']] == 0):
        P_vp_pw[0][0] += 1
    if (dm[i,nti['vp']] == 0 and dm[i,nti['pw']] == 1):
        P_vp_pw[0][1] += 1
    if (dm[i,nti['vp']] == 1 and dm[i,nti['pw']] == 0):
        P_vp_pw[1][0] += 1
    if (dm[i,nti['vp']] == 1 and dm[i,nti['pw']] == 1):
        P_vp_pw[1][1] += 1
        
    # P_bs_fp_fc
    if (dm[i,nti['bs']] == 0 and dm[i,nti['fp']] == 0 and dm[i,nti['fc']] == 0):
        P_bs_fp_fc[0][0][0] += 1
    if (dm[i,nti['bs']] == 0 and dm[i,nti['fp']] == 0 and dm[i,nti['fc']] == 1):
        P_bs_fp_fc[0][0][1] += 1
    if (dm[i,nti['bs']] == 0 and dm[i,nti['fp']] == 1 and dm[i,nti['fc']] == 0):
        P_bs_fp_fc[0][1][0] += 1
    if (dm[i,nti['bs']] == 0 and dm[i,nti['fp']] == 1 and dm[i,nti['fc']] == 1):
        P_bs_fp_fc[0][1][1] += 1
    if (dm[i,nti['bs']] == 1 and dm[i,nti['fp']] == 0 and dm[i,nti['fc']] == 0):
        P_bs_fp_fc[1][0][0] += 1
    if (dm[i,nti['bs']] == 1 and dm[i,nti['fp']] == 0 and dm[i,nti['fc']] == 1):
        P_bs_fp_fc[1][0][1] += 1
    if (dm[i,nti['bs']] == 1 and dm[i,nti['fp']] == 1 and dm[i,nti['fc']] == 0):
        P_bs_fp_fc[1][1][0] += 1
    if (dm[i,nti['bs']] == 1 and dm[i,nti['fp']] == 1 and dm[i,nti['fc']] == 1):
        P_bs_fp_fc[1][1][1] += 1   
    
    # P_lo_cb
    if (dm[i,nti['lo']] == 0 and dm[i,nti['cb']] == 0):
        P_lo_cb[0][0] += 1
    if (dm[i,nti['lo']] == 0 and dm[i,nti['cb']] == 1):
        P_lo_cb[0][1] += 1
    if (dm[i,nti['lo']] == 1 and dm[i,nti['cb']] == 0):
        P_lo_cb[1][0] += 1
    if (dm[i,nti['lo']] == 1 and dm[i,nti['cb']] == 1):
        P_lo_cb[1][1] += 1
        
    # P_hp_cb_dp
    if (dm[i,nti['hp']] == 0 and dm[i,nti['cb']] == 0 and dm[i,nti['dp']] == 0):
        P_hp_cb_dp[0][0][0] += 1
    if (dm[i,nti['hp']] == 0 and dm[i,nti['cb']] == 0 and dm[i,nti['dp']] == 1):
        P_hp_cb_dp[0][0][1] += 1
    if (dm[i,nti['hp']] == 0 and dm[i,nti['cb']] == 1 and dm[i,nti['dp']] == 0):
        P_hp_cb_dp[0][1][0] += 1
    if (dm[i,nti['hp']] == 0 and dm[i,nti['cb']] == 1 and dm[i,nti['dp']] == 1):
        P_hp_cb_dp[0][1][1] += 1
    if (dm[i,nti['hp']] == 1 and dm[i,nti['cb']] == 0 and dm[i,nti['dp']] == 0):
        P_hp_cb_dp[1][0][0] += 1
    if (dm[i,nti['hp']] == 1 and dm[i,nti['cb']] == 0 and dm[i,nti['dp']] == 1):
        P_hp_cb_dp[1][0][1] += 1
    if (dm[i,nti['hp']] == 1 and dm[i,nti['cb']] == 1 and dm[i,nti['dp']] == 0):
        P_hp_cb_dp[1][1][0] += 1
    if (dm[i,nti['hp']] == 1 and dm[i,nti['cb']] == 1 and dm[i,nti['dp']] == 1):
        P_hp_cb_dp[1][1][1] += 1    
    
    # P_me_gw_gs
    if (dm[i,nti['me']] == 0 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 0):
        P_me_gw_gs[0][0][0] += 1
    if (dm[i,nti['me']] == 0 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 1):
        P_me_gw_gs[0][0][1] += 1
    if (dm[i,nti['me']] == 0 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 0):
        P_me_gw_gs[0][1][0] += 1
    if (dm[i,nti['me']] == 0 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 1):
        P_me_gw_gs[0][1][1] += 1
    if (dm[i,nti['me']] == 1 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 0):
        P_me_gw_gs[1][0][0] += 1
    if (dm[i,nti['me']] == 1 and dm[i,nti['gw']] == 0 and dm[i,nti['gs']] == 1):
        P_me_gw_gs[1][0][1] += 1
    if (dm[i,nti['me']] == 1 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 0):
        P_me_gw_gs[1][1][0] += 1
    if (dm[i,nti['me']] == 1 and dm[i,nti['gw']] == 1 and dm[i,nti['gs']] == 1):
        P_me_gw_gs[1][1][1] += 1
    
    # P_ta_me_hw
    if (dm[i,nti['ta']] == 0 and dm[i,nti['me']] == 0 and dm[i,nti['hw']] == 0):
        P_ta_me_hw[0][0][0] += 1
    if (dm[i,nti['ta']] == 0 and dm[i,nti['me']] == 0 and dm[i,nti['hw']] == 1):
        P_ta_me_hw[0][0][1] += 1
    if (dm[i,nti['ta']] == 0 and dm[i,nti['me']] == 1 and dm[i,nti['hw']] == 0):
        P_ta_me_hw[0][1][0] += 1
    if (dm[i,nti['ta']] == 0 and dm[i,nti['me']] == 1 and dm[i,nti['hw']] == 1):
        P_ta_me_hw[0][1][1] += 1
    if (dm[i,nti['ta']] == 1 and dm[i,nti['me']] == 0 and dm[i,nti['hw']] == 0):
        P_ta_me_hw[1][0][0] += 1
    if (dm[i,nti['ta']] == 1 and dm[i,nti['me']] == 0 and dm[i,nti['hw']] == 1):
        P_ta_me_hw[1][0][1] += 1
    if (dm[i,nti['ta']] == 1 and dm[i,nti['me']] == 1 and dm[i,nti['hw']] == 0):
        P_ta_me_hw[1][1][0] += 1
    if (dm[i,nti['ta']] == 1 and dm[i,nti['me']] == 1 and dm[i,nti['hw']] == 1):
        P_ta_me_hw[1][1][1] += 1

# Normalise every conditional distribution so that it sums to 1 over axis 0
# (the child variable) for each combination of parent states...
for P in [P_pw_he_fp, P_cb_pw_fc, P_gw_cb_wr_dp, P_hw_gw_fh, P_li_gw_gs,
          P_ls_he, P_vp_pw, P_bs_fp_fc, P_lo_cb, P_hp_cb_dp, P_me_gw_gs, P_ta_me_hw]:
    P /= P.sum(axis=0, keepdims=True)
        
probs = dict()
probs[nti['he']] = P_he
probs[nti['fp']] = P_fp
probs[nti['fc']] = P_fc 
probs[nti['wr']] = P_wr  
probs[nti['dp']] = P_dp
probs[nti['fh']] = P_fh
probs[nti['gs']] = P_gs  

probs[nti['pw']] = P_pw_he_fp 
probs[nti['cb']] = P_cb_pw_fc  
probs[nti['gw']] = P_gw_cb_wr_dp
probs[nti['hw']] = P_hw_gw_fh 
probs[nti['li']] = P_li_gw_gs

probs[nti['ls']] = P_ls_he
probs[nti['vp']] = P_vp_pw
probs[nti['bs']] = P_bs_fp_fc
probs[nti['lo']] = P_lo_cb
probs[nti['hp']] = P_hp_cb_dp 
probs[nti['me']] = P_me_gw_gs
probs[nti['ta']] = P_ta_me_hw 

# print(P_hw_gw_fh)
        
# print(P_he,P_fp,P_fc, P_wr,P_dp,P_fh,P_gs,P_pw_he_fp,P_cb_pw_fc,P_gw_cb_wr_dp,P_hw_gw_fh,
#       P_li_gw_gs,P_ls_he,P_vp_pw,P_bs_fp_fc,P_lo_cb,P_hp_cb_dp,P_me_gw_gs,P_ta_me_hw)

print(probs)




{0: array([ 0.01021194,  0.98978806]), 1: array([ 0.99592972,  0.00407028]), 2: array([ 0.98017883,  0.01982117]), 3: array([ 0.10062027,  0.89937973]), 4: array([ 0.94974518,  0.05025482]), 5: array([ 0.99489212,  0.00510788]), 6: array([ 0.05112839,  0.94887161]), 7: array([[[ 1.,  1.],
        [ 0.,  1.]],

       [[ 0.,  0.],
        [ 1.,  0.]]]), 8: array([[[ 1.,  1.],
        [ 0.,  1.]],

       [[ 0.,  0.],
        [ 1.,  0.]]]), 9: array([[[[ 1.,  1.],
         [ 1.,  1.]],

        [[ 1.,  1.],
         [ 0.,  1.]]],


       [[[ 0.,  0.],
         [ 0.,  0.]],

        [[ 0.,  0.],
         [ 1.,  0.]]]]), 10: array([[[ 1.,  1.],
        [ 0.,  1.]],

       [[ 0.,  0.],
        [ 1.,  0.]]]), 11: array([[[ 1.        ,  1.        ],
        [ 0.        ,  0.94989847]],

       [[ 0.        ,  0.        ],
        [ 1.        ,  0.05010153]]]), 12: array([[ 1.        ,  0.09855589],
       [ 0.        ,  0.90144411]]), 13: array([[ 1.        ,  0.00979432],
       [ 0.      

### Brute Force
The code below does exactly what you want to implement for step 2, but it brute forces it. It is slow and memory inefficient, of course, but it lets you test your implementation. It also means that if you can't solve question 2 you can still use this to have a go at question 3.

In [4]:
def brute_marginals(known):
    """known is a dictionary, where a random variable index existing as a key in the dictionary
    indicates it has been observed. The value obtained using the key is the value the
    random variable has been observed as. Returns a 19x2 matrix, such that [rv,0] is the
    probability of random variable rv being False, [rv, 1] the probability of being True."""
    
    # Calculate the full joint (please don't ask)...
    dims = 'abcdefghijklmnopqrs'
    joint = numpy.einsum('a,b,c,d,e,f,g,hab,ihc,jide,kjf,ljg,ma,nh,obc,pi,qie,rjg,srk->' + dims,
                         P_he, P_fp, P_fc, P_wr, P_dp, P_fh, P_gs,
                         P_pw_he_fp, P_cb_pw_fc, P_gw_cb_wr_dp, P_hw_gw_fh, P_li_gw_gs,
                         P_ls_he, P_vp_pw, P_bs_fp_fc, P_lo_cb, P_hp_cb_dp, P_me_gw_gs, P_ta_me_hw)
    
    # Multiply in the known states (zero out the dimensions we know it's not)...
    for key, value in known.items():
        other = abs(value-1)
        index = [slice(None)] * len(joint.shape)
        index[key] = other
        joint[tuple(index)] = 0.0
    
    # Calculate and return all marginals...
    ret = numpy.empty((len(joint.shape), 2))
    for row in range(ret.shape[0]):
        ret[row,:] = numpy.einsum(dims + '->' + dims[row], joint)
    
    ret /= ret.sum(axis=1)[:,None]
    return ret



## 2. Belief Propagation

Your task is to write a function that takes known states and calculates the marginal probability for every state of the graphical model. This will require using the _sum-product algorithm_ and passing all messages on the graph.

Here is the Wikipedia page for reference: https://en.wikipedia.org/wiki/Belief_propagation (not the greatest, but not horrible)

Hints:
 * It is strongly suggested to use the same calling convention as `brute_marginals` above for your final interface.
 * The order of the variables above is such that each variable is dependent only on variables that occur before it. Depending on your approach this information may save you some effort.
 * You will want to add code before the function to prepare. This might involve creating Factor and Node classes. Or a list of edges in the graphical model, in the order they need to be processed. Choose whatever you are comfortable with, but spend time thinking it through first. If you get it wrong your code will get messy!
 * This problem is small enough that you can brute force it - code to do so has been provided above. Do test that your implementation is correct by comparing them!

__(15 marks)__
 * 5 marks for creating sensible data structures.
 * 7 marks for sending the messages correctly.
 * 2 marks for correctly calculating the marginals at the end.
 * 1 mark for testing.
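
To make the message-passing idea concrete, here is a minimal sketch of sum-product on a hypothetical three-variable tree (`a -> b` and `a -> c`), checked against a brute-force joint. The network, its probabilities and the function names (`evidence`, `marginal_b`, `brute_b`) are all assumptions made for illustration; they are not part of the coffee machine model.

```python
import numpy

# Hypothetical toy network: a -> b and a -> c, all binary.
P_a = numpy.array([0.9, 0.1])
P_b_a = numpy.array([[0.8, 0.3],   # P(b=0|a=0), P(b=0|a=1)
                     [0.2, 0.7]])  # P(b=1|a=0), P(b=1|a=1)
P_c_a = numpy.array([[0.6, 0.1],   # P(c=0|a=0), P(c=0|a=1)
                     [0.4, 0.9]])  # P(c=1|a=0), P(c=1|a=1)

def evidence(value):
    """Message sent by a leaf variable: all ones if unobserved (value=None),
    otherwise zero for the state it was not observed in."""
    msg = numpy.ones(2)
    if value is not None:
        msg[1 - value] = 0.0
    return msg

def marginal_b(c_obs=None):
    """P(b | c=c_obs) by passing messages from the leaves towards b."""
    msg_c_to_fc = evidence(c_obs)            # variable c -> factor P(c|a)
    msg_fc_to_a = msg_c_to_fc @ P_c_a        # sum_c P(c|a) m(c)
    msg_fa_to_a = P_a                        # prior factor P(a) -> a
    msg_a_to_fb = msg_fa_to_a * msg_fc_to_a  # product of a's other messages
    msg_fb_to_b = P_b_a @ msg_a_to_fb        # sum_a P(b|a) m(a)
    return msg_fb_to_b / msg_fb_to_b.sum()

def brute_b(c_obs=None):
    """Same marginal computed from the full joint, for comparison."""
    joint = numpy.einsum('a,ba,ca->abc', P_a, P_b_a, P_c_a)
    if c_obs is not None:
        joint[:, :, 1 - c_obs] = 0.0
    ret = joint.sum(axis=(0, 2))
    return ret / ret.sum()

for obs in (None, 0, 1):
    print(obs, marginal_b(obs), brute_b(obs))
```

The same pattern (a variable multiplies its other incoming messages, a factor multiplies them into its table and sums out the other variables) carries over to the full graph; only the bookkeeping grows.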

In [5]:



# **************************************************************** 15 marks
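# Adapted from the code provided with the Gaussian example.
class Node:
    """A random variable node of the factor graph (binary state)."""

    def __init__(self, name, links, factors):
        # links is unused; it is kept so the constructor signature is unchanged.
        self.name = name
        self.factors = factors  # Names of the adjacent factors.
        self.msg_in = {f: None for f in factors}   # factor -> node messages
        self.msg_out = {f: None for f in factors}  # node -> factor messages

    def send(self, factor):
        """Stores the message for the named factor: the product of the
        messages received from every other adjacent factor."""
        msg = numpy.ones(2)
        for f in self.factors:
            if f == factor or self.msg_in[f] is None:
                continue  # Skip the recipient and anything not yet received.
            msg = msg * self.msg_in[f]
        self.msg_out[factor] = msg

    def calc_belief(self):
        """Returns the marginal: the normalised product of all incoming messages."""
        ret = numpy.ones(2)
        for m in self.msg_in.values():
            ret = ret * m
        return ret / ret.sum()


class Factor:
    """A factor holding a probability table; axis i of dist belongs to nodes[i]."""

    def __init__(self, name, dist, nodes):
        self.name = name
        self.dist = dist
        self.numlinks = len(nodes)
        self.nodes = nodes
        self.node_sti = dict()  # Node name -> index that broadcasts its message.

        for i, rv in enumerate(nodes):
            index = [None] * self.numlinks
            index[i] = slice(None)
            self.node_sti[rv.name] = tuple(index)

    def send(self, node):
        """Sends a message to the named node: multiply the table by the messages
        from all other linked nodes, then sum those nodes out."""
        prod = self.dist.copy()  # Never modify the stored distribution.
        target = None

        for axis, rv in enumerate(self.nodes):
            if rv.name == node:
                target = axis
                continue
            # Line the other node's message up with its own axis of the table.
            prod = prod * rv.msg_out[self.name][self.node_sti[rv.name]]

        if target is None:
            raise ValueError('{} is not linked to factor {}'.format(node, self.name))

        other_axes = tuple(a for a in range(self.numlinks) if a != target)
        if other_axes:
            prod = prod.sum(axis=other_axes)

        self.nodes[target].msg_in[self.name] = prod / prod.sum()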


## 3. Bayesian Decision Theory

Bayesian decision theory sounds fancy, but is actually really simple. All you do is use Bayesian reasoning and a cost function to select the decision that, with everything integrated out, minimises the expected cost. In practice various approximations are usually made; here, for instance, we are using the mean model parameters instead of properly integrating them out, which is reasonable for a model such as this. Note that it isn't actually Bayesian unless you introduced a prior above!

Your task is to generate a flow chart indicating how a mechanic is to diagnose a problem with a coffee machine.
Here is an example: https://en.wikipedia.org/wiki/Flowchart#/media/File:LampFlowchart.svg Output it below the code block; don't worry about making it pretty but if your output is particularly hard to read feel free to copy & paste it into a markdown block and neaten it up.

Consider a machine to be diagnosed when confidence that one of the problems is in the broken state exceeds 90%. You can take multiple approaches to solving this problem, but the recommended technique is:
 * Write a function to detect if a coffee machine is reliably diagnosed, given a dictionary of observed variables.
 * Write a recursive search function that takes a dictionary of observed variables. It will return a tree that includes the diagnostic variable to observe at each node and how long on average it takes to diagnose from that point in the tree.
 * Call the recursive search function with no observed variables and print out the tree it returns.
 
Note that this is very similar to constructing a random forest, but with a different objective, and instead of looping over splits you loop over unobserved diagnostic random variables. Remember to weight the machines by the probability of their defect occurring, which you learnt in step 1. Make sure you handle the case where all observations have been made and no defect has been found. Don't delete these scenarios, as they do occur in real life; instead make sure that if everything is observed the defect is considered to have been found, even though the defect is probably with the user.
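
The recommended technique can be sketched on a toy problem. Everything below is an assumption made for illustration: the model has exactly one fault present and two independent tests (far simpler than the coffee machine graph), all probabilities and times are made up, and the names `posterior`, `diagnosed` and `search` are hypothetical.

```python
import numpy

# Toy model (made-up numbers): exactly one of three faults is present.
prior = numpy.array([0.5, 0.3, 0.2])                     # P(fault)
# p_test[t][f] = P(test t reads 1 | fault f)
p_test = {'smell': numpy.array([0.95, 0.05, 0.05]),
          'light': numpy.array([0.05, 0.95, 0.05])}
test_time = {'smell': 3.0, 'light': 20.0}                # seconds per test

def posterior(known):
    """P(fault | known) by Bayes' rule; known maps test name -> 0/1."""
    p = prior.copy()
    for t, v in known.items():
        p *= p_test[t] if v == 1 else 1.0 - p_test[t]
    return p / p.sum()

def diagnosed(known, conf=0.9):
    """True if some fault exceeds conf, or if nothing is left to observe."""
    return posterior(known).max() > conf or len(known) == len(p_test)

def search(known):
    """Return the subtree with the lowest expected remaining time to diagnose."""
    if diagnosed(known):
        return {'diag': True, 'fault': int(posterior(known).argmax()), 'time': 0.0}
    post = posterior(known)
    best = None
    for t in p_test:
        if t in known:
            continue
        p_one = float(post @ p_test[t])     # P(test t reads 1 | known)
        yes = search({**known, t: 1})       # a fresh dict for each branch
        no = search({**known, t: 0})
        # Weight each outcome by how likely it is, then add the test's own time.
        expected = test_time[t] + p_one * yes['time'] + (1.0 - p_one) * no['time']
        if best is None or expected < best['time']:
            best = {'diag': False, 'test': t, 'yes': yes, 'no': no, 'time': expected}
    return best

tree = search({})
print(tree['test'], tree['time'])
```

Each branch receives its own copy of the observations, so exploring one outcome cannot leak evidence into the other. In the lab itself the posterior would come from the marginals of the graphical model rather than from this toy Bayes' rule.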

__(10 marks)__
 * 2 marks for a function to detect if a machine is diagnosed.
 * 8 marks for the rest, whichever approach you choose to take.

In [132]:
# How long each diagnostic takes (in seconds)...
diag_time = dict()
diag_time[nti['ls']] = 20.0
diag_time[nti['vp']] = 240.0
diag_time[nti['bs']] = 3.0
diag_time[nti['lo']] = 3.0
diag_time[nti['hp']] = 10.0
diag_time[nti['me']] = 60.0
diag_time[nti['ta']] = 120.0



# **************************************************************** 10 marks

def rel_diagnosed(known, conf):
    # Given the known dict, calculate marginals and check whether they are over a confidence threshold.
    #   If they are then return True (reliably diagnosed), otherwise return False.
    #   A machine with all seven diagnostics observed also counts as diagnosed.

    marginals = brute_marginals(known)

    # (fault variable, state checked against the threshold) for each possible fault.
    broken_state = [('he', 0), ('fp', 1), ('fc', 1), ('wr', 0),
                    ('dp', 1), ('fh', 1), ('gs', 0)]

    for name, state in broken_state:
        if marginals[nti[name], state] > conf:
            return True

    # Indices 12 and above are counted as diagnostic observations.
    diagnostics_obs = sum(1 for k in known if k >= 12)

    return diagnostics_obs == 7


# known = dict()
# known[12] = 1
# rel_diagnosed(known,0.9)

def print_flow(flow, ctr = 0):
    # Prints the flow chart as a chain of yes/no questions.
    #   Returns (check, time) for a diagnosed leaf, None otherwise.
    if ctr == 0:
        print('* t.t.d stands for Time To Diagnose')
        print()
        print('Espresso machine is out of order')
        print()
        print('                |               ')
        print('                |               ')
        print('                |               ')
        print()

    if flow['diag']: # if machine was diagnosed
        return flow['check'], flow['time']

    ctr += 1
    # This assumes the 'yes' branch is a diagnosed leaf; a non-leaf call returns None.
    check, time = print_flow(flow['yes'], ctr)
    print(flow['what'],'----Yes----', check, '| t.t.d = ',time,'s')
    print()
    print('                |               ')
    print('                | No               ')
    print('                |               ')
    print()
    print_flow(flow['no'], ctr)
    if ctr == 7:
        print('     Problem considered resolved')

def build_flow(known, where, diag_time2, total_time = 0):
    # Recursively builds the flow chart; `where` is the diagnostic observed last.
    # Note: `known` and `diag_time2` are mutated in place and shared by both recursive
    #   calls, so the 'no' subtree is explored first and its observations are still in
    #   `known` when the 'yes' branch is evaluated.

    # Advice attached to a diagnosis, keyed by the diagnostic observed last.
    checks = {nti['ls'] : 'No main electricity, ',
              nti['vp'] : 'Check power supply unit',
              nti['bs'] : 'Fried power supply unit or fried circuit board',
              nti['lo'] : 'Check circuit board',
              nti['hp'] : 'Check circuit board or there is a dead pump',
              nti['me'] : 'Check group head gasket.',
              nti['ta'] : 'Problem is with the user'}

    # Question asked for each diagnostic, and the value recorded for a 'yes' answer.
    questions = {nti['ls'] : ('Unable to switch on room lights?', 0),
                 nti['vp'] : ('Unable to measure voltage across power supply unit?', 1),
                 nti['bs'] : ('Is there a burning smell?', 1),
                 nti['lo'] : ('Power light switched off?', 1),
                 nti['hp'] : ('Unable to hear pump?', 0),
                 nti['me'] : ('Unable to produce an espresso?', 1),
                 nti['ta'] : ('Makes a tasty espresso?', 1)}

    if rel_diagnosed(known, 0.9):
        return {'diag' : True, 'check' : checks[where], 'time' : total_time}

    # Greedy choice: observe the cheapest diagnostic that has not been used yet.
    min_time_key = min(diag_time2, key=diag_time2.get)
    total_time += diag_time2[min_time_key]
    diag_time2[min_time_key] = numpy.inf

    defect = [key for key, value in nti.items() if value == min_time_key][0]
    poss_defect, yes_value = questions[min_time_key]

    known[min_time_key] = 1 - yes_value
    No = build_flow(known, min_time_key, diag_time2, total_time)

    known[min_time_key] = yes_value
    Yes = build_flow(known, min_time_key, diag_time2, total_time)

    return { 'diag' : False,
             'defect': defect,
             'what' : poss_defect,
             'no' : No,
             'yes' : Yes,
             'time' : total_time}
 

        



# print(min(3.0, numpy.inf))
diag_time2 = diag_time.copy()
ret = build_flow({},None,diag_time2)
# print(ret)
print_flow(ret)




* t.t.d stands for Time To Diagnose

Espresso machine is out of order

                |               
                |               
                |               

Is there a burning smell? ----Yes---- Fried power supply unit or fried circuit board | t.t.d =  3.0 s

                |               
                | No               
                |               

Power light switched off? ----Yes---- Check circuit board | t.t.d =  6.0 s

                |               
                | No               
                |               

Unable to hear pump? ----Yes---- Check circuit board or there is a dead pump | t.t.d =  16.0 s

                |               
                | No               
                |               

Unable to switch on room lights? ----Yes---- No main electricity,  | t.t.d =  36.0 s

                |               
                | No               
                |               

Unable to produce an espresso? ----Yes---- Check group h