<div class="alert alert-danger">
    Read the following instructions carefully!
</div>

# Probability, Bayes' Theorem, (Conditional) Independence
## Problem Set 1
## Probabilistic Models UE

---
In the first assignment, you will familiarise yourself with matrix computations in NumPy. You must use operations on NumPy arrays, even if it would be possible to solve the exercises with simple multiplications, divisions, and loops. This will ensure that you get a feel for how matrix operations and broadcasting work. If you are not familiar with these concepts, look at the interactive introduction to Python and the honey badger example.

**Hint:** You can still compute the correct results on paper and compare them with the solution produced by your Python code!


Before you start with this problem:
- Study the corresponding slide deck(s) and consider re-watching the lecture recording(s).
- Internalize the material until you feel confident you can work with it or implement it yourself. Only then start working on this problem; otherwise, you will waste a lot of time.

---


<div class="alert alert-warning">

**Due-Date:** see Moodle
   
**Constraints**: Operations on NumPy arrays only.
  
**Automatic Grading:** 

- Replace the placeholders `# YOUR CODE HERE` `raise NotImplementedError()` / `YOUR ANSWER HERE` with your code / answers.
- Put results in the corresponding variable; otherwise, we will not grade your solution (i.e., we assign 0 points).
- Do not delete or add cells.
    
**Submission:** As a ZIP-package via Moodle; the ZIP-package must have the following structure:
    
    + <student ID, (k/ vk + 8 digits), e.g. k01234567>.zip
    |
    |-- Problem_1.ipynb
    |-- ...
    |-- Problem_<# of problems>.ipynb
    +
    
**Questions?** Post them in the Problem Set Forum!
</div>



In [1]:
import numpy as np
from helpers import print_table

# 1. Inference-by-Enumeration (8 points)

The Inference-by-Enumeration algorithm computes the answer to a probabilistic query $P(\mathbf{X} \mid \mathbf{E})$ exactly from the full joint distribution table (FJDT).
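
Written out, with $\mathbf{Z}$ denoting all variables that are neither query nor evidence variables, the query is answered by summing out $\mathbf{Z}$ and normalizing. The expression below is the standard definition of conditional probability applied to the FJDT; the notation on the lecture slides may differ slightly.

\begin{align*}
P(\mathbf{X} \mid \mathbf{E}) = \frac{P(\mathbf{X}, \mathbf{E})}{P(\mathbf{E})} = \frac{\sum_{\mathbf{z}} P(\mathbf{X}, \mathbf{E}, \mathbf{z})}{\sum_{\mathbf{x}} \sum_{\mathbf{z}} P(\mathbf{x}, \mathbf{E}, \mathbf{z})}
\end{align*}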

---
### 1.1. Implementation


<div class="alert alert-warning">
Implement the Inference-by-Enumeration algorithm. (2 points)
</div>

Implement the `inference_by_enumeration` function for a generic probabilistic query of the form $P(\mathbf{X} \mid \mathbf{E})$. Note that this version of the Inference-by-Enumeration algorithm computes the probabilistic query for all possible assignments to the evidence variables, not only for one specific assignment (cf. slide deck: Probabilistic Models - Part 2: Fundamental Concepts and Notation, p. 40). The function must return one object:
- The answer to the probabilistic query, which is a `np.ndarray` with the same number of dimensions and the same variable order as the FJDT, but not the same size: The dimensions of non-query and non-evidence variables ($\mathbf{Z}$) must be converted to singleton dimensions, i.e., dimensions of size one.

For example, if we have a full joint distribution table of three binary variables (shape $2\times2\times2$) and we ask for the distributions of the first variable given the second variable, the result would be of shape $2\times2\times1$ (corresponding to two stacked conditional distribution tables).

**Hint:** Remember to solve this without a `for` loop. Set the `keepdims` parameter of NumPy's <a href="https://numpy.org/doc/stable/reference/generated/numpy.sum.html">sum</a> method to `True` so that the reduced dimensions are kept as singleton dimensions instead of being discarded. Keeping these singleton dimensions makes the subsequent <a href="https://numpy.org/doc/stable/user/basics.broadcasting.html">broadcasting operations</a> straightforward.
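
To illustrate the hint, here is a minimal sketch of how `keepdims=True` interacts with broadcasting. The array `T` and its values are made up for illustration and are not part of the assignment.

```python
import numpy as np

# Hypothetical table over three binary variables (arbitrary values, normalised to sum to 1).
T = np.arange(1, 9, dtype=float).reshape(2, 2, 2)
T /= T.sum()

# Summing out the last axis with keepdims=True leaves a singleton dimension behind.
marginal = np.sum(T, axis=(2,), keepdims=True)
print(marginal.shape)  # (2, 2, 1)

# Without keepdims, the summed axis disappears and the axis numbering shifts.
dropped = np.sum(T, axis=2)
print(dropped.shape)  # (2, 2)

# Because the axes still line up, a division broadcasts without any reshaping.
normaliser = np.sum(marginal, axis=(0,), keepdims=True)  # shape (1, 2, 1)
normalised = marginal / normaliser                        # shape (2, 2, 1)
```

After the division, every slice of `normalised` along axis 0 sums to one, which is the property a conditional distribution table needs.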

In [27]:
def inference_by_enumeration(
    FJDT: np.ndarray, 
    query_variable_indices: tuple, 
    evidence_variable_indices: tuple=tuple()
) -> np.ndarray:
    '''
    Computes the answer to a probabilistic query exactly from the full joint distribution table.
    :param FJDT: The full joint distribution table as a np.ndarray.
    :param query_variable_indices: A tuple containing the indices of the query variables in the FJDT.
    :param evidence_variable_indices: A tuple containing the indices of the evidence variables in the FJDT.
    :returns: The answer to the probabilistic query; a `np.ndarray`.
    ''' 
    assert type(FJDT) == np.ndarray, "FJDT must be a np.ndarray"
    assert type(query_variable_indices) == tuple, "query_variable_indices must be a tuple"
    assert type(evidence_variable_indices) == tuple, "evidence_variable_indices must be a tuple"
        
    # Lecture 2, slide 40:
    
    # 1. Joint P(X, E): sum out the hidden variables, i.e., all axes that
    #    belong neither to the query nor to the evidence.
    set_merged = set(query_variable_indices + evidence_variable_indices)
    set_full = set(range(FJDT.ndim))
    hidden_variable_indices = tuple(set_full - set_merged)
    
    PXAndy = np.sum(FJDT, axis=hidden_variable_indices, keepdims=True) # summed-out axes stay as singleton dimensions

    # 2. Normalization constant P(E): sum the joint over the query axes.
    Py = np.sum(PXAndy, axis=query_variable_indices, keepdims=True) # evidence axes keep their full size
    
    # 3. P(X | E): broadcasting divides each slice by its normalization constant.
    PXgiveny = PXAndy/Py

    return PXgiveny

In [28]:
# create a full joint distribution table for three binary variables
ABC = np.ones((2,2,2)) / 2**3  # uniform: 8 atomic events with probability 1/8 each
# name the variable indices so we can refer to them more easily
A, B, C = 0, 1, 2

# check type & shape of result
assert type(inference_by_enumeration(ABC, (B, C), ())) == np.ndarray
# compute P(A)
assert inference_by_enumeration(ABC, (A,), ()).shape == (2, 1, 1)
# compute P(BC)
assert inference_by_enumeration(ABC, (B, C), ()).shape == (1, 2, 2)
# compute P(BC|A)
assert inference_by_enumeration(ABC, (B, C), (A,)).shape == (2, 2, 2)
# compute P(B|AC)
assert inference_by_enumeration(ABC, (B,), (C,A,)).shape == (2, 2, 2)


---
### 1.2. Example: Computing Probabilities from a Full Joint Distribution Table

<br>
<center><img src="https://upload.wikimedia.org/wikipedia/commons/b/b9/Atlantic_blue_marlin.jpg" width="500" height="600">
<br>

Based on his experience, Santiago, an old Cuban fisherman, has learned that temperature and precipitation are the most prominent factors influencing marlin fishing. After decades of (more or less) successful years, he decides to retire and pass on his knowledge to a young apprentice. Since the apprentice received excellent grades in her probabilistic models class, he creates the following full joint distribution table $P(C, R, H)$ and hands it over to her:


<table style="border-collapse:collapse;border-spacing:0;width:500px"><tr><th style="font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center" rowspan="2">$P({C}, {R}, {H})$</th><th style="font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top" colspan="2">$\neg r$<br></th><th style="font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top" colspan="2">$r$</th></tr><tr><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">$\neg h$</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">$h$</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">$\neg h$</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">$h$</td></tr><tr><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center">$\neg c$<br></td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.21</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 
5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.31</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.35</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.07<br></td></tr><tr><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center">$c$</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.04</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.01</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.004</td><td style="font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;text-align:center;vertical-align:top">0.006</td></tr></table>

In this table, $C$, $R$, and $H$ are the binary random variables encoding catch, rain, and hot, respectively. 
    
    
**Hint**: You can use `print_table` to print your probability distribution tables in a similar fashion.

In [29]:
help(print_table)

Help on function print_table in module helpers:

print_table(probability_table: numpy.ndarray, variable_names: str) -> None
    Prints a probability distribution table.
    
    Parameters
    ----------
    probability_table : np.ndarray
        The probability distribution table
    variable_names : str
        A string containing the variable names, e.g., 'CDE'.
    
    Returns
    -------
    None



<div class="alert alert-warning">
Create a NumPy array that contains the full joint distribution table $P(C, R, H)$ as defined above. <b>Important</b>: Encode $C$, $R$, and $H$ in the first, second, and third dimension of the NumPy array, respectively. Use index 0 for event *False* and index 1 for event *True*. (1 point)
</div>

In [49]:
CRH = np.asarray([[[0.21, 0.31],[0.35, 0.07]],[[0.04, 0.01],[0.004, 0.006]]])
# axis 0 indexes C (the two outer blocks), axis 1 indexes R (rows within a block), axis 2 indexes H (columns within a block)

# Check the result with print_table(CRH, 'CRH')
print_table(CRH, 'CRH')

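<table>
<tr><th></th><th>$r_0$</th><th>$r_0$</th><th>$r_1$</th><th>$r_1$</th></tr>
<tr><th></th><th>$h_0$</th><th>$h_1$</th><th>$h_0$</th><th>$h_1$</th></tr>
<tr><th>$c_0$</th><td>0.210</td><td>0.310</td><td>0.350</td><td>0.070</td></tr>
<tr><th>$c_1$</th><td>0.040</td><td>0.010</td><td>0.004</td><td>0.006</td></tr>
</table>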


In [50]:
assert CRH is not None, 'Store the result into the variable \'CRH\'!'
assert CRH.shape == (2,2,2), 'The full joint distribution table must have shape (2,2,2)'
assert np.isclose(CRH.sum(), 1), 'The probabilities of all atomic events must sum to one.'
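
The index convention and the sum check can be sketched as follows. The variable names `table` and `sums_to_one` are chosen for this illustration only; the values are copied from the table above. Since the entries are floating-point numbers, the sketch compares the sum to one with a tolerance rather than relying on exact equality.

```python
import numpy as np

# Axis order is (C, R, H); index 0 means False, index 1 means True.
table = np.asarray([[[0.21, 0.31], [0.35, 0.07]],
                    [[0.04, 0.01], [0.004, 0.006]]])

# P(c, not r, h): C=True -> 1, R=False -> 0, H=True -> 1
print(table[1, 0, 1])  # 0.01

# Tolerance-based check that the atomic events sum to one.
sums_to_one = bool(np.isclose(table.sum(), 1.0))
print(sums_to_one)
```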


---
### 1.3. Probabilistic Queries


Compute the following two probabilistic queries. For each query, there are three tasks:
1. Write down the *probabilistic query* and the *expression to compute the answer from the full joint distribution* (you do not need to do the actual computation). Keep your answer short and use $\LaTeX$ and Markdown.
2. Give the *shape of the result of the probabilistic query* (without singleton dimensions) and the *number of non-redundant entries* in the result, storing your answer into the provided variables. Example:
 - the full joint distribution table of the previous example has three dimensions with two entries each; thus its shape is (2,2,2)
 - the full joint distribution table has $2 \cdot 2 \cdot 2$ entries; however, one of them is redundant because all entries must sum to one; thus the number of non-redundant entries is $2 \cdot 2 \cdot 2 - 1$.
3. Check your answer with the `inference_by_enumeration` function and store the result into the provided variable. **If necessary, select the result for the given evidence and remove all singleton dimensions.**
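
The counting argument from the example can be sketched in code. The helper `count_non_redundant` is hypothetical (it is not part of `helpers` or the assignment) and assumes the table describes a single distribution whose entries sum to one.

```python
import numpy as np

def count_non_redundant(shape: tuple) -> int:
    """Hypothetical helper: number of free entries of one distribution table.

    One entry is always determined by the others because all entries sum to one.
    """
    return int(np.prod(shape, dtype=int)) - 1

print(count_non_redundant((2, 2, 2)))  # 7, as for the FJDT
print(count_non_redundant((2,)))       # 1, as for a distribution over one binary variable
```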

<div class="alert alert-warning">
Compute the probability distribution over catching a marlin. (2 points)
</div>

What is the probability distribution over catching a marlin when the states of the other random variables are unknown?

\begin{align*}
P(C) = \sum_{x \in \{r, \neg r \}} \sum_{y \in \{h, \neg h \}} P(C, R=x, H=y)
\end{align*}
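
As a cross-check that does not go through `inference_by_enumeration`, the double sum can be evaluated directly by summing over the $R$ and $H$ axes. This is a sketch; the names `table` and `P_C` are chosen for this illustration, and the values are copied from the table in Section 1.2.

```python
import numpy as np

# Axis order is (C, R, H); index 0 means False, index 1 means True.
table = np.asarray([[[0.21, 0.31], [0.35, 0.07]],
                    [[0.04, 0.01], [0.004, 0.006]]])

# Sum out R (axis 1) and H (axis 2); only the C axis remains.
P_C = table.sum(axis=(1, 2))
print(P_C)  # approximately [0.94 0.06]
```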

In [58]:
probability_table_shape = (2,) # only the C dimension remains; e.g., (2,2,2) for the FJDT, (2,) for a vector, () for a scalar
number_non_redundant_elements = 2 - 1 # e.g., 2*2*2 - 1 for the FJDT 
C = inference_by_enumeration(CRH, (0,), ()).squeeze()
# Use inference_by_enumeration to compute the result. Select the result for the given evidence (if any) and discard singleton dimensions.


In [56]:
assert type(probability_table_shape) is tuple, 'Shape of the result must be a tuple.'
assert type(number_non_redundant_elements) is int, 'Number of elements must be int.'
assert C is not None, 'Store the result into the variable \'C\'!'


<div class="alert alert-warning">
Compute the probability distribution over catching a marlin given that the weather is <b>not</b> rainy. (2 points)
</div>

What is the probability distribution over catching a marlin given that the random variable $R$ takes the value *not rainy* ($\neg r$)?

\begin{align*}
P(C, \neg r) &= \sum_{y \in \{h, \neg h \}} P(C, \neg r, H=y)\\
P(\neg r) &= \sum_{z \in \{c, \neg c \}} \sum_{y \in \{h, \neg h \}} P(C=z, \neg r, H=y)\\
P(C \mid \neg r) &= \frac{P(C,\neg r)}{P(\neg r)} 
\end{align*}
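
The three steps can be traced by hand with plain array slicing. This is a sketch for checking the result, not the required solution; the names `table`, `not_r`, and `P_C_given_not_r` are chosen for this illustration, and the values are copied from the table in Section 1.2.

```python
import numpy as np

# Axis order is (C, R, H); index 0 means False, index 1 means True.
table = np.asarray([[[0.21, 0.31], [0.35, 0.07]],
                    [[0.04, 0.01], [0.004, 0.006]]])

not_r = 0  # index of the event "not rainy" on the R axis

# Step 1: P(C, not r), obtained by fixing R and summing out H.
P_C_and_not_r = table[:, not_r, :].sum(axis=1)  # shape (2,)

# Step 2: P(not r), obtained by additionally summing out C.
P_not_r = P_C_and_not_r.sum()

# Step 3: P(C | not r).
P_C_given_not_r = P_C_and_not_r / P_not_r
print(P_C_given_not_r)  # approximately [0.912 0.088]
```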

In [79]:
probability_table_shape = (2,) # e.g., (2,2,2) for the FJDT, (2,) for a vector, () for a scalar
number_non_redundant_elements = 2 -1  # e.g., 2*2*2 - 1 for the FJDT
C_not_r = inference_by_enumeration(CRH, (0,), (1,))[:,0].squeeze()
# Use inference_by_enumeration to compute the result. Select the result for the given evidence (if any) and discard singleton dimensions.


In [80]:
assert type(probability_table_shape) is tuple, 'Shape of the result must be a tuple.'
assert type(number_non_redundant_elements) is int, 'Number of elements must be int.'
assert C_not_r is not None, 'Store the result into the variable \'C_not_r\'!'


---
### 1.4. Independence

<div class="alert alert-warning">
Prove that catching the marlin is not independent of the weather being rainy (i.e., $C \not\perp R$) by showing that the joint distribution of the variables is not equal to the product of the marginals. Check your result with NumPy. (1 point)
</div>



**Hint**: It is sufficient to print the joint distribution and the product of the marginal distributions (use `print_table`).
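
Beyond comparing the printed tables by eye, the comparison can also be made numerically. The following sketch uses `np.outer` and `np.allclose`, which is one possible way to do this and not necessarily the intended one; the variable names are chosen for this illustration, and the values are copied from the table in Section 1.2.

```python
import numpy as np

# Axis order is (C, R, H); index 0 means False, index 1 means True.
table = np.asarray([[[0.21, 0.31], [0.35, 0.07]],
                    [[0.04, 0.01], [0.004, 0.006]]])

P_CR = table.sum(axis=2)  # joint P(C, R), shape (2, 2)
P_C = P_CR.sum(axis=1)    # marginal P(C)
P_R = P_CR.sum(axis=0)    # marginal P(R)

# Entry [i, j] of the outer product is P(C=i) * P(R=j).
product = np.outer(P_C, P_R)

# C and R would be independent only if both tables agreed everywhere.
independent = bool(np.allclose(P_CR, product))
print(independent)  # False
```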

In [102]:
CR = inference_by_enumeration(CRH, (0,1), ()).squeeze() # store the joint distribution into this variable

# Marginals with their singleton dimensions kept:
# P(C) has shape (2,1,1) and P(R) has shape (1,2,1).
PC = inference_by_enumeration(CRH, (0,), ())
PR = inference_by_enumeration(CRH, (1,), ())

# Broadcasting the two shapes yields every pairwise product P(c) * P(r) as a (2,2,1) array;
# squeeze() then removes the singleton H dimension.
C_times_R = (PC * PR).squeeze() # store the product of the marginal distributions into this variable

print('Joint Probability:')
print_table(CR, 'CR')

print('Product of Marginals:')
print_table(C_times_R, 'CR')

Joint Probability:


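<table>
<tr><th></th><th>$r_0$</th><th>$r_1$</th></tr>
<tr><th>$c_0$</th><td>0.520</td><td>0.420</td></tr>
<tr><th>$c_1$</th><td>0.050</td><td>0.010</td></tr>
</table>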


Product of Marginals:


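<table>
<tr><th></th><th>$r_0$</th><th>$r_1$</th></tr>
<tr><th>$c_0$</th><td>0.536</td><td>0.404</td></tr>
<tr><th>$c_1$</th><td>0.034</td><td>0.026</td></tr>
</table>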


In [103]:
assert type(CR) == np.ndarray, 'Results must be NumPy arrays.'
assert type(C_times_R) == np.ndarray, 'Results must be NumPy arrays.'

assert CR.shape == (2,2), 'Results must be 2x2 arrays.'
assert C_times_R.shape == (2,2), 'Results must be 2x2 arrays.'