In [1]:
import numpy as np
from helpers import print_table

# 2. Pairwise vs. Mutual Independence (6 points)

**Definition**: We say that two random variables $X_n$ and $X_m$ are *pairwise independent* if $$p(X_n \mid X_m) = p(X_n)$$ and hence $$p(X_m, X_n) =  p(X_n \mid X_m) p(X_m) = p(X_n) p(X_m) $$

**Definition**: We say that $n$ random variables are *mutually independent* if $$p(X_i \mid X_{S}) = p(X_i)\;\; \forall S \subseteq \{1, \dots, n\} \setminus \{ i \}$$ and hence $$p(X_{1:n}) = \prod_{i=1}^n p(X_i)$$




<div class="alert alert-warning">
Show that pairwise independence between all pairs of variables does not necessarily imply mutual independence. Come up with a minimal counterexample that has exactly three binary random variables.
</div>

Specify this counterexample via its full joint distribution table (FJDT). **Briefly** outline your thought process in the text field below (use $\LaTeX$ and markdown) and store the model's full joint distribution table into the `XYZ` variable. It is sufficient to show pairwise independence and non-mutual independence by comparing products of marginals and joint distributions. 

**Hint**: Copy your implementation of `inference_by_enumeration` from Problem 1. You can use `print_table` to visualize your distribution tables such as the FJDT, products of marginals, and joint distributions.

In [2]:
help(print_table)

Help on function print_table in module helpers:

print_table(probability_table: numpy.ndarray, variable_names: str) -> None
    Prints a probability distribution table.
    
    Parameters
    ----------
    probability_table : np.ndarray
        The probability distribution table
    variable_names : str
        A string containing the variable names, e.g., 'CDE'.
    
    Returns
    -------
    None



YOUR ANSWER HERE

#### Counterexample
Let the joint distribution $P(X, Y, Z)$ be defined as:

- $P(X=0, Y=0, Z=0) = \frac{1}{4}$

- $P(X=0, Y=1, Z=1) = \frac{1}{4}$

- $P(X=1, Y=0, Z=1) = \frac{1}{4}$

- $P(X=1, Y=1, Z=0) = \frac{1}{4}$

- All other combinations have probability zero.
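
One way to read this table (an interpretation, not part of the exercise text) is that $X$ and $Y$ are independent fair bits and $Z = X \oplus Y$. A minimal sketch that builds the same table under that assumption; the name `xor_table` is illustrative only:

```python
import itertools
import numpy as np

# Assumption: X and Y are independent fair bits and Z = X XOR Y.
# The table is indexed as xor_table[x, y, z].
xor_table = np.zeros((2, 2, 2))
for x, y in itertools.product((0, 1), repeat=2):
    xor_table[x, y, x ^ y] = 0.25  # each (x, y) pair has probability 1/4

# The four non-zero entries sum to one, so this is a valid distribution.
assert np.isclose(xor_table.sum(), 1.0)
```

The four non-zero entries coincide with the four cases listed above.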

#### Marginal distributions (as requested by the exercise)
Each marginal distribution is obtained by summing the joint distribution over the other two variables:

- $P(X=0) = P(X=0, Y=0, Z=0) + P(X=0, Y=1, Z=1) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$

- $P(Y=0) = P(X=0, Y=0, Z=0) + P(X=1, Y=0, Z=1) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$

- $P(Z=0) = P(X=0, Y=0, Z=0) + P(X=1, Y=1, Z=0) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$

- The same holds for $P(X=1)$, $P(Y=1)$ and $P(Z=1)$, which are all equal to $\frac{1}{2}$.

#### Pairwise Independence (as requested by the exercise)
Summing out $Z$ gives the joint distribution of $X$ and $Y$:

- $P(X=0, Y=0) = P(X=0, Y=0, Z=0) = \frac{1}{4}$

- $P(X=0, Y=1) = P(X=0, Y=1, Z=1) = \frac{1}{4}$

- $P(X=1, Y=0) = P(X=1, Y=0, Z=1) = \frac{1}{4}$

- $P(X=1, Y=1) = P(X=1, Y=1, Z=0) = \frac{1}{4}$

Every entry equals the product of the corresponding marginals:

$$
P(X=i, Y=j) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(X=i)P(Y=j), \quad \forall i,j \in \{0,1\}
$$

Similarly, it can be shown that $X$ and $Z$, as well as $Y$ and $Z$, are independent.
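
The three pairwise checks can also be sketched directly with NumPy, without the helper from Problem 1. The names `table` and `pairwise_ok` are illustrative and not part of the notebook:

```python
import numpy as np

# The counterexample table, indexed as table[x, y, z].
table = np.zeros((2, 2, 2))
table[0, 0, 0] = table[0, 1, 1] = table[1, 0, 1] = table[1, 1, 0] = 0.25

names = "XYZ"
pairwise_ok = {}
for a, b in [(0, 1), (0, 2), (1, 2)]:
    other = ({0, 1, 2} - {a, b}).pop()   # axis of the variable to sum out
    joint = table.sum(axis=other)        # P(A, B), shape (2, 2)
    p_a = joint.sum(axis=1)              # P(A)
    p_b = joint.sum(axis=0)              # P(B)
    # Independence holds if the joint equals the outer product of the marginals.
    pairwise_ok[names[a] + names[b]] = bool(np.allclose(joint, np.outer(p_a, p_b)))

print(pairwise_ok)
```

For this table, all three entries of `pairwise_ok` are `True`.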

#### Non-mutual Independence

Mutual independence would require $P(X, Y, Z) = P(X) P(Y) P(Z)$ for every assignment. However, $P(X=0, Y=0, Z=0) = \frac{1}{4}$, while $P(X=0) P(Y=0) P(Z=0) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$. Hence the three variables are pairwise independent but not mutually independent, so pairwise independence does not imply mutual independence.
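
The same conclusion follows from the conditional form of the definition with $S = \{X, Y\}$: mutual independence would need $P(Z \mid X, Y) = P(Z)$, but here $Z$ is fully determined once $X$ and $Y$ are known. A small sketch with illustrative variable names that are not part of the notebook:

```python
import numpy as np

# The counterexample table, indexed as table[x, y, z].
table = np.zeros((2, 2, 2))
table[0, 0, 0] = table[0, 1, 1] = table[1, 0, 1] = table[1, 1, 0] = 0.25

p_xy = table.sum(axis=2, keepdims=True)  # P(X, Y), shape (2, 2, 1); all entries are 1/4
p_z_given_xy = table / p_xy              # P(Z | X, Y); safe because P(X, Y) > 0 everywhere
p_z = table.sum(axis=(0, 1))             # P(Z), shape (2,)

# P(Z=0 | X=0, Y=0) is 1, whereas P(Z=0) is 1/2.
print(p_z_given_xy[0, 0, 0], p_z[0])
```

Conditioning on both $X$ and $Y$ changes the distribution of $Z$ from uniform to deterministic, which contradicts the definition of mutual independence.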

In [3]:
XYZ = np.zeros((2,2,2))

# YOUR CODE HERE
# raise NotImplementedError()

XYZ[0, 0, 0] = 0.25  
XYZ[0, 1, 1] = 0.25  
XYZ[1, 0, 1] = 0.25  
XYZ[1, 1, 0] = 0.25  


In [4]:
# copy inference_by_enumeration from Problem 1 & print and compare the probability tables here!

# YOUR CODE HERE
# raise NotImplementedError()

# Copied from previous exercise

def inference_by_enumeration(
    FJDT: np.ndarray, 
    query_variable_indices: tuple, 
    evidence_variable_indices: tuple = tuple()
) -> np.ndarray:
    '''
    Constructs a conditional probability table (CPT) from the full joint distribution.
    :param FJDT: The full joint distribution table as a np.ndarray.
    :param query_variable_indices: A tuple of indices representing the query variables.
    :param evidence_variable_indices: A tuple of indices representing the conditioning (evidence) variables.
    :returns: The conditional probability table (CPT) as a np.ndarray.
    '''
    assert type(FJDT) == np.ndarray, "FJDT must be a np.ndarray"
    assert type(query_variable_indices) == tuple, "query_variable_indices must be a tuple"
    assert type(evidence_variable_indices) == tuple, "evidence_variable_indices must be a tuple"

    all_vars = set(range(FJDT.ndim))
    query_vars = set(query_variable_indices)
    evidence_vars = set(evidence_variable_indices)
    hidden_vars = list(all_vars - query_vars - evidence_vars)

    marginalized = FJDT.sum(axis=tuple(hidden_vars), keepdims=True)

    sum_over_query = marginalized.sum(axis=tuple(query_variable_indices), keepdims=True)

    # Suppress warnings for division by zero and set the affected entries to 0
    with np.errstate(divide='ignore', invalid='ignore'):
        cpt = marginalized / sum_over_query
        cpt[~np.isfinite(cpt)] = 0
    
    return cpt

X = 0
Y = 1
Z = 2

# Marginal distributions
P_X = inference_by_enumeration(XYZ, (X,), ())
P_Y = inference_by_enumeration(XYZ, (Y,), ())
P_Z = inference_by_enumeration(XYZ, (Z,), ())

# Joint distributions
P_XY = inference_by_enumeration(XYZ, (X, Y), ())
P_XZ = inference_by_enumeration(XYZ, (X, Z), ())
P_YZ = inference_by_enumeration(XYZ, (Y, Z), ())

# Product of marginals
PX_PY = P_X * P_Y
PX_PZ = P_X * P_Z
PY_PZ = P_Y * P_Z
PX_PY_PZ = P_X * P_Y * P_Z

print("X and Y are independent:", np.allclose(P_XY, PX_PY))
print("X and Z are independent:", np.allclose(P_XZ, PX_PZ))
print("Y and Z are independent:", np.allclose(P_YZ, PY_PZ))
print("X, Y, Z are mutually independent:", np.allclose(XYZ, PX_PY_PZ))

X and Y are independent: True
X and Z are independent: True
Y and Z are independent: True
X, Y, Z are mutually independent: False


In [5]:
assert XYZ.shape == (2, 2, 2), 'FJDT must have shape (2,2,2)'
assert np.isclose(XYZ.sum(), 1), 'Probabilities in FJDT must sum to one'
