# CS/DSC/AI 391L: Machine Learning
## Homework 5 - Theory
### Lecturer: Prof. Qiang Liu

# 1. Gaussian Multivariate

Assume we have a multivariate normal random variable $X = [X_1, X_2, X_3, X_4]^T$, whose covariance matrix $\Sigma$ and inverse covariance matrix $Q$ are:

$$\Sigma = \begin{bmatrix} 
0.71 & -0.43 & 0.43 & 0 \\
-0.43 & 0.46 & -0.26 & 0 \\
0.43 & -0.26 & 0.46 & 0 \\
0 & 0 & 0 & 0.2
\end{bmatrix}, \quad
Q = \begin{bmatrix}
5 & 3 & -3 & 0 \\
3 & 5 & 0 & 0 \\
-3 & 0 & 5 & 0 \\
0 & 0 & 0 & 5
\end{bmatrix}$$

Note that $Q$ is simply the inverse of $\Sigma$, i.e., $Q = \Sigma^{-1}$.
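Because Σ is printed to two decimal places, $Q = \Sigma^{-1}$ holds only approximately for the numbers shown. The quick NumPy check below is a sketch, and the variable name `Sigma_from_Q` is illustrative. It confirms two things: Σ·Q is close to the identity, and inverting the integer matrix Q and rounding to two decimals reproduces Σ entry by entry.

```python
import numpy as np

# Covariance matrix Sigma and precision matrix Q as given in the problem
Sigma = np.array([
    [0.71, -0.43, 0.43, 0.0],
    [-0.43, 0.46, -0.26, 0.0],
    [0.43, -0.26, 0.46, 0.0],
    [0.0, 0.0, 0.0, 0.2],
])
Q = np.array([
    [5.0, 3.0, -3.0, 0.0],
    [3.0, 5.0, 0.0, 0.0],
    [-3.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
])

# Sigma is rounded, so Sigma @ Q is only approximately the 4x4 identity
print(np.round(Sigma @ Q, 2))

# Inverting the exact Q and rounding to two decimals gives back Sigma
Sigma_from_Q = np.linalg.inv(Q)
print(np.round(Sigma_from_Q, 4))
print("matches Sigma after rounding:", np.allclose(np.round(Sigma_from_Q, 2), Sigma))
```

The exact covariance entries are fractions with denominator 35 (for example, Σ₁₁ = 25/35 ≈ 0.714). This is why the printed product deviates from the identity by a few hundredths.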

(a) [5 points] Are $X_3$ and $X_4$ correlated?

(b) [5 points] Are $X_3$ and $X_4$ conditionally correlated given the other variables? That is, does $\text{cor}(X_3, X_4 | X_1, X_2)$ equal zero?

(c) [5 points] Please find the Markov blanket of $X_3$. Recall that the Markov blanket of $X_i$ is the set of variables (denoted by $X_M$) such that

$$X_i \perp X_{-(i\cup M)} | X_M,$$

where $-(i\cup M)$ denotes all the variables outside of $\{i\}\cup M$.

(d) [5 points] Assume that $Y=[Y_1,Y_2]^T$ is defined by

$$Y_1 = X_1 + X_4$$
$$Y_2 = X_2 - X_4$$

Please calculate the covariance matrix of $Y$.


In [17]:
import numpy as np

# Create the covariance matrix Σ
Sigma = np.array([
    [0.71, -0.43, 0.43, 0],
    [-0.43, 0.46, -0.26, 0],
    [0.43, -0.26, 0.46, 0],
    [0, 0, 0, 0.2]
])
# Create the inverse covariance matrix Q
Q = np.array([
    [5, 3, -3, 0],
    [3, 5, 0, 0],
    [-3, 0, 5, 0],
    [0, 0, 0, 5]
])

In [18]:
# Check correlation between X3 and X4 (Python uses 0-based indexing: X3 -> 2, X4 -> 3)
cov_X3_X4 = Sigma[2, 3]
correlation_X3_X4 = cov_X3_X4 / np.sqrt(Sigma[2, 2] * Sigma[3, 3])
print(f"Correlation between X3 and X4: {correlation_X3_X4}")

# The answer is: No, X3 and X4 are not correlated because their covariance is 0

Correlation between X3 and X4: 0.0


A)

To determine whether X₃ and X₄ are correlated, we look at their covariance, which is the element at position (3,4) (or, equivalently, (4,3)) of the covariance matrix Σ.
Looking at the matrix Σ:

The element Σ₃₄ (= Σ₄₃) = 0.
Since the covariance is 0, the correlation coefficient cor(X₃, X₄) = Σ₃₄ / √(Σ₃₃Σ₄₄) is also 0.
Answer: No, X₃ and X₄ are uncorrelated. Because X is jointly Gaussian, this also means that X₃ and X₄ are independent.
The code cell above confirms this.
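More generally, every correlation coefficient can be read off Σ by normalizing it with the standard deviations, $R = D^{-1/2}\,\Sigma\,D^{-1/2}$ with $D = \mathrm{diag}(\Sigma)$. The following is a small NumPy sketch of that normalization; the names `std` and `R` are illustrative.

```python
import numpy as np

# Covariance matrix from the problem statement
Sigma = np.array([
    [0.71, -0.43, 0.43, 0.0],
    [-0.43, 0.46, -0.26, 0.0],
    [0.43, -0.26, 0.46, 0.0],
    [0.0, 0.0, 0.0, 0.2],
])

# Standard deviations sqrt(Sigma_ii)
std = np.sqrt(np.diag(Sigma))
# Correlation matrix: R_ij = Sigma_ij / (std_i * std_j)
R = Sigma / np.outer(std, std)

print(np.round(R, 3))
print("cor(X3, X4) =", R[2, 3])  # 0-based indices: X3 -> 2, X4 -> 3
```

Every entry in the last row and column of R is zero (apart from the diagonal 1), so X₄ is uncorrelated with all of X₁, X₂ and X₃, not just with X₃.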

In [19]:
# Check conditional correlation between X3 and X4 given X1 and X2 via the precision matrix
q_34 = Q[2, 3]  # Python uses 0-based indexing: X3 -> 2, X4 -> 3
print(f"Entry (3,4) of the precision matrix Q: {q_34}")

# The answer is: No, X3 and X4 are not conditionally correlated given X1 and X2,
# because their entry in the precision matrix Q is 0 (they are conditionally independent)

Entry (3,4) of the precision matrix Q: 0


B) To determine whether X₃ and X₄ are conditionally correlated given X₁ and X₂, we look at the inverse covariance matrix Q (also known as the precision matrix). For a multivariate Gaussian, a zero entry Qᵢⱼ means that Xᵢ and Xⱼ are conditionally independent given all the other variables. More precisely, the conditional (partial) correlation is cor(Xᵢ, Xⱼ | rest) = -Qᵢⱼ / √(QᵢᵢQⱼⱼ). Here "all the other variables" are exactly X₁ and X₂.

The code cell above shows that the element Q₃₄ (= Q₄₃) is 0.

Answer: No. Since Q₃₄ = 0, cor(X₃, X₄ | X₁, X₂) = 0, so X₃ and X₄ are not conditionally correlated given X₁ and X₂; in fact, they are conditionally independent.
Equivalently, cov(X₃, X₄ | X₁, X₂) = 0.
The intuition here is that:
Q₃₄ = 0 means there is no direct dependence between X₃ and X₄ once X₁ and X₂ are accounted for.
In other words, if we know the values of X₁ and X₂, knowing X₃ provides no additional information about X₄ (and vice versa).
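The same conclusion can be reached without Q. One can compute the conditional covariance of (X₃, X₄) given (X₁, X₂) directly with the Gaussian conditioning (Schur complement) formula $\Sigma_{AA\mid B} = \Sigma_{AA} - \Sigma_{AB}\Sigma_{BB}^{-1}\Sigma_{BA}$. The sketch below uses the notebook's Σ and Q, and the index lists `A` and `B` are illustrative names. It compares the resulting conditional correlation with $-Q_{34}/\sqrt{Q_{33}Q_{44}}$.

```python
import numpy as np

# Covariance and precision matrices from the problem statement
Sigma = np.array([
    [0.71, -0.43, 0.43, 0.0],
    [-0.43, 0.46, -0.26, 0.0],
    [0.43, -0.26, 0.46, 0.0],
    [0.0, 0.0, 0.0, 0.2],
])
Q = np.array([
    [5, 3, -3, 0],
    [3, 5, 0, 0],
    [-3, 0, 5, 0],
    [0, 0, 0, 5],
])

A = [2, 3]  # indices of X3, X4 (0-based)
B = [0, 1]  # indices of the conditioning variables X1, X2

# Gaussian conditioning: Sigma_{AA|B} = Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA
S_AA = Sigma[np.ix_(A, A)]
S_AB = Sigma[np.ix_(A, B)]
S_BB = Sigma[np.ix_(B, B)]
cond_cov = S_AA - S_AB @ np.linalg.solve(S_BB, S_AB.T)
cond_corr = cond_cov[0, 1] / np.sqrt(cond_cov[0, 0] * cond_cov[1, 1])

# Partial correlation read off the precision matrix: -Q_ij / sqrt(Q_ii * Q_jj)
partial_corr_Q = -Q[2, 3] / np.sqrt(Q[2, 2] * Q[3, 3])

print("cov(X3, X4 | X1, X2) =", cond_cov[0, 1])
print("cor(X3, X4 | X1, X2) from conditioning:", cond_corr)
print("cor(X3, X4 | X1, X2) from Q:", partial_corr_Q)
```

Since Σ is the inverse of Q, the conditional covariance of (X₃, X₄) given the rest equals the inverse of the corresponding 2×2 block of Q, here diag(5, 5). That is why both conditional variances come out close to 1/5 = 0.2.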

In [20]:
# Find the Markov blanket of X3 by looking at the non-zero entries in its row of Q
def find_markov_blanket(Q, variable_index):
    # Get the row corresponding to our variable (0-based indexing)
    connections = Q[variable_index]
    # Find indices of non-zero elements, excluding the variable itself
    blanket = [i for i in range(len(connections))
               if connections[i] != 0 and i != variable_index]
    return blanket

markov_blanket = find_markov_blanket(Q, 2)  # 2 is the index for X3
print(f"Markov blanket of X3: {[f'X{i + 1}' for i in markov_blanket]}")

Markov blanket of X3: ['X1']


C)

Answer: The Markov blanket of X₃ is {X₁}. In the row corresponding to X₃ in the precision matrix Q, the only non-zero off-diagonal element corresponds to X₁ (Q₃₁ = -3), while Q₃₂ = Q₃₄ = 0.
This means that X₃ is conditionally independent of the remaining variables, X₂ and X₄, given X₁.
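As a sanity check on this Markov blanket, one can condition X₃ on X₁ alone and confirm that its conditional covariance with X₂ and X₄ vanishes. The sketch uses the exact covariance Σ = Q⁻¹ (the printed Σ is this matrix rounded to two decimals), so the result is zero up to floating-point error. The index lists `target`, `blanket` and `rest` are illustrative names.

```python
import numpy as np

# Precision matrix from the problem statement; Sigma_exact is its exact inverse
Q = np.array([
    [5.0, 3.0, -3.0, 0.0],
    [3.0, 5.0, 0.0, 0.0],
    [-3.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
])
Sigma_exact = np.linalg.inv(Q)

target = [2]   # X3 (0-based index)
blanket = [0]  # X1, the Markov blanket of X3
rest = [1, 3]  # X2 and X4, everything outside {X3} and the blanket

# Conditional covariance cov(X3, X_rest | X1) via the Gaussian conditioning formula
S_tr = Sigma_exact[np.ix_(target, rest)]
S_tm = Sigma_exact[np.ix_(target, blanket)]
S_mm = Sigma_exact[np.ix_(blanket, blanket)]
S_mr = Sigma_exact[np.ix_(blanket, rest)]
cond_cov = S_tr - S_tm @ np.linalg.solve(S_mm, S_mr)

print("cov(X3, X2) without conditioning:", Sigma_exact[2, 1])
print("cov(X3, [X2, X4] | X1):", cond_cov)
```

Without conditioning, X₃ and X₂ are correlated (their covariance is -9/35 ≈ -0.257). Once X₁ is fixed, that dependence disappears, which is what the Markov blanket {X₁} asserts.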

D)

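Let's calculate this step by step. From Σ, Var(X₄) = Cov(X₄,X₄) = 0.2 and Cov(X₁,X₄) = Cov(X₂,X₄) = 0.
For Var(Y₁) = Var(X₁ + X₄):
Var(Y₁) = Var(X₁) + Var(X₄) + 2Cov(X₁,X₄)
= 0.71 + 0.2 + 2(0) = 0.91

For Var(Y₂) = Var(X₂ - X₄):
Var(Y₂) = Var(X₂) + Var(X₄) - 2Cov(X₂,X₄)
= 0.46 + 0.2 - 2(0) = 0.66

For Cov(Y₁,Y₂) = Cov(X₁ + X₄, X₂ - X₄):
= Cov(X₁,X₂) - Cov(X₁,X₄) + Cov(X₄,X₂) - Cov(X₄,X₄)
= -0.43 - 0 + 0 - 0.2 = -0.63

Therefore, the covariance matrix of Y is

$$\text{Cov}(Y) = \begin{bmatrix} 0.91 & -0.63 \\ -0.63 & 0.66 \end{bmatrix}.$$

In matrix form, Y = AX with A = [[1, 0, 0, 1], [0, 1, 0, -1]], so Cov(Y) = AΣAᵀ, which the next cell computes: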

In [21]:
import numpy as np

# Define the transformation matrix A
# Y1 = X1 + X4
# Y2 = X2 - X4
A = np.array([
    [1, 0, 0, 1],  # For Y1 = X1 + X4
    [0, 1, 0, -1]  # For Y2 = X2 - X4
])

# Calculate covariance matrix of Y
# Cov(Y) = A * Sigma * A^T
Y_cov = A @ Sigma @ A.T

print("Covariance matrix of Y:")
print(Y_cov)

# For better readability, round to 4 decimal places
print("\nCovariance matrix of Y (rounded):")
print(np.round(Y_cov, 4))

Covariance matrix of Y:
[[ 0.91 -0.63]
 [-0.63  0.66]]

Covariance matrix of Y (rounded):
[[ 0.91 -0.63]
 [-0.63  0.66]]
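As an independent check of Cov(Y) = AΣAᵀ, a Monte Carlo sketch can sample X from a Gaussian with covariance Σ and estimate the covariance of Y = AX empirically. The zero mean, the seed and the sample size of 200,000 are arbitrary assumptions; the mean does not affect covariances.

```python
import numpy as np

# Covariance matrix from the problem statement
Sigma = np.array([
    [0.71, -0.43, 0.43, 0.0],
    [-0.43, 0.46, -0.26, 0.0],
    [0.43, -0.26, 0.46, 0.0],
    [0.0, 0.0, 0.0, 0.2],
])
# Linear map Y = A X with Y1 = X1 + X4 and Y2 = X2 - X4
A = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, -1],
])

rng = np.random.default_rng(0)  # fixed seed for reproducibility
# Zero mean is assumed; it has no effect on the covariance
X = rng.multivariate_normal(np.zeros(4), Sigma, size=200_000)
Y = X @ A.T  # each row is one sample of (Y1, Y2)

Y_cov_mc = np.cov(Y, rowvar=False)  # empirical 2x2 covariance
Y_cov_exact = A @ Sigma @ A.T       # closed-form covariance
print(np.round(Y_cov_mc, 3))
print(np.round(Y_cov_exact, 3))
```

With this many samples the empirical entries typically agree with 0.91, -0.63 and 0.66 to about two decimal places.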


# 2. Expectation Maximization (EM)

Assume we have a dataset of two points $\{x^{(1)}, x^{(2)}\}$:

$x^{(1)} = -1$, $x^{(2)} = 1$

Assume $x^{(i)}$ is drawn i.i.d. from a simple mixture distribution of two Gaussian components:

$f(x|\mu_1, \mu_2) = \frac{1}{2}\phi(x|\mu_1, 1) + \frac{1}{2}\phi(x|\mu_2, 1)$

where $\phi(x|\mu_i, 1)$ denotes the probability density function of the Gaussian distribution $N(\mu_i, 1)$ with mean $\mu_i$ and unit variance. We want to estimate the unknown parameters $\mu_1$ and $\mu_2$.
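To make the objective concrete, here is a small sketch that evaluates the mixture log-likelihood $\ell(\mu_1, \mu_2) = \sum_i \log f(x^{(i)} \mid \mu_1, \mu_2)$ for a few candidate parameter values. The candidates include the initialization and the point $\mu_1 = -1$, $\mu_2 = 1$ that coincides with the data. The function names `normal_pdf` and `mixture_log_likelihood` are illustrative.

```python
import numpy as np

x = np.array([-1.0, 1.0])  # the two data points x^(1), x^(2)


def normal_pdf(x, mu):
    # Density of N(mu, 1)
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)


def mixture_log_likelihood(mu1, mu2, x):
    # sum_i log( 0.5 * phi(x_i | mu1, 1) + 0.5 * phi(x_i | mu2, 1) )
    return np.sum(np.log(0.5 * normal_pdf(x, mu1) + 0.5 * normal_pdf(x, mu2)))


for mu1, mu2 in [(-2.0, 2.0), (-1.0, 1.0), (0.0, 0.0)]:
    ll = mixture_log_likelihood(mu1, mu2, x)
    print(f"mu1={mu1:+.1f}, mu2={mu2:+.1f}: log-likelihood = {ll:.4f}")
```

With these two points, $\mu_1 = \mu_2 = 0$ attains a higher log-likelihood (about -2.838) than $\mu_1 = -1$, $\mu_2 = 1$ (about -2.970), which is relevant to part b).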

a) Assume we run EM starting from an initialization of $\mu_1 = -2$ and $\mu_2 = 2$. Please determine the values of $\mu_1$ and $\mu_2$ at the next iteration of the EM algorithm. (You may find it handy to know that $1/(1 + e^{-4}) \approx 0.98$.)

b) Do you think EM (when initialized with $\mu_1 = -2$ and $\mu_2 = 2$) will eventually converge to $\mu_1 = -1$ and $\mu_2 = 1$ (i.e., coinciding with the two data points)? Please justify your answer using either your theoretical understanding or the result of an empirical simulation.

c) Please determine the fixed point of EM when it is initialized at $\mu_1 = \mu_2 = 2$.

d) Please determine the fixed point of K-means when it is initialized at $\mu_1 = -2$ and $\mu_2 = 2$.
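The four questions can be explored numerically with a short simulation, which part b) explicitly allows. The sketch below hard-codes the model from the problem: equal mixing weights 1/2 and unit variances, with only the means updated. The helper names `em_step`, `run_em` and `kmeans_1d` are illustrative, and the iteration counts (1000 EM steps for b, 50 for c) are arbitrary choices.

```python
import numpy as np

x = np.array([-1.0, 1.0])  # the two data points x^(1), x^(2)


def em_step(mu1, mu2, x):
    # E-step: responsibility of component 1 for each point. With equal mixing
    # weights and unit variances, only the Gaussian exponents differ.
    log_p1 = -0.5 * (x - mu1) ** 2
    log_p2 = -0.5 * (x - mu2) ** 2
    r1 = 1.0 / (1.0 + np.exp(log_p2 - log_p1))
    r2 = 1.0 - r1
    # M-step: each mean is the responsibility-weighted average of the data
    return np.sum(r1 * x) / np.sum(r1), np.sum(r2 * x) / np.sum(r2)


def run_em(mu1, mu2, x, n_iter):
    for _ in range(n_iter):
        mu1, mu2 = em_step(mu1, mu2, x)
    return mu1, mu2


def kmeans_1d(centers, x, max_iter=100):
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its assigned points
        # (no empty-cluster handling; it is not needed for this data)
        new_centers = np.array([x[labels == k].mean() for k in range(len(centers))])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


em_one_step = run_em(-2.0, 2.0, x, 1)      # part a)
em_long_run = run_em(-2.0, 2.0, x, 1000)   # part b): iterates keep shrinking toward 0
em_equal_init = run_em(2.0, 2.0, x, 50)    # part c)
kmeans_result = kmeans_1d([-2.0, 2.0], x)  # part d)

for label, (m1, m2) in [
    ("a) EM, 1 step from (-2, 2)", em_one_step),
    ("b) EM, 1000 steps from (-2, 2)", em_long_run),
    ("c) EM from (2, 2)", em_equal_init),
    ("d) K-means from (-2, 2)", kmeans_result),
]:
    print(f"{label}: mu1 = {m1:.4f}, mu2 = {m2:.4f}")
```

For this symmetric setup, write $\mu_1 = -m$ and $\mu_2 = m$. One EM step then reduces to $m \leftarrow 2/(1 + e^{-2m}) - 1 = \tanh(m)$. With $m = 2$ this is consistent with the hint: $2(0.98) - 1 \approx 0.96$. Since $\tanh(m) < m$ for every $m > 0$, the iterates shrink toward 0 rather than stopping at $\pm 1$. They do so slowly, because $\tanh'(0) = 1$.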
