# Foundations of Stochastics

In this notebook, we deepen our knowledge of the foundations of stochastics.

First, we introduce the binomial distribution and analyze its properties.
Subsequently, we introduce and analyze the rectangular and normal distributions.
Finally, we work with probability rules.

### **Table of Contents**
1. [Discrete Probabilities](#discrete-probabilities)
2. [Continuous Probabilities](#continuous-probabilities)
3. [Probability Rules](#probability-rules)

In [None]:
import matplotlib.pyplot as plt
import numpy as np

from scipy import stats
from ipywidgets import interactive, FloatSlider, IntSlider

### **1. Discrete Probabilities** <a class="anchor" id="discrete-probabilities"></a>

A discrete random variable $X$ taking values in the set $X(\Omega) = \{0, \dots, n\}, n \in \mathbb{N}_{\geq 0}$ is said to be binomially distributed, written $X \sim \mathrm{Bin}(n, p)$, with $n$ as the number of trials and $p \in [0, 1]$ as the success probability, if its *probability mass function (PMF)*, also called *probability distribution*, $P(X)$ is given by

BEGIN SOLUTION
$$
P(X=x) = \binom{n}{x} p^x(1-p)^{(n-x)}, x \in X(\Omega).
$$
END SOLUTION

#### **Questions:**
1. (a) How can we prove that the PMF $P(X)$ of a binomially distributed variable $X \sim \mathrm{Bin}(n,p)$ is normalized? 

    *Remark: One can use the binomial expansion known from elementary algebra.* 
    
    BEGIN SOLUTION

    According to the binomial expansion, we get
    $$
    \forall x, y \in \mathbb{R}, \forall z \in \mathbb{N}_{\geq 0}: (x+y)^z = \sum_{k=0}^{z} \binom{z}{k} x^k y^{z-k}.
    $$ 
    Defining $x=p$, $y=1-p$, and $z=n$, we can show the normalization:
    $$
    \sum_{k=0}^{n} P(X=k) = \sum_{k=0}^{n} \binom{n}{k} p^k(1-p)^{n-k} = (p + (1-p))^n = 1^n = 1.
    $$ 

    END SOLUTION
    
    (b) Define the probability space $(\Omega, \mathcal{A}, P)$ and a random variable $X$ modeling the number of heads when tossing a coin five times.
    
    BEGIN SOLUTION
    
    We define the sample space $\Omega = \{H, T\}^5$ with $H$ as head and $T$ as tail, the event space $\mathcal{A} = 2^\Omega$ as the power set of the sample space, and the probability measure through
    $$
    P(A) = \begin{cases} 0, \text{ if } A = \emptyset, \\ 0.5^5 \text{ if } |A| = 1 \\ \sum_{\omega \in A} P(\{\omega\}) \text{ else.} \end{cases}
    $$
    The random variable is binomially distributed, i.e., $X \sim \mathrm{Bin}(n=5, p=0.5)$ with 
    $$
    X(\omega_1, \dots, \omega_5) = \sum_{i=1}^5 \delta(\omega_i = H), 
    $$
    where $\delta$ is an indicator function.
    
    END SOLUTION
    
    (c) How can we derive the expected value $E(X)$ and the variance $V(X)$ of a binomially distributed variable $X \sim \mathrm{Bin}(n,p)$? 

    *Remark: A binomially distributed random variable can be represented through a sum of independent random variables following the same Bernoulli distribution, i.e.,* 
    $$X = \sum_{i=1}^n X_i \text{ with } \forall i \in \{1, \dots, n\}: X_i \sim \mathrm{Bern}(p).$$

    BEGIN SOLUTION

    According to Theorem 2.4 (Linearity of Expected Values), we can compute the expected value as:
    $$E(X) = E\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n E\left(X_i\right) = \sum_{i=1}^n p = np,$$
    where we use $E(X_i) = p$ as the expected value of a Bernoulli distributed random variable.

    According to Theorem 2.9 (Properties of Statistical Variance), we know the variances of statistically independent random variables are additive. Hence, we can compute the variance as:
    $$V(X) = V\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n V\left(X_i\right) = \sum_{i=1}^n p(1-p) = np(1-p),$$
    where we use $V(X_i) = p(1-p)$ as the variance of a Bernoulli distributed random variable.

    END SOLUTION  

    (d) How can we estimate the expected value and variance of the Binomial distribution, when we have made the observations $x_1, \dots, x_N \in \{0, \dots, n\}$, $N \in \mathbb{N}_{>0}$?

    BEGIN SOLUTION

    We can use Definition 2.13 (Empirical Mean) to estimate the expected value as:
    $$\overline{x} = \frac{1}{N} \sum_{i=1}^N x_i$$
    and the Definition 2.14 (Empirical Covariance) to estimate the variance as:
    $$\overline{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^N (\overline{x}-x_i)^2.$$

    END SOLUTION
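
The answers to questions 1(a) to 1(c) can also be checked numerically. The following sketch is an illustration rather than part of the exercise: it enumerates the sample space $\Omega = \{H, T\}^5$ of the coin example, derives the PMF of the number of heads by counting, and compares it with `stats.binom`. The variable names (`omega`, `heads`, `pmf_enumerated`, ...) are chosen here for illustration only.

```python
from itertools import product

import numpy as np
from scipy import stats

n, p = 5, 0.5  # five tosses of a fair coin, as in question 1(b)

# Enumerate the sample space Omega = {H, T}^5 (2**5 = 32 outcomes).
omega = list(product("HT", repeat=n))

# Random variable X: number of heads in each outcome.
heads = np.array([outcome.count("H") for outcome in omega])

# Every elementary event has probability 0.5**5, so P(X=x) is a scaled count.
pmf_enumerated = np.bincount(heads, minlength=n + 1) * 0.5**n

# Closed-form PMF of Bin(5, 0.5) for comparison.
x = np.arange(n + 1)
pmf_binom = stats.binom(n, p).pmf(x)

total = pmf_enumerated.sum()                    # normalization, question 1(a)
mean = (x * pmf_enumerated).sum()               # E(X), question 1(c)
var = ((x - mean) ** 2 * pmf_enumerated).sum()  # V(X), question 1(c)

print("PMFs agree:", np.allclose(pmf_enumerated, pmf_binom))
print("sum =", total, " E(X) =", mean, " V(X) =", var)
```

The enumeration only works for small $n$, since the sample space grows as $2^n$; for larger $n$ the closed-form results $np$ and $np(1-p)$ are the practical route.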

In the following, we use the [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) package to plot the binomial distribution for different values of $n$ and $p$.

In [None]:
def visualize_binomial_distribution(n, p, N):
    """
    Visualizes the binomial distribution for varying parameters and
    indicates summary statistics, i.e., (empirical) mean and (empirical) variance.
    
    Parameters
    ----------
    n : int
        Positive number of trials within one binomial experiment.
    p : float in [0, 1]
        Success probability.
    N : int
        Positive number of repeated binomial experiments.
    """
    # Compute the expected value `mean` and the variance `var`.
    # BEGIN SOLUTION
    mean = n * p
    var = n * p * (1-p)
    # END SOLUTION
    
    # Draw N observations `x_sampled` from the PMF P(X) with X ~ Bin(n, p).
    x_sampled = stats.binom(n, p).rvs(N) # <-- SOLUTION
    
    # Estimate the expected value `mean_est` and variance `var_est` using the observations.
    # BEGIN SOLUTION
    mean_est = x_sampled.sum() / N
    var_est = ((x_sampled - mean_est)**2).sum() / (N-1)
    # END SOLUTION
    
    # Create an array `x` containing the numbers {0, ..., n}.
    x = np.arange(0, n+1) # <-- SOLUTION
    
    # Compute P(X=x) as `p_x` with x in {0, ..., n} for X ~ Bin(n, p)
    p_x = stats.binom(n, p).pmf(x) # <-- SOLUTION
    
    # Plot results.
    plt.bar(x, p_x, label=f'PMF')
    plt.xlabel('$x$')
    plt.ylabel('$P(X=x)$')
    # Raw strings keep LaTeX commands such as \sim intact (no invalid escape sequences).
    plt.title(
        r"$X \sim \mathrm{Bin}(" 
        + str(n) + "," 
        + str(p) 
        + ")$ with \n$E(X) =$" 
        + str(np.round(mean, 2)) 
        + r", $\overline{x} =$" 
        + str(np.round(mean_est, 2)) 
        + ", \n$V(X) = $" 
        + str(np.round(var, 2))
        + r", $\overline{\sigma}^2 =$" 
        + str(np.round(var_est, 2)) 
    )
    plt.legend()
    plt.show()
    
interactive(
    visualize_binomial_distribution, 
    n=IntSlider(value=10, min=1, max=100),
    p=FloatSlider(value=0.5, min=0.0, max=1.0),
    N=IntSlider(value=10, min=2, max=1000)
)

#### **Questions:**
1. (e) How does the sample size $N$ affect the estimates of the empirical mean and variance?
   
    BEGIN SOLUTION
    
    Increasing the sample size $N$ tends to yield more accurate empirical estimates, i.e., $\overline{x}$ and $\overline{\sigma}^2$ get closer to the true
    expected value $E(X)$ and the true variance $V(X)$.
    
    END SOLUTION  
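
The effect of the sample size can be made quantitative with a small simulation. The sketch below is an illustration under assumed settings (a fixed seed, $n=10$, $p=0.5$, and 200 repetitions per sample size, all chosen arbitrarily): it averages the absolute errors of the empirical mean and variance for several values of $N$. The helper `mean_abs_errors` is hypothetical and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed, chosen arbitrarily
n, p = 10, 0.5
true_mean, true_var = n * p, n * p * (1 - p)


def mean_abs_errors(N, repetitions=200):
    """Average absolute error of the empirical mean and variance for sample size N."""
    samples = rng.binomial(n, p, size=(repetitions, N))
    mean_est = samples.mean(axis=1)
    var_est = samples.var(axis=1, ddof=1)  # ddof=1 gives the 1/(N-1) estimator
    return np.abs(mean_est - true_mean).mean(), np.abs(var_est - true_var).mean()


errors = {N: mean_abs_errors(N) for N in (10, 100, 1000, 10000)}
for N, (err_mean, err_var) in errors.items():
    print(f"N={N:>5}: mean error ~ {err_mean:.4f}, variance error ~ {err_var:.4f}")
```

Averaging over repetitions is a design choice: a single draw per $N$ would be too noisy to show the trend reliably.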


### **2. Continuous Probabilities** <a class="anchor" id="continuous-probabilities"></a>

A continuous random variable $X$ taking values in the set $X(\Omega) = \mathbb{R}$ is said to be rectangularly (uniformly) distributed, written $X \sim \mathrm{Rect}(a, b)$, with parameters $a, b \in \mathbb{R}, a < b$, if its *probability density function (PDF)* $f(X)$ is given by

BEGIN SOLUTION

$$
f(X=x) = \begin{cases} \frac{1}{b-a}, \text{ if } x \in [a, b] \\ 0, \text{ else.} \end{cases}
$$

END SOLUTION

#### **Questions:**
2. (a) How can we show that the PDF of the rectangularly distributed random variable $X \sim \mathrm{Rect}(a,b)$ is a valid PDF?
   
    BEGIN SOLUTION
    
    We need to show the properties (1-3) given in Definition 2.8.
    
    (1) The integral of $f$ exists, since $f$ is continuous on $\mathbb{R}$ except for the two jump discontinuities at $x=a$ and $x=b$.
    
    (2) The density is always non-negative, since $f(X=x) = \frac{1}{b-a} > 0\, \forall x \in [a, b]$ (because $a < b$) and $f(X=x) = 0 \geq 0\, \forall x \in (-\infty, a) \cup (b, \infty)$.
    
    (3) The function $f$ is normalized according to
    $$
    \int_{-\infty}^{\infty} f(X=x)\mathrm{d}x = \int_{a}^{b} \frac{1}{b-a}\mathrm{d}x = \left[\frac{x}{b-a}\right]^b_a = \frac{b-a}{b-a} = 1.
    $$
    
    END SOLUTION 
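
Properties (2) and (3) can be checked numerically for a concrete choice of parameters. The following sketch assumes the example values $a=-1$ and $b=3$ (arbitrary) and uses `scipy.integrate.quad` for the integral; `rect_pdf` is a hypothetical helper that implements the PDF as defined above. Note that `stats.uniform` is parameterized by `loc=a` and `scale=b-a`.

```python
import numpy as np
from scipy import integrate, stats

a, b = -1.0, 3.0  # arbitrary example parameters with a < b


def rect_pdf(x):
    """PDF of Rect(a, b): 1/(b-a) on [a, b] and 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= a) & (x <= b), 1.0 / (b - a), 0.0)


# Property (3): integrate over [a, b]; the density is zero outside this interval.
area, abs_err = integrate.quad(lambda t: float(rect_pdf(t)), a, b)

# Property (2): non-negativity on a grid that also covers points outside [a, b].
x_grid = np.linspace(a - 1, b + 1, 9)
non_negative = bool(np.all(rect_pdf(x_grid) >= 0))

# Cross-check against scipy's uniform distribution.
matches_scipy = np.allclose(rect_pdf(x_grid), stats.uniform(loc=a, scale=b - a).pdf(x_grid))

print("area =", area, " non-negative:", non_negative, " matches scipy:", matches_scipy)
```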

**Definition 2.17** <font color='red'>**Multivariate Normal Distribution**</font> 

A multivariate continuous random variable $\mathbf{X} = (X_1, \dots, X_D)^\mathrm{T}, D \in \mathbb{N}_{>0}$ follows a *multivariate normal distribution* with the mean $\boldsymbol{\mu} \in \mathbb{R}^D$ and the symmetric, positive-definite covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{D \times D}$
if the PDF is defined through
$$
f(\mathbf{X}=\mathbf{x}) = \frac{1}{(2\pi)^{\frac{D}{2}}} \cdot \frac{1}{|\boldsymbol{\Sigma}|^{\frac{1}{2}}} \cdot \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\mathrm{T} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right),
$$
where $|\boldsymbol{\Sigma}|$ denotes the determinant of the covariance matrix and $\boldsymbol{\Sigma}^{-1}$ its inverse.

**Remarks:**
- The normal distribution, also known as the Gaussian distribution, is one of the most important probability distributions in stochastics. It is used in a wide range of fields to model random variables that arise in nature, such as the heights of people, the weights of objects, and the errors in measurements.
- We write $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ to indicate that a random variable follows a multivariate normal distribution.
- The inverse of the covariance matrix, i.e., $\boldsymbol{\Sigma}^{-1}$, is also called the precision matrix.

#### **Questions:**
2. (b) Which form does the PDF of a univariate normal distribution ($D=1$) take?
   
    BEGIN SOLUTION
    
    For $D=1$, the covariance matrix $\boldsymbol{\Sigma}$ reduces to the variance $\sigma^2$. Accordingly, we obtain:
    $$
    f(X=x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
    $$
    
    END SOLUTION  
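
As a quick plausibility check of this reduction, the sketch below evaluates the univariate formula directly and compares it with the multivariate definition for $D=1$ via `stats.multivariate_normal`. The parameter values are arbitrary examples. Note that `stats.norm` expects the standard deviation $\sigma$, whereas the covariance matrix contains the variance $\sigma^2$.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.5, 0.8  # arbitrary example parameters
x = np.linspace(-3, 6, 50)

# Univariate formula from the solution above.
f_formula = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Multivariate definition with D=1: Sigma is the 1x1 matrix [[sigma^2]].
# Each row of x[:, np.newaxis] is one 1-dimensional observation.
f_mvn = stats.multivariate_normal(mean=[mu], cov=[[sigma**2]]).pdf(x[:, np.newaxis])

# Univariate normal in scipy, parameterized by the standard deviation.
f_norm = stats.norm(mu, sigma).pdf(x)

print("formula == multivariate (D=1):", np.allclose(f_formula, f_mvn))
print("formula == stats.norm:", np.allclose(f_formula, f_norm))
```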

In [None]:
def visualize_normal_distribution(mu, sigma, N):
    """
    Visualizes the univariate normal distribution for varying parameters and
    indicates summary statistics, i.e., (empirical) mean and (empirical) variance.
    
    Parameters
    ----------
    mu : float
        Mean of the normal distribution.
    sigma : float
        Standard deviation of the normal distribution.
    N : int
        Positive number of observations to be drawn from the normal distribution.
    """
    # Draw N observations `x_sampled` from the pdf f(X) with X ~ N(mu, sigma**2).
    x_sampled = stats.norm(mu, sigma).rvs(N) # <-- SOLUTION
    
    # Estimate the expected value `mean_est` and variance `var_est` using the observations.
    # BEGIN SOLUTION
    mean_est = x_sampled.sum() / N
    var_est = ((x_sampled - mean_est)**2).sum() / (N-1)
    # END SOLUTION
    
    # Create an array `x` of 1000 linearly distributed values in the range [-5, 5].
    x = np.linspace(-5, 5, 1000) # <-- SOLUTION
    
    # Compute the density f(X=x) as `f_x` for all values in `x`.
    f_x = stats.norm(mu, sigma).pdf(x) # <-- SOLUTION

    # Plot the density `f_x` for mu and sigma over all values in `x`.
    # BEGIN SOLUTION
    plt.plot(x, f_x, label='PDF')
    plt.xlabel('$x$')
    plt.ylabel('$f(X=x)$')
    # Raw strings and escaped backslashes keep the LaTeX commands intact.
    plt.title(
        r"$X \sim \mathcal{N}(" 
        + str(mu) + "," 
        + str(np.round(sigma**2, 2)) 
        + ")$ with \n$\\overline{x} =$" 
        + str(np.round(mean_est, 2)) 
        + ", \n$\\overline{\\sigma}^2 =$" 
        + str(np.round(var_est, 2)) 
    )
    plt.legend()
    plt.show()
    # END SOLUTION

interactive(
    visualize_normal_distribution, 
    mu=FloatSlider(value=0, min=-2, max=2),
    sigma=FloatSlider(value=1, min=0.1, max=2),
    N=IntSlider(value=10, min=2, max=1000)
)

#### **Questions:**
2. (c) How do the parameters $\mu$ and $\sigma^2$ affect the shape of the PDF of the normal distribution?

   BEGIN SOLUTION
   
   Changing the mean $\mu$ shifts the mode ($x$-value with maximum density) of the PDF along the $x$-axis. For a larger variance $\sigma^2$, the PDF becomes broader and flatter. Conversely, for a small variance $\sigma^2$, the curve is narrow and tall around the mean $\mu$.
   
   END SOLUTION

Even for the univariate case $D=1$, the CDF of the normal distribution cannot be expressed in terms of elementary functions. However, many numerical approximations are known. A typical approach is to transform any univariate normal distribution into a standard normal distribution.

**Definition 2.18** <font color='red'>**Standard Normal Distribution**</font> 

The univariate normal distribution $\mathcal{N}(0, 1)$ is called *standard normal distribution* and its CDF is denoted as $\Phi: \mathbb{R} \rightarrow [0, 1]$.

For computing the probabilities of a random variable $X \sim \mathcal{N}(0, 1)$, there are [lookup tables](https://en.wikipedia.org/wiki/Standard_normal_table) and any random variable following a univariate normal distribution can be transformed to follow the standard normal distribution.

**Theorem 2.10** <font color='red'>**Transformation to a Standard Normal Distribution**</font> 

Let $X \sim \mathcal{N}(\mu, \sigma^2)$ be a random variable following a univariate normal distribution with mean $\mu$ and variance $\sigma^2$, and let $F$ denote its CDF. Then, we get:
$$
F(X=x) = \Phi\left(\frac{x-\mu}{\sigma}\right).
$$

**Remark**: The standard normal distribution is symmetric such that $\forall x \in \mathbb{R}: \Phi(-x) = 1 - \Phi(x)$. 
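
Both the transformation and the symmetry property can be checked numerically. The following sketch is an illustration with the example parameters $\mu=2$ and $\sigma=2$ (chosen arbitrarily); it compares the CDF of $\mathcal{N}(\mu, \sigma^2)$ with $\Phi$ evaluated at the standardized argument.

```python
import numpy as np
from scipy import stats

mu, sigma = 2.0, 2.0  # example parameters, i.e., X ~ N(2, 4)
x = np.linspace(-4, 8, 25)

# CDF of N(mu, sigma^2), evaluated directly.
cdf_direct = stats.norm(mu, sigma).cdf(x)

# CDF obtained via the standard normal CDF Phi (Theorem 2.10).
phi = stats.norm(0, 1).cdf
cdf_via_phi = phi((x - mu) / sigma)

# Symmetry of the standard normal distribution: Phi(-z) = 1 - Phi(z).
z = np.linspace(-3, 3, 13)
symmetry_holds = np.allclose(phi(-z), 1 - phi(z))

print("transformation holds:", np.allclose(cdf_direct, cdf_via_phi))
print("symmetry holds:", symmetry_holds)
```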

#### **Questions:**

2. (d) Does it hold that for a random variable $X$ with $E(X)=\mu$ and $V(X)=\sigma^2$, we get $E(Z)=0$ and $V(Z)=1$ for $Z=\frac{(X-\mu)}{\sigma}$? Prove your answer.

   BEGIN SOLUTION
   
   As the following proof shows, the answer is *yes*.
   
   According to Theorem 2.4 (Linearity of Expected Values), we get
   $$
   E(Z) = E\left(\frac{X-\mu}{\sigma}\right) = E\left(\frac{1}{\sigma} \cdot X - \frac{\mu}{\sigma}\right) = \frac{1}{\sigma}E\left(X\right) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0.
   $$
   
   According to Theorem 2.5 (Properties of Variance), we get
   $$
   V(Z) = V\left(\frac{X-\mu}{\sigma}\right) = V\left(\frac{1}{\sigma} \cdot X - \frac{\mu}{\sigma}\right) = \frac{1}{\sigma^2}V\left(X\right) = \frac{\sigma^2}{\sigma^2} = 1.
   $$
   
   END SOLUTION
   
   (e) What is the probability of $1 \leq X < 6$ for $X \sim \mathcal{N}(2, 4)$? Answer this question by using the standard normal distribution with a [lookup table](https://en.wikipedia.org/wiki/Standard_normal_table).
   
   BEGIN SOLUTION
   
   According to Theorem 2.2 (Probabilities of Arbitrary Intervals) and Theorem 2.10 (Transformation to Standard Normal distribution), we can compute
   $$
   P(1 \leq X < 6) = F(X=6) - F(X=1) = \Phi\left(\frac{6-2}{2}\right) - \Phi\left(\frac{1-2}{2}\right) \\ 
   = \Phi\left(2\right) - \Phi\left(-0.5\right) = \Phi\left(2\right) - (1-\Phi\left(0.5\right)) \approx 0.97725 - (1 - 0.69146) = 0.66871.
   $$
   
   END SOLUTION
   
In the following, we use the [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) package to verify the result in question 2(e).

In [None]:
# Compute the probability P(1 <= X < 6) for X ~ N(2, 4).
p_6 = stats.norm(2, np.sqrt(4)).cdf(6)
p_1 = stats.norm(2, np.sqrt(4)).cdf(1)
p = p_6 - p_1
print(f"P(1 <= X < 6) = {p_6} - {p_1} = {p}")

One of the main reasons the normal distribution is so widely used is due to the central limit theorem.

**Theorem 2.11** <font color='red'>**Central Limit Theorem**</font> 

Let $X_1, X_2, \dots $ be a sequence of i.i.d. random variables. Further, assume that the expected value $E(X_1) = \mu$ and the variance $\sigma^2 = V(X_1)$ exist. Then, the random variable $S_N = X_1 + \dots + X_N, N \in \mathbb{N}_{>0}$ has an expected value of $E(S_N) = N\mu$ and a variance of $V(S_N) = N\sigma^2$. If one forms from it the standardized random variable
$$
Z_N = \frac{S_N - N\mu}{\sigma\sqrt{N}},
$$
then the central limit theorem states that the CDF of $Z_N$ converges pointwise to the CDF $\Phi$ of the standard normal distribution $\mathcal{N}(0,1)$ for $N \rightarrow \infty$:
$$
\lim_{N \rightarrow \infty} F(Z_N = z) = \Phi(z).
$$

**Remarks:** 
- Intuitively, the theorem states that, under certain conditions, the sum of numerous i.i.d. random variables tends towards a normal distribution. This makes the normal distribution a natural choice for modeling a wide range of phenomena that arise in nature.
- The earliest version of this theorem, that the normal distribution may be used as an approximation to the binomial distribution, is the de Moivre–Laplace theorem.
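
The convergence stated by the central limit theorem can be observed in a simulation. The sketch below is an illustration under assumed settings: the $X_i$ are drawn from an exponential distribution with rate 1 (so $\mu = 1$ and $\sigma = 1$), the seed and the number of repetitions are arbitrary, and `max_cdf_gap` is a hypothetical helper. It measures the largest gap between the empirical CDF of $Z_N$ and $\Phi$ on a small grid.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # fixed seed, chosen arbitrarily
mu, sigma = 1.0, 1.0                 # mean and standard deviation of Exp(1)
z_grid = np.linspace(-2, 2, 9)


def max_cdf_gap(N, repetitions=10000):
    """Largest gap between the empirical CDF of Z_N and Phi on z_grid."""
    # Each row holds one realization of X_1, ..., X_N; the row sum is S_N.
    s_N = rng.exponential(scale=1.0, size=(repetitions, N)).sum(axis=1)
    z_N = (s_N - N * mu) / (sigma * np.sqrt(N))
    # Empirical CDF: fraction of standardized sums that are <= z.
    ecdf = (z_N[:, np.newaxis] <= z_grid).mean(axis=0)
    return np.abs(ecdf - stats.norm(0, 1).cdf(z_grid)).max()


gaps = {N: max_cdf_gap(N) for N in (1, 5, 50, 200)}
for N, gap in gaps.items():
    print(f"N={N:>3}: max |F(Z_N=z) - Phi(z)| ~ {gap:.3f}")
```

The exponential distribution is used because it is clearly skewed, so the gap for $N=1$ is large and its shrinkage with growing $N$ is easy to see.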

**Theorem 2.12** <font color='red'>**De Moivre-Laplace Theorem**</font> 

Let $X \sim \mathrm{Bin}(n, p)$ be a random variable with the expected value $E(X)=\mu$ and the variance $V(X) = \sigma^2$. Then, for sufficiently large $n$, we can make the following approximation:
$$
F(X=x) = P(X \leq x) \approx \Phi\left(\frac{x-\mu}{\sigma}\right).
$$

**Remark**: The following condition can serve as a rule of thumb for the application of the de Moivre-Laplace theorem
$$
V(X) = n \cdot p \cdot (1-p) > 9.
$$
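
To get a feeling for this rule of thumb, the following sketch computes the largest absolute difference between the exact binomial CDF and its normal approximation for a few values of $n$ with $p = 0.5$. The helper `max_approximation_error` and the chosen values of $n$ are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def max_approximation_error(n, p):
    """Return V(X) and the largest |F_bin(x) - Phi((x - mu) / sigma)| over x in {0, ..., n}."""
    mean, var = n * p, n * p * (1 - p)
    x = np.arange(n + 1)
    exact = stats.binom(n, p).cdf(x)
    approx = stats.norm(0, 1).cdf((x - mean) / np.sqrt(var))
    return var, np.abs(exact - approx).max()


results = [(n, *max_approximation_error(n, 0.5)) for n in (10, 40, 100, 1000)]
for n, var, err in results:
    rule = "satisfied" if var > 9 else "violated"
    print(f"n={n:>4}: V(X)={var:6.1f} (rule {rule}), max error ~ {err:.4f}")
```

The error shrinks as the variance grows, which is consistent with the rule of thumb; how small an error is acceptable depends on the application.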

In the following, we use the [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) package to compare the actual CDF of the binomial distribution with the one approximated via the de Moivre-Laplace theorem.

In [None]:
def visualize_de_moive_laplace_theorem(n, p):
    """
    Compares the CDF of the binomial distribution with the
    approximation using the de Moivre-Laplace theorem.
    
    Parameters
    ----------
    n : int
        Positive number of trials within one binomial experiment.
    p : float in [0, 1]
        Success probability.
    """
    # Compute the expected value `mean` and the variance `var`.
    # BEGIN SOLUTION
    mean = n * p
    var = n*p*(1-p)
    # END SOLUTION
    
    # Create `x_bin` array containing the numbers {0, ..., n}.
    x_bin = np.arange(0, n+1) # <-- SOLUTION
     
    # Compute F(X=x) as `f_x_bin` with x in {0, ..., n} for X ~ Bin(n, p)
    f_x_bin = stats.binom(n, p).cdf(x_bin) # <-- SOLUTION
    
    # Create an array `x_norm` containing 10*n values evenly distributed across the interval [0, n].
    x_norm = np.linspace(0, n, 10*n) # <-- SOLUTION
    
    # Use the de Moivre-Laplace theorem to approximate `f_x_bin` via `f_x_norm`.
    f_x_norm = stats.norm(0, 1).cdf((x_norm-mean)/np.sqrt(var)) # <-- SOLUTION
    
    # Plot results.
    plt.bar(x_bin, f_x_bin, label=f'CDF of binomial distribution', color="blue", alpha=0.5)
    plt.plot(x_norm, f_x_norm, label=f'CDF of normal distribution', color="red")
    plt.xlabel('$x$')
    plt.ylabel('$F(X=x)$')
    # The raw string keeps LaTeX commands such as \sim intact.
    plt.title(
        r"$X \sim \mathrm{Bin}(" 
        + str(n) + "," 
        + str(p) 
        + ")$ with \n$E(X) =$" 
        + str(np.round(mean, 2)) 
        + ", \n$V(X) = $" 
        + str(np.round(var, 2))
    )
    plt.legend()
    plt.show()
    
# The function only takes `n` and `p`, so no slider for `N` is passed.
interactive(
    visualize_de_moive_laplace_theorem, 
    n=IntSlider(value=10, min=1, max=100),
    p=FloatSlider(value=0.5, min=0.0, max=1.0)
)

For the multivariate normal distribution, the covariance matrix $\boldsymbol{\Sigma}$ plays a critical role regarding the distribution's shape. For a better understanding, we use the [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html) package to study the influence of this matrix in the bivariate ($D=2$) case.

In [None]:
def visualize_bivariate_normal_distribution(mean_x, mean_y, var_x, var_y, cov_xy, N):
    """
    Visualizes the bivariate normal distribution for varying parameters.
    
    Parameters
    ----------
    mean_x : float
        Mean of the normal distribution in the x-dimension.
    mean_y : float
        Mean of the normal distribution in the y-dimension
    var_x : float
        Variance of the normal distribution in the x-dimension.
    var_y : float
        Variance of the normal distribution in the y-dimension.
    cov_xy : float
        Covariance of the normal distribution between the x- and y-dimension.
    N : int
        Positive number of observations to be drawn from the normal distribution.
    """
    # Create the `mu` vector as numpy.ndarray.
    mu = np.array([mean_x, mean_y]) # <-- SOLUTION
    
    # Create the `Sigma` matrix as numpy.ndarray.
    Sigma = np.array([[var_x, cov_xy], [cov_xy, var_y]]) # <-- SOLUTION
    
    # Print error message if `Sigma` is not positive definite.
    # BEGIN SOLUTION
    # A symmetric matrix is positive definite iff all its eigenvalues are strictly positive.
    if np.any(np.linalg.eigvalsh(Sigma) <= 0):
        print("Sigma is not positive definite.")
        return
    # END SOLUTION
    
    # Draw N observations `X_sampled` from the PDF f(X) with X ~ Norm(mu, Sigma).
    X_sampled = stats.multivariate_normal(mu, Sigma).rvs(N) # <-- SOLUTION
    
    # Plot sampled observations.
    plt.scatter(X_sampled[:, 0], X_sampled[:, 1], label="sampled observations")
    plt.xlim([-10, 10])
    plt.ylim([-10, 10])
    plt.xlabel('$x$')
    plt.ylabel('$y$')
    plt.title(
        r"$X \sim \mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})$"
    )
    plt.legend()
    plt.show()

interactive(
    visualize_bivariate_normal_distribution, 
    mean_x=FloatSlider(value=0, min=-4, max=4),
    mean_y=FloatSlider(value=0, min=-4, max=4),
    var_x=FloatSlider(value=1, min=0.1, max=4),
    var_y=FloatSlider(value=1, min=0.1, max=4),
    cov_xy=FloatSlider(value=0, min=-4, max=4),
    N=IntSlider(value=1000, min=2, max=10000)
)

#### **Questions:**
2. (f) How do the elements of the covariance matrix $\boldsymbol{\Sigma}$ affect the shape of the PDF of the normal distribution?

   BEGIN SOLUTION
   
   The diagonal entries are the variances in the $x$- and $y$-dimension and quantify the spread along these dimensions. The off-diagonal entries are the covariance and quantify the linear relationship between both dimensions.
   
   END SOLUTION
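
The role of the individual entries can also be seen from samples. The sketch below is an illustration with an arbitrarily chosen positive-definite $\boldsymbol{\Sigma}$, sample size, and seed: it draws observations from the bivariate normal distribution and compares the empirical covariance matrix and the correlation coefficient with their true counterparts.

```python
import numpy as np
from scipy import stats

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])  # example values; both eigenvalues are positive

# Draw observations; random_state fixes the seed for reproducibility.
X = stats.multivariate_normal(mu, Sigma).rvs(size=50000, random_state=0)

# Empirical covariance matrix (columns are the dimensions, 1/(N-1) estimator).
Sigma_est = np.cov(X, rowvar=False)

# Correlation: the covariance normalized by both standard deviations.
corr = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
corr_est = np.corrcoef(X, rowvar=False)[0, 1]

print("estimated Sigma:\n", np.round(Sigma_est, 2))
print("true correlation:", round(corr, 3), " estimated:", round(corr_est, 3))
```

A positive covariance tilts the point cloud along the rising diagonal, a negative one along the falling diagonal, and a covariance of zero leaves it axis-aligned.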
   
### **3. Probability Rules** <a class="anchor" id="probability-rules"></a>
Consider the following bivariate PMF $P(X, Y)$ of two discrete random variables $X$ and $Y$ with $X(\Omega) = \{x_1, x_2, x_3, x_4\}$ and $Y(\Omega) = \{y_1, y_2, y_3\}$:

| $P(X=x_i, Y=y_j)$      | $x_1$ | $x_2$ | $x_3$ | $x_4$ |
|------------------------|-------|-------|-------|-------|
| $y_1$                  | 0.01  | 0.2   | 0.1   | 0.1   |
| $y_2$                  | 0.05  | 0.05  | 0.07  | 0.2   |
| $y_3$                  | 0.1   | 0.03  | 0.05  | 0.04  |

#### **Questions:**
3. (a) How can we compute the marginal PMFs $P(X)$ and $P(Y)$?
    
    BEGIN SOLUTION
    
    We can use the Theorem 2.6 (Sum Rule) to specify both marginal PMFs:
    
    |            | $x_1$ | $x_2$ | $x_3$ | $x_4$     |
    |------------|-------|-------|-------|-----------|
    | $P(X=x_i)$ | 0.16  | 0.28  | 0.22  | 0.34  |
    
    |            | $y_1$ | $y_2$ | $y_3$     |
    |------------|-------|-------|-----------|
    | $P(Y=y_i)$ | 0.41  | 0.37  | 0.22      |
        
    END SOLUTION
    
   (b) How can we compute the conditional PMFs $P(X \mid Y=y_1)$ and $P(Y \mid X=x_3)$?
    
    BEGIN SOLUTION
    
    We can use the Theorem 2.8 (Bayes' Theorem) to specify both conditional PMFs (both tables contain approximated values):
    
    |                       | $x_1$ | $x_2$ | $x_3$ | $x_4$  |
    |-----------------------|-------|-------|-------|--------|
    | $P(X=x_i \mid Y=y_1)$ | 0.02  | 0.49  | 0.24  | 0.24   |
    
    |                       | $y_1$ | $y_2$ | $y_3$ |
    |-----------------------|-------|-------|-------|
    | $P(Y=y_i \mid X=x_3)$ | 0.45  | 0.32  | 0.23  |
    
    END SOLUTION
    
   (c) Are the random variables $X$ and $Y$ statistically independent?
    
    BEGIN SOLUTION
    
    According to Definition 2.15 (Statistical Independence), they are not independent because of 
    $$
    P(X=x_1, Y=y_1) = 0.01 \neq 0.0656 = P(X=x_1) \cdot P(Y=y_1).
    $$
    
    END SOLUTION
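
The computations for questions 3(a) to 3(c) can be reproduced with a few lines of NumPy. The following sketch is an illustration, with array names such as `P_xy` chosen here: it stores the joint PMF from the table above as a matrix, obtains the marginals by summing over one axis, the conditionals by dividing by the corresponding marginal, and checks independence by comparing the joint PMF with the outer product of the marginals.

```python
import numpy as np

# Joint PMF from the table above: rows are y_1..y_3, columns are x_1..x_4.
P_xy = np.array([
    [0.01, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.07, 0.20],
    [0.10, 0.03, 0.05, 0.04],
])

# Marginal PMFs: sum out the other variable.
P_x = P_xy.sum(axis=0)  # sum over y
P_y = P_xy.sum(axis=1)  # sum over x

# Conditional PMFs: joint probability divided by the marginal of the condition.
P_x_given_y1 = P_xy[0, :] / P_y[0]
P_y_given_x3 = P_xy[:, 2] / P_x[2]

# X and Y are independent iff P(X, Y) = P(X) * P(Y) holds for every entry.
independent = np.allclose(P_xy, np.outer(P_y, P_x))

print("P(X)        =", np.round(P_x, 2))
print("P(Y)        =", np.round(P_y, 2))
print("P(X | y_1)  =", np.round(P_x_given_y1, 4))
print("P(Y | x_3)  =", np.round(P_y_given_x3, 4))
print("independent:", independent)
```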