# 2.3 Independent Variables and Random Samples

<a target="_blank" href="https://colab.research.google.com/github/SaajanM/mat422-homework/blob/main/2.3%20Independent%20Variables%20and%20Random%20Samples/independent_vars_random_samples.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

In [None]:
# Install numpy and matplotlib in the current Jupyter kernel
import sys
!{sys.executable} -m pip install numpy
!{sys.executable} -m pip install matplotlib

In [2]:
# Import numpy and the plotting libraries
import numpy as np
import matplotlib
import matplotlib.pyplot as pyplt
from mpl_toolkits.mplot3d import Axes3D
import math

$\newcommand\norm[1]{\left\lVert#1\right\rVert}$

## Section 2.3.1 Joint Probability Distributions

Very rarely in data science do we only care about a single random variable. More often than not we would like to draw conclusions about multiple random variables in how they relate to each other. This is where joint probability distributions come into play.

### Section 2.3.1.1 Two Discrete Random Variables

We can naturally extend the probability mass function to two variables.

**Definition:** Let $X,Y$ be two discrete random variables defined on the sample space $S$ of an experiment. We define the **joint probability mass function** $p(x,y)$ to be
$$
p(x,y) = P(X=x\text{ and } Y=y)
$$

It follows from this definition that $p(x,y)\geq 0$ for every pair $(x,y)$ and that the values of $p(x,y)$ sum to 1 over all pairs: $\sum_x\sum_y p(x,y) = 1$.

From the joint distribution, we can recover information about each variable on its own using a **marginal distribution**. Namely, the marginal probability mass function for $X$ is
$$
p_X(x) = \sum_{y:p(x,y)>0}p(x,y)
$$
and respectively for $Y$,
$$
p_Y(y) = \sum_{x:p(x,y)>0}p(x,y)
$$
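
As a small illustration of the marginal sums above, the sketch below stores a joint pmf as a 2-D array (rows index $x$, columns index $y$) and recovers both marginals by summing along one axis. The table values are made up for the example and are not data from the text.

```python
import numpy as np

# Hypothetical joint pmf p(x, y): rows are x in {0, 1, 2}, columns are y in {0, 1}
joint_pmf = np.array([
    [0.10, 0.20],
    [0.30, 0.15],
    [0.05, 0.20],
])

# A valid pmf is non-negative and sums to 1
assert np.all(joint_pmf >= 0) and np.isclose(joint_pmf.sum(), 1.0)

# Marginal of X: sum over y (across columns); marginal of Y: sum over x (across rows)
p_x = joint_pmf.sum(axis=1)
p_y = joint_pmf.sum(axis=0)

print("p_X:", p_x)
print("p_Y:", p_y)
```

Each marginal is itself a pmf, so `p_x` and `p_y` should each sum to 1.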

### Section 2.3.1.2 Two Continuous Random Variables

Many of these concepts extend to continuous random variables, with integrals in place of sums. We first need a definition.

**Definition:** A **joint probability density function** for two continuous random variables $X,Y$ is any function $f(x,y)$ satisfying $f(x,y)\geq 0$ for all $x,y$ and $\int_{-\infty}^\infty\int_{-\infty}^\infty f(x,y) \,\text{d}x\,\text{d}y = 1$.

Then for any two-dimensional region $A$, the probability of $(X,Y)$ falling in $A$ is given by
$$
P((X,Y)\in A) = \int\int_A f(x,y) \text{d}x\text{d}y
$$

Essentially we are finding the volume under the surface $f(x,y)$ above the region $A$.

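The marginal distributions carry over almost unchanged from the discrete case, with integrals in place of sums:
$$
f_X(x) = \int_{-\infty}^\infty f(x,y) \,\text{d}y, \qquad f_Y(y) = \int_{-\infty}^\infty f(x,y) \,\text{d}x
$$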
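
To make the integrals concrete, here is a rough numerical sketch using only `numpy`. It assumes an example density $f(x,y) = x + y$ on the unit square (zero elsewhere), which is not from the text, and approximates the double integral with a midpoint Riemann sum on a grid.

```python
import numpy as np


def f(x, y):
    # Assumed example density: f(x, y) = x + y on [0, 1] x [0, 1], zero elsewhere
    return x + y


n = 400                                   # grid cells per axis
h = 1.0 / n                               # width of each cell
mid = (np.arange(n) + 0.5) * h            # midpoints of the cells in [0, 1]
X, Y = np.meshgrid(mid, mid, indexing="ij")
density = f(X, Y)

# Total probability over the whole square: should be close to 1
total = density.sum() * h * h

# P((X, Y) in A) for the region A = [0, 0.5] x [0, 0.5]
in_A = (X <= 0.5) & (Y <= 0.5)
prob_A = density[in_A].sum() * h * h

# Marginal density of X evaluated at the grid midpoints: integrate out y
f_X = density.sum(axis=1) * h

print("total probability:", total)
print("P((X, Y) in A):", prob_A)
```

For this particular density the integrals can also be done by hand, giving $P((X,Y)\in A) = 1/8$ and $f_X(x) = x + 1/2$, which is a useful check on the numerical values.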

### Section 2.3.1.3 Independent Random Variables

This is very similar to the concept of independent random events, but with probability density/mass functions.

**Definition:** Two random variables $X,Y$ are said to be independent if for every pair of $x,y$ values
$$
\begin{align*}
\textbf{(discrete)} & & p(x,y) = p_X(x) \cdot p_Y(y)\\
& \text{or} & \\
\textbf{(continuous)} & & f(x,y) = f_X(x) \cdot f_Y(y)
\end{align*}
$$

The code below simulates two independent d20 rolls and checks the definition empirically, up to sampling noise.


In [71]:
# Roll two independent d20s (faces encoded as 0..19) 100,000 times each
num_rolls = 100000
d20_set_one = np.random.choice(20, num_rolls)
d20_set_two = np.random.choice(20, num_rolls)
data = np.column_stack((d20_set_one, d20_set_two))


def joint(x, y):
    # Empirical joint pmf: fraction of rolls with first die == x and second die == y
    matches = (data[:, 0] == x) & (data[:, 1] == y)
    return matches.sum() / data.shape[0]


def marginal_x(x):
    res = 0
    for y in range(20):
        res += joint(x, y)
    return res


def marginal_y(y):
    res = 0
    for x in range(20):
        res += joint(x, y)
    return res


independent = True

for x in range(20):
    for y in range(20):
        independent = independent and np.allclose(
            joint(x, y), marginal_x(x) * marginal_y(y), atol=0.001
        )

print("Independent?: {}".format(independent))

Independent?: True
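
For contrast, here is a sketch of a pair that is not independent. It uses an exact pmf table rather than simulated rolls, and the variables are made up for illustration: $X$ is a fair coin flip and $Y$ is defined to equal $X$.

```python
import numpy as np

# Hypothetical example: X is a fair coin flip (0 or 1) and Y = X,
# so all probability mass sits on the diagonal of the joint pmf table.
joint_pmf = np.array([
    [0.5, 0.0],
    [0.0, 0.5],
])

p_x = joint_pmf.sum(axis=1)   # marginal of X
p_y = joint_pmf.sum(axis=0)   # marginal of Y

# Under independence the joint table would equal the outer product of the marginals
product_of_marginals = np.outer(p_x, p_y)

is_independent = np.allclose(joint_pmf, product_of_marginals)
print("Independent?: {}".format(is_independent))
```

Here every entry of the product table is $0.25$, while the joint table has entries $0.5$ and $0$, so the factorization fails.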


## Section 2.3.2 Correlation and Dependence

Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. Covariance is a measure of the joint variability of two random variables.

### Section 2.3.2.1 Correlation for Random Variables

When two variables $X,Y$ are not independent, we often want to measure how strongly they are related.

In the following section, we only deal with discrete random variables, but keep in mind that the majority of the concepts directly translate to continuous random variables via use of integrals over summation.

**Definition:** The **covariance** of $X,Y$, a measure of how the two variables vary together, is
$$
\text{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]
$$
where $\mu_X = E(X)$ and $\mu_Y = E(Y)$.

Notice that $\text{Cov}(X,X)=V(X)$.
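
The identity $\text{Cov}(X,X)=V(X)$ can be checked numerically with `np.cov`. This is only a sketch on made-up data, and it works with sample estimates (denominator $n-1$) rather than the population quantities in the definition.

```python
import numpy as np

rng = np.random.default_rng(0)           # seeded so the run is repeatable
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)        # y is built from x, so the two co-vary

# np.cov(x, y) returns the 2x2 sample covariance matrix
# [[Cov(x, x), Cov(x, y)], [Cov(y, x), Cov(y, y)]]
cov_matrix = np.cov(x, y)

# The diagonal entries are the sample variances (np.cov divides by n - 1 by default)
print("Cov(x, x):", cov_matrix[0, 0])
print("Var(x):   ", np.var(x, ddof=1))
print("Cov(x, y):", cov_matrix[0, 1])
```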

**Definition:** The **correlation coefficient** of $X$ and $Y$, denoted by $\text{Corr}(X,Y)$, $\rho_{X,Y}$, or just $\rho$, is defined by
$$
\rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X\sigma_Y}
$$

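Note that if $X,Y$ are independent, then $\rho=0$, but the converse is not true: $\rho=0$ does not imply independence.
This is because $\rho$ measures only the strength of the *linear* relationship between $X$ and $Y$.

$\rho$ always lies in $[-1,1]$, and $\rho=\pm 1$ exactly when $Y=aX+b$ for some constants $a\neq 0$ and $b$ (with probability 1).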
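
A standard counterexample shows that zero correlation does not imply independence. The sketch below assumes $X$ is uniform on $\{-1,0,1\}$ and $Y=X^2$ (an example chosen here, not taken from the text) and computes the covariance exactly from the pmf.

```python
import numpy as np

# Assumed example: X is uniform on {-1, 0, 1} and Y = X**2, so Y is fully determined by X
x_values = np.array([-1.0, 0.0, 1.0])
p = np.array([1 / 3, 1 / 3, 1 / 3])
y_values = x_values ** 2

mu_x = np.sum(p * x_values)
mu_y = np.sum(p * y_values)

# Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], computed exactly from the pmf of X
cov_xy = np.sum(p * (x_values - mu_x) * (y_values - mu_y))
print("Cov(X, Y):", cov_xy)

# Dependence check: P(X = 1, Y = 0) is 0 because X = 1 forces Y = 1,
# but P(X = 1) * P(Y = 0) = P(X = 1) * P(X = 0) = 1/9
p_joint = 0.0
p_product = p[2] * p[1]
print("p(1, 0) =", p_joint, "but p_X(1) * p_Y(0) =", p_product)
```

The covariance is zero (up to floating-point rounding), so $\rho=0$, yet the joint pmf does not factor into the product of the marginals.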

Below is code for calculating the correlation coefficient in `numpy`:

In [85]:
x = np.linspace(1, 100, num=100)
y = x + 2 * np.random.rand(100)

print(
    "Correlation Coefficient (expected close to 1): {}".format(np.corrcoef(x, y)[0, 1])
)

Correlation Coefficient (expected close to 1): 0.9997595641567226


### Section 2.3.2.2 Correlation For Samples

When applied to samples, the correlation coefficient is often written as $r_{xy}$ or $r$.
For paired data $\{(x_1,y_1),\dots,(x_n,y_n)\}$ we find that $r_{xy} = \frac{s_{xy}}{s_x s_y}$ where the sample covariance
$$
s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i-\overline{x})(y_i-\overline{y})
$$
and the sample standard deviation (with $s_y$ defined analogously)
$$
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (x_i-\overline{x})^2}
$$
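
The formulas above translate directly into `numpy`. The sketch below computes $r_{xy}$ by hand on made-up noisy linear data and compares it with `np.corrcoef`; the data and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 100, num=100)
y = x + 20 * rng.random(100)             # noisy linear relationship (made-up data)

n = len(x)
x_bar, y_bar = x.mean(), y.mean()

# Sample covariance and sample standard deviations, all with the n - 1 denominator
s_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)
s_x = np.sqrt(np.sum((x - x_bar) ** 2) / (n - 1))
s_y = np.sqrt(np.sum((y - y_bar) ** 2) / (n - 1))

r_xy = s_xy / (s_x * s_y)
print("r_xy by hand:", r_xy)
print("np.corrcoef: ", np.corrcoef(x, y)[0, 1])
```

The two values should agree up to floating-point rounding, since the $\frac{1}{n-1}$ factors cancel in the ratio.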

## Section 2.3.3.1 Random Samples

A **simple random sample** is a randomly selected subset of a population. Random samples are used often in practice because we can rarely obtain complete information about a population.

**Definition:** The random variables $X_1,\dots, X_n$ are said to form a random sample of size $n$ if all $X_i$ are independent random variables and they all share the same probability distribution.

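For a random sample from a population with mean $\mu$ and variance $\sigma^2$, the sample mean $\overline X = \frac{1}{n}\sum_{i=1}^n X_i$ has expected value $E(\overline X) = \mu$, the population mean, and variance $V(\overline X) = \sigma^2/n$, the population variance divided by $n$.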
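
These two facts about $\overline X$ can be checked by simulation. The sketch below assumes the population is a fair d20 with faces 1 to 20, draws many random samples of size $n$, and compares the mean and variance of the resulting sample means with $\mu$ and $\sigma^2/n$. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed population: a fair d20 with faces 1..20
faces = np.arange(1, 21)
mu = faces.mean()                         # population mean
sigma2 = faces.var()                      # population variance (denominator N)

n = 25                                    # size of each random sample
num_samples = 20000                       # number of repeated samples

# Each row is one random sample of size n; take the mean of every row
samples = rng.integers(1, 21, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print("mean of sample means:    ", sample_means.mean(), "vs mu =", mu)
print("variance of sample means:", sample_means.var(), "vs sigma^2 / n =", sigma2 / n)
```

The simulated values will not match exactly, but they should get closer as `num_samples` grows.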

### Section 2.3.3.2 Central Limit Theorem

Given a random sample $X_1,\dots, X_n$ from a distribution with mean $\mu$ and finite variance $\sigma^2$, the central limit theorem says that for large $n$ the sample mean $\overline X$ is approximately normally distributed with mean $\mu$ and variance $\sigma^2/n$.

The approximation gets better as $n$ increases.
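
One way to see the approximation improve is to standardize simulated sample means and compare them against a known property of the standard normal. The sketch below assumes a strongly skewed population (exponential with rate 1, so mean 1 and standard deviation 1) and reports the fraction of standardized sample means within one standard deviation of zero, which is about 0.683 for a standard normal. The population, sample sizes, and seed are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1
mu, sigma = 1.0, 1.0
num_samples = 20000                       # number of repeated samples per n

for n in (2, 10, 100):
    # Each row is one random sample of size n; take the mean of every row
    sample_means = rng.exponential(scale=1.0, size=(num_samples, n)).mean(axis=1)
    # Standardize with the population mean and the standard deviation sigma / sqrt(n)
    z = (sample_means - mu) / (sigma / np.sqrt(n))
    # Fraction of standardized means within one standard deviation of zero
    within_one = np.mean(np.abs(z) <= 1)
    print("n = {:3d}: fraction with |z| <= 1 is {:.3f}".format(n, within_one))
```

A histogram of `z` for each `n` would show the same trend visually: the skew of the exponential fades as `n` grows.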