# ⚠️ EDIT "OPEN IN COLAB" BADGE PRIOR TO DOING ASSIGNMENT

<a target="_blank" href="https://colab.research.google.com/github/BenjaminHerrera/MAT422/blob/main/HW_2.3.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# HW 2.3
# Benjamin Herrera
# 29 SEP 2024

# ⚠️ Run these commands prior to running anything

In [1]:
!pip install scipy
!pip install matplotlib
!pip install numpy
!pip install pandas



## 💪 Joint Probability Distributions

A joint probability is the likelihood that two or more events happen at the same time. A joint probability distribution gives this likelihood for every combination of values of two or more random variables. For example, suppose we roll a die twice. Let the random variable $X$ be the number of rolls that show a $3$, and let the random variable $Y$ be the sum of the two rolls. The joint probability distribution of $X$ and $Y$ assigns a probability to every pair of values $(x, y)$.

The joint probability mass function (PMF) for this example takes two arguments:

$$p(x, y) = P(X=x \textrm{ and } Y = y)$$

where $x$ and $y$ are possible values of $X$ and $Y$
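As a rough illustration, the dice example (with $X$ the number of rolls showing a $3$ and $Y$ the sum of two rolls) can be tabulated by enumerating all 36 equally likely outcomes. This is only a sketch: it assumes fair six-sided dice, and the names `outcomes`, `counts`, and `joint` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice (assumption)
outcomes = list(product(range(1, 7), repeat=2))

# X = number of rolls showing a 3, Y = sum of the two rolls
counts = Counter(((a == 3) + (b == 3), a + b) for a, b in outcomes)

# Joint PMF: p(x, y) = (number of outcomes with X = x and Y = y) / 36
joint = {xy: Fraction(n, len(outcomes)) for xy, n in counts.items()}

print(joint[(2, 6)])        # both rolls are 3
print(joint[(1, 5)])        # the outcomes (2, 3) and (3, 2)
print(sum(joint.values()))  # a valid PMF sums to 1
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that the table sums to $1$.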

We can also isolate the probability mass function of a single variable, which is called the marginal PMF. For example, if we just want to focus on $X$, we sum the joint PMF over every value of $y$:

$$p_x(x) = \sum_{y:\, p(x, y) > 0} p(x, y)$$

and for $Y$, we sum over every value of $x$:

$$p_y(y) = \sum_{x:\, p(x, y) > 0} p(x, y)$$

Here's a code example of the PMF for a discrete joint probability distribution:

In [4]:
# Import libraries
import numpy as np
import pandas as pd

# Define X and Y
data = pd.DataFrame({
    'X': [1, 2],
    'Y': [1, 2]
})

# Build the joint frequency table (count of each (X, Y) pair)
joint_freq = pd.crosstab(data['X'], data['Y'])

# Convert frequency to joint probability by dividing by total number of samples
joint_pmf = joint_freq / joint_freq.sum().sum()

# Show the PMF matrix of the two variables
print("Joint PMF Matrix:")
print(joint_pmf)


Joint PMF Matrix:
Y    1    2
X          
1  0.5  0.0
2  0.0  0.5


Here's a code example for the marginal distribution of X

In [5]:
# Calculate the marginal distribution of X
marginal_p_x = joint_pmf.sum(axis=1)

# Print the marginal distribution of X
x_values = [1, 2]
marginal_distribution_x = dict(zip(x_values, marginal_p_x))
print("Marginal Distribution of X:")
for x, prob in marginal_distribution_x.items():
    print(f"Px({x}) = {prob:.4f}")

Marginal Distribution of X:
Px(1) = 0.5000
Px(2) = 0.5000
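The marginal distribution of $Y$ follows the same pattern with the other axis. Here is a self-contained sketch that computes both marginals at once. The data values are hypothetical (not from the cells above), chosen so the marginals are not all equal, and `normalize="all"` is used so that `pd.crosstab` returns probabilities directly.

```python
import pandas as pd

# Hypothetical paired observations (illustrative only)
data = pd.DataFrame({
    "X": [1, 1, 2, 2, 2],
    "Y": [1, 2, 1, 2, 2],
})

# normalize="all" divides every count by the total number of observations
joint_pmf = pd.crosstab(data["X"], data["Y"], normalize="all")

marginal_p_x = joint_pmf.sum(axis=1)  # sum over y for each x
marginal_p_y = joint_pmf.sum(axis=0)  # sum over x for each y

print("Marginal Distribution of X:")
print(marginal_p_x)
print("Marginal Distribution of Y:")
print(marginal_p_y)
```

Summing along `axis=1` collapses the columns (the $y$ values), and summing along `axis=0` collapses the rows (the $x$ values).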


For continuous random variables, we replace the sums with integrals, nested inside one another, one for every random variable we use. The function being integrated is called a joint density function. To define this, we describe a rectangle $A = \{(x, y) : a_x \leq x \leq b_x, a_y \leq y \leq b_y\}$:

$$P((X, Y) \in A) = \iint_A f(x,y) \, dx \, dy = \int_{a_x}^{b_x}\int_{a_y}^{b_y} f(x, y) \, dy \, dx$$

This describes a smooth distribution over two random variables. The formula above works for two variables; each additional variable adds another integral. To cover every possible outcome, the bounds usually run from negative infinity to infinity, and the density integrates to $1$ over that whole range.
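As a sketch of the double integral in practice, `scipy.integrate.dblquad` can evaluate it numerically. The density $f(x, y) = x + y$ on the unit square is an assumption chosen for illustration, since it integrates to $1$ there.

```python
from scipy.integrate import dblquad

def f(y, x):
    """Assumed joint density f(x, y) = x + y on the unit square."""
    return x + y

# dblquad expects func(y, x): the outer variable x runs over [a, b],
# and the inner variable y runs over [gfun(x), hfun(x)].
total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)
prob, _ = dblquad(f, 0, 0.5, lambda x: 0, lambda x: 0.5)

print(f"Integral over the unit square: {total:.4f}")
print(f"P(0 <= X <= 0.5, 0 <= Y <= 0.5): {prob:.4f}")
```

Note the argument order: `dblquad` passes the inner variable first, so the function is written as `f(y, x)`.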

What if two (or more) variables are independent? In other words, what if knowing the value of $Y$ tells us nothing about the value of $X$? Then the joint PMF of two discrete variables factors into the product of the marginals:

$$p_{X, Y}(x, y) = p_x(x) \cdot p_y(y)$$

and for continuous variables:

$$f_{X, Y}(x, y) = f_x(x) \cdot f_y(y)$$

In [17]:
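# Using the joint PMF matrix above: if X and Y were independent, then
# P_{X, Y}(1, 1) would be the product of the marginals p_x(1) * p_y(1)
p_x_1 = joint_pmf.sum(axis=1).loc[1]  # marginal p_x(1)
p_y_1 = joint_pmf.sum(axis=0).loc[1]  # marginal p_y(1)
print(p_x_1 * p_y_1)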

0.25
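The product rule also gives a way to test whether a joint PMF table describes independent variables: every cell must equal the product of its row and column marginals. The helper below is a minimal sketch; `is_independent` is a hypothetical name, and the second table is made up by multiplying two assumed marginals.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """Return True if every cell of the joint PMF equals p_x(x) * p_y(y)."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)  # marginal of X (row sums)
    p_y = joint.sum(axis=0)  # marginal of Y (column sums)
    return bool(np.allclose(joint, np.outer(p_x, p_y), atol=tol))

# The joint PMF matrix printed earlier: X and Y always take the same value
dependent = [[0.5, 0.0], [0.0, 0.5]]

# Hypothetical table built as a product of marginals (0.4, 0.6) and (0.3, 0.7)
independent = np.outer([0.4, 0.6], [0.3, 0.7])

print(is_independent(dependent))
print(is_independent(independent))
```

For the matrix printed earlier, the actual joint probability $p(1, 1) = 0.5$ differs from the product $0.25$, so those two variables are not independent.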


## 🤵 Correlation and Independence

When two random variables are related, we can use their joint distribution to measure how they vary together. This measure is called the covariance and is defined via this equation:

$$Cov(X, Y) = \mathop{\mathbb{E}}[(X - \mu_x) (Y - \mu_y)]$$

For discrete random variables, we can further specify it to:

$$Cov(X, Y) = \sum_x \sum_y (x - \mu_x)(y-\mu_y)p(x, y)$$

For continuous ones, we can define:

$$Cov(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x-\mu_x)(y-\mu_y)f(x,y) \, dx \, dy$$

With this, we can define a correlation coefficient between $X$ and $Y$ via the following definition:

$$Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_x \cdot \sigma_y}$$

The value of Corr always falls in $[-1, 1]$. If $X$ and $Y$ are independent, then Corr is $0$, but the converse does not hold: a Corr of $0$ does not imply independence. Corr equals $1$ or $-1$ exactly when there is a linear relationship $Y = aX + b$ for some scalars $a \neq 0$ and $b$.
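A correlation of zero does not by itself imply independence. The sketch below evaluates the discrete covariance and correlation formulas on a hypothetical joint PMF where $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is completely determined by $X$ yet the covariance vanishes. All names and values here are assumptions for illustration.

```python
import numpy as np

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X**2
x_vals = np.array([-1.0, 0.0, 1.0])
y_vals = np.array([0.0, 1.0])
joint = np.array([
    [0.0, 1 / 3],  # X = -1 -> Y = 1
    [1 / 3, 0.0],  # X =  0 -> Y = 0
    [0.0, 1 / 3],  # X =  1 -> Y = 1
])

# Marginals and means
p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
mu_x, mu_y = x_vals @ p_x, y_vals @ p_y

# Cov(X, Y) = sum_x sum_y (x - mu_x)(y - mu_y) p(x, y)
cov = (x_vals - mu_x) @ joint @ (y_vals - mu_y)

# Standard deviations from the marginals, then Corr = Cov / (sigma_x * sigma_y)
sigma_x = np.sqrt(((x_vals - mu_x) ** 2) @ p_x)
sigma_y = np.sqrt(((y_vals - mu_y) ** 2) @ p_y)
corr = cov / (sigma_x * sigma_y)

print(f"Cov = {cov:.4f}, Corr = {corr:.4f}")

# Dependence check: p(0, 0) is not equal to p_x(0) * p_y(0)
print(f"p(0, 0) = {joint[1, 0]:.4f}, p_x(0) * p_y(0) = {p_x[1] * p_y[0]:.4f}")
```

The covariance comes out to zero because the contributions from $X = -1$ and $X = 1$ cancel, even though the joint PMF does not factor into its marginals.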

When we apply this correlation coefficient to a sample of paired observations $(x_i, y_i)$, we can represent its correlation coefficient (Pearson CC) as:

$$PCC_{xy} = \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}}$$

Here $\bar{x}$ and $\bar{y}$ are the sample means of the $x_i$ and $y_i$.
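The Pearson formula translates almost line for line into NumPy. The sketch below uses a small hypothetical sample and compares the result against `np.corrcoef` as a sanity check. The function name `pearson_cc` is illustrative.

```python
import numpy as np

def pearson_cc(x, y):
    """Sample Pearson correlation coefficient, following the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()  # deviations from the sample means
    return (dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum()))

# Hypothetical paired sample (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_cc(x, y)
print(f"PCC = {r:.4f}")
print(f"np.corrcoef = {np.corrcoef(x, y)[0, 1]:.4f}")
```

For this sample the numerator is $6$ and the denominator is $\sqrt{10} \cdot \sqrt{6}$, so both lines should agree at roughly $0.7746$.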

## 🎲 Random Samples

Unlike a single random variable, a random sample is a set of random variables $X_1, X_2, \ldots, X_n$, each representing one draw from the population. This set of random variables must satisfy two conditions. The first is that all of the random variables are independent. The second is that all of them have the same probability distribution. This also means that every $X_i$ has the same mean and variance as the population.

The above notions play in well with the Central Limit Theorem. Imagine that we try to figure out the nation's stance on voting for Kamala Harris or Donald Trump for President. We can't go knock on every door and ask each person what their stance is. That type of sampling would take forever! So, we sample $n$ people from a city and see what they say [1]. Because the responses form a random sample, the Central Limit Theorem tells us that for large $n$ the sample mean is approximately normally distributed around the population mean, so the sampled statistics should approximate those of the population as a whole.

[1] We assume that the stratified sampling strategy balances all aspects of the demographics in that city.
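A small simulation can make the polling idea concrete. The sketch below assumes a made-up true support level `p`, draws many independent polls of `n` people each, and compares the spread of the sample means with the value the Central Limit Theorem predicts, $\sqrt{p(1-p)/n}$. All numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

p = 0.52        # assumed true share of voters favoring one candidate
n = 1_000       # people per poll
trials = 5_000  # number of repeated polls

# Counting "yes" answers among n independent Bernoulli(p) responses
# is one Binomial(n, p) draw, so each entry is one poll's sample mean.
sample_means = rng.binomial(n, p, size=trials) / n

predicted_std = np.sqrt(p * (1 - p) / n)

print(f"Mean of sample means: {sample_means.mean():.4f} (true p = {p})")
print(f"Std of sample means:  {sample_means.std(ddof=1):.4f} "
      f"(CLT prediction = {predicted_std:.4f})")
```

Plotting a histogram of `sample_means` with matplotlib should show the familiar bell shape centered near `p`.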