# ⚠️ EDIT "OPEN IN COLAB" BADGE PRIOR TO DOING ASSIGNMENT

<a target="_blank" href="https://colab.research.google.com/github/BenjaminHerrera/MAT422/blob/main/HW_2.2.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# HW 2.2
# Benjamin Herrera
# 22 SEP 2024

# ⚠️ Run these commands prior to running anything

In [2]:
!pip install scipy
!pip install matplotlib
!pip install numpy



## 🪓 Probability Axioms

Before moving further, let's discuss **sample spaces** and **events**. One can think of a sample space as the set of all possible outcomes of an experiment. Loosely, this is like $span(V)$ for some set of vectors $V$: it contains everything that can be reached. Another example would be the possible selections of a single card from a deck of cards. An event, on the other hand, is a subset of the sample space, that is, a collection of outcomes. In other words, an event occurs when the outcome of the experiment lands in that subset.
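
To make the card example concrete, here is a minimal sketch that builds the sample space as a Python set and an event as a subset of it. It assumes a standard 52-card deck with equally likely draws; the names `sample_space`, `hearts`, and `prob` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (rank, suit) pair in a standard 52-card deck (assumed)
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
sample_space = set(product(ranks, suits))

# An event is a subset of the sample space: here, "the card is a heart"
hearts = {card for card in sample_space if card[1] == "hearts"}


def prob(event):
    # With equally likely outcomes, P(event) = |event| / |sample space|
    return Fraction(len(event), len(sample_space))


print(len(sample_space), len(hearts), prob(hearts))
```

Using `Fraction` keeps the probabilities exact, which avoids floating-point noise when comparing them later.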

We denote the probability of an event $A$ occurring in the sample space $S$ as $P(A)$. Here are some properties of this notion based on Definition 2.2.3 in the book:

* $1 \geq P(A) \geq 0$
* $P(S) = 1$
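* If $A_1, A_2, \dots$ are mutually exclusive, then $P(A_1 \cup A_2 \cup \dots) = \sum_{i=1}^{\infty} P(A_i)$
* $P(A') = 1 - P(A)$, so $P(A) + P(A') = 1$
* $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, which reduces to $P(A) + P(B)$ when $A$ and $B$ are mutually exclusive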

Let's get an idea of this notion:

In [3]:
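# Define the sample space and four events. Each event represents the
#    probability that player 1 to 4 wins a grand prix. Only one player can
#    win, so the events are mutually exclusive
S = {
    "A1": 0.2,
    "A2": 0.3,
    "A3": 0.15,
    "A4": 0.25
}

# Show that the first point holds: every probability lies in [0, 1]
if all(0 <= S[i] <= 1 for i in S):
    print("1st point passes!")
else:
    print("1st point fails!")

# Show the third point: the events are mutually exclusive, so the
#    probability of their union is the sum of their probabilities
total = sum(S.values())
print(f"3rd point: P(A1 union A2 union A3 union A4) = {total:.2f}")

# Show the second point. The four events do not cover S on their own, so we
#    ASSUME the remaining probability belongs to "some other player wins"
other = 1 - total
print(f"2nd point passes! P(S) = {total:.2f} + {other:.2f} = {total + other:.2f}")

# Show that the fourth point holds:
selected_event = "A1"
target = S[selected_event]
not_selected_event = 1 - target
print(f"4th point passes! P(A) + P(A') = {target + not_selected_event}")

# Show that the fifth point holds. A1 and A2 cannot both happen, so
#    P(A intersect B) = 0
A = "A1"
B = "A2"
A_value = S[A]
B_value = S[B]
A_and_B = 0
print(f"5th point passes! P(A) + P(B) = {A_value + B_value}")
print(f"                  P(A) + P(B) - P(A intersect B) = {A_value + B_value - A_and_B}")

1st point passes!
3rd point: P(A1 union A2 union A3 union A4) = 0.90
2nd point passes! P(S) = 0.90 + 0.10 = 1.00
4th point passes! P(A) + P(A') = 1.0
5th point passes! P(A) + P(B) = 0.5
                  P(A) + P(B) - P(A intersect B) = 0.5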
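
The events in the cell above never overlap, so the intersection term in the last property is always zero there. As a rough sketch of the overlapping case, assume a fair six-sided die (this example is not from the assignment) with the events "roll is even" and "roll is greater than 3":

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die
die = {1, 2, 3, 4, 5, 6}

# Two overlapping events (they share the outcomes 4 and 6)
even = {x for x in die if x % 2 == 0}
high = {x for x in die if x > 3}


def prob(event):
    # Equally likely outcomes: P(event) = |event| / |sample space|
    return Fraction(len(event), len(die))


# Inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B)
union_direct = prob(even | high)
union_formula = prob(even) + prob(high) - prob(even & high)
print(union_direct, union_formula)

# Complement rule: P(A') = 1 - P(A)
print(prob(die - even), 1 - prob(even))
```

Without subtracting $P(A \cap B)$, the outcomes 4 and 6 would be counted twice and the sum would equal 1 instead of $2/3$.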


## 🛂 Conditional Probability

Conditional probability is the probability that an event happens given that another event has occurred. For an event $A$ and an event $B$ with $P(B) > 0$, we define the conditional probability of $A$ given $B$ as:

$$P(A | B) = \frac{P(A \cap B)}{P(B)}$$

For example, let's say that event $A$ is Sal drinking a beer and event $B$ is Sal taking an Uber. We first find the probability of both $A$ and $B$ happening. Of course the two tend to go together, because Sal is a good man and he knows that he can't drive to the bar or else he will drive drunk. Afterwards, we take the probability of $B$ happening. Over the entire sample space $S$, suppose $P(A \cap B) = 0.5$ and $P(B) = 0.7$. Therefore, $P(A | B) = 0.5 / 0.7 \approx 0.71$.

Likewise, we can also isolate $P(A \cap B)$ via

$$P(A \cap B) = P(A | B) \cdot P(B)$$
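
The two formulas above can be checked with a few lines of Python. This is only a sketch using the Sal numbers ($P(A \cap B) = 0.5$, $P(B) = 0.7$); the helper name `conditional` is made up for illustration.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    # P(A | B) = P(A n B) / P(B), which is only defined when P(B) > 0
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


# Values from the Sal example, written as exact fractions
p_a_and_b = Fraction(1, 2)   # P(A n B) = 0.5
p_b = Fraction(7, 10)        # P(B) = 0.7

p_a_given_b = conditional(p_a_and_b, p_b)
print(p_a_given_b, round(float(p_a_given_b), 2))

# Multiplication rule: P(A | B) * P(B) recovers P(A n B)
print(p_a_given_b * p_b)
```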

## 🎲 Discrete Random Variables

A random variable is a function that maps each outcome in a sample space to a real number. For example:

In [5]:
# Import the random library
import random

# Let's say that each index of this below list is a unique ID of some 
#   members in the triple-comma club (billionaire's club). The value of each
#   index is the number of Brabus cars AND Rolls Royce cars they own. 
# There are roughly 3,194 billionaires in the world
S_to_R = [random.randint(0, 100) for _ in range(3_194)]

# Show the first five billionaires' luxury car counts
print(S_to_R[:5])

[22, 5, 32, 48, 90]


Obviously, there's no actual billionaires' club (that we know of just yet), but you can see that at index 0, billionaire ID 0 has 22 Brabuses and Rolls Royces. We have mapped some random sample space to the real numbers.

Now we come to the idea of a probability mass function (PMF). A PMF is a function that gives the probability that a discrete random variable takes a particular real value. As an example, let $X$ be the number of luxury cars owned by billionaire ID 0, which is always 22. If we were to plot the PMF of $X$, the bar at the x-axis value of 22 would have height 1.0 while every other value would be 0. The notation is:

$$p(x) = P(X = x)$$
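
For a PMF with more than one nonzero value, one option is to pick a club member uniformly at random and let $X$ be that member's car count. The sketch below estimates such a PMF by counting; the eight car counts are hypothetical and hard-coded so the result is reproducible.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical car counts for eight club members (not real data)
car_counts = [22, 5, 32, 5, 22, 22, 48, 5]


def pmf(values):
    # p(x) = (number of members with value x) / (total number of members)
    counts = Counter(values)
    n = len(values)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}


p = pmf(car_counts)
for x, px in p.items():
    print(f"p({x}) = {px}")

# A valid PMF always sums to 1
print(sum(p.values()))
```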

On the other hand, the cumulative distribution function (CDF) is the probability that the random variable takes a value less than or equal to $x$. In other words:

$$F(x) = P(X \leq x)$$

If we apply this idea to the triple-comma club example, where $X$ is always 22 for ID 0, then $F(5) = P(X \leq 5) = 0$ because 22 is greater than 5, while $F(22) = P(X \leq 22) = 1$.

This also brings in the idea of expected value (mean). As the name suggests, the expected value is the sum of every value the random variable can take, each multiplied by its probability:

$$E(X) = \sum_x x \cdot p(x)$$

Returning to our triple-comma club example, ID 0's car count is 22 with probability 1.0, so the expected value is $22 \cdot 1.0 = 22$.
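
As a small sketch of both ideas, the code below computes a CDF value and an expected value directly from a PMF stored as a dictionary. The PMF values are hypothetical, and the function names `cdf` and `expected_value` are made up for illustration.

```python
from fractions import Fraction

# Hypothetical PMF of a car count X (value -> probability)
pmf = {
    5: Fraction(3, 8),
    22: Fraction(3, 8),
    32: Fraction(1, 8),
    48: Fraction(1, 8),
}


def cdf(pmf, x):
    # F(x) = P(X <= x): add up p(k) for every value k that is at most x
    return sum(p for k, p in pmf.items() if k <= x)


def expected_value(pmf):
    # E(X) = sum over x of x * p(x)
    return sum(k * p for k, p in pmf.items())


print(cdf(pmf, 4), cdf(pmf, 22), cdf(pmf, 48))
print(expected_value(pmf), float(expected_value(pmf)))

# Degenerate case: X is always 22, so E(X) = 22
print(expected_value({22: Fraction(1)}))
```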

## 🌊 Continuous Random Variables

The concepts of random variables also apply to continuous ones. However, their definitions change a little. For starters, the probability density function (PDF) $f(x)$ of a continuous random variable $X$ is the function that satisfies:

$$P(a \leq X \leq b) = \int_a^b f(x)dx$$

For expected value, it is simply defined as:

$$E(X) = \int_{-\infty}^{\infty} x \cdot f(x) dx$$
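
Both integrals can be approximated numerically. The sketch below assumes an exponential density $f(x) = e^{-x}$ for $x \geq 0$ (chosen only because its exact answers are easy to check) and uses a simple midpoint rule; the infinite upper limit is truncated at 50, where the density is negligible.

```python
import math


def f(x):
    # Assumed example PDF: exponential density with rate 1
    return math.exp(-x) if x >= 0 else 0.0


def integrate(g, a, b, n=100_000):
    # Midpoint rule: sum g at the middle of n equal-width slices of [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


# P(0 <= X <= 1) is the area under f between 0 and 1
p_0_1 = integrate(f, 0.0, 1.0)
print(p_0_1, 1 - math.exp(-1))  # numerical vs. exact value

# E(X) is the integral of x * f(x); f is zero below 0, so start there
mean = integrate(lambda x: x * f(x), 0.0, 50.0)
print(mean)  # should be close to the exact mean of 1
```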

Now that we are working with continuous random variables, we can define the normal distribution. A normal distribution is a hill-shaped distribution that often serves as a close approximation of real-world distributions that are unknown. It has two parameters: the population mean $\mu$ and the variance $\sigma^2$. When a random variable $X$ follows a normal distribution, we denote this as:

$$X \sim N(\mu, \sigma^2)$$
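
Python's standard library has a `statistics.NormalDist` class that can illustrate this notation. Note that it is parameterized by the standard deviation $\sigma$ rather than the variance $\sigma^2$. The values $\mu = 100$ and $\sigma = 15$ below are arbitrary choices for this sketch.

```python
from statistics import NormalDist, fmean

# X ~ N(mu, sigma^2) with assumed mu = 100 and sigma = 15 (variance 225)
X = NormalDist(mu=100, sigma=15)
print(X.mean, X.variance)

# P(mu - sigma <= X <= mu + sigma) from the CDF, roughly 68%
within_one_sigma = X.cdf(115) - X.cdf(85)
print(round(within_one_sigma, 4))

# Draw samples; the sample mean should land near mu
samples = X.samples(10_000, seed=422)
sample_mean = fmean(samples)
print(sample_mean)
```

The seed is fixed only so that repeated runs give the same samples.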

Applications of this are useful in the social sciences and machine learning. For example, many dataset distributions, with regard to the diversity of their classes, can be seen as normal distributions if they were sampled from real-world sources. Data pre-processing steps would likely try to even out this distribution to get more general performance across all classes. However, No-Free-Lunch applies in this case: no single pre-processing choice is best for every problem.