# Probability

Probability is a measure of how likely or unlikely something is to happen.

- It is a way of quantifying uncertainty about an event by assigning it a value ranging from 0 to 1. 
- If an outcome is impossible, its probability is 0. If an outcome is certain, its probability is 1. 
- The closer a probability is to 1, the more likely the event is to happen.

### The Events we Measure

Before you compute a probability, you list all the possible outcomes. This set of outcomes is called the **sample space**.

An event is a subset of the sample space 'S'.

You can denote events with capital letters such as A, B, C, and D.

For example:

- If you roll a single die, the sample space is S = {1, 2, 3, 4, 5, 6}.
- The event "roll an odd number" is A = {1, 3, 5}.
- The event "roll an even number" is B = {2, 4, 6}.
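
The die example above can be written directly with Python sets. This is a minimal sketch; the variable names are chosen here for illustration and simply mirror the letters used in the text.

```python
# Sample space for one roll of a six-sided die
S = {1, 2, 3, 4, 5, 6}

# Events are subsets of S, built here with set comprehensions
A = {x for x in S if x % 2 == 1}   # odd outcomes
B = {x for x in S if x % 2 == 0}   # even outcomes

# The <= operator on sets tests "is a subset of"
print("A is a subset of S:", A <= S)
print("B is a subset of S:", B <= S)
```

Because an event is just a subset of the sample space, any set that passes the `<= S` check is a valid event for this experiment.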

**Types of events:**

- **Discrete Events:** Discrete events are those with a countable number of possible outcomes.
- **Continuous Events:** Continuous events are those whose outcomes can take any value within a range, so the possible values are uncountably infinite (e.g., formant values for various speech signals, heights of people at various ages, etc.).


**Manipulating Events:**

You can manipulate events by using the following: 
- Unions
- Intersections
- Complements


In [224]:
# When you do Union, you combine both sets into one:

A = {2, 4, 6}  # Set A
B = {1, 2, 3}  # Set B

# Union of A and B
AUB = A.union(B)
# or
AUB = A|B

AUB


{1, 2, 3, 4, 6}

In [226]:
# When you do intersection, you get the outcomes common to both sets.


A = {2, 4, 6}  # Set A
B = {1, 2, 3}  # Set B

# Intersection of A and B
AIB = A.intersection(B)
# or
AIB = A&B

AIB


{2}

In [232]:
# When you take the complement of a set, you get the elements of the universal set that are not in that set.
# Here 'U' is the universal set that A is compared against.

U = {1, 2, 3, 4, 5, 6}  # Universal set
A = {2, 4, 6}  # Set A

# Complement of A with respect to U
A_comp = U.difference(A)
# or (note: U ^ A is the symmetric difference, which matches the complement only when A is a subset of U)
A_comp = U - A

A_comp


{1, 3, 5}

# Types of Probabilities

- **Marginal Probability**: The probability of a single event happening, regardless of other events.  
  Example: P(A), the probability of rolling a 3 on a die, is  $\frac{1}{6}$.

- **Union Probability**: The probability that either of two events (or both) happen.  
  Example: P(AUB), rolling an even number or a number less than 4 on a die, is $\frac{5}{6}$.

- **Intersection (Joint) Probability**: The probability that two events happen simultaneously.  
  Example: P(A Int B), rolling a number that is both even and less than 4, is $\frac{1}{6}$.

- **Conditional Probability**: The probability of one event happening given that another event has already occurred.  
  Example: P(A | B), probability of rolling a 6 given that you rolled an even number, is $\frac{1}{3}$.

- **Probability of a Complement**: The probability that an event does **not** happen.  
  Example: P(A_comp), the probability of **not** rolling a 3 on a die, is $\frac{5}{6}$.
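
The five die examples above can be checked by counting outcomes. The sketch below assumes a fair six-sided die; the helper `prob` and the event names are introduced here for illustration only.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space of a fair die

def prob(event, space=S):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & space), len(space))

three = {3}
even = {2, 4, 6}
less_than_4 = {1, 2, 3}
six = {6}

marginal = prob(three)                        # P(rolling a 3)
union = prob(even | less_than_4)              # P(even or less than 4)
joint = prob(even & less_than_4)              # P(even and less than 4)
conditional = prob(six & even) / prob(even)   # P(6 given even)
complement = 1 - prob(three)                  # P(not rolling a 3)

for name, value in [("marginal", marginal), ("union", union), ("joint", joint),
                    ("conditional", conditional), ("complement", complement)]:
    print(f"{name}: {value}")
```

Using `Fraction` keeps the results exact, so they can be compared directly with the fractions quoted in the list.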




## Marginal Probability

Marginal probability refers to the probability of a single event occurring, without considering the influence of any other events.

Here is the formula for discrete events with equally likely outcomes:


P (A) = $\frac {Num\_ of\_ Outcomes\_ of\_ Interest (A)}{Num\_ of\_ All\_ Possible\_ Outcomes (S)}$


If you want the probability of rolling a 3 on a six-sided die:

P(A is 3) = $\frac {1}{6}$ ≈ 0.167


In [114]:
# Imagine you roll a die multiple times and record the results as a dataset of outcomes:

outcomes = [1, 2, 3, 4, 5, 6, 1, 3, 3, 4, 5, 2, 6, 1, 3]

# Total number of outcomes
total_outcomes = len(outcomes)

# Count occurrences of the event (rolling a 3)
count_rolling_3 = outcomes.count(3)

# Marginal probability of rolling a 3 (the observed relative frequency)
marginal_probability_of_3 = count_rolling_3 / total_outcomes

# Display the result
print("Marginal Probability of rolling a 3:", marginal_probability_of_3)


Marginal Probability of rolling a 3: 0.26666666666666666


## Union Probability
Union probability is the probability that either of two events (or both) happens. You compute it from the union of the two events.

A U B (A union B) is also read as 'A or B'.

The formula of **Union** is:

                    P(A U B) = P(A) + P(B) - P(A Int B)

Suppose you have the events **A = {2, 4, 6}** and **B = {1, 2, 3}** in a sample space **C = {1, 2, 3, 4, 5, 6}**. The outcome 2 belongs to both events, so adding P(A) and P(B) counts it twice; subtracting P(A Int B) removes the double count:

                    P(A U B) = P({2, 4, 6}) + P({1, 2, 3}) - P({2})
                             = 3/6 + 3/6 - 1/6
                             = 5/6

You get the same result by counting the outcomes in the union, A U B = {1, 2, 3, 4, 6}, and applying the formula of **probability**:

P (AUB) = $\frac {Num\_ of\_ Outcomes\_ of\_ Interest (AUB)}{Num\_ of\_ All\_ Possible\_ Outcomes (C)}$

P (AUB) = $\frac{5}{6}$ ≈ 0.83

So, A U B contains 5 of the 6 possible outcomes, which gives a probability of $\frac{5}{6}$, or about 0.83.



In [220]:
# Our Example Sets or Events:

A = {2, 4, 6}
B = {1, 2, 3}
C = {1, 2, 3, 4, 5, 6}


# Calculate the union
AUB = A.union(B)
# or
AUB = A|B
# Calculate the probability
P_AUB = (len(AUB))/(len(C))
# Display the result
print("Probability of A union B:", P_AUB)


Probability of A union B: 0.8333333333333334
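
The cell above counts the union directly. As a cross-check, the sketch below evaluates both sides of the union formula, P(A U B) = P(A) + P(B) - P(A Int B), on the same sets. The helper `prob` is introduced here for illustration and assumes equally likely outcomes.

```python
from fractions import Fraction

A = {2, 4, 6}
B = {1, 2, 3}
C = {1, 2, 3, 4, 5, 6}  # sample space

def prob(event):
    """Probability of an event under equally likely outcomes in C."""
    return Fraction(len(event), len(C))

# Left side: count the union directly
lhs = prob(A | B)

# Right side: add the marginals, then subtract the double-counted overlap
rhs = prob(A) + prob(B) - prob(A & B)

print("Direct count:", lhs)
print("Formula:     ", rhs)
print("Equal:", lhs == rhs)
```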


## Joint (Intersection) Probability

Joint probability is the probability of two (or more) events happening at the same time.
You take the intersection to find the outcomes that the events have in common.

- If you have two events (A) and (B), the **joint probability** is denoted as P(A, B) or P(A Int B), and it represents the probability that both (A) and (B) occur.

If the two events are **independent** (one occurring does not change the probability of the other), the joint probability is the product of their marginal probabilities:

                    P(A Int B) = P(A) * P(B)

This shortcut holds only for independent events. In general, you find the joint probability from the overlap of the two events. For example, if you're rolling a die, the joint probability of getting an even number (A) and a number less than 4 (B) is:

- A = {2, 4, 6}      (even numbers)
- B = {1, 2, 3}      (numbers less than 4)


Only the number 2 satisfies both conditions, so P(A, B) = $\frac{1}{6}$ (one of six equally likely outcomes). Note that P(A) * P(B) = $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, which is not $\frac{1}{6}$: these two events are not independent, so the product formula does not apply to them.



In [218]:
# Our Example Sets or Events:

A = {2, 4, 6}
B = {1, 2, 3}
C = {1, 2, 3, 4, 5, 6}


# Calculate the intersection
AintB = A.intersection(B)
# or
AintB = A&B
# Calculate the probability
P_AintB = (len(AintB))/(len(C))
# Display the result
print("Probability of A intersection B:", P_AintB)


Probability of A intersection B: 0.16666666666666666
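
The product formula P(A) * P(B) is stated for independent events, so it is worth checking whether a given pair of events actually qualifies. The sketch below is one way to do that by counting; the helper names and the extra event `D` are hypothetical and not part of the original example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space of a fair die
A = {2, 4, 6}           # even numbers
B = {1, 2, 3}           # numbers less than 4
D = {1, 2}              # hypothetical extra event: numbers less than 3

def prob(event):
    """Probability of an event under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

def independent(X, Y):
    """Two events are independent when P(X and Y) equals P(X) * P(Y)."""
    return prob(X & Y) == prob(X) * prob(Y)

print("P(A and B):", prob(A & B), "vs P(A)*P(B):", prob(A) * prob(B))
print("A and B independent?", independent(A, B))
print("A and D independent?", independent(A, D))
```

For A and B the two values differ, so the joint probability has to be counted from the intersection rather than multiplied out.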


## Complement Probability

The complement of an event A is every outcome in the sample space, S, that isn't in A.

This kind of probability tells you how likely it is that the event will NOT happen.

For example: Suppose you are rolling a die where:

- A = {2,4}
- S = {1, 2, 3, 4, 5, 6}

The complement of A is the set of all values in S that are not in A: A_Comp = {1, 3, 5, 6}

The complement probability P(A_Comp) is $\frac{4}{6}$, which is approximately 0.67.

In [234]:
# Our Example Sets or Events:

A = {2, 4}
S = {1, 2, 3, 4, 5, 6}


# Calculate the complement
A_Comp = S.difference(A)
# or (S ^ A is the symmetric difference, which matches the complement only when A is a subset of S)
A_Comp = S - A
# Calculate the probability
P_AC = (len(A_Comp))/(len(S))
# Display the result
print("Probability of A Complement:", P_AC)


Probability of A Complement: 0.6666666666666666
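
Since an event either happens or does not, the complement probability can also be obtained as 1 - P(A) without building the complement set. The sketch below compares the two routes on the same example; the variable names are chosen here for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4}

# Route 1: count the outcomes that are in S but not in A
p_by_counting = Fraction(len(S - A), len(S))

# Route 2: subtract P(A) from 1
p_a = Fraction(len(A), len(S))
p_by_rule = 1 - p_a

print("By counting:", p_by_counting)
print("By 1 - P(A):", p_by_rule)
```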


## Conditional Probabilities 

Conditional probability takes prior information into account: what you already know about another event.

It gives you the probability of one event, given that another event has already occurred.

You write it as P(A|B):
- The pipe symbol (|) is read as "given" and separates the two events.
- A is the event whose probability you want to find.
- B is the event that has already occurred.

The formula for conditional probability (defined when P(B) > 0) is:

P (A|B) = $\frac{P(A \cap B)}{P(B)}$

You take the intersection (A int B) because you want the outcomes in B that also appear in A.
You divide by P(B) because B is now your sample space: you already know that the outcome lies in B.


In [216]:
A = {2, 4, 6}
B = {1, 2, 3}

# Calculate the intersection with a method or an operator.
AintB = A.intersection(B)
# or
AintB = A&B
# Calculate the probability. B acts as the new sample space, so we divide by its size.
P_A_given_B = (len(AintB))/(len(B))
# Display the result
print("The Conditional Probability of A given B:", P_A_given_B)

The Conditional Probability of A given B: 0.3333333333333333
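
The cell above divides set sizes directly. The sketch below instead applies the formula P(A|B) = P(A Int B) / P(B) with explicit probabilities, and also shows that swapping the two events generally changes the answer. The sample space `S`, the helpers, and the events `six` and `even` are assumptions added for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # assumed sample space: one roll of a fair die
A = {2, 4, 6}
B = {1, 2, 3}

def prob(event):
    """Probability of an event under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

def conditional(X, Y):
    """P(X | Y) = P(X and Y) / P(Y); requires P(Y) > 0."""
    return prob(X & Y) / prob(Y)

print("P(A | B):", conditional(A, B))

# The order of the events matters
six = {6}
even = {2, 4, 6}
print("P(six | even):", conditional(six, even))
print("P(even | six):", conditional(even, six))
```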


# Fun Practice

- Let's use the itertools.product function to generate all possible outcomes of rolling a die twice.
- Then calculate the probability of rolling (2, 5) or (5, 2).




In [282]:
# First we build the dataset of rolling the die twice. product() gives every ordered pair of rolls (the Cartesian product).


from itertools import product

# range(1, 7) yields the values 1 through 6; the stop value 7 is not included.
roll_outcomes = range(1,7)
# repeat=2 pairs every first roll with every second roll, giving 36 ordered pairs.
# product() returns an iterator, which can be consumed only once.
outcomes = product(roll_outcomes, repeat = 2)
# Convert the iterator to a list to display all outcomes.
print(list(outcomes))



[(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]


In [403]:
# We rebuild the outcomes in this cell because product() returns an iterator, and list() in the previous cell consumed it.
# This time we store the outcomes in a set so that we can reuse them.

roll_outcomes = range(1, 7)
outcomes = set(product(roll_outcomes, repeat=2))


# This is how GUS did it.
# The generator yields a 1 each time an outcome matches.

ones = (1 for outcome in outcomes if outcome == (2, 5) or outcome == (5, 2))

# print(list(ones))    # uncomment to see the list (this consumes the generator, so the sum below would then be 0)

# Sum all the 1s you got.

count = sum(ones)

print("Number of matching outcomes for (2, 5) or (5, 2):", count)

probability = count/len(outcomes)

print("The probability of rolling (2, 5) or (5, 2) is:", probability)

Number of matching outcomes for (2, 5) or (5, 2): 2
The probability of rolling (2, 5) or (5, 2) is: 0.05555555555555555


In [405]:
# The easy way is to use set operations.
# Each event must be a set of (first roll, second roll) tuples, because the outcomes are tuples.
# Writing {2, 5} would create a set of two integers, which never matches any outcome.

A = {(2, 5)}
B = {(5, 2)}

print(f"the union probability of A and B is {len(A | B) / len(outcomes):.4f}")
print(f"the joint probability of A and B is {len(A & B) / len(outcomes):.4f}")
print(f"the complement probability of A or B is {len(outcomes - (A | B)) / len(outcomes):.4f}")



the union probability of A and B is 0.0556
the joint probability of A and B is 0.0000
the complement probability of A or B is 0.9444
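
The two-roll sample space is also a convenient place to see the product rule for independent events at work. The sketch below defines two events on the pairs, "first roll is 2" and "second roll is 5", and compares their joint probability with the product of their marginals. The event names and the helper `prob` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs (first roll, second roll)
outcomes = set(product(range(1, 7), repeat=2))

# Events are sets of pairs, selected by a condition on one of the rolls
first_is_2 = {o for o in outcomes if o[0] == 2}
second_is_5 = {o for o in outcomes if o[1] == 5}

def prob(event):
    """Probability of an event under 36 equally likely pairs."""
    return Fraction(len(event), len(outcomes))

joint = prob(first_is_2 & second_is_5)                   # only (2, 5) is in both
product_of_marginals = prob(first_is_2) * prob(second_is_5)

print("Joint probability:   ", joint)
print("Product of marginals:", product_of_marginals)
```

The two values agree because the result of the first roll does not affect the second roll.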


# Chain Rule

The chain rule in probability is a useful method for calculating the joint probability of a sequence of events by expressing it as a product of conditional probabilities.

The chain rule breaks down the computation of a joint probability into a sequence of simpler conditional probabilities. It’s especially useful when dealing with a sequence of dependent events. Each event depends on the previous ones.


For a set of events **( A_1, A_2, ..., A_n)**, the chain rule expresses the joint probability **P(A_1, A_2, ..., A_n)** as:

    P(A_1, A_2, ..., A_n) = P(A_1) * P(A_2 | A_1) * P(A_3 | A_1, A_2) * ... * P(A_n | A_1, A_2, ..., A_{n-1})


In simpler terms, this formula says that the joint probability of a sequence of events can be broken down as:

- The probability of the first event P(A_1),
- Times the probability of the second event given the first event P(A_2 | A_1),
- Times the probability of the third event given the first two events P(A_3 | A_1, A_2),
- And so on, for each subsequent event.

### For Two Events:
For just two events, (A) and (B), the chain rule simplifies to:

            P(A,B) = P(A) * P(B|A)

### For Three Events:
For three events, (A), (B), and (C), it becomes:

        P(A,B,C) = P(A) * P(B|A) * P(C|A,B)

This breakdown helps us calculate the joint probability by considering each event in relation to the previous ones.

### Example:
If we want to find the joint probability of three events (A), (B), and (C):

1. P(A) is the probability of (A) happening.
2. P(B | A) is the probability of (B) happening given that (A) has occurred.
3. P(C | A, B) is the probability of (C) happening given that both (A) and (B) have occurred.




In [440]:
# Define the events as sets of outcomes

U = set(range(1, 11))             # Universe of 10 equally likely outcomes (assumed here to be 1 through 10)
A = {1, 2, 3, 4, 5, 6}            # Event A
B = {4, 5, 6, 7, 8}               # Event B
C = {6, 7, 8, 9}                  # Event C

AintB = A & B
AintBintC = AintB & C



# Calculate the probabilities based on the set sizes

P_A = len(A) / len(U)                                         # Probability of A
P_B_given_A = len(AintB) / len(A)                             # Conditional Probability of B given A (|A ∩ B| / |A|)
P_C_given_A_and_B = len(AintBintC) / len(AintB)               # Conditional Probability of C given A and B

# Apply the chain rule to calculate the joint probability of A, B, and C
# P(A, B, C) = P(A) * P(B|A) * P(C|A, B)

P_A_B_C = P_A * P_B_given_A * P_C_given_A_and_B


print(f"The joint probability of A, B, and C is: {P_A_B_C:.4f}")



The joint probability of A, B, and C is: 0.1000
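
The chain rule result can be cross-checked against a direct count of the three-way intersection. The sketch below assumes the universe is the ten outcomes 1 through 10 (the original cell only states that the universe has 10 elements) and uses exact fractions.

```python
from fractions import Fraction

U = set(range(1, 11))     # assumed universe: 1 through 10
A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8}
C = {6, 7, 8, 9}

# Chain rule: P(A) * P(B|A) * P(C|A,B)
p_a = Fraction(len(A), len(U))
p_b_given_a = Fraction(len(A & B), len(A))
p_c_given_ab = Fraction(len(A & B & C), len(A & B))
chain = p_a * p_b_given_a * p_c_given_ab

# Direct count: outcomes that lie in all three events
direct = Fraction(len(A & B & C), len(U))

print("Chain rule:  ", chain)
print("Direct count:", direct)
```

The intermediate set sizes cancel in the product, which is why both routes give the same value.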


# Markov Assumption

The **Markov Assumption** refers to a simplifying assumption in probability theory and statistics, particularly in the context of stochastic processes such as Markov chains and Hidden Markov Models (HMMs).

### **Definition:**
The Markov Assumption states that the future state of a process depends only on its present state and not on its past states. In other words, the future is conditionally independent of the past, given the present.

Mathematically, if $X_n$ represents the state of a process at step $n$, the Markov Assumption implies:

    P(X_{n+1} | X_1, X_2, ..., X_n) = P(X_{n+1} | X_n)

This means the probability of transitioning to the next state $X_{n+1}$ depends only on the current state $X_n$, not on the sequence of states that preceded it.

### **Example:**
Imagine a random walk where a person moves between different positions on a line. The probability that they will move to a new position depends only on their current position, not on how they got there.

If at step $n$ they are at position $X_n$, the probability of moving to a new position $X_{n+1}$ depends solely on where they are at step $n$ (i.e., $X_n$), and not on the entire sequence of positions they took to get to $X_n$.

### **Why is the Markov Assumption Useful?**
- **Simplifies complex systems**: By assuming that the current state captures all relevant information, we can model systems more easily.
- **Efficient computation**: In a Markov model, you only need to track the current state, reducing computational complexity.
- **Common in machine learning**: The assumption is frequently used in models like Hidden Markov Models (HMMs) and Reinforcement Learning, where predicting future outcomes is based on the current state.

### **Limitations:**
The Markov Assumption is a simplification and may not hold in all situations. Real-world processes often have dependencies on past states (memory) that cannot be ignored. However, in cases where the assumption is reasonable, it can greatly simplify analysis and computations.

In [573]:
import random

# Define the states (labeled 'A' and 'B')
states = ['A', 'B']
# Define the transition probabilities (Markov Assumption).
# The outer keys are the current state, and the inner keys are the next state.
transition_matrix = {
    'A': {'A': 0.7, 'B': 0.3},
    'B': {'A': 0.4, 'B': 0.6}
}
# Function that simulates the chain by repeatedly sampling the next state from the current state's transition probabilities
def simulate_markov_chain(initial_state, steps):
    current_state = initial_state
    chain = [current_state]

    for r in range(steps):
        current_state = random.choices(states, weights=[transition_matrix[current_state][state] for state in states], k=1)[0]
        chain.append(current_state)
    
    return chain


# Call the Markov function
initial_state = 'A'
markov_chain = simulate_markov_chain(initial_state, 5)

# Print the results
print(f"Initial state: {initial_state}")
print(f"Markov chain: {markov_chain}")


Initial state: A
Markov chain: ['A', 'B', 'A', 'B', 'B', 'B']
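
A single simulated chain is random, so it can be helpful to also look at the exact probability of being in each state after a given number of steps. The sketch below is one way to compute that from the same transition probabilities; the function `step` and the variable names are introduced here for illustration.

```python
transition_matrix = {
    'A': {'A': 0.7, 'B': 0.3},
    'B': {'A': 0.4, 'B': 0.6},
}

def step(distribution):
    """Advance a probability distribution over states by one step."""
    next_distribution = {state: 0.0 for state in transition_matrix}
    for current, p_current in distribution.items():
        for target, p_move in transition_matrix[current].items():
            # Probability of being in `current` now and moving to `target` next
            next_distribution[target] += p_current * p_move
    return next_distribution

# Start in state 'A' with certainty
distribution = {'A': 1.0, 'B': 0.0}

for n in range(1, 6):
    distribution = step(distribution)
    print(n, {state: round(p, 4) for state, p in distribution.items()})
```

Each update uses only the current distribution, which is the Markov Assumption in action: no earlier history is needed.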


# Fun Weather Example:

Here, we create a simple weather model where the weather at the next time step depends only on the current weather. The weather can be "Sunny", "Rainy", or "Snowy".

In [575]:
import random

states = ['Sunny', 'Rainy', 'Snowy']

transition_matrix = {
    'Sunny': {'Sunny': 0.6, 'Rainy': 0.3, 'Snowy': 0.1},
    'Rainy': {'Sunny': 0.4, 'Rainy': 0.4, 'Snowy': 0.2},
    'Snowy': {'Sunny': 0.2, 'Rainy': 0.5, 'Snowy': 0.3},
}

def simulate_markov_chain(initial_state, steps):
    states = list(transition_matrix.keys())
    current_state = initial_state
    state_chain = [current_state]

    for r in range(steps):
        current_state = random.choices(states, weights=[transition_matrix[current_state][state] for state in states], k=1 )[0]
        state_chain.append(current_state)
    
    return state_chain


# Call the Markov function
initial_state = 'Sunny'
markov_chain = simulate_markov_chain(initial_state, 5)


print(f"Initial state: {initial_state}")
print(f"Markov chain: {markov_chain}")


Initial state: Sunny
Markov chain: ['Sunny', 'Sunny', 'Sunny', 'Sunny', 'Sunny', 'Sunny']
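
Combining the chain rule with the Markov Assumption gives the probability of a whole weather sequence as a product of one-step transition probabilities. The sketch below computes that for a given starting state; the function `sequence_probability` is a hypothetical helper, not part of the original notebook.

```python
transition_matrix = {
    'Sunny': {'Sunny': 0.6, 'Rainy': 0.3, 'Snowy': 0.1},
    'Rainy': {'Sunny': 0.4, 'Rainy': 0.4, 'Snowy': 0.2},
    'Snowy': {'Sunny': 0.2, 'Rainy': 0.5, 'Snowy': 0.3},
}

def sequence_probability(sequence):
    """Probability of the rest of the sequence, given its first state."""
    probability = 1.0
    # Multiply the transition probability of each consecutive pair of states
    for current, following in zip(sequence, sequence[1:]):
        probability *= transition_matrix[current][following]
    return probability

all_sunny = ['Sunny'] * 6
mixed = ['Sunny', 'Rainy', 'Snowy']

print("P(five more sunny days | Sunny):", round(sequence_probability(all_sunny), 5))
print("P(Rainy then Snowy | Sunny):", round(sequence_probability(mixed), 5))
```

This shows that a run of six sunny states, like the sample output above, is possible but fairly unlikely under these transition probabilities.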


# Bayes' Theorem

**Bayes' Theorem** is a fundamental theorem in probability theory that describes the relationship between conditional probabilities. It allows us to update the probability estimate of an event based on new evidence.
Bayes' Theorem essentially provides a way to revise our belief about an event $A$ after taking into account the occurrence of event $B$. It allows us to move from prior knowledge to an updated or posterior belief.


### Formula

P(A|B) = $\frac{P(B|A) * P(A)}{P(B)}$


Where:
-  P(A|B)  is the **posterior probability**, the probability of event $A$ occurring given that $B$ is true.
-  P(B|A)  is the **likelihood**, the probability of observing $B$ given that $A$ is true.
-  P(A)  is the **prior probability**, the initial probability of $A$ occurring before considering $B$.
-  P(B)  is the **marginal probability** of $B$, the total probability of observing $B$.
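
To connect the formula with the earlier die examples, the sketch below applies Bayes' Theorem to the events "roll a 6" (A) and "roll an even number" (B) and compares the result with a direct count. The sample space and the helper `prob` are assumptions chosen for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # assumed sample space: one roll of a fair die
A = {6}                 # roll a 6
B = {2, 4, 6}           # roll an even number

def prob(event):
    """Probability of an event under equally likely outcomes in S."""
    return Fraction(len(event), len(S))

prior = prob(A)                        # P(A)
marginal = prob(B)                     # P(B)
likelihood = prob(A & B) / prob(A)     # P(B|A): a 6 is always even

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
posterior = likelihood * prior / marginal

# Direct count for comparison: outcomes of B that are also in A
direct = Fraction(len(A & B), len(B))

print("Posterior via Bayes:", posterior)
print("Direct count:       ", direct)
```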




In [412]:

# Given probabilities (D = having the disease, T = testing positive)
P_D = 0.01                      # Prior: Probability of having the disease
P_T_given_D = 0.99              # Likelihood: Probability of a positive test if diseased
P_T_given_not_D = 0.05          # Probability of a positive test if not diseased
P_not_D = 1 - P_D               # Complement: Probability of not having the disease

# Calculate the marginal probability of a positive test (P(T))

P_T = P_T_given_D * P_D + P_T_given_not_D * P_not_D

# Apply Bayes' Theorem to calculate P(D|T)

P_D_given_T = (P_T_given_D * P_D) / P_T

print(f"The probability of having the disease given a positive test is: {P_D_given_T:.4f}")


The probability of having the disease given a positive test is: 0.1667
