# Understanding Probability for Multiple Random Variables

Probability helps us measure the uncertainty of random variable outcomes. It's quite straightforward when dealing with a single variable, but in machine learning, we often work with several variables interacting in complex, unknown ways. To handle this, we use specific techniques to calculate the probabilities of multiple random variables, which include:

- **Joint Probability**: This measures the likelihood of two or more events happening at the same time.

- **Marginal Probability**: This quantifies the probability of an event without considering the outcomes of other variables.

- **Conditional Probability**: It calculates the probability of an event when one or more other events have already occurred.

In this tutorial, we'll provide a beginner-friendly introduction to these concepts, allowing you to gain a solid understanding of how they form the foundation for fitting predictive models to data.

By the end of this tutorial, you'll have a clear grasp of:

- Joint probability, which deals with the simultaneous occurrence of events.

- Marginal probability, which focuses on a single event regardless of the outcomes of other variables.

- Conditional probability, which evaluates the likelihood of an event given the presence of other events.

---


## Probability for a Single Random Variable

Probability is all about measuring the likelihood of an event. It tells us how probable a specific outcome is for a random variable, like flipping a coin, rolling a die, or drawing a playing card from a deck.

- In simple terms, probability is a way of quantifying the chances of something happening.
  - As described in "Probability: For the Enthusiastic Beginner" (2016), it's about assessing the likelihood of an event taking place.

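- For a random variable 'x', we use a function P(x) to assign probabilities to all its possible values. This function is the **Probability Distribution of x**, denoted as P(x). For a discrete variable it is called a probability mass function; for a continuous variable, a probability density function.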

- When we want to find the probability of a specific event 'A' occurring for 'x', we use P(x = A) or simply P(A), where 'A' represents the event itself.

- The calculation of probability is straightforward when every outcome is equally likely: we divide the number of desired outcomes by the total number of possible outcomes.
  - This is especially intuitive for discrete random variables, like rolling a fair die. For instance, the probability of rolling a 5 on a six-sided die is the number of 5s (1) divided by the total possible outcomes (6), which is approximately 0.1667 or 16.67%.

- The probabilities of all possible outcomes must sum to one; otherwise they are not valid probabilities:

  - **Sum of the Probabilities for All Outcomes = 1.0**

- An impossible outcome carries a probability of zero. For instance, you can't roll a 7 with a standard six-sided die:

  - **Probability of Impossible Outcome = 0.0**

- On the other hand, a certain outcome has a probability of one. For example, when rolling a six-sided die, it's certain that a value between 1 and 6 will occur:

  - **Probability of Certain Outcome = 1.0**

- You can also calculate the probability of an event not happening, which is called the complement.
  - This can be determined by subtracting the probability of the event from one, represented as 1 - P(A). For example, if you want to find the probability of not rolling a 5, it would be 1 - P(5), resulting in approximately 0.8333 or 83.333%.

  - **P(not A) = 1 - P(A)**
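The rules above can be checked with a short sketch. It assumes a fair six-sided die (all faces equally likely) and uses Python's standard `fractions.Fraction` for exact arithmetic; the `prob` helper is a hypothetical name for illustration, not part of any library.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so every face is equally likely.
outcomes = [1, 2, 3, 4, 5, 6]


def prob(event):
    """Hypothetical helper: favourable outcomes / total outcomes (equally likely case only)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


p_five = prob(lambda o: o == 5)
p_seven = prob(lambda o: o == 7)            # impossible outcome -> 0
p_one_to_six = prob(lambda o: 1 <= o <= 6)  # certain outcome -> 1
p_not_five = 1 - p_five                     # complement rule: P(not A) = 1 - P(A)

# The probabilities of the individual outcomes sum to one.
total = sum(prob(lambda o: o == v) for v in outcomes)

print(p_five, float(p_five))                     # 1/6 0.16666666666666666
print(p_not_five, p_seven, p_one_to_six, total)  # 5/6 0 1 1
```

Using `Fraction` rather than floats keeps results such as 1/6 and 5/6 exact, so the sum-to-one and complement rules can be compared with `==` without rounding error.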

Now that we've covered the basics of probability for a single random variable, let's delve into probability for multiple random variables.

---


## Probability with Multiple Random Variables

- In the field of machine learning, we often work with many variables.
  - Imagine a data table like one in Excel, where each row represents a different event or observation, and each column represents a distinct variable.
  - These variables can be either discrete (taking values from a countable set, such as categories or whole numbers) or continuous (taking any real value within a range).

- Dealing with multiple variables can be complex because they can interact in various ways, affecting their probabilities.
  - To simplify this, let's focus on just two variables, X and Y, although the same principles apply to more variables. We'll also narrow our scope to the probability of two specific events: one for each variable (X = A, Y = B). However, we could equally discuss groups of events for each variable.

- With this context, we can discuss the probability of multiple random variables in terms of event A for X and event B for Y, written in shorthand as X = A and Y = B.
  - We assume these two variables are related or dependent in some way. Consequently, there are three key types of probability to consider:

    - **Joint Probability**: This measures the probability of both events A and B occurring together.

    - **Marginal Probability**: This calculates the probability of event A (X = A), irrespective of the outcome of variable Y.

    - **Conditional Probability**: This computes the probability of event A happening, given that event B has occurred.

- These types of probability underpin many predictive modeling techniques in areas like classification and regression. For instance:

  - The probability of a data row is the joint probability across all of its input variables.

  - The probability of a specific value of one input variable is its marginal probability, taken across the values of the other input variables.

  - The predictive model itself is an estimate of the conditional probability of an output given an input example.

Understanding joint, marginal, and conditional probability is fundamental in machine learning.

Let's explore each of them in more detail.

---

## Joint Probability: Simultaneous Events

Sometimes, we want to understand the likelihood of two events occurring at the same time, such as the outcomes of two different random variables.
  - This is referred to as the joint probability. When we consider the joint probability of two or more random variables, it's known as the joint probability distribution.
  - To formally express the joint probability of event A and event B, we use the notation:

    - **P(A and B)**

  - Alternatively, the "and" or conjunction is denoted with the upside-down capital U (∩) or sometimes a comma (,). So, you might also see it as:

    - **P(A ∩ B) = P(A, B)**

- The joint probability for events A and B is calculated as the probability of event A given that event B has occurred, multiplied by the probability of event B. In a formal sense, it can be expressed as:

    - **P(A ∩ B) = P(A given B) × P(B)**

  - This calculation is sometimes known as the fundamental rule of probability or the product rule of probability.
  - Here, P(A given B) represents the probability of event A given that event B has occurred, which is called the conditional probability (explained below).

- It's important to note that the joint probability is symmetrical, meaning that **P(A ∩ B)** is the same as **P(B ∩ A)**.
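To make the product rule concrete, here is a minimal sketch assuming two fair six-sided dice (36 equally likely rolls). The events A ("first die shows 6") and B ("the dice sum to 7") and the helper names are illustrative choices, not from the original text. It counts P(A ∩ B) directly, then recomputes it as P(A | B) × P(B), and checks symmetry.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, giving 36 equally likely (first, second) rolls.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Hypothetical helper: probability of an event over the equally likely sample space."""
    return Fraction(sum(1 for s in space if event(s)), len(space))


def event_a(s):  # A: the first die shows a 6
    return s[0] == 6


def event_b(s):  # B: the two dice sum to 7
    return s[0] + s[1] == 7


# Joint probability counted directly: outcomes where both A and B hold.
p_a_and_b = prob(lambda s: event_a(s) and event_b(s))

# Product rule: P(A ∩ B) = P(A | B) * P(B). P(A | B) is found by counting
# how often A holds among only those outcomes where B holds.
b_outcomes = [s for s in space if event_b(s)]
p_a_given_b = Fraction(sum(1 for s in b_outcomes if event_a(s)), len(b_outcomes))
p_product_rule = p_a_given_b * prob(event_b)

# Symmetry: P(B ∩ A) is the same as P(A ∩ B).
p_b_and_a = prob(lambda s: event_b(s) and event_a(s))

print(p_a_and_b, p_product_rule, p_b_and_a)  # 1/36 1/36 1/36
```

Only the roll (6, 1) satisfies both events, so all three routes give 1/36.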


---


## Marginal Probability: Probability of an Event for One Variable

Sometimes, we want to know the probability of an event for one random variable, regardless of what's happening with another random variable.
- For instance, we might be interested in the probability of X = A for all the possible outcomes of Y. This concept is known as marginal probability, or the marginal distribution.

- When we talk about the marginal probability of one random variable in the presence of additional random variables, it's referred to as the marginal probability distribution.

- The term "marginal" comes from laying out all the outcomes and their probabilities for the two variables in a table (with X as columns and Y as rows): the marginal probability of a value of X is the sum of the probabilities down its column, over all the Y rows, written in the margin (edge) of the table.

  - There's no special notation for marginal probability; it's simply the sum of the joint probabilities over all events of the second variable, with the event of the first variable held fixed.

- Mathematically, this can be expressed as:

    - **P(X = A) = Σ P(X = A, Y = y) for all y in Y**

  - This is another essential rule in probability known as the sum rule.

- It's important to note that marginal probability differs from conditional probability (discussed next) because it considers the union of all events for the second variable, rather than the probability of a single event.
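The sum rule can be sketched by building a joint table and summing it along one variable. This example assumes X is the value of the first of two fair dice and Y is the sum of both dice; these variables, and the `marginal_x`/`marginal_y` helper names, are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: X = value of the first die, Y = sum of two fair dice.
# The joint table P(X = x, Y = y) is built by counting the 36 equally likely rolls.
counts = Counter((d1, d1 + d2) for d1, d2 in product(range(1, 7), repeat=2))
joint = {xy: Fraction(n, 36) for xy, n in counts.items()}


def marginal_x(a):
    """Sum rule: P(X = a) = sum over all y of P(X = a, Y = y)."""
    return sum(p for (x, y), p in joint.items() if x == a)


def marginal_y(b):
    """Sum rule the other way: P(Y = b) = sum over all x of P(X = x, Y = b)."""
    return sum(p for (x, y), p in joint.items() if y == b)


print(marginal_x(3))  # 1/6
print(marginal_y(7))  # 1/6
print(marginal_y(2))  # 1/36
```

Summing the joint table over Y recovers the familiar single-die probability of 1/6, while summing over X gives the distribution of the total, where only (1, 1) produces a sum of 2.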


---


## Conditional Probability: Probability Given Another Event

Sometimes, we're interested in the probability of one event occurring when we already know that another event has taken place.
  - This is known as conditional probability.
  - When we're dealing with the conditional probability of one or more random variables, we refer to it as the conditional probability distribution.

- The idea here is to find the probability of an event A happening under the condition that event B has occurred. We formally express this as:

  - **P(A given B)**

- The term "given" is represented using the pipe (|) operator, so you might also see it written as:

  - **P(A|B)**

- Mathematically, the conditional probability of event A given event B is calculated as:

  - **P(A|B) = P(A ∩ B) / P(B)**

  - Here, it's essential to note that this calculation assumes that the probability of event B is not zero, meaning it's not impossible.
    - The concept of event A given event B doesn't imply that event B has already happened or is certain. Instead, it signifies the probability of event A occurring after or in the presence of event B during a specific trial.
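A minimal sketch of P(A|B) = P(A ∩ B) / P(B), again assuming two fair dice. The `conditional` helper and the example events are hypothetical; the helper raises an error when P(B) is zero, matching the requirement that B must not be impossible.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, 36 equally likely rolls.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Hypothetical helper: probability of an event over the equally likely sample space."""
    return Fraction(sum(1 for s in space if event(s)), len(space))


def conditional(event_a, event_b):
    """P(A | B) = P(A ∩ B) / P(B); undefined when P(B) is zero."""
    p_b = prob(event_b)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return prob(lambda s: event_a(s) and event_b(s)) / p_b


def first_is_six(s):      # A: the first die shows a 6
    return s[0] == 6


def sum_at_least_ten(s):  # B: the two dice sum to 10 or more
    return s[0] + s[1] >= 10


print(prob(first_is_six))                           # 1/6
print(conditional(first_is_six, sum_at_least_ten))  # 1/2
```

Six rolls sum to 10 or more, and three of them start with a 6, so conditioning on B raises the probability of A from 1/6 to 1/2.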

---



## Probability for Independence and Exclusivity

In the context of multiple random variables, there are two important scenarios to consider.
- First, it's possible that these variables don't interact, meaning they are independent of each other.
- Alternatively, these variables may interact, but their events don't happen at the same time, which we call exclusivity.

In this section, we'll look at how the probabilities of multiple random variables are calculated in each of these cases.

---


## Independence: When Variables Don't Affect Each Other

When one variable isn't influenced by another, we call this independence or statistical independence. This concept significantly impacts how we calculate the probabilities of these two variables.

- For instance, let's consider the joint probability of independent events A and B. In this case, it's essentially the probability of A multiplied by the probability of B. Mathematically, we express this as:

  - **Joint Probability: P(A ∩ B) = P(A) × P(B)**

- As you might intuit, when dealing with independent random variables, the marginal probability of an event is simply the probability of that event, computed just as for a single random variable:

  - **Marginal Probability: P(A)**

- Similarly, when variables are independent, the conditional probability of A given B is simply the probability of A, as the probability of B has no effect. This can be described as:

  - **Conditional Probability: P(A|B) = P(A)**

- The idea of statistical independence may be familiar from the world of sampling.
  - It assumes that one sample isn't influenced by previous samples and doesn't affect future ones.
  - In many machine learning algorithms, the assumption is made that samples from a domain are independent of each other and come from the same probability distribution, often referred to as "independent and identically distributed" or simply "i.i.d."
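The independence rules can be checked exactly on two fair dice (an assumption for illustration). The hypothetical `is_independent` helper tests whether P(E1 ∩ E2) = P(E1) × P(E2); an event that involves only the second die is independent of one about the first die, while "double six" is not.

```python
from fractions import Fraction
from itertools import product

# Assumption: two fair six-sided dice, 36 equally likely rolls.
space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Hypothetical helper: probability of an event over the equally likely sample space."""
    return Fraction(sum(1 for s in space if event(s)), len(space))


def first_even(s):     # A: the first die is even
    return s[0] % 2 == 0


def second_is_six(s):  # B: the second die shows 6 (involves only the other die)
    return s[1] == 6


def double_six(s):     # C: both dice show 6 (involves the first die as well)
    return s == (6, 6)


def is_independent(e1, e2):
    """Exact check of the independence condition: P(E1 ∩ E2) == P(E1) * P(E2)."""
    return prob(lambda s: e1(s) and e2(s)) == prob(e1) * prob(e2)


# For independent events, conditioning on B leaves P(A) unchanged.
p_a_given_b = prob(lambda s: first_even(s) and second_is_six(s)) / prob(second_is_six)

print(is_independent(first_even, second_is_six))  # True
print(is_independent(first_even, double_six))     # False
print(p_a_given_b, prob(first_even))              # 1/2 1/2
```

For A and C, the joint probability is 1/36, but P(A) × P(C) is 1/72, so knowing C occurred (which forces the first die to be 6) changes the probability of A.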


---


## Exclusivity: When Events Don't Overlap

When one event happening prevents the occurrence of other events, we say that these events are mutually exclusive.
- In simple terms, they don't overlap; they're like separate worlds.
- The events are disjoint, meaning they share no outcomes. This does not make them independent: if both have non-zero probability, knowing that A occurred tells you that B did not, so P(B|A) = 0 rather than P(B).
  - In fact, if event A and event B are mutually exclusive, the joint probability of both events happening is zero:

    - **P(A ∩ B) = 0.0**

- Instead, we are usually interested in the probability that event A or event B occurs. For mutually exclusive events, this is simply the sum of their probabilities:

  - **P(A or B) = P(A) + P(B)**

- The "or" is also called a union and is denoted by the capital U letter (∪), so you might see it as:

  - **P(A or B) = P(A ∪ B)**

- Now, if the events are not mutually exclusive, meaning that they can both happen, we may still be interested in the probability that either event occurs.
  - In this case, the probability is calculated as the probability of event A plus the probability of event B, minus the probability of both events happening simultaneously (otherwise the overlap would be counted twice). Mathematically, it can be expressed as:

    - **P(A ∪ B) = P(A) + P(B) - P(A ∩ B)**

This general rule covers both situations: when events can occur together, the overlap is subtracted, and when they are mutually exclusive, P(A ∩ B) = 0 and the rule reduces to P(A) + P(B).
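Both cases can be sketched on a single fair die (an illustrative assumption; the event helpers are hypothetical names). Rolling a 1 and rolling a 2 are mutually exclusive, while "even" and "greater than 3" overlap on 4 and 6, so adding their probabilities without subtracting the overlap overcounts.

```python
from fractions import Fraction

# Assumption: one fair six-sided die, every face equally likely.
outcomes = range(1, 7)


def prob(event):
    """Hypothetical helper: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_one(o):
    return o == 1


def is_two(o):
    return o == 2


def is_even(o):
    return o % 2 == 0


def is_high(o):  # greater than 3
    return o > 3


# Mutually exclusive: one roll cannot show both 1 and 2, so P(A ∪ B) = P(A) + P(B).
p_one_or_two = prob(lambda o: is_one(o) or is_two(o))
print(p_one_or_two, prob(is_one) + prob(is_two))  # 1/3 1/3

# Not mutually exclusive: 4 and 6 are both even and high, so subtract the overlap.
p_even_or_high = prob(lambda o: is_even(o) or is_high(o))
p_both = prob(lambda o: is_even(o) and is_high(o))
inclusion_exclusion = prob(is_even) + prob(is_high) - p_both
naive_sum = prob(is_even) + prob(is_high)  # counts 4 and 6 twice
print(p_even_or_high, inclusion_exclusion, naive_sum)  # 2/3 2/3 1
```

The naive sum of 1 would claim that every roll is even or high, but a roll of 1 or 3 is neither; subtracting P(A ∩ B) = 1/3 gives the correct 2/3.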


---

## Further Reading


**Books:**

1. [Probability: For the Enthusiastic Beginner, 2016](https://amzn.to/2jULJsu)
2. [Pattern Recognition and Machine Learning, 2006](https://amzn.to/2JwHE7I)
3. [Machine Learning: A Probabilistic Perspective, 2012](https://amzn.to/2xKSTCP)

**Articles:**

1. [Probability, Wikipedia](https://en.wikipedia.org/wiki/Probability)
2. [Notation in probability and statistics, Wikipedia](https://en.wikipedia.org/wiki/Notation_in_probability_and_statistics)
3. [Independence (probability theory), Wikipedia](https://en.wikipedia.org/wiki/Independence_(probability_theory))
4. [Independent and identically distributed random variables, Wikipedia](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables)
5. [Mutual exclusivity, Wikipedia](https://en.wikipedia.org/wiki/Mutual_exclusivity)
6. [Marginal distribution, Wikipedia](https://en.wikipedia.org/wiki/Marginal_distribution)
7. [Joint probability distribution, Wikipedia](https://en.wikipedia.org/wiki/Joint_probability_distribution)
8. [Conditional probability, Wikipedia](https://en.wikipedia.org/wiki/Conditional_probability)



---

