---
numbering:
  title:
    offset: 1
---

(ch1.4)=
# Joint and Marginal Probabilities

[Section 1.3](ch1.3) established rules for "not" statements and "or" statements. We haven't worked out what we should do for "if" statements or "and" statements.

This section is all about "and" statements. 

## And Statements and Joint Probabilities

What is the chance that two events happen simultaneously? For example, when I draw a card from a deck, what's the chance that it is *both* a face card *and* in a red suit?

:::{note} Joint Probability Definition
The probability that two (or more) events co-occur is a **joint probability.** Let $A$ and $B$ denote two events. We denote their joint probability:

$$\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) $$

Here the $A,B$ convention is a standard shorthand. You can read the comma inside the parentheses as an "and". As always, the $\text{Pr}$ asks for the chance that the event inside the parentheses occurs.
:::

We've seen that concatenating two events with an "or" statement produces a new event equal to the union of the component events: "$A$ or $B$" happens exactly when the outcome $\omega \in A \cup B$. Combining two events with an "and" has the opposite effect. Instead of producing a larger set that contains both of its component sets, combining two sets with an "and" produces a smaller set that is a subset of both its parents. For example:

$$\{a,b,c,d\} \text{ and } \{c,d,e,f,g\} = \{c,d\}$$

Notice that the "and" operation selects only the elements contained in both sets. On a Venn diagram, this corresponds to the region where the sets overlap, or intersect. Accordingly, we call the operation that selects only the elements in both sets an **intersection.** The intersection of two sets is denoted by $\cap$.

So, we can now write joint probabilities three ways. It is important that you can read all three:

$$\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) = \text{Pr}(A \cap B)$$
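
To make the link between "and" and intersection concrete, here is a minimal sketch that answers the card question from the start of this section by counting. It assumes a standard 52-card deck in which every card is equally likely, and it treats jacks, queens, and kings as the face cards. The names `deck`, `face`, `red`, and `pr` are illustrative choices, not notation used elsewhere in the text.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Outcome space: all 52 (rank, suit) pairs, assumed equally likely.
deck = set(product(ranks, suits))

# Events are subsets of the outcome space.
face = {card for card in deck if card[0] in {"J", "Q", "K"}}
red = {card for card in deck if card[1] in {"hearts", "diamonds"}}

# "and" corresponds to set intersection (the & operator on Python sets).
face_and_red = face & red


def pr(event):
    """Chance of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(deck))


print(pr(face), pr(red), pr(face_and_red))  # 3/13 1/2 3/26
```

Six of the 52 cards are red face cards, so the joint probability is $6/52 = 3/26$.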

### Bounding Joint Probabilities

Often, we would like to compute a chance, but don't know enough about our problem to calculate the actual value. Alternatively, the necessary calculation may be too cumbersome to perform, or may be more precise than we need. In these situations it can be helpful to bound the chances instead.

For instance, it is common practice to define statistical tests with controlled error rates. This means that the test is designed to produce a guarantee of the kind, "this test will return a wrong answer *at most* $p$ percent of the time under some assumptions." It is common to use a bound instead of an equality in these cases since we want tests that apply to a wide variety of systems and are not sensitive to assumptions that we can't justify. As a result, the collection of assumptions is usually chosen to allow many probability models. Then most events don't have a uniquely defined chance. Instead, there is an upper bound that holds for all chances that could be assigned consistent with the listed assumptions.

Let's practice deriving bounds for joint probabilities.

#### An Upper Bound

In [Section 1.1](ch1.1) we observed that expanding an event to include more outcomes cannot make the event less probable. You'll prove this fact on your homework. For now, take it as a natural consequence of equating probabilities to proportions. Increasing the number of ways an event can occur cannot decrease its frequency in a series of trials.

We then used that idea to argue that applying a union never decreased the chance of an event. 

Applying an intersection has the opposite effect. Instead of making an event more generic by allowing alternative outcomes, applying an intersection makes an event more specific. Thinking in terms of conditions, applying an "or" loosens the conditions that define an event, while applying an "and" tightens those conditions. Formally:

$$A \cap B \subset A \text{ and } A \cap B \subset B $$

since every element of $A \cap B$ is contained in both $A$ and $B$. It follows that:

$$\text{Pr}(A, B) \leq \text{Pr}(A) \text{ and } \text{Pr}(A, B) \leq \text{Pr}(B)$$

Since the left hand sides of both inequalities above are identical, we can put the inequalities together. Suppose that $\text{Pr}(A) = 0.5$ and $\text{Pr}(B) = 0.2$. If $\text{Pr}(A,B) \leq 0.5$ and is $\leq 0.2$, then the smaller upper bound implies the larger upper bound. So, we might as well just say that $\text{Pr}(A,B) \leq 0.2$. Therefore:

:::{note} Rules of Chance

$$\text{Pr}(A,B) \leq \text{min}(\text{Pr}(A),\text{Pr}(B)) $$

In other words, *the probability that two events both occur is never larger than the probability that either event occurs on its own, so it is never larger than the probability of the less likely of the two.*

Less specifically, *adding constraints to the definition of an event can never increase its likelihood.*
:::
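
As a sanity check on this rule, the sketch below tests the bound by brute force for every pair of events on a fair six-sided die. This is a check on one equally-likely model, not a proof, and the helper names (`pr`, `events`, `bound_holds`) are illustrative assumptions rather than notation from the text.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # outcomes of a fair six-sided die


def pr(event):
    """Chance of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))


# Every subset of omega is an event: 2**6 = 64 events in total.
events = [
    frozenset(c)
    for r in range(len(omega) + 1)
    for c in combinations(sorted(omega), r)
]

# Check Pr(A, B) <= min(Pr(A), Pr(B)) for every ordered pair of events.
bound_holds = all(
    pr(a & b) <= min(pr(a), pr(b)) for a in events for b in events
)

print(len(events), bound_holds)  # 64 True
```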

We'll come back to this observation later in the course.

#### A Lower Bound?

We derived an upper bound (see [Section 1.3](ch1.3)) and a lower bound on the chance of a union. One of the bounds followed the argument provided above. Expanding an event never makes it less likely, so the chance of a union is never less than the chance of its most likely parts. The other followed from an observation about frequencies. The probability of a union is never more than the sum of the probabilities of each of its parts.

:::{tip} Exercise 🛠️
Try to find a lower bound on the chance of an intersection. This is a good exercise in using rules. Make a table containing the key rules of chance (nonnegativity, normalization, additivity, the complement rule, etc.) and see whether you can derive a lower bound on $\text{Pr}(A, B)$ that holds for any events $A$ and $B$. Make sure that your bound is tight, i.e. there exists some example pair of events $A$ and $B$ where the bound is an equality.
:::

### Joint Probability Tables

It is often helpful to visualize joint probabilities with a table.

Suppose that we have two events, $A$, and $B$. For instance, if we roll a fair six-sided die, we could define:

$$A = \{\text{even}\} = \{2,4,6\}, \quad B = \{< 6\} = \{1,2,3,4,5\}$$

Every event, $E$, and its complement, $E^c = \{\text{not } E\}$, partition $\Omega$, since no outcome can both satisfy and not satisfy an event, and every outcome must either satisfy or not satisfy an event. So $A \cup A^c$ and $B \cup B^c$ both return $\Omega$.

Combining these partitions produces a finer partition containing four sets:

$$\begin{aligned} & A \text{ and } B = A \cap B, \quad & (\text{not } A) \text{ and } B = A^c \cap B \\
& A \text{ and } (\text{ not } B) = A \cap B^c, \quad & (\text{not } A) \text{ and } (\text{ not } B)  = A^c \cap B^c 
\end{aligned}$$

:::{tip} Exercise 🛠️
Check that this is a valid partition by convincing yourself that:

1. No outcome can land in two of these four sets simultaneously (they are mutually exclusive/disjoint)
1. Every outcome must land in one of these sets (their union is $\Omega$)
:::

We can write the same partition in terms of the outcomes in each set:

$$\begin{aligned} & \{2, 4\}, \quad & \{1,3,5\} \\
& \{6\}, \quad &  \emptyset 
\end{aligned}$$

Since we are rolling a fair die, all outcomes are equally likely, so the probability of each event in the table is the number of ways it can happen divided by the number of possible rolls. Therefore:

$$\begin{aligned} & \text{Pr}(A,B) = 2/6, \quad & \text{Pr}(A^c,B) = 3/6 \\
& \text{Pr}(A,B^c) = 1/6 , \quad & \text{Pr}(A^c,B^c) = 0  
\end{aligned}$$


If we are only interested in whether a roll is even, and whether it is less than 6, then the only four distinct outcomes are (even and less than 6), (odd and less than 6), (even and equal to 6), and (odd and equal to 6). So, we could replace $\Omega = \{1,2,3,4,5,6\}$ with:

$$\Omega' = \left\{ \begin{aligned} & \text{even and less than 6}, \quad & \text{odd and less than 6}, \\
& \text{even and equal to 6}, \quad & \text{odd and equal to 6} \end{aligned}\right\} $$

We now have an outcome space containing 4 outcomes that are naturally arranged into a table. Their chances were computed above. Those chances are not all equal, so they form a categorical distribution over the four categories rather than a uniform one. It's common practice to represent the categorical distribution over all intersections produced by a pair of events and their complements with a **joint probability table.** In our example, the table is:

Event | $A$ | not $A$     
:----:|:-------------|:-------------
$B$   | $\text{Pr}(A,B) = 2/6 $| $\text{Pr}(A^c,B) = 3/6 $
not $B$   | $\text{Pr}(A,B^c) = 1/6 $   | $\text{Pr}(A^c,B^c) = 0 $

:::{note} Joint Probability Table Definition
Formally, a **joint probability table** is a table whose:

1. columns correspond to whether an event does or does not occur
1. rows correspond to whether a different event does or does not occur
1. entries equal the joint probability of the event defined as the intersection of the row and column statements
:::
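
The die example can be reproduced by counting. The sketch below assumes equally likely outcomes and builds the four table entries as intersections of $A$, $B$, and their complements; the variable and function names are illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair six-sided die
A = {2, 4, 6}                # even
B = {1, 2, 3, 4, 5}          # less than 6


def pr(event):
    """Chance of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))


A_c, B_c = omega - A, omega - B  # complements

# Rows are (B, not B); columns are (A, not A), matching the table above.
table = [
    [pr(A & B), pr(A_c & B)],
    [pr(A & B_c), pr(A_c & B_c)],
]

for label, row in zip(["B", "not B"], table):
    print(label, [str(p) for p in row])

# The four joint probabilities form a categorical distribution.
total = sum(p for row in table for p in row)
print(total)  # 1
```

`Fraction` reduces to lowest terms, so $2/6$ prints as `1/3` and $3/6$ as `1/2`.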

We can generalize this idea to any pair of partitions. For example:

Event | $\{2,4\}$  | $\{1,3,5\}$ | $\{6\}$   
:----:|:-------------|:-------------|:-------------
$\{1,2,3\}$  | $1/6$ | $2/6$ | $0$
$\{4,5\}$   | $1/6 $   | $1/6$ | $0$
$\{6\}$   | $0$   | $0 $ | $1/6$
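
The same construction works for any pair of partitions. As a rough sketch under the equally-likely assumption, the hypothetical helpers `is_partition` and `joint_table` below check that each list of sets is a partition of $\Omega$ and then compute every row-column intersection.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
col_parts = [{2, 4}, {1, 3, 5}, {6}]
row_parts = [{1, 2, 3}, {4, 5}, {6}]


def is_partition(parts, omega):
    """True if the parts are pairwise disjoint and their union is omega."""
    covers = set().union(*parts) == omega
    # Given that the parts cover omega, sizes summing to |omega| rules out overlaps.
    no_overlap = sum(len(p) for p in parts) == len(omega)
    return covers and no_overlap


def joint_table(row_parts, col_parts, omega):
    """Joint probabilities Pr(row, column) under equally likely outcomes."""
    return [
        [Fraction(len(r & c), len(omega)) for c in col_parts]
        for r in row_parts
    ]


assert is_partition(row_parts, omega) and is_partition(col_parts, omega)
table = joint_table(row_parts, col_parts, omega)
for row in table:
    print([str(p) for p in row])
```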


Any joint probability table specifies a categorical distribution. Its entries are joint probabilities. Like any categorical probabilities, they must add to one. So, any table containing:

1. all nonnegative numbers
1. that add to one

could be a valid joint probability table. Fact (2.) is useful since it helps find joint probabilities from partially complete tables. For instance, if we know:

Event | $C$ | not $C$     
:----:|:-------------|:-------------
$D$   | $ 1/6 $| $ 2/6 $
not $D$   | $2/6 $   | ?

Then, to ensure the entries add to one, the missing entry, $\text{Pr}(C^c, D^c)$, must equal $1 - (1/6 + 2/6 + 2/6) = 1/6$. You should check that this is just an application of the complement rule.
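
The same arithmetic can be written as a short sketch. Here `None` is an assumed placeholder for the unknown entry, and the missing value is whatever is left after subtracting the known entries from one.

```python
from fractions import Fraction

# Rows: (D, not D). Columns: (C, not C). None marks the unknown entry.
table = [
    [Fraction(1, 6), Fraction(2, 6)],
    [Fraction(2, 6), None],
]

# All entries must add to one, so the missing entry is the remainder.
known = [p for row in table for p in row if p is not None]
missing = 1 - sum(known)

print(missing)  # 1/6
```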



## Marginal Probabilities

The addition rule for unions relates the sum of the entries in a row or column of a joint probability table to the probability of the event defining that row or column. For instance:

$$\begin{aligned} \text{Pr}(A) & = \text{Pr}(\text{even}) \\ & = \text{Pr}(\text{(even and < 6) or (even and = 6)}) \\
& = \text{Pr}((A \cap B) \cup (A \cap B^c)) \\
& = \text{Pr}(A \cap B) + \text{Pr}(A \cap B^c) \\
& = \text{Pr}(A,B) + \text{Pr}(A,B^c) \end{aligned}$$

Confirm for yourself that the addition rule applies for the union: $(A \cap B) \cup (A \cap B^c)$. Check that the two parenthetical events are disjoint before proceeding. 

Then the probability of event $A$ equals the sum of the joint probabilities in its column of the table:

Event | $A$    
:----:|:-------------
$B$   | $\text{Pr}(A,B) = 2/6 $
not $B$   | $\text{Pr}(A,B^c) = 1/6 $   
$\text{ }$ | $\text{Pr}(A) = 3/6$ 

We've dropped the column corresponding to $A^c$ from the table for this calculation since it is not needed. In plain language, *the probability of the event $A$ is the chance $A$ and $B$ both occur, plus the chance $A$ occurs and $B$ does not.*

This example illustrates a general rule. The probability of any event can be represented as a sum of joint probabilities corresponding to a partition of the event.  For instance:

Event | $\{2,4\}$   
:----:|:-------------
$\{1,2,3\}$  | $1/6$ 
$\{4,5\}$   | $1/6 $   
$\{6\}$   | $0$   
$\text{ }$ | $\text{Pr}(\{2,4\}) = 2/6$ 

The probability in the bottom row is an example of a **marginal probability**. 

:::{note} Marginal Probability Definition
Given a collection of joint events, a **marginal probability** is the chance of a union of the events.
:::

Given a joint probability table, the marginal probabilities are the sums of the rows and columns of the table. You can remember the name *marginal* by thinking that the marginal probabilities live at the *margin*, or edge, of the table. In the original example we have four joint probabilities and four marginals:

$$\textbf{Joints:} = \begin{cases} & \text{Pr}(A,B) = p_{AB} \\
& \text{Pr}(A^c,B) = p_{A^cB} \\
& \text{Pr}(A,B^c) = p_{AB^c} \\
& \text{Pr}(A^c,B^c) = p_{A^cB^c} \\  \end{cases}  \quad \textbf{Marginals:} = \begin{cases} & \text{Pr}(A) = p_{AB} + p_{AB^c} \\
& \text{Pr}(A^c) = p_{A^cB} + p_{A^cB^c} \\
& \text{Pr}(B) = ... \\
& \text{Pr}(B^c) = ... \\  \end{cases}$$

🛠️ To check your understanding, try to fill in the ... above.

:::{hint} Solutions
:class: dropdown
$$\textbf{Marginals:} = \begin{cases} & \text{Pr}(A) = p_{AB} + p_{AB^c} \\
& \text{Pr}(A^c) = p_{A^cB} + p_{A^cB^c} \\
& \text{Pr}(B) = p_{AB} + p_{A^cB} \\
& \text{Pr}(B^c) = p_{AB^c} + p_{A^cB^c} \\  \end{cases}$$
:::

Here's the completed joint probability table for our original example, including the marginals:

Event | $A$ | not $A$ | $B$ Marginals
:----:|:-------------|:-------------|:-------------
$B$   | $p_{AB} = 2/6 $| $p_{A^cB} = 3/6 $| $2/6 + 3/6 = 5/6$
not $B$   | $p_{AB^c} = 1/6 $   | $p_{A^cB^c} = 0 $ | $1/6 + 0 = 1/6$
$A$ Marginals| $2/6 + 1/6 = 3/6$ | $3/6 + 0 = 3/6$ | 1

Notice that:

:::{tip} Joint Probability Table Rules
1. The sum of the joint probabilities in every column and row returns the corresponding marginal
1. The sum of all marginals of $A$, or marginals of $B$, must equal 1, since $A$ and $A^c$ partition $\Omega$, as do $B$ and $B^c$.
:::
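
Both rules are easy to check numerically. The sketch below stores the die example's joint table as a nested list (an assumed layout, with rows for $B$ and not $B$ and columns for $A$ and not $A$) and recovers the marginals as row and column sums.

```python
from fractions import Fraction

# Joint table for the die example. Rows: (B, not B). Columns: (A, not A).
joint = [
    [Fraction(2, 6), Fraction(3, 6)],
    [Fraction(1, 6), Fraction(0)],
]

row_marginals = [sum(row) for row in joint]        # [Pr(B), Pr(B^c)]
col_marginals = [sum(col) for col in zip(*joint)]  # [Pr(A), Pr(A^c)]

print([str(p) for p in row_marginals])  # ['5/6', '1/6']
print([str(p) for p in col_marginals])  # ['1/2', '1/2']

# Rule 2: each set of marginals adds to one.
assert sum(row_marginals) == 1 and sum(col_marginals) == 1
```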

So, if we add the marginals, then we have more rules that can be used to fill in missing entries. You can think about this like Sudoku. All the joint entries must add to one, all of the marginals along a boundary must add to one, and all the joint entries in a given row or column must add to the marginal for that row or column:

Event | $A$ | not $A$ | $B$ Marginals
:----:|:-------------|:-------------|:-------------
$B$   | $p_{AB}$| $p_{A^cB}$| $p_B = p_{AB} + p_{A^cB}$
not $B$   | $p_{AB^c}$   | $p_{A^cB^c}$ | $p_{B^c} = p_{AB^c} + p_{A^cB^c}$
$A$ Marginals| $p_{A} = p_{AB} + p_{AB^c}$ | $p_{A^c} = p_{A^cB} + p_{A^cB^c}$ | 1
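
Here is one way the Sudoku-style reasoning might look in practice. The sketch assumes we are told only $\text{Pr}(A)$, $\text{Pr}(B)$, and $\text{Pr}(A,B)$ for the die example, and fills in the remaining three joint entries from the row, column, and total constraints. The variable names are illustrative.

```python
from fractions import Fraction

# Known values, taken from the die example in this section.
p_A = Fraction(3, 6)    # Pr(A)
p_B = Fraction(5, 6)    # Pr(B)
p_AB = Fraction(2, 6)   # Pr(A, B)

# Each row and column must add to its marginal.
p_A_notB = p_A - p_AB   # column A: Pr(A, B) + Pr(A, B^c) = Pr(A)
p_notA_B = p_B - p_AB   # row B:    Pr(A, B) + Pr(A^c, B) = Pr(B)

# All four joint entries must add to one.
p_notA_notB = 1 - (p_AB + p_A_notB + p_notA_B)

print(p_A_notB, p_notA_B, p_notA_notB)  # 1/6 1/2 0
```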

Here's the full table for our larger example:

Event | $\{2,4\}$  | $\{1,3,5\}$ | $\{6\}$    | Marginals
:----:|:-------------|:-------------|:-------------|:-------------
$\{1,2,3\}$  | $1/6$ | $2/6$ | $0$ | $3/6$
$\{4,5\}$   | $1/6 $   | $1/6$ | $0$ | $2/6$
$\{6\}$   | $0$   | $0 $ | $1/6$ | $1/6$
Marginals | $2/6$ | $3/6$ | $1/6$ | 1

The procedure we just used to compute marginal probabilities from joint probabilities is called **marginalization.** To marginalize:

:::{tip} Marginalization
1. Identify the desired marginal probability, e.g. $\text{Pr}(A)$
1. Expand the event by partitioning. That is, break the event $A$ into a series of ways it could occur. We used $A \cap B$ and $A \cap B^c$.
1. Look up the corresponding joint probabilities, or, if they are straightforward to calculate, calculate them.
1. Add up the joints:

$$\text{Pr}(A) = \sum_{j=1}^n \text{Pr}(A, B_j) \text{ if } \{B_j\}_{j=1}^n \text{ partition } \Omega. $$
:::
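
The four steps can be sketched in a few lines. This example assumes a fair die, takes $A$ to be the even rolls, and uses the partition $\{1,2,3\}$, $\{4,5\}$, $\{6\}$ for the $B_j$; the names `partition`, `joints`, and `marginal` are illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                          # step 1: we want Pr(A)
partition = [{1, 2, 3}, {4, 5}, {6}]   # step 2: the B_j, which partition omega


def pr(event):
    """Chance of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))


# Step 3: compute the joint probabilities Pr(A, B_j).
joints = [pr(A & B_j) for B_j in partition]

# Step 4: add up the joints.
marginal = sum(joints)

print([str(p) for p in joints], marginal)  # ['1/6', '1/6', '1/6'] 1/2
```

The sum agrees with the direct count $\text{Pr}(A) = 3/6$, as it must whenever the $B_j$ partition $\Omega$.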

You'll use this strategy a lot in this class. It is often true that we want to find the probability of some event, the probability is hard to compute directly, but it can be computed if we break down the event according to a list of ways it can happen. As long as we can compute the probability of each way it can occur, we can add those chances together to get the chance of the desired event.

:::{tip} Example
:class: dropdown
For instance, suppose that tomorrow's weather obeys the joint probabilities:

Event | Rain  | Clouds  | Sun    | Marginals
:----:|:-------------|:-------------|:-------------|:-------------
Cold  | $2/10$ | $3/10$ | $1/10$ | $6/10$
Warm  | $1/10 $   | $0$ | $1/10$ | $2/10$
Hot   | $0$   | $0 $ | $2/10$ | $2/10$
Marginals | $3/10$ | $3/10$ | $4/10$ | 1

Then, the chance it is warm is:

$$\begin{aligned} \text{Pr}(\text{Warm}) & = \text{Pr}(\text{Warm, Rain})  + \text{Pr}(\text{Warm, Clouds}) + \text{Pr}(\text{Warm, Sun}) \\ & = 1/10 + 0/10 + 1/10 = 2/10. \end{aligned}$$
:::
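
For tables with labeled rows and columns, a dictionary keyed by (row, column) pairs can be convenient. The sketch below is one possible layout for the weather example, with the joint probabilities copied from its table; the function names `marginal_temp` and `marginal_sky` are hypothetical.

```python
from fractions import Fraction

temps = ["Cold", "Warm", "Hot"]
skies = ["Rain", "Clouds", "Sun"]

# Joint probabilities in tenths, copied from the weather table.
tenths = {
    "Cold": [2, 3, 1],
    "Warm": [1, 0, 1],
    "Hot": [0, 0, 2],
}
joint = {
    (t, s): Fraction(n, 10)
    for t in temps
    for s, n in zip(skies, tenths[t])
}


def marginal_temp(t):
    """Pr(temperature = t): sum the joints across the row."""
    return sum(joint[(t, s)] for s in skies)


def marginal_sky(s):
    """Pr(sky = s): sum the joints down the column."""
    return sum(joint[(t, s)] for t in temps)


print(marginal_temp("Warm"))  # 1/5
print(marginal_sky("Sun"))    # 2/5
```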