---
numbering:
  title:
    offset: 1
---

# Joint and Conditional Probability

So far we've established rules for "not" statements and "or" statements. We haven't worked out what we should do for "if" statements or "and" statements.

This section is all about "if" and "and" statements. We'll see that, to work out the probability that $A$ and $B$ both happen, it is easier to first work out the probability that $A$ happens *if* $B$ happens (or vice versa). The probability that $A$ happens if $B$ happens is a conditional probability. We call it a *conditional* probability since the statement *conditions* on some other outcome, i.e. it adds a condition that restricts the outcome space.

## And Statements and Joint Probabilities

What is the chance that two events happen simultaneously? For example, what's the chance that, when I draw a card from a deck, it is *both* a face card *and* in a red suit?

The probability that two (or more) events co-occur is a **joint probability.** Let $A$ and $B$ denote two events. We denote their joint probability:

$$\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) $$

Here the $A,B$ convention is a standard shorthand. You can read the comma inside the parentheses as an "and". As always, $\text{Pr}$ asks for the chance that the event inside the parentheses occurs.

We've seen that joining two events with an "or" produces a new event equal to the union of the component events: "$A$ or $B$ happens" means $\omega \in A \cup B$. Combining two events with an "and" has the opposite effect. Instead of producing a larger set that contains both of its component sets, an "and" produces a smaller set that is a subset of both its parents. For example:

$$\{a,b,c,d\} \text{ and } \{c,d,e,f,g\} = \{c,d\}$$

Notice that the "and" operation selects only the elements contained in both sets. On a Venn diagram, this corresponds to the region where the sets overlap, or intersect. Accordingly, we call the operation that selects only the elements in both sets an **intersection.** The intersection of two sets is denoted $\cap$.
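If you'd like to check the example above on a computer, Python's built-in `set` type is one convenient option: its `&` operator computes an intersection and its `|` operator computes a union. The sketch below is only an illustration, and the variable names are our own.

```python
# The two example sets from the text.
first = {"a", "b", "c", "d"}
second = {"c", "d", "e", "f", "g"}

# "and" keeps only the elements that belong to both sets.
both = first & second      # same as first.intersection(second)

# "or" keeps every element that belongs to at least one set.
either = first | second    # same as first.union(second)

print(sorted(both))    # ['c', 'd']
print(sorted(either))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# The intersection is a subset of both parents; the union contains both.
print(both <= first and both <= second)      # True
print(first <= either and second <= either)  # True
```

The `<=` comparison between sets tests whether the left set is a subset of the right one.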

So, we can now write joint probabilities three ways. It is important that you can read all three:

$$\text{Pr}(A,B) = \text{Pr}(A \text{ and } B) = \text{Pr}(A \cap B)$$
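To make the notation concrete, here is a small sketch that answers the card question from the start of this section by direct counting. It assumes a standard 52-card deck, that jacks, queens, and kings are the face cards, and that every card is equally likely to be drawn. The helper `pr` is a name we made up for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed outcome space: a standard 52-card deck, all cards equally likely.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

# Events are subsets of the outcome space.
face = {card for card in deck if card[0] in {"J", "Q", "K"}}
red = {card for card in deck if card[1] in {"hearts", "diamonds"}}

def pr(event):
    """Chance of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(deck))

# Pr(face, red) = Pr(face and red) = Pr(face ∩ red)
joint = pr(face & red)

print(pr(face), pr(red), joint)  # 3/13 1/2 3/26
```

Six of the 52 cards are red face cards, so the joint probability is $6/52 = 3/26$.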

### Bounding Joint Probabilities

Often, we would like to compute a chance but don't know enough about our problem to calculate the actual value. Alternatively, the necessary calculation may be too cumbersome to perform, or may be more precise than we need. In these situations it can be helpful to bound the chance instead.

For instance, it is common practice to define statistical tests with controlled error rates. This means that the test is designed to produce a guarantee of the kind, "this test will return a wrong answer *at most* $p$ percent of the time under some assumptions." It is common to use a bound instead of an equality in these cases since we want tests that apply to a wide variety of systems and are not sensitive to assumptions that we can't justify. As a result, the collection of assumptions is usually chosen to allow many probability models. Then most events don't have a uniquely defined chance. Instead, there is an upper bound that holds for every chance that could be assigned consistently with the listed assumptions.

Let's practice deriving bounds for joint probabilities.

#### An Upper Bound

Earlier in this chapter we observed that expanding an event to include more outcomes cannot make the event less probable. You'll prove this fact on your homework. For now, take it as a natural consequence of equating probabilities to proportions. Increasing the number of ways an event can occur cannot decrease its frequency in a series of trials.

We then used that idea to argue that applying a union never decreases the chance of an event.

Applying an intersection has the opposite effect. Instead of making an event more generic by allowing alternative outcomes, applying an intersection makes an event more specific. Thinking in terms of conditions, applying an "or" loosens the conditions that define an event, while applying an "and" tightens those conditions. Formally:

$$A \cap B \subseteq A \text{ and } A \cap B \subseteq B $$

since every element of $A \cap B$ is contained in both $A$ and $B$. It follows that:

$$\text{Pr}(A, B) \leq \text{Pr}(A) \text{ and } \text{Pr}(A, B) \leq \text{Pr}(B)$$

Since the left-hand sides of both inequalities above are identical, we can put the inequalities together. Suppose, for example, that $\text{Pr}(A) = 0.5$ and $\text{Pr}(B) = 0.2$. If $\text{Pr}(A,B) \leq 0.5$ and $\text{Pr}(A,B) \leq 0.2$, then the smaller upper bound implies the larger one, so we might as well just say that $\text{Pr}(A,B) \leq 0.2$. In general:

$$\text{Pr}(A,B) \leq \text{min}(\text{Pr}(A),\text{Pr}(B)) $$

In other words, *the probability that two events both occur is never larger than the probability of either event on its own, so it is never larger than the probability of the less likely of the two.* Less specifically, *adding constraints to the definition of an event can never increase its likelihood.*
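A brute-force check can make the upper bound feel less abstract. The sketch below assumes a toy outcome space, one roll of a fair six-sided die, and tests the inequality $\text{Pr}(A,B) \leq \text{min}(\text{Pr}(A),\text{Pr}(B))$ for every possible pair of events. The helper `pr` and the choice of outcome space are our own. Checking one model is an illustration, not a proof.

```python
from fractions import Fraction
from itertools import combinations

# Assumed outcome space: one roll of a fair six-sided die.
outcomes = frozenset(range(1, 7))

def pr(event):
    """Chance of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Every possible event, i.e. every subset of the outcome space (2**6 = 64).
events = [
    frozenset(subset)
    for size in range(len(outcomes) + 1)
    for subset in combinations(sorted(outcomes), size)
]

# Check Pr(A, B) <= min(Pr(A), Pr(B)) for all 64 * 64 pairs of events.
bound_holds = all(
    pr(a & b) <= min(pr(a), pr(b)) for a in events for b in events
)
print(bound_holds)  # True

# The bound is an equality when one event is contained in the other.
a = frozenset({1, 2})
b = frozenset({1, 2, 3, 4})
print(pr(a & b) == min(pr(a), pr(b)))  # True
```

The last two lines show that the bound is tight: when $A \subseteq B$, the intersection is $A$ itself, so $\text{Pr}(A,B) = \text{Pr}(A)$.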

At the end of this chapter we'll expand this observation.

#### A Lower Bound?

We derived an upper bound and a lower bound on the chance of a union. One of the bounds followed the argument provided above. Expanding an event never makes it less likely, so the chance of a union is never less than the chance of its most likely part. The other followed from an observation about frequencies. The probability of a union is never more than the sum of the probabilities of its parts.

Try to find a lower bound on the chance of an intersection. This is a good exercise in using rules. Make a table containing the key rules of chance (nonnegativity, normalization, additivity, the complement rule, etc.) and see whether you can derive a lower bound on $\text{Pr}(A, B)$ that holds for any events $A$ and $B$. Make sure that your bound is tight, i.e. there exists some example pair of events $A$ and $B$ where the bound is an equality.

## If Statements and Conditional Probability

% what is the probability of _ given _?

% recall that "if" restricts the outcome space, note that it is the only operation that acts on the outcome space instead of the definition of the event

% procedure is clear for equally likely outcomes

% general division rule based on filtering a sequence of trials

% give definition of conditional probability

## Conditioning Preserves Odds

% is this a definition or a new axiom

% idea: odds shouldn't change under conditioning

% show that knowing the odds actually specifies the distribution

% and the normalization constant is the marginal

## Conditional Distributions

% select columns or rows of joint probability table

% normalize

# The Multiplication Rule

% rearrange conditional definition to give the multiplication rule

% A then A example

% expand to general chain rule

## Reasoning with Sequences

% outcome trees example

# Bayes Rule

% set up problem

% match joint

% derive Bayes

## Example: Base Rate Neglect

% set up problem

% distinguish likelihood from posterior

% the base rate matters

# Independence and Dependence

% what does it mean for two events to be unrelated?

% it means that knowing one tells us nothing about the chance of the other

% define independence as conditional is invariant to conditioning, = marginal

% dependent otherwise

% show that, if dependent, then probabilities multiply

% A then S example

## Details and Coincidence

% adding detail never increases chance (conditioning can)

% most exactly specified events are spectacularly unlikely, so we usually need to look at bulk properties/summaries to say much useful, relate to test statistics

% coin toss testing example, when bunched, some events are much more likely than others, does provide evidence

% model: digits of continuous random variable (say, position of a spinner) are approximately uniform and approximately independent

% accept this model and show probability vanishes very quickly, foreshadow issues with continuous random variables