# Joint Probability
- Calculate conditional distributions when given a full distribution.
- Calculate marginal distributions from a joint distribution.
- Obtain the marginal mean from conditional means and marginal probabilities, using the law of total expectation.
- Use the law of total probability to convert between conditional and marginal distributions, and joint distributions.
- Describe the consequences of independent random variables.
- Calculate and describe the pros and cons of dependence measures: covariance, correlation, and Kendall's tau.

## Conditional Distributions (15 min)
Probability distributions describe an uncertain outcome, but what if we have partial information?

Consider the example of ships arriving at the port of Vancouver again. Each ship will stay at port for a random number of days, which we'll call the length of stay (LOS) or $D$, according to the following (made up) distribution:

| Length of Stay (LOS) | Probability |
|---|---|
| 1 | 0.25 |
| 2 | 0.35 |
| 3 | 0.20 |
| 4 | 0.10 |
| 5 | 0.10 |

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook_connected"

In [6]:
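# Marginal pmf of length of stay (LOS), in days: P(D = d)
length_of_stay = {1: .25, 2: .35, 3: .2, 4: .1, 5: .1}
# Convert the dict views to lists so plotly receives plain sequences
px.bar(x=list(length_of_stay.keys()), y=list(length_of_stay.values()),
       title='Probabilities of Length of Stay')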

In [8]:
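# Joint pmf of LOS and gang demand.
# Key: LOS in days. List entries: P(gang = g, LOS = key) for g = 1, 2, 3, 4.
los_gang_demand = {
    1: [0.0017, 0.0425, 0.1247, 0.0811],
    2: [0.0266, 0.1698, 0.1360, 0.0176],
    3: [0.0511, 0.1156, 0.0320, 0.0013],
    4: [0.0465, 0.0474, 0.0059, 0.0001],
    5: [0.0740, 0.0246, 0.0014, 0.0000]
}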

Suppose a ship has been at port for 2 days now, and it'll be staying longer. What's the distribution of length-of-stay now? Using symbols, this is written as $P(D = d \mid D > 2)$, where the bar "|" reads as "given" or "conditional on", and this distribution is called a conditional distribution. We can calculate a conditional distribution in two ways: a "table approach" and a "formula approach".

Table approach:

1. Subset the pmf table to only those outcomes that satisfy the condition ($D > 2$ in this case). You'll end up with a "sub table".
2. Re-normalize the remaining probabilities so that they add up to 1. You'll end up with the conditional distribution under that condition.

Formula approach: In general for events $A$ and $B$, the conditional probability formula is $$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

For the ship example, the event $A$ is $D = d$ (for all possible $d$'s), and the event $B$ is $D > 2$. Plugging this in, we get $$P(D = d \mid D > 2) = \frac{P(D = d \cap D > 2)}{P(D > 2)} = \frac{P(D = d)}{P(D > 2)} \text{ for } d = 3,4,5.$$

The only real "trick" is the numerator. How did we reduce the convoluted event $D = d \cap D > 2$ to the simple event $D = d$ for $d = 3,4,5$? The trick is to go through all outcomes and check which ones satisfy the requirement $D = d \cap D > 2$. This reduces to $D = d$, as long as $d = 3,4,5$.

$$P(D = d \mid D > 2) = \frac{P(D = d)}{P(D = 3) + P(D = 4) + P(D = 5)} \text{ for } d = 3,4,5.$$
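The table approach can be sketched in a few lines of Python. This is a minimal illustration, assuming the pmf is stored as a dictionary like `length_of_stay` above; the helper name `condition_pmf` is hypothetical and not part of the original notes.

```python
# Marginal pmf of length of stay D, copied from the distribution above.
length_of_stay = {1: 0.25, 2: 0.35, 3: 0.2, 4: 0.1, 5: 0.1}


def condition_pmf(pmf, condition):
    """Return the pmf conditional on `condition(outcome)` being true.

    `condition_pmf` is a hypothetical helper written for this example.
    """
    # Step 1: subset the table to the outcomes that satisfy the condition.
    sub_table = {d: p for d, p in pmf.items() if condition(d)}
    # Step 2: re-normalize by P(condition), the sum of the kept probabilities.
    p_condition = sum(sub_table.values())
    if p_condition == 0:
        raise ValueError("Cannot condition on an event with probability 0.")
    return {d: p / p_condition for d, p in sub_table.items()}


los_given_longer_than_2 = condition_pmf(length_of_stay, lambda d: d > 2)
for d, p in los_given_longer_than_2.items():
    print(f"P(D = {d} | D > 2) = {p:.2f}")
```

Here $P(D > 2) = 0.2 + 0.1 + 0.1 = 0.4$, so the conditional probabilities come out to 0.50, 0.25, and 0.25 for $d = 3, 4, 5$.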

## Conditioning on one Variable
Usually more interesting than a joint distribution is the conditional distribution of one variable when the other variables are fixed. This is a special type of conditional distribution and an extremely important type of distribution in data science.

For example, a ship is arriving, and they've told you they'll only be staying for 1 day. What's the distribution of their gang demand under this information? That is, what is $P(\text{gang} = g \mid \text{LOS} = 1)$ for all possible $g$?

Table approach:

1. Isolating the outcomes satisfying the condition ($\text{LOS} = 1$), we obtain the first row of the joint distribution:

   | Gangs: 1 | Gangs: 2 | Gangs: 3 | Gangs: 4 |
   |---|---|---|---|
   | 0.0017 | 0.0425 | 0.1247 | 0.0811 |

2. Now, re-normalize the probabilities so that they add up to 1, by dividing them by their sum, which is 0.25:

   | Gangs: 1 | Gangs: 2 | Gangs: 3 | Gangs: 4 |
   |---|---|---|---|
   | 0.0068 | 0.1700 | 0.4988 | 0.3244 |

Formula Approach: Applying the formula for conditional probabilities, we get $$P(\text{gang} = g \mid \text{LOS} = 1) = \frac{P(\text{gang} = g, \text{LOS} = 1)}{P(\text{LOS} = 1)},$$ which is exactly row 1 divided by 0.25.

Here's a plot of this conditional distribution. For comparison, we've also plotted the marginal distribution of gang demand.

In [9]:
los_1_day_probs = [p / .25 for p in los_gang_demand[1]]  # .25 = P(LOS = 1), the sum of the LOS = 1 row
los_1_day_probs

[0.0068, 0.17, 0.4988, 0.3244]
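The same re-normalization can be applied to every row of the joint table to get the conditional distribution of gang demand for each length of stay. The snippet below is a sketch in plain Python; the name `gang_given_los` is introduced here for illustration, and each row total is computed from the table rather than typed in by hand.

```python
# Joint pmf copied from above: key = LOS, list = P(gang = g, LOS = key) for g = 1..4.
los_gang_demand = {
    1: [0.0017, 0.0425, 0.1247, 0.0811],
    2: [0.0266, 0.1698, 0.1360, 0.0176],
    3: [0.0511, 0.1156, 0.0320, 0.0013],
    4: [0.0465, 0.0474, 0.0059, 0.0001],
    5: [0.0740, 0.0246, 0.0014, 0.0000],
}

# gang_given_los[d][g - 1] holds P(gang = g | LOS = d).
gang_given_los = {}
for los, joint_row in los_gang_demand.items():
    p_los = sum(joint_row)  # P(LOS = los), the row total
    gang_given_los[los] = [p / p_los for p in joint_row]

for los, conditional in gang_given_los.items():
    print(f"LOS = {los}:", [round(p, 4) for p in conditional])
```

Each printed row sums to 1 (up to rounding), and the `LOS = 1` row matches the hand calculation above.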

In [19]:
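# Marginal pmf of gang demand: sum the joint pmf over all lengths of stay
gang_marginal = [sum(col) for col in zip(*los_gang_demand.values())]

fig = px.bar(x=list(range(1, 5)), y=[los_1_day_probs, gang_marginal],
             barmode='overlay', title="Conditional Probability of Gang Demand Given LOS=1")

# Plotly names wide-form traces 'wide_variable_0', 'wide_variable_1', ...; rename them for the legend
newnames = {'wide_variable_0': 'Conditional', 'wide_variable_1': 'Marginal'}
fig.for_each_trace(lambda t: t.update(name=newnames[t.name],
                                      legendgroup=newnames[t.name],
                                      hovertemplate=t.hovertemplate.replace(t.name, newnames[t.name])))
fig.show()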

## Law of Total Probability/Expectation
Quite often, we know the conditional distributions, but don't directly have the marginals. In fact, most of regression and machine learning is about seeking conditional means (more on this in DSCI 561/571 and beyond).

For example, suppose you have the following conditional means of gang request given the length of stay of a ship.


This curve is called a model function, and it is useful for predicting a ship's daily gang request when we know its length of stay. But what if we don't know the length of stay and still want to produce an expected gang request? We can use the marginal mean of gang request!

In general, a marginal mean can be computed from the conditional means and the probabilities of the conditioning variable. The formula, known as the law of total expectation, is $$E(Y) = \sum_x E(Y \mid X = x) P(X = x).$$

Here's a table that outlines the relevant values:

Length of Stay (LOS) E(Gang | LOS) P(LOS)



Multiplying the last two columns together, and summing, gives us the marginal expectation: 2.3.
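As a sketch of this calculation, the snippet below recomputes the conditional means $E(\text{Gang} \mid \text{LOS})$ and the probabilities $P(\text{LOS})$ from the joint table `los_gang_demand`, then combines them with the law of total expectation. It assumes gang demand takes the values 1 to 4, in the column order of the joint table. Because the joint probabilities are rounded to four decimals, the printed values are approximate.

```python
# Joint pmf copied from above: key = LOS, list = P(gang = g, LOS = key) for g = 1..4.
los_gang_demand = {
    1: [0.0017, 0.0425, 0.1247, 0.0811],
    2: [0.0266, 0.1698, 0.1360, 0.0176],
    3: [0.0511, 0.1156, 0.0320, 0.0013],
    4: [0.0465, 0.0474, 0.0059, 0.0001],
    5: [0.0740, 0.0246, 0.0014, 0.0000],
}
gang_levels = [1, 2, 3, 4]  # assumed gang-demand value for each column

cond_means = {}      # E(Gang | LOS = d)
marginal_mean = 0.0  # E(Gang), accumulated term by term

for los, joint_row in los_gang_demand.items():
    p_los = sum(joint_row)  # P(LOS = los)
    # Conditional mean: sum over g of g * P(gang = g | LOS = los)
    cond_means[los] = sum(g * p / p_los for g, p in zip(gang_levels, joint_row))
    # Law of total expectation: add E(Gang | LOS = los) * P(LOS = los)
    marginal_mean += cond_means[los] * p_los
    print(f"LOS = {los}: E(Gang | LOS) = {cond_means[los]:.2f}, P(LOS) = {p_los:.2f}")

print(f"E(Gang) = {marginal_mean:.1f}")
```

The conditional means decrease as the length of stay grows, and the weighted sum rounds to the marginal expectation of 2.3 quoted above.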

Also, remember that probabilities are just means (the probability of an event is the mean of its 0/1 indicator), so the result extends to probabilities: $$P(Y = y) = \sum_x P(Y = y \mid X = x) P(X = x).$$ This is the law of total probability we saw before, $P(Y=y)=\sum_x P(Y = y, X = x)$, with each joint probability rewritten as $P(Y = y, X = x) = P(Y = y \mid X = x) P(X = x)$.
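A quick numerical check of this identity, under the same assumptions as before (the joint table `los_gang_demand`, with columns for gang demand 1 to 4): build the conditional distributions and the marginal of LOS, recombine them, and compare against summing the joint table directly. The variable names are illustrative only.

```python
# Joint pmf copied from above: key = LOS, list = P(gang = g, LOS = key) for g = 1..4.
los_gang_demand = {
    1: [0.0017, 0.0425, 0.1247, 0.0811],
    2: [0.0266, 0.1698, 0.1360, 0.0176],
    3: [0.0511, 0.1156, 0.0320, 0.0013],
    4: [0.0465, 0.0474, 0.0059, 0.0001],
    5: [0.0740, 0.0246, 0.0014, 0.0000],
}

# Route 1: sum over x of P(Y = y | X = x) * P(X = x), with Y = gang and X = LOS.
p_gang = [0.0, 0.0, 0.0, 0.0]
for los, joint_row in los_gang_demand.items():
    p_los = sum(joint_row)                              # P(LOS = los)
    gang_given_los = [p / p_los for p in joint_row]     # P(gang = g | LOS = los)
    for i, p_cond in enumerate(gang_given_los):
        p_gang[i] += p_cond * p_los

# Route 2: sum the joint pmf over LOS directly (column sums of the table).
p_gang_from_joint = [sum(col) for col in zip(*los_gang_demand.values())]

for g, (a, b) in enumerate(zip(p_gang, p_gang_from_joint), start=1):
    print(f"P(gang = {g}): via conditionals = {a:.4f}, via joint = {b:.4f}")
```

Both routes give the same marginal distribution of gang demand, roughly 0.2, 0.4, 0.3, and 0.1 once rounding in the joint table is accounted for.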