# Lecture 25
# Beta-Gamma (bank-post office), order statistics, conditional expectation, two envelope paradox

## Connecting the Gamma and Beta Distributions

Say you have to visit both the bank and the post office today. What can we say about the total time you spend waiting in the two lines?

Let $X \sim Gamma(a, \lambda)$ be the total time you wait in line at the bank, given that there are $a$ people in line in front of you and that their waiting times are i.i.d. $Expo(\lambda)$; recall the analogies of geometric $\rightarrow$ negative binomial, and of exponential $\rightarrow$ gamma. Each person's individual time at the bank is $Expo(\lambda)$, so as the $(a+1)^{\text{th}}$ person, your time in line is the sum of those $a$ i.i.d. $Expo(\lambda)$ times.

Similarly, let $Y \sim Gamma(b, \lambda)$ be the total time you wait in line at the post office, given that there are $b$ people in line in front of you.

Assume that $X, Y$ are independent.

### Questions

1. What is the distribution of $T = X + Y$?
1. Given $T = X + Y$ and $W = \frac{X}{X+Y}$, what is the joint distribution of $T, W$?
1. Are $T, W$ independent?

### What is the distribution of $T$?

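Since $X$ and $Y$ are independent and share the same rate $\lambda$, $T$ is a sum of $a+b$ i.i.d. $Expo(\lambda)$ waiting times. So we immediately know that the total time you spend waiting in the lines is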

\begin{align}
  T &= X + Y \\
    &\sim Gamma(a+b, \lambda)
\end{align}
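As a rough sanity check, the claim can be simulated. The sketch below is not part of the lecture: the values of `a`, `b`, `lam`, the helper name `simulate_total_wait`, and the trial count are all illustrative assumptions. It builds each waiting time as a sum of i.i.d. $Expo(\lambda)$ draws and compares the sample mean and variance of $T$ with the $Gamma(a+b, \lambda)$ values $\frac{a+b}{\lambda}$ and $\frac{a+b}{\lambda^2}$.

```python
import random
import statistics


def simulate_total_wait(a, b, lam, n_trials=50_000, seed=0):
    """Draw n_trials samples of T = X + Y (hypothetical helper for this sketch).

    X is a sum of a i.i.d. Expo(lam) times, Y a sum of b i.i.d. Expo(lam) times.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        x = sum(rng.expovariate(lam) for _ in range(a))  # wait at the bank
        y = sum(rng.expovariate(lam) for _ in range(b))  # wait at the post office
        totals.append(x + y)
    return totals


# Illustrative parameter choices, not taken from the lecture.
a, b, lam = 3, 5, 2.0
totals = simulate_total_wait(a, b, lam)

sample_mean = statistics.fmean(totals)
sample_var = statistics.variance(totals)

# Gamma(a+b, lam) has mean (a+b)/lam and variance (a+b)/lam**2.
print("mean:", sample_mean, "theory:", (a + b) / lam)
print("var: ", sample_var, "theory:", (a + b) / lam**2)
```

Matching the first two moments does not prove the distributional claim, but a mismatch would flag an error.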


### What is the joint distribution of $T,W$?

Let $\lambda = 1$, to make the calculation simpler. We do not lose any generality, since we can scale by $\lambda$ later.

So we are looking for the joint PDF of $T,W$

\begin{align}
  \text{joint PDF } f_{T,W}(t,w) &= f_{X,Y}(x,y) \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\
  &= \frac{1}{\Gamma(a) \Gamma(b)} \, x^a \, e^{-x} \, y^b \, e^{-y} \, \frac{1}{xy} \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\\\
  \\
  \text{for the Jacobian, let } x + y &= t \\
  \frac{x}{x+y} &= w \\
  \\
  \Rightarrow x &= tw \\
  \\
  1 - \frac{x}{x+y} &= 1 - w \\
  \frac{x + y - x}{t} &= 1 - w \\
  \\
  \Rightarrow y &= t(1-w) \\\\
  \\
  \frac{\partial(x,y)}{\partial(t,w)} &= 
    \begin{bmatrix}
      \frac{\partial x}{\partial t} & \frac{\partial x}{\partial w} \\
      \frac{\partial y}{\partial t} & \frac{\partial y}{\partial w} 
    \end{bmatrix} \\
    &=
    \begin{bmatrix}
      w & t \\
      1-w & -t 
    \end{bmatrix} \\
  \det \frac{\partial(x,y)}{\partial(t,w)} &= -tw - t(1-w) \\
    &= -t \\
  \Rightarrow \left| \frac{\partial(x,y)}{\partial(t,w)} \right| &= |-t| = t \quad \text{ since } t > 0 \\\\
    \\
    \text{returning to PDF } f_{T,W}(t,w) &=  \frac{1}{\Gamma(a) \Gamma(b)} \, x^a \, e^{-x} \, y^b \, e^{-y} \, \frac{1}{xy} \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\
    &= \frac{1}{\Gamma(a) \Gamma(b)} \, (tw)^a \, e^{-(tw)} \, (t(1-w))^b \, e^{-t(1-w)} \, \frac{1}{tw \, t(1-w)} \, t \\
    &= \frac{1}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, t^{a+b} \, e^{-t} \, \frac{1}{t} &\quad \text{ collecting powers of } w, \, 1-w \text{ and } t \\
    &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, \frac{t^{a+b} \, e^{-t} \, \frac{1}{t}}{\Gamma(a+b)} &\quad \text{ multiplying by } 1 = \frac{\Gamma(a+b)}{\Gamma(a+b)}
\end{align}

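The joint PDF factors into a function of $w$ alone times a function of $t$ alone: the $t$-factor $\frac{1}{\Gamma(a+b)} \, t^{a+b} \, e^{-t} \, \frac{1}{t}$ is the $Gamma(a+b, 1)$ PDF, and the $w$-factor is proportional to the $Beta(a,b)$ PDF. So $T \sim Gamma(a+b, 1)$, $W \sim Beta(a,b)$, and we have also answered the third question: _$T,W$ are independent_.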
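A simulation can make this result more concrete. The sketch below is an illustration under assumed values (`a = 3`, `b = 5`, $\lambda = 1$; the helpers `sample_t_w` and `pearson` are hypothetical names introduced here). It checks that the sample mean and variance of $W$ are close to the standard $Beta(a,b)$ values $\frac{a}{a+b}$ and $\frac{ab}{(a+b)^2(a+b+1)}$, and that the sample correlation between $T$ and $W$ is near zero. Zero correlation is only a necessary condition for independence, so this is a consistency check rather than a proof.

```python
import random
import statistics


def sample_t_w(a, b, n_trials=50_000, seed=1):
    """Draw paired samples of T = X + Y and W = X / (X + Y) with lambda = 1."""
    rng = random.Random(seed)
    ts, ws = [], []
    for _ in range(n_trials):
        x = rng.gammavariate(a, 1.0)  # shape a, scale 1 (rate lambda = 1)
        y = rng.gammavariate(b, 1.0)  # shape b, scale 1
        ts.append(x + y)
        ws.append(x / (x + y))
    return ts, ws


def pearson(u, v):
    """Sample Pearson correlation, written out to avoid version-specific helpers."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)
    return cov / (statistics.stdev(u) * statistics.stdev(v))


# Illustrative parameter choices, not taken from the lecture.
a, b = 3, 5
ts, ws = sample_t_w(a, b)

w_mean = statistics.fmean(ws)
w_var = statistics.variance(ws)
corr_tw = pearson(ts, ws)

print("E(W):  ", w_mean, "theory:", a / (a + b))
print("Var(W):", w_var, "theory:", a * b / ((a + b) ** 2 * (a + b + 1)))
print("corr(T, W):", corr_tw, "(should be near 0)")
```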

### Unexpected Discovery: Normalizing Constant for Beta

Now say we are interested in finding the marginal PDF for $W$

\begin{align}
  f_{W}(w) &= \int_{0}^{\infty} f_{T,W}(t,w) \, dt \\
  &= \int_{0}^{\infty} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, \frac{t^{a+b} \, e^{-t} \, \frac{1}{t}}{\Gamma(a+b)} \, dt \\
  &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \, \int_{0}^{\infty} \frac{t^{a+b} \, e^{-t} \, \frac{1}{t}}{\Gamma(a+b)} \, dt\\
  &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} &\quad \text{ since the } Gamma(a+b, 1) \text{ PDF integrates to } 1
\end{align}

Notice that since the marginal PDF $f_{W}(w)$ must integrate to 1 over $0 < w < 1$, $\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)}$ must be the normalizing constant for the Beta distribution! If this were not true, then $f_{W}(w)$ could not be a valid PDF.
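The normalizing constant can also be checked numerically. The sketch below is an illustration only: `beta_kernel_integral` and `beta_constant` are hypothetical helper names, and the midpoint rule with `a = 3`, `b = 5` is an assumed choice (it presumes $a, b \ge 1$ so the integrand stays bounded on $[0, 1]$). It integrates $w^{a-1}(1-w)^{b-1}$ over $(0,1)$ and multiplies by $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$, which should give a value very close to 1.

```python
import math


def beta_kernel_integral(a, b, n_steps=100_000):
    """Midpoint-rule approximation of the integral of w^(a-1) (1-w)^(b-1) on (0, 1).

    Assumes a >= 1 and b >= 1 so the integrand is bounded.
    """
    h = 1.0 / n_steps
    total = 0.0
    for i in range(n_steps):
        w = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += w ** (a - 1) * (1.0 - w) ** (b - 1)
    return total * h


def beta_constant(a, b):
    """Gamma(a+b) / (Gamma(a) Gamma(b)), the constant derived above."""
    return math.gamma(a + b) / (math.gamma(a) * math.gamma(b))


# Illustrative parameter choices, not taken from the lecture.
a, b = 3, 5
integral = beta_kernel_integral(a, b)
constant = beta_constant(a, b)

print("integral of kernel:", integral)
print("constant:          ", constant)
print("product:           ", constant * integral)  # should be close to 1
```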

## Example Usage: Finding $\mathbb{E}(W), W \sim Beta(a,b)$

There are two ways you could find $\mathbb{E}(W)$.

You could integrate directly against the Beta PDF (the definition of expectation, or LOTUS with $g(w) = w$), where you would simply do:

\begin{align}
  \mathbb{E}(W) &= \int_{0}^{1} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \, w \, dw \\
  &= \int_{0}^{1} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a} \, (1-w)^{b-1} \, dw
\end{align}

... and this would not be so hard to handle, since the integrand is proportional to a $Beta(a+1, b)$ PDF, whose normalizing constant we now know.
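For completeness, here is one way to finish that integral. It is a sketch that relies only on the Beta normalizing constant found above, applied to $Beta(a+1, b)$, and on the standard identity $\Gamma(z+1) = z \, \Gamma(z)$:

\begin{align}
  \int_{0}^{1} w^{a} \, (1-w)^{b-1} \, dw &= \frac{\Gamma(a+1) \, \Gamma(b)}{\Gamma(a+b+1)} \\
  \\
  \Rightarrow \mathbb{E}(W) &= \frac{\Gamma(a+b)}{\Gamma(a) \, \Gamma(b)} \, \frac{\Gamma(a+1) \, \Gamma(b)}{\Gamma(a+b+1)} \\
  &= \frac{\Gamma(a+1)}{\Gamma(a)} \, \frac{\Gamma(a+b)}{\Gamma(a+b+1)} \\
  &= \frac{a}{a+b}
\end{align}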

Or, since we are continuing on the topic of $W = \frac{X}{X+Y}$, we have:

\begin{align}
  \mathbb{E}(W) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \\
  &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \quad \text{ which is true, under certain conditions}
\end{align}

So why is $\mathbb{E}\left( \frac{X}{X+Y} \right) = \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)}$?

Facts

1. since $T$ is independent of $W$, $\frac{X}{X+Y}$ is independent of $X+Y$
2. since independence implies they are uncorrelated, $\frac{X}{X+Y}$ and $X+Y$ are therefore _uncorrelated_ 
3. by definition of uncorrelated, \begin{align}
  \mathbb{E}(AB) - \mathbb{E}(A) \, \mathbb{E}(B) &= 0 \\
  \mathbb{E}(AB) &= \mathbb{E}(A) \, \mathbb{E}(B) \\\\
  \\
  \mathbb{E} \left( \frac{X}{X+Y} \, (X+Y) \right) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \, \mathbb{E}(X+Y) \\
  \mathbb{E}(X) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \, \mathbb{E}(X+Y) \\
  \Rightarrow \mathbb{E}\left( \frac{X}{X+Y} \right) &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \\\\
  \\
  \therefore \mathbb{E}(W) &= \mathbb{E} \left( \frac{X}{X+Y} \right) \\
  &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \\
  &= \frac{a / \lambda}{(a+b) / \lambda} \\
  &= \frac{a}{a+b}
\end{align}
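The "certain conditions" matter: the expectation of a ratio is not, in general, the ratio of expectations. The simulation below is a sketch under assumed values (`a = 3`, `b = 5`, $\lambda = 1$; `compare_ratio_expectations` is a hypothetical helper name). It estimates $\mathbb{E}\left( \frac{X}{X+Y} \right)$, which should be close to $\frac{a}{a+b}$, and contrasts it with $\mathbb{E}\left( \frac{X}{Y} \right)$, where the same shortcut fails because $\frac{X}{Y}$ and $Y$ are not uncorrelated.

```python
import random
import statistics


def compare_ratio_expectations(a, b, n_trials=50_000, seed=2):
    """Estimate E[X/(X+Y)] and E[X/Y] for independent Gamma(a,1), Gamma(b,1)."""
    rng = random.Random(seed)
    w_vals, r_vals = [], []
    for _ in range(n_trials):
        x = rng.gammavariate(a, 1.0)  # shape a, scale 1
        y = rng.gammavariate(b, 1.0)  # shape b, scale 1
        w_vals.append(x / (x + y))    # W, independent of T = X + Y
        r_vals.append(x / y)          # a ratio where the shortcut does not apply
    return statistics.fmean(w_vals), statistics.fmean(r_vals)


# Illustrative parameter choices, not taken from the lecture.
a, b = 3, 5
mean_w, mean_r = compare_ratio_expectations(a, b)

# For W the shortcut E(X) / E(X+Y) = a / (a+b) should match the estimate.
print("E[X/(X+Y)] estimate:", mean_w, " shortcut a/(a+b):", a / (a + b))

# For X/Y the naive shortcut E(X) / E(Y) = a / b should NOT match;
# by independence E(X/Y) = E(X) E(1/Y) = a / (b - 1) when b > 1.
print("E[X/Y] estimate:    ", mean_r, " naive a/b:", a / b, " a/(b-1):", a / (b - 1))
```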