# In-class notebook: 2024-01-08

In this notebook, we will re-familiarize ourselves with the basics of probability, work through an example of marginal and conditional probability, and learn how to generate random variables from a given probability distribution.

This notebook is intended to support Chapter 3.1 of the textbook, and material is taken from the following script (from astroML):
* https://github.com/astroML/astroML-notebooks/blob/main/chapter3/astroml_chapter3_Overview_of_Probability_and_Random_Variables.ipynb

## Probability axioms

Given an event $A$, such as the outcome of a coin toss, we assign it a real number $p(A)$, called the probability of $A$. To qualify as a probability, $p(A)$ must satisfy the three Kolmogorov axioms:

1. $p(A) \geq 0$ for each $A$.
2. $p(\Omega) = 1$, where $\Omega$ is the set of all possible outcomes.
3. If $A_1$, $A_2$, $\ldots$ are disjoint events, then $p\left(\bigcup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}p(A_i)$, where $\bigcup$ stands for “union.”

### Several useful rules can be derived as a consequence of these axioms

**Sum rule**:  The probability that the union of two events, $A$ and $B$, will happen is given by,

$$\qquad\qquad p(A \cup B) = p(A) + p(B) - p(A \cap B)\qquad\qquad\qquad(1)$$


which is the sum of the probabilities of $A$ and $B$ minus the probability that both $A$ and $B$ happen. The union $A \cup B$ is the event that *either* $A$ or $B$ (or both) occurs. The $\cap$ in the equation stands for "intersection", and subtracting the last term, $p(A \cap B)$, avoids double counting the outcomes in which $A$ and $B$ overlap.
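
As a quick check of the sum rule (and of the rule of complementary events stated next), the sketch below enumerates a fair six-sided die. The events `A` (even roll) and `B` (roll greater than 3) and the helper `prob` are illustrative assumptions and do not come from the astroML notebook.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (assumed example)
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) with equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # even roll
B = {4, 5, 6}   # roll greater than 3

lhs = prob(A | B)                       # p(A or B), computed directly
rhs = prob(A) + prob(B) - prob(A & B)   # sum rule
print(lhs, rhs)

# Rule of complementary events: p(A) + p(not A) = 1
complement_total = prob(A) + prob(omega - A)
print(complement_total)
```

Exact fractions are used instead of floats so that the two sides can be compared with `==` without rounding concerns.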

**Rule of complementary events**: The probability of an event happening plus the probability of it not happening is equal to 1.
$$ p(A) + p(\overline{A}) = 1 $$

**Rule of multiplication**: The probability that both events $A$ and $B$ occur is equal to the probability that event $B$ occurs times the probability that event $A$ occurs given that $B$ has occurred. Switching $A$ and $B$ yields an equivalent statement.

$$ p(A \cap B) = p(A|B)p(B) = p(B|A)p(A)$$

In the equation above, "|" is pronounced "given," and $p(A|B)$ is the probability of event $A$ occurring given that $B$ has occurred.

**The law of total probability**: if events $B_i$, $i=1,\ldots,N$, are disjoint and their union is the set of all possible outcomes, then

$$p(A) = \sum_i p(A \cap B_i) = \sum_i p(A|B_i)p(B_i).$$

Conditional probabilities also satisfy the law of total probability. Assuming that an event $C$ is not mutually exclusive with $A$ or any of the $B_i$ (so that every conditional probability below is defined),

$$    p(A|C) = \sum\limits_{i}p(A|C \cap B_i)p(B_i|C)  $$
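
Both forms of the law of total probability can be verified by brute-force enumeration. The sketch below assumes a sample space of two fair dice; the events `A` (sum at least 8), `C` (first die even), and the partition `B` (value of the second die) are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice (assumed example)
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given equally likely outcomes."""
    return Fraction(len(event), len(omega))

def cond(event, given):
    """Conditional probability p(event | given)."""
    return prob(event & given) / prob(given)

A = {o for o in omega if sum(o) >= 8}      # sum is at least 8
C = {o for o in omega if o[0] % 2 == 0}    # first die is even
# Partition of omega by the value of the second die
B = [{o for o in omega if o[1] == k} for k in range(1, 7)]

# p(A) = sum_i p(A | B_i) p(B_i)
p_A = sum(cond(A, Bk) * prob(Bk) for Bk in B)
# p(A | C) = sum_i p(A | C and B_i) p(B_i | C)
p_A_given_C = sum(cond(A, C & Bk) * cond(Bk, C) for Bk in B)

print(p_A, prob(A))
print(p_A_given_C, cond(A, C))
```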


**Example**: Find the probability of rolling an odd number greater than 2.

> Event A: Rolling an odd number
>
> Event B: Rolling a number greater than 2
>
> $P(\text{odd and greater than 2}) = P(\text{odd} \mid \text{greater than 2}) \times P(\text{greater than 2})$
>
> - All outcomes: 1 2 3 4 5 6
> - $P(\text{greater than 2})$ = count(3 4 5 6) / count(1 2 3 4 5 6) = 4/6
> - Odd numbers among those greater than 2: 3, 5
> - $P(\text{odd} \mid \text{greater than 2})$ = count(3 5) / count(3 4 5 6) = 1/2
>
> $P(\text{odd and greater than 2})$ = 4/6 $\times$ 1/2 = $\fbox{1/3}$
>
> We can get the same answer by counting the odd numbers greater than 2 on a die (3 and 5, so two outcomes) and dividing by the total number of outcomes (6) to get 2/6 = 1/3.
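
The die example can be reproduced by enumeration. This is a minimal sketch; the variable names are illustrative and not part of the source notebook.

```python
from fractions import Fraction

outcomes = range(1, 7)                                      # faces of a fair die
greater_than_2 = [n for n in outcomes if n > 2]             # event B
odd_and_greater = [n for n in greater_than_2 if n % 2 == 1]  # A and B

p_B = Fraction(len(greater_than_2), 6)                          # P(> 2)
p_A_given_B = Fraction(len(odd_and_greater), len(greater_than_2))  # P(odd | > 2)

p_joint = p_A_given_B * p_B                 # rule of multiplication
direct = Fraction(len(odd_and_greater), 6)  # direct count over all outcomes
print(p_joint, direct)
```

Both routes give 1/3, matching the boxed result above.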


**Example**: Assume we have a box with three bags of marbles: one bag has 3 green and 7 blue marbles, another bag has 6 green and 4 blue marbles, and the last bag has 2 green and 8 blue marbles. We want to find the total probability of grabbing a green marble from any of the three bags.

> $P(G)$ = probability of choosing a green marble:
>
> $P(G|B_i)$ = probability of choosing a green marble from bag B$_i$:
>
> - Bag 1: $P(G|B_1)$ = 3/10
> - Bag 2: $P(G|B_2)$ = 6/10
> - Bag 3: $P(G|B_3)$ = 2/10
>
> Using $P(G) = \sum_i P(G|B_i) P(B_i)$, where $P(B_1) = P(B_2) = P(B_3) = \frac{1}{3}$ because each bag is equally likely to be chosen:
> 
> $P(G) = \big(\frac{3}{10} \times \frac{1}{3}\big) + \big(\frac{6}{10} \times \frac{1}{3}\big) + \big(\frac{2}{10} \times \frac{1}{3}\big) = \frac{11}{30} \approx 0.37$
>
> Thus, if we randomly select one of the bags and then randomly select one marble from that bag, the probability that we choose a green marble is about 0.37.
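
The marble calculation can be checked both exactly and by simulation. The sketch below assumes each bag is chosen with probability 1/3, as in the example; the Monte Carlo part (sample size, seed, variable names) is an illustrative addition.

```python
from fractions import Fraction
import numpy as np

green = [3, 6, 2]       # green marbles in bags 1, 2, 3
total = [10, 10, 10]    # total marbles in each bag
p_bag = Fraction(1, 3)  # each bag is equally likely

# Exact value from the law of total probability
p_green = sum(Fraction(g, t) * p_bag for g, t in zip(green, total))
print(p_green, float(p_green))

# Monte Carlo check: pick a bag, then pick a marble from that bag
rng = np.random.default_rng(0)
n_trials = 200_000
bag = rng.integers(0, 3, size=n_trials)              # bag index for each trial
p_green_in_bag = np.array(green) / np.array(total)   # P(G | B_i)
is_green = rng.random(n_trials) < p_green_in_bag[bag]
estimate = is_green.mean()
print(estimate)
```

The simulated fraction should scatter around 11/30, with a statistical uncertainty of roughly 0.001 for this sample size.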



## Marginal and conditional probability and Bayes’ rule

We will use a simple 2D example to demonstrate:
* How to generate a random distribution of a given shape
* How to present a 2D probability distribution
* What a marginal probability distribution is
* What a conditional probability distribution is
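
Before turning to the 2D histogram, the same quantities can be illustrated on a small discrete table. The 2x3 array `joint` below is a made-up example (not data from the notebook); rows index $y$ and columns index $x$. The last lines check Bayes' rule, $p(y|x) = p(x|y)\,p(y)/p(x)$, which follows from the rule of multiplication.

```python
import numpy as np

# Hypothetical joint probability table p(x, y): rows index y, columns index x
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.20, 0.10]])

p_x = joint.sum(axis=0)   # marginal p(x): sum over y
p_y = joint.sum(axis=1)   # marginal p(y): sum over x

p_x_given_y = joint / p_y[:, None]   # conditional p(x | y): each row sums to 1
p_y_given_x = joint / p_x[None, :]   # conditional p(y | x): each column sums to 1

# Bayes' rule: p(y | x) = p(x | y) p(y) / p(x)
bayes = p_x_given_y * p_y[:, None] / p_x[None, :]
print(np.allclose(bayes, p_y_given_x))
```

Marginalizing sums the joint table along one axis, while conditioning takes a single row or column and renormalizes it. The plotting cell below does the same thing with the binned samples.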

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import NullFormatter

def banana_distribution(N=10000):
    """This generates random points in a banana shape"""
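    # draw angles from a normal distribution; angles with |theta| >= pi/4 are
    # halved (not discarded) to keep them away from the divergence of r at pi/4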
    theta = np.random.normal(0, np.pi / 8, N)
    theta[theta >= np.pi / 4] /= 2
    theta[theta <= -np.pi / 4] /= 2
    # define the curve parametrically
    r = np.sqrt(1. / abs(np.cos(theta) ** 2 - np.sin(theta) ** 2))
    r += np.random.normal(0, 0.08, size=N)
    x = r * np.cos(theta + np.pi / 4)
    y = r * np.sin(theta + np.pi / 4)
    return (x, y)

# Generate the data and compute the normalized 2D histogram
np.random.seed(0)
x, y = banana_distribution(10000)

Ngrid = 41
grid = np.linspace(0, 2, Ngrid + 1)

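# np.histogram2d returns H indexed as [x_bin, y_bin]; transpose so that rows
# index y and columns index x, matching imshow, H.sum(0), and H[iy] below
H, xbins, ybins = np.histogram2d(x, y, grid)
H = H.T / np.sum(H)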

In [None]:
# plot the result
fig = plt.figure(figsize=(14, 6))

# define axes
ax_Pxy = plt.axes((0.2, 0.34, 0.27, 0.52))
ax_Px = plt.axes((0.2, 0.14, 0.27, 0.2))
ax_Py = plt.axes((0.1, 0.34, 0.1, 0.52))
ax_cb = plt.axes((0.48, 0.34, 0.01, 0.52))
ax_Px_y = [plt.axes((0.65, 0.62, 0.32, 0.23)),
           plt.axes((0.65, 0.38, 0.32, 0.23)),
           plt.axes((0.65, 0.14, 0.32, 0.23))]

# set axis label formatters
ax_Px_y[0].xaxis.set_major_formatter(NullFormatter())
ax_Px_y[1].xaxis.set_major_formatter(NullFormatter())

ax_Pxy.xaxis.set_major_formatter(NullFormatter())
ax_Pxy.yaxis.set_major_formatter(NullFormatter())

ax_Px.yaxis.set_major_formatter(NullFormatter())
ax_Py.xaxis.set_major_formatter(NullFormatter())

# draw the joint probability
plt.axes(ax_Pxy)
# rescale in place so the colorbar reads in units of 10^-3 (see the label below)
H *= 1000
plt.imshow(H, interpolation='nearest', origin='lower', aspect='auto',
           extent=[0, 2, 0, 2], cmap=plt.cm.binary)

cb = plt.colorbar(cax=ax_cb)
cb.set_label('$p(x, y)$', fontsize = 14)
plt.text(0, 1.02, r'$\times 10^{-3}$',
         transform=ax_cb.transAxes)

# draw p(x) distribution
ax_Px.plot(xbins[1:], H.sum(0), '-k', drawstyle='steps')

# draw p(y) distribution
ax_Py.plot(H.sum(1), ybins[1:], '-k', drawstyle='steps')

# define axis limits
ax_Pxy.set_xlim(0, 2)
ax_Pxy.set_ylim(0, 2)
ax_Px.set_xlim(0, 2)
ax_Py.set_ylim(0, 2)

# label axes
ax_Pxy.set_xlabel('$x$', fontsize = 14)
ax_Pxy.set_ylabel('$y$', fontsize = 14)
ax_Px.set_xlabel('$x$', fontsize = 14)
ax_Px.set_ylabel('$p(x)$', fontsize = 14)
ax_Px.yaxis.set_label_position('right')
ax_Py.set_ylabel('$y$', fontsize = 14)
ax_Py.set_xlabel('$p(y)$', fontsize = 14)
ax_Py.xaxis.set_label_position('top')

ax_Px.tick_params(axis='both', which='major', labelsize=10)
ax_Py.tick_params(axis='both', which='major', labelsize=10)

# draw conditional probabilities
iy = [3 * Ngrid // 4, Ngrid // 2, Ngrid // 4]
colors = 'rgc'
axis = ax_Pxy.axis()

for i in range(3):
    # overplot range on joint probability
    ax_Pxy.plot([0, 2, 2, 0],
                [ybins[iy[i] + 1], ybins[iy[i] + 1],
                 ybins[iy[i]], ybins[iy[i]]], c=colors[i], lw=1)
    Px_y = H[iy[i]] / H[iy[i]].sum()
    ax_Px_y[i].plot(xbins[1:], Px_y, drawstyle='steps', c=colors[i])
    ax_Px_y[i].yaxis.set_major_formatter(NullFormatter())
    ax_Px_y[i].set_ylabel('$p(x | y = %.1f)$' % ybins[iy[i]], fontsize = 14)
    ax_Px_y[i].tick_params(axis='both', which='major', labelsize=10)
ax_Pxy.axis(axis)

ax_Px_y[2].set_xlabel('$x$', fontsize = 14)

ax_Pxy.set_title('Joint Probability', fontsize = 18)
ax_Px_y[0].set_title('Conditional Probability', fontsize = 18)

## Transformations of random variables

In [None]:
from scipy import stats

# Set up the data
np.random.seed(0)

# create a uniform distribution
uniform_dist = stats.uniform(0, 1)
x_sample = uniform_dist.rvs(1000)
x = np.linspace(-0.5, 1.5, 1000)
Px = uniform_dist.pdf(x)

# transform the data
y_sample = np.exp(x_sample)
y = np.exp(x)
Py = Px / y

#------------------------------------------------------------
# Plot the results
fig = plt.figure(figsize=(14, 6))
fig.subplots_adjust(left=0.11, right=0.95, wspace=0.3, bottom=0.17, top=0.9)

ax = fig.add_subplot(121)
ax.hist(x_sample, 20, histtype='stepfilled', fc='#CCCCCC', density=True)
ax.plot(x, Px, '-k')
ax.set_xlim(-0.2, 1.2)
ax.set_ylim(0, 1.4001)
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
ax.text(0.95, 0.95, r'$p_x(x) = {\rm Uniform}(x)$',
        va='top', ha='right',
        transform=ax.transAxes, fontsize = 12)
ax.set_xlabel('$x$', fontsize = 14)
ax.set_ylabel('$p_x(x)$', fontsize = 14)


ax = fig.add_subplot(122)
ax.hist(y_sample, 20, histtype='stepfilled', fc='#CCCCCC', density=True)
ax.plot(y, Py, '-k')
ax.set_xlim(0.85, 2.9)
ax.xaxis.set_major_locator(plt.MaxNLocator(6))
ax.text(0.95, 0.95, r'$y=\exp(x)$' + '\n' + r'$p_y(y)=p_x(\ln y) / y$',
        va='top', ha='right',
        transform=ax.transAxes, fontsize = 12)
ax.set_xlabel('$y$',fontsize = 14)
ax.set_ylabel('$p_y(y)$', fontsize = 14)
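
The cell above relies on the change-of-variables rule $p_y(y) = p_x(x)\,|dx/dy|$, which for $y = \exp(x)$ and uniform $x$ on $[0, 1]$ gives $p_y(y) = 1/y$ on $[1, e]$. The following sketch checks this numerically; the sample size, bin count, and variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x_draws = rng.uniform(0, 1, 400_000)   # x ~ Uniform(0, 1)
y_draws = np.exp(x_draws)              # transformed variable y = exp(x)

# Histogram of y on [1, e], normalized so that it integrates to 1
density, edges = np.histogram(y_draws, bins=20, range=(1, np.e), density=True)
centers = 0.5 * (edges[1:] + edges[:-1])

# Analytic density: p_y(y) = p_x(ln y) |dx/dy| = 1 / y for 1 <= y <= e
p_y_analytic = 1.0 / centers

max_abs_diff = np.max(np.abs(density - p_y_analytic))
print(max_abs_diff)
```

The remaining difference comes from sampling noise and from evaluating $1/y$ only at the bin centers, so it should shrink as the sample size and bin count grow.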

In [None]:
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
np.random.seed(1)

# create distribution with 1% flux errors
dist = norm(1, 0.01)
flux = dist.rvs(10000)
flux_fit = np.linspace(0.001, 2, 1000)
pdf_flux_fit = dist.pdf(flux_fit)

# transform this distribution into magnitude space
mag = -2.5 * np.log10(flux)
mag_fit = -2.5 * np.log10(flux_fit)
pdf_mag_fit = pdf_flux_fit.copy()
pdf_mag_fit[1:] /= abs(mag_fit[1:] - mag_fit[:-1])
pdf_mag_fit /= np.dot(pdf_mag_fit[1:], abs(mag_fit[1:] - mag_fit[:-1]))

# create distribution with 20% flux errors (sigma = 0.20; the '25' suffix
# in the variable names below is only a label)
dist25 = norm(1, 0.20)
flux25 = dist25.rvs(10000)
flux_fit25 = np.linspace(0.001, 2, 1000)
pdf_flux_fit25 = dist25.pdf(flux_fit25)

# transform this distribution into magnitude space
mag25 = -2.5 * np.log10(flux25)
mag_fit25 = -2.5 * np.log10(flux_fit25)
pdf_mag_fit25 = pdf_flux_fit25.copy()
pdf_mag_fit25[1:] /= abs(mag_fit25[1:] - mag_fit25[:-1])
pdf_mag_fit25 /= np.dot(pdf_mag_fit25[1:], abs(mag_fit25[1:] - mag_fit25[:-1]))
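
The cell above estimates the Jacobian $|dF/dm|$ by finite differences on the flux grid. As a cross-check, the same transformation can be written analytically: for $m = -2.5\log_{10} F$ we have $F = 10^{-0.4 m}$ and $|dF/dm| = 0.4 \ln(10)\, F$, so $p_m(m) = p_F(F)\, 0.4 \ln(10)\, F$. The function `mag_pdf` and the magnitude grid below are hypothetical helpers introduced for this sketch, assuming a Gaussian flux distribution with mean 1 and $\sigma = 0.2$.

```python
import numpy as np
from scipy.stats import norm

def mag_pdf(mag, flux_mean=1.0, flux_sigma=0.2):
    """Density of m = -2.5 log10(F) for F ~ Normal(flux_mean, flux_sigma), F > 0."""
    flux = 10.0 ** (-0.4 * mag)                # invert the magnitude definition
    jacobian = 0.4 * np.log(10.0) * flux       # |dF/dm|
    return norm(flux_mean, flux_sigma).pdf(flux) * jacobian

mag_grid = np.linspace(-1.0, 3.0, 2001)
pdf = mag_pdf(mag_grid)

# Trapezoidal integration; the result should be close to 1 because the grid
# covers almost all of the flux distribution
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(mag_grid))
print(area)
```

Negative fluxes have no magnitude, so this density only describes the positive-flux part of the Gaussian. For $\sigma = 0.2$, the probability of a negative flux is negligible.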