# Chapter 7: Joint Distributions
 
This Jupyter notebook is the Python equivalent of the R code in section 7.7 R, pp. 318 - 320, [Introduction to Probability, 1st Edition](https://www.crcpress.com/Introduction-to-Probability/Blitzstein-Hwang/p/book/9781466575578), Blitzstein & Hwang.

----

In [1]:
import numpy as np

## Multinomial

The functions for the Multinomial distribution represented by [`scipy.stats.multinomial`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multinomial.html) are `pmf` (which is the joint PMF of the Multinomial distribution) and `rvs` (which generates realizations of Multinomial random vectors). The joint CDF of the Multinomial is a pain to work with, so, as in R, it is not built into `multinomial`.

To use `pmf`, we have to input the value at which to evaluate the joint PMF, as well as the parameters of the distribution. For example,

In [2]:
from scipy.stats import multinomial

# to learn more about scipy.stats.multinomial, uncomment the following line
#print(multinomial.__doc__)

In [3]:
x = [2, 0, 3]
n = 5
p = [1/3, 1/3, 1/3]

ans = multinomial.pmf(x, n, p)
print('multinomial.pmf(x, n, p) = {}'.format(ans))

multinomial.pmf(x, n, p) = 0.041152263374485576


returns the probability $P(X_1 = 2, \, X_2 = 0, \, X_3 = 3)$, where

\begin{align}
  X = (X_1, \, X_2, \, X_3) \sim Mult_3\left(5, \, (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})\right)
\end{align}

Of course, `n` has to equal `numpy.sum(x)`; if we attempted to do `multinomial.pmf(x, 7, p)`, the return value would simply be 0.0.
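
As a sanity check, the value above can be reproduced directly from the Multinomial joint PMF, $\frac{n!}{x_1! \, x_2! \, x_3!} \, p_1^{x_1} p_2^{x_2} p_3^{x_3}$. The following is only an illustrative sketch: the helper name `multinomial_pmf_by_hand` is hypothetical and not part of SciPy.

```python
from math import factorial, isclose, prod

from scipy.stats import multinomial


def multinomial_pmf_by_hand(x, n, p):
    """Evaluate the Multinomial joint PMF from its formula (hypothetical helper)."""
    # the PMF is zero unless the counts add up to n
    if sum(x) != n:
        return 0.0
    # multinomial coefficient n! / (x_1! x_2! ... x_k!)
    coef = factorial(n)
    for x_j in x:
        coef //= factorial(x_j)
    # multiply by p_1^{x_1} * ... * p_k^{x_k}
    return coef * prod(p_j ** x_j for p_j, x_j in zip(p, x))


x = [2, 0, 3]
n = 5
p = [1/3, 1/3, 1/3]

by_hand = multinomial_pmf_by_hand(x, n, p)
from_scipy = multinomial.pmf(x, n, p)

print('by hand    = {}'.format(by_hand))
print('from scipy = {}'.format(from_scipy))
print('agree      = {}'.format(isclose(by_hand, from_scipy)))
```

Here the multinomial coefficient is $5!/(2! \, 0! \, 3!) = 10$, so both values should equal $10/243$ up to floating-point error.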

For `rvs`, the named function parameter `size` is the number of Multinomial random vectors to generate, and the other inputs are the same. When we typed `rvs(n, p, size=10)` with `n` and `p` as above, `multinomial` gave us the following matrix:

In [4]:
# seed the random number generator
np.random.seed(1234)

rv_vector = multinomial.rvs(n, p, size=10)

print('matrix of Multinomial random vectors has shape {}\n'.format(rv_vector.shape))

print(rv_vector)

matrix of Multinomial random vectors has shape (10, 3)

[[1 2 2]
 [1 3 1]
 [2 1 2]
 [1 3 1]
 [4 1 0]
 [1 2 2]
 [2 2 1]
 [1 2 2]
 [2 0 3]
 [2 3 0]]


Each row of the matrix corresponds to a draw from the $Mult_3\left(5, \, (1/3, 1/3, 1/3)\right)$ distribution. In particular, the sum of each row is 5.
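
Since every draw distributes $n = 5$ objects among the three categories, the entries of each draw must add up to 5. A minimal sketch of that check follows, assuming the draws are stored one per row as in the output above. It uses the `random_state` argument of `rvs` instead of the global seed, and the seed value itself is arbitrary.

```python
import numpy as np
from scipy.stats import multinomial

n = 5
p = [1/3, 1/3, 1/3]

# random_state makes the draws reproducible; 1234 is an arbitrary choice
draws = multinomial.rvs(n, p, size=10, random_state=1234)

# axis=1 sums across the categories, giving one total per draw
row_sums = draws.sum(axis=1)

print('shape of draws = {}'.format(draws.shape))
print('row sums       = {}'.format(row_sums))
print('all equal to n = {}'.format(np.all(row_sums == n)))
```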

## Multivariate Normal

In R, functions for the Multivariate Normal distribution are located in the package `mvtnorm`. Online resources can teach you how to install packages in R for your system, but for many systems an easy way is to use the `install.packages` command, e.g., by typing `install.packages("mvtnorm")` to install the `mvtnorm` package. After installing, load the package with `library(mvtnorm)`. Then `dmvnorm` can be used for calculating the joint PDF, and `rmvnorm` can be used for generating random vectors. For example, suppose that we want to generate 1000 independent Bivariate Normal pairs $(Z, W)$, with correlation $\rho = 0.7$ and $\mathcal{N}(0, 1)$ marginals. To do this, we pass `rmvnorm` a mean vector of zeros and the covariance matrix of $(Z, W)$, and store the result in `r`.

The covariance matrix here is

\begin{align}
  \begin{pmatrix}
    1 & \rho \\
    \rho & 1
  \end{pmatrix}
\end{align}

because

* $Cov(Z, Z) = Var(Z) = 1$ (this is the upper left entry)
* $Cov(W, W) = Var(W) = 1$ (this is the lower right entry)
* $Cov(Z, W) = Corr(Z, W) \, SD(Z) \, SD(W) = \rho$ (this is the other two entries).

Now `r` is a 1000 × 2 matrix, with each row a BVN random vector. To see these as points in the plane, we can use `plot(r)` to make a scatter plot, from which the strong positive correlation should be clear. To estimate the covariance of $Z$ and $W$, we can use `cov(r)`, which should be close to the true covariance matrix.
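
In Python, one possible counterpart to `dmvnorm` and `rmvnorm` is `scipy.stats.multivariate_normal`, with its `pdf` and `rvs` methods. The sketch below assumes that substitution and is not taken from the book. The variable names and the seed are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.7
mean_vector = np.zeros(2)
cov_matrix = np.array([[1.0, rho],
                       [rho, 1.0]])

# generate 1000 BVN pairs, one per row; the seed value is arbitrary
r = multivariate_normal.rvs(mean=mean_vector, cov=cov_matrix,
                            size=1000, random_state=1234)

# joint PDF evaluated at the origin (the role played by dmvnorm in R)
pdf_at_origin = multivariate_normal.pdf([0.0, 0.0],
                                        mean=mean_vector, cov=cov_matrix)

# sample covariance matrix; rowvar=False treats each column as a variable
sample_cov = np.cov(r, rowvar=False)

print('r has shape {}'.format(r.shape))
print('joint PDF at (0, 0) = {}'.format(pdf_at_origin))
print('sample covariance matrix:\n{}'.format(sample_cov))
```

A scatter plot of the pairs could then be drawn with, for instance, matplotlib's `plt.scatter(r[:, 0], r[:, 1])`.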

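Example 7.5.10 gives another approach to the BVN generation problem: take i.i.d. $X, Y \sim \mathcal{N}(0, 1)$ and set $Z = X$ and $W = \rho X + \sqrt{1 - \rho^2} \, Y$.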

This gives the $Z$-coordinates in a vector `z` and the $W$-coordinates in a vector `w`. If we want to put them into one 1000 × 2 matrix as we had above, we can type `cbind(z,w)` to bind the vectors together as columns.
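
A minimal NumPy sketch of this second approach follows, assuming the construction $Z = X$, $W = \rho X + \sqrt{1 - \rho^2} \, Y$ with $X, Y$ i.i.d. $\mathcal{N}(0, 1)$. Here `np.column_stack` plays the role of `cbind`. The names `tau` and `zw` and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1234)  # the seed value is arbitrary

rho = 0.7
tau = np.sqrt(1 - rho**2)

# two independent vectors of standard Normal draws
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

# Z and W each have N(0, 1) marginals, and Corr(Z, W) = rho
z = x
w = rho * x + tau * y

# bind the two vectors together as columns of a 1000 x 2 matrix
zw = np.column_stack((z, w))

sample_corr = np.corrcoef(z, w)[0, 1]

print('zw has shape {}'.format(zw.shape))
print('sample correlation of z and w = {}'.format(sample_corr))
```

The sample correlation should land near 0.7, since $Cov(Z, W) = \rho \, Var(X) = \rho$ and $Var(W) = \rho^2 + (1 - \rho^2) = 1$.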

----

&copy; Blitzstein, Joseph K.; Hwang, Jessica. Introduction to Probability (Chapman & Hall/CRC Texts in Statistical Science).