# Chapter 5: Continuous Random Variables
 
This Jupyter notebook is the Python equivalent of the R code in Section 5.9 (R) of [Introduction to Probability, 1st Edition](https://www.crcpress.com/Introduction-to-Probability/Blitzstein-Hwang/p/book/9781466575578), by Blitzstein & Hwang.

----

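```python
import numpy as np

# Fix the seed of NumPy's global generator (used by scipy.stats when no random_state is given)
np.random.seed(42)
```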

## Python, SciPy and Matplotlib

In this section we will introduce continuous distributions in Python and SciPy, learn how to make basic plots, demonstrate the universality of the Uniform by simulation, and simulate arrival times in a Poisson process.

## Uniform, Normal, and Exponential distributions 

For [continuous distributions in `scipy.stats`](https://docs.scipy.org/doc/scipy/reference/tutorial/stats/continuous.html), the `pdf` function gives the PDF, the `cdf` function gives the CDF, and the `rvs` function generates random numbers from the continuous distribution. This is in keeping with the application programming interface of the [discrete statistical distributions in `scipy.stats`](https://docs.scipy.org/doc/scipy/reference/tutorial/stats/discrete.html). Thus, we have the following functions: 

### Uniform
[`scipy.stats.uniform`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html#scipy.stats.uniform) provides functions for Uniform continuous random variables. 
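* To evaluate the $Unif(a, b)$ PDF at $x$, we use `uniform.pdf(x, a, b - a)`. Here `loc` corresponds to the left endpoint $a$ and `scale` to the interval length $b - a$ (and **not** the right endpoint $b$).
* For the CDF, we use `uniform.cdf(x, a, b - a)`. 
* To generate $n$ realizations from the $Unif(a, b)$ distribution, we use `uniform.rvs(a, b - a, size=n)`. 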

```python
from scipy.stats import uniform

#print(uniform.__doc__)

a = 0
b = 4
x = 3
n = 10

# scipy.stats.uniform is parameterized by loc = a and scale = b - a
print('PDF of Unif({}, {}) evaluated at {} is {}'.format(a, b, x, uniform.pdf(x, a, b - a)))

print('CDF of Unif({}, {}) evaluated at {} is {}'.format(a, b, x, uniform.cdf(x, a, b - a)))

print('Generating {} realizations from Unif({}, {}):\n{}'.format(n, a, b, uniform.rvs(a, b - a, size=n)))
```

```
PDF of Unif(0, 4) evaluated at 3 is 0.25
CDF of Unif(0, 4) evaluated at 3 is 0.75
Generating 10 realizations from Unif(0, 4):
[ 1.49816048  3.80285723  2.92797577  2.39463394  0.62407456  0.62397808
  0.23233445  3.46470458  2.40446005  2.83229031]
```
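Because the cell above uses $a = 0$, the interval length $b - a$ and the endpoint $b$ coincide, so it cannot reveal how `uniform` is parameterized. A minimal sketch with a nonzero left endpoint, $Unif(2, 6)$ (endpoints chosen only for illustration), contrasts the correct `scale=b - a` with the tempting but wrong `scale=b`:

```python
from scipy.stats import uniform

a, b = 2, 6  # illustrative endpoints with a != 0
x = 3

# Correct: loc is the left endpoint a, scale is the interval length b - a
print(uniform.pdf(x, loc=a, scale=b - a))  # 1 / (6 - 2) = 0.25
print(uniform.cdf(x, loc=a, scale=b - a))  # (3 - 2) / (6 - 2) = 0.25

# Pitfall: scale=b describes Unif(a, a + b) = Unif(2, 8), not Unif(2, 6)
wrong = uniform(loc=a, scale=b)
print(wrong.cdf(b))  # 2/3 rather than 1, since b = 6 is no longer the right endpoint
```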


### Normal
[`scipy.stats.norm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy.stats.norm) provides functions for Normal continuous random variables. 

* To evaluate the $N(\mu, \sigma^2)$ PDF at $x$, we use `norm.pdf(x, loc, scale)`, where the parameter `loc` corresponds to $\mu$ and `scale` corresponds to the standard deviation $\sigma$ (and **not** the variance $\sigma^2$).
* For the CDF, we use `norm.cdf(x, loc, scale)`. 
* To generate $n$ realizations from the $N(\mu, \sigma^2)$ distribution, we use `norm.rvs(loc, scale, size=n)`. 

```python
from scipy.stats import norm

#print(norm.__doc__)

mu = 0.0
sigma = 2.0
x = 1.5

# Labels show the variance sigma**2, following the N(mu, sigma^2) notation;
# norm itself takes the standard deviation sigma as scale
print('PDF of N({}, {}) evaluated at {} is {}'.format(mu, sigma**2, x, norm.pdf(x, mu, sigma)))

print('CDF of N({}, {}) evaluated at {} is {}'.format(mu, sigma**2, x, norm.cdf(x, mu, sigma)))

print('Generating {} realizations from N({}, {}):\n{}'.format(n, mu, sigma**2, norm.rvs(mu, sigma, size=n)))
```

```
PDF of N(0.0, 4.0) evaluated at 1.5 is 0.15056871607740221
CDF of N(0.0, 4.0) evaluated at 1.5 is 0.7733726476231317
Generating 10 realizations from N(0.0, 4.0):
[-0.93894877  1.08512009 -0.92683539 -0.93145951  0.48392454 -3.82656049
 -3.44983567 -1.12457506 -2.02566224  0.62849467]
```
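To see concretely that `scale` is the standard deviation, a small sketch can compare `norm.pdf` with the $N(\mu, \sigma^2)$ density $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ written out by hand, and check the spread of simulated draws. The seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 2.0
x = 1.5

# N(mu, sigma^2) density written out explicitly
by_hand = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
print(np.isclose(norm.pdf(x, mu, sigma), by_hand))  # True

# The sample standard deviation should be close to sigma = 2, not sigma^2 = 4
draws = norm.rvs(mu, sigma, size=100_000, random_state=0)
print(draws.std())
```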


&#x2623; 5.9.1 (Normal parameters in `scipy.stats.norm`). Note that we have to input the standard deviation for `scale`, not the variance! For example, to get the $N(10, 3)$ CDF at 12, we use `norm.cdf(12, 10, np.sqrt(3))`. Ignoring this is a common, disastrous coding error.

```python
mu = 10
sigma_sq = 3
sigma = np.sqrt(sigma_sq)  # scale must be the standard deviation, not the variance
x = 12

print('N({}, {}) CDF at {} is {}'.format(mu, sigma_sq, x, norm.cdf(x, mu, sigma)))
```

```
N(10, 3) CDF at 12 is 0.8758934605050381
```
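A quick sketch of how much the mistake matters: passing the variance 3 as `scale` instead of $\sqrt{3}$ silently computes the CDF of $N(10, 9)$ and returns a noticeably different number, with no error or warning.

```python
import numpy as np
from scipy.stats import norm

correct = norm.cdf(12, 10, np.sqrt(3))  # N(10, 3): scale is the SD sqrt(3)
mistake = norm.cdf(12, 10, 3)           # actually N(10, 9): variance passed as scale
print(correct, mistake)
```

The first value matches the output above (about 0.876), while the second is about 0.748.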


### Exponential
[`scipy.stats.expon`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html#scipy.stats.expon) provides functions for Exponential continuous random variables. 

* To evaluate the $Expo(\lambda)$ PDF at $x$, we use `expon.pdf(x, scale=1/lambd)`. Note that `expon` is parameterized by `scale` $= 1/\lambda$ rather than by the rate $\lambda$, and that we name the variable `lambd` because `lambda` is a reserved keyword in Python.
* For the CDF, we use `expon.cdf(x, scale=1/lambd)`. 
* To generate $n$ realizations from the $Expo(\lambda)$ distribution, we use `expon.rvs(scale=1/lambd, size=n)`.

These correspond to R's `dexp(x,lambda)`, `pexp(x,lambda)`, and `rexp(n,lambda)`.

```python
from scipy.stats import expon

#print(expon.__doc__)

lambd = 2.1
x = 5

print('PDF of Expo({}) evaluated at {} is {}'.format(lambd, x, expon.pdf(x, scale=1/lambd)))

print('CDF of Expo({}) evaluated at {} is {}'.format(lambd, x, expon.cdf(x, scale=1/lambd)))

print('Generating {} realizations from Expo({}):\n{}'.format(n, lambd, expon.rvs(scale=1/lambd, size=n)))
```

```
PDF of Expo(2.1) evaluated at 5 is 5.782654363446904e-05
CDF of Expo(2.1) evaluated at 5 is 0.9999724635506503
Generating 10 realizations from Expo(2.1):
[ 0.1645312   0.21727487  0.2899689   0.73235048  0.1060647   0.34382341
  0.4273832   0.02264945  0.44539668  0.08902917]
```
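Because `expon` takes `loc` as its first positional argument after `x`, writing `expon.cdf(x, 1/lambd)` would shift the distribution rather than set its rate. A minimal sketch, reusing $\lambda = 2.1$ with an illustrative point $x = 0.5$, checks the keyword form against the closed-form CDF $1 - e^{-\lambda x}$ and the mean $1/\lambda$, and shows what the positional form actually computes:

```python
import numpy as np
from scipy.stats import expon

lambd = 2.1
x = 0.5  # illustrative evaluation point

# Keyword form: scale = 1/lambda matches the Expo(lambda) CDF 1 - exp(-lambda * x)
print(np.isclose(expon.cdf(x, scale=1/lambd), 1 - np.exp(-lambd * x)))  # True

# Positional pitfall: the second argument is loc, giving an Expo(1) shifted by 1/lambda
print(np.isclose(expon.cdf(x, 1/lambd), 1 - np.exp(-(x - 1/lambd))))  # True

# The mean of Expo(lambda) is 1/lambda
print(np.isclose(expon.mean(scale=1/lambd), 1/lambd))  # True
```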


Due to the importance of location-scale transformations for continuous distributions, `scipy.stats` (like R) has default parameter settings for each of these three families: when `loc` and `scale` are omitted, they default to 0 and 1. The default for the Uniform is $Unif(0, 1)$, the default for the Normal is $N(0, 1)$, and the default for the Exponential is $Expo(1)$. For example, `uniform.pdf(0.5)`, with no additional inputs, evaluates the $Unif(0, 1)$ PDF at 0.5, and `norm.rvs(size=10)`, with no additional inputs, generates 10 realizations from the $N(0, 1)$ distribution. This means there are two ways to generate a $N(\mu, \sigma^2)$ random variable. After choosing our values of $\mu$ and $\sigma$, we can either call `norm.rvs(mu, sigma)` directly, or generate a standard Normal draw and transform it with `mu + sigma * norm.rvs()`.

Either way, we end up generating a draw from the $N(\mu, \sigma^2)$ distribution.
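A small sketch of the default parameters and of the two equivalent ways to draw from $N(\mu, \sigma^2)$; the values $\mu = 10$, $\sigma = 2$, the seeds, and the sample size are arbitrary choices for illustration.

```python
from scipy.stats import uniform, norm, expon

# With no loc/scale arguments, scipy.stats uses loc=0, scale=1
print(uniform.pdf(0.5))  # Unif(0, 1) PDF at 0.5: 1.0
print(norm.cdf(0))       # N(0, 1) CDF at 0: 0.5
print(expon.mean())      # Expo(1) mean: 1.0

mu, sigma = 10, 2

# Way 1: pass mu and sigma directly as loc and scale
y1 = norm.rvs(mu, sigma, size=100_000, random_state=0)

# Way 2: location-scale transformation of standard Normal draws
y2 = mu + sigma * norm.rvs(size=100_000, random_state=1)

# Both samples should have mean near 10 and standard deviation near 2
print(y1.mean(), y1.std())
print(y2.mean(), y2.std())
```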

## Plots in R 

A simple way to plot a function in R is with the `curve` command. For example, `curve(dnorm, from=-3, to=3, n=1000)` creates a plot of the standard Normal PDF from −3 to 3. What is actually happening is that R evaluates the function at a finite number of closely spaced points and connects the points with very short lines to create the illusion of a curve. The input `n=1000` tells R to evaluate at 1000 points so that the curve looks very smooth; if we were to choose `n=20`, the piecewise linearity would become very apparent.
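Matplotlib has no exact counterpart of `curve`; a common approach, sketched here assuming Matplotlib is installed, is to evaluate the function on a grid built with `np.linspace` and pass the points to `plot`, which likewise connects consecutive points with straight segments. The side-by-side panels compare 1000 grid points with 20.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

fig, (ax_smooth, ax_coarse) = plt.subplots(1, 2, figsize=(10, 4))

# 1000 closely spaced points: the polyline looks like a smooth curve
x_fine = np.linspace(-3, 3, 1000)
ax_smooth.plot(x_fine, norm.pdf(x_fine))
ax_smooth.set_title("1000 points")

# 20 points: the straight segments between points become obvious
x_coarse = np.linspace(-3, 3, 20)
ax_coarse.plot(x_coarse, norm.pdf(x_coarse))
ax_coarse.set_title("20 points")

plt.show()
```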

Another command that creates plots is called, fittingly, `plot`. This command has many, many possible inputs to customize what the plot looks like; for the sake of demonstration, we'll plot the standard Normal PDF once again, using `plot` instead of `curve`.

The most important inputs to `plot` are a vector of x values and a vector of y values to plot. A useful command for this purpose is `seq`. As introduced in Chapter 1, `seq(a,b,d)` creates the vector of values ranging from `a` to `b`, with successive entries spaced apart by `d`. Here we use `x <- seq(-3,3,0.01)` and `y <- dnorm(x)`.

So `x` consists of all numbers from −3 to 3, spaced 0.01 apart, and `y` contains the values of the Normal PDF at each of the points in `x`. Now we simply plot the two with `plot(x,y)`. The default is a scatterplot. For a line plot, we use `plot(x,y,type="l")`. We can also set the axis labels and plot title with `xlab`, `ylab`, and `main`.
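A Python sketch of the same steps, assuming NumPy and Matplotlib. One difference worth flagging: R's `seq(-3, 3, 0.01)` includes the endpoint 3, whereas `np.arange` excludes its stop value, so `np.linspace(-3, 3, 601)` is used here to get the 601 points from −3 to 3 spaced 0.01 apart. `scatter` stands in for R's default scatterplot, `plot` for `type="l"`, and `set_xlabel`, `set_ylabel`, and `set_title` for `xlab`, `ylab`, and `main`.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

# Counterpart of x <- seq(-3, 3, 0.01); linspace includes both endpoints
x = np.linspace(-3, 3, 601)
y = norm.pdf(x)  # counterpart of y <- dnorm(x)

fig, (ax_points, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot of the points (R's default for plot(x, y))
ax_points.scatter(x, y, s=2)
ax_points.set_title("Scatterplot")

# Line plot (R's plot(x, y, type="l")) with axis labels and a title
ax_line.plot(x, y)
ax_line.set_xlabel("x")
ax_line.set_ylabel("PDF")
ax_line.set_title("Standard Normal PDF")

plt.show()
```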

The axis limits can be set manually with `xlim` and `ylim`. If, for example, you wanted the vertical axis to range from 0 to 1, you would add `ylim=c(0,1)` inside the `plot` command.

Finally, to change the color of the plot, add `col="orange"` or `col="green"`, or whatever your favorite color is!
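The same customizations in Matplotlib, as a sketch: `set_xlim` and `set_ylim` play the role of `xlim` and `ylim`, and the `color` keyword of `plot` plays the role of `col`.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)

fig, ax = plt.subplots()
ax.plot(x, norm.pdf(x), color="orange")  # counterpart of col="orange"
ax.set_xlim(-3, 3)                        # counterpart of xlim=c(-3,3)
ax.set_ylim(0, 1)                         # counterpart of ylim=c(0,1)
plt.show()
```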

## Universality with Logistic

We proved in Example 5.3.4 that for $U \sim Unif(0, 1)$, the r.v. $\log(U/(1 - U))$ follows a Logistic distribution. In R, we can simply generate a large number of $Unif(0, 1)$ realizations and transform them, with `u <- runif(10^4)` followed by `x <- log(u/(1-u))`.

Now `x` contains $10^4$ realizations from the distribution of $\log(U/(1-U))$. We can visualize them with a histogram, using the command `hist(x)`. The histogram resembles a Logistic PDF, which is reassuring. To control how fine-grained the histogram is, we can set the number of breaks in the histogram: `hist(x,breaks=100)` produces a finer histogram, while `hist(x,breaks=10)` produces a coarser histogram.
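A Python sketch of this simulation, assuming NumPy, SciPy, and Matplotlib. `uniform.rvs` draws the $10^4$ $Unif(0, 1)$ realizations and the transformation is applied elementwise; the `bins` argument of `hist` plays the role of `breaks` (R treats `breaks` as a suggestion, while an integer `bins` sets the exact number of bins). Overlaying `scipy.stats.logistic.pdf` on a density-scaled histogram is an addition, included only as a visual check.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import logistic, uniform

# 10^4 realizations of U ~ Unif(0, 1); the seed is arbitrary
u = uniform.rvs(size=10**4, random_state=42)

# Universality of the Uniform: log(U / (1 - U)) is standard Logistic
x = np.log(u / (1 - u))

fig, ax = plt.subplots()
ax.hist(x, bins=100, density=True)  # finer histogram; bins=10 gives a coarser one

# Overlay the standard Logistic PDF for comparison
grid = np.linspace(x.min(), x.max(), 500)
ax.plot(grid, logistic.pdf(grid))
plt.show()
```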

## Poisson process simulation

To simulate $n$ arrivals in a Poisson process with rate $\lambda$, we first generate the interarrival times as i.i.d. Exponentials and store them in a vector with `x <- rexp(n,lambda)`.

Then we convert the interarrival times into arrival times using the `cumsum` function, which stands for “cumulative sum”: `t <- cumsum(x)`.

The vector `t` now contains all the simulated arrival times.
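A Python sketch of the same simulation, assuming illustrative values $n = 50$ and $\lambda = 10$ (any positive values work). `expon.rvs` with `scale=1/lambd` draws the i.i.d. $Expo(\lambda)$ interarrival times, and `np.cumsum` plays the role of R's `cumsum`.

```python
import numpy as np
from scipy.stats import expon

n = 50      # number of arrivals (illustrative)
lambd = 10  # rate of the Poisson process (illustrative)

# i.i.d. Expo(lambda) interarrival times; scale is 1/lambda
x = expon.rvs(scale=1/lambd, size=n, random_state=42)

# Arrival times are the running sums of the interarrival times
t = np.cumsum(x)
print(t)
```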

----

&copy; Blitzstein, Joseph K.; Hwang, Jessica. Introduction to Probability (Chapman & Hall/CRC Texts in Statistical Science).