# A procedure to generate samples from the Normal distribution

Suppose we have a method to generate independent samples from the uniform distribution over the interval $[-1,1].$ Then we can generate samples from $N(0,1)$ as follows: repeatedly draw $x, y$ from $\text{Uniform}(-1,1)$ until $(x,y)$ lies strictly inside the unit circle, and then return $z = \frac{2x}{r} \sqrt{-\log r},$ where $r$ is the distance from $(x,y)$ to the origin.
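The procedure above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the function name `sample_standard_normal` is hypothetical, Python's `random` module stands in for the assumed uniform generator, and the origin is excluded explicitly because $\log 0$ is undefined (an event of probability zero).

```python
import math
import random


def sample_standard_normal(rng=random):
    """Draw one sample using the rejection procedure described above."""
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        r = math.hypot(x, y)  # distance of (x, y) to the origin
        # Accept only points strictly inside the unit circle; r == 0 is
        # skipped because log(0) is undefined.
        if 0.0 < r < 1.0:
            return (2.0 * x / r) * math.sqrt(-math.log(r))


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_standard_normal(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

If the claim holds, the printed mean should be close to 0 and the variance close to 1.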

The fact that $z \sim N(0,1)$ follows from the calculation below: the CDF of $z$ is $\mathbb{P}(X\leq x \ | \ S<1)$ in the notation used there, and the calculation shows that this is indeed the CDF of the standard normal distribution.


### Functions of random variables

Let $u$ and $v$ be independent samples from the standard uniform distribution on the unit interval $[0,1]$.

If $s=(2u-1)^2+(2v-1)^2 \geq 1$ then $x$ is sampled from the standard normal distribution, $\mathcal{N}(0,1)$. Otherwise $x=(2u-1)\sqrt{-2 \log(s)/s}$.

How is $x$ distributed?

### Solution description

We make a substitution $P = 2U-1, Q = 2V-1.$ Then $P, Q$ are i.i.d. samples from $\text{Uniform}(-1,1)$ and $S = P^2 + Q^2.$ If $S\geq 1$ then $X$ is sampled from the standard normal distribution, otherwise $X = P \sqrt{-2 \log(S)/S}.$ The joint distribution of $P,Q$ is the uniform distribution on $[-1,1]^2,$ and has density $f_{P,Q} (p,q) = \frac{1}{4} \cdot \mathbb{1}_{[-1,1]^2} (p,q).$ By definition, for any (Lebesgue measurable) subset $\mathcal{E}\subseteq \mathbb{R}^2,$ we have 
\begin{equation}
\mathbb{P}( (P,Q)\in  \mathcal{E}) = \int_{\mathcal{E}} \frac{1}{4} \cdot \mathbb{1}_{[-1,1]^2} (p,q) dp dq \ \ \  \ \ \ \text{(*)}
\end{equation}

Thus, since $S < 1$ iff $(P,Q)$ lies inside the unit circle, which has area $\pi$ and is contained in $[-1,1]^2,$ we have $\mathbb{P}(S < 1) = \pi/4.$
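As a quick sanity check, the value $\pi/4$ can be estimated by simulation. The sketch below uses a hypothetical helper `estimate_acceptance` and Python's `random` module as the source of uniform samples; both are illustrative choices.

```python
import math
import random


def estimate_acceptance(n, seed=0):
    """Monte Carlo estimate of P(S < 1) for (P, Q) uniform on [-1, 1]^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        p = 2.0 * rng.random() - 1.0  # P = 2U - 1
        q = 2.0 * rng.random() - 1.0  # Q = 2V - 1
        if p * p + q * q < 1.0:
            hits += 1
    return hits / n


estimate = estimate_acceptance(200_000)
print(f"estimate = {estimate:.4f}, pi/4 = {math.pi / 4:.4f}")
```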

The CDF of $X$ can be decomposed as:

\begin{align*}
\mathbb{P}( X \leq x) &= \mathbb{P}( X \leq x, S \geq 1) + \mathbb{P}(X \leq x, S < 1) \\
                      &= \mathbb{P}(S \geq 1) \ \mathbb{P}(X \leq x \ | \ S \geq 1) + \mathbb{P}(X \leq x, S < 1) \\
                      &= (1 - \pi/4) \ \mathbb{P}(X \leq x \ | \ S \geq 1) + \mathbb{P}(X \leq x, S < 1) \\
\end{align*}

The first term is a constant times the CDF of the standard normal, so we can focus on the second term. Again from (*), we have

$$ \mathbb{P}(X \leq x, S < 1) = \frac{1}{4} \cdot \text{Area}\left(\mathcal{R} \right), $$

where $$\mathcal{R} = \left\{(p,q) \ \big| \ s = p^2 + q^2 < 1, \ p \sqrt{-2\log(s)/s} \leq x\right\} .$$

In polar coordinates $p = r \cos \theta, q = r \sin \theta$ (so that $s = r^2$) this becomes

$$ \mathcal{R} = \left\{ r\in (0,1), \theta \in (-\pi,\pi] \ \big| \ 2 \cos \theta \sqrt{-\log r} \leq x \right\}$$

By continuity, the boundary $\partial \mathcal{R}$ divides the unit disk into regions where the inequality is valid or invalid. Points on it satisfy $ 2 \cos \theta \sqrt{-\log r} = x,$ which can be rearranged as $r = \exp\left(- \left( \frac{x}{2\cos \theta} \right)^2 \right).$ This curve defines two loops inside the unit circle, symmetrical about the $y$-axis: the left loop $\mathcal{R}_1,$ traced for $\theta\in (-\pi, -\pi/2 ) \cup (\pi/2, \pi ],$ and the right loop $\mathcal{R}_2,$ traced for $\theta \in (-\pi/2,\pi/2 ).$ For $x \neq 0,$ both loops pass through the origin, since $r \to 0$ as $\theta \to \pm \pi/2.$

Lemma: If $x< 0,$ then $\mathcal{R}$ is the region inside the left loop $\mathcal{R}_1,$ and if $x\geq 0,$ then $\mathcal{R}$ is the part of the unit disk outside the right loop $\mathcal{R}_2.$

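Proof: For the first case, consider a point $(r, \theta)$ near the boundary of the left loop. Note that $\cos \theta$ is negative here, so the value of $2 \cos \theta \sqrt{-\log r}$ is monotonically increasing in $r,$ and the inequality holds as we go further inside the left loop (i.e. as $r$ decreases). It fails everywhere in the right half of the disk, where the left-hand side is nonnegative while $x < 0.$ Similarly, for the second case consider a point $(r, \theta)$ near the boundary of the right loop. Since $\cos \theta$ is positive, the value is monotonically decreasing in $r,$ so the inequality holds as $r$ increases (i.e. as we move away from the loop). It holds everywhere in the left half of the disk, where the left-hand side is nonpositive while $x \geq 0.$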
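The lemma can also be checked numerically by comparing, at random points of the unit disk, the defining inequality of $\mathcal{R}$ with the loop description. This is only an illustrative sketch; the helper names `in_region`, `loop_radius`, `lemma_region` and `count_mismatches` are hypothetical.

```python
import math
import random


def in_region(p, q, x):
    """Defining inequality of R: p * sqrt(-2 log(s) / s) <= x, s = p^2 + q^2."""
    s = p * p + q * q
    return p * math.sqrt(-2.0 * math.log(s) / s) <= x


def loop_radius(theta, x):
    """Radius of the boundary curve r = exp(-(x / (2 cos(theta)))^2)."""
    return math.exp(-((x / (2.0 * math.cos(theta))) ** 2))


def lemma_region(p, q, x):
    """Membership in R as described by the lemma."""
    r = math.sqrt(p * p + q * q)
    theta = math.atan2(q, p)
    if x < 0:
        # Inside the left loop: left half-plane, within the curve.
        return p < 0 and r <= loop_radius(theta, x)
    # Outside the right loop: everything except the interior of that loop.
    return not (p > 0 and r < loop_radius(theta, x))


def count_mismatches(x, n=20_000, seed=0):
    """Count random points in the unit disk where the two descriptions disagree."""
    rng = random.Random(seed)
    mismatches = 0
    checked = 0
    while checked < n:
        p = rng.uniform(-1.0, 1.0)
        q = rng.uniform(-1.0, 1.0)
        s = p * p + q * q
        if not 0.0 < s < 1.0:
            continue  # keep only points strictly inside the disk, off the origin
        checked += 1
        if in_region(p, q, x) != lemma_region(p, q, x):
            mismatches += 1
    return mismatches


for x in (-1.5, -0.3, 0.0, 0.4, 2.0):
    print(x, count_mismatches(x))
```

Barring points that land on the boundary to within floating-point rounding, the mismatch count should be zero for every $x$.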

Let $I(x)$ be the area inside the left loop (or equivalently, the right one). We've established that $\text{Area}(\mathcal{R}) = I(x)$ if $x < 0,$ and $\text{Area}(\mathcal{R}) = \pi - I(x)$ if $x\geq 0.$ Recall that the area inside a closed loop given in polar coordinates is $\int^{\theta_2}_{\theta_1} r^2/2 \ d\theta.$ Then we have 

$$ I(x) = \int^{\pi/2}_{-\pi/2} \frac{1}{2} \exp\left( - \frac{x^2}{2\cos^2 \theta}\right) \ d\theta= \int^{\pi/2}_{0} \exp\left( - \frac{x^2 }{2}\sec^2 \theta\right) \ d\theta.$$

Differentiating under the integral sign gives:

$$ I'(x) = -x \int^{\pi/2}_0 \sec^2 \theta \ \exp\left( - \frac{x^2 }{2}\sec^2 \theta\right) \ d\theta$$

Making the substitution $t = \tan \theta$ (so that $dt = \sec^2 \theta \ d\theta$ and $\sec^2 \theta = 1 + t^2$), we have, for $x \neq 0,$
\begin{align*}
I'(x) &= -x \int^{\infty}_0 \exp\left( - \frac{x^2}{2} (t^2+1) \right) \ dt \\
      &= -x \exp(-x^2/2) \int^{\infty}_0 \exp( -x^2 t^2/2) \ dt \\
      &= - x \exp(-x^2/2) \frac{\sqrt{\pi}}{|x| \cdot \sqrt{2}} \\
      &= - \  \text{sign}(x) \ \frac{\sqrt{\pi}}{\sqrt{2}} \exp(-x^2/2)
\end{align*}
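As a numerical cross-check of $I(x)$ and its derivative: integrating $I'(x)$ from $0,$ where $I(0) = \pi/2,$ gives $I(x) = \pi \, \Phi(-|x|),$ with $\Phi$ denoting the standard normal CDF (notation introduced here for the check). The sketch below compares a midpoint-rule approximation of the integral defining $I(x)$ with this closed form. The function names `loop_area` and `std_normal_cdf` are hypothetical, and the midpoint rule is just one convenient choice of quadrature.

```python
import math


def loop_area(x, n=20_000):
    """Midpoint-rule approximation of I(x) = int_0^{pi/2} exp(-(x^2/2) sec^2(t)) dt."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h  # midpoints never hit theta = pi/2 exactly
        total += math.exp(-0.5 * x * x / math.cos(theta) ** 2)
    return total * h


def std_normal_cdf(x):
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


for x in (-2.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    numeric = loop_area(x)
    closed_form = math.pi * std_normal_cdf(-abs(x))
    print(f"x = {x:+.1f}: quadrature = {numeric:.6f}, closed form = {closed_form:.6f}")
```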

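Therefore, since $\text{Area}(\mathcal{R}) = I(x)$ for $x < 0$ and $\text{Area}(\mathcal{R}) = \pi - I(x)$ for $x > 0,$ in both cases $\frac{d}{dx} \text{Area}(\mathcal{R}) = \frac{\sqrt{\pi}}{\sqrt{2}} \exp(-x^2/2).$ Returning to the decomposition of $\mathbb{P}(X \leq x)$ near the start, differentiating gives the density: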

\begin{align*}
f_X(x) &= \left( 1 - \frac{\pi}{4} \right) \cdot \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) + \frac{1}{4} \cdot \frac{\sqrt{\pi}}{\sqrt{2}} \exp(-x^2/2) \\
 &= \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)
\end{align*}

So $X$ is distributed as a standard normal.
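The conclusion can be checked empirically by simulating the procedure from the problem statement and comparing the empirical CDF with the standard normal CDF. The sketch below is illustrative only: the names `sample_x`, `empirical_cdf` and `std_normal_cdf` are hypothetical, `random.gauss` is an assumed stand-in for "sampled from the standard normal distribution" in the $s \geq 1$ branch, and the probability-zero case $s = 0$ is routed to that branch to avoid $\log 0.$

```python
import math
import random


def sample_x(rng):
    """One draw of X; also reports whether the formula branch (S < 1) was used."""
    u = rng.random()
    v = rng.random()
    p = 2.0 * u - 1.0
    q = 2.0 * v - 1.0
    s = p * p + q * q
    if s >= 1.0 or s == 0.0:
        # Fallback branch: draw directly from N(0, 1) (assumed library call).
        return rng.gauss(0.0, 1.0), False
    return p * math.sqrt(-2.0 * math.log(s) / s), True


def empirical_cdf(values, x):
    """Fraction of values that are <= x."""
    return sum(1 for value in values if value <= x) / len(values)


def std_normal_cdf(x):
    """Standard normal CDF expressed through the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_x(rng) for _ in range(200_000)]
all_values = [value for value, _ in draws]
formula_values = [value for value, used_formula in draws if used_formula]

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(
        f"x = {x:+.1f}: all = {empirical_cdf(all_values, x):.4f}, "
        f"S<1 only = {empirical_cdf(formula_values, x):.4f}, "
        f"normal = {std_normal_cdf(x):.4f}"
    )
```

The column restricted to $S < 1$ corresponds to $\mathbb{P}(X \leq x \ | \ S < 1),$ which should also match the standard normal CDF.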