#Problem Statement#
Given a random variable $X \sim \mathcal{N}(\mu, \sigma^{2})$, how is it that the random variable $Z = \frac{X-\mu}{\sigma}$ is $\mathcal{N}(0, 1)$? It is this question we are going to answer.

**Intuitive Justification**

Notice that if $\langle X \rangle = \mu$, then $\langle Z \rangle = \frac{\langle X \rangle - \mu}{\sigma} = 0$. Further, since $\langle Z \rangle = 0$, the variance of $Z$ is $\langle Z^{2} \rangle = \frac{1}{\sigma^{2}}\langle (X-\mu)^{2}\rangle = \frac{\langle X^{2} \rangle - 2 \mu \langle X \rangle + \mu^{2}}{\sigma^{2}} = \frac{\langle X^{2}\rangle - \mu^{2}}{\sigma^{2}} = \frac{\mathrm{Var}(X)}{\sigma^{2}} = 1$.

However, just because the expected value and variance of $Z$ match the standard normal distribution does not necessarily mean the _distribution_ of $Z$ is $\mathcal{N}(0,1)$. So while the above provides some justification, we have more work to do.
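
To make this caveat concrete, here is a minimal sketch of a distribution that matches the first two moments of $\mathcal{N}(0,1)$ without being normal. The choice of a uniform distribution on $[-\sqrt{3}, \sqrt{3}]$, the seed, and the sample size are assumptions made for illustration only.

```python
import math
import random

# Assumption for illustration: U is uniform on [-sqrt(3), sqrt(3)],
# which has mean 0 and variance 1 but is not Gaussian.
rng = random.Random(0)  # fixed seed so the run is reproducible
half_width = math.sqrt(3.0)
samples = [rng.uniform(-half_width, half_width) for _ in range(200_000)]

n = len(samples)
mean = sum(samples) / n
variance = sum((s - mean) ** 2 for s in samples) / n
fourth_moment = sum((s - mean) ** 4 for s in samples) / n

print(f"mean          = {mean:.3f}")
print(f"variance      = {variance:.3f}")
print(f"fourth moment = {fourth_moment:.3f}  (a standard normal has 3)")
```

The mean and variance agree with the standard normal, but the fourth moment does not, so the first two moments alone cannot pin down the distribution.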

#The Main Technique - Characteristic Functions#

To demonstrate the result, we will rely on the characteristic function. Given a probability measure $\mu(x)$ (not to be confused with the mean $\mu$), the characteristic function $\psi(t)$ is defined as

$$\psi(t) = \langle e^{itX}\rangle = \int e^{itx}~d\mu(x)$$

Assuming $\mu(x) = p(x)~dx$ for some probability density function (pdf) $p$, we have

$$\psi_{X}(t) = \int p(x)e^{itx}~dx$$

You may recognize this as the _Fourier transform_ of the pdf (up to sign and normalization conventions). Fourier transforms are well studied and appear throughout mathematics, so we have a large body of standard results to draw on. Note: we use the subscript $_{X}$ to indicate which random variable the characteristic function belongs to.
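
As a rough numerical sketch of this definition, the integral can be approximated with a midpoint rule. The function names `char_fn` and `std_normal_pdf`, the integration limits, and the number of steps are all assumptions chosen for illustration; the standard normal pdf is used only as a convenient test input.

```python
import cmath
import math

def char_fn(pdf, t, lo=-12.0, hi=12.0, n=4000):
    """Approximate psi(t) = integral of pdf(x) * exp(i t x) dx with a midpoint rule."""
    dx = (hi - lo) / n
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * dx  # midpoint of the k-th subinterval
        total += pdf(x) * cmath.exp(1j * t * x)
    return total * dx

def std_normal_pdf(x):
    """pdf of N(0, 1), used here only as an example input."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

print(char_fn(std_normal_pdf, 0.0))  # psi(0) is the total probability, so close to 1
print(char_fn(std_normal_pdf, 1.0))
```

Evaluating at $t = 0$ is a quick sanity check: $\psi(0) = \int p(x)~dx = 1$ for any pdf.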


**Joint pdfs $p(x,y)$**

Eventually, we will need to consider joint pdfs $p(x,y)$. Let's show a useful result about characteristic functions when the random variables $X$ and $Y$ are independent and we are considering quantities of the form $X + Y$.

By definition, the characteristic function of $X + Y$ is given as

$$\psi_{X+Y}(t) = \int p(x, y)e^{it(x+y)}~dx~dy$$

Now, under the assumption that $X$ and $Y$ are independent, the joint pdf factors as $p(x, y) = a(x)b(y)$ for some pdfs $a$ and $b$. Plugging this in above, we see

$$\boxed{\psi_{X+Y}(t) = \int a(x)e^{itx}~dx \int b(y)e^{ity}~dy = \psi_{X}(t)\psi_{Y}(t)}$$

Thus, if $X$ and $Y$ are independent random variables, then the characteristic function of their sum is simply the product of their characteristic functions.
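
The product rule can be checked exactly in a discrete analogue, where the integral becomes a sum over a probability mass function. The sketch below uses two independent fair dice, which is a hypothetical example and not part of the derivation above; `char_fn_pmf` is a helper name invented here.

```python
import cmath

def char_fn_pmf(pmf, t):
    """psi(t) = sum_k P(K = k) * exp(i t k) for a dict {value: probability}."""
    return sum(p * cmath.exp(1j * t * k) for k, p in pmf.items())

# Hypothetical example: two independent fair dice.
die = {k: 1.0 / 6.0 for k in range(1, 7)}

# pmf of the sum, built from the factorized joint pmf P(x, y) = P(x) P(y).
total = {}
for x, px in die.items():
    for y, py in die.items():
        total[x + y] = total.get(x + y, 0.0) + px * py

t = 0.7  # arbitrary evaluation point
lhs = char_fn_pmf(total, t)                       # psi_{X+Y}(t)
rhs = char_fn_pmf(die, t) * char_fn_pmf(die, t)   # psi_X(t) * psi_Y(t)
print(abs(lhs - rhs))  # difference should be at floating-point level
```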

**Rescaling**

Another result we will need concerns random variables of the form $X/c$ for some nonzero constant $c$. Plugging this into the definition gives

$$\boxed{\psi_{X/c}(t) = \int p(x)e^{it(x/c)}~dx = \int p(x)e^{i(t/c)x}~dx = \psi_{X}(t/c)}$$

That is, the characteristic function of a (constant) rescaled random variable is simply the characteristic function for that random variable evaluated at a different point.
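
The rescaling rule also holds for sample averages, which gives a simple check. In this sketch the exponential distribution, the seed, and the values of `c` and `t` are arbitrary assumptions; `empirical_char_fn` is a helper name invented here that estimates $\psi(t)$ by averaging $e^{itx}$ over samples.

```python
import cmath
import random

def empirical_char_fn(samples, t):
    """Sample average of exp(i t x), an estimate of psi(t)."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

rng = random.Random(1)  # fixed seed for reproducibility
xs = [rng.expovariate(1.0) for _ in range(10_000)]  # any distribution would do
c, t = 2.5, 1.3

scaled = [x / c for x in xs]
lhs = empirical_char_fn(scaled, t)   # estimate of psi_{X/c}(t)
rhs = empirical_char_fn(xs, t / c)   # estimate of psi_X(t / c)
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```

The two estimates agree term by term because $t \cdot (x/c) = (t/c) \cdot x$, which is exactly the step used in the boxed identity.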

#First step: $\psi_{X}$ for $X\sim \mathcal{N}(\mu,\sigma^{2})$#

First we will compute the characteristic function for a normally distributed random variable. That way we have a formula to compare our results to.

By definition, we have

$$\psi_{X}(t) = \frac{1}{\sigma\sqrt{2\pi}}\int e^{-(x-\mu)^{2}/2\sigma^{2}}e^{itx}~dx$$

Making a change of variables $u = x-\mu$ gives

$$\psi_{X}(t) = \frac{e^{it\mu}}{\sigma\sqrt{2\pi}}\int e^{-u^{2}/2\sigma^{2}}e^{itu}~du$$

We then substitute $x  = u /\sigma$:

$$\psi_{X}(t) = \frac{e^{it\mu}}{\sqrt{2\pi}}\int e^{-x^{2}/2}e^{i\sigma tx}~dx$$

To do this integral rigorously, we would introduce complex variables. Instead, we will ask what happens if we just try to compute it directly. To do this, we complete the square by writing

$$(x - i\sigma t)^{2}/2 = x^{2}/2 -i\sigma t x - (\sigma t)^{2}/2 \implies e^{-x^{2}/2 + i\sigma tx} = e^{-(x-it\sigma)^{2}/2}e^{-(\sigma t)^{2}/2}$$ 

Thus the characteristic function becomes

$$\psi_{X}(t) = \frac{e^{it\mu - \sigma^{2}t^{2}/2}}{\sqrt{2\pi}}\int e^{-(x-i\sigma t)^{2}/2}~dx$$

Again, a proper treatment would involve doing a complex integral; instead, we make a substitution $u = x - i\sigma t$ and use $\int e^{-u^{2}/2}~du = \sqrt{2\pi}$ to give

$$\boxed{\psi_{X}(t) = \frac{e^{it\mu - \sigma^{2}t^{2}/2}}{\sqrt{2\pi}}\int e^{-u^{2}/2}~du = e^{it\mu - \sigma^{2}t^{2}/2}}$$

Thus the characteristic function for a single Gaussian random variable is an oscillating function with a Gaussian envelope.
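
Since the complex substitution was not justified rigorously, a numerical check of the boxed formula is reassuring. The sketch below compares a midpoint-rule estimate of the defining integral with the closed form; the function names, the truncation at $\mu \pm 12\sigma$, the step count, and the values of `mu` and `sigma` are assumptions made for illustration.

```python
import cmath
import math

def normal_char_fn_numeric(t, mu, sigma, n=20_000):
    """Midpoint-rule estimate of the integral defining psi_X(t) for N(mu, sigma^2)."""
    lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma  # tails beyond this are negligible
    dx = (hi - lo) / n
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0j
    for k in range(n):
        x = lo + (k + 0.5) * dx
        density = norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma**2))
        total += density * cmath.exp(1j * t * x)
    return total * dx

def normal_char_fn_closed(t, mu, sigma):
    """Closed form exp(i t mu - sigma^2 t^2 / 2)."""
    return cmath.exp(1j * t * mu - (sigma * t) ** 2 / 2.0)

mu, sigma = 1.5, 2.0  # illustrative values, not from the text
for t in (0.0, 0.5, 1.0):
    diff = abs(normal_char_fn_numeric(t, mu, sigma) - normal_char_fn_closed(t, mu, sigma))
    print(t, diff)
```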


#Step 2: Computing $\psi_{X-\mu}$#

Now that we have $\psi_{X}(t)$, let's compute $\psi_{X-\mu}(t)$. One way is to observe that $\mu$ is a constant, and that a constant is a special case of a random variable. Further, because $\mu$ is a constant, it is statistically independent of $X$. How do we write a pdf for a constant, so that the joint pdf factors? Although it sounds like a slight trick, observe that a random variable $Y$ which always equals $\mu$ has all of its probability concentrated at $y = \mu$, so that

$$\int \delta(y - \mu)~dy = 1, \qquad \langle Y \rangle = \int y~\delta(y - \mu)~dy = \mu$$

where $\delta(y-\mu)$ is the [Dirac delta function](https://en.wikipedia.org/wiki/Dirac_delta_function). In other words, $\delta(y - \mu)$ plays the role of the pdf of the constant $\mu$. Likewise, the constant $-\mu$ has pdf $\delta(y + \mu)$, giving the joint pdf of $X$ and $Y = -\mu$:

$$p(x, y) = \mathcal{N}(\mu,\sigma^{2})(x)~\delta(y+\mu)$$

Applying our result for $\psi_{X+Y}$ for two independent random variables, we have

$$\boxed{\psi_{X-\mu}(t) = \psi_{X}(t)\int \delta(y+\mu)e^{ity}~dy = e^{it\mu - \sigma^{2}t^{2}/2}\times e^{-it\mu} = e^{-\sigma^{2}t^{2}/2}}$$
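
As a cross-check that avoids the delta function altogether, the same result follows directly from the definition, because $e^{-it\mu}$ is a constant and can be pulled out of the expectation:

$$\psi_{X-\mu}(t) = \langle e^{it(X-\mu)}\rangle = e^{-it\mu}\langle e^{itX}\rangle = e^{-it\mu}\psi_{X}(t) = e^{-\sigma^{2}t^{2}/2}$$

Both routes agree: shifting a random variable by a constant multiplies its characteristic function by a pure phase.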

#Step 3: Computing $\psi_{(X-\mu)/\sigma}$#

Since we now have $\psi_{X-\mu}$, all that remains is to divide by $\sigma$. Using the rescaling result $\psi_{X/c}(t) = \psi_{X}(t/c)$ with $X - \mu$ in place of $X$ and $c = \sigma$, we find

$$\boxed{\psi_{(X-\mu)/\sigma}(t) = \psi_{X-\mu}(t/\sigma) = e^{-\sigma^{2}(t/\sigma)^{2}/2} = e^{-t^{2}/2}}$$

Observing that the function on the right is the characteristic function for a random variable distributed as $\mathcal{N}(0, 1)$ (set $\mu = 0$ and $\sigma = 1$ in the boxed formula for $\psi_{X}$), we conclude

$$\psi_{(X-\mu)/\sigma}(t) = \psi_{Z}(t)~~~~Z\sim \mathcal{N}(0,1)$$
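
A small Monte Carlo experiment gives an independent sanity check of this identity. This is only a sketch: the values of `mu` and `sigma`, the seed, the sample size, and the helper name `empirical_char_fn` are assumptions chosen for illustration, and the agreement is only up to sampling noise.

```python
import cmath
import math
import random

rng = random.Random(42)  # fixed seed for reproducibility
mu, sigma = 3.0, 0.5     # illustrative values
xs = [rng.gauss(mu, sigma) for _ in range(200_000)]
zs = [(x - mu) / sigma for x in xs]  # standardized samples

def empirical_char_fn(samples, t):
    """Sample average of exp(i t x), an estimate of psi(t)."""
    return sum(cmath.exp(1j * t * s) for s in samples) / len(samples)

for t in (0.5, 1.0, 2.0):
    estimate = empirical_char_fn(zs, t)
    target = math.exp(-t * t / 2.0)  # characteristic function of N(0, 1)
    print(f"t={t}: |estimate - target| = {abs(estimate - target):.4f}")
```

With this many samples the discrepancy should be on the order of $1/\sqrt{n}$, which is consistent with the derived formula but is of course not a proof.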

#Step 4: Establishing the Equivalence#

We have shown that the characteristic function of $\frac{X-\mu}{\sigma}$, where $X\sim \mathcal{N}(\mu, \sigma^{2})$, is the characteristic function of a random variable $Z \sim \mathcal{N}(0, 1)$. Let's now show that the pdfs are equal by taking an inverse Fourier transform:

$$p_{(X-\mu)/\sigma}(x) = \frac{1}{2\pi}\int e^{-itx}\psi_{(X-\mu)/\sigma}(t)~dt = \frac{1}{2\pi}\int e^{-itx}\psi_{Z}(t)~dt = \frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}$$

where the last equality follows because $Z \sim \mathcal{N}(0, 1)$, so inverting $\psi_{Z}$ returns the standard normal pdf. Hence, because the pdf of $\frac{X-\mu}{\sigma}$ is that of $\mathcal{N}(0, 1)$, we have demonstrated what we set out to show:


> Given a random variable $X \sim \mathcal{N}(\mu, \sigma^{2})$, the random variable $Z = \frac{X-\mu}{\sigma}$ has the standard normal distribution.
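
The inversion step can also be checked numerically. The sketch below applies the inverse Fourier transform, with the usual $\frac{1}{2\pi}$ normalization, to $e^{-t^{2}/2}$ and compares the result with the standard normal pdf. The function names, integration limits, and step count are assumptions made for illustration.

```python
import cmath
import math

def inverse_transform(psi, x, lo=-12.0, hi=12.0, n=4000):
    """Midpoint-rule estimate of (1 / (2 pi)) * integral of exp(-i t x) * psi(t) dt."""
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += cmath.exp(-1j * t * x) * psi(t)
    # The imaginary part cancels by symmetry, so keep the real part only.
    return (total * dt / (2.0 * math.pi)).real

def psi_standard_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2.0)

def std_normal_pdf(x):
    """pdf of N(0, 1)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

for x in (0.0, 1.0, 2.5):
    print(x, inverse_transform(psi_standard_normal, x), std_normal_pdf(x))
```

The two columns should agree closely, which illustrates that the characteristic function $e^{-t^{2}/2}$ corresponds to the standard normal density.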