# Characteristic Functions & Moments
### 1.1 Moments
The $k$th moment of a random variable $X$ with probability density function $f_X$ is defined as:

$$\mu_k = E[X^k] = \int_{-\infty}^{\infty} x^k f_X (x) dx$$

The first order moment, $\mu_1$, is the **mean** of the random variable $X$. The second order moment, $\mu_2$, is needed to compute the **variance**:

$$Var(X) = E \big[ (X - \mu_1)^2 \big] = E[X^2] - \big( E[X]\big)^2 = \mu_2 - \mu_1^2$$

Moments are quantitative measures of the shape of a function. In particular:
* The _first moment_ $\longrightarrow$ **expected value**
* The _second central moment_ $\longrightarrow$ **variance**
* The _third standardized moment_ $\longrightarrow$ **skewness**
* The _fourth standardized moment_ $\longrightarrow$ **kurtosis**
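
As a small illustration of these definitions, the sketch below computes raw moments and the variance for a discrete random variable, where the integral is replaced by a sum over the probability mass function. The fair six-sided die and the helper name `raw_moment` are assumptions chosen for the example, not part of the derivation above.

```python
from fractions import Fraction

def raw_moment(pmf, k):
    """k-th raw moment E[X^k] of a discrete distribution given as {value: probability}."""
    return sum(p * x**k for x, p in pmf.items())

# Hypothetical example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu_1 = raw_moment(die, 1)    # first moment (mean)
mu_2 = raw_moment(die, 2)    # second raw moment
variance = mu_2 - mu_1**2    # Var(X) = mu_2 - mu_1^2

print(mu_1, mu_2, variance)  # prints 7/2 91/6 35/12
```

Exact fractions are used so that the identity $Var(X) = \mu_2 - \mu_1^2$ can be checked without rounding error.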

### 1.2 Characteristic Function 
However, moments of a distribution can be rather difficult to compute directly. Because of this we often make use of the **characteristic function** of a random variable. This function is defined as:

$$\Phi_X (t) = E \big[ e^{itX}\big]$$

Where again our random variable is $X$, and $e^{itX}$ is a function of the random variable, which we can call $g$:

$$g(X) = e^{itX}$$

And the expected value of $g(X)$:

$$E \big[ g(X) \big] = \int_{-\infty}^{\infty} g(x) f_X (x) dx$$

Hence, we can write:

$$\Phi_X (t) = E \big[ e^{itX}\big] = \int_{-\infty}^{\infty} e^{itx} f_X (x) dx$$

Now, based on the definition of $e^x$:

$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots $$

And in our case where $e$ is raised to the $itx$, $e^{itx}$:

$$e^{itx} = \sum_{k=0}^{\infty} \frac{i^k t^k x^k}{k!} = 1 + itx + \frac{i^2 t^2 x^2}{2!} + \dots + \frac{i^k t^k x^k}{k!} $$

We can substitute this expansion in our equation of $\Phi_X$:

$$\Phi_X (t) = E \big[ e^{itX}\big] = 
\int_{-\infty}^{\infty} \Big( 1 + itx + \frac{i^2 t^2 x^2}{2!} + \dots + \frac{i^k t^k x^k}{k!}  \Big) f_X (x) dx$$

We can see that in this representation the density function multiplies each term:

$$\Phi_X (t) = \int_{-\infty}^{\infty} \Big( f_X (x) dx + it \cdot x f_X(x)dx + \frac{i^2 t^2 \cdot x^2 f_X (x) dx}{2!} + \dots + \frac{i^k t^k \cdot x^k f_X (x) dx}{k!}  \Big)$$

From the Sum Rule in Integration, we can apply the integral to each separate term:

$$\Phi_X (t) = \int_{-\infty}^{\infty}  f_X (x) dx + 
\int_{-\infty}^{\infty}  it \cdot x f_X(x)dx + 
\int_{-\infty}^{\infty} \frac{i^2 t^2 \cdot x^2 f_X (x) dx}{2!} + \dots +
\int_{-\infty}^{\infty}  \frac{i^k t^k \cdot x^k f_X (x) dx}{k!}  $$

Keeping in mind that we are integrating with respect to $x$, the factors $i$, $t$, and $\frac{1}{k!}$ can be pulled out and treated as constants, leaving us with expressions for our moments: 

$$\Phi_X (t) = \int_{-\infty}^{\infty}  f_X (x) dx + 
 it \overbrace{ \int_{-\infty}^{\infty} x f_X(x)dx}^{E[X]} + 
\frac{i^2 t^2}{2!} \overbrace{ \int_{-\infty}^{\infty} x^2 f_X (x) dx }^{E[X^2]} + \dots +
\frac{i^k t^k}{k!} \overbrace{ \int_{-\infty}^{\infty}  x^k f_X (x) dx }^{E[X^k]}  $$

$$\Phi_X (t) = 1 + it E[X] + \frac{i^2 t^2 E[X^2]}{2!} + \dots + \frac{i^k t^k E[X^k]}{k!}$$

Hence, we have shown that our characteristic function has all the moments embedded in it (provided those moments exist and the series converges)! It is worth noting that outside of probability theory this construction is known as the **Fourier transform**; a characteristic function can be viewed as the Fourier transform of its distribution, up to the sign convention used in the exponent.
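
The expansion can be checked numerically. The sketch below is a minimal illustration, assuming a fair six-sided die as the example distribution (so the integrals become sums); the function names `char_fn` and `char_fn_from_moments` and the truncation at 40 terms are arbitrary choices for the example.

```python
import cmath
from math import factorial

# Hypothetical example distribution: a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}

def char_fn(pmf, t):
    """Characteristic function E[exp(itX)] computed directly from the PMF."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def char_fn_from_moments(pmf, t, terms=40):
    """Truncated series: sum over k of (it)^k E[X^k] / k!."""
    total = 0j
    for k in range(terms):
        moment_k = sum(p * x**k for x, p in pmf.items())
        total += (1j * t) ** k * moment_k / factorial(k)
    return total

t = 0.5
direct = char_fn(die, t)
series = char_fn_from_moments(die, t)
print(abs(direct - series))  # a very small number if the two expressions agree
```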

### 1.3 Computing Moments via Characteristic Function
#### Zeroth Moment $\longrightarrow$ Evaluate $\Phi_X(0)$ 
So, how exactly do we compute our moments via the characteristic function? Well, let us first look at what happens when we evaluate $\Phi_X$ at $0$:

$$\Phi_X (0) = 1 + i\cdot0\cdot E[X] + \frac{i^2 \cdot 0^2 \cdot E[X^2]}{2!} + \dots + \frac{i^k \cdot 0^k \cdot E[X^k]}{k!} $$

$$\Phi_X (0) = 1 = \int_{-\infty}^{\infty} f_X (x) dx $$

So we can see that our zeroth moment is simply $1$, the area under the distribution. 

#### First Moment $\longrightarrow$ Evaluate $\Phi'_X(0)$
Now let's evaluate the first derivative of $\Phi_X$, $\Phi'_X$, with respect to $t$, at 0:

$$\Phi'_X (t) = \frac{d \Phi_X(t)}{dt} = 0 + iE[X] +\frac{2i^2 t E[X^2]}{2!} + \dots + \frac{k i^k t^{k-1} E[X^k]}{k!}$$

$$ \frac{d \Phi_X(0)}{dt} = iE[X] $$

$$E[X] = \frac{ \frac{d \Phi_X(0)}{dt} }{ i } = \frac{\Phi'_X(0)}{i}$$

Where above we have found an expression for our first moment. 

#### Second Moment $\longrightarrow$ Evaluate $\Phi''_X(0)$
Now if we evaluate the second derivative of $\Phi_X$ with respect to $t$ at $0$:

$$\Phi''_X (t) = \frac{d^2 \Phi_X(t)}{dt^2} = 0 +\frac{2i^2 E[X^2]}{2!} + \dots + \frac{k \cdot(k-1) i^k t^{k-2} E[X^k]}{k!}$$

$$\Phi''_X (0) = 0 +\frac{2i^2 E[X^2]}{2!} + \dots + \frac{k \cdot(k-1) i^k 0^{k-2} E[X^k]}{k!}$$

$$\Phi''_X (0) = i^2 E[X^2] $$

$$ E[X^2] = \frac{\Phi''_X (0)}{ i^2}$$

And we have our second moment.

#### $k$th Moment $\longrightarrow$ Evaluate $\Phi^{(k)}_X(0)$
In order to find the $k$th moment we just continue the process started above, taking the $k$th derivative of $\Phi_X$ (written $\Phi^{(k)}_X$) and evaluating at $0$:

$$ E[X^k] = \frac{\Phi^{(k)}_X (0)}{ i^k}$$
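
The derivative formulas can be tried out numerically. The following is a rough sketch that approximates the derivatives at $0$ with central finite differences, again assuming a fair six-sided die as the example distribution; the step size `h` is an arbitrary choice and the results are approximations, not exact values.

```python
import cmath

# Hypothetical example distribution: a fair six-sided die.
die = {x: 1 / 6 for x in range(1, 7)}

def char_fn(t):
    """Characteristic function E[exp(itX)] of the die."""
    return sum(p * cmath.exp(1j * t * x) for x, p in die.items())

h = 1e-4  # finite-difference step (assumed; smaller is not always better)

# Central differences approximating the first and second derivatives at t = 0.
first_derivative = (char_fn(h) - char_fn(-h)) / (2 * h)
second_derivative = (char_fn(h) - 2 * char_fn(0.0) + char_fn(-h)) / h**2

mean = (first_derivative / 1j).real              # E[X]   = Phi'(0) / i
second_moment = (second_derivative / 1j**2).real  # E[X^2] = Phi''(0) / i^2

print(mean, second_moment)  # close to 3.5 and 91/6
```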

### 1.4 Characteristic Function Example - Poisson Distribution

#### Find characteristic function
Let us have a Poisson distributed random variable, $X$:

$$f(x; \lambda) = P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

Where $\lambda$ is the distribution parameter. Let us start by finding the characteristic function:

$$\Phi_X(t) = E \big[ e^{itX} \big] = \sum_{x=0}^{\infty} e^{itx}  P(X = x) =
\sum_{x=0}^{\infty} e^{itx}  \frac{\lambda^x e^{-\lambda}}{x!} $$

$$\Phi_X(t) =  e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^{i t})^x}{x!} $$

Taking a moment to recall the definition of an exponential function:

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$

We can see that our function above has the same form: 

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \longleftrightarrow \sum_{x=0}^{\infty} \frac{(\lambda e^{i t})^x}{x!}$$

On the left, $x$ is the exponent that $e$ is raised to, and it appears as the base inside the summation. On the right, the base inside the summation is $\lambda e^{i t}$, hence that must be the exponent that $e$ is raised to:

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} \longleftrightarrow \sum_{x=0}^{\infty} \frac{(\lambda e^{i t})^x}{x!} = e^{\lambda e^{i t}}$$

Hence, our characteristic function is:

$$\Phi_X(t) =  e^{-\lambda} e^{\lambda e^{i t}}$$
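
To sanity-check this closed form, the sketch below compares it against the defining sum truncated at a finite number of terms. The values of `lam`, `t`, and the cutoff `terms=100` are assumptions picked for illustration.

```python
import cmath
from math import exp, factorial

def poisson_cf_sum(t, lam, terms=100):
    """Characteristic function from the definition, truncating the infinite sum."""
    return sum(
        cmath.exp(1j * t * x) * lam**x * exp(-lam) / factorial(x)
        for x in range(terms)
    )

def poisson_cf_closed(t, lam):
    """Closed form derived above: exp(-lam) * exp(lam * exp(it))."""
    return cmath.exp(-lam) * cmath.exp(lam * cmath.exp(1j * t))

lam, t = 3.0, 0.7
difference = abs(poisson_cf_sum(t, lam) - poisson_cf_closed(t, lam))
print(difference)  # a very small number if the two expressions agree
```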

#### Find first moment
We can use this formula to find the mean of our random variable $X$, recalling our derived property from earlier:

$$E[X] = \frac{\Phi'_X(0)}{i}$$

First we take the derivative with respect to $t$. We start with a slight rearrangement. 

$$\Phi_X(t) =  e^{-\lambda} e^{\lambda e^{i t}} = e^{\lambda e^{i t} -\lambda} = 
e^{- \lambda (- e^{i t} + 1)} = e^{- \lambda (1- e^{i t})} $$

And now take the derivative:

$$\Phi_X'(t) = e^{- \lambda (1- e^{i t})} \cdot \frac{d \big( - \lambda (1- e^{i t}) \big)}{dt} = 
e^{- \lambda (1- e^{i t})} \cdot \lambda e^{i t} \cdot i 
$$

Evaluating at $0$:

$$\Phi_X'(0) = e^{- \lambda (1- e^{i \cdot 0})} \cdot \lambda e^{i \cdot 0} \cdot i = i \lambda$$

And, based on our property $E[X] = \frac{\Phi'_X(0)}{i}$, we can see that:

$$E[X] = \frac{i \lambda}{i} = \lambda$$

With that we have arrived at the mean of the Poisson distributed random variable $X$ (see [here](https://en.wikipedia.org/wiki/Poisson_distribution#Probability_mass_function) for proof of correctness).

#### Find second moment
To find the variance via the second moment we simply need to take the derivative once again. 

$$\Phi_X'(t) = \lambda i e^{i t} \cdot e^{- \lambda (1- e^{i t})} = 
\lambda i \big( e^{i t} \cdot e^{- \lambda (1- e^{i t})} \big)
$$

Using the product rule:

$$\Phi_X''(t) = \lambda i 
\Big [  
i e^{it} \cdot e^{- \lambda (1- e^{i t})}  + e^{it} \cdot e^{- \lambda (1- e^{i t})} \cdot \lambda e^{i t} \cdot i 
\Big ]
$$

And evaluating at $0$:

$$\Phi_X''(0) = \lambda i 
\Big [  
i e^{i\cdot 0} \cdot e^{- \lambda (1- e^{i \cdot 0})}  + e^{i \cdot 0} \cdot e^{- \lambda (1- e^{i \cdot 0})} \cdot \lambda e^{i \cdot 0} \cdot i 
\Big ] = \lambda i^2 \cdot (1 + \lambda ) 
$$

And, recalling our derived property from earlier:

$$ E[X^2] = \frac{\Phi''_X (0)}{ i^2}$$

We can see that:

$$ E[X^2] = \frac{\Phi''_X (0)}{ i^2} = \frac{\lambda i^2 \cdot (1 + \lambda ) }{i^2} = \lambda + \lambda^2$$

With this second moment handy, our variance is:

$$Var(X) = E[X^2] - \big(E[X] \big)^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$$

Therefore in the case of Poisson random variables the mean and variance are the same. 
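
As an independent cross-check of these results, the sketch below computes the mean and second moment by summing directly over the Poisson probability mass function, without using the characteristic function at all. The rate `lam = 3.0` and the truncation at 100 terms are assumptions for the example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with rate lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 3.0
terms = 100  # truncation point for the infinite sums (assumed large enough)

mean = sum(x * poisson_pmf(x, lam) for x in range(terms))
second_moment = sum(x**2 * poisson_pmf(x, lam) for x in range(terms))
variance = second_moment - mean**2

print(mean, second_moment, variance)  # close to lam, lam + lam^2, lam
```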

### 1.5 Useful properties of characteristic function
The most useful property of a characteristic function is that the characteristic function of a sum of **independent random variables** is the product of their individual characteristic functions. Let:

$$S = X + Y$$

$$p_S(s) = \overbrace{\int p_X(u) p_Y(s - u)du }^\text{convolution integral}$$

Here we are looking at the probability that $X$ takes on the value $u$ and the probability that $Y$ takes on the value $s - u$, so that the sum of the two is $s$ as desired, $u + (s - u) = s$. Since $X$ and $Y$ are independent random variables by hypothesis, the joint probability is simply the product of those two probabilities. For a single, arbitrary value of $u$ this captures only one way of reaching the sum $s$; integrating over $u$ accounts for every combination of values of $X$ and $Y$ that adds up to $s$, which gives $p_S(s)$. The corresponding statement for characteristic functions is:

$$\Phi_S(t) = \Phi_X(t) \Phi_Y(t)$$

This follows immediately from the [Fourier convolution theorem](https://en.wikipedia.org/wiki/Convolution_theorem) (in fact it _is_ the Fourier convolution theorem):

$$\mathcal{F} \big\{ \overbrace{(f * g)(t)}^\text{convolution} \big\} = \overbrace{ \mathcal{F} \big\{ f(t) \big\} \mathcal{F} \big\{ g(t) \big\} }^\text{multiplication in transform domain}$$

What this theorem says is that rather than do this complicated convolution integral, if we know the characteristic functions of $X$ and $Y$, then we just need to multiply them together as simple functions of $t$, and the result is the characteristic function of $S$ (as a function of $t$). 
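
A discrete version of this statement is easy to check numerically. The sketch below is an illustration only, assuming two independent fair dice: it builds the PMF of the sum with an explicit convolution (the sum replacing the convolution integral) and compares its characteristic function with the product of the individual ones. The helper names `convolve` and `char_fn` are made up for the example.

```python
import cmath

def convolve(pmf_x, pmf_y):
    """PMF of S = X + Y for independent X and Y (discrete convolution)."""
    pmf_s = {}
    for u, p_u in pmf_x.items():
        for v, p_v in pmf_y.items():
            pmf_s[u + v] = pmf_s.get(u + v, 0.0) + p_u * p_v
    return pmf_s

def char_fn(pmf, t):
    """Characteristic function E[exp(itX)] computed from a PMF."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

# Hypothetical example: two independent fair six-sided dice.
die = {x: 1 / 6 for x in range(1, 7)}
pmf_sum = convolve(die, die)

t = 0.8
lhs = char_fn(pmf_sum, t)               # characteristic function of the sum
rhs = char_fn(die, t) * char_fn(die, t)  # product of the individual ones
print(abs(lhs - rhs))  # a very small number if the two agree
```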

##### Proof of Fourier convolution theorem
A quick proof of the Fourier convolution theorem for reference. We start with a Fourier transform pair:

$$
\overbrace{ \Phi_X(t) = \int_{-\infty}^{\infty}e^{itx} p_X(x)dx } ^\text{characteristic function of $X$ (Fourier transform)} 
$$

$$
\overbrace{ p_X(x) = \frac{1}{2 \pi} \int_{-\infty}^{\infty} \Phi_X(t) e^{-itx} dt}^\text{inverse Fourier transform} 
$$

And our proof proceeds as:

$$p_S(s) = \int_{-\infty}^{\infty} p_X(u) p_Y(s - u)du$$

We can substitute the inverse Fourier formula for $p_Y(s - u)$:

$$p_S(s) = \int_{-\infty}^{\infty} p_X(u) \Big[\frac{1}{2 \pi} \int_{-\infty}^{\infty} \Phi_Y(t) e^{-it(s-u)} dt \Big] du$$

We then interchange the order of integration and rearrange some terms:

$$
p_S(s) = 
\frac{1}{2 \pi} \int_{-\infty}^{\infty}\Phi_Y(t) e^{-its}  
\Big[ \overbrace{ \int_{-\infty}^{\infty}  p_X(u) e^{itu} du }^\text{characteristic function of $X$} \Big] 
dt
$$

So we have proved the following relation:

$$
p_S(s) = 
\frac{1}{2 \pi} \int_{-\infty}^{\infty}\Phi_Y(t)\Phi_X(t) e^{-its}  
dt
$$

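Now, what exactly is the above relation? It is simply the inverse Fourier transform of the product $\Phi_X(t)\Phi_Y(t)$. Since $p_S(s)$ is also the inverse Fourier transform of its own characteristic function $\Phi_S(t)$, the two must agree, and we have completed our proof: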

$$\Phi_S(t) = \Phi_X(t) \Phi_Y(t)$$

### 1.6 Moment Generating Function
Now, a [**moment generating function**](https://en.wikipedia.org/wiki/Moment-generating_function) of a random variable $X$ is:

$$M_X(t) = E \big[ e^{tX} \big]$$

The moment generating function is so named because it can be used to find the moments of the distribution. If we look at the series expansion of $e^{tX}$:

$$e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} +  \frac{t^3 X^3}{3!} + \dots +  \frac{t^n X^n}{n!}$$

Hence, due to linearity of expectation:

$$M_X(t) = E \big[ e^{tX} \big] = 1 + t E[X]+ \frac{t^2 E[X^2]}{2!} +  \frac{t^3 E[X^3]}{3!} + \dots +  \frac{t^n E[X^n]}{n!}$$

We are able to recover all moments of a distribution from $M_X(t)$: differentiating $k$ times with respect to $t$ and setting $t=0$ gives the $k$th moment about the origin, $E[X^k] = \frac{d^k M_X(t)}{dt^k} \Big|_{t=0}$. 
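
As a short worked illustration, consider again a Poisson random variable with parameter $\lambda$. This is a sketch: the moment generating function below is obtained by the same steps as the Poisson characteristic function, with $t$ in place of $it$.

$$M_X(t) = \sum_{x=0}^{\infty} e^{tx} \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} e^{\lambda e^{t}} = e^{\lambda (e^{t} - 1)}$$

$$M_X'(t) = \lambda e^{t} \cdot e^{\lambda (e^{t} - 1)} \quad \Longrightarrow \quad M_X'(0) = \lambda = E[X]$$

$$M_X''(t) = \big( \lambda e^{t} + \lambda^2 e^{2t} \big) e^{\lambda (e^{t} - 1)} \quad \Longrightarrow \quad M_X''(0) = \lambda + \lambda^2 = E[X^2]$$

Both values agree with the moments obtained from the characteristic function, and no division by powers of $i$ is needed.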

##### Example
See [here](https://www.le.ac.uk/users/dsgp1/COURSES/MATHSTAT/5binomgf.pdf).

### 1.7 Generating Function
Now, having talked about generating functions, we should define exactly what we mean. In mathematics, a [**generating function**](https://en.wikipedia.org/wiki/Generating_function) is a way of encoding an infinite sequence of numbers, $a_n$, by treating them as the coefficients of a power series. This formal power series is the generating function. Polya put this most accurately: 

> "A generating function is a device somewhat similar to a bag. Instead of carrying many little objects detachedly, which could be embarrassing, we put them all in a bag, and then we have only one object to carry, the bag."

#### Ordinary Generating Function
The _ordinary generating function_ of a sequence $a_n$ is:

$$G(a_n ; x) = \sum_{n=0}^{\infty} a_n x^n$$

Where if $a_n$ is the probability mass function of a discrete random variable, then this is called a _probability generating function_.

#### Probability Generating Function
The [**probability generating function**](https://en.wikipedia.org/wiki/Probability-generating_function) of a discrete random variable is a [power series](https://en.wikipedia.org/wiki/Power_series) representation (the generating function) of the probability mass function of the random variable. 

If $X$ is a discrete random variable taking non-negative integer values, then the probability generating function of $X$ is defined as:

$$G(z) = E \big[ z^X \big] = \sum_{x=0}^{\infty} p(x) z^x$$

Where $p$ is the probability mass function of $X$. 

##### Example
Consider for a moment a binomial random variable, with a PMF:

$$f(x, n, p) = \binom{n}{x} p^x (1-p)^{n-x} $$

Where $x$ is the number of successes in $n$ trials, with probability of success $p$ in each trial. Its probability generating function is:

$$G(z) = E(z^X) = \sum_{x=0}^{n} f(x, n, p) z^x$$

$$G(z) = \sum_{x=0}^{n} f(x, n, p) z^x = f(0, n, p) z^0 + f(1, n, p) z^1 + f(2, n, p)z^2 + \dots + f(n, n, p) z^n $$

Where, since $f(x, n, p)$, for a given number of successes, $x$, is equivalent to $ \binom{n}{x}  p^x(1-p)^{n-x}$:

$$G(z) = \sum_{x=0}^{n} \binom{n}{x}  p^x(1-p)^{n-x} z^x  = \sum_{x=0}^{n} \binom{n}{x}  (pz)^x(1-p)^{n-x} $$

And, via the [binomial theorem](https://en.wikipedia.org/wiki/Binomial_theorem) we can write this as:

$$G(z) = \sum_{x=0}^{n} \binom{n}{x}  (pz)^x(1-p)^{n-x} = \Big( (1-p) + pz \Big)^n$$

$$G(z) = \Big( (1-p) + pz \Big)^n$$
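
The closed form can be compared against the defining sum numerically. The sketch below is illustrative only; the parameter values `n`, `p`, and `z` are arbitrary assumptions, and `pgf_sum` / `pgf_closed` are names made up for the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def pgf_sum(z, n, p):
    """Probability generating function from the definition: sum of f(x, n, p) * z^x."""
    return sum(binomial_pmf(x, n, p) * z**x for x in range(n + 1))

def pgf_closed(z, n, p):
    """Closed form obtained via the binomial theorem: ((1 - p) + p z)^n."""
    return ((1 - p) + p * z) ** n

n, p, z = 10, 0.3, 0.6
difference = abs(pgf_sum(z, n, p) - pgf_closed(z, n, p))
print(difference)  # a very small number if the two expressions agree
```

Evaluating either version at $z = 1$ gives $1$, since the probabilities sum to one.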


#### Relationship between Probability Generating Function and Moment Generating Function
Since the probability generating function is defined as:

$$G(z) = E \big[ z^X \big] = \sum_{x=0}^{\infty} p(x) z^x$$

And the moment generating function:

$$M_X(t) = E \big[ e^{tX} \big]$$

We can then define:

$$\log(z) = t$$

So that $e^t = z $. Then:

$$G(z) = E[z^X] = E\big[(e^t)^X\big] = E[e^{tX}] = M_X(t) = M_X(\log z)$$

Hence, the relationship is simply:

$$G(z) = M_X (\log(z))$$
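
This identity is straightforward to verify for a concrete distribution. The sketch below assumes a small, made-up probability mass function on $\{0, 1, 2\}$ and an arbitrary positive value of `z`; both are illustrative choices.

```python
from math import exp, log

# Hypothetical PMF on the non-negative integers {0, 1, 2}.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def pgf(z):
    """Probability generating function G(z) = E[z^X]."""
    return sum(p * z**x for x, p in pmf.items())

def mgf(t):
    """Moment generating function M_X(t) = E[exp(tX)]."""
    return sum(p * exp(t * x) for x, p in pmf.items())

z = 1.7  # any z > 0 works, since log(z) must be defined
difference = abs(pgf(z) - mgf(log(z)))
print(difference)  # a very small number if G(z) = M_X(log z)
```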

Summarized as:

> The probability generating function is usually used for (nonnegative) integer valued random variables, but is really only a repackaging of the moment generating function. So the two contains the same information.


#### References
* https://youtu.be/_DWnI-gk0ys?t=1021