**From a series on applications of modern interpolation methods in economics, written by [Mahdi E Kahou](https://sites.google.com/site/mahdiebrahimikahou/about-me)**


# Goal of this notebook  

The purpose of this notebook is to demonstrate:  
1. How to numerically calculate the expectation of a univariate function $v(x)$ with respect to a normal distribution.  
2. How to apply the Gauss–Hermite quadrature method for this task.  
3. Since these lecture notes focus on machine learning methods, we will emphasize the case where $v(x)$ is represented by a neural network.

---

## The Problem

Consider a function $v: \mathbb{R} \to \mathbb{R}$.  
We are interested in numerically computing  

$$
\begin{align*}
\mathbb{E}[v(X)] = \int_{-\infty}^{\infty} v(x) f(x;\mu,\sigma) dx
\end{align*}
$$

where $f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ is the pdf of a normal distribution $\mathcal{N}(\mu,\sigma^2)$.

---

## Gauss–Hermite quadrature

This method approximates the following integral by calculating a weighted sum of the function values at specific points, called **nodes**:

$$
\begin{align*}
\int_{-\infty}^{\infty} h(z) e^{-z^2} \, dz \;\approx\; \sum_{i=1}^n w_i \, h(z_i)
\end{align*}
$$

- $n$: the number of nodes  
- $w_i$: the quadrature weights  
- $z_i$: the nodes, which are the roots of the *physicists’ version* of the [Hermite polynomial](https://en.wikipedia.org/wiki/Hermite_polynomials) of degree $n$.  
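
As a quick sanity check of the rule above, the sketch below pulls nodes and weights from `np.polynomial.hermite.hermgauss` (NumPy's routine for the physicists' version) and applies them to two integrals with known values: $\int e^{-z^2} dz = \sqrt{\pi}$ and $\int z^2 e^{-z^2} dz = \sqrt{\pi}/2$. The choice $n = 5$ is arbitrary and only for illustration.

```python
import numpy as np

n = 5  # illustrative number of nodes
nodes, weights = np.polynomial.hermite.hermgauss(n)

# h(z) = 1: the weights must add up to the integral of exp(-z^2), i.e. sqrt(pi)
integral_one = np.sum(weights)

# h(z) = z^2: the exact value of the integral is sqrt(pi) / 2
integral_z2 = np.sum(weights * nodes**2)

print("sum of weights:", integral_one, " sqrt(pi):", np.sqrt(np.pi))
print("z^2 integral:  ", integral_z2, " sqrt(pi)/2:", np.sqrt(np.pi) / 2)
```

The nodes come out symmetric around zero, which is why odd functions integrate to zero under this rule.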

---

**Important note:** Our original goal is to calculate  

$$
\begin{align*}
\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} v(x)\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx,
\end{align*}
$$  

which involves a normal distribution. The Gauss–Hermite formula above looks slightly different because it is tailored to integrals of the form  

$$
\begin{align*}
\int_{-\infty}^{\infty} h(z) e^{-z^2} dz.
\end{align*}
$$  

We will reconcile the two expressions by applying an appropriate change of variables.



## Going from $\int_{-\infty}^{\infty} h(z) e^{-z^2} \, dz$ to $\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} v(x) e^{-\tfrac{(x-\mu)^2}{2\sigma^2}} \, dx$

Consider the change of variable  
$$
z = \frac{x - \mu}{\sqrt{2}\sigma}.
$$  

Then  
$$
x = \mu + \sqrt{2}\sigma z, \qquad dx = \sqrt{2}\sigma \, dz.
$$  

With this substitution, we can write  
$$
\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} v(x) 
e^{-\tfrac{(x-\mu)^2}{2\sigma^2}} \, dx
= \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} 
v(\mu + \sqrt{2}\sigma z) e^{-z^2} \, dz.
$$  
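
To spell out the substitution (assuming $\sigma > 0$, so the limits of integration are unchanged), the exponent and the normalizing constant transform as

$$
\frac{(x-\mu)^2}{2\sigma^2} = \frac{(\sqrt{2}\sigma z)^2}{2\sigma^2} = z^2,
\qquad
\frac{1}{\sqrt{2\pi\sigma^2}} \, dx = \frac{\sqrt{2}\sigma}{\sqrt{2\pi}\,\sigma} \, dz = \frac{1}{\sqrt{\pi}} \, dz.
$$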

Define  
$$
\tilde{v}(z) \equiv \frac{1}{\sqrt{\pi}} v(\mu + \sqrt{2}\sigma z).
$$  

Then  
$$
\mathbb{E}[v(X)] 
= \int_{-\infty}^{\infty} \tilde{v}(z) e^{-z^2} \, dz
\approx \frac{1}{\sqrt{\pi}} \sum_{i=1}^n w_i \, v(\mu + \sqrt{2}\sigma z_i),
$$  
where the approximation in the last step follows from Gauss–Hermite quadrature.
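
One way to check the change of variables numerically is to use a smooth function with a known expectation. The sketch below uses $v(x) = e^x$, for which $\mathbb{E}[e^X] = e^{\mu + \sigma^2/2}$ when $X \sim \mathcal{N}(\mu,\sigma^2)$. The parameter values are arbitrary illustrations and not taken from the examples in this notebook.

```python
import numpy as np

mu, sigma, n = 0.3, 0.7, 10  # illustrative values

# Physicists' Gauss-Hermite nodes z_i and weights w_i
z, w = np.polynomial.hermite.hermgauss(n)

# Map the nodes to the N(mu, sigma^2) scale: x_i = mu + sqrt(2) * sigma * z_i
x = mu + np.sqrt(2.0) * sigma * z

# E[v(X)] ~ (1 / sqrt(pi)) * sum_i w_i * v(x_i), here with v = exp
approx = np.sum(w * np.exp(x)) / np.sqrt(np.pi)
exact = np.exp(mu + 0.5 * sigma**2)

print("approx:", approx)
print("exact: ", exact)
```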

---

Let's implement this in Python:


# importing packages we need

In [1]:
import matplotlib.pyplot as plt
import numpy as np

In [2]:
def Ev(v,μ,σ,n):
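    # v: function, μ: mean, σ: standard deviation, n: number of nodes
    # nodes and weights of the physicists' Gauss–Hermite rule
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    # change of variables: x_i = μ + sqrt(2) σ z_i
    nodes_scaled = μ + (nodes*σ*np.sqrt(2))
    # the 1/sqrt(π) factor from the normal pdf
    weights_scaled = weights/np.sqrt(np.pi)
    function_values = v(nodes_scaled)
    approx_mean = np.sum(function_values*weights_scaled)
    return approx_mean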

## Simple examples

### Example 1: Linear functions

Consider the function  
$$
v(x) = a x + b.
$$  

Suppose $X \sim \mathcal{N}(\mu,\sigma^2)$. We want to compute the expectation $\mathbb{E}[v(X)]$. 

The closed-form solution is  
$$
\mathbb{E}[v(X)] = a \mu + b.
$$

In [3]:
def v_1(x):
    a = 2
    b = 1
    return  a*x+b

def true_mean_v_1(a,b,μ,σ):
    true_mean = a*μ + b
    print("True mean =", true_mean)

In [4]:
print("Approximate mean =", Ev(v = v_1, μ = 1, σ = 2, n = 3))

Approximate mean = 3.0


In [5]:
true_mean_v_1(a = 2,b = 1, μ = 1, σ = 2)

True mean = 3


### Example 2: Quadratic Functions

Consider the function  
$$
v(x) = a x^2 + bx + c
$$  

We want to compute the expectation $\mathbb{E}[v(X)]$. 

The closed-form solution is  
$$
\mathbb{E}[v(X)] = a(\mu^2+\sigma^2)+ b\mu+c.
$$




In [6]:
def v_2(x):
    a = 1.1
    b = -0.1
    c = 0.8
    return (a*x**2)+(b*x)+c

def true_mean_v_2(a, b, c, μ, σ):
    true_mean =  a*(σ**2 + μ**2) + b*μ + c
    print("True mean =", true_mean)

In [7]:
print("Approximate mean =", Ev(v = v_2, μ = 0.2, σ = 0.5, n = 3))

Approximate mean = 1.0990000000000002


In [8]:
true_mean_v_2(a = 1.1, b = -0.1, c = 0.8, μ = 0.2, σ = 0.5)

True mean = 1.0990000000000002
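
The exact agreement in Examples 1 and 2 is expected: an $n$-node Gauss quadrature rule integrates polynomials of degree up to $2n-1$ exactly, so $n = 3$ is exact up to degree 5. The sketch below illustrates this with the moments of $X \sim \mathcal{N}(0,\sigma^2)$, using $\mathbb{E}[X^4] = 3\sigma^4$ and $\mathbb{E}[X^6] = 15\sigma^6$. The helper `gh_moment` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def gh_moment(p, sigma, n):
    """Gauss-Hermite approximation of E[X^p] for X ~ N(0, sigma^2)."""
    z, w = np.polynomial.hermite.hermgauss(n)
    x = np.sqrt(2.0) * sigma * z  # nodes on the N(0, sigma^2) scale
    return np.sum(w * x**p) / np.sqrt(np.pi)

sigma = 1.0
# Degree 4 <= 2*3 - 1 = 5: three nodes are exact
print("E[X^4], n=3:", gh_moment(4, sigma, 3), " exact:", 3 * sigma**4)
# Degree 6 > 5: three nodes are no longer exact
print("E[X^6], n=3:", gh_moment(6, sigma, 3), " exact:", 15 * sigma**6)
# Degree 6 <= 2*4 - 1 = 7: four nodes are exact again
print("E[X^6], n=4:", gh_moment(6, sigma, 4), " exact:", 15 * sigma**6)
```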


### Example 3: A function with a kink  
Let's try something **more challenging**. 

Consider the function $v(x) = \max\{0, x\}$ and let $X \sim \mathcal{N}(0,\sigma^2)$, that is, $\mu = 0$. 

The closed-form solution is  
$$
\mathbb{E}[v(X)] = \frac{\sigma}{\sqrt{2\pi}}.
$$
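
For reference, this closed form follows from a direct integration over the positive half-line, using the substitution $u = \frac{x^2}{2\sigma^2}$, so that $du = \frac{x}{\sigma^2}\,dx$:

$$
\mathbb{E}[\max\{0,X\}]
= \int_{0}^{\infty} \frac{x}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{x^2}{2\sigma^2}} \, dx
= \frac{\sigma^2}{\sqrt{2\pi}\,\sigma} \int_{0}^{\infty} e^{-u} \, du
= \frac{\sigma}{\sqrt{2\pi}}.
$$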

In [11]:
def v_3(x):
    return np.maximum(0.0, x)

def true_mean_v_3(σ):
    true_mean =  σ/np.sqrt(2*np.pi)
    print("True mean =", true_mean)

In [10]:
print("Approximate mean =", Ev(v = v_3, μ = 0, σ = 2, n = 3))

Approximate mean = 0.5773502691896258


In [13]:
true_mean_v_3(σ = 2)

True mean = 0.7978845608028654


With nonsmooth functions, such as those with a kink, Gauss–Hermite quadrature performs poorly when using only a few nodes.

In general, the performance of quadrature methods depends on the smoothness of the function. For more details, see [Art Owen's lecture notes](https://artowen.su.domains/mc/Ch-quadrature.pdf).

Let's increase the number of nodes:

In [32]:
print("Approximate mean =", Ev(v = v_3, μ = 0, σ = 2, n = 300))

Approximate mean = 0.7989796104552142
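
To see how the error behaves between these two extremes, the sketch below tabulates the absolute error for the kinked function over a few node counts. It is self-contained, so it re-implements the quadrature step under the hypothetical name `gh_expectation`; the node counts are arbitrary illustrations.

```python
import numpy as np

def gh_expectation(v, mu, sigma, n):
    """Gauss-Hermite approximation of E[v(X)] for X ~ N(mu, sigma^2)."""
    z, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * v(mu + np.sqrt(2.0) * sigma * z)) / np.sqrt(np.pi)

def kink(x):
    return np.maximum(0.0, x)

sigma = 2.0
exact = sigma / np.sqrt(2.0 * np.pi)  # closed form for mu = 0

errors = {}
for n in (3, 10, 30, 100):
    errors[n] = abs(gh_expectation(kink, 0.0, sigma, n) - exact)
    print(f"n = {n:3d}   abs. error = {errors[n]:.2e}")
```

The error shrinks as nodes are added, but far more slowly than for the smooth examples, where three nodes were already exact.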


## Implementing it for a neural network
From a theoretical perspective, there is nothing special about neural networks: a network is just a *function*, so the approach used above should carry over directly.

The difficulty is computational. The data points usually come in a batch, and the computation has to be broadcast across it.



I have an application of this sort in mind. Consider the following example:  

We want to compute the expectation  

$$
\mathbb{E}[v(K, x') \mid x],
$$  

where $K \in \mathbb{R}^k$ denotes other inputs to $v$, which we treat as fixed, and $x'$ evolves according to  

$$
x' = \rho x + \epsilon, \quad \epsilon \sim \mathcal{N}(0,\sigma^2).
$$  

Therefore, conditional on $x$, we have  

$$
x' \mid x \sim \mathcal{N}(\rho x, \sigma^2).
$$  




When training a neural network, a `batch` of size $B$ looks like this:

$$
\begin{align*}
\text{batch} =
\begin{bmatrix}
K_{11} & K_{21} & \cdots & K_{k1} & x_1\\
K_{12} & K_{22} & \cdots & K_{k2} & x_2\\
\vdots  & \vdots & \ddots & \vdots & \vdots\\
K_{1B} & K_{2B} & \cdots & K_{kB} & x_B
\end{bmatrix}
\end{align*}
$$
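
One possible way to handle such a batch is sketched below in plain NumPy. This is only an illustration under stated assumptions, not the notebook's final implementation: `batched_expectation` and `toy_v` are hypothetical names, and `toy_v` stands in for a neural network that maps an array of shape `(M, k + 1)` to `M` outputs. The idea is to build the $n$ next-period nodes for every row, evaluate $v$ once on all $B \cdot n$ rows, and then take the weighted sum over the node dimension.

```python
import numpy as np

def batched_expectation(v, batch, rho, sigma, n):
    """Approximate E[v(K, x') | x] for each row, with x' | x ~ N(rho * x, sigma^2).

    batch: shape (B, k + 1); the first k columns hold K, the last column holds x.
    v:     maps an array of shape (M, k + 1) to an array of shape (M,).
    """
    z, w = np.polynomial.hermite.hermgauss(n)
    K, x = batch[:, :-1], batch[:, -1]
    B, k = K.shape

    # Next-period nodes for every row: shape (B, n)
    x_next = rho * x[:, None] + np.sqrt(2.0) * sigma * z[None, :]

    # Repeat each K row across the n nodes and append the nodes: shape (B, n, k + 1)
    K_rep = np.broadcast_to(K[:, None, :], (B, n, k))
    inputs = np.concatenate([K_rep, x_next[:, :, None]], axis=2)

    # A single call to v on the flattened (B * n, k + 1) array, reshaped back to (B, n)
    values = v(inputs.reshape(B * n, k + 1)).reshape(B, n)

    # Weighted sum over the node dimension: shape (B,)
    return values @ w / np.sqrt(np.pi)

def toy_v(inputs):
    # Hypothetical stand-in for a network: sum of the K columns plus x'^2
    return inputs[:, :-1].sum(axis=1) + inputs[:, -1] ** 2

rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 3))  # B = 4 rows, k = 2 columns of K, plus x
rho, sigma = 0.9, 0.1            # illustrative values

approx = batched_expectation(toy_v, batch, rho, sigma, n=5)
# Closed form for toy_v: sum(K) + (rho * x)^2 + sigma^2
exact = batch[:, :-1].sum(axis=1) + (rho * batch[:, -1]) ** 2 + sigma**2
print("max abs. error:", np.max(np.abs(approx - exact)))
```

Flattening to `(B * n, k + 1)` keeps the function call identical to an ordinary forward pass on a larger batch, which is usually the shape a network expects.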