# Lecture 16 - Uncertainty Propagation: Polynomial Chaos I

## Objectives

+ Introduce quadrature rules in 1D and in particular nested quadrature rules.
+ Expand quadrature rules in multiple dimensions using sparse grids.
+ Solve stochastic dynamical systems using intrusive polynomial chaos.

## Readings

+ These notes.
+ Sullivan, Chapter 9.


In [None]:
import numpy as np
import math
import scipy.stats as st
import scipy
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
mpl.rcParams['figure.dpi'] = 300
import seaborn as sns
sns.set_style('white')
sns.set_context('talk')
import design
import warnings
warnings.filterwarnings('ignore')
import orthpol  # This is the package we will use to construct orthogonal polynomials

## Quadrature Rules

### Simple Quadrature
Consider the problem of evaluating an integral:
$$
I = \int_a^b f(x) dx,
$$
where $a < b$.
A *quadrature rule* is an approximation to that integral of the form:
$$
Q(f) = \sum_{k=1}^nw_kf(x_k).
$$
The $x_k$'s are the *nodes* of the rule and the $w_k$'s are the *weights* of the rule.

### Newton-Cotes rule (don't use it)
This rule is constructed in three steps:

+ Pick equidistant points in $[a,b]$:
$$
x_k = a + h k,
$$
for $k=0,\dots,n+1$, where $h = \frac{b-a}{n+1}$.

+ Approximate $f$ using the [Lagrange polynomials](https://en.wikipedia.org/wiki/Lagrange_polynomial):
$$
f(x) \approx \sum_{k=1}^n f(x_k)\ell_k(x).
$$

+ Approximate the integral by:
$$
Q_{nc}(f) = \sum_{k=1}^n \int_a^b\ell_k(x)dx \cdot f(x_k),
$$
i.e.,
$$
w_k = \int_a^b\ell_k(x)dx.
$$

In other words, the Newton-Cotes rule approximates the integral of $f$ by the integral of the Lagrange polynomial that interpolates $f$ at these points.
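
A minimal sketch of the construction above, assuming three equidistant nodes on $[-1, 1]$ with the endpoints included; the helper name `newton_cotes_weights` is hypothetical. It integrates each Lagrange basis polynomial to obtain the weights, which for three nodes should reproduce Simpson's rule.

```python
import numpy as np
from scipy.interpolate import lagrange

def newton_cotes_weights(nodes, a, b):
    """Weights w_k = int_a^b ell_k(x) dx for the Lagrange basis on `nodes`."""
    weights = []
    for k in range(len(nodes)):
        e_k = np.zeros(len(nodes))
        e_k[k] = 1.0                      # ell_k interpolates the k-th unit vector
        ell_k = lagrange(nodes, e_k)      # returns a numpy.poly1d
        antiderivative = np.polyint(ell_k)
        weights.append(antiderivative(b) - antiderivative(a))
    return np.array(weights)

nodes = np.linspace(-1.0, 1.0, 3)
w_nc = newton_cotes_weights(nodes, -1.0, 1.0)

f = lambda x: np.cos(3.0 * x)
Q_nc = np.sum(w_nc * f(nodes))            # quadrature estimate
I_exact = 2.0 * np.sin(3.0) / 3.0         # closed-form value of the integral
print(w_nc, Q_nc, I_exact)
```

Comparing `Q_nc` with `I_exact` for a few values of the number of nodes gives a feel for how slowly (or erratically) the rule improves.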

In [None]:
# The function we will integrate
f = lambda x: np.cos(x * 3)

# Pick Newton-Cotes quadrature points
nq = 3
X = np.linspace(-1, 1, nq)

# Get the Lagrange interpolating polynomial
Lf = scipy.interpolate.lagrange(X, f(X))

# Visualize the actual function and the Lagrange interpolating polynomial
fig, ax = plt.subplots()
x = np.linspace(-1, 1, 200)

ax.plot(x, f(x))
ax.fill_between(x, np.zeros(x.shape), f(x), alpha=0.25)
ax.plot(x, Lf(x))
ax.fill_between(x, np.zeros(x.shape), Lf(x), alpha=0.25)
ax.plot(X, f(X), '.')
ax.set_xlabel('$x$')
ax.set_ylabel('$f(x)$')
ax.set_title('$n=%d$' % nq)
plt.legend(['$f(x)$', '$L_f(x)$'], loc='best');

### Questions

+ Change the function above to $f(x) = H(x)$ (step function) and see that the Newton-Cotes rule has trouble approximating the integral no matter how many quadrature points you use.

### Gaussian quadrature (don't use it)

This is just like the Newton-Cotes rule, but instead of equidistant nodes it uses nodes that are the zeros of the $n$-degree orthogonal polynomial with weight $w(x) = 1$.

**Note:** You can generalize the Gaussian quadrature rule, for other weights.

**Note:** Gaussian quadrature with $n$ nodes integrates polynomials of degree up to $2n - 1$ exactly, and no other quadrature rule with $n$ nodes can do better.
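
A quick numerical check of the degree of exactness, using NumPy's `leggauss` for the Gauss-Legendre nodes and weights on $[-1, 1]$. The helper `exact_monomial` is a hypothetical name. With $n = 3$ nodes the monomials up to degree $5$ are integrated to round-off, while degree $6$ is not.

```python
import numpy as np

n = 3
x, w = np.polynomial.legendre.leggauss(n)   # n-point Gauss-Legendre rule on [-1, 1]

def exact_monomial(p):
    """Exact value of int_{-1}^{1} x^p dx."""
    return 0.0 if p % 2 else 2.0 / (p + 1)

# Absolute quadrature error for each monomial x^p, p = 0, ..., 2n
errors = {p: abs(np.sum(w * x ** p) - exact_monomial(p)) for p in range(2 * n + 1)}
for p, err in errors.items():
    print('degree %d: error = %.2e' % (p, err))
```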

In [None]:
# The function we will integrate
f = lambda x: np.cos(x * 3)


# Pick the number of Gaussian quadrature points
nq = 2
# Get the roots of the nq - 1 degree polynomial with w[x] = 1 in [-1, 1] (Legendre)
Xs, ws = scipy.special.roots_legendre(nq-1)
# Get the roots of the nq degree polynomial
X, w = scipy.special.roots_legendre(nq)

# Get the Lagrange interpolating polynomial
Lf = scipy.interpolate.lagrange(X, f(X))

# Visualize the actual function and the Lagrange interpolating polynomial
fig, ax = plt.subplots()
x = np.linspace(-1, 1, 200)

ax.plot(x, f(x))
ax.fill_between(x, np.zeros(x.shape), f(x), alpha=0.25)
ax.plot(x, Lf(x))
ax.fill_between(x, np.zeros(x.shape), Lf(x), alpha=0.25)
ax.plot(X, f(X), '.')
#ax.plot(Xs, f(Xs), '.')
ax.set_xlabel('$x$')
ax.set_title('$n=%d$' % nq)
ax.set_ylabel('$f(x)$')
plt.legend(['$f(x)$', '$L_f(x)$'], loc='best');

### Questions

+ Notice that the Gaussian quadrature nodes are not nested. Every time you increase the number of quadrature points, you get completely different nodes. This means that you cannot reuse the function evaluations you have seen so far.
+ Change the function above to $f(x) = H(x)$ (step function) and see that the Gaussian quadrature rule has less trouble approximating the integral than the Newton-Cotes rule as you increase the number of quadrature points.
+ Try $f(x) = \cos(10 x) * \exp\{-(10x)^2/2\}$ to see that some problems do persist.

### Clenshaw-Curtis Quadrature

We are looking for a quadrature rule with nested nodes. As we increase the number of points, we would like to be able to reuse the function evaluations.

The derivation is quite involved, but it goes like this.
Let us look at the case $a=-1, b=1$.
First, we transform the integral by setting $x = \cos\theta$:
$$
\int_{-1}^1 f(x)dx = \int_0^\pi f(\cos\theta)\sin(\theta)d\theta.
$$
Then, expand the $f(\cos\theta)$ in cosine series:
$$
f(\cos\theta) = \frac{a_0}{2} + \sum_{k=1}^\infty a_k \cos(k\theta),
$$
where
$$
a_k = \frac{2}{\pi}\int_0^\pi f(\cos\theta)\cos(k\theta)d\theta.
$$
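From this, since $\int_0^\pi\cos(k\theta)\sin\theta d\theta$ equals $\frac{2}{1-k^2}$ for even $k$ and vanishes for odd $k$, only the even coefficients survive:
$$
\int_0^\pi f(\cos\theta)\sin\theta d\theta = a_0 + \sum_{k=1}^\infty\frac{2a_{2k}}{1 - (2k)^2}.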
$$
Now, using the [Nyquist-Shannon sampling theorem](https://en.wikipedia.org/wiki/Nyquist–Shannon_sampling_theorem) from signal processing, we see that we can evaluate the coefficients $a_k$ for $k\le n$ **exactly** if we evaluate $f(\cos\theta)$ at $n+1$ equidistant nodes $\theta_j = \frac{j\pi}{n}, j=0,\dots,n$. It is:
$$
a_k = \frac{2}{n}\left((-1)^k\frac{f(-1)}{2} + \frac{f(1)}{2} + \sum_{j=1}^{n-1}f\left(\cos\frac{j\pi}{n}\right)\cos\frac{kj\pi}{n}\right).
$$
So, we see that the nodes are:
$$
x_j = \cos\frac{j\pi}{n},
$$
$j=0,1,\dots,n$ and that they are indeed nested: doubling $n$ keeps all the old nodes.
To get the weights, you re-arrange terms and identify the $w_j$'s.

**Note:** Clenshaw-Curtis quadrature with the $n+1$ nodes above integrates polynomials of degree up to $n$ exactly (and up to $n+1$ when $n$ is even, by symmetry).

**Note:** We will only be using the Clenshaw-Curtis quadrature from now on.
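
A minimal sketch of the resulting rule, assuming the standard closed-form Clenshaw-Curtis weights (the re-arrangement itself is not derived here). The helper `clenshaw_curtis` is a hypothetical name and is independent of `design.sparse_grid`, whose level convention may differ. It checks the weights for $n=4$, the nesting of the $n=2$ nodes inside the $n=4$ nodes, and exactness on $x^4$.

```python
import numpy as np

def clenshaw_curtis(n):
    """Nodes x_j = cos(j*pi/n), j = 0..n, and weights on [-1, 1] (n >= 1)."""
    j = np.arange(n + 1)
    x = np.cos(j * np.pi / n)
    c = np.where((j == 0) | (j == n), 1.0, 2.0)    # endpoint nodes get half weight
    s = np.ones(n + 1)
    for k in range(1, n // 2 + 1):
        b = 1.0 if 2 * k == n else 2.0             # last cosine term is halved
        s -= b / (4.0 * k ** 2 - 1.0) * np.cos(2.0 * k * j * np.pi / n)
    return x, c / n * s

x2, w2 = clenshaw_curtis(2)
x4, w4 = clenshaw_curtis(4)

# Every node of the coarse rule should reappear in the fine rule
nested = all(np.isclose(x4, xj).any() for xj in x2)
Q_x4 = np.sum(w4 * x4 ** 4)    # should match int_{-1}^{1} x^4 dx = 2/5
print(w4, nested, Q_x4)
```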

In [None]:
# The function we will integrate
#f = lambda x: np.cos(x * 5)

# Plot the Clenshaw-Curtis nodes for several levels to see that they are nested
fig, ax = plt.subplots()
for l in [4, 3, 2, 1]:
    X, w = design.sparse_grid(1, l, rule='CC') # CC = Clenshaw-Curtis
    ax.plot(X, np.zeros(X.shape[0]), '.')
ax.set_xlabel('$x$')
ax.set_ylabel('$f(x)$')

### Example: Verify that the Legendre Polynomials are Orthogonal

Let $X\sim\mathcal{U}(-1, 1)$. The orthogonal polynomials in this case are known as the [Legendre polynomials](https://en.wikipedia.org/wiki/Legendre_polynomials).
They are known analytically.
The first few are:
$$
\begin{array}{ccc}
\phi_1(x) &=& 1,\\
\phi_2(x) &=& x,\\
\phi_3(x) &=& \frac{1}{2}\left(3x^2 -1\right).
\end{array}
$$

We constructed them with ``orthpol`` in the previous handout.
Let's verify now that they are also orthogonal by using a CC quadrature rule.

In [None]:
# Now, instead of the random variable, let's use the pdf p(x) directly
p = lambda x: 0.5
# The maximum polynomial degree you want
degree = 4
# Construct the orthogonal polynomials
Phi_set = orthpol.OrthogonalPolynomial(degree,
                                       wf=p,    # The weight function (or pdf)
                                       left=-1, # The left bound
                                       right=1  # The right bound
                                       )

# Plot them
fig, ax = plt.subplots()
# Evaluate the orthogonal polynomials on all these x's
x = np.linspace(-1, 1, 200)
phi_x = Phi_set(x)    # 200 x (degree + 1)
# Plot each one of them
ax.plot(x, phi_x);
ax.set_xlabel('$x$')
ax.set_ylabel('$\phi_i(x)$')
ax.set_title('$X\sim\mathcal{U}(-1, 1)$: Legendre Polynomials')
plt.legend([r'$\phi_{%d}(x)$' % i for i in range(1, degree + 2)], loc='best');  # one label per column of phi_x

In [None]:
X, w = design.sparse_grid(1, 5, 'CC') # w(x) = 1 and x in [-1, 1] for this one
w = w / 2.  # We need to normalize the weights
phi_q = Phi_set(X)
for i in range(phi_q.shape[1]):
    for j in range(i, phi_q.shape[1]):
        print('<%d, %d> \t= %1.3f' % (i, j, np.sum(w * phi_q[:, i] * phi_q[:, j])))

### Evaluating Arbitrary Expectations with the CC Quadrature Rule

The CC rule can only evaluate integrals of the form $\int_{-1}^1f(x)dx$.
Let's see how we can extend it to the evaluation of arbitrary expectations of the form:
$$
\mathbb{E}[f(X)]:=\int_{-\infty}^\infty f(x)p(x)dx.
$$
Let $F(x)$ be the CDF of $p(x)$ and define the transformation:
$$
z = 2F(x) - 1.
$$
The inverse, of course, is:
$$
x = F^{-1}\left(\frac{z+1}{2}\right).
$$
Notice that $z\in [-1,1]$.
We also have that:
$$
dz = 2F'(x)dx = 2p(x)dx
$$
and that as $x\rightarrow \pm \infty$ we get that $z\rightarrow\pm 1$.
Therefore, we can rewrite the expectation as:
$$
\mathbb{E}[f(X)] = \frac{1}{2}\int_{-1}^1 f\left(F^{-1}\left(\frac{z+1}{2}\right)\right)dz
$$
Now, if $z_k$ and $v_k$, $k=1,\dots,n$ are nodes and weights for the common CC rule, we get the following quadrature rule for our special case:
$$
w_k = \frac{1}{2}v_k,
$$
and
$$
x_k = F^{-1}\left(\frac{z_k+1}{2}\right).
$$
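
A minimal sketch of this change of variables. As an assumption for illustration, it uses NumPy's Gauss-Legendre rule as a stand-in open rule on $(-1, 1)$ instead of a rule from `design`, because an open rule avoids evaluating $F^{-1}$ at $0$ and $1$. The helper `expectation` is a hypothetical name.

```python
import numpy as np
import scipy.stats as st

def expectation(f, dist, n=200):
    """Approximate E[f(X)] for X ~ dist via x_k = F^{-1}((z_k + 1) / 2), w_k = v_k / 2."""
    z, v = np.polynomial.legendre.leggauss(n)   # stand-in open rule on (-1, 1)
    x = dist.ppf(0.5 * (z + 1.0))               # map the nodes through the inverse CDF
    w = 0.5 * v                                 # normalize so the weights sum to one
    return np.sum(w * f(x))

second_moment = expectation(lambda x: x ** 2, st.norm())   # exact value is 1
mean_exp = expectation(lambda x: x, st.expon())            # exact value is 1
print(second_moment, mean_exp)
```

The transformed integrand is unbounded near $z=\pm 1$ for distributions with unbounded support, so convergence is slower than for smooth integrands on $[-1, 1]$.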

Let's try it out by testing the Hermite polynomials.

### Example 1: The Standard Normal and the Hermite Polynomials

Let $X\sim\mathcal{N}(0,1)$. The orthogonal polynomials in this case are known as the [Hermite polynomials](https://en.wikipedia.org/wiki/Hermite_polynomials).
They are known analytically.
The first few are:
$$
\begin{array}{ccc}
\phi_1(x) &=& 1,\\
\phi_2(x) &=& x,\\
\phi_3(x) &=& x^2 - 1,\\
\phi_4(x) &=& x^3 - 3x,\\
\phi_5(x) &=& x^4 - 6x^2 + 3.
\end{array}
$$

In [None]:
# The random variable you wish to consider
X = st.norm()
# The maximum polynomial degree you want
degree = 3
# Construct the orthogonal polynomials
Phi_set = orthpol.OrthogonalPolynomial(degree, X, ncap=1000)

# Plot the probability density
fig, ax = plt.subplots()
x = np.linspace(-3, 3, 200)
ax.plot(x, X.pdf(x))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')

# Plot them
fig, ax = plt.subplots()
# Evaluate the orthogonal polynomials on all these x's
phi_x = Phi_set(x)    # 200 x (degree + 1)
# Plot each one of them
ax.plot(x, phi_x);
ax.set_xlabel('$x$')
ax.set_ylabel('$\phi_i(x)$')
ax.set_title('$X\sim\mathcal{N}(0,1)$: Hermite Polynomials')
plt.legend(['$\phi_{%d}(x)$' % i for i in range(0, degree + 1)], loc='best');

In [None]:
Z, v = design.sparse_grid(1, 8, 'F1') # We do not use CC because it is closed (includes -1, 1) - Fejer 1 is open
X = st.norm.ppf(0.5 * (Z + 1.))
w = v / 2
phi_q = Phi_set(X)
for i in range(phi_q.shape[1]):
    for j in range(i, phi_q.shape[1]):
        print('<%d, %d> \t= %1.3f' % (i, j, np.sum(w * phi_q[:, i] * phi_q[:, j])))

### Example 2: The Exponential and the Laguerre Polynomials

Let $X\sim\mathcal{E}(1)$. The orthogonal polynomials in this case are known as the [Laguerre polynomials](https://en.wikipedia.org/wiki/Laguerre_polynomials).
They are known analytically.
The first few are:
$$
\begin{array}{ccc}
\phi_1(x) &=& 1,\\
\phi_2(x) &=& -x + 1,\\
\phi_3(x) &=& \frac{1}{2}\left(x^2 - 4x + 2\right).
\end{array}
$$

In [None]:
# The random variable you wish to consider
X = st.expon()
# The maximum polynomial degree you want
degree = 3
# Construct the orthogonal polynomials
Phi_set = orthpol.OrthogonalPolynomial(degree, X)

# Plot the probability density
fig, ax = plt.subplots()
x = np.linspace(0, 5, 200)
ax.plot(x, X.pdf(x))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')

# Plot them
fig, ax = plt.subplots()
# Evaluate the orthogonal polynomials on all these x's
phi_x = Phi_set(x)    # 200 x (degree + 1)
# Plot each one of them
ax.plot(x, phi_x);
ax.set_xlabel('$x$')
ax.set_ylabel('$\phi_i(x)$')
ax.set_title('$X\sim\mathcal{E}(1)$: Laguerre Polynomials')
plt.legend([r'$\phi_{%d}(x)$' % i for i in range(1, degree + 2)], loc='best');  # one label per column of phi_x

In [None]:
Z, v = design.sparse_grid(1, 9, 'F1') # Again F1 instead of CC
w = v / 2.
X = st.expon.ppf(0.5 * (Z + 1.))
phi_q = Phi_set(X)
for i in range(phi_q.shape[1]):
    for j in range(i, phi_q.shape[1]):
        print('<%d, %d> \t= %1.3f' % (i, j, np.sum(w * phi_q[:, i] * phi_q[:, j])))

### Example 4: We can do it for any probability density

We can construct orthonormal polynomials for any random variable $X$.
Let's do it for a mixture of Gaussians:
$$
p(x) = \pi_1 \mathcal{N}(x|\mu_1,\sigma_1^2) + \pi_2\mathcal{N}(x|\mu_2,\sigma_2^2),
$$
for $\pi_1 + \pi_2 = 1$.

In [None]:
# The random variable you wish to consider
X1 = st.norm(loc=-1, scale=0.4)
pi_1 = 0.2
X2 = st.norm(loc=+1, scale=0.4)
pi_2 = 0.8

p = lambda x: pi_1 * X1.pdf(x) + pi_2 * X2.pdf(x)

class MGRV(st.rv_continuous):
    
    def _pdf(self, x):
        return p(x)

mgrv = MGRV()
    
# The maximum polynomial degree you want
degree = 5
# Construct the orthogonal polynomials
Phi_set = orthpol.OrthogonalPolynomial(degree, wf=mgrv.pdf, left=-np.inf, right=np.inf, ncap=5000)
Phi_set.normalize()

# Plot the probability density
fig, ax = plt.subplots()
x = np.linspace(-2, 2, 200)
ax.plot(x, mgrv.pdf(x))
ax.set_xlabel('$x$')
ax.set_ylabel('$p(x)$')

# Plot them
fig, ax = plt.subplots()
# Evaluate the orthogonal polynomials on all these x's
phi_x = Phi_set(x)    # 200 x (degree + 1)
# Plot each one of them
ax.plot(x, phi_x);
ax.set_xlabel('$x$')
ax.set_ylabel('$\phi_i(x)$')
ax.set_title(r'$X\sim\sum_i\pi_i\mathcal{N}(\mu_i,\sigma_i^2)$: Whatever Polynomials')
plt.legend(['$\phi_{%d}(x)$' % i for i in range(0, degree + 1)], loc='best');

In [None]:
# For this one we need a quadrature rule in (-infty, +infty) with w(x) = the pdf of the mixture
# Let's use a rule in (-1, 1) (the rule is open if the boundaries are not included) and transform it to (-infty, infty)
Z, v = design.sparse_grid(1, 5, 'F1') # Fejer 1, open, w(x) = 1 and x in (-1, 1)
X = mgrv.ppf(0.5 * (Z + 1))
w = v / 2.

#X = st.norm.ppf(0.5 * (Z + 1))
#w = v / (2. * st.norm.pdf(X)) * p(X)
plt.plot(X, np.zeros(X.shape[0], ), '.')
plt.plot(x, mgrv.pdf(x))

#help(st.norm.ppf)
#w = w / 2.  # We need to normalize the weights
phi_q = Phi_set(X)
for i in range(phi_q.shape[1]):
    for j in range(i, phi_q.shape[1]):
        print('<%d, %d> \t= %1.3f' % (i, j, np.sum(w * phi_q[:, i] * phi_q[:, j])))

### More 1D Quadrature Rules

There are many more quadrature rules with various properties.
Some of the ones you can use from ``py-design`` are described [here](http://people.sc.fsu.edu/~jburkardt/f_src/sparse_grid_mixed_dataset/sparse_grid_mixed_dataset.html):

In [None]:
help(design.sparse_grid)

## Quadrature Rules in High-Dimensions


### Tensor Products of Quadrature Rules
The simplest approach to create high-dimensional quadrature rules is to take the tensor product of 1D ones.
For example, suppose you have a quadrature rule in 1D:
$$
Q^{(1)}(f) = \sum_{k=1}^n w_k f(x_k).
$$
The *tensor* product of $Q^{(1)}$ with itself is the 2D quadrature rule:
$$
Q^{(2)} = Q^{(1)}\otimes Q^{(1)}
$$
defined by:
$$
Q^{(2)}(f) = \left(Q^{(1)}\otimes Q^{(1)}\right)(f) = \sum_{i=1}^n\sum_{j=1}^n w_i w_j f(x_i, x_j).
$$
The tensor product can be generalized between any two quadrature rules in arbitrary dimensions.

**Note:** The number of nodes in a tensor product rule grows exponentially with the dimensionality: with $n$ nodes per dimension, the $d$-dimensional rule has $n^d$ nodes (*curse of dimensionality*).
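
A minimal sketch of the 2D tensor product rule, including the product weights $w_iw_j$. As an assumption for illustration, the 1D rule is NumPy's 5-point Gauss-Legendre rule; the helper `tensor_rule_2d` is a hypothetical name.

```python
import numpy as np

def tensor_rule_2d(x1d, w1d):
    """Tensor product of a 1D rule with itself: nodes (x_i, x_j), weights w_i * w_j."""
    X1, X2 = np.meshgrid(x1d, x1d, indexing='ij')    # X1[i, j] = x_i, X2[i, j] = x_j
    nodes = np.column_stack([X1.ravel(), X2.ravel()])
    weights = np.outer(w1d, w1d).ravel()             # same (i, j) ordering as the nodes
    return nodes, weights

x1d, w1d = np.polynomial.legendre.leggauss(5)
nodes, weights = tensor_rule_2d(x1d, w1d)

f = lambda x: x[:, 0] ** 2 * x[:, 1] ** 2
Q2 = np.sum(weights * f(nodes))    # exact value over [-1, 1]^2 is 4/9
print(nodes.shape, Q2)
```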

In [None]:
level = 4
Z, v = design.sparse_grid(1, level, 'CC')  # 1D Clenshaw-Curtis rule

# Make the tensor rule
nq = Z.shape[0]
Z21, Z22 = np.meshgrid(Z, Z)
Z2 = np.hstack([Z21.flatten()[:, None], Z22.flatten()[:, None]])

# Plot it
fig, ax = plt.subplots()
ax.plot(Z2[:, 0], Z2[:, 1], '.')
ax.set_title('Tensor Product Rule')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$');

### Sparse Grid Quadrature

You can build a sparse grid (SG) quadrature out of any 1D quadrature rule $Q_\ell^{(1)}$ of level $\ell$.
The only restriction is that $Q_\ell^{(1)}$ must be nested, in the sense that all the nodes that are in $Q_{\ell}^{(1)}$ must be included in $Q_{\ell+1}^{(1)}$.
The whole point is to get a rule in which the number of nodes does not grow as fast as in a tensor product.

The construction of high-dimensional sparse grids is given by the *Smolyak quadrature formula*:
$$
Q_\ell^{(d)}(f) := \left(\sum_{i=1}^\ell\left(Q_i^{(1)} - Q_{i-1}^{(1)}\right)\otimes Q_{\ell-i+1}^{(d-1)}\right)(f).
$$
Here $Q_0^{(1)} = 0$ by convention. A detailed study of this formula is beyond our scope.
Let's just look at what sparse grids look like.
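
To make the node set concrete, here is a minimal sketch for $d=2$. It assumes nested Clenshaw-Curtis nodes with $1, 3, 5, 9, \dots$ points at levels $1, 2, 3, 4, \dots$ and a 1-based level $\ell$, which may differ from the convention of `design.sparse_grid`. For nested rules, expanding the Smolyak formula shows that the nodes are the union of the tensor grids $X_{i_1}\times X_{i_2}$ with $i_1 + i_2 \le \ell + 1$. The helper names are hypothetical, and only nodes (not weights) are computed.

```python
import numpy as np
from itertools import product

def cc_nodes(i):
    """Nested 1D Clenshaw-Curtis nodes at level i >= 1 (1, 3, 5, 9, ... points)."""
    if i == 1:
        return np.array([0.0])
    n = 2 ** (i - 1)
    return np.cos(np.arange(n + 1) * np.pi / n)

def sparse_nodes_2d(level):
    """Union of the grids X_{i1} x X_{i2} with i1 + i2 <= level + 1."""
    points = set()
    for i1 in range(1, level + 1):
        for i2 in range(1, level + 2 - i1):
            for p in product(cc_nodes(i1), cc_nodes(i2)):
                # Round so that nodes shared between levels are recognized as equal
                points.add((round(float(p[0]), 12), round(float(p[1]), 12)))
    return np.array(sorted(points))

sparse_counts = [len(sparse_nodes_2d(l)) for l in range(1, 5)]
tensor_counts = [len(cc_nodes(l)) ** 2 for l in range(1, 5)]
print(sparse_counts, tensor_counts)
```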

In [None]:
fig, ax = plt.subplots()
level = 6
count = 0
for i in range(level):
    Z, v = design.sparse_grid(2, i, 'CC')
    ax.plot(Z[count:, 0], Z[count:, 1], '.', color=sns.color_palette()[i], label='$L=%d$' % (i + 1))
    count = Z.shape[0]
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_title('Max level = %d' % level);
#plt.legend(loc='best');

Sparse grids grow much slower than tensor products. See below how the number of nodes grows as a function of the space dimensionality.

In [None]:
fig, ax = plt.subplots()
max_dim = 20
for level in range(1, 3):
    Z1, _ = design.sparse_grid(1, level, 'CC')
    sparse_grid_num_nodes = []
    tensor_num_nodes = []
    for d in range(1, max_dim + 1):
        Z, _ = design.sparse_grid(d, level, 'CC')
        sparse_grid_num_nodes.append(Z.shape[0])
        tensor_num_nodes.append(Z1.shape[0] ** d)
    ax.semilogy(range(1, max_dim + 1), sparse_grid_num_nodes, '-.', label='Sparse grid ($L=%d$)' % level)
    ax.semilogy(range(1, max_dim + 1), tensor_num_nodes, '--.', label='Tensor product ($L=%d$)' % level)
ax.set_xlabel('$d$')
ax.set_ylabel('Number of quadrature points')
plt.legend(loc='best');

### Example 5: Multidimensional Orthogonal Polynomials (Uniform Inputs)

Let's construct orthogonal polynomials for a random vector:
$$
X = (X_1,X_2),
$$
where
$$
X_1\sim\mathcal{U}(-1,1),
$$
and
$$
X_2\sim\mathcal{U}(-1, 1).
$$

In [None]:
X1 = st.uniform(loc=-1, scale=2)
X2 = st.uniform(loc=-1, scale=2)
X = (X1, X2)
dim = len(X)

# The maximum polynomial degree you want
degree = 4
# Construct the orthogonal polynomials - See the documentation for more ways to create
# multi-dimensional polynomials
Phi_set = orthpol.ProductBasis(X, degree=degree)

Z, v = design.sparse_grid(dim, 4, 'CC') # Clenshaw-Curtis which uses w(x) = 1 on [-1, 1]^dim - need to normalize the weights:
X = Z
w = v / (2 ** dim)

phi_q = Phi_set(Z)
for i in range(phi_q.shape[1]):
    for j in range(i, phi_q.shape[1]):
        print('<%d, %d>\t= %1.2f' % (i, j, np.sum(w * phi_q[:, i] * phi_q[:, j])))

### Example 6: Multidimensional Orthogonal Polynomials (Gaussian Inputs)

Let's construct orthogonal polynomials for a random vector:
$$
X = (X_1,X_2),
$$
where
$$
X_1\sim\mathcal{N}(0,1),
$$
and
$$
X_2\sim\mathcal{N}(0, 1).
$$

In [None]:
X1 = st.norm()
X2 = st.norm()
X = (X1, X2)
dim = len(X)

# The maximum polynomial degree you want
degree = 4
# Construct the orthogonal polynomials - See the documentation for more ways to create
# multi-dimensional polynomials
Phi_set = orthpol.ProductBasis(X, degree=degree)

Z, v = design.sparse_grid(2, 5, 'GH') # Gauss-Hermite which uses w(x) = e^{-x^T x} - need to scale:
X = Z * np.sqrt(2.)
w = v / np.sqrt(np.pi ** dim)

phi_q = Phi_set(X)
for i in range(phi_q.shape[1]):
    for j in range(i, phi_q.shape[1]):
        print('<%d, %d>\t= %1.2f' % (i, j, np.sum(w * phi_q[:, i] * phi_q[:, j])))

## Intrusive Uncertainty Propagation Methods: Dynamical Systems

Let $X$ be a random vector with probability density $p(x)$.
Let $\phi_1,\phi_2,\dots$ be an orthonormal basis with respect to $p(x)$.
Consider the $m$-dimensional stochastic dynamical system:
$$
\frac{dy}{dt} = g(y;X),
$$
with initial conditions
$$
y(0) = y_0(X).
$$

Assume that the solution $y(t;x)$ is square integrable.
Then, at any given time $t$, we can expand the solution in the polynomial basis:
$$
y(t;x) = \sum_{i=1}^\infty c_i(t)\phi_i(x).
$$
Note that the coefficients are functions of time.
According to our discussion above, if you find these $c_i(t)$'s, the expected value of the dynamical system will be:
$$
\mathbb{E}[y(t;X)] = c_1(t),
$$
and the variance will be:
$$
\mathbb{V}[y(t;X)] = \sum_{i=2}^\infty c_i^2(t).
$$

### Derivation of the Dynamical System for the Polynomial Coefficients
We will derive a dynamical system that the coefficients must satisfy.
At the initial conditions we have:
$$
y(0;x) = y_0(x)\Rightarrow \sum_{i=1}^\infty c_i(0)\phi_i(x) = y_0(x),
$$
so we get that:
$$
c_i(0) = \langle \phi_i, y_0\rangle.
$$

Now, take the derivative of $y(t;x)$ with respect to $t$:
$$
\frac{dy}{dt} = \sum_{i=1}^\infty \frac{dc_i}{dt}\phi_i(x).
$$
This looks good, but notice that $\frac{dy}{dt} = g(y;x)$ is also a function of $y(t; x)$.
We must think of $g(y;x)$ as a function of $x$ with a fixed $y$.
Then we get:
$$
\frac{dc_i}{dt} = \left\langle \phi_i, g\left(\sum_{j=1}^\infty c_j\phi_j, \cdot\right)\right\rangle.
$$

Thus, the dynamical system that we need to solve to find the coefficients at any time is the following:
$$
\frac{dc_i}{dt} = \left\langle \phi_i, g\left(\sum_{j=1}^\infty c_j\phi_j, \cdot\right)\right\rangle,
$$
with initial conditions:
$$
c_i(0) = \langle \phi_i, y_0\rangle,
$$
for $i=1,2,\dots$ (in practice, we truncate at a given order).
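
A minimal sketch of how the right-hand side $\langle\phi_i, g(\sum_j c_j\phi_j, \cdot)\rangle$ can be evaluated by quadrature for a truncated basis. It assumes a scalar standard normal input, orthonormal probabilists' Hermite polynomials built with NumPy, and a hypothetical test dynamics $g(y, x) = -(1 + 0.1x)y$; the helper names `orthonormal_hermite` and `galerkin_rhs` are also hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def orthonormal_hermite(xi, degree):
    """Columns are He_n(xi) / sqrt(n!), n = 0..degree (orthonormal under N(0, 1))."""
    cols = []
    for n in range(degree + 1):
        coef = np.zeros(n + 1)
        coef[n] = 1.0
        cols.append(hermeval(xi, coef) / math.sqrt(math.factorial(n)))
    return np.column_stack(cols)

def galerkin_rhs(c, g, x_q, w_q, phi_q):
    """dc_i/dt = <phi_i, g(sum_j c_j phi_j, .)>, approximated by quadrature."""
    y_q = phi_q @ c                           # truncated expansion at the nodes
    return phi_q.T @ (w_q * g(y_q, x_q))      # project g back onto each phi_i

x_q, w_q = hermegauss(12)                     # weight exp(-x^2 / 2)
w_q = w_q / np.sqrt(2.0 * np.pi)              # normalize to the N(0, 1) density
phi_q = orthonormal_hermite(x_q, 4)

g = lambda y, x: -(1.0 + 0.1 * x) * y         # hypothetical linear dynamics
c = np.array([1.0, 0.2, -0.1, 0.05, 0.0])
rhs = galerkin_rhs(c, g, x_q, w_q, phi_q)

# For dynamics that are linear in y, the projection reduces to a matrix-vector product
A = np.einsum('q,q,qi,qj->ij', w_q, 1.0 + 0.1 * x_q, phi_q, phi_q)
gram = phi_q.T @ (w_q[:, None] * phi_q)       # should be the identity matrix
print(rhs, -A @ c)
```

The same `galerkin_rhs` works for nonlinear $g$, as long as the quadrature rule is accurate enough for the resulting integrand.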


### Example 7: Dynamical System with Uncertain Parameters
Take the random vector:
$$
X = (X_1, X_2),
$$
and assume that the components are independent Gaussian:
$$
X_i \sim \mathcal{N}(\mu_i, \sigma_i^2).
$$
So, for the full random vector we have a mean:
$$
\mu = (\mu_1, \mu_2),
$$
and a covariance matrix:
$$
\Sigma = \operatorname{diag}(\sigma_1^2,\sigma_2^2).
$$

Consider the ODE:
  \begin{align*}
    &\dot{y} = \frac{d y(t)}{dt} =-X_1y(t) \equiv g(y,X),\\
    &\qquad y(0) = X_2 \equiv y_0(X).
  \end{align*}

Let's see if we can carry out the inner products that are required for setting up the dynamical system for the polynomial coefficients:
$$
c_i(0) = \langle \phi_i, y_0\rangle = \langle \phi_i, x_2\rangle.
$$

We should be able to do this numerically with some simple quadrature rule $\{(w_q,x_q)\}_{q=1}^{N_q}$:
$$
c_{i0} = \langle \phi_i, y_0\rangle \approx \sum_{q=1}^{N_q}w_q \phi_i(x_q)x_{q,2}.
$$

The other integrals that we need are:
$$
\langle \phi_i, g\rangle = \langle \phi_i, -x_1 \sum_{j=1}^\infty c_j \phi_j\rangle = -\sum_{j=1}^\infty c_j \langle \phi_i, x_1\phi_j\rangle.
$$
We can approximate all the integrals inside the summation by:
$$
A_{ij} = \langle \phi_i, x_1\phi_j\rangle \approx \sum_{q=1}^{N_q}w_q\phi_i(x_q)x_{q,1}\phi_j(x_q).
$$

With these definitions, the dynamical system that we need to solve is:
$$
\frac{dc_i}{dt} = -\sum_{j=1}^\infty c_j A_{ij},
$$
with initial conditions:
$$
c_i(0) = c_{i0},
$$
for $i=1,2,\dots$ (we will truncate).
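
As a cross-check that does not depend on `orthpol` or `design`, the same system can be sketched with NumPy's probabilists' Hermite utilities and a tensor Gauss-Hermite rule. The basis construction, the parameter values, and the use of `solve_ivp` are assumptions made for illustration. Since $y(t) = X_2e^{-X_1t}$ here, the moments have closed forms, $\mathbb{E}[y] = \mu_2e^{-\mu_1t + \sigma_1^2t^2/2}$ and $\mathbb{E}[y^2] = (\mu_2^2+\sigma_2^2)e^{-2\mu_1t + 2\sigma_1^2t^2}$, which the code compares against.

```python
import math
from itertools import product
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from scipy.integrate import solve_ivp

mu1, sigma1, mu2, sigma2 = 0.05, 0.01, 8.0, 0.01
degree = 3

def psi(n, xi):
    """Orthonormal probabilists' Hermite polynomial He_n(xi) / sqrt(n!)."""
    coef = np.zeros(n + 1)
    coef[n] = 1.0
    return hermeval(xi, coef) / math.sqrt(math.factorial(n))

# Total-degree multi-indices, constant term first so that c[0] is the mean
alphas = sorted((a for a in product(range(degree + 1), repeat=2)
                 if sum(a) <= degree), key=sum)

# Tensor Gauss-Hermite rule in the scaled variables xi ~ N(0, I)
z, v = hermegauss(8)
v = v / np.sqrt(2.0 * np.pi)
Z1, Z2 = np.meshgrid(z, z, indexing='ij')
xi = np.column_stack([Z1.ravel(), Z2.ravel()])
w = np.outer(v, v).ravel()

phi_q = np.column_stack([psi(a[0], xi[:, 0]) * psi(a[1], xi[:, 1]) for a in alphas])
x1_q = mu1 + sigma1 * xi[:, 0]
x2_q = mu2 + sigma2 * xi[:, 1]

c0 = np.einsum('q,q,qi->i', w, x2_q, phi_q)              # c_i(0) = <phi_i, x_2>
A = np.einsum('q,q,qi,qj->ij', w, x1_q, phi_q, phi_q)    # A_ij = <phi_i, x_1 phi_j>

t = np.linspace(0.0, 100.0, 201)
sol = solve_ivp(lambda s, c: -A @ c, (t[0], t[-1]), c0, t_eval=t,
                rtol=1e-10, atol=1e-12)
mean_pc = sol.y[0]
var_pc = np.sum(sol.y[1:] ** 2, axis=0)

# Closed-form moments of y(t) = X_2 * exp(-X_1 * t)
mean_exact = mu2 * np.exp(-mu1 * t + 0.5 * sigma1 ** 2 * t ** 2)
second_exact = (mu2 ** 2 + sigma2 ** 2) * np.exp(-2.0 * mu1 * t + 2.0 * sigma1 ** 2 * t ** 2)
var_exact = second_exact - mean_exact ** 2
print(mean_pc[-1], mean_exact[-1], var_pc[-1], var_exact[-1])
```

With `degree = 3` the mean should agree closely, while the variance at late times tends to be underestimated; raising `degree` should tighten the agreement.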

Let's do it.

In [None]:
# SOLUTION WITH ORTHOGONAL POLYNOMIALS
import scipy.integrate  # needed for odeint below

# Construct the random variables - It is not very stable to work with the original
# random variables (too little uncertainty).
# So, we work with scaled versions.
mu1 = 0.05; sigma1 = 0.01
sX1 = st.norm()
mu2 = 8; sigma2 = 0.01
sX2 = st.norm()
sX = (sX1, sX2)
mu = np.array([mu1, mu2])
Sigma = np.diag([sigma1 ** 2, sigma2 ** 2])

# Construct the orthonormal polynomials
degree = 3
Phi_set = orthpol.ProductBasis((sX1, sX2), degree=degree, ncap=1000)
Phi_set.polynomials[0].normalize()
Phi_set.polynomials[1].normalize()
# Get a quadrature rule - we will talk about the quadrature rules in Lecture 17.
Z, v = design.sparse_grid(2, 5, 'GH') # Gauss-Hermite which uses w(x) = e^{-x^T x} - need to scale:
sXq = Z * np.sqrt(2.)
w = v / np.sqrt(np.pi ** Z.shape[1])  # Z.shape[1] is the input dimension (2 here)
Xq = np.ndarray(sXq.shape)
Xq[:, 0] = sXq[:, 0] * sigma1 + mu1
Xq[:, 1] = sXq[:, 1] * sigma2 + mu2

# Evaluate the integrals needed for defining the dynamical system
# Evaluate the orthogonal polynomials on the quadrature points
phi_q = Phi_set(sXq)
c0 = np.einsum('q,q,qj->j', w, Xq[:, 1], phi_q)
# Evaluate the integrals giving rise to the matrix A
A = np.einsum('q,q,qi,qj->ij', w, Xq[:, 0], phi_q, phi_q)

# Define the dynamical system
pc_rhs = lambda c, t: -np.dot(A, c)

# Solve the system
t = np.linspace(0, 100, 500)
c = scipy.integrate.odeint(pc_rhs, c0, t)

# Extract the mean
y_pc_m = c[:, 0]

# Extract the variance
y_pc_v = np.sum(c[:, 1:] ** 2, axis=1)

In [None]:
# SOLUTION WITH SAMPLING (FOR COMPARISON)
import scipy.integrate

class Ex1Solver(object):
    """
    An object that can solve the aforementioned ODE problem.
    It will work just like a multivariate function.
    """
    
    def __init__(self, nt=100, T=5):
        """
        This is the initializer of the class.
        
        Arguments:
            nt - The number of timesteps.
            T  - The final time.
        """
        self.nt = nt
        self.T = T
        self.t = np.linspace(0, T, nt) # The timesteps on which we will get the solution
        # The following are not essential, but they are convenient
        self.num_input = 2             # The number of inputs the class accepts
        self.num_output = nt           # The number of outputs the class returns
    
    def __call__(self, x):
        """
        This special class method emulates a function call.
        
        Arguments:
            x - A 1D numpy array with 2 elements. This represents the stochastic input x = (x1, x2).
        """
        # The dynamics of the adjoint z = y, dy/dx1, dy/dx2
        def g(z, t, x):
            return -x[0] * z[0], -x[0] * z[1] - z[0], -x[0] * z[2]
        # The initial condition
        y0 = (x[1], 0, 1)
        # We are ready to solve the ODE
        y = scipy.integrate.odeint(g, y0, self.t, args=(x,))
        return y
    
import design
num_lhs = 10000
X_lhs = design.latin_center(num_lhs, 2) # These are uniformly distributed - Turn them to standard normal
X_samples = mu + np.dot(st.norm.ppf(X_lhs), np.sqrt(Sigma))
solver = Ex1Solver(nt=500, T=100)
s = 0.
s2 = 0.
for x in X_samples:
    y = solver(x)[:, 0]
    s += y
    s2 += y ** 2
y_mu_lhs = s / num_lhs
y_var_lhs = s2 / num_lhs - y_mu_lhs ** 2

In [None]:
# Make the figure
fig1, ax1 = plt.subplots()

# Plot the mean and compare to LHS
ax1.plot(solver.t, y_mu_lhs, color=sns.color_palette()[0], label='LHS mean ($n=%d$)' % num_lhs)
ax1.plot(t, y_pc_m, '--', color=sns.color_palette()[1], label=r'PC mean ($\rho=%d$)' % degree)
ax1.set_xlabel('$t$')
ax1.set_ylabel('$\mu(t)$', color=sns.color_palette()[0])
ax1.tick_params('y', colors=sns.color_palette()[0])
plt.legend(loc='upper right')

# Plot variance and compare to LHS
ax2 = ax1.twinx()
ax2.plot(solver.t, y_var_lhs, color=sns.color_palette()[2], label='LHS variance ($n=%d$)' % (num_lhs))
ax2.plot(solver.t, y_pc_v, '--', color=sns.color_palette()[3], label=r'PC variance ($\rho=%d$)' % degree)
ax2.set_ylabel('$\sigma^2(t) = k(t, t)$', color=sns.color_palette()[2])
ax2.tick_params('y', colors=sns.color_palette()[2])
plt.legend(loc='center right');

In [None]:
# Let's do 95% intervals
s = np.sqrt(y_pc_v)
l = y_pc_m - 2 * s
u = y_pc_m + 2 * s
fig, ax = plt.subplots()
ax.plot(t, y_pc_m)
ax.fill_between(t, l, u, alpha=0.25)
ax.set_xlabel('$t$')
ax.set_ylabel('$y(t)$')

In [None]:
# Let's take some sample paths
fig, ax = plt.subplots()
ax.set_xlabel('$t$')
ax.set_ylabel('$y(t)$')
for _ in range(5):
    s_x_s = np.random.randn(2)
    y_s = np.dot(c, Phi_set(s_x_s[None, :]).T)
    ax.plot(t, y_s)

#### Questions
+ Repeat the analysis with higher polynomial degrees. What $\rho$ do you need to get convergent results?

+ Modify the code above so that you solve the problem with $X_1$ and $X_2$ that are Log-Normally distributed (choose your own mean and variance).

### Example 8: Stochastic Harmonic Oscillator

Consider the stochastic harmonic oscillator:
$$
\begin{array}{ccc}
\ddot{y} + \omega^2(X)y &=& 0,\\
y(0) &=& y_0(X),\\
\dot{y}(0) &=& v_0(X),
\end{array}
$$
where $X$ is a random variable with PDF $p(x)$.

First, let's bring this to the form of a first order dynamical system.
We set:
$$
y_1 = y,
$$
and 
$$
y_2 = \dot{y}.
$$
The dynamical system becomes:
$$
\begin{array}{ccc}
\dot{y}_1 &=& y_2,\\
\dot{y}_2 &=& -\omega^2(X) y_1,\\
y_1(0) &=& y_0(X),\\
y_2(0) &=& v_0(X).
\end{array}
$$

Now, let $\phi_1(x), \phi_2(x),\dots,\phi_n(x)$ be the orthonormal polynomials of $p(x)$, i.e.
$$
\langle \phi_i, \phi_j \rangle = \int \phi_i(x) \phi_j(x) p(x) dx = \delta_{ij}.
$$
Expand the solution of the dynamical system in these polynomials:
$$
\begin{array}{ccc}
y_1(t;x) &=& \sum_{i=1}^n c_{1i}(t) \phi_i(x),\\
y_2(t;x) &=& \sum_{i=1}^n c_{2i}(t) \phi_i(x).
\end{array}
$$
We will derive the dynamical system that the $c_{ki}(t)$'s satisfy.

Also expand $\omega(x)$, $y_0(x)$, and $v_0(x)$:
$$
\begin{array}{ccc}
\omega(x) &=& \sum_{i=1}^n \omega_i \phi_i(x),\\
y_0(x) &=& \sum_{i=1}^n y_{0i} \phi_i(x),\\
v_0(x) &=& \sum_{i=1}^n v_{0i} \phi_i(x).
\end{array}
$$
These coefficients can all be found with a sparse grid quadrature before we even start:
$$
\begin{array}{ccc}
\omega_i &=& \langle \omega, \phi_i\rangle \approx \sum_{k=1}^{n_q}w_k\omega(x_k)\phi_i(x_k),\\
y_{0i} &=& \langle y_0, \phi_i\rangle \approx \sum_{k=1}^{n_q}w_k y_0(x_k)\phi_i(x_k),\\
v_{0i} &=& \langle v_0, \phi_i\rangle \approx \sum_{k=1}^{n_q}w_k v_0(x_k)\phi_i(x_k).
\end{array}
$$

Now back to the dynamical system.
We have:
$$
\dot{y}_1 = y_2 \Rightarrow \sum_{i=1}^n \dot{c}_{1i} \phi_i(x) = \sum_{i=1}^n c_{2i}\phi_i(x) \Rightarrow \dot{c}_{1i} = c_{2i},
$$
and
$$
\dot{y}_2 = -\omega^2(x) y_1 \Rightarrow \sum_{i=1}^n\dot{c}_{2i}\phi_i(x) = -\left(\sum_{r=1}^n \omega_r \phi_r(x)\right)^2\sum_{j=1}^nc_{1j}\phi_j(x) = -\sum_{j,r,s=1}^n \omega_r\omega_s c_{1j}\phi_r(x)\phi_s(x)\phi_j(x).
$$
Taking the inner product of both sides with $\phi_i$ and using orthonormality gives:
$$
\dot{c}_{2i} = -\sum_{j,r,s=1}^n \omega_r\omega_s \langle \phi_i, \phi_j\phi_r\phi_s\rangle c_{1j}.
$$
Each inner product involves four basis functions. Rather than storing a four-index tensor, we sum over $r$ and $s$ first and define the matrix:
$$
H_{ij} = \sum_{r,s=1}^n \omega_r\omega_s\langle \phi_i, \phi_j\phi_r\phi_s\rangle = \langle \phi_i, \omega_n^2\phi_j\rangle \approx \sum_{k=1}^{n_q}w_k \omega_n^2(x_k)\phi_i(x_k)\phi_j(x_k),
$$
where $\omega_n(x) = \sum_{r=1}^n\omega_r\phi_r(x)$ is the truncated expansion of $\omega$.
We get:
$$
\dot{c}_{2i} = -\sum_{j=1}^n H_{ij} c_{1j}.
$$

To wrap it up, the dynamical system we need to solve is:
$$
\begin{array}{ccc}
\dot{c}_{1i} &=& c_{2i},\\
\dot{c}_{2i} &=& -\sum_{j=1}^n H_{ij} c_{1j},\\
c_{1i}(0) &=& y_{0i},\\
c_{2i}(0) &=& v_{0i},
\end{array}
$$
for $i=1,\dots,n$.

In [None]:
# Random variable
X1 = st.norm()
X2 = st.norm()
X3 = st.norm()
X = (X1, X2, X3)
dim = len(X)

# Modeling of the natural frequency:
omega = lambda x: 2. * np.pi + x[:, 0]
# Initial position
y0 = lambda x: np.ones((x.shape[0],)) + 0.1 * x[:, 1]
# Initial velocity
v0 = lambda x: np.zeros((x.shape[0],)) + 0.1 * x[:, 2]

# Construct the orthonormal polynomials
degree = 5
Phi_set = orthpol.ProductBasis(X, degree=degree, ncap=500)

# Get a quadrature rule - we will talk about the quadrature rules in Lecture 17.
Zq, v = design.sparse_grid(dim, 5, 'GH') # Gauss-Hermite which uses w(x) = e^{-x^T x} - need to scale:
Xq = Zq * np.sqrt(2.)
w = v / np.sqrt(np.pi ** dim)

# The polynomials on the quadrature points
phi_q = Phi_set(Xq)

# Evaluate all needed coefficients
omegas = np.einsum('k,k,ki->i', w, omega(Xq), phi_q)
y0s = np.einsum('k,k,ki->i', w, y0(Xq), phi_q)
v0s = np.einsum('k,k,ki->i', w, v0(Xq), phi_q)

# Evaluate the matrix H_ij = <phi_i, omega_n^2 phi_j>, where omega_n is the
# truncated expansion of omega evaluated on the quadrature points
omega_q = np.dot(phi_q, omegas)
H = np.einsum('k,k,ki,kj->ij', w, omega_q ** 2, phi_q, phi_q)

# Define the dynamical system
n = Phi_set.num_output
def c_rhs(c, t):
    c1 = c[:n]
    c2 = c[n:]
    c1_rhs = c2
    c2_rhs = -np.dot(H, c1)
    return np.hstack([c1_rhs, c2_rhs])

c0 = np.hstack([y0s, v0s])

# Solve the system
t = np.linspace(0, 5, 500)
c = scipy.integrate.odeint(c_rhs, c0, t)

# Post-process the results
# Coefficients for the position
c1 = c[:, :n]
# Coefficients for the velocity
c2 = c[:, n:]
# Mean position
y1_m = c1[:, 0]
# Mean velocity
y2_m = c2[:, 0]
# Variance of position
y1_v = np.sum(c1[:, 1:] ** 2, axis=1)
# Variance of velocity
y2_v = np.sum(c2[:, 1:] ** 2, axis=1)
# Lower and upper prediction intervals
y1_s = np.sqrt(y1_v)
y1_l = y1_m - 2. * y1_s
y1_u = y1_m + 2. * y1_s
y2_s = np.sqrt(y2_v)
y2_l = y2_m - 2. * y2_s
y2_u = y2_m + 2. * y2_s

In [None]:
fig, ax = plt.subplots()
ax.plot(t, y1_m)
ax.fill_between(t, y1_l, y1_u, alpha=0.25)
ax.set_xlabel('$t$')
ax.set_ylabel('$y(t)$');

In [None]:
fig, ax = plt.subplots()
ax.plot(t, y2_m)
ax.fill_between(t, y2_l, y2_u, alpha=0.25)
ax.set_xlabel('$t$')
ax.set_ylabel('$\dot{y}(t)$');
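
A self-contained one-dimensional sketch of the oscillator, assuming $\omega(\xi) = 2\pi + \xi$ with $\xi\sim\mathcal{N}(0,1)$, $y_0 = 1$ and $v_0 = 0$, and an orthonormal Hermite basis built with NumPy instead of `orthpol`. Projecting $\omega^2y_1$ onto the basis gives the matrix $B_{ij} = \langle\phi_i, \omega^2\phi_j\rangle$ (a name chosen here for illustration). For this setup the exact mean is $\mathbb{E}[y(t)] = \cos(2\pi t)e^{-t^2/2}$, which gives something to compare against over a short time horizon.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from scipy.integrate import solve_ivp

degree = 10
n = degree + 1
xi_q, w_q = hermegauss(24)                  # weight exp(-xi^2 / 2)
w_q = w_q / np.sqrt(2.0 * np.pi)            # normalize to the N(0, 1) density

# Orthonormal probabilists' Hermite polynomials on the quadrature nodes
phi_q = np.column_stack([
    hermeval(xi_q, np.eye(n)[k]) / math.sqrt(math.factorial(k)) for k in range(n)
])

omega_q = 2.0 * np.pi + xi_q
B = np.einsum('k,k,ki,kj->ij', w_q, omega_q ** 2, phi_q, phi_q)   # <phi_i, omega^2 phi_j>

y0s = phi_q.T @ (w_q * np.ones_like(xi_q))  # coefficients of y_0 = 1
v0s = np.zeros(n)                           # coefficients of v_0 = 0

def c_rhs(t, c):
    c1, c2 = c[:n], c[n:]
    return np.concatenate([c2, -B @ c1])    # dc1/dt = c2, dc2/dt = -B c1

t = np.linspace(0.0, 1.0, 101)
sol = solve_ivp(c_rhs, (t[0], t[-1]), np.concatenate([y0s, v0s]), t_eval=t,
                rtol=1e-10, atol=1e-12)
mean_pc = sol.y[0]
mean_exact = np.cos(2.0 * np.pi * t) * np.exp(-0.5 * t ** 2)
print(np.max(np.abs(mean_pc - mean_exact)))
```

Over longer horizons the samples drift out of phase and a fixed polynomial degree eventually stops being adequate, so the comparison is only meant for short times.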