# Function interpolation and integration in Python
ECON 3127/4414/8014 Computational methods in economics  
Week 7
Fedor Iskhakov  
<img src="../img/lecture.png" width="64px"/>

## Plan for the lecture
1. Spline interpolation
2. Polynomial interpolation
3. Quadrature integration methods
4. Application

### General objective

* $f(x)$ is the function of interest, hard to compute
* Have data on $f(x)$ at $n$ points $(x_1,\dots,x_n)$

$$
f(x_1), f(x_2), \dots, f(x_n)
$$

* Need to find the approximate value of the function $f(x)$ at arbitrary points $x$

### Approaches

1. Use a _simpler_ function $g(x)$ to represent $f(x)$ between the data points
    - Which simpler function?
    - What data should be used?
    - How to control the accuracy of the approximation?
    
2. Use _piece-wise_ approach (connect the dots)
    - How exactly to connect?
    - What are advantages and disadvantages?

### Distinction between function approximation (interpolation) and curve fitting

* Function approximation and interpolation refer to situations where the __data__ on function values is matched __exactly__
    - The approximation curve passes through the points of the data
* Curve fitting refers to the statistical problem where the data has __noise__, and the task is to find an approximation for the central tendency in the data
    - Linear and non-linear regression models, econometrics
    - The model is _over-identified_ (there is more data than needed to exactly identify the regression function)
* Yet, the computational methods are sometimes _identical_ in both cases.

<img src="img/curvefit3.gif" width="800px">


## Spline interpolation

Spline = curve composed of independent pieces

**Definition** A function $s(x)$ on $[a,b]$ is a spline of order $n$ iff
- $s$ is $C^{n-2}$ on $[a,b]$,
- There are $a=x_0<x_1<\dots<x_m=b$ such that $s(x)$ is a polynomial of degree $n-1$ on each subinterval $[x_i,x_{i+1}]$, $i=0,\dots,m-1$

- A function with $n$ continuous derivatives is $C^{n}$
- Linear interpolation is a spline of order 2.

### Cubic splines = spline of order 4
- Data set $\{(x_i,f(x_i)), i=0,\dots,n\}$
- Functional form $s(x) = a_i + b_i x + c_i x^2 + d_i x^3$ on $[x_{i-1},x_i]$ for $i=1,\dots,n$
- $4n$ unknown coefficients
- $2n$ interpolation and continuity equations + $2n-2$ equations for $C^2$
    * $s(x)$ passes through all data points
    * $s(x)$ is continuous
    * Derivative conditions in points $x_1,\dots,x_{n-1}$
- 2 additional equations at the end points $x_0$ and $x_n$
    * $s''(x_0)=s''(x_n)=0$ (natural spline)
    * $s'(x_0)=\frac{s(x_1)-s(x_0)}{x_1-x_0}$, $s'(x_n)=\frac{s(x_n)-s(x_{n-1})}{x_n-x_{n-1}}$ (secant-Hermite)
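
The natural boundary condition can be checked numerically. The sketch below assumes `scipy.interpolate.CubicSpline` with `bc_type='natural'` and uses equally spaced nodes chosen only for illustration. Note that SciPy stores each piece in powers of $(x-x_i)$, which is a different parametrization from the $a_i + b_i x + c_i x^2 + d_i x^3$ form above, although the spline itself is the same.

```python
import numpy as np
from scipy.interpolate import CubicSpline

f = lambda x: np.exp(-x/4)*np.sin(x)
x = np.linspace(0, 10, 8)          # 8 nodes, so n = 7 pieces (illustrative choice)

# natural spline: second derivative is zero at both end points
s = CubicSpline(x, f(x), bc_type='natural')

print('coefficient array shape:', s.c.shape)          # 4 coefficients per piece
print('s\'\' at the end points:', s(x[0], 2), s(x[-1], 2))
print('max deviation at the nodes:', np.max(np.abs(s(x) - f(x))))
```

The coefficient array has 4 rows and one column per piece, matching the $4n$ unknowns counted above.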

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
%matplotlib inline

np.random.seed(2008)
xd=np.linspace(0,10,1000)
x=np.sort(np.random.uniform(0,10,12))

f=lambda x: np.exp(-x/4)*np.sin(x)

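def plot1(f,fi,x,xd,color='k',label=''):
    '''Plot the data points, the true function f and its approximation fi'''
    plt.figure(num=1, figsize=(10,8))  # reuse figure 1 so that repeated calls overlay
    plt.scatter(x,f(x),color='r',label='data')
    plt.plot(xd,f(xd),color='grey',label='true')
    # evaluate the approximation only inside the range of the data
    xdi=xd[np.logical_and(xd>=x[0],xd<=x[-1])]
    plt.plot(xdi,fi(xdi),color=color,label=label)
    # no plt.show() here: the notebook renders the figure at the end of the cell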

In [None]:
plot1(f,f,x,xd)

In [None]:
fi = interpolate.interp1d(x,f(x))
plot1(f,fi,x,xd)

In [None]:
help(interpolate.interp1d)

In [None]:
fi = interpolate.interp1d(x,f(x),kind='linear')
plot1(f,fi,x,xd)

In [None]:
fi = interpolate.interp1d(x,f(x),kind='previous')
plot1(f,fi,x,xd,color='b')

In [None]:
fi = interpolate.interp1d(x,f(x),kind='next')
plot1(f,fi,x,xd,color='g')

In [None]:
fi = interpolate.interp1d(x,f(x),kind='nearest')
plot1(f,fi,x,xd,color='r')

In [None]:
for knd, clr in ('slinear','m'),('quadratic','b'),('cubic','g'):
    fi = interpolate.interp1d(x,f(x),kind=knd)
    plot1(f,fi,x,xd,color=clr,label=knd)

In [None]:
# Approximation errors
fig, ax = plt.subplots(figsize=(10,8))
ax.axhline(y=0, color='k',linewidth=1) 
ax.set_yscale('log')
for knd, clr in ('slinear','m'),('quadratic','b'),('cubic','g'):
    fi = interpolate.interp1d(x,f(x),kind=knd,bounds_error=False)
    erd=np.abs(f(xd)-fi(xd))
    plt.plot(xd,erd,color=clr)
    print('Max error with  %s splines is %1.5e'%(knd,np.nanmax(erd)))
# How to reduce approximation errors?

In [None]:
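# Redraw the nodes, then re-run the cells above to see how the errors change
# (the second assignment overrides the first: comment out one of them)
x=np.sort(np.random.uniform(0,10,25))
x=np.sort(np.random.uniform(0,10,11))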

### Accuracy of the interpolation
- Number of nodes
- Location of nodes
    - What is important?
- When are some splines better than others?
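
A rough way to explore the first question is to tabulate the maximum error against the number of nodes. The sketch below assumes equally spaced nodes (the cells above use random ones) and a hypothetical helper `max_error`; it reuses the same test function.

```python
import numpy as np
from scipy import interpolate

f = lambda x: np.exp(-x/4)*np.sin(x)
xd = np.linspace(0, 10, 1000)      # dense grid for measuring the error

def max_error(n, kind):
    '''Max abs error of interp1d of the given kind with n equally spaced nodes'''
    x = np.linspace(0, 10, n)
    fi = interpolate.interp1d(x, f(x), kind=kind)
    return np.max(np.abs(f(xd) - fi(xd)))

for n in (6, 11, 21, 41):
    print('n=%2d  slinear: %1.3e  cubic: %1.3e'
          % (n, max_error(n, 'slinear'), max_error(n, 'cubic')))
```

For this smooth function the cubic spline error falls faster than the linear one as nodes are added; for functions with kinks the ranking may differ.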

### Example: two period consumption model

Consider the two-period consumption problem

$$
\max_{0\le c_1 \le W} \big[u(c_1) + u(c_2)\big], \text{ s.t.} 
$$

$$
c_2 = (W-c_1)(1+r),\text{ where}
$$

$$
u(c)=\frac{c^\lambda + 1}{\lambda}
$$
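
One possible way to use interpolation in this model, sketched under assumptions: solve for the optimal $c_1$ on a coarse grid of $W$ and interpolate the resulting policy in between. The parameter values $\lambda=0.5$ and $r=0.05$, the wealth grid and the helper name `solve` are illustrative choices, not taken from the lecture. For $0<\lambda<1$ the first order condition gives $c_1 = W\frac{k}{1+k}$ with $k=(1+r)^{\lambda/(\lambda-1)}$, which is used here only as a check.

```python
import numpy as np
from scipy import interpolate, optimize

lam, r = 0.5, 0.05                 # assumed parameter values (not from the lecture)

def u(c):
    return (c**lam + 1) / lam

def solve(W):
    '''Optimal first period consumption for wealth W (numerical maximization)'''
    obj = lambda c1: -(u(c1) + u((W - c1) * (1 + r)))
    res = optimize.minimize_scalar(obj, bounds=(0, W), method='bounded')
    return res.x

Wgrid = np.linspace(1, 10, 6)      # coarse grid where the problem is solved
c1grid = np.array([solve(W) for W in Wgrid])
policy = interpolate.interp1d(Wgrid, c1grid, kind='linear')

# compare the interpolated policy with the closed form from the first order condition
k = (1 + r)**(lam / (lam - 1))
Wtest = np.linspace(1, 10, 50)
max_err = np.max(np.abs(policy(Wtest) - Wtest * k / (1 + k)))
print('Max error of interpolated policy: %1.3e' % max_err)
```

Because the policy is linear in $W$ for this utility function, linear interpolation adds no error beyond the tolerance of the optimizer.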





## Polynomial approximation/interpolation

Back to the beginning to explore the idea of replacing original $f(x)$ with simpler $g(x)$
- Data set $\{(x_i,f(x_i))\}, i=1,\dots,n$
- Functional form is polynomial of degree $n-1$ such that $g(x_i)=f(x_i)$
- If $x_i$ are distinct, coefficients of the polynomial are uniquely identified

Does polynomial $g(x)$ converge to $f(x)$ when there are more points?

In [None]:
from numpy.polynomial import polynomial

p=polynomial.polyfit(x,f(x),len(x)-1)
fi=lambda x: polynomial.polyval(x,p)
plot1(f,fi,x,xd)

In [None]:
f2=lambda x: 1/(x**2+1)
xd2=np.linspace(-10,10,1000)
n=3
for clr in 'b','g','c','m','y':
#     np.random.seed(2008)
#     x=np.sort(np.random.uniform(-10,10,n))
    x=np.linspace(-10,10,n)
    p=polynomial.polyfit(x,f2(x),len(x)-1)
    fi=lambda x: polynomial.polyval(x,p)
    plot1(f2,fi,x,xd2,color=clr)
    n+=1

In [None]:
n=5
for clr in 'b','g','c','m','y':
    np.random.seed(2008)
    x=np.sort(np.random.uniform(0,10,n))
    p=polynomial.polyfit(x,f(x),len(x)-1)
    fi=lambda x: polynomial.polyval(x,p)
    plot1(f,fi,x,xd,color=clr)
    n+=1

### Hermite polynomial approximation
- Data set $\{(x_i,f(x_i),f'(x_i))\}, i=1,\dots,n$ **(data on derivatives)**
- Functional form is polynomial of degree $2n-1$ such that $g(x_i)=f(x_i)$ and $g'(x_i)=f'(x_i)$
- If $x_i$ are distinct, coefficients of the polynomial are uniquely identified
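
A minimal sketch of Hermite interpolation, assuming `scipy.interpolate.KroghInterpolator`, which treats a node repeated twice in a row as carrying a value followed by a first derivative. The node locations and the analytical derivative `df` are supplied here for illustration.

```python
import numpy as np
from scipy.interpolate import KroghInterpolator

f  = lambda x: np.exp(-x/4)*np.sin(x)
df = lambda x: np.exp(-x/4)*(np.cos(x) - np.sin(x)/4)   # derivative of f

x = np.linspace(0, 10, 5)          # n = 5 nodes, polynomial of degree 2n-1 = 9

# repeat every node twice: first entry is the value, second is the derivative
xi = np.repeat(x, 2)
yi = np.empty(2*len(x))
yi[0::2] = f(x)
yi[1::2] = df(x)
g = KroghInterpolator(xi, yi)

print('value mismatch at nodes:     ', np.max(np.abs(g(x) - f(x))))
print('derivative mismatch at nodes:', np.max(np.abs(g.derivative(x, der=1) - df(x))))
```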


### Least squares approximation
- Data set $\{(x_i,f(x_i))\}, i=0,\dots,n$
- **Any** functional form $g(x)$ from class $G$ that best approximates $f(x)$

$$
g = \arg\min_{g \in G} \lVert f-g \rVert ^2
$$
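
As a sketch of the least squares idea, take $G$ to be polynomials of degree 5 (an arbitrary choice) and fit 30 data points, so the problem is over-identified and the fit does not pass through the data. The comparison with `np.linalg.lstsq` on the Vandermonde matrix is included only to show that `polyfit` solves the same minimization problem.

```python
import numpy as np
from numpy.polynomial import polynomial

f = lambda x: np.exp(-x/4)*np.sin(x)
x = np.linspace(0, 10, 30)         # more data points than coefficients
deg = 5                            # assumed degree of the approximating polynomial

p = polynomial.polyfit(x, f(x), deg)

# the same problem written explicitly: min ||V p - f(x)||^2
V = polynomial.polyvander(x, deg)
p_ls, *_ = np.linalg.lstsq(V, f(x), rcond=None)

resid = f(x) - polynomial.polyval(x, p)
print('Max residual at the data points: %1.3e' % np.max(np.abs(resid)))
print('Max difference in fitted values: %1.3e' % np.max(np.abs(V @ p - V @ p_ls)))
```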


## Orthogonal  polynomial approximation/interpolation

- Polynomials over domain $D$
- Weighting function $w(x)>0$
- Inner product $\langle f,g \rangle = \int_D f(x)g(x)w(x)dx$

$\{\phi_i\}$ is a family of orthogonal polynomials w.r.t. $w(x)$ iff

$$
\langle \phi_i,\phi_j \rangle = 0, \quad i\ne j
$$

### Chebyshev polynomials

- $[a,b] = [-1,1]$ and $w(x)=(1-x^2)^{(-1/2)}$
- $T_n(x)=\cos\big(n\cos^{-1}(x)\big)$
- Recursive formulas: $T_0(x)=1$, $T_1(x)=x$, $T_{n+1}(x)=2x T_n(x) - T_{n-1}(x)$
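
The recursion and the orthogonality property can be checked numerically. In the sketch below the helper `cheb_T` is hypothetical, and the inner product is computed with Gauss-Chebyshev quadrature, $\int_{-1}^{1} h(x)w(x)dx \approx \frac{\pi}{m}\sum_{k=1}^m h(z_k)$, which is exact for polynomials $h$ of degree up to $2m-1$.

```python
import numpy as np

def cheb_T(n, x):
    '''T_n(x) computed with the three-term recursion'''
    x = np.asarray(x, dtype=float)
    t_prev, t_curr = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# recursion agrees with the trigonometric definition
x = np.linspace(-1, 1, 201)
for n in range(6):
    print(n, np.max(np.abs(cheb_T(n, x) - np.cos(n * np.arccos(x)))))

# inner product w.r.t. w(x) using Gauss-Chebyshev quadrature with m nodes
m = 50
z = np.cos((2 * np.arange(1, m + 1) - 1) * np.pi / (2 * m))
inner = lambda i, j: np.pi / m * np.sum(cheb_T(i, z) * cheb_T(j, z))
print('<T_2,T_3> =', inner(2, 3))
print('<T_3,T_3> =', inner(3, 3))
```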

<img src="img/ChebyshevT_802.gif" width="800px">

### General interval
- Not hard to adapt the polynomials for the general interval $[a,b]$ through linear change of variable

$$
y = 2\frac{x-a}{b-a}-1
$$

- Orthogonality is preserved when the same change of variable is applied to the weighting function

### Accuracy of Chebyshev approximation
Suppose $f: [-1,1]\rightarrow \mathbb{R}$ is a $C^k$ function for some $k\ge 1$, and let $I_n$ be the degree $n$ polynomial interpolation of $f$ with nodes at zeros of $T_{n+1}(x)$. Then

$$
\lVert f - I_n \rVert_{\infty} \le \left( \frac{2}{\pi} \log(n+1) +1 \right) \frac{(n-k)!}{n!}\left(\frac{\pi}{2}\right)^k \lVert f^{(k)}\rVert_{\infty}
$$

- Chebyshev approximation will work for $C^1$ smooth functions
- easy to compute
- but _does not_ approximate $f'(x)$ well
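
To see what the choice of nodes does to accuracy, the sketch below interpolates $1/(x^2+1)$ on $[-10,10]$ with a degree 20 polynomial (an arbitrary choice), once with nodes at the zeros of $T_{n+1}$ and once with equidistant nodes. Both interpolants are expressed in the Chebyshev basis through `numpy.polynomial.chebyshev.Chebyshev`, so only the nodes differ.

```python
import numpy as np
import numpy.polynomial.chebyshev as cheb

f2 = lambda x: 1/(x**2 + 1)
a, b, n = -10, 10, 20              # interval and polynomial degree (illustrative)
xd2 = np.linspace(a, b, 2001)

# nodes at the zeros of T_{n+1}, mapped to [a,b]
g_cheb = cheb.Chebyshev.interpolate(f2, n, domain=[a, b])

# equidistant nodes: fitting degree n to n+1 points is interpolation
xe = np.linspace(a, b, n + 1)
g_equi = cheb.Chebyshev.fit(xe, f2(xe), n, domain=[a, b])

err_cheb = np.max(np.abs(f2(xd2) - g_cheb(xd2)))
err_equi = np.max(np.abs(f2(xd2) - g_equi(xd2)))
print('Max error, Chebyshev nodes:   %1.3e' % err_cheb)
print('Max error, equidistant nodes: %1.3e' % err_equi)
```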


### Chebyshev approximation algorithm

1. Given $f(x)$ and $[a,b]$
2. Compute Chebyshev interpolation nodes $z_k$ on $[-1,1]$ (zeros of $T_{n+1}$)
3. Adjust nodes to $[a,b]$ by change of variable, $x_k = a + (z_k+1)\frac{b-a}{2}$
4. Evaluate $f$ at the nodes, $f(x_k)$
5. Compute Chebyshev coefficients from the function values, $a_i = \frac{\sum_k f(x_k)T_i(z_k)}{\sum_k T_i(z_k)^2}$
6. Arrive at approximation

$$
f(x) \approx \sum_{i=0}^n a_i T_i\left(2\frac{x-a}{b-a}-1\right)
$$
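
The steps above can be sketched by hand as follows. The function name `cheb_approx` is hypothetical, the nodes are the zeros of $T_{n+1}$, and the coefficients use the discrete orthogonality of Chebyshev polynomials at these nodes, $a_i = \sum_k f(x_k)T_i(z_k) / \sum_k T_i(z_k)^2$. This is one standard way to compute them, not necessarily what a particular library does internally.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def cheb_approx(f, a, b, n):
    '''Degree n Chebyshev interpolation of f on [a,b], returns (approximation, nodes)'''
    m = n + 1
    k = np.arange(1, m + 1)
    z = -np.cos((2 * k - 1) * np.pi / (2 * m))   # step 2: zeros of T_{n+1} on [-1,1]
    x = a + (z + 1) * (b - a) / 2                # step 3: nodes adjusted to [a,b]
    y = f(x)                                     # step 4: function values at nodes
    T = cheb.chebvander(z, n)                    # T[k,i] = T_i(z_k)
    coef = (T.T @ y) / np.sum(T**2, axis=0)      # step 5: Chebyshev coefficients
    def g(xq):                                   # step 6: evaluate the approximation
        zq = 2 * (np.asarray(xq) - a) / (b - a) - 1
        return cheb.chebval(zq, coef)
    return g, x

f = lambda x: np.exp(-x/4)*np.sin(x)
g, nodes = cheb_approx(f, 0, 10, 8)
xd = np.linspace(0, 10, 1000)
print('Max error with degree 8: %1.3e' % np.max(np.abs(f(xd) - g(xd))))
```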

In [None]:
import numpy.polynomial.chebyshev as cheb
n=5
for clr in 'b','g','c','m','y':
    fi=cheb.Chebyshev.interpolate(f,n,[0,10])
    plot1(f,fi,x,xd,color=clr)
    n+=1

## Adaptive grid methods

![surplus_slow.gif](img/surplus_slow.gif)


## Extrapolation

Extrapolation is computing the approximated function outside of the original data interval

__Should be avoided in general__

* Exact _only_ when theoretical properties of the extrapolated function are known
    - Examples?
* Can be used with extreme caution for _preliminary_ results
    - What can be wrong with it?
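
A small illustration of what can go wrong, assuming `interp1d` with `fill_value='extrapolate'` and 12 equally spaced nodes on $[0,10]$ (both are choices made for this sketch): the cubic piece fitted near the right end keeps growing outside the data, while the true function decays.

```python
import numpy as np
from scipy import interpolate

f = lambda x: np.exp(-x/4)*np.sin(x)
x = np.linspace(0, 10, 12)

# cubic spline that continues its last polynomial piece outside [0,10]
fi = interpolate.interp1d(x, f(x), kind='cubic', fill_value='extrapolate')

x_in = np.linspace(0, 10, 500)     # inside the data interval
x_out = np.linspace(10, 14, 200)   # outside the data interval
err_in = np.max(np.abs(f(x_in) - fi(x_in)))
err_out = np.max(np.abs(f(x_out) - fi(x_out)))
print('Max error inside  [0,10]:  %1.3e' % err_in)
print('Max error outside [10,14]: %1.3e' % err_out)
```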

## Shape issues

Approximated function may not have the theoretical properties of the original function 

- Shape issues
    - When are they more pronounced?
    - Remedy?
- Contraction properties
- Etc.

Schumaker formulas for shape-preserving spline interpolation
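
SciPy does not, as far as I know, ship the Schumaker quadratic spline, so the sketch below swaps in a different shape-preserving method, the monotone cubic Hermite interpolant `PchipInterpolator`, and compares it with an ordinary cubic spline on monotone step-like data made up for the illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# monotone data with a sharp step (illustrative)
x = np.arange(6.0)
y = np.array([0, 0, 0, 1, 1, 1.0])
xq = np.linspace(0, 5, 501)

cs = CubicSpline(x, y)(xq)         # ordinary cubic spline
pc = PchipInterpolator(x, y)(xq)   # shape-preserving (monotone) interpolant

print('cubic spline range: [%1.4f, %1.4f]' % (cs.min(), cs.max()))
print('pchip range:        [%1.4f, %1.4f]' % (pc.min(), pc.max()))
print('pchip is monotone:', np.all(np.diff(pc) >= -1e-12))
```

The ordinary spline leaves the range of the data and is not monotone, while the shape-preserving interpolant stays within $[0,1]$.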

## Choosing the right interpolation method
1. Linear interpolation
2. Quadratic and cubic splines
3. Polynomial interpolation
4. Shape-preserving methods
---
- Polynomial functions
- Functions with kinks
- Functions with discontinuities
- Expensive to compute functions
- Monotone functions
- Functions with unknown theoretical properties

## Multidimensional interpolation

**Generally much harder!**


### Nested linear (Multi-linear) interpolation
Interpolate linearly in the first dimension, then between the interpolated points in the second dimension, and so on.
- _Problem:_ the overall interpolation is not linear any more!
- But easy to implement due to recursive structure
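
A minimal two-dimensional sketch, assuming `scipy.interpolate.RegularGridInterpolator` with `method='linear'` on a single grid cell with made-up corner values. Along the diagonal of the cell the interpolant is quadratic, which illustrates the problem noted above.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# one grid cell [0,1]x[0,1] with values 0,0,0,1 at the corners (illustrative)
gx = np.array([0.0, 1.0])
gy = np.array([0.0, 1.0])
vals = np.array([[0.0, 0.0],
                 [0.0, 1.0]])      # vals[i,j] is the value at (gx[i], gy[j])
interp = RegularGridInterpolator((gx, gy), vals, method='linear')

# along the diagonal (t,t) the bilinear interpolant equals t**2, not t
t = np.linspace(0, 1, 5)
diag = interp(np.column_stack([t, t]))
print(diag)
```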

### Triangulation

- Regular grid triangulation
- Delaunay (irregular grid) triangulation

![Delaunay_circumcircles.png](img/Delaunay_circumcircles.png)

![Delaunay_triangulation_small.png](img/Delaunay_triangulation_small.png)

![Delaunay_Triangulation.png](img/Delaunay_Triangulation.png)
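
A sketch of interpolation on an irregular grid, assuming `scipy.spatial.Delaunay` and `scipy.interpolate.LinearNDInterpolator`, which interpolates linearly within each triangle. The scattered points and the linear test function `g` are made up for the illustration; the corners of the unit square are added so that the query points lie inside the convex hull.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

rng = np.random.default_rng(2008)
corners = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
pts = np.vstack([corners, rng.uniform(0, 1, size=(30, 2))])   # irregular grid

g = lambda p: 2*p[:, 0] + 3*p[:, 1]      # linear test function (illustrative)

tri = Delaunay(pts)                      # Delaunay triangulation of the points
interp = LinearNDInterpolator(tri, g(pts))

q = np.array([[0.3, 0.7], [0.5, 0.5]])   # query points inside the unit square
print('interpolated:', interp(q))
print('true values: ', g(q))
print('outside the convex hull:', interp(np.array([[2.0, 2.0]])))
```

A linear function is reproduced exactly inside the convex hull, and points outside it return `nan` by default rather than an extrapolated value.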

### Sparse methods
- Smolyak grid
- Adaptive sparse grid

![adaptive_grid_slow.gif](img/adaptive_grid_slow.gif)


## Further learning resources
* https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
* https://docs.scipy.org/doc/numpy/reference/generated/numpy.interp.html
* Using Adaptive Sparse Grids to Solve High‐Dimensional Dynamic Models https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA12216