###### Content provided under a Creative Commons Attribution license, CC-BY 4.0; code under MIT License. (c)2018 Aksel Hiorth

In [1]:
from IPython.core.display import HTML
css_file = 'style.css'
HTML(open(css_file, "r").read())

# Numerical Integration

Before diving into the details of this section, it is worth pointing out that the derivation of the algorithms in this section follows a general pattern:
<ol>
<li>We start with a mathematical model (in this case an integral) </li>
    <li> The mathematical model is formulated in discrete form </li>
    <li> Then we design an algorithm to solve the model </li>
    <li> The numerical solution for a test case is compared with the true solution (could be an analytical solution or data)</li>
        <li> Error analysis: we investigate the accuracy of the algorithm by changing the number of iterations and/or by changing the implementation or the algorithm</li>
 </ol>   
In practice you would not use your own implementation, but in order to understand which method to use it is important to understand the limitations and advantages of the different algorithms. For some applications you might want the implementation to be as fast as possible. Then you might want to use some of the methods below, e.g. the Gaussian quadrature algorithm, and tailor it to your specific problem. Thus you should read this part not only to learn about the implementation of numerical integration, but also as an illustration of computational thinking and algorithmic development. The path that is taken here can be applied to other models as well. 

## The Midpoint Rule

Numerical integration is encountered in numerous applications in physics and engineering sciences. Let us first consider the simplest case, a function $f(x)$ of one variable, $x$. The most straightforward way of calculating the area $\int_a^bf(x)dx$ is simply to divide the area under the function into $N$ equal rectangular slices of width $h=(b-a)/N$, as illustrated in the figure below. The area of one box is: 
\begin{equation}
M(x_k,x_k+h)=f(x_k+\frac{h}{2}) h,
\end{equation}
and the area of all the boxes is:
\begin{eqnarray}
I(a,b)&=&\int_a^bf(x)dx\simeq\sum_{k=0}^{N-1}M(x_k,x_k+h)=h\sum_{k=0}^{N-1}f(x_k+\frac{h}{2})=h\sum_{k=0}^{N-1}f(a+(k+\frac{1}{2})h).
\end{eqnarray}
Note that the sum goes from $k=0,1,\ldots,N-1$, a total of $N$ elements. We could have chosen to let the sum go from $k=1,2,\ldots,N$. In Python, C, C++ and many other programming languages the arrays start by indexing the elements from $0,1,\ldots$ to $N-1$, therefore we choose this convention, because then the formulas we develop can be directly implemented.

<img src="func_sq.png" width="600"/>

Below is a Python code where this algorithm is implemented for $f(x)=\sin (x)$:

In [3]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)

def int_midpoint(lower_limit, upper_limit,func,N):
    h    = (upper_limit-lower_limit)/N
    area = 0.
    for k in range(0,N): # loop over k=0,1,..,N-1
        val = lower_limit+(k+0.5)*h # midpoint value 
        area += func(val)*h
    return area
        
        
N=5
a=0
b=np.pi
Area = int_midpoint(a,b,f,N)
print('Numerical value= ', Area)
print('Error= ', (2-Area)/2) # Analytical result is 2 



Numerical value=  2.03328148
Error=  -0.01664074


By increasing $N$ the numerical result will get closer to the true answer. How much do you need to increase $N$ in order to reach an accuracy better than $10^{-8}$? What happens to the error when $N$ increases? 
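
One way to explore these questions is to tabulate the error for a sequence of doubled $N$ values. The sketch below is only an illustration: the helper `midpoint` is a vectorized variant (using NumPy arrays instead of a loop) of the midpoint rule, and the chosen values of $N$ are arbitrary.

```python
import numpy as np

def midpoint(func, a, b, N):
    """Composite midpoint rule with N equal slices (vectorized sketch)."""
    h = (b - a) / N
    x_mid = a + (np.arange(N) + 0.5) * h   # midpoints of the N slices
    return h * np.sum(func(x_mid))

exact = 2.0                                 # integral of sin(x) from 0 to pi
errors = {}
for N in [5, 10, 20, 40, 80]:
    errors[N] = abs(exact - midpoint(np.sin, 0.0, np.pi, N))
    print(f"N = {N:3d}   |error| = {errors[N]:.3e}")

# Ratio of successive errors; a value near 4 indicates an h^2 method
ratios = [errors[N] / errors[2 * N] for N in [5, 10, 20, 40]]
print("error ratios when N is doubled:", np.round(ratios, 3))
```

If the ratios stay close to 4, each doubling of $N$ reduces the error by about a factor of four, which can be used to extrapolate how large $N$ must be for a given accuracy.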

## The Trapezoidal Rule
The numerical error in the above example is quite low, only about 2$\%$ for $N=5$. However, by just looking at the graph above it seems likely that we can develop a better algorithm by using trapezoids instead of rectangles, see the figure below:

<img src="func_tr.png" width="600"/>

Earlier we approximated the area using the midpoint value: $f(x_k+h/2)\cdot h$. Now we use $A=A_1+A_2$, where $A_1=f(x_k)\cdot h$ and $A_2=(f(x_k+h)-f(x_k))\cdot h/2$, hence the area of one trapezoid:
\begin{equation}
A\equiv T(x_k,x_k+h)=(f(x_k+h)+f(x_k))h/2.
\end{equation}
This is the trapezoidal rule, and for the whole interval we get:
\begin{eqnarray}
I(a,b)&=&\int_a^bf(x)dx\simeq\frac{1}{2}h\sum_{k=0}^{N-1}\left[f(x_k+h)+f(x_k)\right] \\
&=&h\left[\frac{1}{2}f(a)+f(a+h) + f(a+2h) + \cdots + f(a+(N-1)h)+\frac{1}{2}f(b)\right]\\
&=&h\left[\frac{1}{2}f(a)+\frac{1}{2}f(b)+\sum_{k=1}^{N-1}f(a+k h)\right].
\end{eqnarray}
Note that this formula was a bit more involved to derive, but it requires only one more function evaluation than the midpoint rule. Below is a Python implementation:

In [23]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)

# In the implementation below the calculation goes faster
# when we avoid unnecessary multiplications by h in the loop
def int_trapez(lower_limit, upper_limit,func,N):
    h       = (upper_limit-lower_limit)/N
    area    = func(lower_limit)+func(upper_limit)
    area   *= 0.5
    val     = lower_limit
    for k in range(1,N): # loop over k=1,..,N-1
        val  += h # next interior grid point a+k*h
        area += func(val)
    return area*h
        
        
N=10
a=0
b=np.pi
Area = int_trapez(a,b,f,N)
print('Numerical value= ', Area)
print('Error= ', (2-Area)) # Analytical result is 2 



Numerical value=  1.9835235375094546
Error=  0.01647646249054535


Note that we get the surprising result that, for the same number of intervals, this algorithm performs poorer than the midpoint rule: the error is roughly twice as large in magnitude. How can this be explained? By just looking at the first graph in this document, we see that the midpoint rule actually overpredicts the area on $[x_k,x_k+h/2]$ and underpredicts it on $[x_k+h/2,x_{k+1}]$, or vice versa, so that the two contributions partly cancel. If the function to be integrated were linear, the midpoint rule would give the exact result (see exercise). The net effect is that in many cases the midpoint rule gives a slightly better performance than the trapezoidal rule. In the next section we will investigate this more formally.
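
The comparison can be made concrete by running both rules with the same $N$. This is a minimal sketch, assuming vectorized helper functions `midpoint` and `trapezoid` (hypothetical names, written with NumPy arrays rather than loops) and the test function $\sin x$ on $[0,\pi]$:

```python
import numpy as np

def midpoint(func, a, b, N):
    h = (b - a) / N
    return h * np.sum(func(a + (np.arange(N) + 0.5) * h))

def trapezoid(func, a, b, N):
    h = (b - a) / N
    x = a + np.arange(N + 1) * h            # N+1 grid points including both ends
    y = func(x)
    return h * (0.5 * y[0] + 0.5 * y[-1] + np.sum(y[1:-1]))

exact = 2.0                                 # integral of sin(x) from 0 to pi
for N in [5, 10, 20]:
    err_m = exact - midpoint(np.sin, 0.0, np.pi, N)
    err_t = exact - trapezoid(np.sin, 0.0, np.pi, N)
    print(f"N = {N:2d}  midpoint: {err_m: .5f}  trapezoid: {err_t: .5f}"
          f"  ratio: {err_t / err_m: .3f}")
```

For this test case the two errors have opposite signs, and the ratio is close to $-2$.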

## Numerical Errors on Integrals

Before we proceed it is worth mentioning numerical errors. It is important to know the accuracy of the methods we are using, otherwise we do not know if the computer produces correct results. In the previous examples we were able to estimate the error because we knew the analytical result. However, if we knew the analytical result there would be no reason to use the computer to calculate the result(!). Thus, we need a general method to estimate the error, and let the computer run until a desired accuracy is reached. In order to analyse this in more detail we approximate the function by a Taylor series between the points $x_k$ and $x_k+h$. To analyse the midpoint rule it is convenient to expand the function about $x_k+h/2$:
\begin{eqnarray}
f(x)=f(x_k+h/2)+f^\prime(x_k+h/2)(x-(x_k+h/2))+\frac{1}{2!}f^{\prime\prime}(x_k+h/2)(x-(x_k+h/2))^2+{\cal O}(h^3)
\end{eqnarray}
Since $f(x_k+h/2)$ and its derivatives are constants it is straightforward to integrate $f(x)$:
\begin{eqnarray}
I(x_k,x_k+h)=\int_{x_k}^{x_k+h}\left[f(x_k+h/2)+f^\prime(x_k+h/2)(x-(x_k+h/2))+\frac{1}{2!}f^{\prime\prime}(x_k+h/2)(x-(x_k+h/2))^2+{\cal O}(h^3)\right]dx
\end{eqnarray}
The first term is simply the midpoint rule. To evaluate the two other terms we make the substitution $u=x-x_k$, so that the integration runs from $u=0$ to $u=h$: 
\begin{eqnarray}
I(x_k,x_k+h)&=&f(x_k+h/2)\cdot h+f^\prime(x_k+h/2)\int_0^h(u-h/2)du+\frac{1}{2}f^{\prime\prime}(x_k+h/2)\int_0^h(u-h/2)^2du+{\cal O}(h^4)\\
&=&f(x_k+h/2)\cdot h+\frac{h^3}{24}f^{\prime\prime}(x_k+h/2)+{\cal O}(h^4).
\end{eqnarray}
Here the integral of the linear term vanishes by symmetry, and $\int_0^h(u-h/2)^2du=h^3/12$. Thus the error for the midpoint rule, $E_{M,k}$, on this particular interval is:
\begin{eqnarray}
E_{M,k}=I(x_k,x_k+h)-f(x_k+h/2)\cdot h=\frac{h^3}{24}f^{\prime\prime}(x_k+h/2),
\end{eqnarray}
where we have ignored higher order terms. We can easily sum up the error on all the intervals, but clearly $f^{\prime\prime}(x_k+h/2)$ will not be the same on all intervals (unless it is constant). However, an upper bound for the magnitude of the error can be found by replacing $f^{\prime\prime}(x_k+h/2)$ with its maximal absolute value on the interval $[a,b]$, $|f^{\prime\prime}(\eta)|$:
\begin{eqnarray}
E_{M}=\sum_{k=0}^{N-1}E_{M,k}=\frac{h^3}{24}\sum_{k=0}^{N-1}f^{\prime\prime}(x_k+h/2),\qquad |E_M|\leq\frac{Nh^3}{24}|f^{\prime\prime}(\eta)|=\frac{(b-a)^3}{24N^2}|f^{\prime\prime}(\eta)|.
\end{eqnarray}
We can do the exact same analysis for the trapezoidal rule, but then we expand the function around $x_k$ instead of the midpoint. The error term is then:
\begin{equation}
E_T=-\frac{(b-a)^3}{12N^2}f^{\prime\prime}(\overline{\eta}).
\end{equation}
At first glance it might look like the midpoint rule is always better than the trapezoidal rule, but note that the second derivative is evaluated at different points ($\eta$ and $\overline{\eta}$). Thus it is possible to construct examples where the midpoint rule performs poorer than the trapezoidal rule. 

Before we end this section we will rewrite the error terms in a more useful form, as it is not so easy to evaluate $f^{\prime\prime}(\eta)$ (since we do not know which value of $\eta$ to use). Taking a closer look at the sum in the equation above from which $f^{\prime\prime}(\eta)$ originates, we see that it is closely related to the midpoint rule for $\int_a^bf^{\prime\prime}(x)dx$:
\begin{eqnarray}
E_{M}&=&\frac{h^2}{24}h
\sum_{k=0}^{N-1}f^{\prime\prime}(x_k+h/2)\simeq\frac{h^2}{24}\int_a^b
f^{\prime\prime}(x)dx\\
E_M&\simeq&\frac{h^2}{24}\left[f^\prime(b)-f^\prime(a)\right]=\frac{(b-a)^2}{24N^2}\left[f^\prime(b)-f^\prime(a)\right]
\end{eqnarray}
The corresponding formula for the trapezoidal rule is:
\begin{equation}
E_T\simeq -\frac{h^2}{12}\left[f^\prime(b)-f^\prime(a)\right]=-\frac{(b-a)^2}{12N^2}\left[f^\prime(b)-f^\prime(a)\right]
\end{equation}
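The size of these two estimates can be compared with the actual errors for the test case $f(x)=\sin x$ on $[0,\pi]$. The sketch below is only an illustration: it uses vectorized NumPy sums rather than loops, takes the derivative $f^\prime(x)=\cos x$ analytically, uses an arbitrary $N=10$, and compares magnitudes only.

```python
import numpy as np

a, b, N = 0.0, np.pi, 10
h = (b - a) / N

# Composite midpoint and trapezoidal rules for f(x) = sin(x)
I_mid = h * np.sum(np.sin(a + (np.arange(N) + 0.5) * h))
x = a + np.arange(N + 1) * h
I_trap = h * (np.sum(np.sin(x)) - 0.5 * (np.sin(a) + np.sin(b)))

# f'(x) = cos(x) is known analytically for this test case
dfp = np.cos(b) - np.cos(a)

est_mid = h**2 / 24 * abs(dfp)    # estimated size of the midpoint error
est_trap = h**2 / 12 * abs(dfp)   # estimated size of the trapezoidal error

print("midpoint : actual", abs(2 - I_mid), " estimated", est_mid)
print("trapezoid: actual", abs(2 - I_trap), " estimated", est_trap)
```

For this smooth integrand the estimated and actual error sizes agree to within a fraction of a percent.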
Below is a Python implementation of the midpoint rule, where the number of steps is chosen to reach (at least) the specified accuracy:

In [3]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)
#Numerical derivative of function
def df(x,func):
    dh=1e-4 # some low step size
    return (func(x+dh)-func(x))/dh 

# Adaptive midpoint rule, "adaptive" because the number of function evaluations depends on the integrand
def int_adaptive_midpoint(lower_limit, upper_limit,func,tol):
    dfa  = df(lower_limit,func) # calculate the numerical derivative in point a 
    dfb  = df(upper_limit,func) # calculate the numerical derivative in point b
    N    = abs((upper_limit-lower_limit)**2*(dfb-dfa)/24/tol)
    N    = int(np.sqrt(N)) + 1 #add one extra as int rounds down, e.g. int(3.9)=3
    h    = (upper_limit-lower_limit)/N
    area = 0.
    print('Number of intervalls = ', N)
    for k in range(0,N): # loop over k=0,1,..,N-1
        val = lower_limit+(k+0.5)*h # midpoint value 
        area += func(val)
    return area*h
        
        
prec=1e-8
a=0
b=np.pi
Area = int_adaptive_midpoint(a,b,f,prec)
print('Numerical value = ', Area)
print('Error           = ', (2-Area)) # Analytical result is 2 

Number of intervalls =  9069
Numerical value =  2.000000009999997
Error           =  -9.999996830600821e-09


## Practical Estimation of Errors on Integrals

From the example above we were able to estimate the number of steps needed to reach (at least) a certain precision. In many practical cases we do not deal with functions, but with data, and it can be difficult to evaluate the derivative. We also saw from the example above that the algorithm can give a higher precision than what we asked for. How can we avoid doing too many iterations? A very simple solution is to double the number of intervals until a desired accuracy is reached. The following analysis holds for both the trapezoidal and the midpoint method, because in both cases the error scales as $h^2$. Assume that we have evaluated the integral with a step size $h_1$, and denote the result by $I_1$. Then we know that, to leading order, the true integral is $I=I_1+c h_1^2$, where $c$ is a constant. If we now halve the step size, $h_2=h_1/2$, then the true integral is $I=I_2+c h_2^2$. Taking the difference between $I_2$ and $I_1$ gives us an estimate for the error:
\begin{eqnarray}
I_2-I_1=I-c h_2^2-(I-ch_1^2)=3c h_2^2, 
\end{eqnarray}
where we have used the fact that $h_1=2h_2$. Thus the error term is:
\begin{eqnarray}
E(a,b)=c h_2^2=\frac{1}{3}(I_2-I_1). 
\end{eqnarray}
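A quick numerical check of this estimate, as a sketch only: the helper `trapezoid` is a hypothetical vectorized version of the trapezoidal rule, and the step counts 10 and 20 are arbitrary choices.

```python
import numpy as np

def trapezoid(func, a, b, N):
    h = (b - a) / N
    x = a + np.arange(N + 1) * h       # N+1 grid points including both ends
    y = func(x)
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

a, b = 0.0, np.pi
I1 = trapezoid(np.sin, a, b, 10)       # step size h_1
I2 = trapezoid(np.sin, a, b, 20)       # step size h_2 = h_1/2

estimated = (I2 - I1) / 3              # E = (I_2 - I_1)/3
actual = 2.0 - I2                      # true error of I_2 (exact integral is 2)
print("estimated error:", estimated)
print("actual error   :", actual)
```

The estimate uses only the two numerical results, not the analytical answer, and for this test case it agrees closely with the true error of $I_2$.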
It might seem like we now need twice as many function evaluations as necessary. This is not the case: by choosing to exactly halve the spacing, we only need to evaluate the function at the points that lie halfway between the original points. We will demonstrate how to do this by using the trapezoidal rule, because it operates directly on the $x_k$ values and not the midpoint values. The trapezoidal rule is:
\begin{eqnarray}
I_2(a,b)&=&h_2\left[\frac{1}{2}f(a)+\frac{1}{2}f(b)+\sum_{k=1}^{N_2-1}f(a+k h_2)\right],\\
&=&h_2\left[\frac{1}{2}f(a)+\frac{1}{2}f(b)+\sum_{k=\text{even values}}^{N_2-1}f(a+k h_2)+\sum_{k=\text{odd values}}^{N_2-1}f(a+k h_2)\right],
\end{eqnarray}
in the last equation we have split the sum into odd and even values of $k$. The sum over the even values can be rewritten:
\begin{eqnarray}
\sum_{k=\text{even values}}^{N_2-1}f(a+k h_2)=\sum_{k=1}^{N_1-1}f(a+2k h_2)=\sum_{k=1}^{N_1-1}f(a+k h_1),
\end{eqnarray}
note that $N_2$ is replaced with $N_1=N_2/2$. We can now rewrite $I_2$ as:
\begin{eqnarray}
I_2(a,b)&=&h_2\left[\frac{1}{2}f(a)+\frac{1}{2}f(b)+\sum_{k=1}^{N_1-1}f(a+k h_1)+\sum_{k=\text{odd values}}^{N_2-1}f(a+k h_2)\right]
\end{eqnarray}
Note that the first three terms, multiplied by $h_1=2h_2$ instead of $h_2$, are exactly the trapezoidal rule for $I_1$, hence:
\begin{eqnarray}
I_2(a,b)&=&\frac{1}{2}I_1(a,b)+h_2\sum_{k=\text{odd values}}^{N_2-1}f(a+k h_2)
\end{eqnarray}
A possible algorithm is then:
<ol>
    <li> Choose a low number of steps to evaluate the integral, $I_0$, the first time, e.g. $N_0=10$</li>
    <li> Double the number of steps, $N_1=2N_0$ </li>
    <li> Calculate the missing values by summing over the odd number of steps $\sum_{k=\text{odd values}}^{N_1-1}f(a+k h_1)$</li>
    <li> Check if $|E_1(a,b)|=\frac{1}{3}|I_1-I_0|$ is lower than a specific tolerance</li>
    <li> If yes quit, if not, return to 2, and continue until $|E_i(a,b)|=\frac{1}{3}|I_{i}-I_{i-1}|$ is lower than the tolerance  </li>
</ol>
Below is a Python implementation:

In [5]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)
# step size is chosen automatically to reach the specified tolerance 
def int_adaptive_trapez(lower_limit, upper_limit,func,tol):
    N0      = 10
    h       = (upper_limit-lower_limit)/N0
    area    = func(lower_limit)+func(upper_limit)
    area   *= 0.5
    val     = lower_limit
    for k in range(1,N0): # loop over k=1,..,N0-1
        val   += h # next interior grid point
        area  += func(val)
    area   *=h
    calc_tol = 2*tol + 1 # just larger than tol to enter the while loop 
    while(calc_tol>tol):
        N = N0*2
        h = (upper_limit-lower_limit)/N
        odd_terms=0
        for k in range (1,N,2): # 1, 3, 5, ... , N-1
            val  = lower_limit + k*h
            odd_terms += func(val)
        new_area = 0.5*area + h*odd_terms
        calc_tol = abs(new_area-area)/3 
        area     = new_area # store new values for next iteration
        N0       = N        # update number of slices
    print('Number of intervalls = ', N)
    return area #while loop ended and we can return the area
        
prec=1e-8
a=0
b=np.pi
Area = int_adaptive_trapez(a,b,f,prec)
print('Numerical value = ', Area)
print('Error           = ', (2-Area)) # Analytical result is 2 

Number of intervalls =  20480
Numerical value =  1.9999999960781696
Error           =  3.9218304159760464e-09


What is a good number to start with, and what happens if we choose $N_0$ too large? Compare the adaptive midpoint rule with the adaptive trapezoidal rule: is it possible to get the same accuracy with the same number of iterations? Compare the number of intervals used with the theoretical value $N=\sqrt{\frac{(b-a)^2}{12|E_T|}\left|f^\prime(b)-f^\prime(a)\right|}$. 

## Romberg Integration
The adaptive trapezoidal rule in the previous section can be improved easily by remembering that the true integral was given by $I=I_i+ch_i^2+{\cal O}(h^4)$. In the previous example the error term was only used to check if the desired tolerance was achieved, but by explicitly using it to correct the integral, the trapezoidal rule becomes accurate to fourth order:
\begin{equation}
I=I_{i+1}+\frac{1}{3}\left[I_{i+1}-I_{i}\right]+{\cal O}(h^4).
\end{equation}
The error term can be found as in the previous section. If $I_i$ now denotes the fourth-order estimates, we get:
\begin{eqnarray}
I_{i+1}-I_{i}=I-c h_{i+1}^4-(I-ch_i^4)=-c h_{i+1}^4+c(2h_{i+1})^4=15c h_{i+1}^4, 
\end{eqnarray}
but now we are in the exact situation as before, we have not only the error term but the correction up to order $h^4$ for this integral:
\begin{equation}
I=I_{i+1}+\frac{1}{15}\left[I_{i+1}-I_{i}\right]+{\cal O}(h^6).
\end{equation}
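The effect of the two correction steps can be seen in a small numerical experiment. This is a sketch under stated assumptions: `trapezoid` is a hypothetical vectorized helper, the step counts 4, 8 and 16 are arbitrary, and the test integral is again $\int_0^\pi\sin x\,dx=2$.

```python
import numpy as np

def trapezoid(func, a, b, N):
    h = (b - a) / N
    x = a + np.arange(N + 1) * h
    y = func(x)
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

a, b = 0.0, np.pi
I4, I8, I16 = (trapezoid(np.sin, a, b, N) for N in (4, 8, 16))

# First correction: remove the h^2 error term (factor 1/3)
S8 = I8 + (I8 - I4) / 3
S16 = I16 + (I16 - I8) / 3

# Second correction: remove the h^4 error term (factor 1/15)
B16 = S16 + (S16 - S8) / 15

err0, err1, err2 = abs(2 - I16), abs(2 - S16), abs(2 - B16)
print("trapezoid, N=16    :", err0)
print("after 1/3 step     :", err1)
print("after 1/15 step    :", err2)
```

Each correction reuses values that are already computed, so the gain in accuracy costs no additional function evaluations.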
Note that there are two iterations going on at the same time: one is the iteration that halves the step size (index $i$), and the other is the increasing number of higher order terms added (which we will denote $m$). We denote the approximation to the integral by $R_{i,m}$, where $R_{i,0}$ is the trapezoidal rule with step size $h_i$ and $R_{i,1}$ is the fourth-order estimate above. The previous equation can then be written:
\begin{equation}
I=R_{i+1,1}+\frac{1}{15}\left[R_{i+1,1}-R_{i,1}\right]+{\cal O}(h^6).
\end{equation}
A general formula valid for any $m$ can be found by realising:
\begin{eqnarray}
I&=&R_{i,m}+c_mh_i^{2m+2}+{\cal O}(h_i^{2m+4})\\
I&=&R_{i-1,m}+c_mh_{i-1}^{2m+2}+{\cal O}(h_{i-1}^{2m+4})=R_{i-1,m}+2^{2m+2}c_mh_{i}^{2m+2}+{\cal O}(h_{i-1}^{2m+4})
\text{, hence:}\\
c_mh_{i}^{2m+2}&=&\frac{1}{4^{m+1}-1}(R_{i,m}-R_{i-1,m}) 
\end{eqnarray}
A final estimate for the integral is then:
\begin{eqnarray}
I&=&R_{i,m+1}+{\cal O}(h_i^{2m+4})\\
R_{i,m+1}&=&R_{i,m}+\frac{1}{4^{m+1}-1}(R_{i,m}-R_{i-1,m}) 
\end{eqnarray}
A possible algorithm is then:
<ol>
    <li> Evaluate $R_{0,0}=\frac{b-a}{2}\left[f(a)+f(b)\right]$ as the first evaluation</li>
    <li> Double the number of steps, $N_{i+1}=2N_i$, or equivalently halve the step size, $h_{i+1}=h_i/2$ </li>
    <li> Calculate the missing values by summing over the odd number of steps $\sum_{k=\text{odd values}}^{N_{i+1}-1}f(a+k h_{i+1})$</li>
    <li> Correct the estimate by adding the higher order error terms, $R_{i,m+1}=R_{i,m}+\frac{1}{4^{m+1}-1}(R_{i,m}-R_{i-1,m})$ </li>
    <li> Check if the error term $E_{i,m}(a,b)=\frac{1}{4^{m+1}-1}(R_{i,m}-R_{i-1,m})$ is lower than a specific tolerance; if yes quit, if not go to 2 and increase $i$ and $m$ by one  </li>
</ol>
Note that the tolerance check is conservative: it uses the error estimate for the current step, which we have already used to update the integral in the current step to reach a higher accuracy. Thus the error on the integral will usually be lower than the user-specified tolerance.
Below is a Python implementation:

In [4]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)
# step size is chosen automatically to reach (at least) the specified tolerance 
def int_romberg(lower_limit, upper_limit,func,tol):
    Nmax = 100
    R = np.empty([Nmax,Nmax])                 # storage buffer
    R[0,0]    =.5*(upper_limit-lower_limit)*(func(lower_limit)+func(upper_limit)) # trapezoidal rule with one step, h=b-a
    N         = 1
    for i in range(1,Nmax):
        N = N*2
        h = (upper_limit-lower_limit)/N
        odd_terms=0
        for k in range (1,N,2): # 1, 3, 5, ... , N-1
            val        = lower_limit + k*h
            odd_terms += func(val)
        R[i,0]   = 0.5*R[i-1,0] + h*odd_terms # add the odd terms to the previous estimate
        for m in range(0,i):                  # m = 0, 1, ..., i-1          
            R[i,m+1]   = R[i,m] + (R[i,m]-R[i-1,m])/(4**(m+1)-1) # add all higher order terms in h
                   
        calc_tol = abs(R[i,i]-R[i,i-1])       # check tolerance, best guess
        if(calc_tol<tol):
            break                             # estimated precision reached 
    if(i == Nmax-1):
        print('Romberg routine did not converge after ', Nmax, 'iterations!')
    else:      
        print('Number of intervalls = ', N)
    
    return R[i,i] # for loop ended and we can return the best estimate
        
prec=1e-8
a=0
b=np.pi
Area = int_romberg(a,b,f,prec)
print('Numerical value = ', Area)
print('Error           = ', (2-Area)) # Analytical result is 2 


Number of intervalls =  32
Numerical value =  2.000000000001321
Error           =  -1.3211653993039363e-12


## Gaussian Quadrature
Many of the methods we have looked into are of the type:
\begin{equation}
\int_a^bf(x)dx\simeq\sum_{k=0}^{N-1}\omega_kf(x_k),
\end{equation}
where the function is evaluated at fixed intervals. For the midpoint rule $\omega_k=h$ for all values of $k$; for the trapezoidal rule $\omega_k=h/2$ for the end points and $h$ for all the interior points. For Simpson's rule (see exercise) $\omega_k=h/3, 4h/3,2h/3,4h/3,\ldots,4h/3,h/3$. Note that all the methods we have looked at so far sample the function at equally spaced points, such as $f(a+k h)$ for $k=0, 1, 2,\ldots, N-1$. If we now allow for the function to be evaluated at unevenly spaced points, we can do a lot better. This realization is the basis for Gaussian quadrature. We will explore this in the following, but to make the development easier and less cumbersome, we transform the integral from the domain $[a,b]$ to $[-1,1]$:
\begin{eqnarray}
\int_a^bf(t)dt&=&\frac{b-a}{2}\int_{-1}^{1}f\left(t(x)\right)dx\text{ , where:}\\
x&=&\frac{2}{b-a}t-\frac{b+a}{b-a}.
\end{eqnarray}
The factor in front comes from the fact that $dt=(b-a)dx/2$, thus we can develop our algorithms on the domain $[-1,1]$, and then do the transformation back using: $t=(b-a)x/2+(b+a)/2$. 

The idea we will explore is as follows: if we can approximate the function to be integrated on the domain $[-1,1]$ (or on $[a,b]$) by a polynomial of as large a degree as possible, then the numerical integral of this polynomial will be very close to the integral of the function we are seeking. This idea is best understood by a couple of examples. Assume that we want to use $N=1$ in the formula at the top:
\begin{equation}
\int_{-1}^{1}f(x)\,dx\simeq\omega_0f(x_0).
\end{equation}
We now choose $f(x)$ to be a polynomial of as large a degree as possible, but with the requirement that the integral is exact. If $f(x)=1$, we get:
\begin{equation}
\int_{-1}^{1}f(x)\,dx=\int_{-1}^{1}1\,dx=2=\omega_0,
\end{equation}
hence $\omega_0=2$. If we choose $f(x)=x$, we get:
\begin{equation}
\int_{-1}^{1}f(x)\,dx=\int_{-1}^{1}x\,dx=0=\omega_0f(x_0)=2x_0,
\end{equation}
hence $x_0=0$. Thus the integration rule for $N=1$ is:
\begin{equation}
\int_{-1}^{1}f(x)\,dx\simeq 2f(0)\text{, or: } \int_{a}^{b}f(t)\,dt\simeq\frac{b-a}{2}\,2f(\frac{b+a}{2})=(b-a)f(\frac{b+a}{2}).
\end{equation}
This last equation is the midpoint rule, by choosing $b=a+h$ we get exactly the formula used in the midpoint rule with stepsize $h$. If we choose $N=2$:
\begin{equation}
\int_{-1}^{1}f(x)\,dx\simeq\omega_0f(x_0)+\omega_1f(x_1), 
\end{equation}
we can show that now $ f(x)=1,\,x,\,x^2,\,x^3$ can be integrated exactly:
\begin{eqnarray}
\int_{-1}^{1}1\,dx&=&2=\omega_0f(x_0)+\omega_1f(x_1)=\omega_0+\omega_1\,,\\
\int_{-1}^{1}x\,dx&=&0=\omega_0f(x_0)+\omega_1f(x_1)=\omega_0x_0+\omega_1x_1\,,\\
\int_{-1}^{1}x^2\,dx&=&\frac{2}{3}=\omega_0f(x_0)+\omega_1f(x_1)=\omega_0x_0^2+\omega_1x_1^2\,,\\
\int_{-1}^{1}x^3\,dx&=&0=\omega_0f(x_0)+\omega_1f(x_1)=\omega_0x_0^3+\omega_1x_1^3\,,
\end{eqnarray}
hence there are four unknowns and four equations. The solution is: $\omega_0=\omega_1=1$ and $x_0=-x_1=-1/\sqrt{3}$, and the corresponding integration rule is:
\begin{eqnarray}
\int_{-1}^{1}f(x)\,dx&\simeq& f(-\frac{1}{\sqrt{3}})+f(\frac{1}{\sqrt{3}})\, \text{, or:}\\
\int_{a}^{b}f(x)\,dx&\simeq& \frac{b-a}{2}\left[f(-\frac{b-a}{2}\frac{1}{\sqrt{3}}+\frac{b+a}{2})
+f(\frac{b-a}{2}\frac{1}{\sqrt{3}}+\frac{b+a}{2})\right].
\end{eqnarray}


In [1]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)
# Gaussian Quadrature for N=2
def int_gaussquad2(lower_limit, upper_limit,func):
    N=2
    x = [-1/np.sqrt(3.),1/np.sqrt(3)]
    w = [1, 1]
    area = 0.
    for i in range(0,N):
        xp = 0.5*(upper_limit-lower_limit)*x[i]+0.5*(upper_limit+lower_limit)
        area += w[i]*func(xp)
    return area*0.5*(upper_limit-lower_limit)
        
        
a=0
b=np.pi
Area = int_gaussquad2(a,b,f)
print('Numerical value = ', Area)
print('Error           = ', (2-Area)) # Analytical result is 2 

Numerical value =  1.9358195746511366
Error           =  0.06418042534886337


### The case N=3
For the case $N=3$, we find that $f(x)=1,x,x^2,x^3,x^4,x^5$ can be integrated exactly:
\begin{eqnarray}
\int_{-1}^{1}1\,dx&=&2=\omega_0+\omega_1+\omega_2\,,\\
\int_{-1}^{1}x\,dx&=&0=\omega_0x_0+\omega_1x_1+\omega_2x_2\,,\\
\int_{-1}^{1}x^2\,dx&=&\frac{2}{3}=\omega_0x_0^2+\omega_1x_1^2+\omega_2x_2^2\,,\\
\int_{-1}^{1}x^3\,dx&=&0=\omega_0x_0^3+\omega_1x_1^3+\omega_2x_2^3\,,\\
\int_{-1}^{1}x^4\,dx&=&\frac{2}{5}=\omega_0x_0^4+\omega_1x_1^4+\omega_2x_2^4\,,\\
\int_{-1}^{1}x^5\,dx&=&0=\omega_0x_0^5+\omega_1x_1^5+\omega_2x_2^5\,,
\end{eqnarray}
the solution to these equations is $\omega_{0,1,2}=5/9, 8/9, 5/9$ and $x_{0,1,2}=-\sqrt{3/5},0,\sqrt{3/5}$. Below is a Python implementation:

In [2]:
import numpy as np
# Function to be integrated
def f(x):
    return np.sin(x)
# Gaussian Quadrature for N=3
def int_gaussquad3(lower_limit, upper_limit,func):
    N=3
    x = [-np.sqrt(3./5.),0.,np.sqrt(3./5.)]
    w = [5./9., 8./9., 5./9.]
    area = 0.
    for i in range(0,N):
        xp = 0.5*(upper_limit-lower_limit)*x[i]+0.5*(upper_limit+lower_limit)
        area += w[i]*func(xp)
    return area*0.5*(upper_limit-lower_limit)
        
        
a=0
b=np.pi
Area = int_gaussquad3(a,b,f)
print('Numerical value = ', Area)
print('Error           = ', (2-Area)) # Analytical result is 2 

Numerical value =  2.0013889136077436
Error           =  -0.0013889136077436248


Note that the Gaussian quadrature converges very fast. Going from $N=2$ to $N=3$ function evaluations reduces the relative error (in this specific case) from about 3.2% to about 0.07%. Our standard trapezoidal formula needs more than 20 function evaluations to achieve this. How can this be? If we use the standard Taylor formula for the function to be integrated, we know that for $N=2$ the Taylor formula is integrated exactly up to $x^3$, so the error term is proportional to $h^4f^{(4)}(\xi)$ (where $\xi$ is some x-value in $[a,b]$). $h$ is the step size, and we can replace it with $h\sim (b-a)/N$, thus the error scales as $c_N/N^4$ (where $c_N$ is a constant). Following the same argument, we find for $N=3$ that the error term is $h^6f^{(6)}(\xi)$, or that the error term scales as $c_N/N^6$. Each time we increase $N$ by one, the error term is reduced by roughly a factor $N^2$. Thus if we evaluate the integral for $N=10$, increasing to $N=11$ will reduce the error by a factor of roughly $11^2=121$. 
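
For larger $N$ the nodes and weights do not have to be derived by hand. The sketch below assumes that NumPy's `numpy.polynomial.legendre.leggauss` is used to obtain the nodes and weights on $[-1,1]$; the wrapper name `gauss_legendre` is hypothetical and the range of $N$ is arbitrary.

```python
import numpy as np

def gauss_legendre(func, a, b, N):
    """N-point Gauss-Legendre rule on [a, b] (nodes and weights from NumPy)."""
    x, w = np.polynomial.legendre.leggauss(N)     # nodes and weights on [-1, 1]
    t = 0.5 * (b - a) * x + 0.5 * (b + a)         # map the nodes to [a, b]
    return 0.5 * (b - a) * np.sum(w * func(t))

for N in range(1, 7):
    area = gauss_legendre(np.sin, 0.0, np.pi, N)
    print(f"N = {N}   value = {area:.12f}   error = {2 - area: .3e}")
```

For $N=2$ and $N=3$ this reproduces the hand-derived rules, and the error continues to drop rapidly as $N$ increases.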

## Simpson's Rule
Simpson's rule is an improvement over the midpoint and trapezoidal rules. It can be derived in different ways; here we will make use of the results from the section on numerical errors. If we assume that the second derivative is reasonably well behaved and fairly constant on the interval $[x_k,x_k+h]$, we can assume that $f^{\prime\prime}(\eta)\simeq f^{\prime\prime}(\overline{\eta})$, hence $E_T=-2E_M$.
\begin{eqnarray}
I(x_k,x_k+h)&=&M(x_k,x_k+h)+E_M\text{ (midpoint rule)}\\
I(x_k,x_k+h)&=&T(x_k,x_k+h)+E_T=T(x_k,x_k+h)-2E_M\text{ (trapezoidal rule)},
\end{eqnarray}
we can now cancel out the error term by multiplying the first equation with 2 and adding the equations:
\begin{eqnarray}
3I(x_k,x_k+h)&=&2M(x_k,x_k+h)+T(x_k,x_k+h)\\
&=&2f(x_k+\frac{h}{2}) h+\left[f(x_k+h)+f(x_k)\right] \frac{h}{2}\\
I(x_k,x_k+h)&=&\frac{h}{6}\left[f(x_k)+4f(x_k+\frac{h}{2})+f(x_k+h)\right].
\end{eqnarray}
Now we can do as we did in the case of the trapezoidal rule, sum over all the elements:
\begin{eqnarray}
I(a,b)&=&\sum_{k=0}^{N-1}I(x_k,x_k+h)=\frac{h}{6}\left[f(a)+ 4f(a+\frac{h}{2})+2f(a+h)+4f(a+3\frac{h}{2})+2f(a+2h)+\cdots+f(b)\right]\\
&=&\frac{h^\prime}{3}\left[f(a)+ f(b) + 4\sum_{k= \text{odd}}^{N^\prime-1}f(a+k h^\prime)+2\sum_{k= \text{even}}^{N^\prime-2}f(a+k h^\prime)\right],
\end{eqnarray}
note that in the last equation we have changed the step size to $h^\prime=h/2$, so that there are $N^\prime=2N$ steps in total.
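
A minimal sketch of the composite rule, assuming the per-slice form $\frac{h}{6}\left[f(x_k)+4f(x_k+\frac{h}{2})+f(x_k+h)\right]$ is summed directly. The function name `simpson` is hypothetical, and for simplicity the interior end points are evaluated twice, which a production implementation would avoid.

```python
import numpy as np

def simpson(func, a, b, N):
    """Composite Simpson rule with N slices of width h (sum of per-slice formulas)."""
    h = (b - a) / N
    x_left = a + np.arange(N) * h                  # left end of every slice
    per_slice = func(x_left) + 4 * func(x_left + 0.5 * h) + func(x_left + h)
    return h / 6 * np.sum(per_slice)

for N in [2, 4, 8, 16]:
    area = simpson(np.sin, 0.0, np.pi, N)
    print(f"N = {N:2d}   value = {area:.10f}   error = {2 - area: .3e}")
```

Because the $h^2$ error terms of the midpoint and trapezoidal rules cancel, the error should fall by roughly a factor of 16 each time $N$ is doubled.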

## Which method to use in a specific case?
There is no general answer to this question, and one needs to decide from case to case. If computational speed is not an issue, and the function to be integrated can be evaluated at any point, all the methods above can be used. If the function to be integrated is a set of observations at different times, which might be unevenly spaced, I would use the midpoint rule: 
\begin{eqnarray}
I(a,b)&=&\int_a^bf(x)dx\simeq\sum_{k=0}^{N-1}M(x_k,x_k+h_k)=\sum_{k=0}^{N-1}h_kf(x_k+\frac{h_k}{2})
\end{eqnarray}
This is because we do not know anything about the function between the points, only at the times when it is observed, and the formula uses only the information at the observation points. There is a second, more subtle reason: in many cases the observations at different times are the *average* value of the observable quantity over an interval, and in those cases the midpoint rule would give the exact answer.  

## Exercises

<ol>
    
  <li>Show that for a linear function, $y=a\cdot x+b$, both the trapezoidal rule and the midpoint (rectangular) rule are exact </li>
    <li> Consider $I(a,b)=\int_a^bf(x)dx$ for $f(x)=x^2$. The analytical result is $I(a,b)=\frac{b^3-a^3}{3}$. Use the trapezoidal and midpoint rules to evaluate this integral and show that the error for the trapezoidal rule is exactly twice as big as for the midpoint rule. 
    </li>
    <li> Use the fact that the error term on the trapezoidal rule is twice as big as the midpoint rule to derive Simpsons formula: $I(a,b)=\sum_{k=0}^{N-1}I(x_k,x_k+h)=\frac{h}{6}\left[f(a)+ 4f(a+\frac{h}{2})+2f(a+h)+4f(a+3\frac{h}{2})+2f(a+2h)+\cdots+f(b)\right]$ (Hint: $I(x_k,x_k+h)=M(x_k,x_k+h)+E_M$(midpoint rule) and 
    $I(x_k,x_k+h)=T(x_k,x_k+h)+E_T=T(x_k,x_k+h)-2E_M$(trapezoidal rule).)
    </li>
  <li> Derive an $N=2$ ($f(x)=1,x,x^2,x^3$) Gaussian quadrature rule for $\int_{a}^{b}x^{1/3}f(x)\,dx$. 
    </li>
    <li> Integrate $\int_0^1x^{1/3}\cos x\,dx$ using the rule derived in the exercise above and compare with the standard Gaussian quadrature rule. 
    </li>
     <li> Make a Python program that uses the Midpoint rule to integrate experimental data that are unevenly spaced and given in the form of two arrays.  </li>
</ol>

Advanced:

* Modify the code to support multiple materials. Remember to use harmonic averaging for the cell face properties.