# Numerical integration with the _Newton-Cotes_ approximations

**Newton-Cotes** integration interpolates a polynomial through a given set of points and integrates that polynomial instead of the original function. It is used either to approximate an integral that is difficult or impossible to solve analytically, or to integrate a data set whose underlying expression is not explicitly known.
To simplify this process and avoid the general issues that arise with single-polynomial interpolation over many points, we can split the interval into sub-intervals and sum the integrals over each of them. Symbolically, with nodes $x_0 = a, x_1, \dots, x_n = b$:
$$I = \int_a^b f(x)dx = \sum_{i=0,\,m,\,2m,\dots}^{n-m} \int_{x_i}^{x_{i+m}} f(x)dx$$

where $m$ is both the number of sub-intervals covered by each piece and the degree of the piecewise polynomials we define.
$[a,b]$ is decomposed into $n$ sub-intervals of width $h$, so $n = \frac{b-a}{h}$.
For simplicity, let's start by approximating the data set with straight lines, i.e. $m = 1$:
$$\int_{x_i}^{x_{i+1}}f(x)dx$$


Let's use the Gregory-Newton (GN) forward-difference formula up to first order:
$$f(x_i+rh) \simeq f_i + r\Delta f_i$$
and now we want to make the distance $r$ our integration variable.
Remember that $x = x_i + rh$, so $dx = h\,dr$. For the limits of the previous integral: $x = x_i \implies r = 0$ and $x = x_{i+1} = x_i + h \implies r = 1$. So,
$$\int_{x_i}^{x_{i+1}}f(x)dx \simeq \int_0^1 (f_i + r\Delta f_i)hdr = hf_i\int_0^1dr + h \Delta f_i \int_0^1 rdr$$

$$\int_0^1 dr = 1$$ 
$$\int_0^1 rdr = \frac 12$$

$$\implies h\left(f_i + \frac 12 \Delta f_i\right) = h\left(f_i + \frac 12 (f_{i+1} - f_i) \right) =\frac h2 (f_i + f_{i+1})$$

and this result is the ***trapezium rule***. It works with any number of points (at least two), since every pair of consecutive points defines one trapezium. The error of this formula is given by the integral of the first neglected term of the GN formula, the one of order $m+1 = 2$:
$$\Delta^2 f_i \int_0^1 \frac{r(r-1)}{2}h\,dr = \frac h2 \Delta^2 f_i \int_0^1 [r^2 - r]\, dr = -\frac h{12} \Delta^2 f_i$$
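To get a feel for this single-step error, here is a minimal sketch that applies the trapezium rule to one interval and compares the actual error with $-\frac h{12}\Delta^2 f_i$. The test function ($e^x$ on $[0, h]$) and the helper names `trapezium_step` and `exact_exp_integral` are assumptions made for illustration only.

```python
import math

def trapezium_step(f, x0, h):
    """Trapezium rule on the single interval [x0, x0 + h]."""
    return h / 2 * (f(x0) + f(x0 + h))

def exact_exp_integral(x0, h):
    """Exact integral of exp over [x0, x0 + h]; expm1 avoids cancellation for small h."""
    return math.exp(x0) * math.expm1(h)

errors = {}
for h in (0.1, 0.05):
    errors[h] = exact_exp_integral(0.0, h) - trapezium_step(math.exp, 0.0, h)
    # Second forward difference at x0 = 0: f(2h) - 2 f(h) + f(0)
    delta2 = math.exp(2 * h) - 2 * math.exp(h) + 1
    print(f"h = {h}: error = {errors[h]:.3e}, -h/12 * Delta^2 f = {-h / 12 * delta2:.3e}")

# Halving h should divide the single-step error by roughly 2^3 = 8
ratio = errors[0.1] / errors[0.05]
print("error ratio when h is halved:", round(ratio, 2))
```

The estimate and the actual error agree only approximately, since the estimate itself comes from truncating the GN formula.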

Remember that $\Delta f(x) = f(x+h) - f(x)$. Expanding $f(x+h)$ in a Taylor series around $x$ gives $f(x+h) = f(x) + hf'(x) + O(h^2)$, so
$$\Delta f(x) \simeq hf'(x)\implies \Delta^2 f(x) \simeq h^2 f''(x)$$
Let's denote by $\xi$ the point of $[x_i, x_{i+1}]$ at which the error term is evaluated:
$$-\frac{h}{12}\cdot h^2 f''(\xi) = -\frac{h^3}{12}f''(\xi), \quad\text{of magnitude } \left|\frac{h^3}{12}f''(\xi)\right|$$
Summing over the $n$ sub-intervals $[x_i, x_{i+1}]$, $i = 0, \dots, n-1$, the completed trapezium rule becomes
$$\int_a^b f(x)dx \approx \frac h2 \sum_{i=0}^{n-1} [f_i + f_{i+1}]\pm\left|\frac{h^3}{12}f''(\xi)\right|\\=\frac h2 \left[(f_0 + f_{n}) + 2\sum_{i=1}^{n-1} f_i\right]\pm\left|\frac{h^3}{12}f''(\xi)\right|$$

The second formula is equivalent to the simpler-looking first one, but it is **less computationally expensive**, as it (almost) halves the number of function values used in the sum: the first formula expands to $f_0+f_1+f_1+f_2+f_2+f_3+\dots+f_n$, while the second one gets rid of the duplicate evaluations.
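The two equivalent forms can be sketched as follows. This is only an illustration, assuming $n$ sub-intervals with nodes $x_0, \dots, x_n$; the function names `trapezium_pairwise` and `trapezium_composite` and the test integrand ($\sin$ on $[0, \pi]$) are made up for the example.

```python
import math

def trapezium_pairwise(f, a, b, n):
    """First form: sum h/2 * (f_i + f_{i+1}) over the n sub-intervals."""
    h = (b - a) / n
    x = [a + i * h for i in range(n + 1)]
    # Every interior node is evaluated twice: 2n function calls in total
    return sum(h / 2 * (f(x[i]) + f(x[i + 1])) for i in range(n))

def trapezium_composite(f, a, b, n):
    """Second form: endpoints once, interior nodes with weight 2."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    # Every node is evaluated once: n + 1 function calls in total
    return h / 2 * (f(a) + f(b) + 2 * interior)

print(trapezium_pairwise(math.sin, 0.0, math.pi, 8))
print(trapezium_composite(math.sin, 0.0, math.pi, 8))
```

Both calls should print the same value up to rounding; only the number of function evaluations differs.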

It should be noted that $\left|\frac{h^3}{12}f''(\xi)\right|$ is actually the error of a single step: we must sum the errors we accumulate approximating the curve over the $n$ sub-intervals. Let's call $\xi_i$ the point of the $i$-th sub-interval $[x_i, x_{i+1}]$ where the error term is evaluated, and look for an *upper bound* by taking the largest possible value of $|f''|$ on each sub-interval:
$$\sum_{i=0}^{n-1} \frac{h^3}{12}\max_{[x_i,x_{i+1}]}{\left|f''(x)\right|}$$

We could instead take the maximum of the second derivative over the whole interval of integration $[a,b]$, which provides a looser but general upper bound and avoids computing $n$ separate maxima:
$$\frac{h^3}{12}\sum_{i=0}^{n-1} \max_{[a,b]}{\left|f''(x)\right|}$$

The summand no longer depends on the index, so the sum is just $n$ times the same value, which greatly simplifies the expression (computationally speaking):

$$\xi_C = \sum_{i=0}^{n-1} \left|\frac{h^3}{12}f''(\xi_i)\right| \leq \frac{h^3}{12}\sum_{i=0}^{n-1} \max_{[a,b]}{\left|f''(x)\right|} = \\ = \frac{h^3}{12}\cdot n \cdot \max_{[a,b]}{\left|f''(x)\right|}$$

but remember that $n = \frac{b-a}{h}\implies nh = b-a$:
$$\xi_C \leq \frac{h^2}{12}\cdot (b-a) \cdot \max_{[a,b]}{\left|f''(x)\right|}$$

We notice that the power of $h$ **has decreased**: since $h$ is generally a small number (smaller than 1), $h^2$ is a worse error term than the $h^3$ of the single-step Newton-Cotes error, which only accounts for the error between two consecutive points. This is expected: each step contributes an error of order $h^3$, and there are $n = \frac{b-a}{h}$ steps. It is also exactly what we wanted: an upper bound on the total error.
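The $h^2$ behaviour of the total error can be checked numerically. The sketch below is an assumption-laden illustration: it uses $\sin$ on $[0, \pi]$ (exact integral 2, $\max|f''| = 1$), which is not an example from these notes, and a hypothetical helper `trapezium_composite`.

```python
import math

def trapezium_composite(f, a, b, n):
    """Composite trapezium rule with n sub-intervals of width h = (b - a)/n."""
    h = (b - a) / n
    return h / 2 * (f(a) + f(b) + 2 * sum(f(a + i * h) for i in range(1, n)))

a, b = 0.0, math.pi
exact = 2.0     # integral of sin over [0, pi]
max_f2 = 1.0    # max |f''| = max |-sin| on [0, pi]

errors, bounds = [], []
for n in (4, 8, 16, 32):
    h = (b - a) / n
    errors.append(abs(exact - trapezium_composite(math.sin, a, b, n)))
    bounds.append(h**2 / 12 * (b - a) * max_f2)  # h^2/12 * (b - a) * max|f''|
    print(f"n = {n:2d}  h = {h:.4f}  error = {errors[-1]:.3e}  bound = {bounds[-1]:.3e}")
```

Each time $h$ is halved the error should shrink by a factor of about 4, and it should stay below the bound.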

Final note: the formula
$$\int_a^b f(x)dx \approx \frac h2 \left[(f_0 + f_{n}) + 2\sum_{i=1}^{n-1} f_i\right]\pm\frac{h^2}{12}\cdot (b-a) \cdot \max_{[a,b]}{\left|f''(x)\right|}$$
is called the ***composite Newton-Cotes*** formula of degree 1, or composite trapezium rule (because we apply NC repeatedly to sub-intervals). It is especially useful for large intervals of integration, where a single application over the whole of $[a,b]$ would force $h$ to be too large to yield an acceptable final error.

## Code example

Let's evaluate $$\int_{0}^{0.8} e^{x^2}dx$$ (with second derivative $2e^{x^2}(1+2x^2)$ for the error) with step sizes $h = 0.4, 0.2, 0.1, 0.05$, using the composite NC formula.

In [1]:
# With h = 0.4 a for loop is overkill: there are only 3 function calls, f(0), f(0.4) and f(0.8).

import numpy as np

f = lambda x: (np.e)**(x**2)             # integrand e^(x^2)
fp = lambda x: 2*f(x) * (1 + 2*(x**2))   # second derivative of f, used in the error bound

h = 0.4
result = (h/2)*(f(0) + 2*f(0.4) + f(0.8))
# fp is increasing on [0, 0.8], so its maximum over the interval is fp(0.8)
error = ((h**2)/12) * 0.8 * fp(0.8)

print(result, "+-", error)

1.0487005242577145 +- 0.09224482996939287


Now to actually iterate:

In [2]:
import numpy as np

f = lambda x: (np.e)**(x**2)
fp = lambda x: 2*f(x) * (1 + 2*(x**2))

h = 0.2  # step size; vary it (0.4, 0.2, 0.1, 0.05) to compare results
n = round(0.8/h)  # number of sub-intervals (b - a)/h; round() guards against floating-point error
result = f(0)/2 + f(0.8)/2  # the two endpoints, with weight 1/2
error = ((h**2)/12) * 0.8 * fp(0.8)  # fp is increasing on [0, 0.8], so its maximum is fp(0.8)
arg = np.linspace(h, 0.8 - h, n - 1)  # the n - 1 interior points h, 2h, ..., 0.8 - h

for i in arg:
    result += f(i)

result = h*result

print(round(result, 9), "+-", round(error, 9))

# cross-check for h = 0.2: the same sum written out by hand
print(h*(f(0)/2 + f(0.8)/2 + f(0.2) + f(0.4) + f(0.6)))

1.0191783 +- 0.023061207
1.019178299879403
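To run all four step sizes in one go, the two cells above could be wrapped into functions. This is just one possible sketch: the names `composite_trapezium` and `error_bound` are hypothetical, and it assumes that $(b-a)/h$ is an integer.

```python
import numpy as np

def composite_trapezium(f, a, b, h):
    """Composite trapezium rule on [a, b] with step h (assumes (b - a)/h is an integer)."""
    n = round((b - a) / h)          # number of sub-intervals
    x = np.linspace(a, b, n + 1)    # nodes x_0 .. x_n
    y = f(x)
    return h * (y[0] / 2 + y[-1] / 2 + y[1:-1].sum())

def error_bound(f2_max, a, b, h):
    """Upper bound h^2/12 * (b - a) * max|f''| on the total error."""
    return h**2 / 12 * (b - a) * f2_max

f = lambda x: np.exp(x**2)
f2 = lambda x: 2 * f(x) * (1 + 2 * x**2)   # second derivative, increasing on [0, 0.8]

results = {}
for h in (0.4, 0.2, 0.1, 0.05):
    results[h] = composite_trapezium(f, 0.0, 0.8, h)
    print(h, round(results[h], 9), "+-", round(error_bound(f2(0.8), 0.0, 0.8, h), 9))
```

The first two rows should reproduce the values printed by the cells above; since $e^{x^2}$ is convex on this interval, the estimates decrease towards the true value as $h$ shrinks.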


# Simpson's rule

We now move from trapezoids to parabolic arcs, meaning we take 3 points instead of 2 per piece and interpolate a parabola through each triple. Clearly the number of points must be odd (3 points give one parabola, 5 points give two parabolic arcs since one point is shared...), specifically $3+2k$ points, which corresponds to an even number of intervals. We're now dealing with the integral $$\int_{x_i}^{x_{i+2}}f(x)dx$$ and, as before, let's use the forward GN formula, this time up to second order: $$ f(x_i+rh) \simeq f_i + r\Delta f_i + \frac{r(r-1)}{2}\Delta^2f_i \qquad \left(\text{error term to work on: }\frac{r(r-1)(r-2)}{3!}\Delta^3f_i\right)$$
then integrating the above expression just like earlier $$\int_{x_i}^{x_{i+2}} f(x)dx \simeq h\int_{0}^{2} \left(f_i + r\Delta f_i + \frac{r(r-1)}{2}\Delta^2f_i\right)dr$$


evaluating each integral $$h\left[f_i\int_0^2 dr + \Delta f_i\int_0^2 rdr + \Delta^2 f_i/2 \int_0^2 (r^2-r) dr\right]=\\=h\left[2f_i + 2\Delta f_i + \frac 13 \Delta^2 f_i\right]$$

Expanding the differences $$2\Delta f_i = 2(f_{i+1}-f_i) \\ \frac 13 \Delta^2 f_i = \frac 13(\Delta f_{i+1} - \Delta f_i) = \frac 13(f_{i+2}-2f_{i+1} + f_i)$$ and substituting back into the formula above, the $2f_i$ terms cancel: $$h\left[2f_i + 2f_{i+1} - 2f_i + \frac 13(f_{i+2}-2f_{i+1} + f_i)\right] = h\left[2f_{i+1} + \frac 13(f_{i+2}-2f_{i+1} + f_i)\right] = \frac h3\left(f_{i+2}+4f_{i+1} + f_i\right)$$ which is the **Simpson formula**. Looking at the formula we can see that the central point $f_{i+1}$ "weighs" more as it influences the parabolic arc's behavior.
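As a quick sanity check of the three-point formula, here is a small sketch that applies it to a few monomials on $[0, 2]$ with $h = 1$. The helper name `simpson_step` is made up for this example.

```python
def simpson_step(f, x0, h):
    """Simpson's formula on [x0, x0 + 2h], using the points x0, x0 + h, x0 + 2h."""
    return h / 3 * (f(x0) + 4 * f(x0 + h) + f(x0 + 2 * h))

# Parabola: the integral of x^2 over [0, 2] is 8/3
print(simpson_step(lambda x: x**2, 0.0, 1.0))
# Cubic: the integral of x^3 over [0, 2] is 4
print(simpson_step(lambda x: x**3, 0.0, 1.0))
# Quartic: the integral of x^4 over [0, 2] is 32/5 = 6.4
print(simpson_step(lambda x: x**4, 0.0, 1.0))
```

The parabola is reproduced exactly by construction. The cubic also comes out exact, while the quartic is the first case where the formula is off.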

Let's work on the error term $$\frac 16\int_0^2 {r(r-1)(r-2)}dr = \frac 16\int_0^2 (r^3 - 3r^2 + 2r)\, dr = \frac 16\left[\frac{r^4}{4} - r^3 + r^2\right]_0^2 = 0$$ After that painful chain of products, the third-order term simply vanishes, so it tells us nothing about the error: we have to move on to the fourth-order term.

Let's pick up from the expanded product above and multiply it by $r-3$ (continuing the pattern $r(r-1)(r-2)\cdots$ of the GN terms, now divided by $4!$):$$\frac1{24}\int_0^2(r^3 - 3r^2 + 2r)(r-3)dr = \frac1{24}\int_0^2 (r^4 - 6r^3 + 11r^2 -6r)\,dr=\frac{1}{24}\left[\frac{r^5}{5}-\frac{3r^4}{2}+\frac{11r^3}{3} - 3r^2\right]_0^2=-\frac{1}{90}$$
so our error will be $$\xi_{NC} = \left|-\frac{1}{90}h\Delta^4f_i\right|\simeq\left|-\frac{h}{90}h^4 f^{(4)}(\xi)\right|= \left|-\frac{h^5}{90}f^{(4)}(\xi)\right|$$
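These coefficient integrals are easy to get wrong by hand, so here is a hedged sketch that recomputes them with exact rational arithmetic. The helpers `poly_mul`, `integrate` and `gn_term` are hypothetical names introduced for this check; polynomials are stored as coefficient lists, lowest degree first.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate(p, lo, hi):
    """Exact integral of the polynomial p over [lo, hi]."""
    def antiderivative(x):
        return sum(c * Fraction(x) ** (k + 1) / (k + 1) for k, c in enumerate(p))
    return antiderivative(hi) - antiderivative(lo)

def gn_term(order):
    """Coefficients of r(r-1)...(r-order+1) / order!, the GN term of the given order."""
    p = [Fraction(1)]
    for k in range(order):
        p = poly_mul(p, [Fraction(-k), Fraction(1)])  # multiply by (r - k)
    return [c / factorial(order) for c in p]

print(integrate(gn_term(2), 0, 1))  # trapezium rule: second-order term over [0, 1]
print(integrate(gn_term(3), 0, 2))  # Simpson: third-order term over [0, 2]
print(integrate(gn_term(4), 0, 2))  # Simpson: fourth-order term over [0, 2]
```

The three printed fractions are the coefficients that multiply $h\Delta^2 f_i$, $h\Delta^3 f_i$ and $h\Delta^4 f_i$ respectively.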

This degree of accuracy for a fairly simple-looking formula is what makes the Simpson formula extremely handy in numerical integration. Let's find the composite version of it, for an even number $n$ of intervals: $$\frac h3 \sum_{i=0(2)}^{n-2}(f_{i+2}+4f_{i+1} + f_i) = \frac h3 \left[f_0 + f_n + 4\sum_{i=1(2)}^{n-1}f_i + 2\sum_{j=2(2)}^{n-2}f_j\right]$$

where the number in brackets indicates the summation step. For the error we sum the single-step error over the $\frac n2$ parabolic arcs: $$\xi_C=\left|-\frac{h^5}{90}\sum_{i=2(2)}^{n}\max_{[a,b]} \left|f^{(4)}(x)\right|\right|=\left|-\frac{1}{90}\frac{(nh)h^4}{2}\max_{[a,b]} \left|f^{(4)}(x)\right|\right|=\left|\frac{b-a}{180}h^4\max_{[a,b]} \left|f^{(4)}(x)\right|\right|$$ where we notice the same decrease in the power of $h$ (from $h^5$ to $h^4$) as for the trapezium rule.
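A minimal sketch of the composite Simpson formula and its error bound follows. As before, the test integrand ($\sin$ on $[0, \pi]$, exact integral 2, $\max|f^{(4)}| = 1$) and the name `composite_simpson` are assumptions made for illustration.

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n sub-intervals (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even, i.e. an odd number of points")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))       # f_1, f_3, ..., f_{n-1}
    even = sum(f(a + j * h) for j in range(2, n - 1, 2))  # f_2, f_4, ..., f_{n-2}
    return h / 3 * (f(a) + f(b) + 4 * odd + 2 * even)

a, b = 0.0, math.pi
exact = 2.0     # integral of sin over [0, pi]
max_f4 = 1.0    # max |f''''| = max |sin| on [0, pi]

errors, bounds = [], []
for n in (4, 8, 16):
    h = (b - a) / n
    errors.append(abs(exact - composite_simpson(math.sin, a, b, n)))
    bounds.append((b - a) / 180 * h**4 * max_f4)
    print(f"n = {n:2d}  error = {errors[-1]:.3e}  bound = {bounds[-1]:.3e}")
```

Halving $h$ should shrink the error by a factor of about 16, in line with the $h^4$ term.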

# Open integration

As opposed to closed integration, here the function's values are known over the whole interval *except* at its endpoints. There's also the *half-open* case, where the value is known at just one of the two bounds.
This type of integration is useful for functions that are undefined or infinite at the bounds. To interpolate with a line through interior points only, we need *at least* 4 points: $$\int_{x_i}^{x_{i+3}} f(x)dx$$ where $f(x_i)$ and $f(x_{i+3})$ aren't known.

With 4 points, for instance, we use the first-order forward GN formula, **offset by 1** (because $f_{i+1}$ is the first known point): $$x = x_{i+1}+rh, \quad x = x_i \implies r = -1, \quad x = x_{i+3} \implies r = 2;\\ \int_{x_i}^{x_{i+3}} f(x)dx \simeq \int_{-1}^2 h(f_{i+1} + r\Delta f_{i+1})dr = h\left(3f_{i+1} + \frac 32 \Delta f_{i+1}\right) = \frac 32 h (f_{i+1} + f_{i+2}) $$

...which we never actually use.
Since lines are the lazy option, we quickly move on to the parabolic version of it $$\int_{x_i}^{x_{i+4}} f(x)dx \simeq \frac 43 h (2f_{i+1} - f_{i+2} + 2f_{i+3}), \qquad \xi_{NC} = \frac{14}{45} h^5 f^{(4)}(\xi_i)$$
and this open version of Simpson's formula is called **Milne's formula**. This one is used more often.
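Milne's formula can be sketched in a few lines. The helper name `milne_step` and the two test integrands are assumptions chosen for illustration; the second one, $1/\sqrt{x}$ on $[0, 4]$, is undefined at the left bound, which is the situation open formulas are meant for.

```python
import math

def milne_step(f, x0, h):
    """Milne's open formula on [x0, x0 + 4h]; f is never evaluated at x0 or x0 + 4h."""
    return 4 * h / 3 * (2 * f(x0 + h) - f(x0 + 2 * h) + 2 * f(x0 + 3 * h))

# Cubic: the integral of x^3 over [0, 4] is 64
cubic = milne_step(lambda x: x**3, 0.0, 1.0)
print(cubic)

# 1/sqrt(x) over [0, 4] has exact integral 4 but is undefined at x = 0.
# f(0) is never requested, so the formula still returns a finite (if rough) estimate.
singular = milne_step(lambda x: 1 / math.sqrt(x), 0.0, 1.0)
print(singular)
```

A single application with such a large $h$ is only a rough estimate near the singularity; the point here is merely that the endpoints are never evaluated.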