<a href="https://colab.research.google.com/github/cu-applied-math/SciML-Class/blob/main/Labs/lab10_curse_of_dimensionality.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import numpy as np
from scipy.integrate import quad, nquad, simpson
from matplotlib import pyplot as plt

# Lab 10: curse of dimensionality and integration

The "curse of dimensionality" is a term thrown around a lot these days, and it means different things to different people and in different contexts.  One common meaning is that the work required to reach a fixed accuracy grows exponentially in the dimension.

We'll explore this phenomenon in the context of integration, and compare two methods (quadrature vs Monte Carlo) which have very different pros and cons.

- Bonus: if you're interested in a method that mixes the pros and cons of each approach, look into **quasi-Monte Carlo** methods. I have a [demo on quasi-Monte Carlo](https://github.com/stephenbeckr/randomized-algorithm-class/blob/master/Demos/demo14_MonteCarlo_and_improvements.ipynb) that I made for a different class.

## Part 1: quadrature in 1D

"Quadrature" refers to numerical integration where we sample points from the domain and use specific weights associated with those points to form our estimate of the integral (in contrast, Monte Carlo and Quasi-Monte Carlo are sometimes defined to be estimates in which the weights never change from point to point, though that definition is too vague to be illuminating at this point).

For example, the midpoint rule, [trapezoidal rule](https://en.wikipedia.org/wiki/Trapezoidal_rule), and [Simpson's rule](https://en.wikipedia.org/wiki/Simpson%27s_rule) (and their composite versions) are quadrature methods, as are things like [Gauss-Legendre](https://en.wikipedia.org/wiki/Gaussian_quadrature) and [Clenshaw-Curtis](https://en.wikipedia.org/wiki/Clenshaw%E2%80%93Curtis_quadrature).

We won't investigate these in detail, as we'll use `scipy.integrate.quad` to do the work for us (and it adaptively chooses the number of nodes in order to reach a target accuracy), but it helps to think of these as just Riemann sums.
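To make the "weighted Riemann sum" picture concrete, here is a minimal sketch of two composite rules written explicitly as nodes and weights. The function names `composite_midpoint` and `composite_trapezoid` are illustrative (not part of the lab or of SciPy), and the integrand $e^x$ on $[0,1]$ is chosen only because its exact integral, $e-1$, is easy to compare against.

```python
import numpy as np

def composite_midpoint(g, a, b, n):
    """Composite midpoint rule with n equal subintervals."""
    h = (b - a) / n
    nodes = a + h * (np.arange(n) + 0.5)   # midpoint of each subinterval
    weights = np.full(n, h)                # every node gets weight h
    return np.sum(weights * g(nodes))

def composite_trapezoid(g, a, b, n):
    """Composite trapezoidal rule with n equal subintervals (n + 1 nodes)."""
    h = (b - a) / n
    nodes = np.linspace(a, b, n + 1)
    weights = np.full(n + 1, h)
    weights[[0, -1]] = h / 2               # the two endpoints get half weight
    return np.sum(weights * g(nodes))

exact = np.e - 1.0                         # integral of exp(x) over [0, 1]
mid = composite_midpoint(np.exp, 0.0, 1.0, 50)
trap = composite_trapezoid(np.exp, 0.0, 1.0, 50)
print(abs(mid - exact), abs(trap - exact))
```

Both rules are the same kind of object, a sum of `weights * g(nodes)`; only the choice of nodes and weights differs.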

Throughout this lab, we'll work with the function $f(x)=\sin(x)$ in 1D, and in higher dimensions, for $\vec{x}\in\mathbb{R}^d$, with
$$f(\vec{x}) = \prod_{i=1}^d \sin(x_i)$$
and our goal is to estimate its integral
$$\widetilde{I} = \int_{a}^{b} f(x)dx$$
or more generally if $d\ge 1$
$$\widetilde{I} = \int_{a}^{b}\int_{a}^{b} \ldots \int_{a}^{b} f(\vec{x})dx_1 dx_2 \ldots dx_d.$$
We chose $f$ to be a simple function so that you can work out the answer by hand.

When $d>1$, let's actually look at
$$I = \widetilde{I}^{1/d}$$
since the value of this will not depend on the dimension.
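
One way to see why $I$ does not depend on the dimension: the integrand is a product of one-variable factors and the domain is a cube, so the $d$-dimensional integral factors into $d$ copies of the 1D integral. (This sketch assumes the 1D integral is nonnegative, so that the $d$-th root is unambiguous.)
$$\widetilde{I} = \int_{a}^{b} \ldots \int_{a}^{b} \prod_{i=1}^d \sin(x_i)\, dx_1 \ldots dx_d = \prod_{i=1}^d \int_{a}^{b} \sin(x_i)\, dx_i = \left( \int_{a}^{b} \sin(x)\, dx \right)^d, \qquad I = \widetilde{I}^{1/d} = \int_{a}^{b} \sin(x)\, dx.$$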

### Part 1a: determine the value of $I$ by hand for $a=0$ and $b=\pi$
... so that you can check your answer.

Do this for any **arbitrary** dimension $d\in\mathbb{N}$.

And in fact if $b=k \pi$ for any integer $k$, you can easily work out the true answer also.

In [3]:
trueIntegral = ... TODO ...

# You can either hardcode the value for [0,pi]
# or make a function that takes in any [a,b]. Either way is fine

### Part 1b: accuracy vs number of points
For $a=0$ and $b=\pi$, compute the quadrature with a varying number of points, and **plot** the error vs the number of points.

Do this for 1D only (for now).

Do this using Simpson's rule (`scipy.integrate.simpson`), giving it a fixed grid of nodes (e.g., `x=np.linspace(...)`) so that you can control the accuracy. (If you use `scipy.integrate.quad`, it's a bit harder to force it to give you low accuracy.)

- **Bonus** question: if we choose $b=2\pi$ (and adjust our true answer accordingly), how does the error decay with the number of points? Did you expect this?  (good undergrad numerics books should discuss this; the phenomenon is **spectral accuracy**; see, e.g., [this paper](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=29d832b1235e61ad5445bceb90ecf8a2bf857044) which has links to some textbooks in the references).
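
As a rough illustration of an "error vs number of points" study, here is a sketch on a different integrand ($e^x$ on $[0,1]$, exact value $e-1$), so the lab question itself is left open. The names `g`, `node_counts`, and `slope` are illustrative. The node counts are odd so that Simpson's rule sees an even number of intervals, and the log-log slope gives an empirical convergence order.

```python
import numpy as np
from scipy.integrate import simpson

def g(x):
    return np.exp(x)

exact = np.e - 1.0                                 # integral of exp(x) over [0, 1]

node_counts = np.array([5, 9, 17, 33, 65, 129])    # odd counts: even number of intervals
errors = []
for n in node_counts:
    x = np.linspace(0.0, 1.0, n)
    errors.append(abs(simpson(g(x), x=x) - exact))
errors = np.array(errors)

# Fit log(error) = slope * log(number of intervals) + const
slope = np.polyfit(np.log(node_counts - 1), np.log(errors), 1)[0]
print(errors)
print("empirical order:", -slope)
```

To visualize the same data, something like `plt.loglog(node_counts, errors, 'o-')` would show the decay as a straight line whose slope matches the fitted value.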

In [None]:
def f(x):
    return np.sin(x)
a = 0
b = np.pi

nNodes = 100 # for example...


x = np.linspace(a, b, nNodes)
integral = simpson(f(x),x=x) # this is how to use `simpson`
print(integral)

# Now do this for a range of nNodes values, save the output, and plot it
# ...

### Part 1c: quadrature error as a function of dimension
Now we can use `scipy.integrate.quad` (for 1D) or `scipy.integrate.nquad` (for arbitrary dimensions).

#### (i) Run `scipy.integrate.quad` in 1D (still for $a=0, b=\pi$), or equivalently `scipy.integrate.nquad` with a single range as in the cell below, and check your answer to confirm that you're using it correctly.




In [None]:
dim  = 1
a, b = 0, np.pi

ranges = [[a,b]]
def f(x):
    return np.sin(x)

stuff = nquad( ... )
# If you want to get more information back, use nquad( ... , full_output=True )

stuff # stuff[0] is the value, stuff[1] is the error estimate

#### (ii) Then run `scipy.integrate.nquad` in 2D (now for the square $[0,\pi]^2$) and check your answer to confirm that you're using it correctly.


In [39]:
# ... TODO ...

#### (iii) Then make a plot of the number of function evaluations as a function of dimension, for dimensions $d=1,2,3,\ldots$ (as large as you can go)

In [40]:
# ... TODO ...
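
For counting function evaluations, here is a minimal sketch on a different integrand ($g(x,y)=xy$ on $[0,1]^2$, exact value $1/4$). The names `g` and `counter` are illustrative, not part of the lab. `nquad` calls the integrand with one scalar argument per dimension, and with `full_output=True` it returns a third item, a dictionary whose `neval` entry holds the evaluation count. The manual counter is included only as a cross-check.

```python
import numpy as np
from scipy.integrate import nquad

counter = {"calls": 0}          # manual tally of integrand evaluations

def g(x, y):
    counter["calls"] += 1
    return x * y

ranges = [[0.0, 1.0], [0.0, 1.0]]
value, abserr, info = nquad(g, ranges, full_output=True)

print("value:", value, "error estimate:", abserr)
print("neval reported by nquad:", info["neval"])
print("calls counted by hand:  ", counter["calls"])
```

Repeating this for increasing dimension (one `[a, b]` entry in `ranges` per dimension) is one way to collect the data for the plot.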

## Part 2: Monte Carlo

We think of $\int_{a}^b f(x)dx$ as an expected value,
$$E[\tilde{f}] := \int_{a}^b \tilde{f}(x) p(x) dx$$
for a probability distribution $p$ (i.e., $\int_a^b p(x)dx = 1$ and $p(x)\ge 0$), with $\tilde{f}(x) := f(x)/p(x)$.

The simplest choice is the uniform distribution for $p$, so
$$p(x) = \frac{1}{b-a}.$$

Then to estimate the value of the integral, we draw random samples $x$ from this uniform distribution $p$, and then evaluate $\tilde{f}(x)$. We do this many times, to get a **sample mean**, and use that to approximate the true mean.

That's it!
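
A minimal sketch of this recipe, assuming the uniform $p$ above so that $\tilde{f}(x) = (b-a)f(x)$. The helper name `mc_integrate` and the test integrand $x^2$ on $[0,2]$ (exact value $8/3$) are illustrative choices, not part of the lab.

```python
import numpy as np

def mc_integrate(g, a, b, n_samples, rng):
    """Monte Carlo estimate of the integral of g over [a, b] using uniform samples."""
    x = rng.uniform(a, b, size=n_samples)
    values = (b - a) * g(x)                 # g(x) / p(x) with p(x) = 1 / (b - a)
    estimate = values.mean()                # sample mean approximates the expected value
    std_error = values.std(ddof=1) / np.sqrt(n_samples)
    return estimate, std_error

rng = np.random.default_rng(0)
estimate, std_error = mc_integrate(lambda x: x**2, 0.0, 2.0, 100_000, rng)
print(estimate, "+/-", std_error, "(exact: 8/3)")
```

The standard error shrinks like $1/\sqrt{n}$ in the number of samples $n$, which is the usual baseline to compare a Monte Carlo error plot against.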

### Part 2a: Monte Carlo in 1D
Plot the error as a function of the number of sampled points.

In [42]:
dim = 1
nPoints = int(1e6)
b = np.pi
a = 0

def f(X):
    """ sin(x) where x may be a matrix;
    in that case, it returns one output per row.
    For a given row, this does the product of sin(x_i) for each column i
    """
    return np.sin(X).prod(axis=1)

# Efficient way to get f(x) for a range of x values:
X = np.random.uniform(a,b, (nPoints, dim))
fX = f(X)

# Now, do something with all these values...
# ... TODO ...

### Part 2b: Monte Carlo in high dimensions

For a range of dimensions, look at the error of Monte Carlo using 1e4 points. The answer will be random, so let's do this 100 times and average.
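
The "repeat and average" loop might look like the sketch below, shown on a different integrand ($\prod_i x_i$ on $[0,1]^d$, exact value $2^{-d}$) so that the lab's own experiment is left to you. The helper `mean_abs_error` is a hypothetical name. It measures the error in the raw $d$-dimensional integral, not in the $d$-th root used elsewhere in this lab, and it assumes uniform sampling on the cube, so the sample mean is scaled by the volume $(b-a)^d$.

```python
import numpy as np

def g(X):
    """Product of the coordinates, one output per row of X."""
    return X.prod(axis=1)

def mean_abs_error(g, a, b, dim, exact, n_points, n_trials, rng):
    """Average absolute Monte Carlo error over n_trials independent runs."""
    volume = (b - a) ** dim
    errors = np.empty(n_trials)
    for t in range(n_trials):
        X = rng.uniform(a, b, size=(n_points, dim))
        errors[t] = abs(volume * g(X).mean() - exact)
    return errors.mean()

rng = np.random.default_rng(0)
for dim in (1, 2, 5, 10):
    err = mean_abs_error(g, 0.0, 1.0, dim, 0.5**dim, 10_000, 100, rng)
    print(dim, err)
```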

In [43]:
def f(X):
    """ sin(x) where x may be a matrix;
    in that case, it returns one output per row.
    For a given row, this does the product of sin(x_i) for each column i
    """
    return np.sin(X).prod(axis=1)

nPoints = int(1e4)
b = np.pi
a = 0

# ... TODO ...

## Part 3: writeup
To get credit for the lab,
- write a sentence or two about your observations, and...
- from part 1b, using about 100 function evaluations, what was your accuracy?
- from part 1c, in dimension 4, how many function evaluations did you use?
- from part 2, in dimension 1, with 1000 function evaluations, what was the accuracy?