**Blocking data and Monte Carlo methods**

In the first lesson, we were introduced to data blocking and to some first examples of the Monte Carlo methods that rely on it.
Our first exercise consists in computing the following simple integral, whose value is known:

$\left< r \right> = \int^1_0 r \, dr = \frac{1}{2}$

We will shortly see why this integral can be written as an average over the uniform interval $[0;1]$; since the result is a measure, it carries a specific error. The link between the integral and the average follows from the law of large numbers (the central limit theorem then controls the error of the estimate):

$$ \int_a^b f(x) \, dx = (b-a) \lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N f(x_i) = (b-a) \left<f\right>_{[a;b]} $$

Here the $x_i$ are drawn uniformly from $[a;b]$. Since our interval is $[0;1]$, the previous integral can be rewritten as

$$ \int_0^1 r \, dr = \left< r \right>_{[0;1]} $$

So all we need is to generate a large number ("close to" $\infty$) of random variables uniformly in $[0;1]$, say $10^5$ measures. Since an infinite sample is far beyond our computational power, there will always be an error associated with our measure, a statistical uncertainty intrinsic to the calculation, which must be reported. If the expected value falls within this uncertainty range, the result is acceptable, because an unavoidable uncertainty is attached to every measure.
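As a minimal sketch of this estimate (not the course code: the function name `mc_mean` and the seed are illustrative assumptions), the following Python snippet draws $10^5$ uniform numbers and reports their average together with the standard error of the mean:

```python
import math
import random


def mc_mean(n_samples, seed=0):
    """Estimate <r> on [0, 1) and the standard error of the mean."""
    rng = random.Random(seed)  # seeded only to make the run reproducible
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        r = rng.random()  # uniform in [0, 1)
        total += r
        total_sq += r * r
    mean = total / n_samples
    variance = total_sq / n_samples - mean * mean
    error = math.sqrt(variance / (n_samples - 1))
    return mean, error


mean, error = mc_mean(10**5)
print(f"<r> = {mean:.5f} +/- {error:.5f} (expected 0.5)")
```

The printed uncertainty is what should be compared with the distance between the estimate and the expected value $1/2$.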

Nevertheless, there are several methods to lower the uncertainty: one of these is data blocking. We split our set of $M$ measures into $N$ subsets (blocks), then estimate an average and a variance for each of them. At the end we extract a global average and a standard deviation: while the average of the block averages is nothing but the total average, it can be shown that the global standard deviation $\sigma$ is less than or equal to the standard deviation evaluated without blocking. It can be computed as follows:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left<r_i\right>^2 - \left(\frac{1}{N}\sum_{i=1}^{N}\left<r_i\right>\right)^2 }$$

Here $N$ is the number of blocks, and $\left<r_i\right>$ refers not to a single measure but to the average over the $L=M/N$ samples of block $i$ ($L$ is the block length); $\left<r_i\right>^2$ is its square. We report the graphs of these block averages and uncertainties for different numbers of blocks $N$: the lower $N$, the higher the uncertainties (for $N=1$ there is no blocking). Furthermore, the higher $N$, the more data points we have to plot. Since the exact result of the integral is known (call it $r_k=0.5$), and so is the variance of the uniform distribution ($\sigma^2_k=1/12$), we show how $\left<r\right>-r_k$ and $\left<\sigma^2\right>-\sigma^2_k$ approach zero. Graphs and data are generated by the script.py program.
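The script itself is not reproduced here. As an illustrative sketch only, assuming that $\sigma$ is the standard deviation of the block averages, the blocking procedure could look like this in Python (the names `block_averages` and `blocking_estimate` are hypothetical, not taken from script.py):

```python
import math
import random


def block_averages(samples, n_blocks):
    """Split samples into n_blocks equal blocks and return each block's average."""
    block_len = len(samples) // n_blocks  # L = M / N
    return [
        sum(samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]


def blocking_estimate(samples, n_blocks):
    """Return (global average, standard deviation of the block averages)."""
    averages = block_averages(samples, n_blocks)
    mean = sum(averages) / n_blocks
    mean_sq = sum(a * a for a in averages) / n_blocks
    # max(...) guards against tiny negative values caused by rounding
    sigma = math.sqrt(max(mean_sq - mean * mean, 0.0))
    return mean, sigma


rng = random.Random(1)  # seeded only for reproducibility
samples = [rng.random() for _ in range(10**5)]  # M = 10^5 uniform measures
for n_blocks in (10, 100, 1000):
    mean, sigma = blocking_estimate(samples, n_blocks)
    print(f"N = {n_blocks:4d}: <r> = {mean:.5f}, sigma = {sigma:.5f}")
```

Because $M$ is a multiple of every $N$ used here, the global average is the same for all three choices, and only the spread of the block averages changes.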

Here we go with N=10:

<img src="graphs/av10blocks.png" />

<img src="graphs/var10blocks.png" />

Let's see what happens with N=100:

<img src="graphs/av100blocks.png" />

<img src="graphs/var100blocks.png" />

Finally, N=1000:

<img src="graphs/av1000blocks.png" />

<img src="graphs/var1000blocks.png" />

We can also run a $\chi^2$ test on each of our trials:

|  N   |     $\chi^2$    |  
|------|------|  
|  10  |   0.000106740084413327    |  
| 100  |   0.0013757832587935742   |
| 1000 |   0.01555873761507975     |


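In other words, the lower $N$ is, the lower $\chi^2$ is, a smaller $\chi^2$ meaning a smaller deviation between expected and observed data: the value grows by roughly a factor of ten each time $N$ is multiplied by ten.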

**Sampling Distributions**

In the second exercise we want to sample a simple uniform dice distribution, an "exponential" dice and a Lorentzian one. For the first case we have to generate a random number uniformly in the interval $[0;6]$, which is quite simple with a good generator; the latter two cases require more work.

To sample a non-uniform distribution we can invert its cumulative distribution: given a distribution $p(x)$, its cumulative $F(x)$ is defined as

$$ F(x) = \int_{-\infty}^x p(y) \, dy $$

The inversion theorem guarantees that

$$ x = F^{-1}(y) $$

where $y$ is a random variable extracted uniformly from the interval $[0;1]$, while $x$ is the resulting random variable, distributed according to $p$. All we need to do is evaluate $F^{-1}(y)$ for $y \in [0;1]$. For the exponential distribution (defined for $x \geq 0$), we have

$$ p(x) = \lambda e^{-\lambda x} \Rightarrow F(x) = 1 - e^{-\lambda x} \Rightarrow F^{-1}(y) = - \frac{1}{\lambda}\log(1-y) $$

while for the Lorentzian distribution we have

$$ p(x) = \frac{1}{\pi} \frac{\Gamma}{(x-\mu)^2 + \Gamma^2} \Rightarrow F(x) = \frac{1}{\pi}\arctan\left(\frac{x-\mu}{\Gamma}\right) + \frac{1}{2} \Rightarrow F^{-1}(y) = \mu + \Gamma \tan\left[\pi\left(y - \frac{1}{2}\right)\right]
$$
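The inversion method can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm in random.cpp: the function names are hypothetical, and the Lorentzian sampler uses the inverse cumulative $\mu + \Gamma\tan[\pi(y - 1/2)]$ of the Lorentzian density.

```python
import math
import random


def sample_exponential(rng, lam=1.0):
    """Draw x >= 0 with density lam * exp(-lam * x) by inverting the cumulative."""
    y = rng.random()  # uniform in [0, 1), so 1 - y is never zero
    return -math.log(1.0 - y) / lam


def sample_lorentzian(rng, mu=0.0, gamma=1.0):
    """Draw x from a Lorentzian (Cauchy) density with centre mu and width gamma."""
    y = rng.random()
    return mu + gamma * math.tan(math.pi * (y - 0.5))


rng = random.Random(42)  # seeded only for reproducibility
exp_data = [sample_exponential(rng) for _ in range(10**4)]
lor_data = sorted(sample_lorentzian(rng) for _ in range(10**4))
print("exponential sample mean:", sum(exp_data) / len(exp_data))
print("Lorentzian sample median:", lor_data[len(lor_data) // 2])
```

For the Lorentzian the median is printed rather than the mean, because the mean of this distribution is not defined.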

For our purposes, we set $\lambda = \Gamma = 1$ and $\mu=0$. We generate $10^4$ data points for each of these distributions and plot them as histograms: data are generated by the main.cpp program through an algorithm implemented in random.cpp, exported to linear_1.csv, exp_1.csv and lorentz_1.csv, and plotted by the histo.py script.

<img src="graphs/dice1.png" />

<img src="graphs/exp1.png" />

<img src="graphs/lorentz1.png" />

Here we can see how the data follow a uniform distribution (dice), an exponential and a Lorentzian one. What if we average over the data many times? We generate $n$ sets of our $10^4$ data, compute an average over them and fill a new histogram. Let's try with $n=2$: this time, data are exported to linear_2.csv, exp_2.csv and lorentz_2.csv. Remember to switch the n parameter from 1 to 2 at line 5 of histo.py, otherwise you will plot the previous histogram again. What we obtain is

<img src="graphs/dice2.png" />

<img src="graphs/exp2.png" />

<img src="graphs/lorentz2.png" />
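The averaging step can be sketched as follows. This is only an illustration in Python of what main.cpp is described as doing; the helper names are hypothetical and $\lambda = \Gamma = 1$, $\mu = 0$ are assumed:

```python
import math
import random


def draw_dice(rng):
    """Uniform value in [0, 6)."""
    return 6.0 * rng.random()


def draw_exponential(rng):
    """Exponential value with lambda = 1, by cumulative inversion."""
    return -math.log(1.0 - rng.random())


def draw_lorentzian(rng):
    """Lorentzian value with mu = 0 and Gamma = 1, by cumulative inversion."""
    return math.tan(math.pi * (rng.random() - 0.5))


def averaged_samples(draw, n, n_points, rng):
    """Return n_points values, each the average of n independent draws."""
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(n_points)]


rng = random.Random(2)  # seeded only for reproducibility
for name, draw in (("dice", draw_dice),
                   ("exp", draw_exponential),
                   ("lorentz", draw_lorentzian)):
    data = sorted(averaged_samples(draw, n=2, n_points=10**4, rng=rng))
    print(f"{name}: median of the n=2 averages = {data[len(data) // 2]:.3f}")
```

Each returned list holds the values that would fill one histogram; changing `n` reproduces the other cases.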

Now we observe that the uniform distribution is approaching a Gaussian and the exponential differs slightly from the previous one, while the Lorentzian seems to be just the same. Let's proceed with $n=10$ and $n=100$:

<img src="graphs/dice10.png" />

<img src="graphs/exp10.png" />

<img src="graphs/lorentz10.png" />

<img src="graphs/dice100.png" />

<img src="graphs/exp100.png" />

<img src="graphs/lorentz100.png" />

As we see, all of our distributions approach the Gaussian, except for the Lorentzian. This is because the Gaussian and Lorentzian distributions are basins of attraction for other distributions: the averages $S_n$ of values sampled from an exponential or uniform distribution, which have finite variance, converge to a Gaussian, while averages of Lorentzian values remain Lorentzian (just as sums of Gaussian values remain Gaussian).
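A quick numerical check of this behaviour, offered as a hedged sketch rather than part of the original programs, is to follow the interquartile range of the averages as $n$ grows. The interquartile range is used here (an assumption of this example) because the variance of the Lorentzian is not defined; the helper names are hypothetical.

```python
import math
import random


def draw_uniform(rng):
    """Uniform value in [0, 6)."""
    return 6.0 * rng.random()


def draw_lorentzian(rng):
    """Lorentzian value with mu = 0 and Gamma = 1."""
    return math.tan(math.pi * (rng.random() - 0.5))


def interquartile_range(values):
    """Distance between the upper and lower sample quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[(3 * n) // 4] - ordered[n // 4]


def iqr_of_means(draw, n, n_points, rng):
    """Interquartile range of n_points averages of n draws each."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(n_points)]
    return interquartile_range(means)


rng = random.Random(3)  # seeded only for reproducibility
for n in (1, 2, 10, 100):
    u = iqr_of_means(draw_uniform, n, 2000, rng)
    c = iqr_of_means(draw_lorentzian, n, 2000, rng)
    print(f"n = {n:3d}: uniform IQR = {u:.3f}, Lorentzian IQR = {c:.3f}")
```

The uniform averages should narrow roughly as $1/\sqrt{n}$, as expected for a Gaussian limit, while the Lorentzian averages should keep an interquartile range close to $2\Gamma = 2$ for every $n$.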