In [None]:
%matplotlib inline
# Importing necessary packages:
import re
from glob import glob

import numpy as np
import datetime
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import MDAnalysis as mda
from MDAnalysis.analysis import diffusionmap, align, rms
from MDAnalysis import transformations as mdatransform

from PipeLine import *

In [None]:
fnames = glob("../sumrule_test/*.bug.*")
fnames = PipeLine.file_reader(fnames)

In [None]:
%%time
geom = 'cylinder'
PipeLine.analyze_trj(fnames[4], geom)
PipeLine.trj_rmsd(fnames[4], geom)

# Statistical Confidence in MD measurement
Because a molecular dynamics simulation runs for a finite time, successive measurements of a quantity are always correlated to some degree. To estimate this correlation and incorporate it into the error analysis, the *statistical inefficiency (SI)* is defined through the relation between the variance of the mean and the autocorrelation function of the quantity of interest; it can then be measured with several computational techniques ([Tildesley](https://doi.org/10.1093/oso/9780198803195.001.0001), [Rapaport](https://doi.org/10.1017/CBO9780511816581)).


Taking $\{R_i\}_{i=1}^{N}$ as the data set of the chain size $R$, the mean and the variance of the mean are, respectively,
$$\langle R\rangle=\frac{1}{N}\sum_{i=1}^{N}R_i$$
$$\sigma^2(\langle R\rangle)=\frac{s}{N}\sigma^2(R)$$
where $\sigma^2(R)$ is the bias-corrected variance of the data set
$$\sigma^2(R)=\frac{1}{N-1}\sum_{i=1}^{N}(R_i-\langle R\rangle)^2$$
and $s$ is the SI, estimated by means of the blocking method ([Flyvbjerg](https://doi.org/10.1063/1.457480)). In this method, the original data set is successively divided into blocks of length $L_{block}=1,2,4,8,\dots,2^{M}$ with $2^{M}<N$, which gives $N_{block}=N/L_{block}$ blocks at each step. Each point of the new data set is the mean of the $L_{block}$ original data points in one block. At each blocking transform, the SI is estimated as
$$s_{block} = \frac{L_{block}\sigma^2_{block}}{\sigma^2(R)}$$
where $L_{block}$ is the block size after the $M$-th transform and $\sigma^2_{block}$ is the variance of the block means. For the untransformed data ($L_{block}=1$), $s_{block}$ takes the trivial value of $1$. The blocking method also gives the error in $\sigma^2_{block}$,
$$\Delta\sigma^2_{block}=\sqrt{\frac{2}{N_{block}-1}}\sigma^2_{block}$$
where $N_{block}$ is the number of blocks. The SI of the original data, $s$, is the value found by fitting a line to the plateau of the $s_{block}$ versus $\frac{1}{L_{block}}$ plot.
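
The blocking procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in `PipeLine`. The function name `block_statistical_inefficiency` and the synthetic input are hypothetical. The sketch also assumes that the uncertainty of each block variance is set by the number of block means, so the relative error of $s_{block}$ is taken as $\sqrt{2/(N_{block}-1)}$, and that the error in $\sigma^2(R)$ can be neglected.

```python
import numpy as np


def block_statistical_inefficiency(data):
    """Blocking analysis of a 1D time series (hypothetical helper).

    Returns an array with one row per block length and the columns
    (block length, number of blocks, s_block, estimated error of s_block).
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    var_total = data.var(ddof=1)  # bias-corrected variance sigma^2(R)
    rows = []
    block_len = 1
    while n // block_len >= 2:  # at least two blocks are needed for a variance
        n_blocks = n // block_len
        # Drop the incomplete tail so the data split evenly, then average each block.
        trimmed = data[: n_blocks * block_len]
        block_means = trimmed.reshape(n_blocks, block_len).mean(axis=1)
        var_block = block_means.var(ddof=1)
        s_block = block_len * var_block / var_total
        # Assumption: relative error of a variance estimated from n_blocks samples.
        s_err = s_block * np.sqrt(2.0 / (n_blocks - 1))
        rows.append((block_len, n_blocks, s_block, s_err))
        block_len *= 2  # next blocking transform doubles the block length
    return np.array(rows)


# Usage on synthetic, uncorrelated data (a stand-in for a chain-size time series).
rng = np.random.default_rng(0)
table = block_statistical_inefficiency(rng.normal(size=4096))
for block_len, n_blocks, s_block, s_err in table:
    print(f"L={int(block_len):5d}  N_block={int(n_blocks):5d}  s={s_block:.3f} +/- {s_err:.3f}")
```

For uncorrelated data, $s_{block}$ should stay close to $1$ within the growing error bars; for correlated data it rises with $L_{block}$ until it reaches the plateau used to estimate $s$.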
 
According to [Tildesley](https://doi.org/10.1093/oso/9780198803195.001.0001), 
$$ s = \frac{2\tau_{cor}}{dt}$$
where $\tau_{cor}$ is the correlation time of the data set and $dt$ is the time interval between stored configurations.
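
Once $s$ is known, the correlation time and the standard error of the mean follow directly from the relations above. The snippet below is a small sketch of that arithmetic; the function names and the numerical values of `s` and `dt` are illustrative assumptions, not results from this analysis.

```python
import numpy as np


def correlation_time(s, dt):
    """Correlation time from s = 2 * tau_cor / dt (hypothetical helper)."""
    return 0.5 * s * dt


def standard_error_of_mean(data, s):
    """Square root of sigma^2(<R>) = s / N * sigma^2(R) (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    return np.sqrt(s * data.var(ddof=1) / data.size)


# Illustrative values only: s and dt are assumed, not measured.
s = 10.0
dt = 0.002
tau_cor = correlation_time(s, dt)

rng = np.random.default_rng(0)
chain_size = rng.normal(loc=5.0, scale=0.5, size=1000)  # synthetic stand-in data
sem = standard_error_of_mean(chain_size, s)
print(f"tau_cor = {tau_cor}, <R> = {chain_size.mean():.3f} +/- {sem:.3f}")
```

With $s>1$ the error bar is wider than the naive $\sigma(R)/\sqrt{N}$ by a factor of $\sqrt{s}$, which is equivalent to having only $N/s$ independent samples.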