## Diffusion from displacement of particles
In this part of the lab we calculate the self-diffusion coefficient of water from the particle positions over time. After a simulation has been run, the program generates a text file called _msd.txt_ containing the mean squared displacement of water as a function of time. The mean squared displacement is calculated at predefined points in time by summing the squared displacements of the oxygen atoms from their reference (start) positions, $x_n(t=0)$. For each sampled point in time, the sum is then divided by the total number of water molecules,
$$
\begin{align}
\langle(x(t)-x(t=0))^2\rangle = \frac{1}{N}\sum\limits_{n=1}^N (x_n(t)-x_n(t=0))^2
\label{msd}
\tag{1}
\end{align}
$$
where $N$ is the total number of particles. 
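
Equation 1 can be sketched in NumPy as follows. This is only an illustration, not the code that writes _msd.txt_: the function name `mean_squared_displacement`, the array layout `(n_times, n_particles, n_dims)` and the toy trajectory are assumptions made for the example. For more than one dimension, the squared displacement is summed over the Cartesian components.

```python
import numpy as np

def mean_squared_displacement(positions):
    """Return the MSD at every sampled time for a trajectory.

    positions: array of shape (n_times, n_particles, n_dims), where
    positions[0] holds the reference (start) positions.
    """
    positions = np.asarray(positions, dtype=float)
    displacements = positions - positions[0]      # x_n(t) - x_n(t=0)
    squared = np.sum(displacements**2, axis=-1)   # squared distance per particle
    return squared.mean(axis=1)                   # average over the N particles

# Toy trajectory (made up): 3 time points, 2 particles, 1 dimension
toy = np.array([[[0.0], [1.0]],
                [[1.0], [1.0]],
                [[2.0], [0.0]]])
msd = mean_squared_displacement(toy)
print(msd)
```

The first entry is always zero, since every particle starts at its own reference position.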

If particles interact, they will establish random walk behavior after enough "collisions" have occurred, causing the mean squared displacement to increase linearly with time. At the start of a simulation, however, one commonly observes a non-linear dependence of the mean squared displacement on time. This is known as the _ballistic_ regime, and it arises because not enough collisions have occurred yet to establish the random walk behavior. Mathematically, this can be shown by considering two different cases,


$$
\begin{equation}
x_1(t+\Delta t) = x(t) \pm \delta
\label{diff1}
\tag{2}
\end{equation}
$$   

$$
\begin{equation}
x_2(t+\Delta t) = x(t) + \delta
\label{diff2}
\tag{3}
\end{equation}
$$   
where equation \ref{diff1} describes random walk motion, with equal probabilities of moving a particle in the positive and the negative direction, whereas equation \ref{diff2} describes ballistic motion, which only displaces particles in the positive direction. The displacement parameter, $\delta$, is assumed to remain constant over time. 
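
The two cases can also be compared numerically before doing the mathematics. The sketch below is a minimal illustration in one dimension, assuming a hypothetical ensemble of 5000 independent particles, 200 steps and $\delta = 1$; none of these values come from the simulation program. It estimates the exponent of the time dependence from the slope of a log-log fit.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed so the run is reproducible
n_particles = 5000                    # hypothetical ensemble size
n_steps = 200                         # number of time steps
delta = 1.0                           # constant step length

# Random walk: each step is +delta or -delta with equal probability
steps = rng.choice([-delta, delta], size=(n_steps, n_particles))
x_random = np.cumsum(steps, axis=0)   # position after each step, x(0) = 0

# Ballistic motion: every step is +delta
x_ballistic = delta * np.arange(1, n_steps + 1)

msd_random = np.mean(x_random**2, axis=1)   # average over particles
msd_ballistic = x_ballistic**2              # identical for every particle

# The slope in a log-log plot gives the exponent of the time dependence
n = np.arange(1, n_steps + 1)
exp_random = np.polyfit(np.log(n), np.log(msd_random), 1)[0]
exp_ballistic = np.polyfit(np.log(n), np.log(msd_ballistic), 1)[0]
print(f"random walk exponent: {exp_random:.2f}")
print(f"ballistic exponent:   {exp_ballistic:.2f}")
```

The fitted exponents should come out close to 1 for the random walk and equal to 2 for the ballistic motion, which is the scaling to be derived analytically.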

1. Show mathematically, using equations \ref{diff1} and \ref{diff2}, that the mean squared displacement increases linearly with time for random walk motion, whereas it increases linearly with $t^2$ for ballistic motion.

_Hint_: First consider one particle and sum its displacements over $n$ time steps. Then average the squared result over $N$ particles. The random walk mean squared displacement should turn out to increase linearly with time, whereas the ballistic one should increase linearly with $t^2$.
2. Run simulations with 216 water molecules for a series of temperatures (4-5 different, you choose which) and determine the diffusion coefficient of water for each temperature using the generated textfile _msd.txt_. Plot the mean squared displacements for all temperatures in the same plot, calculate the corresponding diffusion coefficients and report the values in a table. 

_Hints_: 

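i) The diffusion coefficient can be obtained using the Einstein relation,
$$
\begin{equation}
D = \frac{\langle(x(t)-x_0)^2\rangle}{2 Nt}
\tag{4}
\end{equation}
$$
where $N$ here denotes the number of spatial dimensions (3 in these simulations), not the number of particles as in equation 1.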

ii) The curve for the mean squared displacement as a function of time is often not linear over its whole range. To obtain a reasonable value of the diffusion coefficient, one usually performs a linear regression on the part of the curve that shows nearly linear behavior. The regression can be done using the `stats` module of SciPy:

```python
slope, intercept, r_value, p_value, std_err = stats.linregress(time_array,msd_array)
```
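
As a sketch of how the fit can be restricted to the nearly linear part, the example below builds a synthetic curve that is quadratic at short times and linear at long times, then fits only the points after a cutoff. The curve, `D_true`, `tau` and the cutoff `t_min` are all hypothetical values chosen for illustration, not simulation results. `np.polyfit` is used here in place of `stats.linregress` only to keep the sketch dependent on NumPy alone; the same masked arrays can be passed to `stats.linregress`.

```python
import numpy as np

# Synthetic MSD curve (illustrative only, not simulation output):
# quadratic at short times, linear with slope 6*D_true at long times.
D_true = 2.5e-3                   # hypothetical diffusion coefficient, nm^2/ps
tau = 0.5                         # hypothetical crossover time, ps
time = np.linspace(0.0, 50.0, 501)
msd = 6 * D_true * (time - tau * (1 - np.exp(-time / tau)))

# Keep only the late-time part of the curve for the fit
t_min = 10.0                      # hypothetical cutoff, chosen by inspecting the plot
mask = time >= t_min
slope, intercept = np.polyfit(time[mask], msd[mask], 1)

D_fit = slope / (2 * 3)           # Einstein relation in three dimensions
print(f"D from late-time fit: {D_fit:.3e} nm^2/ps")
```

For real data a suitable cutoff has to be read off the plotted curve, and it may differ between temperatures.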

In [None]:
# Loading packages
import matplotlib.pyplot as plt
import numpy as np
import os
from scipy import stats
from tempfile import mkstemp
from shutil import move
from os import fdopen, remove

In [None]:
# Naming directories to be used
wdir = '../md-project/'
datadir = '../md-project/md_analysis/data/'
os.chdir('../md')

In [None]:
# Defining function to replace the value of the input temperature in "config.mo" 
def replace(file_path, pattern, subst):
    with open(file_path,"r+") as f:
        new_f = f.readlines()
        f.seek(0)
        for line in new_f:
            if pattern not in line:
                f.write(line)
            else:
                f.write(subst+"\n")
        f.truncate()

In [None]:
Ts = [] # Enter the temperatures you want to simulate, separated by a comma
iterations = 50000 # multiply this with the timestep (0.001 ps) and you get the simulation time in ps

replace('config.mo', 'iter', 'iter '+str(iterations))

for T in Ts:
    print(T)
    replace('config.mo', 'temperature', 'temperature '+str(T))
    !./build/main WAT_216.equil.gro params_prep.mo config.mo
    os.rename('msd.txt', 'msd_T_'+str(T)+'.txt')
    

In [None]:
for T in Ts: # Looping over temperatures
    time = np.loadtxt("msd_T_"+str(T)+".txt", usecols=0) # Loading time array for temperature T
    msd = np.loadtxt("msd_T_"+str(T)+".txt", usecols=1) # Loading mean squared displacement for each time, averaged over all oxygens, for temperature T

    ### Plot the msd vs time
    plt.plot(time, msd, label="T = "+str(T)+" K") # Plotting msd vs time for temperature T
plt.xlabel('$t$ / ps', fontsize=13) # Assigning x-axis label
plt.ylabel(r'$\langle(x(t)-x_0)^2\rangle$ / nm$^2$', fontsize=13) # Assigning y-axis label (msd has units of length squared)
plt.legend()
plt.savefig('msd_vs_time.pdf')

In [None]:
D = np.zeros(len(Ts))
N = 3  # Number of dimensions (to be used in the Einstein relation)

for i, T in enumerate(Ts): # Looping over temperatures
    # Loading time (ps) and mean squared displacement (nm^2, averaged over all oxygens) for temperature T
    time, msd = np.loadtxt("msd_T_"+str(T)+".txt", usecols=(0, 1), unpack=True)
    # Linear regression over the full curve; slice time and msd to restrict the fit to the linear part
    slope, intercept, r_value, p_value, std_err = stats.linregress(time, msd)
    D[i] = slope/(N*2)*1e-2 # Einstein relation; the factor 1e-2 converts nm^2/ps to cm^2/s
    print("D(T="+str(T)+" K) = ", D[i], "cm^2/s")



3. Plot the diffusion coefficient as a function of temperature in units cm$^2$/s and K, respectively. Discuss what you observe. Why does diffusion increase with temperature? Use keywords like entropy/thermal energy, enthalpy and local minima.  

In [None]:
plt.plot(Ts, D, 'o-') # Plotting diffusion coefficient vs temperature
plt.xlabel('$T$ / K')
plt.ylabel('$D$ / (cm$^2$ s$^{-1}$)')