# Diffusion coefficient from a VASP file

Using ``kinisi`` to analyse a VASP XDATCAR file is straightforward and involves the ``DiffusionAnalyzer`` class.

In [None]:
import numpy as np
from kinisi.analyze import DiffusionAnalyzer
from pymatgen.io.vasp import Xdatcar
np.random.seed(42)

Here, the ``p_params`` dictionary describes details of the simulation; its entries are documented in the [parser module](./parser.html).

In [None]:
p_params = {'specie': 'Li',
            'time_step': 2.0,
            'step_skip': 50,
            'min_obs': 50}

In [None]:
xd = Xdatcar('example_XDATCAR.gz')
diff = DiffusionAnalyzer.from_Xdatcar(xd, parser_params=p_params)

In the cells above, we parse the trajectory and determine the mean-squared displacement (MSD), together with its uncertainty, as a function of the time interval $\Delta t$.
We should visualise this to check that we are observing diffusion in our material and to determine the timescale at which this diffusion begins.
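
For orientation, the quantity being plotted can be sketched with plain NumPy. This is a minimal illustration on a synthetic random walk, not the estimator or the uncertainty calculation that ``kinisi`` performs; the function name `mean_squared_displacement` and the random-walk parameters are assumptions made for this example.

```python
import numpy as np

def mean_squared_displacement(positions, lag):
    """MSD at one lag, averaged over particles and time origins.

    positions: array of shape (n_steps, n_particles, 3) holding
    unwrapped coordinates (hypothetical input for this sketch).
    """
    disp = positions[lag:] - positions[:-lag]
    return np.mean(np.sum(disp ** 2, axis=-1))

# Synthetic 3D random walk: each step is Gaussian with sigma = 0.1 per axis
rng = np.random.default_rng(2)
steps = rng.normal(scale=0.1, size=(2000, 50, 3))
positions = np.cumsum(steps, axis=0)

lags = np.arange(1, 101)
msd = np.array([mean_squared_displacement(positions, lag) for lag in lags])
```

For a pure random walk like this one, the MSD grows linearly with the lag, which is the behaviour we look for in the real data.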

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.errorbar(diff.dt, diff.msd, diff.msd_std)
plt.axvline(diff.ngp_max, color='g')
plt.ylabel('MSD/Å$^2$')
plt.xlabel(r'$\Delta t$/ps')
plt.show()

The vertical green line indicates the start of the diffusive regime, based on the maximum of the [non-Gaussian parameter](https://doi.org/10.1073/pnas.1900239116). 
We can also visualise this on a log-log scale, which helps to reveal the diffusive regime. 
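
As a rough guide to what the non-Gaussian parameter measures, the sketch below uses the standard three-dimensional definition, $\alpha_2 = 3\langle r^4 \rangle / (5\langle r^2 \rangle^2) - 1$, on synthetic displacements. It is not taken from the ``kinisi`` source, and the function name `non_gaussian_parameter` is an assumption for this example.

```python
import numpy as np

def non_gaussian_parameter(displacements):
    """Standard 3D non-Gaussian parameter for an (n_particles, 3) array."""
    r2 = np.sum(displacements ** 2, axis=1)
    return 3.0 * np.mean(r2 ** 2) / (5.0 * np.mean(r2) ** 2) - 1.0

rng = np.random.default_rng(0)

# Purely Gaussian displacements: the parameter should be close to zero
gaussian_disp = rng.normal(scale=1.0, size=(200_000, 3))
ngp_gaussian = non_gaussian_parameter(gaussian_disp)

# A mixture of slow and fast particles gives a clearly positive value
mixed_disp = np.concatenate([
    rng.normal(scale=0.2, size=(100_000, 3)),
    rng.normal(scale=2.0, size=(100_000, 3)),
])
ngp_mixed = non_gaussian_parameter(mixed_disp)
```

Values near zero indicate Gaussian displacements, while heterogeneous dynamics push the parameter upwards.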

In [None]:
plt.errorbar(diff.dt, diff.msd, diff.msd_std)
plt.axvline(diff.ngp_max, color='g')
plt.ylabel('MSD/Å$^2$')
plt.xlabel(r'$\Delta t$/ps')
plt.yscale('log')
plt.xscale('log')
plt.show()
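
The reason the log-log scale helps is that a power law, MSD $\propto \Delta t^{\alpha}$, becomes a straight line with slope $\alpha$: roughly 2 for ballistic motion and 1 in the diffusive regime. The sketch below checks this on idealised synthetic curves; the prefactors are arbitrary assumptions and no ``kinisi`` objects are used.

```python
import numpy as np

dt = np.linspace(0.1, 20.0, 200)   # hypothetical time intervals / ps

ballistic_msd = 4.0 * dt ** 2      # idealised ballistic regime
diffusive_msd = 6.0 * 0.05 * dt    # idealised diffusive regime, MSD = 6 D dt

# The slope of log(MSD) against log(dt) is the power-law exponent
ballistic_slope = np.polyfit(np.log(dt), np.log(ballistic_msd), 1)[0]
diffusive_slope = np.polyfit(np.log(dt), np.log(diffusive_msd), 1)[0]
```

On real data the slope changes gradually, so the exponent is best estimated over a window of $\Delta t$ rather than over the whole curve.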

It appears that the non-Gaussian parameter does a reasonable job of predicting the start of the diffusive regime for this system, so we will set `use_ngp` to `True` so that this estimate is used in the fitting.
If we wanted to use a different estimate, say 4 ps, which looks more like the real start, we could instead pass the keyword argument `dt_skip=4`.
The `d_params` dictionary defines parameters about the diffusion estimation, documented in the [diffusion module](https://kinisi.readthedocs.io/en/latest/diffusion.html#kinisi.diffusion.MSDBootstrap). 

In [None]:
d_params = {'use_ngp': True}

We can then run the diffusion analysis as follows. 

In [None]:
diff.diffusion(d_params)

This method estimates the correlation matrix between the timesteps and uses likelihood sampling to find the diffusion coefficient, $D$, and the intercept (in units of cm<sup>2</sup>/s and Å<sup>2</sup>, respectively).
Now we can probe these objects.

We can get the median and 95 % confidence interval using:

In [None]:
diff.D, diff.intercept

Or we can get all of the sampled values from one of these objects. 

In [None]:
diff.D.samples
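
Given an array of samples like this, a median and 95 % interval can be recovered with percentiles. The sketch below uses synthetic values as a stand-in for `diff.D.samples`; the distribution parameters are assumptions, and this may differ from how ``kinisi`` computes its own summary.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for diff.D.samples, in cm^2/s
samples = rng.normal(loc=1.0e-5, scale=1.0e-6, size=32_000)

median = np.percentile(samples, 50)
# The central 95 % interval spans the 2.5th to 97.5th percentiles
lower, upper = np.percentile(samples, [2.5, 97.5])
```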

We can plot the data alongside some of the samples from the $D$ and intercept distributions. The factor of `60000` converts $D$ in cm<sup>2</sup>/s into the gradient of the MSD in Å<sup>2</sup>/ps.

In [None]:
plt.errorbar(diff.dt, diff.msd, diff.msd_std)
for i in np.random.choice(diff.D.size, size=100):
    plt.plot(diff.dt, diff.D.samples[i] * 60000 * diff.dt + diff.intercept.samples[i], 'k', alpha=0.1, zorder=10) 
plt.axvline(diff.ngp_max, color='g')
plt.ylabel('MSD/Å$^2$')
plt.xlabel(r'$\Delta t$/ps')
plt.show()
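
The `60000` used in the cell above can be checked by hand, assuming the three-dimensional Einstein relation, MSD $= 6 D \Delta t$, with $D$ in cm<sup>2</sup>/s and the MSD gradient in Å<sup>2</sup>/ps. The constant names below are introduced only for this example.

```python
ANGSTROM2_PER_CM2 = 1e16   # 1 cm = 1e8 Angstrom, so 1 cm^2 = 1e16 Angstrom^2
PS_PER_S = 1e12            # 1 s = 1e12 ps

# 1 cm^2/s expressed in Angstrom^2/ps
cm2_per_s_to_angstrom2_per_ps = ANGSTROM2_PER_CM2 / PS_PER_S

# MSD gradient = 6 * D in three dimensions (Einstein relation)
gradient_factor = 6 * cm2_per_s_to_angstrom2_per_ps
```

So multiplying a sample of $D$ by `gradient_factor` gives the slope of the corresponding MSD line in the units used on the plot.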

Additionally, we can visualise the distribution of the diffusion coefficient that has been determined.

In [None]:
plt.hist(diff.D.samples, density=True)
plt.axvline(diff.D.n, c='k')
plt.xlabel('$D$/cm$^2$s$^{-1}$')
plt.ylabel('$p(D$/cm$^2$s$^{-1})$')
plt.show()