# Deep Dive: What is analyse_freenrg doing?

The `freenrgs.s3` file produced by waterswap is in the same format as the one produced by ligandswap and proteinswap. This means that, just like for ligandswap, you can write your own scripts to investigate the free energy prediction more closely.

To do this, we need to import the `Sire.Stream` module, which reads `.s3` files, and also pandas and Matplotlib so that we can draw some graphs :-)

In [None]:
import pandas as pd
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'   # helps make things look better in Jupyter :-)

import Sire.Stream

`Sire.Stream.load` will read all of the Python objects that are contained in a `.s3` file. The `freenrgs.s3` file contains three objects:

* bennetts - the object containing all of the data needed for a Bennetts (Bennett acceptance ratio) calculation of the free energy
* fep - the object containing all of the data needed for a FEP calculation of the free energy
* ti - the object containing all of the data needed for a TI calculation of the free energy

We can load these into the notebook by using

In [None]:
(bennetts, fep, ti) = Sire.Stream.load("output/freenrgs.s3")

Here is a function that plots a PMF using pandas and Matplotlib:

In [None]:
def plotPMF(pmf):
    x = [point.x() for point in pmf.values()]
    y = [point.y() for point in pmf.values()]
    d = DataFrame( index=x, data={"free energy":y} )
    d.plot()

And here are the PMFs for each method, plotted by merging together iterations 400-1000:

In [None]:
plotPMF( bennetts.merge(400,1000).sum() )

In [None]:
plotPMF( fep.merge(400,1000).sum() )

In [None]:
plotPMF( ti.merge(400,1000).integrate() )
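To give a feel for what `sum()` and `integrate()` are doing conceptually, here is a minimal sketch that does not use Sire at all. It assumes that FEP and Bennetts give one free energy difference per pair of neighbouring λ windows, which are accumulated into a PMF, and that TI gives a free energy gradient at each λ value, which is integrated here with the trapezium rule. The function names and the numbers are made up for illustration, and Sire's own implementation may differ in detail (for example in how it performs the integration).

```python
from itertools import accumulate

def pmf_from_deltas(lambdas, deltas):
    """Accumulate per-window free energy differences into a PMF.

    deltas[i] is the free energy change from lambdas[i] to lambdas[i+1].
    """
    if len(deltas) != len(lambdas) - 1:
        raise ValueError("need one delta per pair of neighbouring windows")
    # The PMF is defined to be zero at the first lambda value
    return list(zip(lambdas, accumulate(deltas, initial=0.0)))

def pmf_from_gradients(lambdas, gradients):
    """Integrate dG/dlambda over lambda using the trapezium rule."""
    if len(gradients) != len(lambdas):
        raise ValueError("need one gradient per lambda value")
    pmf = [(lambdas[0], 0.0)]
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        area = 0.5 * (gradients[i] + gradients[i - 1]) * width
        pmf.append((lambdas[i], pmf[-1][1] + area))
    return pmf

# Made-up illustrative numbers, not waterswap output
lambdas = [0.0, 0.5, 1.0]
fep_like = pmf_from_deltas(lambdas, [1.0, 2.0])
ti_like = pmf_from_gradients(lambdas, [2.0, 4.0, 6.0])
print(fep_like[-1][1], ti_like[-1][1])
```

In both cases the last point of the PMF is the total free energy change, which is why the cells further down read off `values()[-1].y()`.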

As for ligandswap, you can check the assumption that the first 40% of iterations (here, iterations 1-400) can be discarded as equilibration by plotting how the free energy converges with iteration.

In [None]:
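def plotConvergence(data):
    # assumes the simulation ran for 1000 iterations, numbered from 1
    x = list(range(1,1001))
    # total free energy from each iteration (last point of that iteration's PMF)
    # note: this uses sum(), so it works for bennetts and fep; ti needs integrate()
    b = [data[i].sum().values()[-1].y() for i in range(1,1001)]
    d = DataFrame( index=x, data={"free energy":b} )
    d.plot()
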
plotConvergence(bennetts)
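A plot is the quickest check, but the same question can also be asked numerically. The sketch below is a hypothetical helper, not part of Sire: it takes a plain list of per-iteration free energies (such as the list `b` built inside `plotConvergence`) and compares the mean before and after a candidate equilibration cutoff, and also the means of successive blocks. The series used here is synthetic and only illustrates the calls.

```python
from statistics import mean

def compare_before_after(values, cutoff):
    """Return (mean of values[:cutoff], mean of values[cutoff:])."""
    if not 0 < cutoff < len(values):
        raise ValueError("cutoff must split the data into two non-empty parts")
    return mean(values[:cutoff]), mean(values[cutoff:])

def block_means(values, n_blocks):
    """Split values into n_blocks equal blocks (dropping any remainder) and average each."""
    size = len(values) // n_blocks
    if size == 0:
        raise ValueError("more blocks than data points")
    return [mean(values[i * size:(i + 1) * size]) for i in range(n_blocks)]

# Synthetic series: drifts for the first 4 points, then settles
series = [10.0, 8.0, 6.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
early, late = compare_before_after(series, cutoff=4)
print(early, late)
print(block_means(series, 5))
```

If the block means after the cutoff agree with each other while the earlier ones are still drifting, the cutoff is a plausible choice. If the later blocks are still drifting too, more of the run should be discarded, or the simulation run for longer.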

You can also work out your own method of combining the different predictions into a single number. As for ligandswap, I find a simple or weighted average of FEP, TI and Bennetts acceptable, e.g. `0.5 * bennetts + 0.3 * FEP + 0.2 * TI` (with those weights based on my personal feeling of how much I trust each method). The error should reflect the spread between the different methods, e.g. half the difference between the largest and smallest prediction:

In [None]:
b = bennetts.merge(400,1000).sum().values()[-1].y()
f = fep.merge(400,1000).sum().values()[-1].y()
t = ti.merge(400,1000).integrate().values()[-1].y()

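# weighted average of the three predictions
average = 0.5*b + 0.3*f + 0.2*t
# error is half the spread between the largest and smallest prediction
error = 0.5 * (max(b, f, t) - min(b, f, t))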

print("Bennetts = %s, FEP = %s, TI = %s" % (b,f,t))
print("Result is %s +/- %s kcal mol-1" % (average,error))

So, for the above, I would round to 0 decimal places and report the result as 33 +/- 3 kcal mol-1 (I tend to round up errors).
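If you want to reuse this recipe, the combination and rounding steps can be wrapped in a couple of small functions. This is only a sketch: `combine_estimates` and `report` are names invented here, the default weights are the ones from the cell above (in the order Bennetts, FEP, TI), and the inputs in the example call are made up rather than taken from a real waterswap run.

```python
import math

def combine_estimates(bennetts, fep, ti, weights=(0.5, 0.3, 0.2)):
    """Weighted average of three estimates, with half the spread as the error.

    weights are applied in the order (bennetts, fep, ti) and must sum to 1.
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    values = (bennetts, fep, ti)
    average = sum(w * v for w, v in zip(weights, values))
    error = 0.5 * (max(values) - min(values))
    return average, error

def report(average, error):
    """Round the average to 0 decimal places and round the error up."""
    # Note: Python's round() sends exact halves to the nearest even integer
    return "%d +/- %d kcal mol-1" % (round(average), math.ceil(error))

# Made-up inputs for illustration only
avg, err = combine_estimates(30.0, 34.0, 35.0)
print(report(avg, err))
```

Passing the weights as an argument makes it easy to see how sensitive the final number is to your level of trust in each method.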

The error is larger than for ligandswap because this is a larger free energy change. The default swap water cluster was, perhaps, also too large, as there were too many water molecules to fit into the volume of the ligand. One way to reduce the error would be to reduce the number of swapped water molecules by choosing identity points manually so that they are distributed across the ligand, and then visualising the `swapcluster01.pdb` file to check that the water molecules fit nicely under the ligand.