# Quantifying "clumpiness" in solution: the radial distribution function

It was easy enough for us to see that the butane molecules clumped together (i.e., aggregated) in water, but how could we quantify this observation?

One possibility is to compute a **radial distribution function**. Briefly, a radial distribution function (or **rdf**) measures how often a given pair of atoms is found at each separation distance, normalized by what we would expect from the average number density of those atoms in the simulation box. That might sound complicated (and it is!), but the basic idea is this: an rdf value of 1.0 indicates that the probability of finding that pair of atoms at the specified distance equals what we would expect if the atoms were uniformly distributed (no clumping!) in the simulation box. Values greater than 1.0 indicate that more atom pairs are observed at that distance than expected (enrichment, indicative of aggregation), whereas values less than 1.0 indicate that fewer atom pairs are observed than expected (depletion, indicative of exclusion).

For more info about rdfs, please see: https://en.wikibooks.org/wiki/Molecular_Simulation/Radial_Distribution_Functions
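
To make the "rdf of 1.0 means uniformly distributed" idea concrete, here is a minimal sketch that computes an rdf by hand with NumPy for randomly scattered points. It is not how MDTraj does it internally; the function `simple_rdf`, the cubic periodic box, and the random points are all assumptions made for illustration.

```python
import numpy as np

def simple_rdf(positions, box_length, r_max, n_bins):
    """Estimate g(r) for points in a cubic periodic box (illustrative only)."""
    n = len(positions)
    # Pairwise displacement vectors, wrapped with the minimum-image convention
    delta = positions[:, None, :] - positions[None, :, :]
    delta -= box_length * np.round(delta / box_length)
    dist = np.sqrt((delta ** 2).sum(axis=-1))
    dist = dist[np.triu_indices(n, k=1)]  # keep each pair only once

    # Count how many pairs fall into each distance bin
    counts, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))

    # Expected counts if the points were uniformly distributed:
    # (number of pairs) x (shell volume) / (box volume)
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    n_pairs = n * (n - 1) / 2
    expected = n_pairs * shell_volumes / box_length ** 3

    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / expected

# Hypothetical test system: 500 points scattered uniformly in a 3 nm box
rng = np.random.default_rng(0)
box = 3.0
uniform_points = rng.uniform(0.0, box, size=(500, 3))
r, g = simple_rdf(uniform_points, box, r_max=1.25, n_bins=25)
```

Because these points have no tendency to clump or to avoid each other, `g` should hover around 1.0 at all but the shortest distances, where very few pairs are counted and the estimate is noisy. Note that `r_max` is kept below half the box length so that the minimum-image distances remain valid.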

In [None]:
import mdtraj
import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']

In [None]:
traj = mdtraj.load('butane-water_sim.dcd', top='butane-water.prmtop')
atoms, bonds = traj.topology.to_dataframe()
atoms.head()

After reading in the trajectory, we can then use MDTraj to directly compute the rdf by supplying just a few parameters (most notably the atom pairs to be considered):

In [None]:
r_min = 0.0 # smallest interatomic distance to consider (nm)
r_max = 1.25 # largest interatomic distance to consider (nm)
bins = 100 # how many "bins" to use for grouping our rdf data

carbon_pairs = traj.top.select_pairs("name C2", "name C2") # specify the atom pairs

dist_butCC, rdf_butCC = mdtraj.compute_rdf(traj, carbon_pairs, (r_min, r_max), n_bins=bins)
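
To see what the `bins` parameter does to the distance axis, the following sketch rebuilds the bin layout with NumPy. It assumes the returned distances are the midpoints of equal-width bins spanning `r_min` to `r_max`, which is the usual convention for histogram-based rdfs; the variable names are only illustrative.

```python
import numpy as np

r_min, r_max, bins = 0.0, 1.25, 100

# 100 bins need 101 edges between r_min and r_max
edges = np.linspace(r_min, r_max, bins + 1)
bin_width = edges[1] - edges[0]            # width of each bin in nm
centers = 0.5 * (edges[1:] + edges[:-1])   # assumed midpoint convention

print(f"bin width: {bin_width:.4f} nm, first center: {centers[0]:.5f} nm")
```

More bins give finer distance resolution but fewer atom pairs per bin, so the resulting curve becomes noisier.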

We can then generate a plot of the rdf:

In [None]:
plt.plot(dist_butCC, rdf_butCC)
plt.xlim(0,1.4)
plt.ylim(0,4.5)
plt.xlabel('C2-C2 interatomic distance (nm)')
plt.ylabel('radial dist. func. g(r)')
plt.show()

Some questions to answer for yourself:

1. What are some features of this plot that stand out to you? 
2. Why is the value of the rdf equal to 0 for very small interatomic distances?
3. Based on the explanation provided at the top of this notebook, do you see evidence of butane clumping together (aggregating) in this plot?
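
If you would like to put numbers on the features you see, one option is to locate the tallest peak of the rdf. The sketch below does this on a synthetic curve; `highest_peak`, `r_demo`, and `g_demo` are hypothetical names, and the curve is made up for illustration, not taken from the simulation.

```python
import numpy as np

def highest_peak(r, g):
    """Return (distance, height) at the maximum of g(r)."""
    i = int(np.argmax(g))
    return r[i], g[i]

# Synthetic stand-in for an rdf: zero at short range, a peak near 0.5 nm,
# then leveling off toward 1.0 at larger distances (made-up data)
r_demo = np.linspace(0.00625, 1.24375, 100)
g_demo = 1.0 / (1.0 + np.exp(-(r_demo - 0.4) / 0.02)) \
         + 2.0 * np.exp(-((r_demo - 0.5) / 0.05) ** 2)

peak_r, peak_g = highest_peak(r_demo, g_demo)
print(f"tallest peak at r = {peak_r:.3f} nm with g(r) = {peak_g:.2f}")
```

In the notebook itself, the same function could be called as `highest_peak(dist_butCC, rdf_butCC)` once the rdf has been computed.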

## Your turn #1: perform this analysis for another simulation

Carry out the following procedure:

1. Make a copy of this notebook.
2. Replace every instance of `butane` with `ethylenediamine` in the new notebook. (Don't forget the notebook title/filename!)
3. Execute the notebook to completion. How does the plot of the radial distribution function from the ethylenediamine-water simulation compare with the one from the butane-water simulation? Does this agree with what we saw when we visualized the simulation trajectories?

## Your turn #2: perform this analysis for the water molecules

You might also be interested to see how the water molecules are distributed in the simulation box. The code below computes the rdf for the oxygen atoms of the water molecules. Add your own code to generate a plot of the rdf.

Why do you think the rdf is so strongly peaked at certain distances? (What is true about the arrangement of water molecules when they are in a liquid state?)

In [None]:
oxygen_pairs = traj.top.select_pairs("name O", "name O")

dist_watOO, rdf_watOO = mdtraj.compute_rdf(traj, oxygen_pairs, (r_min, r_max), n_bins=bins)

In [None]:
## write code to plot the water O-O rdf here ##