# Quickstart

This is a very short guide on how to use ProLIF to generate an interaction fingerprint for a ligand-protein complex.

Let's start by importing MDAnalysis and ProLIF to read our input files:

In [None]:
import MDAnalysis as mda
import prolif as plf
# load trajectory
u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)
# create selections for the ligand and protein
lig = u.atoms.select_atoms("resname LIG")
prot = u.atoms.select_atoms("protein")
lig, prot

MDAnalysis should automatically recognize the file type from its extension. See the [MDAnalysis quickstart](https://userguide.mdanalysis.org/stable/quickstart.html) to learn more about loading files, and the [selections guide](https://userguide.mdanalysis.org/stable/selections.html) to learn more about its atom selection language.

Next, let's make sure that our ligand was correctly read by MDAnalysis.

This next step is crucial if you're loading a structure from a file that doesn't explicitly contain bond orders and formal charges. MDAnalysis will infer those from the atoms' connectivity, which requires all atoms, including hydrogens, to be present in the input file.
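
As a quick sanity check before relying on inferred bond orders, you can count the explicit hydrogens in your ligand. The sketch below is a plain-Python helper (`count_hydrogens` is a hypothetical name, not part of ProLIF or MDAnalysis) that works on any sequence of element symbols. With MDAnalysis you could pass it `lig.elements`, assuming your topology provides element information. The element lists used here are made-up data, not the tutorial ligand.

```python
def count_hydrogens(elements):
    """Count hydrogen atoms in a sequence of element symbols."""
    return sum(1 for symbol in elements if symbol.strip().upper() == "H")


# illustrative element lists (made-up data, not the tutorial ligand)
with_h = ["C", "C", "N", "H", "H", "H"]
without_h = ["C", "C", "N"]

n_with = count_hydrogens(with_h)
n_without = count_hydrogens(without_h)
print(n_with, n_without)
```

A count of zero for a drug-like ligand is a strong hint that hydrogens are missing from the input file and that the inferred bond orders and charges should not be trusted.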

ProLIF molecules are built on top of RDKit and are compatible with its drawing code. Let's have a quick look at our ligand:

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
# create a molecule from the MDAnalysis selection
lmol = plf.Molecule.from_mda(lig)
# cleanup before drawing
mol = Chem.RemoveHs(lmol)
mol.RemoveAllConformers()
Draw.MolToImage(mol, size=(400,200))

We can do the same for the residues in the protein (I'll only show the first 20 to keep the notebook short):

In [None]:
pmol = plf.Molecule.from_mda(prot)
frags = []
# to show all residues, simply use `for res in pmol:`
for i in range(20):
    res = pmol[i]
    mol = Chem.RemoveHs(res)
    mol.RemoveAllConformers()
    frags.append(mol)
Draw.MolsToGridImage(frags, legends=[str(res.resid) for res in pmol], 
                     subImgSize=(200, 140), molsPerRow=4,
                     maxMols=prot.n_residues)

Everything looks good, so we can now compute a fingerprint:

In [None]:
# use default interactions
fp = plf.Fingerprint()
# run on a slice of frames from beginning to end with a step of 10
fp.run(u.trajectory[::10], lig, prot)
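
The slice passed to `run` follows standard Python slicing, so you can preview which frames will be analyzed with a plain `range`. The sketch below uses a made-up frame count (`n_frames` is an assumption; for a real trajectory you would use `len(u.trajectory)`).

```python
# hypothetical number of frames; use len(u.trajectory) for a real trajectory
n_frames = 250
frames = list(range(n_frames))

# same slice as in the cell above: every 10th frame from beginning to end
every_10th = frames[::10]
# another slice for comparison: first 100 frames, with a step of 5
first_100_by_5 = frames[0:100:5]

print(len(every_10th), every_10th[:3])
print(len(first_100_by_5), first_100_by_5[-1])
```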

The `run` method will automatically select residues that are close to the ligand (within 6.0 Å) when computing the fingerprint. Alternatively, you can pass a list of residues like so:

```python
fp.run(..., residues=["TYR38.A", "ASP129.A"])
```
Or simply use `fp.run(..., residues="all")` to use all residues in the `prot` selection.
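
To make the idea of a distance-based residue selection concrete, here is a minimal plain-Python sketch. It is not ProLIF's implementation: it assumes that a residue counts as "close" when at least one of its atoms lies within the cutoff of any ligand atom, and the function name, coordinates, and the `GLY200.A` label are all made up for illustration.

```python
import math


def residues_within_cutoff(ligand_coords, residue_coords, cutoff=6.0):
    """Return labels of residues with at least one atom within `cutoff`
    (in Å) of any ligand atom. Assumed criterion, for illustration only."""
    close = []
    for label, atoms in residue_coords.items():
        if any(
            math.dist(res_atom, lig_atom) <= cutoff
            for res_atom in atoms
            for lig_atom in ligand_coords
        ):
            close.append(label)
    return close


# made-up coordinates in Å
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    "TYR38.A": [(4.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    "ASP129.A": [(0.0, 5.0, 0.0)],
    "GLY200.A": [(20.0, 0.0, 0.0)],
}

selected = residues_within_cutoff(ligand, residues)
print(selected)
```

Restricting the calculation to nearby residues keeps the fingerprint computation fast, since distant residues cannot interact with the ligand anyway.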

To keep the output short, the resulting DataFrame only keeps track of residues and interaction types that were seen in at least one of the frames in your trajectory. You can access the full results in `fp.ifp`.

In [None]:
df = fp.to_dataframe()
# show only the first 10 frames
df.head(10)

In [None]:
# drop the ligand residue column since there's only a single ligand residue
df = df.droplevel("ligand", axis=1)
df.head(5)

In [None]:
# show all pi-stacking interactions
df.xs("PiStacking", level="interaction", axis=1).head(5)

In [None]:
# show all interactions with a specific protein residue
df.xs("ASP129.A", level="protein", axis=1).head(5)
# or more simply
df["ASP129.A"].head(5)

Here's a simple example to plot the interactions over time:

In [None]:
import seaborn as sns
import pandas as pd

# reorganize data
data = df.reset_index()
data = pd.melt(data, id_vars=["Frame"], var_name=["residue","interaction"])
data = data[data["value"] != 0]
data.reset_index(inplace=True, drop=True)

# plot
sns.set_theme(font_scale=.8, style="white", context="talk")
g = sns.catplot(
    data=data, x="interaction", y="Frame", hue="interaction", col="residue",
    hue_order=["Hydrophobic", "HBDonor", "HBAcceptor", "PiStacking", "CationPi", "Cationic"],
    height=3, aspect=0.2, jitter=0, sharex=False, marker="_", s=8, linewidth=3.5,
)
g.set_titles("{col_name}")
g.set(xticks=[], ylim=(-.5, data.Frame.max()+1))
g.set_xticklabels([])
g.set_xlabels("")
g.fig.subplots_adjust(wspace=0)
g.add_legend()
g.despine(bottom=True)
for ax in g.axes.flat:
    ax.invert_yaxis()
    ax.set_title(ax.get_title(), pad=15, rotation=60, ha="center", va="baseline")

In [None]:
# calculate the occurrence of each interaction on the trajectory
occ = df.mean()
# restrict to the frequent ones
occ.loc[occ > 0.3]

In [None]:
# regroup all interactions together and do the same
# (transpose and group on the rows, since `axis=1` in groupby is deprecated in recent pandas)
g = (df.T.groupby(level="protein")
       .sum()
       .T
       .astype(bool)
       .mean())
g.loc[g > 0.3]
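
If you want to see what these two occupancy calculations do without running a trajectory, the sketch below reproduces them on a small hand-made boolean DataFrame. The table is made-up data that only mimics the column layout used above (`protein` and `interaction` levels, frames as rows); the real values come from `fp.to_dataframe()`.

```python
import pandas as pd

# made-up fingerprint table: 4 frames, 3 (residue, interaction) columns
columns = pd.MultiIndex.from_tuples(
    [
        ("ASP129.A", "HBDonor"),
        ("ASP129.A", "Cationic"),
        ("TYR38.A", "PiStacking"),
    ],
    names=["protein", "interaction"],
)
toy = pd.DataFrame(
    [
        [True, False, False],
        [True, True, False],
        [False, True, False],
        [True, False, True],
    ],
    index=pd.Index([0, 10, 20, 30], name="Frame"),
    columns=columns,
)

# fraction of frames in which each individual interaction is present
occ = toy.mean()

# fraction of frames in which a residue makes any interaction at all
per_residue = toy.T.groupby(level="protein").any().T.mean()

print(occ)
print(per_residue)
```

Note how `ASP129.A` is in contact in every frame even though neither of its individual interactions is always present, which is exactly why the grouped view can be more informative.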

You can also compute the Tanimoto similarity between frames, here between the first analyzed frame and every analyzed frame:

In [None]:
from rdkit import DataStructs
bvs = fp.to_bitvectors()
tanimoto_sims = DataStructs.BulkTanimotoSimilarity(bvs[0], bvs)
tanimoto_sims
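
For reference, the Tanimoto similarity of two fingerprints is the number of bits set in both divided by the number of bits set in either. The plain-Python sketch below illustrates that definition on made-up frames, each represented as a set of (residue, interaction) pairs. It is only an illustration of the formula, not how RDKit computes it internally; the `PHE330.A` label is hypothetical, and returning 1.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two collections of 'on' bits."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # convention chosen for this sketch: two empty fingerprints are identical
        return 1.0
    return len(a & b) / len(union)


# made-up frames: each is the set of interactions present in that frame
frame_bits = [
    {("ASP129.A", "HBDonor"), ("TYR38.A", "PiStacking"), ("PHE330.A", "Hydrophobic")},
    {("ASP129.A", "HBDonor"), ("PHE330.A", "Hydrophobic")},
    {("ASP129.A", "Cationic")},
]

# similarity of every frame to the first one, like the bulk call above
sims = [tanimoto(frame_bits[0], bits) for bits in frame_bits]
print(sims)
```

A value of 1.0 means the frame has exactly the same interactions as the reference frame, while 0.0 means they share none.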