In [None]:
%load_ext autoreload
%autoreload 2

# Water-bridge Interactions

This tutorial showcases how to use ProLIF to generate an interaction fingerprint including water-mediated hydrogen-bond interactions, and analyze the interactions for a ligand-protein complex.

## Preparation

Let's start by importing MDAnalysis and ProLIF to read our tutorial files and create selections for the ligand, protein and water.

:::{important}
Restrict the protein and water selections to residues close to the ligand;
otherwise, generating the fingerprint becomes time-consuming because of the number of atoms to analyze.
:::

For the protein selection in this tutorial, we only select the residues within 15 Å of the ligand.

:::{tip}
For the water component it is possible to update the `AtomGroup` distance-based selection at every frame,
which is convenient considering the large movements of water molecules during most simulations.
**Do not use `updating=True` for the protein selection**: it will produce wrong results.
:::

For the water selection, we select the water residues within 4 Å of the ligand or protein. You may need to increase this
distance threshold to investigate very high-order water-mediated interactions.

In [None]:
import MDAnalysis as mda
import prolif as plf

# load topology and trajectory
u = mda.Universe(plf.datafiles.WATER_TOP, plf.datafiles.WATER_TRAJ)

# create selections for the ligand, protein and water
ligand_selection = u.select_atoms("resname QNB")
protein_selection = u.select_atoms(
    "protein and byres around 15 group ligand",
    ligand=ligand_selection,
)
water_selection = u.select_atoms(
    "resname TIP3 and byres around 4 (group ligand or group pocket)",
    ligand=ligand_selection,
    pocket=protein_selection,
    updating=True,
)
ligand_selection, protein_selection, water_selection

Now we perform the calculation of the interactions.
In this tutorial we only focus on the water bridge interactions, but you can also include the other typical ligand-protein interactions in the list of interactions.

:::{important}
When using `WaterBridge`, you must specify the `water` parameter with your atomgroup selection.
:::

By default, the `WaterBridge` interaction will only look at bridges including a single water molecule, i.e. `ligand---water---protein`.
Here, we will look at water bridges up to order 3 (i.e. there can be up to 3 water molecules).
To do this, we explicitly provide `order=3` in the parameters of the `WaterBridge` interaction (defaults to `1`).

In [None]:
fp = plf.Fingerprint(
    ["WaterBridge"],
    parameters={"WaterBridge": {"water": water_selection, "order": 3}},
)

Then we simply run the analysis. Under the hood, the `WaterBridge` analysis
executes three successive runs:
- between the ligand and water
- between the protein and water
- between the water molecules

The results are then collated together using `networkx`.

:::{note}
For practical reasons, the `WaterBridge` analysis only runs in serial.
:::
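
To make the collation step more concrete, here is a minimal sketch of the idea using only the standard library: the pairwise hydrogen bonds from the three runs are treated as edges of a graph, and a water bridge of order *n* is a path from the ligand to a protein residue that passes through *n* water molecules. This is an illustration only, not ProLIF's actual implementation (which relies on `networkx`); the residue names and the `find_water_bridges` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical pairwise hydrogen bonds, one list per run described above.
ligand_water = [("LIG", "WAT1"), ("LIG", "WAT4")]
water_protein = [("WAT1", "ASP103"), ("WAT3", "TYR385"), ("WAT5", "SER107")]
water_water = [("WAT1", "WAT2"), ("WAT2", "WAT3"), ("WAT4", "WAT5")]


def find_water_bridges(ligand_water, water_protein, water_water, order=1):
    """Return (waters, protein_residue) pairs for bridges with at most `order` waters."""
    # Undirected water-water adjacency.
    neighbors = defaultdict(set)
    for a, b in water_water:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Protein residues hydrogen-bonded to each water.
    anchors = defaultdict(set)
    for water, residue in water_protein:
        anchors[water].add(residue)

    bridges = []

    def extend(path):
        # Every protein residue bonded to the last water closes a bridge.
        for residue in sorted(anchors[path[-1]]):
            bridges.append((tuple(path), residue))
        # Keep walking through water molecules while the order allows it.
        if len(path) < order:
            for nxt in sorted(neighbors[path[-1]]):
                if nxt not in path:
                    extend(path + [nxt])

    # Start one walk from each water bonded to the ligand.
    for _, water in ligand_water:
        extend([water])
    return bridges


bridges_order1 = find_water_bridges(ligand_water, water_protein, water_water, order=1)
bridges_order3 = find_water_bridges(ligand_water, water_protein, water_water, order=3)
for waters, residue in bridges_order3:
    print(f"order {len(waters)}: LIG -> {' -> '.join(waters)} -> {residue}")
```

With `order=1` only the single-water bridge to `ASP103` is found in this toy data, while `order=3` also recovers the two- and three-water paths.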

In [None]:
fp.run(u.trajectory, ligand_selection, protein_selection)

## Visualization

The trajectory that we use consists of 20 frames. Let's analyze the water bridge interactions that are present in the trajectory. To do so, we generate a DataFrame showing which interactions occur in each frame of the simulation.

In [None]:
df = fp.to_dataframe()
df

We now sort the values to identify which water bridge interactions occur most frequently.

In [None]:
# percentage of the trajectory where each interaction is present
df.mean().sort_values(ascending=False).to_frame(name="%").T * 100
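
As a small illustration of why taking the mean yields occurrence frequencies, the sketch below builds a toy boolean table that loosely mimics the fingerprint DataFrame (frames as rows, one column per ligand/protein/interaction triplet). The residue names and values are made up for the example, and the exact layout of the real DataFrame is assumed rather than reproduced.

```python
import pandas as pd

# Toy data: 4 frames, 2 hypothetical water bridges (True = present in that frame).
columns = pd.MultiIndex.from_tuples(
    [("QNB1", "ASP103", "WaterBridge"), ("QNB1", "TYR385", "WaterBridge")],
    names=["ligand", "protein", "interaction"],
)
toy_df = pd.DataFrame(
    [[True, False], [True, False], [True, True], [False, False]],
    columns=columns,
    index=pd.Index(range(4), name="Frame"),
)

# The mean of a boolean column is the fraction of frames where it is True.
occurrence = toy_df.mean().sort_values(ascending=False) * 100
print(occurrence)
```

Here the first bridge is present in 3 of 4 frames and the second in 1 of 4, so they come out at 75% and 25% respectively.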

We can also analyze the water bridge interactions using a barcode plot.

In [None]:
fp.plot_barcode(figsize=(8, 3))

We can also visualize water-mediated interactions using the `LigNetwork` plot.
Here we can also see the higher-order water bridges, in which
water molecules interact with each other to form the bridge.

:::{tip}
The default threshold for interactions to be displayed in `fp.plot_lignetwork()` is `0.3`. Only interactions with an occurrence of more than 30% will therefore appear with the default settings, so don't forget to adjust the threshold if you want to see interactions with lower occurrence.
:::

In [None]:
ligand_mol = plf.Molecule.from_mda(ligand_selection)
view = fp.plot_lignetwork(ligand_mol, threshold=0.05)
view

## Water Bridge Interaction Metadata

So far, the example has shown whether certain water bridge interactions are present or not during the simulation.
Metadata is also stored while the interactions are calculated, and we can use it to obtain more information about each interaction.

Here we show how to access the metadata stored after the calculation of the interactions.
We will collect all of the stored metadata and display it as a DataFrame.

In [None]:
import pandas as pd

all_metadata = []

# avoid shadowing the built-in `dict` with the loop variable
for frame, frame_ifp in fp.ifp.items():
    for (lig_resid, prot_resid), interaction_dict in frame_ifp.items():
        for metadata in interaction_dict["WaterBridge"]:
            flat = {
                "frame": frame,
                "ligand_residue": lig_resid,
                "protein_residue": prot_resid,
                "order": metadata.get("order"),
                "distance": metadata.get("distance"),
                "ligand_role": metadata.get("ligand_role"),
                "protein_role": metadata.get("protein_role"),
                "water_residues": metadata.get("water_residues"),
            }
            all_metadata.append(flat)

df_metadata = pd.DataFrame(all_metadata)

:::{important}
We can see that the metadata contains important information about the calculated interactions: the `order` of the water bridge interactions,
which `water_residues` participate in them, as well as the `ligand_role` and the `protein_role` during the interactions.
:::

In [None]:
df_metadata

We can now use this information to access interactions of orders 2 and 3.

In [None]:
higher_order_df = df_metadata[df_metadata["order"].isin([2, 3])]
higher_order_df
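
Beyond filtering, a flat metadata table of this kind lends itself to simple aggregations. The sketch below uses a small hand-written table with a subset of the column names of `df_metadata` (the values are invented for illustration) to count, for each protein residue and bridge order, the number of frames in which a bridge is observed.

```python
import pandas as pd

# Hypothetical rows reusing some of the columns of the metadata table built above.
toy_metadata = pd.DataFrame(
    [
        {"frame": 0, "protein_residue": "ASP103", "order": 1},
        {"frame": 1, "protein_residue": "ASP103", "order": 1},
        {"frame": 1, "protein_residue": "TYR385", "order": 3},
        {"frame": 2, "protein_residue": "ASP103", "order": 2},
        {"frame": 2, "protein_residue": "ASP103", "order": 2},
    ]
)

# Count distinct frames per (residue, order): two bridges of the same order in
# one frame (e.g. through different waters) are counted once.
frames_per_bridge = (
    toy_metadata.groupby(["protein_residue", "order"])["frame"]
    .nunique()
    .rename("n_frames")
    .reset_index()
)
print(frames_per_bridge)
```

The same pattern should apply to the real `df_metadata`, provided its `protein_residue` values can be used as grouping keys.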