# Water-bridge Interactions

This tutorial shows how to use ProLIF to generate an interaction fingerprint that includes water-mediated hydrogen-bond interactions, and how to analyze these interactions for a ligand-protein complex.

## Preparation

Let's start by importing MDAnalysis and ProLIF to read our tutorial files and create selections for the ligand, protein and water.

:::{important}
We recommend restricting the protein and water selections to residues close to the ligand;
otherwise, generating the fingerprint will be time-consuming because of the number of atoms to analyze.
:::

In this tutorial, the protein selection only includes residues within 15 Å of the ligand.

:::{tip}
For the water component, the distance-based `AtomGroup` selection can be updated at every frame (`updating=True`),
which is convenient given how much water molecules move during most simulations.
**Do not use `updating=True` for the protein selection**, as it will produce wrong results.
:::

For the water selection, we select water residues within 4 Å of the ligand or the protein. You may need to increase this
distance threshold to investigate very high-order water-mediated interactions.
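To make the difference between a static and an updating distance-based selection concrete, here is a plain-Python sketch of a `byres around`-style selection on toy coordinates. It is only an illustration and not how MDAnalysis implements its selections; the residue names, coordinates and the `select_residues_within` helper are hypothetical.

```python
import math


def select_residues_within(residues, reference_atoms, cutoff):
    """Return names of residues with at least one atom within `cutoff`
    of any reference atom (whole residues are kept, like `byres`)."""
    selected = []
    for name, atoms in residues.items():
        if any(
            math.dist(atom, ref) <= cutoff
            for atom in atoms
            for ref in reference_atoms
        ):
            selected.append(name)
    return selected


# Hypothetical ligand atom (fixed) and two water molecules over two frames
ligand = [(0.0, 0.0, 0.0)]
frames = [
    {"WAT1": [(3.0, 0.0, 0.0)], "WAT2": [(8.0, 0.0, 0.0)]},  # frame 0
    {"WAT1": [(9.0, 0.0, 0.0)], "WAT2": [(2.5, 0.0, 0.0)]},  # frame 1
]

# Static selection: evaluated once on the first frame and reused afterwards
static = select_residues_within(frames[0], ligand, cutoff=4.0)

# Updating selection: re-evaluated on every frame
updating = [select_residues_within(frame, ligand, cutoff=4.0) for frame in frames]

print(static)    # ['WAT1']
print(updating)  # [['WAT1'], ['WAT2']]
```

With the static version, `WAT2` would be missed in frame 1 even though it has moved next to the ligand, which is why an updating selection suits mobile water molecules.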

In [None]:
import MDAnalysis as mda
import prolif as plf

# load topology and trajectory
u = mda.Universe(plf.datafiles.WATER_TOP, plf.datafiles.WATER_TRAJ)

# create selections for the ligand and protein
ligand_selection = u.select_atoms("resname QNB")
protein_selection = u.select_atoms(
    "protein and byres around 15 group ligand",
    ligand=ligand_selection,
)
water_selection = u.select_atoms(
    "resname TIP3 and byres around 4 (group ligand or group pocket)",
    ligand=ligand_selection,
    pocket=protein_selection,
    updating=True,
)
ligand_selection, protein_selection, water_selection

Now we set up the interaction calculation.
In this tutorial we focus only on water bridge interactions, but you can also add the other typical ligand-protein interactions to the list of interactions.

:::{important}
When using `WaterBridge`, you must specify the `water` parameter with your atomgroup selection.
:::

By default, the `WaterBridge` interaction only looks at bridges involving a single water molecule, i.e. `ligand---water---protein`.
Here, we look at water bridges up to order 3 (i.e. up to 3 water molecules can be involved).
To do this, we explicitly pass `order=3` in the parameters of the `WaterBridge` interaction (the default is `1`).

In [None]:
fp = plf.Fingerprint(
    ["WaterBridge"], parameters={"WaterBridge": {"water": water_selection, "order": 3}}
)

Then we simply run the analysis. Under the hood, the `WaterBridge` analysis
performs three successive runs:
- between the ligand and water
- between the protein and water
- between the water molecules

The results are then collated together using `networkx`.
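The idea of combining these three runs can be sketched without ProLIF: treat each detected H-bond as an edge of a graph, then search for paths that start at the ligand, end at the protein and pass only through water molecules, with at most `order` waters. This is a plain-Python illustration on hypothetical node names (`LIG`, `PROT`, `W1`...); ProLIF's actual implementation relies on `networkx` and will differ in detail.

```python
from collections import defaultdict


def find_water_bridges(edges, ligand, protein, waters, order):
    """Return ligand -> water(s) -> protein paths containing 1 to `order` waters."""
    graph = defaultdict(set)
    for a, b in edges:  # H-bonds treated as undirected contacts
        graph[a].add(b)
        graph[b].add(a)

    bridges = []

    def extend(path):
        n_waters = len(path) - 1  # every node after the ligand is a water
        for neighbor in sorted(graph[path[-1]]):
            if neighbor == protein and n_waters >= 1:
                bridges.append(path + [neighbor])
            elif neighbor in waters and neighbor not in path and n_waters < order:
                extend(path + [neighbor])

    extend([ligand])
    return bridges


# Hypothetical contacts found by the three runs
edges = [
    ("LIG", "W1"), ("LIG", "W2"),  # ligand-water run
    ("W1", "PROT"), ("W3", "PROT"),  # protein-water run
    ("W2", "W3"),  # water-water run
]
waters = {"W1", "W2", "W3"}

print(find_water_bridges(edges, "LIG", "PROT", waters, order=1))
# [['LIG', 'W1', 'PROT']]
print(find_water_bridges(edges, "LIG", "PROT", waters, order=2))
# [['LIG', 'W1', 'PROT'], ['LIG', 'W2', 'W3', 'PROT']]
```

Raising `order` lets the search walk through more water-water contacts, which is why higher orders can reveal additional bridges.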

:::{note}
For practical reasons, the `WaterBridge` analysis only runs in serial.
:::

In [None]:
fp.run(u.trajectory, ligand_selection, protein_selection)

## Visualization

The trajectory that we use consists of 20 frames. Let's analyze the water bridge interactions present in the trajectory. To do this, we generate a DataFrame that shows which interactions occur in each frame of the simulation.

In [None]:
df = fp.to_dataframe()
df

We now compute the percentage of frames in which each interaction is present, and sort the values to identify the most frequent water bridge interactions.

In [None]:
# percentage of the trajectory where each interaction is present
df.mean().sort_values(ascending=False).to_frame(name="%").T * 100
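`df.mean()` works here because each column of the fingerprint DataFrame holds one boolean per frame, and the mean of booleans is the fraction of `True` values. The same computation in plain Python, on made-up presence data for two hypothetical interactions, looks like this:

```python
# Hypothetical per-frame presence of two interactions over 4 frames
presence = {
    "bridge_A": [True, True, False, True],
    "bridge_B": [False, True, False, False],
}

# Percentage of frames in which each interaction is present
occurrence = {
    name: 100 * sum(frames) / len(frames) for name, frames in presence.items()
}

# Sort from most to least frequent, like sort_values(ascending=False)
ranked = sorted(occurrence.items(), key=lambda item: item[1], reverse=True)
print(ranked)  # [('bridge_A', 75.0), ('bridge_B', 25.0)]
```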

We can also analyze the water bridge interactions using a barcode plot.

In [None]:
fp.plot_barcode(figsize=(8, 3))

We can also visualize water-mediated interactions using the `LigNetwork` plot.
This plot also shows higher-order water bridges, in which several water molecules
are hydrogen-bonded to each other to form the bridge.

:::{tip}
The default threshold for interactions to be displayed in `fp.plot_lignetwork()` is `0.3`, so only interactions present in more than 30 % of the frames appear with the default settings. Don't forget to adjust the threshold if you want to see interactions with a lower occurrence.
:::

In [None]:
ligand_mol = plf.Molecule.from_mda(ligand_selection)
view = fp.plot_lignetwork(ligand_mol, threshold=0.05)
view
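Conceptually, the `threshold` argument discards interactions whose occurrence falls below the cut-off before the network is drawn. Here is a plain-Python sketch of that filtering step, with hypothetical occurrence values and assuming the strict "more than" comparison described in the tip above; `filter_by_threshold` is an illustrative helper, not part of ProLIF.

```python
def filter_by_threshold(occurrence, threshold):
    """Keep interactions present in more than `threshold` (a fraction) of frames."""
    return {name: freq for name, freq in occurrence.items() if freq > threshold}


# Hypothetical occurrence fractions (0 to 1) for three water bridges
occurrence = {"bridge_A": 0.75, "bridge_B": 0.10, "bridge_C": 0.04}

print(filter_by_threshold(occurrence, threshold=0.3))   # {'bridge_A': 0.75}
print(filter_by_threshold(occurrence, threshold=0.05))  # {'bridge_A': 0.75, 'bridge_B': 0.1}
```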

We can also visualize water-mediated interactions in 3D with the `plot_3d` function. Here is an example of the water-mediated interaction between the protein and ligand present in frame `0`.

:::{note}
`plot_3d` requires the protein, ligand and water as `plf.Molecule` objects, so the `mda.AtomGroup` selections used for `fp.run()` must first be converted with `plf.Molecule.from_mda`. These molecules then serve as inputs to `plot_3d()`.
:::

In [None]:
protein_mol = plf.Molecule.from_mda(protein_selection)
water_mol = plf.Molecule.from_mda(water_selection)
view = fp.plot_3d(ligand_mol, protein_mol, water_mol, frame=0, display_all=False)
view

## Water Bridge Interaction Metadata

The analysis above shows whether specific water bridge interactions are present in each frame of the simulation.
During the analysis, some metadata about each interaction is also stored:
- the indices of the atoms involved in each component (ligand, protein or water),
- the "order" of the water-mediated interaction, i.e. how many water molecules are involved in the bridge,
- the residue identifiers of the water molecules,
- the role of the ligand and protein (H-bond acceptor or donor),
- the distances of each H-bond interaction forming the bridge (and their sum).

In [None]:
frame = 0
all_interaction_data = fp.ifp[frame].interactions()
next(all_interaction_data)

Next, we show how to access and process the metadata stored during the calculation of the interactions, using a pandas DataFrame for convenience.

In [None]:
import pandas as pd

all_metadata = []

for frame, ifp in fp.ifp.items():
    for interaction_data in ifp.interactions():
        if interaction_data.interaction == "WaterBridge":
            flat = {
                "frame": frame,
                "ligand_residue": interaction_data.ligand,
                "protein_residue": interaction_data.protein,
                "water_residues": " ".join(
                    map(str, interaction_data.metadata["water_residues"])
                ),
                "order": interaction_data.metadata["order"],
                "ligand_role": interaction_data.metadata["ligand_role"],
                "protein_role": interaction_data.metadata["protein_role"],
                "total_distance": interaction_data.metadata["distance"],
            }
            all_metadata.append(flat)

df_metadata = pd.DataFrame(all_metadata)
df_metadata

We can now use this information to select only the interactions of order 2 and 3 for further analysis.

In [None]:
# count the occurrences of each water residue in higher-order bridge interactions
(
    df_metadata[df_metadata["order"].isin([2, 3])]["water_residues"]
    .str.split(" ")
    .explode()
    .value_counts()
)
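The `str.split`/`explode`/`value_counts` chain above unpacks each space-separated `water_residues` string into one entry per water molecule and counts them. For readers less familiar with pandas, here is an equivalent using `collections.Counter` on a few hypothetical rows shaped like `df_metadata` (the water residue labels are made up for illustration):

```python
from collections import Counter

# Hypothetical rows mimicking the "order" and "water_residues" columns
rows = [
    {"order": 1, "water_residues": "WAT1"},
    {"order": 2, "water_residues": "WAT1 WAT2"},
    {"order": 3, "water_residues": "WAT2 WAT1 WAT3"},
]

# Keep only higher-order bridges, split each string and count water residues
counts = Counter(
    water
    for row in rows
    if row["order"] in (2, 3)
    for water in row["water_residues"].split(" ")
)
print(counts.most_common())  # [('WAT1', 2), ('WAT2', 2), ('WAT3', 1)]
```

Waters that appear in many higher-order bridges are good candidates for structurally important water sites.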

## Ligand Set Analysis

ProLIF can also analyze docking poses and lists of compounds through the `run_from_iterable` method.
In this section, we show how `run_from_iterable` can be used to analyze water-mediated interactions.

To do this, we create a set of ligand poses by iterating through the trajectory and converting the ligand to a `plf.Molecule` at each frame, which yields `20` ligand poses.

In [None]:
import MDAnalysis as mda
import prolif as plf
from prolif import Fingerprint

# Create list of ligand poses using from_mda at each frame
ligand_poses = []
for ts in u.trajectory:
    ligand_mol = plf.Molecule.from_mda(ligand_selection)
    ligand_poses.append(ligand_mol)

Now that we have created ligand poses we can use the `run_from_iterable` function to analyze the interactions of each ligand pose.

:::{note}
For the `Fingerprint` used with `run_from_iterable`, the water selection must be provided as a `plf.Molecule`, whereas for the `run` method we provided an `mda.AtomGroup`.
:::

In [None]:
# Setup Fingerprint with desired interactions
fp = Fingerprint(["WaterBridge"], parameters={"WaterBridge": {"water": water_mol, "order": 3}})

# Run fingerprint on ligand poses and protein molecule
fps = fp.run_from_iterable(ligand_poses, protein_mol)

print(fps)


Now, let's examine the results of the analysis.

In [None]:
df = fp.to_dataframe(index_col="Pose")
# show only the first 5 poses
df.head(5)

Finally, we can compare two ligand poses and the interactions they form. For this, we look at the poses from frames `0` and `2` to see the differences in their interactions.

In [None]:
from prolif.plotting.complex3d import Complex3D

pose_index = 0
comp3D = Complex3D.from_fingerprint(
    fp, ligand_poses[pose_index], protein_mol, water_mol, frame=pose_index
)

pose_index = 2
other_comp3D = Complex3D.from_fingerprint(
    fp, ligand_poses[pose_index], protein_mol, water_mol, frame=pose_index
)

view = comp3D.compare(other_comp3D)
view
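The 3D comparison can be complemented by a quick textual diff: collect the interactions present in each pose and take set differences. In practice the per-pose interactions could be gathered from the rows of the DataFrame above; the sketch below instead uses hypothetical (ligand residue, protein residue, interaction) keys in plain Python.

```python
# Hypothetical (ligand residue, protein residue, interaction) keys per pose
pose_0 = {("LIG1", "ASP10", "WaterBridge"), ("LIG1", "TYR20", "WaterBridge")}
pose_2 = {("LIG1", "TYR20", "WaterBridge"), ("LIG1", "SER30", "WaterBridge")}

shared = pose_0 & pose_2     # interactions present in both poses
only_in_0 = pose_0 - pose_2  # interactions lost in pose 2
only_in_2 = pose_2 - pose_0  # interactions gained in pose 2

print(sorted(shared))     # [('LIG1', 'TYR20', 'WaterBridge')]
print(sorted(only_in_0))  # [('LIG1', 'ASP10', 'WaterBridge')]
print(sorted(only_in_2))  # [('LIG1', 'SER30', 'WaterBridge')]
```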