# Visualizing protein-ligand binding sites

This tutorial shows how to use [ProLIF](https://prolif.readthedocs.io/en/latest/index.html) to generate an interaction site image (also called a fingerprint) for a protein-ligand complex. Several preparation steps are required before ProLIF can generate the fingerprint. This type of analysis and the resulting visuals help you communicate results clearly to your audience and provide an automated way to identify potential interactions. This approach does not replace manual analysis of the interactions in Mol*, PyMOL, or other software! Use the automated analysis alongside manual inspection to confirm the results.

This colab assumes that you have already docked a ligand to the protein by any method. The outputs used to create this colab come from the [Uni_dock colab](https://colab.research.google.com/drive/1hV3pwwYMr8Peg37JGnyZ_uipy08YbhUC?usp=sharing), which produces ligand binding poses in .sdf format.

#### **Please make a copy of this colab for your personal use!!**

A few things to start:

1.   These lessons only work in Google Chrome.
2.   If you want to save your progress, go to File > Save a copy in Drive, then choose a location in your Drive folder.
3.   Clicking the "play" button at the top left of a code block runs the code. Sometimes you can see the code and interact with it. If the code is hidden, it runs administrative tasks in the background, and you do not need to worry about it unless you are interested.
4.   To view hidden code, click the '>' to the left of the title until it changes to 'v'. This will reveal the code in that block.

### Starting Files
1. A .pdb file with explicit hydrogens, meaning that all of the hydrogens are already present in the file. If you used the Uni_dock colab to dock the molecules, use the **receptor.pdb** file produced in step 2.
2. A .sdf file of the ligand from the docking software. From the Uni_dock colab, these would be the files produced in step 9. Only one ligand site can be visualized at a time in this workflow.

Overall, this code takes 20 minutes or less to visualize the binding site. Comparison of multiple ligands will increase the run time.

---

This document has a list of [Frequently Asked Questions](https://docs.google.com/document/d/1kzkOi1T6QYjcyIoMXpU114B15WE0VcGnof5xvG5zdvA/edit?usp=sharing). Reach out to your instructor if you have additional questions.

---


**Acknowledgments**
- This Colab was written and adapted by Chris Berndsen in 2024 from a [code tutorial](https://prolif.readthedocs.io/en/latest/notebooks/docking.html#docking) from ProLIF. If you are interested in additional ways to use ProLIF or to use it with full simulations, please check out the [ProLIF tutorials](https://prolif.readthedocs.io/en/latest/source/tutorials.html) page.



---
---
# **Getting started**

The first thing we need to do is install the libraries and packages needed for our analysis. In simple terms, these are code that is needed to run other code. To install the packages you need, click the PLAY button on the left of each cell below. These packages will load some settings that are needed to run the rest of the code.

It is also important to understand what the code does. Any line that begins with a '#' is a comment on the code. If the code is shown, read the '#' comments to learn what the code does and, where needed, how to run it.

In [None]:
#@title **1A: Install Conda Colab**
#@markdown Press PLAY. It will restart the session, don't worry.
!pip install -q condacolab
import condacolab
condacolab.install()

In [None]:
#@title **1B: Install dependencies**
#@markdown Only press PLAY **after** the Conda Colab has finished downloading. It may take a few minutes, please wait :-)
# install dependencies
!pip install rdkit prolif --quiet
!pip install py3Dmol --quiet
import py3Dmol
import MDAnalysis as mda
import prolif as plf
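
If you want to confirm that the installs worked before moving on, a minimal sketch like the one below can help. It is not part of the ProLIF tutorial: the helper name `missing_packages` is made up for this example, and it only checks that each package can be found, not that it works correctly.

```python
import importlib.util


def missing_packages(names):
    """Return the names in `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages installed or imported in steps 1A and 1B.
required = ["rdkit", "prolif", "py3Dmol", "MDAnalysis"]
missing = missing_packages(required)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All packages found.")
```

If anything is listed as missing, rerun steps 1A and 1B before continuing.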


Once both of the above have run, *then* you can begin analyzing the protein and ligand.

1.   Click on the file folder on the left side
2.   **Right-click**, bring up the actions menu, and create a folder. Name this folder "data".
3. Upload the .pdb file and the .sdf file to the "data" folder.
4.   **Right-click**, bring up the actions menu, and create a folder. Name this folder "outputs". This is where we will put any output files or figures. **Remember to download all of these files when done.**
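
If you prefer code to clicking, the two folders can also be created from a code cell. This is a small sketch under the assumption that the notebook runs in the default working directory; `make_folders` is a hypothetical helper name, and you still need to upload the .pdb and .sdf files to "data" yourself.

```python
from pathlib import Path


def make_folders(base=".", names=("data", "outputs")):
    """Create each folder under `base` if it does not already exist."""
    created = []
    for name in names:
        folder = Path(base) / name
        # exist_ok=True means rerunning the cell does not raise an error
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in make_folders():
        print(folder, "is ready")
```

Rerunning the cell is safe because existing folders are left untouched.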

## 2: Protein preparation

In order to generate a valid molecule (with bond orders and charges) from a PDB file and correctly detect HBond interactions, **ProLIF requires the protein file to contain explicit hydrogens.**
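
A quick way to check for explicit hydrogens before loading the file is to count them in the PDB text. The sketch below is not part of ProLIF, and `count_hydrogens` is a hypothetical name. It assumes a standard fixed-column PDB file, reads the element symbol from columns 77-78 of each ATOM/HETATM record, and falls back to a rougher guess from the atom name when that column is empty.

```python
def count_hydrogens(pdb_text):
    """Return (number of atoms, number of hydrogens) for PDB-format text."""
    n_atoms = 0
    n_hydrogens = 0
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, TER, END, and other records
        n_atoms += 1
        element = line[76:78].strip().upper()
        if not element:
            # Rough fallback: atom name (columns 13-16) without leading digits.
            # This can misread unusual atom names, so treat it as a guess.
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):
            n_hydrogens += 1
    return n_atoms, n_hydrogens


# Two-atom example built in code so the element lands in columns 77-78.
example = "\n".join([
    "ATOM      1  N   ALA A   1".ljust(76) + " N",
    "ATOM      2  H   ALA A   1".ljust(76) + " H",
])
print(count_hydrogens(example))
```

To check your own file, pass its contents to the function, for example `count_hydrogens(open("data/xxx.pdb").read())`. A hydrogen count of zero means the hydrogens still need to be added before using ProLIF.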



In [None]:
# these are the names of your files, replace the "xxx" with your pdb name
u = mda.Universe("data/xxx.pdb")
protein_mol = plf.Molecule.from_mda(u)

# count the number of amino acids in the protein
# make sure this number matches what you would expect!
protein_mol.n_residues

## 3: Ligand preparation

As with the protein structure, the prepared ligand file needs to contain explicit hydrogens. If your .sdf file came from the Uni_dock colab, it should work fine.
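
Docking programs often write several poses into one .sdf file, while the code cell in this section reads only the first one (the `[0]`). As a rough check, the sketch below counts the records in an SDF file by counting the `$$$$` lines that end each record. `count_sdf_records` is a hypothetical helper, not a ProLIF function.

```python
def count_sdf_records(sdf_text):
    """Count molecule records in SDF text; each record ends with a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Minimal two-record example (atom blocks omitted for brevity).
example = "pose_1\n\n\nM  END\n$$$$\npose_2\n\n\nM  END\n$$$$\n"
print(count_sdf_records(example))
```

For your own file, pass its contents, for example `count_sdf_records(open("data/xxx.sdf").read())`. If the count is greater than one, remember that only the first pose is analyzed here.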


Save a screenshot of the ligand for your notebook.

In [None]:
# read SDF
# these are the names of your files, replace the "xxx" with your sdf name
ligand_mol = plf.sdf_supplier("data/xxx.sdf")[0]

# display ligand below
# make sure that it looks correct
plf.display_residues(ligand_mol, size=(400, 200))

## 4: Fingerprint generation

We can now generate a fingerprint, which records which amino acids interact with the ligand and how. By default, ProLIF will calculate the following interactions: Hydrophobic, HBDonor, HBAcceptor, PiStacking, Anionic, Cationic, CationPi, PiCation, VdWContact.

* Anionic and Cationic are ionic interactions.
* HBDonor and HBAcceptor are hydrogen bonding contacts, with the HBDonor group having the hydrogen and the HBAcceptor having the lone pair that interacts with the hydrogen.
* PiStacking is an interaction between two aromatic groups. PiCation and CationPi are interactions between an aromatic group and a cation.
* Hydrophobic interactions are London dispersion interactions or interactions between induced dipoles.


After running this step the code will produce a message with a brief description of the site. Record the number of interactions found in the fingerprint.


In [None]:
# use default interactions
fp = plf.Fingerprint(count=True)
# run on your poses
fp.run_from_iterable([ligand_mol], protein_mol)

The code below is optional and does not need to be run unless you want to change the size of the interaction site or focus on specific amino acids. This is generally not needed.

In [None]:
# optional code for describing specific interactions
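# to increase/decrease the distance between ligand and interaction partners in the protein
# default is 6 angstrom (0.6 nm)
# vicinity_cutoff is set when the Fingerprint is created, then the run is repeated
# fp = plf.Fingerprint(count=True, vicinity_cutoff=7.0)
# fp.run_from_iterable([ligand_mol], protein_mol)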


# to focus on specific amino acids, add the descriptor of the amino acid
# fp.run_from_iterable([ligand_mol], protein_mol, residues=["TYR38.A", "ASP129.A"])
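
Residue descriptors such as "TYR38.A" combine the three-letter residue name, the residue number, and the chain ID. Typos in these strings are easy to make, so the sketch below builds and checks them before they are passed to `residues=`. Both helper names are made up for this example, and the pattern only covers descriptors written like the two shown above.

```python
import re

# Three capital letters, a residue number, a dot, then a one-character chain ID.
RESIDUE_PATTERN = re.compile(r"^([A-Z]{3})(\d+)\.(\w)$")


def residue_id(name, number, chain):
    """Build a descriptor such as 'TYR38.A' from its three parts."""
    return f"{name.upper()}{int(number)}.{chain}"


def split_residue_id(text):
    """Split a descriptor into (name, number, chain) or raise ValueError."""
    match = RESIDUE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a residue descriptor like 'TYR38.A': {text!r}")
    name, number, chain = match.groups()
    return name, int(number), chain


wanted = [residue_id("TYR", 38, "A"), residue_id("ASP", 129, "A")]
print(wanted)
```

The resulting list can be used as the `residues` argument in the optional cell above.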

The best way to record our results is to export the interaction fingerprint to a dataframe and then save it as a .csv file, which can be opened in Excel or Sheets. Generally there is no need to change any of the steps below, unless you want to change the filename in the last line of code, which contains the `to_csv` command.

In [None]:
# make the dataframe
df = fp.to_dataframe()
# transpose the dataframe so that each interaction is a row
df2 = df.T
# save the dataframe in the outputs folder
df2.to_csv("outputs/fingerprint.csv")
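
To record the number of interactions without opening a spreadsheet, the saved file can be tallied with a few lines of code. This is a sketch under an assumption about the file layout: a header row whose first three columns are ligand, protein, and interaction, followed by one count column for the pose. Open your own fingerprint.csv to confirm that it looks like this. `summarize_fingerprint` is a hypothetical helper, and the rows in the example are illustrative only, not real results.

```python
import csv
import io
from collections import Counter


def summarize_fingerprint(csv_text):
    """Tally interaction counts per interaction type from the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    tally = Counter()
    for row in reader:
        if len(row) < 4 or not row[3]:
            continue  # skip incomplete rows
        interaction = row[2]
        count = int(float(row[3]))  # count for the first (only) pose
        if count > 0:
            tally[interaction] += count
    return tally


# Illustrative rows only, using the residue names from the optional cell above.
example = (
    "ligand,protein,interaction,0\n"
    "UNL1,TYR38.A,Hydrophobic,2\n"
    "UNL1,ASP129.A,Cationic,1\n"
)
tally = summarize_fingerprint(example)
print(dict(tally), "total:", sum(tally.values()))
```

For your own results, pass the contents of the saved file, for example `summarize_fingerprint(open("outputs/fingerprint.csv").read())`.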

## 5: Visualization

One of the more useful features of ProLIF is the ability to generate an interactive 2-D interaction figure. Looking at structures in 3-D is useful but hard to show in a report or manuscript where the images are static.

Once you run the code, the image is interactive. You can drag the amino acids around to make the layout visually appealing, and you can hide certain types of interactions by clicking the boxes below the image. These simplifications are important for protein-ligand pairs with extensive interaction sites.

The first code block below shows only one instance of each interaction type for each amino acid-ligand pair. The second code block shows ALL interactions. This is useful for analysis but may be too complex for a report. Use your judgement as to which is better for communicating your site.

Take a screenshot of your site(s) once you have a good figure.

In [None]:
# show the simpler network, with one interaction of each type per amino acid-ligand pair
view = fp.plot_lignetwork(ligand_mol, kind="frame", frame=0, display_all=False)
view

In [None]:
# show the more complex network of interactions
# completely optional
view = fp.plot_lignetwork(ligand_mol, kind="frame", frame=0, display_all=True)
view

In addition to the 2-D plot, it is also possible to generate a 3-D version, although this may be better done in other programs.

You can rotate and zoom your protein-ligand complex to find a pose that shows the interactions clearly. If you hover your cursor over an amino acid, a label will tell you which amino acid it is.

Take a screenshot of your site(s) once you have a good figure.