<a href="https://www.kaggle.com/code/madukacharles/protein-ligand-docking-and-visualization-of-hiv-1?scriptVersionId=286428159" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [1]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:11
🔁 Restarting kernel...


In [1]:
!wget -q https://sourceforge.net/projects/smina/files/smina.static/download -O smina
!chmod +x smina
!mv smina /usr/local/bin/

In [2]:
!pip install biopython
!pip install py3Dmol
!apt-get install -y openbabel
!obabel -V

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
openbabel is already the newest version (3.1.1+dfsg-6ubuntu5).
0 upgraded, 0 newly installed, 0 to remove and 165 not upgraded.
Open Babel 3.1.1 -- Feb  7 2022 -- 06:51:49


In [None]:
!pip install pytdc

In [5]:
from tdc.single_pred import HTS

# Load HIV dataset
dataset = HTS(name="HIV")

# Get the data as a pandas DataFrame
df = dataset.get_data()
print(df.head())
len(df)

Found local copy...
Loading...
Done!


  Drug_ID                                               Drug  Y
0  Drug 0  CCC1=[O+][Cu-3]2([O+]=C(CC)C1)[O+]=C(CC)CC(CC)...  0
1  Drug 1  C(=Cc1ccccc1)C1=[O+][Cu-3]2([O+]=C(C=Cc3ccccc3...  0
2  Drug 2                   CC(=O)N1c2ccccc2Sc2c1ccc1ccccc21  0
3  Drug 3    Nc1ccc(C=Cc2ccc(N)cc2S(=O)(=O)O)c(S(=O)(=O)O)c1  0
4  Drug 4                             O=S(=O)(O)CCS(=O)(=O)O  0


41127

In [6]:
!wget https://files.rcsb.org/download/1hvr.pdb

--2025-12-15 20:16:46--  https://files.rcsb.org/download/1hvr.pdb
Resolving files.rcsb.org (files.rcsb.org)... 18.239.18.50, 18.239.18.71, 18.239.18.95, ...
Connecting to files.rcsb.org (files.rcsb.org)|18.239.18.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/plain]
Saving to: ‘1hvr.pdb’

1hvr.pdb                [  <=>               ] 185.97K   694KB/s    in 0.3s    

2025-12-15 20:16:46 (694 KB/s) - ‘1hvr.pdb’ saved [190431]



## Detect which ligands are present in the pdb file

In [7]:
from Bio.PDB import PDBParser

pdb_file = "1hvr.pdb"  # The PDB file
parser = PDBParser(QUIET=True)
structure = parser.get_structure("complex", pdb_file)

ligands = set()

for model in structure:
    for chain in model:
        for residue in chain:
            # Hetero residues (ligands, waters, modified residues) have id[0] != ' '
            if residue.id[0] != ' ':
                ligands.add(residue.get_resname())

print("Ligands in the PDB:", ligands)

Ligands in the PDB: {'XK2', 'CSO'}
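The set above lists every hetero residue name, which can include modified amino acids and ions as well as the bound small molecule. One way to tell them apart is to count atoms per residue name. The following is a minimal sketch, assuming standard fixed-column PDB records; `count_hetero_atoms` is a hypothetical helper written for illustration, not part of Biopython.

```python
from collections import Counter


def count_hetero_atoms(pdb_text, skip=("HOH",)):
    """Count HETATM atoms per residue name, assuming fixed-column PDB records."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # residue name sits in columns 18-20
            if resname not in skip:        # water is skipped by default
                counts[resname] += 1
    return counts


# Usage in the notebook (file name taken from the cells above):
# with open("1hvr.pdb") as f:
#     print(count_hetero_atoms(f.read()).most_common())
```

A residue name with many atoms is the more likely docking reference; names with only a handful of atoms are usually modified residues or ions.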


## Fetch the binding site center

In [10]:
import numpy as np
pdb_file = "1hvr.pdb"
ligand_resname = "XK2"  #your ligand's residue name

parser = PDBParser()
structure = parser.get_structure("complex", pdb_file)

coords = []
for model in structure:
    for chain in model:
        for residue in chain:
            if residue.get_resname() == ligand_resname:
                for atom in residue:
                    coords.append(atom.get_coord())

coords = np.array(coords)
center = coords.mean(axis=0)
print("Binding site center:", center)

Binding site center: [-9.191565  15.9062605 27.946478 ]
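The center above is the centroid of the XK2 atoms. The box size used later (25 per axis) is fixed by hand. As a hedged alternative, the sketch below derives a box size from the ligand's own extent. `docking_box` and its `padding` default of 4.0 Å are assumptions made for illustration, not values from this notebook.

```python
def docking_box(coords, padding=4.0):
    """Return (center, size) for a box centered on the centroid of coords.

    Each side is twice the largest per-axis distance from the centroid to any
    atom, plus padding on both ends, so every atom lies inside the box.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    center = tuple(sum(c[i] for c in coords) / n for i in range(3))
    size = tuple(
        2 * (max(abs(c[i] - center[i]) for c in coords) + padding)
        for i in range(3)
    )
    return center, size


# Usage with the array built in the cell above:
# center, size = docking_box(coords)
```

Keeping the centroid as the center matches the cell above; only the size calculation is new.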


## Viewing the Protein-ligand complex

In [12]:
import py3Dmol

with open("1hvr.pdb") as f:
    pdb = f.read()

center = [-9.191565, 15.9062605, 27.946478]

box_size = {
    "w": 25,  # size_x
    "h": 25,  # size_y
    "d": 25   # size_z
}
view = py3Dmol.view(width=800, height=500)

# Add protein
view.addModel(pdb, "pdb")
view.setStyle({"cartoon": {"color": "cyan"}})
view.setStyle(
    {"resn": "XK2"},
    {"stick": {"radius": 0.18}, "sphere": {"radius": 0.35}}
)

# Add docking box
view.addBox({
    "center": {"x": center[0], "y": center[1], "z": center[2]},
    "dimensions": box_size,
    "wireframe": True,
    "color": "red"
})

view.zoomTo()
view.show()

## Running the docking process

In [51]:
import os
from rdkit import Chem

def dock_molecule(mol, program='smina'):
    # Write the RDKit molecule to a MOL file, then convert it to MOL2 with Open Babel
    Chem.MolToMolFile(mol, 'molecule.mol')
    !obabel -imol molecule.mol -omol2 -O molecule.mol2
    os.remove('molecule.mol')

    # Dock molecule.mol2 into a 25 Å box centered on the XK2 binding site
    !{program} -r /kaggle/working/1hvr.pdb -l molecule.mol2 --center_x -9.191565 --center_y 15.9062605 --center_z 27.946478 --size_x 25 --size_y 25 --size_z 25 --exhaustiveness 8 --out molecule_docked.mol2
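The `!` shell magic only works inside IPython. For use in a plain script, the same command can be assembled as an argument list and run with `subprocess`. This is a sketch under that assumption; `build_smina_command` and `run_docking` are hypothetical helpers, and the flags are the ones already used in the cell above.

```python
import subprocess


def build_smina_command(receptor, ligand, out, center, size,
                        exhaustiveness=8, program="smina"):
    """Assemble the docking command as an argument list (no shell quoting needed)."""
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        program, "-r", receptor, "-l", ligand,
        "--center_x", str(cx), "--center_y", str(cy), "--center_z", str(cz),
        "--size_x", str(sx), "--size_y", str(sy), "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness),
        "--out", out,
    ]


def run_docking(cmd):
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Usage with the values from this notebook:
# cmd = build_smina_command("1hvr.pdb", "molecule.mol2", "molecule_docked.mol2",
#                           center=(-9.191565, 15.9062605, 27.946478),
#                           size=(25, 25, 25))
# print(run_docking(cmd))
```

Passing the center and size as parameters also avoids repeating the hard-coded coordinates in several cells.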


In [52]:
from rdkit import Chem
from rdkit.Chem import AllChem
def optimize_conformation(mol):
    mol = Chem.AddHs(mol)  # Adds hydrogens to make optimization more accurate
    AllChem.EmbedMolecule(mol)  # Adds 3D positions
    AllChem.MMFFOptimizeMolecule(mol)  # Improves the 3D positions using a force-field method
    return mol

In [68]:
smiles = df.Drug.iloc[0]
mol = Chem.MolFromSmiles(smiles)
mol = optimize_conformation(mol)  # AddHs returns a new molecule, so keep the returned copy
dock_molecule(mol)

1 molecule converted


[21:11:44] UFFTYPER: Unrecognized hybridization for atom: 4
[21:11:44] UFFTYPER: Unrecognized atom type: Cu+1 (4)


   _______  _______ _________ _        _______ 
  (  ____ \(       )\__   __/( (    /|(  ___  )
  | (    \/| () () |   ) (   |  \  ( || (   ) |
  | (_____ | || || |   | |   |   \ | || (___) |
  (_____  )| |(_)| |   | |   | (\ \) ||  ___  |
        ) || |   | |   | |   | | \   || (   ) |
  /\____) || )   ( |___) (___| )  \  || )   ( |
  \_______)|/     \|\_______/|/    )_)|/     \|


smina is based off AutoDock Vina. Please cite appropriately.

Weights      Terms
-0.035579    gauss(o=0,_w=0.5,_c=8)
-0.005156    gauss(o=3,_w=2,_c=8)
0.840245     repulsion(o=0,_c=8)
-0.035069    hydrophobic(g=0.5,_b=1.5,_c=8)
-0.587439    non_dir_h_bond(g=-0.7,_b=0,_c=8)
1.923        num_tors_div

Using random seed: -1945641360

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------

## View the docked compound

In [106]:
# Load protein structure
with open("1hvr.pdb") as ifile:
    system = "".join(ifile)
    
# Load docked ligand poses if available
if os.path.exists("molecule_docked.mol2"):
    with open("molecule_docked.mol2") as ifile:
        mol = "".join(ifile)
        mols = ['@<TRIPOS>MOLECULE' + conf
                for conf in mol.split('@<TRIPOS>MOLECULE') if conf]
else:
    mols = None
    
# Create 3D viewer
view = py3Dmol.view(width=800, height=500)
view.addModelsAsFrames(system)
view.setStyle({'model': -1}, {"cartoon": {'color': 'cyan'}})
if mols:
    view.addModel(mols[0], 'mol2')  # first record in the file; the data is MOL2, not MOL
    view.setStyle(
        {'model': -1},
        {"stick": {'color': 'white', 'radius': 0.15},
         "sphere": {'radius': 0.4}}
    )

# Docking box centered on binding site
pocket_center = [-9.191565, 15.9062605, 27.946478]
view.addBox({
    'center': {'x': pocket_center[0], 'y': pocket_center[1], 'z': pocket_center[2]},
    'dimensions': {'w': 25, 'h': 25, 'd': 25},
    'wireframe': True
})
view.zoomTo()
view.show()
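The cell above splits the docked file on the `@<TRIPOS>MOLECULE` record tag and then indexes into the result, which raises `IndexError` when fewer poses were written than expected. A minimal sketch of a safer version follows; `split_mol2_poses` and `pick_pose` are hypothetical helper names, and the assumption that the first record is the best-scoring pose should be checked against the docking program's output.

```python
MOL2_TAG = "@<TRIPOS>MOLECULE"


def split_mol2_poses(text):
    """Split multi-record MOL2 text into one string per pose, in file order."""
    return [MOL2_TAG + part for part in text.split(MOL2_TAG) if part.strip()]


def pick_pose(poses, index=0):
    """Return the pose at index, or None when the file holds fewer poses."""
    return poses[index] if 0 <= index < len(poses) else None


# Usage with the file written by the docking cell:
# with open("molecule_docked.mol2") as f:
#     pose = pick_pose(split_mol2_poses(f.read()), index=0)
# if pose is not None:
#     view.addModel(pose, "mol2")
```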

In [55]:
import re

affinities = []

for i, smiles in enumerate(df.Drug.iloc[:50]): #first 50 compounds
    try:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            affinities.append(None)
            continue

        mol = optimize_conformation(mol)  # keep the hydrogenated, embedded copy
        dock_molecule(mol)

        output = !smina -r 1hvr.pdb -l molecule_docked.mol2 --score_only
        scores = re.findall(r'Affinity:\s*(\-?[\d\.]+)', '\n'.join(output))

        affinities.append(float(scores[0]) if scores else None)

    except Exception:
        affinities.append(None)

1 molecule converted


[20:13:33] UFFTYPER: Unrecognized hybridization for atom: 4
[20:13:33] UFFTYPER: Unrecognized atom type: Cu+1 (4)


   _______  _______ _________ _        _______ 
  (  ____ \(       )\__   __/( (    /|(  ___  )
  | (    \/| () () |   ) (   |  \  ( || (   ) |
  | (_____ | || || |   | |   |   \ | || (___) |
  (_____  )| |(_)| |   | |   | (\ \) ||  ___  |
        ) || |   | |   | |   | | \   || (   ) |
  /\____) || )   ( |___) (___| )  \  || )   ( |
  \_______)|/     \|\_______/|/    )_)|/     \|


smina is based off AutoDock Vina. Please cite appropriately.

Weights      Terms
-0.035579    gauss(o=0,_w=0.5,_c=8)
-0.005156    gauss(o=3,_w=2,_c=8)
0.840245     repulsion(o=0,_c=8)
-0.035069    hydrophobic(g=0.5,_b=1.5,_c=8)
-0.587439    non_dir_h_bond(g=-0.7,_b=0,_c=8)
1.923        num_tors_div

Using random seed: 1072990384

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************

mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------


[20:13:46] UFFTYPER: Unrecognized hybridization for atom: 10
[20:13:46] UFFTYPER: Unrecognized atom type: Cu+1 (10)



[20:31:20] UFFTYPER: Unrecognized charge state for atom: 5
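The loop above collects one affinity (or `None`) per compound. To turn that list into a ranking, the failed entries have to be dropped and the rest sorted, with the most negative score first, since lower predicted binding energy means stronger predicted binding. This sketch reuses the regular expression from the loop; `parse_affinity` and `rank_by_affinity` are hypothetical helpers, and the strings in the usage comment are placeholders, not real results.

```python
import re

# Same pattern the loop uses to pull the score out of the --score_only output
AFFINITY_RE = re.compile(r"Affinity:\s*(-?[\d.]+)")


def parse_affinity(output_text):
    """Return the first affinity found in the text as a float, or None."""
    match = AFFINITY_RE.search(output_text)
    return float(match.group(1)) if match else None


def rank_by_affinity(smiles_list, affinities):
    """Pair SMILES with scores, drop failures (None), sort most negative first."""
    scored = [(s, a) for s, a in zip(smiles_list, affinities) if a is not None]
    return sorted(scored, key=lambda pair: pair[1])


# Usage with the objects built in the notebook:
# ranking = rank_by_affinity(df.Drug.iloc[:50], affinities)
# for smiles, score in ranking[:5]:
#     print(f"{score:8.2f}  {smiles}")
```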

