# **Prepare the protein and ligand topologies**

Before a molecular dynamics (MD) simulation can run, the protein and each ligand need a topology.

Protein topology preparation defines the molecular structure of the protein, including its atomic coordinates, bond connectivity, angles, dihedrals, and the other parameters the simulation requires. Ligand topology preparation follows similar steps for the small molecules (ligands) that may bind to the protein.

In [1]:
%%capture
!apt install gromacs

%%capture
!pip install py3dmol

In [2]:
# See the installed GROMACS version
!gmx pdb2gmx --version

             :-) GROMACS - gmx pdb2gmx, 2021.4-Ubuntu-2021.4-2 (-:

                            GROMACS is written by:
     Andrey Alekseenko              Emile Apol              Rossen Apostolov     
         Paul Bauer           Herman J.C. Berendsen           Par Bjelkmar       
       Christian Blau           Viacheslav Bolnykh             Kevin Boyd        
     Aldert van Buuren           Rudi van Drunen             Anton Feenstra      
    Gilles Gouaillardet             Alan Gray               Gerrit Groenhof      
       Anca Hamuraru            Vincent Hindriksen          M. Eric Irrgang      
      Aleksei Iupinov           Christoph Junghans             Joe Jordan        
    Dimitrios Karkoulis            Peter Kasson                Jiri Kraus        
      Carsten Kutzner              Per Larsson              Justin A. Lemkul     
       Viveca Lindahl            Magnus Lundborg             Erik Marklund       
        Pascal Merz             Pieter Meulenhoff            

## **Prepare the Protein Topology**

1.   Prepare the protein topology using the CHARMM36 force field. Download the latest available force field and the conversion script "cgenff_charmm2gmx.py" from the MacKerell lab website.
2.   Decompress the force field tarball.
3.   Use the "charmm36-jul2022.ff" directory to write the topology: select the CHARMM36 force field, the CHARMM-modified TIP3P water model, and the terminal types "NH3+" and "COO-".
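The interactive choices in step 3 can be partly fixed on the command line. The following is a minimal sketch, not the notebook's own method: it only builds the argument list, using the `-ff` and `-water` options of `gmx pdb2gmx`. The helper name `pdb2gmx_command` is hypothetical, and the assumption that the `tip3p` keyword maps to the CHARMM-modified TIP3P model in this force field should be checked against `charmm36-jul2022.ff/watermodels.dat`. With `-ter`, the terminal types are still chosen interactively.

```python
import shlex


def pdb2gmx_command(pdb_file, out_gro, force_field="charmm36-jul2022", water="tip3p"):
    """Build a pdb2gmx argument list with force field and water model preselected.

    `force_field` is the directory name without the ".ff" suffix (assumed to sit
    in the working directory). `water="tip3p"` is an assumption; verify it in the
    force field's watermodels.dat.
    """
    return [
        "gmx", "pdb2gmx",
        "-f", pdb_file,
        "-o", out_gro,
        "-ff", force_field,
        "-water", water,
        "-ignh",   # ignore hydrogens present in the input file
        "-ter",    # terminal types are still selected interactively
    ]


cmd = pdb2gmx_command("1xq8.pdb", "1xq8_processed.gro")
print(shlex.join(cmd))
```

In a notebook cell the printed command can be pasted after `!`, or the list can be passed to `subprocess.run`.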

In [3]:
# Download the latest CHARMM36 force field and protein
# FIXME: the force field should not be downloaded this way
# !wget https://github.com/cpariona/biomedical-thesis/blob/main/data/MD_preparation/charmm36-jul2022.ff.tgz # get force field
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/1xq8.pdb # get protein

--2024-08-05 09:09:28--  https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/1xq8.pdb
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163378 (160K) [text/plain]
Saving to: ‘1xq8.pdb’


2024-08-05 09:09:29 (6.18 MB/s) - ‘1xq8.pdb’ saved [163378/163378]
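The commented-out `wget` above points at a GitHub `blob` URL, which serves an HTML page rather than the file itself. As a hedged sketch, the check below uses only the standard library to confirm that a downloaded file is really a tar archive before it is extracted. The helper name `looks_like_tarball` is hypothetical.

```python
import tarfile
from pathlib import Path


def looks_like_tarball(path):
    """Return True if `path` exists and tarfile can open it as an archive."""
    p = Path(path)
    return p.is_file() and tarfile.is_tarfile(str(p))


archive = "charmm36-jul2022.ff.tgz"
if looks_like_tarball(archive):
    print(f"{archive}: readable tar archive")
else:
    print(f"{archive}: missing or not a tar archive (possibly an HTML page)")
```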



In [4]:
# Extract the force field tarball
!tar -zxvf charmm36-jul2022.ff.tgz

charmm36-jul2022.ff/
charmm36-jul2022.ff/solvent.n.tdb
charmm36-jul2022.ff/ethers.hdb
charmm36-jul2022.ff/tip5p.itp
charmm36-jul2022.ff/cgenff.rtp
charmm36-jul2022.ff/spc.itp
charmm36-jul2022.ff/lipid.n.tdb
charmm36-jul2022.ff/solvent.rtp
charmm36-jul2022.ff/na.c.tdb
charmm36-jul2022.ff/ethers.rtp
charmm36-jul2022.ff/metals.hdb
charmm36-jul2022.ff/solvent.c.tdb
charmm36-jul2022.ff/aminoacids.r2b
charmm36-jul2022.ff/lipid.rtp
charmm36-jul2022.ff/cgenff.n.tdb
charmm36-jul2022.ff/tip3p_original.itp
charmm36-jul2022.ff/silicates.c.tdb
charmm36-jul2022.ff/ffnonbonded.itp
charmm36-jul2022.ff/na.hdb
charmm36-jul2022.ff/lipid.c.tdb
charmm36-jul2022.ff/cmap.itp
charmm36-jul2022.ff/ethers.n.tdb
charmm36-jul2022.ff/spce.itp
charmm36-jul2022.ff/tip3p.itp
charmm36-jul2022.ff/aminoacids.arn
charmm36-jul2022.ff/cgenff.c.tdb
charmm36-jul2022.ff/silicates.r2b
charmm36-jul2022.ff/atomtypes.atp
charmm36-jul2022.ff/carb.r2b
charmm36-jul2022.ff/watermodels.dat
charmm36-jul2022.ff/aminoacids.rtp
charmm36-ju

In [5]:
# Write the topology with pdb2gmx
!gmx pdb2gmx -f 1xq8.pdb -o 1xq8_processed.gro -ignh -ter
# Select the CHARMM36 force field
# Select the CHARMM-modified TIP3P water model
# Select the terminal type "NH3+" and "COO-"

             :-) GROMACS - gmx pdb2gmx, 2021.4-Ubuntu-2021.4-2 (-:

                            GROMACS is written by:
     Andrey Alekseenko              Emile Apol              Rossen Apostolov     
         Paul Bauer           Herman J.C. Berendsen           Par Bjelkmar       
       Christian Blau           Viacheslav Bolnykh             Kevin Boyd        
     Aldert van Buuren           Rudi van Drunen             Anton Feenstra      
    Gilles Gouaillardet             Alan Gray               Gerrit Groenhof      
       Anca Hamuraru            Vincent Hindriksen          M. Eric Irrgang      
      Aleksei Iupinov           Christoph Junghans             Joe Jordan        
    Dimitrios Karkoulis            Peter Kasson                Jiri Kraus        
      Carsten Kutzner              Per Larsson              Justin A. Lemkul     
       Viveca Lindahl            Magnus Lundborg             Erik Marklund       
        Pascal Merz             Pieter Meulenhoff            

## **Prepare the Ligand Topology**

1.   Create a .mol2 file with the ligand and add the missing hydrogens. Open ligand.pdb in Avogadro, and from the "Build" menu, choose "Add Hydrogens." Avogadro will build all of the hydrogen atoms onto the ligand. Save a .mol2 file (File -> Save As... and choose Sybyl Mol2 from the drop-down menu) named "ligand.mol2."
2.   Open the .mol2 file to make some changes:
  *   Replace "*****" with the ligand name (for example, "LigandName")
  *   Fix the residue names and numbers so that they are all the same
  *   Download sort_mol2_bonds.pl [http://www.mdtutorials.com/gmx/complex/Files/sort_mol2_bonds.pl] and run it with perl using the command: `perl sort_mol2_bonds.pl ligand.mol2 ligand_fix.mol2`
3.   Generate the topology: Visit the CGenFF server, log into your account, and click "Upload molecule" at the top of the page. Upload the .mol2 file, and the CGenFF server will quickly return a topology in the form of a CHARMM "stream" file (extension .str).
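4.   Download the script cgenff_charmm2gmx.py from the Lemkul Lab GitHub repository [https://github.com/Lemkul-Lab/cgenff_charmm2gmx] and run it with the residue name, the fixed .mol2 file, the .str file, and the force field directory as arguments, for example: `python cgenff_charmm2gmx.py JZ4 jz4_fix.mol2 jz4.str charmm36-jul2022.ff`.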
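The manual edits in step 2 (setting the molecule name and making all residue names and numbers identical) can also be scripted. This is a minimal sketch, assuming the usual Tripos Mol2 layout in which the first line after `@<TRIPOS>MOLECULE` is the molecule name and each `@<TRIPOS>ATOM` line holds id, name, x, y, z, type, residue number, residue name, and charge. The function name `normalize_mol2` and the two-atom sample are hypothetical; `UNL1` is the residue name used later in this notebook.

```python
def normalize_mol2(text, resname):
    """Set the molecule name and give every atom the same residue name and number."""
    out, section, first_in_section = [], None, False
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            section = line.strip()
            first_in_section = True
            out.append(line)
            continue
        if section == "@<TRIPOS>MOLECULE" and first_in_section:
            line = resname  # first line of the MOLECULE record is the name
        elif section == "@<TRIPOS>ATOM" and len(line.split()) >= 6:
            f = line.split()
            # keep id, name, x, y, z, type; force residue number 1 and one name
            f = f[:6] + ["1", resname] + f[8:]
            line = ("{:>7} {:<8} {:>10} {:>10} {:>10} {:<6} {:>3} {:<8} "
                    .format(*f[:8]) + " ".join(f[8:])).rstrip()
        first_in_section = False
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical two-atom sample, only to show the effect.
sample = (
    "@<TRIPOS>MOLECULE\n*****\n 2 1 0 0 0\nSMALL\nGASTEIGER\n\n"
    "@<TRIPOS>ATOM\n"
    "      1 C1 0.0 0.0 0.0 C.3 1 LIG1 -0.05\n"
    "      2 H1 1.0 0.0 0.0 H 2 LIG2 0.05\n"
    "@<TRIPOS>BOND\n     1     1     2    1\n"
)
fixed = normalize_mol2(sample, "UNL1")
print(fixed)
```

The bond section is left untouched, so `sort_mol2_bonds.pl` is still needed afterwards.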

In [6]:
# Download sort_mol2_bonds.pl
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/sort_mol2_bonds.pl

--2024-08-05 09:10:47--  https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/sort_mol2_bonds.pl
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3490 (3.4K) [text/plain]
Saving to: ‘sort_mol2_bonds.pl’


2024-08-05 09:10:47 (50.1 MB/s) - ‘sort_mol2_bonds.pl’ saved [3490/3490]



In [7]:
# Download ligands
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/1_lig_opt_DrugBank_stilbene_8.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/2_lig_opt_PubChem_flavanol_399.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/3_lig_opt_PubChem_flavanol_451.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/4_lig_opt_UNPD_indolinone_27.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/5_lig_opt_UNPD_indolinone_41.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/6_lig_opt_UNPD_indolinone_42.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_control/1_lig_opt_control_1.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_control/2_lig_opt_control_2.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_control/3_lig_opt_control_3.mol2
!wget https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_control/4_lig_opt_control_4.mol2

--2024-08-05 09:10:52--  https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/1_lig_opt_DrugBank_stilbene_8.mol2
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5040 (4.9K) [text/plain]
Saving to: ‘1_lig_opt_DrugBank_stilbene_8.mol2’


2024-08-05 09:10:52 (54.3 MB/s) - ‘1_lig_opt_DrugBank_stilbene_8.mol2’ saved [5040/5040]

--2024-08-05 09:10:52--  https://raw.githubusercontent.com/cpariona/biomedical-thesis/main/data/MD_preparation/ligands_top/2_lig_opt_PubChem_flavanol_399.mol2
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request 

In [30]:
# Run sort_mol2_bonds.pl
!perl sort_mol2_bonds.pl 1_lig_opt_DrugBank_stilbene_8.mol2 1_lig_opt_DrugBank_stilbene_8_fix.mol2
!perl sort_mol2_bonds.pl 2_lig_opt_PubChem_flavanol_399.mol2 2_lig_opt_PubChem_flavanol_399_fix.mol2
!perl sort_mol2_bonds.pl 3_lig_opt_PubChem_flavanol_451.mol2 3_lig_opt_PubChem_flavanol_451_fix.mol2
!perl sort_mol2_bonds.pl 4_lig_opt_UNPD_indolinone_27.mol2 4_lig_opt_UNPD_indolinone_27_fix.mol2
!perl sort_mol2_bonds.pl 5_lig_opt_UNPD_indolinone_41.mol2 5_lig_opt_UNPD_indolinone_41_fix.mol2
!perl sort_mol2_bonds.pl 6_lig_opt_UNPD_indolinone_42.mol2 6_lig_opt_UNPD_indolinone_42_fix.mol2
!perl sort_mol2_bonds.pl 1_lig_opt_control_1.mol2 1_lig_opt_control_1_fix.mol2
!perl sort_mol2_bonds.pl 2_lig_opt_control_2.mol2 2_lig_opt_control_2_fix.mol2
!perl sort_mol2_bonds.pl 3_lig_opt_control_3.mol2 3_lig_opt_control_3_fix.mol2
!perl sort_mol2_bonds.pl 4_lig_opt_control_4.mol2 4_lig_opt_control_4_fix.mol2

Found 48 atoms in the molecule, with 52 bonds.
Found 34 atoms in the molecule, with 37 bonds.
Found 43 atoms in the molecule, with 45 bonds.
Found 39 atoms in the molecule, with 43 bonds.
Found 44 atoms in the molecule, with 49 bonds.
Found 29 atoms in the molecule, with 34 bonds.
Found 17 atoms in the molecule, with 17 bonds.
Found 33 atoms in the molecule, with 33 bonds.
Found 36 atoms in the molecule, with 36 bonds.
Found 39 atoms in the molecule, with 39 bonds.
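The ten near-identical commands above follow one naming pattern, so they could be generated in a loop. A minimal sketch, assuming `perl` and `sort_mol2_bonds.pl` are available in the working directory; the helper names `fixed_name` and `sort_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def fixed_name(mol2_path):
    """Map 'name.mol2' to 'name_fix.mol2', the naming used in the cell above."""
    p = Path(mol2_path)
    return p.with_name(p.stem + "_fix" + p.suffix)


def sort_all(directory=".", script="sort_mol2_bonds.pl"):
    """Run the Perl script on every .mol2 file that is not already a *_fix.mol2."""
    for src in sorted(Path(directory).glob("*.mol2")):
        if src.stem.endswith("_fix"):
            continue  # skip outputs from an earlier run
        subprocess.run(["perl", script, str(src), str(fixed_name(src))], check=True)


print(fixed_name("1_lig_opt_control_1.mol2"))
# Call sort_all() once the ligand files have been downloaded.
```

With `check=True`, a failing Perl run raises an exception instead of passing silently.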


In [31]:
# Download the script cgenff_charmm2gmx.py
!wget https://raw.githubusercontent.com/Lemkul-Lab/cgenff_charmm2gmx/main/cgenff_charmm2gmx_py3_nx2.py

--2024-08-05 08:34:02--  https://raw.githubusercontent.com/Lemkul-Lab/cgenff_charmm2gmx/main/cgenff_charmm2gmx_py3_nx2.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37837 (37K) [text/plain]
Saving to: ‘cgenff_charmm2gmx_py3_nx2.py’


2024-08-05 08:34:02 (3.46 MB/s) - ‘cgenff_charmm2gmx_py3_nx2.py’ saved [37837/37837]



In [None]:
# Download STR files


In [38]:
# Run cgenff_charmm2gmx_py3_nx2.py
!python cgenff_charmm2gmx_py3_nx2.py UNL1 1_lig_opt_DrugBank_stilbene_8_fix.mol2 1_lig_opt_DrugBank_stilbene_8_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 2_lig_opt_PubChem_flavanol_399_fix.mol2 2_lig_opt_PubChem_flavanol_399_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 3_lig_opt_PubChem_flavanol_451_fix.mol2 3_lig_opt_PubChem_flavanol_451_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 4_lig_opt_UNPD_indolinone_27_fix.mol2 4_lig_opt_UNPD_indolinone_27_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 5_lig_opt_UNPD_indolinone_41_fix.mol2 5_lig_opt_UNPD_indolinone_41_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 6_lig_opt_UNPD_indolinone_42_fix.mol2 6_lig_opt_UNPD_indolinone_42_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 1_lig_opt_control_1_fix.mol2 1_lig_opt_control_1_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 2_lig_opt_control_2_fix.mol2 2_lig_opt_control_2_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 3_lig_opt_control_3_fix.mol2 3_lig_opt_control_3_fix.str charmm36-jul2022.ff
!python cgenff_charmm2gmx_py3_nx2.py UNL1 4_lig_opt_control_4_fix.mol2 4_lig_opt_control_4_fix.str charmm36-jul2022.ff

This script has been tested with NetworkX 2.3, and 2.4 is buggy.
Please install version 2.3 for best performance:
pip uninstall networkx
pip install networkx==2.3
NOTE 1: Code tested with Python 3.5.2 and 3.7.3. Your version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]

NOTE 2: Code tested with NetworkX 2.3. Your version: 3.3

NOTE 3: Please be sure to use the same version of CGenFF in your simulations that was used during parameter generation:
--Version of CGenFF detected in  1_lig_opt_DrugBank_stilbene_8_fix.str : 4.6
--Version of CGenFF detected in  charmm36-jul2022.ff/forcefield.doc : 4.6

NOTE 4: To avoid duplicated parameters, do NOT select the 'Include parameters that are already in CGenFF' option when uploading a molecule into CGenFF.
Error in atomgroup.py: read_mol2_coor_only: no. of atoms in mol2 (48) and top (0) are unequal
Usually this means the specified residue name does not match between str and mol2 files
This script has been tested with NetworkX 2.3, and 2.4 is 
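The error message above suggests that the residue name passed on the command line may not match the names in the .str and .mol2 files. One way to check this before running the conversion is sketched below, assuming the stream file declares its residue on a line starting with `RESI` and that the mol2 atom lines carry the residue name in the eighth column. The function names and the sample snippets are hypothetical, not taken from the notebook's files.

```python
import re


def str_resname(str_text):
    """Return the residue name from the first 'RESI' line of a stream file, or None."""
    m = re.search(r"^RESI\s+(\S+)", str_text, flags=re.MULTILINE)
    return m.group(1) if m else None


def mol2_resnames(mol2_text):
    """Collect the residue names used in the @<TRIPOS>ATOM section."""
    names, in_atoms = set(), False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 8:
            names.add(fields[7])
    return names


# Hypothetical snippets for illustration only.
sample_str = "* Toppar stream file\nRESI UNL1  0.000\nGROUP\n"
sample_mol2 = (
    "@<TRIPOS>MOLECULE\nUNK\n"
    "@<TRIPOS>ATOM\n 1 C1 0.0 0.0 0.0 C.3 1 UNK 0.0\n"
    "@<TRIPOS>BOND\n"
)

requested = "UNL1"
in_str = str_resname(sample_str)
in_mol2 = mol2_resnames(sample_mol2)
if in_str != requested or in_mol2 != {requested}:
    print(f"Mismatch: argument={requested}, str={in_str}, mol2={sorted(in_mol2)}")
```

To use it on real files, read them with `Path(...).read_text()` and pass the contents to the two functions.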