# LSD1 Binding to ORY-1001 and GSK2879552 KDM1A Inhibitors
---

### Step 1: Download Ligand Structures from PubChem
In this step, we’ll download the structures for ORY-1001 and GSK2879552 from PubChem in SDF format. Each structure will be saved locally in a designated folder called `ligands`.

In [1]:
import requests
import os

# Define CIDs for ORY-1001 and GSK2879552
ligands = {
    "ORY1001": 71543365,
    "GSK2879552": 66571643
}

# Create a folder to store ligand files
os.makedirs("ligands", exist_ok=True)

# Download and save SDF files for each ligand
for ligand_name, cid in ligands.items():
    sdf_url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{
        cid}/SDF"
    response = requests.get(sdf_url)

    # Check if the response is successful
    if response.status_code == 200:
        with open(f"ligands/{ligand_name.lower()}.sdf", "wb") as file:
            file.write(response.content)
        print(
            f"Ligand {ligand_name} saved as ligands/{ligand_name.lower()}.sdf")
    else:
        print(
            f"Failed to download {ligand_name}. Status code:", response.status_code)

Ligand ORY1001 saved as ligands/ory1001.sdf
Ligand GSK2879552 saved as ligands/gsk2879552.sdf


### Step 2: Optimize Ligand Structures
Using RDKit, we’ll load the SDF files, add hydrogens to each molecule, and optimize their 3D structures. The optimized structures will be saved in PDB format.

In [2]:
from rdkit import Chem
from rdkit.Chem import AllChem
import os

# Define ligands and file paths
ligands = {
    "ORY1001": "ligands/ory1001.sdf",
    "GSK2879552": "ligands/gsk2879552.sdf"
}

# Process each ligand
for ligand_name, sdf_path in ligands.items():
    # Load the compound from the SDF file
    mol = Chem.MolFromMolFile(sdf_path)

    if mol is not None:
        # Add hydrogens to the molecule
        mol = Chem.AddHs(mol)

        # Embed and optimize the 3D structure
        # randomSeed makes results reproducible
        AllChem.EmbedMolecule(mol, randomSeed=0xf00d)
        AllChem.UFFOptimizeMolecule(mol)

        # Save optimized structure as PDB
        pdb_path = f"ligands/{ligand_name.lower()}_optimized.pdb"
        Chem.MolToPDBFile(mol, pdb_path)
        print(f"Ligand {ligand_name} optimized and saved as {pdb_path}")
    else:
        print(f"Failed to load {ligand_name} from {sdf_path}")

Ligand ORY1001 optimized and saved as ligands/ory1001_optimized.pdb
Ligand GSK2879552 optimized and saved as ligands/gsk2879552_optimized.pdb


### Step 3: Download Protein Structure from RCSB PDB
For docking, we need the target protein structure. Here, we’ll download the protein structure for LSD1 (PDB ID: 2UXX) and save it to a directory called `protein_structures`.

In [3]:
import os
import requests

pdb_id = "2UXX"
protein_directory = "protein_structures"
os.makedirs(protein_directory, exist_ok=True)

# Download protein structure
pdb_request = requests.get(f"https://files.rcsb.org/download/{pdb_id}.pdb")
with open(f"{protein_directory}/{pdb_id}.pdb", "w") as f:
    f.write(pdb_request.text)
print(f"Downloaded {pdb_id} and saved to {protein_directory}/{pdb_id}.pdb")

Downloaded 2UXX and saved to protein_structures/2UXX.pdb


### Step 4: Clean and Prepare Protein Structure
In this step, we’ll clean the downloaded protein structure by removing any extra molecules (such as water and ligands) and adding missing atoms and hydrogens. This prepares the protein for docking by ensuring it is in the correct state. We’ll use `pdbfixer` for this step

1. **Load the Protein:** We start by loading the PDB file for our protein using `pdbfixer`, a tool designed to fix common issues in protein structures.
2. **Remove Heterogens:** This command removes non-protein molecules from the structure. Here, we remove all water molecules and other non-standard residues.
3. **Add Missing Atoms and Hydrogens:** The protein may have missing atoms, especially hydrogens and side chains. We add these, setting the protonation state for a neutral pH of 7.4.
4. **Save the Cleaned Protein:** The cleaned protein structure is saved to a new file for the next steps.

This process ensures that the protein is in the optimal condition for docking calculations in Step 6.

In [4]:
import os
from tqdm import tqdm
import pdbfixer
from openmm.app import PDBFile

# Define paths
protein_path = "protein_structures/2UXX.pdb"
cleaned_protein_path = "protein_structures/2UXX_cleaned.pdb"

# Check if cleaned protein file already exists
if os.path.exists(cleaned_protein_path):
    print(f"{cleaned_protein_path} already exists. Skipping cleaning step.")
else:
    # Initialize tqdm for progress tracking
    with tqdm(total=5, desc="Cleaning Protein Structure") as progress:
        
        # Load the protein structure file into pdbfixer
        fixer = pdbfixer.PDBFixer(filename=protein_path)
        progress.update(1)  # Step 1: Loaded structure file
        
        # Remove water molecules and any non-standard residues
        fixer.removeHeterogens(keepWater=False)
        progress.update(1)  # Step 2: Removed heterogens
        
        # Add missing atoms and residues
        fixer.findMissingResidues()
        fixer.findMissingAtoms()
        fixer.addMissingAtoms()
        progress.update(1)  # Step 3: Added missing residues and atoms
        
        # Add missing hydrogens at pH 7.4
        fixer.addMissingHydrogens(pH=7.4)
        progress.update(1)  # Step 4: Added missing hydrogens

        # Save the cleaned structure
        with open(cleaned_protein_path, 'w') as outfile:
            PDBFile.writeFile(fixer.topology, fixer.positions, outfile)
        progress.update(1)  # Step 5: Saved cleaned structure
        
    print(f"Cleaned protein saved as {cleaned_protein_path}")


protein_structures/2UXX_cleaned.pdb already exists. Skipping cleaning step.


In [5]:
import nglview as nv
import py3Dmol

# Load a sample molecule, for example, a pre-installed protein or local pdb file
view = nv.show_structure_file("protein_structures/2UXX_cleaned.pdb")

# Display the molecule
view




NGLWidget()

### Step 5: Convert Protein to PDBQT Format
AutoDock Vina requires both the protein and ligands to be in `PDBQT` format. We’ll use `Open Babel` to convert the cleaned protein structure to PDBQT.

In [6]:
import os
import subprocess

# Paths for the input and output files
protein_pdb_path = "protein_structures/2UXX_cleaned.pdb"
protein_pdbqt_path = "protein_structures/2UXX_cleaned.pdbqt"

# Function to convert the cleaned protein PDB to PDBQT format


def convert_protein_to_pdbqt(input_path, output_path):
    try:
        # Use Open Babel to convert PDB to PDBQT, adding partial charges
        subprocess.run(["obabel", input_path, "-O", output_path,
                       "--partialcharge"], check=True)
        print(f"Converted protein {input_path} to {output_path}")
    except subprocess.CalledProcessError:
        print(f"Failed to convert {input_path} to {output_path}")


# Convert protein if the PDBQT file doesn’t already exist
if not os.path.exists(protein_pdbqt_path):
    print("Starting protein conversion...")
    convert_protein_to_pdbqt(protein_pdb_path, protein_pdbqt_path)
else:
    print(f"Protein PDBQT file already exists at {protein_pdbqt_path}")

Starting protein conversion...
Converted protein protein_structures/2UXX_cleaned.pdb to protein_structures/2UXX_cleaned.pdbqt


1 molecule converted


### Step 6: Convert Ligands to PDBQT Format with Meeko
To prepare ligands for AutoDock Vina, we’ll use `Meeko` to convert the optimized ligands to `PDBQT` format.

ModuleNotFoundError: No module named 'rdkit.six'

### Step 7: Verify Prepared Files
Finally, let’s check if all files are correctly prepared for docking. This should include:
- `protein_structures/2UXX_cleaned.pdbqt`
- `ligands/ORY1001.pdbqt`
- `ligands/GSK2879552.pdbqt`

In [None]:
# Import necessary libraries
import subprocess

# Function to prepare a ligand for docking using Open Babel


def prepare_ligand_for_docking(input_pdb_file: str, output_pdbqt_file: str) -> None:
    """
    Prepares a ligand for docking using Open Babel.

    Args:
        input_pdb_file (str): Path to the input PDB file of the ligand.
        output_pdbqt_file (str): Path to the output PDBQT file for docking.
    """
    #Use Open Babel to convert the input PDB file to PDBQT format
    # Open Babel command: obabel input.pdb -O output.pdbqt --gen3d
    try:
        subprocess.run([
            "obabel", input_pdb_file, "-O", output_pdbqt_file, "--gen3d"
        ], check=True)
        print(f"Ligand preparation complete. Output saved to {
              output_pdbqt_file}")
    except subprocess.CalledProcessError as e:
        print(f"Error during ligand preparation: {e}")


# Example usage of the function (in a Jupyter Notebook cell)
# Input PDB files for the ligands
input_ligand_files = [
    "ligands/ory1001_optimized.pdb",
    "ligands/gsk2879552_optimized.pdb"
]

# Loop through each ligand file and prepare it for docking
for input_file in input_ligand_files:
    # Derive the output filename by replacing the extension
    output_file = input_file.replace(".pdb", ".pdbqt")

    # Prepare the ligand using Open Babel
    prepare_ligand_for_docking(input_file, output_file)

1 molecule converted


Ligand preparation complete. Output saved to ligands/ory1001_optimized.pdbqt
Ligand preparation complete. Output saved to ligands/gsk2879552_optimized.pdbqt


1 molecule converted


In [12]:
# Run AutoDock Vina directly in Jupyter cell
!vina --config config.txt --out output_docked.pdbqt


#################################################################
# If you used AutoDock Vina in your work, please cite:          #
#                                                               #
# O. Trott, A. J. Olson,                                        #
# AutoDock Vina: improving the speed and accuracy of docking    #
# with a new scoring function, efficient optimization and       #
# multithreading, Journal of Computational Chemistry 31 (2010)  #
# 455-461                                                       #
#                                                               #
# DOI 10.1002/jcc.21334                                         #
#                                                               #
# Please see http://vina.scripps.edu for more information.      #
#################################################################

Detected 12 CPUs
Reading input ... 

Parse error on line 4081 in file "protein_structures/2UXX_cleaned.pdbqt": Unknown or inappropriate tag


  pid, fd = os.forkpty()


In [None]:
# Run this in a terminal or Jupyter notebook cell with `!` prefix
!pdb2pqr --pdb-output=protein_structures/protein_h.pdb --pH=7.4 protein_structures/2UXX.pdb protein_structures/2UXX.pqr --whitespace


In [None]:
import nglview as nv
from rdkit import Chem
from rdkit.Chem import AllChem

# Load the SDF file for ORY-1001
ligand = Chem.MolFromMolFile("ligands/ory1001.sdf")

# Add hydrogens and optimize the geometry
ligand = Chem.AddHs(ligand)
AllChem.EmbedMolecule(ligand)
AllChem.UFFOptimizeMolecule(ligand)

# Save the ligand as a PDB file for visualization
Chem.MolToPDBFile(ligand, "ligands/ory1001.pdb")

# Use NGLView to visualize the molecule
view = nv.show_file("ligands/ory1001.pdb")  # Load the saved PDB file
view.add_ball_and_stick()  # Set the visualization style to "ball and stick"
view