Virtual screening using Python bindings #320
Comments
Hi |
Thank you for answering. |
Using Python bindings, what is the difference between normal docking and batch mode? The only difference seems to be the ability to dock without initialising a new Vina instance each time. Could you please tell me if there are any advantages that are only available in batch mode? (I couldn't tell) |
That is the only advantage, which makes it faster. |
When using Python bindings, if there are too many files, I will try to perform normal docking instead of using batch mode. |
That sounds like a bug. What was the approximate number of ligands that made it crash? More like thousands, or more like millions? |
Hi @eunos-1128 Could you provide a few more details about your hardware, your OS, and where you run it from (a Jupyter notebook has a memory limit, while a Python script shouldn't)? |
About a million ligands per docking run. I run the program in a Jupyter notebook on a Linux workstation; I didn't know that would be a limitation. |
Hi @eunos-1128 I have never done very thorough memory monitoring of each step, so I'm not sure how memory usage is affected by the way we run it. For a very long time, I was not aware of the batch mode and just ran one Vina command for each ligand... Ideally, memory usage shouldn't scale with the number of ligands in batch mode (?) I can take a closer look when I get a chance |
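One way to check whether memory really grows with the number of ligands is to record the process's peak resident memory after each ligand is processed. The following is only a minimal sketch using the standard-library `resource` module (Unix only); `peak_rss_mb` and `run_with_memory_log` are hypothetical helper names, not part of Vina, and `work` stands in for whatever per-ligand docking call is being measured.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_memory_log(items, work):
    """Call work(item) for each item and record the peak RSS after each call."""
    log = []
    for item in items:
        work(item)
        log.append((item, peak_rss_mb()))
    return log
```

If the logged peak keeps climbing roughly in step with the number of ligands processed, that would suggest memory is accumulating across ligands rather than being bounded by a single docking run.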
Thanks for the advice. I checked and it seems that Jupyter notebook can relax the memory limit, which I'm trying now. If that doesn't work, I will move the code to a python file and run it. |
I would like to confirm something about this part of the documentation. Does this mean that I should run Does the order in which these functions are executed change the docking results (i.e. docking score and ligand RMSD) significantly? |
Hi @eunos-1128 |
Thank you! I'll try both ways and compare the results if I have enough time. |
I used the When I set one ligand PDBQT at a time, not in batch mode, the output files were written out without any problem, but when I set multiple PDBQT files in batch mode, nothing was output. I can't think of a cause for this, but do you have any ideas? |
Hi @eunos-1128 |
One more thing, in your first post
This might not be exactly the batch mode you're looking for in virtual screening. Doing something like v.set_ligand_from_file(['Ligand_1.pdbqt', 'Ligand_2.pdbqt']) will generate poses that each contain two ligands (which could be another cause of the unexpected memory surge). Multiple ligand PDBQT files passed as arguments are docked simultaneously. If you want to dock them one at a time, you need to read them in one by one, use |
Hi, I use the function below for virtual screening.

def dock_ligands_to_protein(prot_name):
    ligand_files = glob.glob(f'{PARENT_DIR}/actives/{prot_name}/active_[1-2].pdbqt')
    ligand_files += glob.glob(f'{PARENT_DIR}/decoys/{prot_name}/decoy_[1-2].pdbqt')
    result_file = f'{PARENT_DIR}/results/vs/all/ligands_to_{prot_name}.pdbqt'
    if os.path.exists(result_file):
        return
    try:
        print(f"Starting docking {prot_name} and compounds(active/decoy)...")
        vina = Vina(sf_name="vina", seed=42, verbosity=1)
        vina.set_receptor(
            f"{PARENT_DIR}/receptors/{prot_name}_elem_charge_fixed-FHs.pdbqt"
        )
        pml.load(
            os.path.expanduser(f"~/eunos/dockings/datasets/DUD-E/all/{prot_name}/crystal_ligand.mol2")
        )
        centroid_coords = pml.centerofmass('crystal_ligand')
        pml.delete('all')
        vina.compute_vina_maps(center=centroid_coords, box_size=BOX_SIZE)
        vina.set_ligand_from_file(ligand_files)
        pp.pprint(vina.info())
        vina.randomize()
        vina.dock(exhaustiveness=32, n_poses=20)
        print(f"Finishing docking {prot_name} and compounds(active/decoy)!")
        vina.write_poses(
            result_file,
            energy_range=10000.0,
            overwrite=True,
            n_poses=20
        )
    except Exception as e:
        print(f"Exception occurred for {prot_name}: {e}")
    finally:
        pml.delete("all")
        gc.collect()
|
Hi @eunos-1128

import gc
import glob
import os

from vina import Vina

# pml, pp, PARENT_DIR and BOX_SIZE are assumed to be defined as in your script.


def dock_ligands_to_protein(prot_name):
    ligand_files = glob.glob(f'{PARENT_DIR}/actives/{prot_name}/active_[1-2].pdbqt')
    ligand_files += glob.glob(f'{PARENT_DIR}/decoys/{prot_name}/decoy_[1-2].pdbqt')
    try:
        print(f"Starting docking {prot_name} and compounds(active/decoy)...")
        vina = Vina(sf_name="vina", seed=42, verbosity=1)
        vina.set_receptor(
            f"{PARENT_DIR}/receptors/{prot_name}_elem_charge_fixed-FHs.pdbqt"
        )
        pml.load(
            os.path.expanduser(f"~/eunos/dockings/datasets/DUD-E/all/{prot_name}/crystal_ligand.mol2")
        )
        centroid_coords = pml.centerofmass('crystal_ligand')
        pml.delete('all')
        # Compute the maps once; the same Vina instance is reused for every ligand.
        vina.compute_vina_maps(center=centroid_coords, box_size=BOX_SIZE)
        # Dock the ligands one at a time, writing one result file per ligand.
        for ligand_file in ligand_files:
            ligand_name = os.path.basename(ligand_file).replace('.pdbqt', '')
            result_file = f'{PARENT_DIR}/results/vs/all/{ligand_name}_to_{prot_name}.pdbqt'
            # Skip ligands that already have a result file.
            if os.path.exists(result_file):
                continue
            vina.set_ligand_from_file(ligand_file)
            vina.randomize()
            vina.dock(exhaustiveness=32, n_poses=20)
            print(f"Finishing docking {prot_name} and compound: {ligand_file}!")
            vina.write_poses(
                result_file,
                energy_range=10000.0,
                overwrite=True,
                n_poses=20
            )
    except Exception as e:
        print(f"Exception occurred for {prot_name}: {e}")
    finally:
        pml.delete("all")
        gc.collect()

I didn't test this code, but I hope it explains what I meant by individual virtual screening. Also, the input PDBQT files need to contain exactly one ligand each. I don't think multiple-ligand PDBQT inputs are currently supported |
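Since each input file should contain exactly one ligand, a library stored as a single multi-model PDBQT file would first have to be split. The following is only a sketch, assuming the models are delimited by MODEL and ENDMDL records; `split_pdbqt_models` and `write_split_files` are hypothetical helpers, not part of Vina, and the output naming scheme is an arbitrary choice.

```python
from __future__ import annotations

from pathlib import Path


def split_pdbqt_models(text: str) -> list[str]:
    """Split PDBQT text into one string per MODEL ... ENDMDL block.

    Text without any MODEL record is returned unchanged as a single entry.
    """
    models, current, in_model = [], [], False
    for line in text.splitlines():
        fields = line.split()
        record = fields[0] if fields else ""
        if record == "MODEL":
            in_model, current = True, []
        elif record == "ENDMDL" and in_model:
            models.append("\n".join(current) + "\n")
            in_model = False
        elif in_model:
            current.append(line)
    return models if models else [text]


def write_split_files(src: str, out_dir: str) -> list[Path]:
    """Write each model of `src` to <out_dir>/<stem>_<n>.pdbqt and return the paths."""
    src_path, out = Path(src), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, block in enumerate(split_pdbqt_models(src_path.read_text()), start=1):
        path = out / f"{src_path.stem}_{i}.pdbqt"
        path.write_text(block)
        paths.append(path)
    return paths
```

The resulting single-ligand files can then be passed to the docking loop one by one.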
Thank you for reviewing the code. I checked You meant those ligands(
I understand this point. |
Hi @eunos-1128
Yes, in my understanding it's for docking multiple ligands simultaneously. You can do a quick check by giving two ligands, and check out the output - each "pose" will contain two ligands. I think initially you were docking multiple ligands simultaneously (trying to generate a co-complex of receptor and many, many ligands) which led to the unexpected memory surge |
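The quick check described above can also be done programmatically. This is a rough sketch, assuming that each pose in the output file is wrapped in MODEL and ENDMDL records and that every ligand in a pose contributes exactly one ROOT record; `ligands_per_pose` is a hypothetical helper, not part of Vina.

```python
def ligands_per_pose(pdbqt_text: str) -> list:
    """Return the number of ROOT records found in each MODEL ... ENDMDL block."""
    counts = []
    current = None  # None means we are outside a MODEL block
    for line in pdbqt_text.splitlines():
        fields = line.split()
        record = fields[0] if fields else ""
        if record == "MODEL":
            current = 0
        elif record == "ROOT" and current is not None:
            current += 1
        elif record == "ENDMDL" and current is not None:
            counts.append(current)
            current = None
    return counts
```

Under those assumptions, a count greater than one for a pose would indicate that the ligands were docked simultaneously rather than one at a time.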
I see... I'm trying the way you proposed. Thank you. |
It seems to be working for the time being. Thanks a lot! |
Hi,
I'm trying to run a virtual screening with Vina's batch mode using the Python API.
According to the docs, set_ligand_from_file appears to take multiple ligand PDBQT files as arguments. I think a PDBQT file can also contain multiple models.
Which is appropriate for Vina's batch mode: one PDBQT file with multiple models, or multiple PDBQT files that each contain only one model?
(I have never tried both approaches and compared them.)