Virtual screening using Python bindings #320

eunos-1128 · 2024-06-10T04:27:27Z

Hi,

I'm trying to virtual screening with vina's batch mode using Python API.

According to docs, set_ligand_from_file looks to take multiple ligand PDBQT files as arguments.

I think PDBQT file can have multiple models in it.

Which is appropriate for vina's batch mode, one PDBQT file with multiple models or multiple PDBQT files where each one has only one model?
(I've never tried and compared both of two ways. )

The text was updated successfully, but these errors were encountered:

diogomart · 2024-06-10T15:49:08Z

Hi
Vina expects multiple PDBQT files where each one has only one model.

eunos-1128 · 2024-06-11T00:51:11Z

Thank you for answering.

eunos-1128 · 2024-06-13T11:19:41Z

Using Python bindings, what is the difference between normal docking and batch mode?

The only difference seems to be the ability to dock without initialising a new Vina instance each time.

Could you please tell me if there are any advantages that are only available in batch mode? (I couldn't tell)

diogomart · 2024-06-13T15:20:48Z

That is the only advantage, which makes it faster.

eunos-1128 · 2024-06-14T03:04:55Z

When using set_ligand_from_file, the program would sometimes crash due to memory overflow because there were too many files in the virtual screening.

When using Python bindings, if there are too many files, I will try to perform normal docking instead of using batch mode.

diogomart · 2024-06-15T00:41:50Z

That sounds like a bug. What was the approximate number of ligands that made it crash? More like thousands, or more like millions?

rwxayheee · 2024-06-15T00:52:04Z

Hi @eunos-1128 Could you provide a bit more details of your hardware, OS and where do you run from (Jupyter notebook has a memory limit, while a python script shouldn't)?

eunos-1128 · 2024-06-17T01:53:23Z

@diogomart

That sounds like a bug. What was the approximate number of ligands that made it crash? More like thousands, or more like millions?

About a million ligands per a docking.

@rwxayheee

I execute the program using Jupyter notebook on a linux workstation, which I didn't know that would be the limitation.

rwxayheee · 2024-06-17T01:58:39Z

Hi @eunos-1128
Jupyter notebook has a default memory limit (you could use a search engine to read more on it).
It might be helpful to try running the same codes outside of Jupyter notebook. You can copy all codes to a python script, then use your environment python to run it.
Otherwise, try increasing Jupyter notebook's memory limit might help, too, but I'm not entirely sure how much it needs to dock a million ligands.

I have never done a very thorough memory monitoring on each steps, so I'm not very sure how memory usage is affected by the way we run it. For a very long time, I was not aware of the batch mode and just run one Vina command for each ligand... Ideally, memory usage shouldn't scale with number of ligands in the batch mode (?) I can take a closer look when I get a chance

eunos-1128 · 2024-06-17T05:16:09Z

@rwxayheee

Thanks for the advice.

I checked and it seems that Jupyter notebook can relax the memory limit, which I'm trying now.

If that doesn't work, I will move the code to a python file and run it.

eunos-1128 · 2024-06-17T05:29:57Z

@diogomart @rwxayheee

v.set_ligand_from_file('1iep_ligand.pdbqt')
v.compute_vina_maps(center=[15.190, 53.903, 16.917], box_size=[20, 20, 20])
The next lines are used to first load a PDBQT file containing the ligand called 1iep_ligand.pdbqt and then compute the affinity maps for each ligand atom types accroding to the Vina forcefield. You might need to read first the tutorial Basic docking to learn how to create a PDBQT file of a ligand. There is a small subility here, the behavior of the compute_vina_maps() function changes if the ligand was loaded before or after computing the vina maps. If no ligand was initialized, compute_vina_maps() will compute the affinity map for each atom types defined in the Vina forcefield (22 in total). This is very useful when we want to dock ligands in batch (a.k.a virtual screening) but we don’t necessarily know beforehand what atom types will be necessary for thoses ligands. Alternately to set_ligand_from_file(), you could also load a molecule using a molecule string in PDBQT format using the set_ligand_from_string() function.

I would like to confirm something about this part of the documentation.

Does this mean that I should run set_ligand_from_file after running compute_vina_maps if I want to dock multiple ligands at once, for example for virtual screening?

Does the order in which these functions are executed change the docking results (i.e. docking score and ligand RMSD) significantly?

rwxayheee · 2024-06-17T05:44:34Z

Hi @eunos-1128
Running compute_vina_maps before setting the first ligand can save time. The documentation recommends this, such that the maps are computed only once, but they are good for all ligands. No more redundant steps to compute the corresponding maps for ligands of different compositions.
Doing set_ligand_from_file before or after set_ligand_from_file shouldn't change the results of the docking. However, if it does, please do let us know.

eunos-1128 · 2024-06-17T06:52:08Z

@rwxayheee

Thank you!

I'll try both of two ways to compare results if I have time enough to do.

eunos-1128 · 2024-06-17T16:26:15Z

I used the write_poses method to write out the docking results after virtual screening using Python bindings, but there was no output.

When I set one ligand PDBQT at a time, not in batch mode, the output files were written out without any problem, but when I set multiple PDBQT files in batch mode, nothing was output.

I can't think of a cause for this, but do you have any ideas?

rwxayheee · 2024-06-17T16:35:53Z

Hi @eunos-1128
Could you show an example of your codes for batch docking? Doesn't need specific ligand or receptor file, but I wish to know the order of commands (initialize vina, set receptor, compute maps, set ligand, write poses) and how they are arranged in loop. It would be tremendously helpful if you could provide that information. I will try to reproduce with my files

rwxayheee · 2024-06-17T16:45:20Z

one more thing, in your first post

According to docs, set_ligand_from_file looks to take multiple ligand PDBQT files as arguments.

This might not exactly be the batch mode you're looking for virtual screening. Doing something like the following:

v.set_ligand_from_file(['Ligand_1.pdbqt', 'Ligand_2.pdbqt'])

Will generate poses each containing two ligands (could be another cause for the unexpected memory surge). The multiple ligand PDBQT files are taken as arguments for docking multiple ligands simultaneously. If you want to perform docking one at a time, you need to read them in one by one, use write_poses after each dock within the loop.

eunos-1128 · 2024-06-18T03:51:41Z

@rwxayheee

Hi, I use the function below for virtual screening.

def dock_ligands_to_protein(prot_name):
    ligand_files = glob.glob(f'{PARENT_DIR}/actives/{prot_name}/active_[1-2].pdbqt')
    ligand_files += glob.glob(f'{PARENT_DIR}/decoys/{prot_name}/decoy_[1-2].pdbqt')
    result_file = f'{PARENT_DIR}/results/vs/all/ligands_to_{prot_name}.pdbqt'

    if os.path.exists(result_file):
        return

    try:
        print(f"Starting docking {prot_name} and compounds(active/decoy)...")

        vina = Vina(sf_name="vina", seed=42, verbosity=1)

        vina.set_receptor(
            f"{PARENT_DIR}/receptors/{prot_name}_elem_charge_fixed-FHs.pdbqt"
        )

        pml.load(
            os.path.expanduser(f"~/eunos/dockings/datasets/DUD-E/all/{prot_name}/crystal_ligand.mol2")
        )
        centroid_coords = pml.centerofmass('crystal_ligand')
        pml.delete('all')

        vina.compute_vina_maps(center=centroid_coords, box_size=BOX_SIZE)
        vina.set_ligand_from_file(ligand_files)

        pp.pprint(vina.info())

        vina.randomize()
        vina.dock(exhaustiveness=32, n_poses=20)

        print(f"Finishing docking {prot_name} and compounds(active/decoy)!")

        vina.write_poses(
            result_file,
            energy_range=10000.0,
            overwrite=True,
            n_poses=20
        )
        
    except Exception as e:
        print(f"Exception occurred for {prot_name}: {e}")

    finally:
        pml.delete("all")
        gc.collect()

rwxayheee · 2024-06-18T04:01:53Z

Hi @eunos-1128
Thanks for posting the commands. I think you might be performing multiple-ligand docking (trying to fit many ligands into one structure), not individual screening. If you want to perform screening, for individual ligands, you could do:

def dock_ligands_to_protein(prot_name):
    ligand_files = glob.glob(f'{PARENT_DIR}/actives/{prot_name}/active_[1-2].pdbqt')
    ligand_files += glob.glob(f'{PARENT_DIR}/decoys/{prot_name}/decoy_[1-2].pdbqt')

    if os.path.exists(result_file):
        return

    try:
        print(f"Starting docking {prot_name} and compounds(active/decoy)...")

        vina = Vina(sf_name="vina", seed=42, verbosity=1)

        vina.set_receptor(
            f"{PARENT_DIR}/receptors/{prot_name}_elem_charge_fixed-FHs.pdbqt"
        )

        pml.load(
            os.path.expanduser(f"~/eunos/dockings/datasets/DUD-E/all/{prot_name}/crystal_ligand.mol2")
        )
        centroid_coords = pml.centerofmass('crystal_ligand')
        pml.delete('all')

        vina.compute_vina_maps(center=centroid_coords, box_size=BOX_SIZE)

        for ligand_file in ligand_files:
            ligand_name = os.path.basename(ligand_file).replace('.pdbqt','')
            vina.set_ligand_from_file(ligand_file)
        
            vina.randomize()
            vina.dock(exhaustiveness=32, n_poses=20)

            print(f"Finishing docking {prot_name} and compound: {ligand_file}!")

            result_file = f'{PARENT_DIR}/results/vs/all/{ligand_name}_to_{prot_name}.pdbqt'

            vina.write_poses(
                result_file,
                energy_range=10000.0,
                overwrite=True,
                n_poses=20
            )
        
    except Exception as e:
        print(f"Exception occurred for {prot_name}: {e}")

    finally:
        pml.delete("all")
        gc.collect()

i didn't test the codes. But i hope this could explain what I meant by individual virtual screening. Also, the input PDBQT files need to have exactly one ligand each. I don't think multiple-ligand PDBQT inputs are currently supported

eunos-1128 · 2024-06-18T04:13:43Z

@rwxayheee

Thank you for reviewing the code.

I checked set_ligand_from_file and found it seems to deal with batch ligands.
https://github.com/ccsb-scripps/AutoDock-Vina/blob/v1.2.5/build/python/vina/vina.py#L164-L186

You meant those ligands(pdbqt_filename whose type is list or tuple of PDBQT file paths) are intended to execute multiple-ligands docking?

Also, the input PDBQT files need to have exactly one ligand each. I don't think multiple-ligand PDBQT inputs are currently supported

I understand this point.

rwxayheee · 2024-06-18T04:16:27Z

Hi @eunos-1128

You meant those ligands are intended to execute multiple-ligands docking?

Yes, in my understanding it's for docking multiple ligands simultaneously. You can do a quick check by giving two ligands, and check out the output - each "pose" will contain two ligands.

I think initially you were docking multiple ligands simultaneously (trying to generate a co-complex of receptor and many, many ligands) which led to the unexpected memory surge

eunos-1128 · 2024-06-18T04:21:27Z

@rwxayheee

I think initially you were docking multiple ligands simultaneously (trying to generate a co-complex of receptor and many, many ligands) which led to the unexpected memory surge

I see...

I'm trying the way you proposed. Thank you.

eunos-1128 · 2024-06-18T10:19:34Z

@rwxayheee

It looks working for the time being.

Thanks a lot!

eunos-1128 closed this as completed Jun 11, 2024

eunos-1128 reopened this Jun 13, 2024

eunos-1128 closed this as completed Jun 14, 2024

eunos-1128 reopened this Jun 17, 2024

eunos-1128 closed this as completed Jun 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Virtual screening using Python bindings #320

Virtual screening using Python bindings #320

eunos-1128 commented Jun 10, 2024 •

edited

Loading

diogomart commented Jun 10, 2024

eunos-1128 commented Jun 11, 2024

eunos-1128 commented Jun 13, 2024 •

edited

Loading

diogomart commented Jun 13, 2024

eunos-1128 commented Jun 14, 2024

diogomart commented Jun 15, 2024

rwxayheee commented Jun 15, 2024

eunos-1128 commented Jun 17, 2024 •

edited

Loading

rwxayheee commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 17, 2024

eunos-1128 commented Jun 17, 2024 •

edited

Loading

rwxayheee commented Jun 17, 2024

eunos-1128 commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 17, 2024 •

edited

Loading

rwxayheee commented Jun 17, 2024

rwxayheee commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 18, 2024 •

edited

Loading

rwxayheee commented Jun 18, 2024

eunos-1128 commented Jun 18, 2024 •

edited

Loading

rwxayheee commented Jun 18, 2024 •

edited

Loading

eunos-1128 commented Jun 18, 2024

eunos-1128 commented Jun 18, 2024

Virtual screening using Python bindings #320

Virtual screening using Python bindings #320

Comments

eunos-1128 commented Jun 10, 2024 • edited Loading

diogomart commented Jun 10, 2024

eunos-1128 commented Jun 11, 2024

eunos-1128 commented Jun 13, 2024 • edited Loading

diogomart commented Jun 13, 2024

eunos-1128 commented Jun 14, 2024

diogomart commented Jun 15, 2024

rwxayheee commented Jun 15, 2024

eunos-1128 commented Jun 17, 2024 • edited Loading

rwxayheee commented Jun 17, 2024 • edited Loading

eunos-1128 commented Jun 17, 2024

eunos-1128 commented Jun 17, 2024 • edited Loading

rwxayheee commented Jun 17, 2024

eunos-1128 commented Jun 17, 2024 • edited Loading

eunos-1128 commented Jun 17, 2024 • edited Loading

rwxayheee commented Jun 17, 2024

rwxayheee commented Jun 17, 2024 • edited Loading

eunos-1128 commented Jun 18, 2024 • edited Loading

rwxayheee commented Jun 18, 2024

eunos-1128 commented Jun 18, 2024 • edited Loading

rwxayheee commented Jun 18, 2024 • edited Loading

eunos-1128 commented Jun 18, 2024

eunos-1128 commented Jun 18, 2024

eunos-1128 commented Jun 10, 2024 •

edited

Loading

eunos-1128 commented Jun 13, 2024 •

edited

Loading

eunos-1128 commented Jun 17, 2024 •

edited

Loading

rwxayheee commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 17, 2024 •

edited

Loading

rwxayheee commented Jun 17, 2024 •

edited

Loading

eunos-1128 commented Jun 18, 2024 •

edited

Loading

eunos-1128 commented Jun 18, 2024 •

edited

Loading

rwxayheee commented Jun 18, 2024 •

edited

Loading