batch mode does not work? #15

rczerminski-valo · 2021-07-30T17:35:46Z

Support for simultaneous docking of multiple ligands and batch mode for virtual screening

I am trying to dock multiple ligands using the --batch option. My understanding is that, for this to work, the input .pdbqt file should contain multiple ligands following the pattern below:

MODEL 1
... molecule 1 ...
ENDMDL
MODEL 2
... molecule 2 ...
ENDMDL

Either my understanding is incorrect and some other mechanism should be used, or there is a bug: when I give a .pdbqt file with multiple ligands as input, it fails with the error message "An unknown error occurred".

I have attached vina-An-unknown-error-occurred-bug.tar.gz, which should allow you to reproduce the issue.

jeeberhardt · 2021-07-31T15:40:07Z

Hi rczerminski-valo,

Sorry about this; batch mode currently lacks documentation... hopefully this will be fixed very soon. For a quick example of how batch mode works, please look at issue #11. For your use case, each PDBQT file can contain only one molecule.
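Since batch mode expects one molecule per PDBQT file, a multi-MODEL file like the one in the original post has to be split first. Below is a minimal Python sketch of that preprocessing step, assuming each ligand sits in its own MODEL/ENDMDL block. The function names and the `<stem>_<n>.pdbqt` naming scheme are hypothetical, and dropping the MODEL/ENDMDL wrapper lines is an assumption about what a single-ligand input should look like:

```python
from pathlib import Path


def split_models(pdbqt_text: str) -> list[str]:
    """Split PDBQT text into one string per MODEL ... ENDMDL block.

    Lines outside any MODEL block are ignored. The MODEL/ENDMDL wrapper
    lines are dropped (assumption: each output is a plain one-molecule body).
    """
    models: list[str] = []
    current: list[str] = []
    inside = False
    for line in pdbqt_text.splitlines():
        record = line[:6].strip()  # PDB record name lives in columns 1-6
        if record == "MODEL":
            current, inside = [], True
        elif record == "ENDMDL":
            if inside:
                models.append("\n".join(current) + "\n")
            inside = False
        elif inside:
            current.append(line)
    return models


def write_single_ligand_files(src: Path, out_dir: Path) -> list[Path]:
    """Write each MODEL block of `src` to its own file in `out_dir`.

    Hypothetical naming scheme: <stem>_1.pdbqt, <stem>_2.pdbqt, ...
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for i, body in enumerate(split_models(src.read_text()), start=1):
        path = out_dir / f"{src.stem}_{i}.pdbqt"
        path.write_text(body)
        paths.append(path)
    return paths
```

The resulting one-molecule files are the kind of input batch mode expects; see issue #11 for how to pass them to vina.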

Hope it helps.

Best,
Jerome.

rczerminski-valo · 2021-08-01T17:56:20Z

I see. So if that is the case, the only difference between the --ligand and --batch modes is that --batch requires a --dir argument specifying the directory where the docked ligand files are written. Is that correct?

jeeberhardt · 2021-08-02T07:46:33Z

Yep, exactly! ;)

rczerminski-valo · 2021-08-02T12:17:05Z

How difficult would it be to modify the code to allow multi-ligand input files and multi-ligand, multi-pose output files? As far as I can tell, docking multiple ligands is a rather common use case, currently handled by scripting around the vina executable.
--Ryszard

jeeberhardt · 2021-08-02T15:47:52Z

I think the bigger question here is not whether we could, but whether we should.

The PDB format (which is the basis of the PDBQT format) is pretty clear about the use of MODEL/ENDMDL keywords:

The MODEL record specifies the model serial number when multiple models of the same structure are presented in a single coordinate entry, as is often the case with structures determined by NMR.

We are aware that many people use this trick to generate PDBQT files with multiple ligands (alongside all the different flavors of the PDB format...), but it raises several issues: 1) how would we differentiate multiple poses of the same ligand from different ligands, and 2) more importantly, the file would no longer be in PDB(QT) format, so you would eventually face interoperability issues with other programs.

Jerome.

rczerminski-valo · 2021-08-02T17:00:25Z

Interoperability issues are definitely important, and yes, using a PDBQT-like format to store multi-ligand, multi-pose output might not be the best way forward. Another format better suited to this purpose may offer a better solution. I do not know which would be most suitable for vina, but some possibilities, I guess, are SDF or CML. Do you think either of these would be suitable, or some other format?
--Ryszard

diogomart · 2021-08-02T18:12:58Z

Hi Ryszard,
SDF and MOL2 output would be great, but unfortunately it's unlikely we will implement it in the near future. There would be a similar problem, though, because these formats don't explicitly distinguish whether multiple entries are a) conformers of a given molecule or b) different molecules. Software reading these files would have to guess, based on the molecule names or in some other way.
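To illustrate the guessing problem, here is a minimal Python sketch of the name-based heuristic: records in an SDF file are split on the `$$$$` terminator and grouped by their title (first) line, on the assumption that entries sharing a title are conformers of one molecule. The function names are hypothetical, and this is not something vina does:

```python
def split_sdf_records(sdf_text: str) -> list[str]:
    """Split SDF text into records on the `$$$$` terminator line."""
    records: list[str] = []
    current: list[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a final `$$$$`, if it has content.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def group_by_title(sdf_text: str) -> dict[str, list[int]]:
    """Guess molecule identity from the title line (heuristic only).

    Records with identical titles are assumed to be conformers/poses of
    the same molecule. Returns {title: [record indices]} in first-seen order.
    """
    groups: dict[str, list[int]] = {}
    for index, record in enumerate(split_sdf_records(sdf_text)):
        title = record.split("\n", 1)[0].strip()
        groups.setdefault(title, []).append(index)
    return groups
```

The weakness is exactly the one described: two different molecules with the same (or an empty) title would be merged into one group.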

rczerminski-valo · 2021-08-03T13:03:58Z

With SDF, one solution might be a simple, well-documented convention (so the "consumer" of these files does not have to guess what is what): in the multi-ligand, multi-pose output, data fields would clearly mark the different molecules and the poses associated with each. For example, with 2 molecules and 3 poses per molecule we would have 6 MOL records in the SDF, with (mseq, pseq) fields = (1,1), (1,2), (1,3), (2,1), (2,2), (2,3):

[...]
>  <mseq>
1
> <pseq>
1
[...]
>  <mseq>
1
> <pseq>
2
[...]
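Below is a minimal Python sketch of how a producer and a consumer might handle this proposed convention. `tag_record` appends `mseq`/`pseq` data items to a molblock (assumed to end with its `M  END` line) and terminates the record with `$$$$`, and `read_indices` recovers the pairs. All function names are hypothetical, only single-line data values are handled, and no SDF library is used:

```python
def tag_record(molblock: str, mseq: int, pseq: int) -> str:
    """Append mseq/pseq data items to a molblock and close the record.

    Assumes `molblock` already ends with its `M  END` line.
    """
    body = molblock.rstrip("\n")
    return (
        f"{body}\n"
        f">  <mseq>\n{mseq}\n\n"
        f">  <pseq>\n{pseq}\n\n"
        "$$$$\n"
    )


def read_indices(sdf_text: str) -> list[tuple[int, int]]:
    """Return (mseq, pseq) for each record, in file order.

    Data headers are only looked for after the `M  END` line, so a title
    line that happens to start with '>' is not misread.
    """
    result: list[tuple[int, int]] = []
    fields: dict[str, str] = {}
    in_data = False
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("M  END"):
            in_data = True
        elif line.strip() == "$$$$":
            result.append((int(fields["mseq"]), int(fields["pseq"])))
            fields, in_data = {}, False
        elif in_data and line.startswith(">") and "<" in line:
            start = line.index("<") + 1
            end = line.find(">", start)
            # Single-line values only: the value sits on the next line.
            if end != -1 and i + 1 < len(lines):
                fields[line[start:end]] = lines[i + 1].strip()
    return result


def poses_by_molecule(indices: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Group pose numbers under their molecule number."""
    groups: dict[int, list[int]] = {}
    for mseq, pseq in indices:
        groups.setdefault(mseq, []).append(pseq)
    return groups
```

Because each record carries its own indices, the consumer no longer needs to rely on molecule names to tell poses apart from molecules.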

rczerminski-valo · 2021-08-03T13:27:37Z

A JSON-based format (ideally some agreed-upon standard) might be an interesting possibility as well. There is some discussion of this in rdkit/rdkit#1137... SDF, however, seems to be the simplest short- (or medium-) term solution.

diogomart · 2021-08-04T18:45:12Z

That's an interesting discussion, thanks for posting it here. It seems that the cheminformatics experts haven't converged yet. Your suggestion of molecule and conformer indices in the SDF data fields is good. If we get to do this, it will probably be very similar to that.

jeeberhardt closed this as completed Aug 9, 2021
