
batch mode does not work? #15

Closed
rczerminski-valo opened this issue Jul 30, 2021 · 10 comments

Comments

@rczerminski-valo
Contributor

rczerminski-valo commented Jul 30, 2021

Support of simultaneous docking of multiple ligands and batch mode for virtual screening

I am trying to dock multiple ligands using the --batch option. My understanding is that, for this to work, the input .pdbqt file should contain multiple ligands following the pattern below:

MODEL 1
... molecule 1 ...
ENDMDL
MODEL 2
... molecule 2 ...
ENDMDL

Either my understanding is not correct and some other mechanism should be used, or there is a bug: when I give a .pdbqt file with multiple ligands as input, it fails with the error message "An unknown error occurred".

I attached the file vina-An-unknown-error-occurred-bug.tar.gz, which should allow you to reproduce the issue.
vina-An-unknown-error-occurred-bug.tar.gz

@jeeberhardt
Member

Hi rczerminski-valo,

Sorry about this; batch mode currently lacks documentation... hopefully this will be fixed very soon. For a quick example of how batch mode works, please look at issue #11. In your example, each PDBQT file can contain only one molecule.

Hope it helps.

Best,
Jerome.
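Since batch mode expects one molecule per PDBQT file, a multi-MODEL file like the one in the original report has to be split before it can be passed to --batch. The following is a minimal sketch, assuming each ligand sits between a MODEL and an ENDMDL record as in the pattern from the first comment; the function names and the `<stem>_<n>.pdbqt` output naming are hypothetical, not part of Vina.

```python
import os
from typing import List


def split_models(pdbqt_text: str) -> List[str]:
    """Split a multi-MODEL PDBQT string into one string per MODEL block.

    Lines between a MODEL record and its matching ENDMDL are kept; the
    MODEL/ENDMDL records themselves are dropped, so each chunk reads like
    a plain single-molecule PDBQT file.
    """
    models: List[str] = []
    current: List[str] = []
    inside = False
    for line in pdbqt_text.splitlines():
        record = line[:6].strip()  # PDB record name occupies columns 1-6
        if record == "MODEL":
            inside = True
            current = []
        elif record == "ENDMDL":
            if inside:
                models.append("\n".join(current) + "\n")
            inside = False
        elif inside:
            current.append(line)
    return models


def write_models(pdbqt_path: str, out_dir: str) -> List[str]:
    """Write each MODEL block of pdbqt_path to its own file in out_dir."""
    with open(pdbqt_path) as handle:
        models = split_models(handle.read())
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(pdbqt_path))[0]
    paths = []
    for i, model in enumerate(models, start=1):
        path = os.path.join(out_dir, f"{stem}_{i}.pdbqt")
        with open(path, "w") as handle:
            handle.write(model)
        paths.append(path)
    return paths
```

The resulting files can then be listed after --batch. Lines outside any MODEL/ENDMDL pair are ignored by this sketch.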

@rczerminski-valo
Contributor Author

I see. So, if this is the case, my understanding is that the only difference between the --ligand and --batch modes is that --batch requires a --dir argument designating the directory to write the docked ligand files to. Is that correct?

@jeeberhardt
Member

Yep, exactly! ;)
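To make the --batch/--dir combination concrete, here is a sketch that only assembles the argument list for one batch run, which could then be passed to `subprocess.run(..., check=True)`. It assumes, without confirmation from this thread, that --batch accepts several ligand files after a single flag (as --ligand does for simultaneous docking) and that the docking box comes from a --config file; the helper name is hypothetical.

```python
from typing import List, Sequence


def batch_command(receptor: str, ligands: Sequence[str], out_dir: str,
                  config: str, vina: str = "vina") -> List[str]:
    """Build the argument list for one Vina batch run.

    Assumptions: every ligand file (one molecule each) follows a single
    --batch flag, and --dir names the directory for the docked poses.
    """
    if not ligands:
        raise ValueError("at least one ligand file is required")
    return [vina, "--receptor", receptor, "--config", config,
            "--batch", *ligands, "--dir", out_dir]
```

The sketch does not create `out_dir`; creating it beforehand (for example with `os.makedirs(out_dir, exist_ok=True)`) is a safe default.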

@rczerminski-valo
Contributor Author

rczerminski-valo commented Aug 2, 2021

How difficult would it be to modify the code to allow multi-ligand input files and multi-ligand, multi-pose output files? As far as I can tell, docking multiple ligands is a rather common usage scenario, currently accomplished by scripting around the vina executable.
--Ryszard

@jeeberhardt
Member

I think the bigger question here is not whether we could, but whether we should.

The PDB format (which is the basis of the PDBQT format) is pretty clear about the use of MODEL/ENDMDL keywords:

The MODEL record specifies the model serial number when multiple models of the same structure are presented in a single coordinate entry, as is often the case with structures determined by NMR.

We are aware that many people use this trick to generate PDBQT files with multiple ligands (alongside all the different flavors of the PDB format...), but several issues arise from that: 1) how would we differentiate multiple poses of the same ligand from different ligands, and 2) more importantly, the file would no longer be in PDB(QT) format, so you would eventually face interoperability issues with other programs.

Jerome.

@rczerminski-valo
Contributor Author

rczerminski-valo commented Aug 2, 2021

Interoperability issues are definitely important, and yes, using a PDBQT-like format to store multi-ligand, multi-pose output might not be the right way forward. A format better suited to this purpose may offer a better solution. I do not know which one would be most suitable for vina, but some possibilities are, I guess, SDF or CML. Do you think either of these would be suitable, or some other format?
--Ryszard

@diogomart
Member

Hi Ryszard,
SDF and MOL2 files would be great, but unfortunately it's unlikely we will do it in the near future. There would be a similar problem, though, because these formats don't explicitly distinguish whether multiple entries are a) conformers of a given molecule or b) different molecules. So the software reading these files would have to guess based on the molecule names or some other way.
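The kind of guessing described here can be sketched as follows: records are split on the `$$$$` delimiter, and consecutive records sharing the same title line (the first line of each MOL block) are treated as conformers of one molecule. This is a hypothetical heuristic for illustration, not something Vina or any particular reader is known to do.

```python
from itertools import groupby
from typing import List, Tuple


def sdf_records(sdf_text: str) -> List[List[str]]:
    """Split SDF text into records (lists of lines) on the $$$$ delimiter."""
    records: List[List[str]] = []
    current: List[str] = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # tolerate a missing final $$$$
        records.append(current)
    return records


def guess_molecules(sdf_text: str) -> List[Tuple[str, int]]:
    """Heuristic: consecutive records with the same title = one molecule.

    Returns (title, number_of_records) pairs in file order.
    """
    titles = [rec[0].strip() if rec else "" for rec in sdf_records(sdf_text)]
    return [(title, len(list(group))) for title, group in groupby(titles)]
```

The heuristic breaks as soon as two different molecules share a name, or conformers of one molecule are not contiguous, which is exactly the ambiguity raised in this comment.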

@rczerminski-valo
Contributor Author

rczerminski-valo commented Aug 3, 2021

With SDF, one solution might be a simple, well-documented convention (so the "consumer" of these files does not have to guess what is what): in the multi-ligand, multi-pose output, there could be data fields clearly marking the different molecules and the poses associated with them. For example, with 2 molecules and 3 poses per molecule, the SDF file would contain 6 MOL records with (mseq, pseq) fields = (1,1), (1,2), (1,3), (2,1), (2,2), (2,3):

[...]
>  <mseq>
1

>  <pseq>
1

[...]
>  <mseq>
1

>  <pseq>
2

[...]
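A sketch of this (mseq, pseq) convention, writing the two data fields onto existing MOL blocks and reading them back in file order. Only the field names come from the proposal; the helper names are hypothetical, the MOL blocks are assumed to come from elsewhere, and the reader handles single-line values only.

```python
import re
from typing import Dict, List, Optional, Tuple

# An SDF data header line starts with ">" and carries the field name in <...>.
HEADER = re.compile(r"^>.*?<([^>]+)>")


def add_fields(molblock: str, fields: Dict[str, int]) -> str:
    """Append SDF data items to a MOL block and close the record with $$$$."""
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        # Each data item: header line, value line, then a blank line.
        lines += [f">  <{name}>", str(value), ""]
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def write_sdf(molecules: List[List[str]]) -> str:
    """molecules[i][j] is the MOL block of pose j+1 of molecule i+1."""
    records = []
    for mseq, poses in enumerate(molecules, start=1):
        for pseq, molblock in enumerate(poses, start=1):
            records.append(add_fields(molblock, {"mseq": mseq, "pseq": pseq}))
    return "".join(records)


def read_indices(sdf_text: str) -> List[Tuple[int, int]]:
    """Return the (mseq, pseq) pair of every record, in file order."""
    pairs: List[Tuple[int, int]] = []
    fields: Dict[str, str] = {}
    pending: Optional[str] = None
    for line in sdf_text.splitlines():
        header = HEADER.match(line)
        if line.strip() == "$$$$":
            pairs.append((int(fields["mseq"]), int(fields["pseq"])))
            fields, pending = {}, None
        elif header:
            pending = header.group(1)
        elif pending is not None:
            # The first line after a header is the field value.
            fields[pending] = line.strip()
            pending = None
    return pairs
```

For 2 molecules with 3 poses each, `read_indices(write_sdf(...))` yields the six pairs listed in the proposal, so a consumer can group poses by `mseq` without relying on molecule names.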

@rczerminski-valo
Contributor Author

A JSON-based format (ideally some agreed-upon standard) might be an interesting possibility as well. There is some discussion around this issue here: rdkit/rdkit#1137... SDF, however, seems to be the simplest short- (or medium-) term solution.
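For comparison, a JSON layout can express the molecule/pose hierarchy directly through nesting, so no convention on top of the format is needed. The schema below is purely hypothetical (it is not the proposal discussed in rdkit/rdkit#1137); what each pose dictionary holds, such as a score or coordinates, is left to the caller.

```python
import json
from typing import Dict, List


def to_json(results: Dict[str, List[dict]]) -> str:
    """Serialize {molecule name: [pose, ...]} into one JSON document.

    Hypothetical layout: each molecule gets an mseq index and a list of
    poses, and each pose gets a pseq index, mirroring the SDF proposal.
    """
    doc = {
        "molecules": [
            {
                "mseq": m,
                "name": name,
                # dict(pose, pseq=p) copies the pose and adds its index.
                "poses": [dict(pose, pseq=p)
                          for p, pose in enumerate(poses, start=1)],
            }
            for m, (name, poses) in enumerate(results.items(), start=1)
        ]
    }
    return json.dumps(doc, indent=2)
```

For example, `to_json({"lig_a": [{"score": -7.1}, {"score": -6.8}]})` nests both poses under one molecule entry; the `score` key there is only illustrative.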

@diogomart
Member

That's an interesting discussion, thanks for posting it here. It seems that the chemo-informatics experts haven't converged yet. Your suggestion with molecule and conformer indices in the SDF data fields is good. If we get to implement this, it will probably be very similar.
