Error parsing PDBQT to Mol: Element 'G' not found #20

Open
@linminhtoo

Hello Dr Pavel,
I came across your repo and found it useful for parsing docked PDBQT files back into RDKit mols for further analysis. However, the pdbqt2mol() function sometimes fails because RDKit does not recognize the "G" atom type.

def pdbqt2molblock(pdbqt_block, smi, mol_id):

giving:

****
Post-condition Violation
Element 'G' not found
Violation occurred on line 93 in file /project/build/temp.linux-x86_64-cpython-310/rdkit/Code/GraphMol/PeriodicTable.h
Failed Expression: anum > -1
****

Meeko generates this "G" atom type for macrocycles during ligand preparation for docking with AutoDock Vina; see forlilab/Meeko#10

I saw your inline code comment about this issue, but I did not manage to modify the fix_pdbqt() function successfully:

atom_pdbqt_type = re.sub('D|A', '', line[77:79]).strip() # can add meeko macrocycle types (G and \d (CG0 etc) in the sub expression if will be going to use it
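To make the idea in that comment concrete, here is a minimal sketch of how the atom-type cleanup might be extended. The helper name normalize_atom_type is hypothetical and not part of the repo. It assumes that G, G0, G1, ... are Meeko pseudo-atoms that should be skipped, and that CG0, CG1, ... are ordinary carbons at the ring-closure bond. Note also that line[77:79] reads only two characters, so a three-character type such as CG0 would likely need a wider slice.

```python
import re

def normalize_atom_type(pdbqt_type):
    """Map a PDBQT atom type to something RDKit can read.

    Returns None for types assumed to be Meeko pseudo-atoms (G, G0, G1, ...).
    """
    t = pdbqt_type.strip()
    # Assumption: G followed by optional digits marks a macrocycle pseudo-atom.
    if re.fullmatch(r'G\d*', t):
        return None
    # Assumption: CG followed by optional digits is a carbon at the broken ring bond.
    if re.fullmatch(r'CG\d*', t):
        return 'C'
    # Same substitution as the existing line: strip the D/A suffixes (HD -> H, OA -> O).
    return re.sub('D|A', '', t)
```

A caller would drop any atom line for which this returns None instead of passing it on to RDKit.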

Please note that I did not use your script to run the docking; the ligand was prepared slightly differently. I am only using your code to parse the docked .pdbqt files back into mols.
I also saw your comment on build_macrocycle=:

# can do it True, but there is some problem with >=7-chains mols

Do you have any idea how to fix this issue?
I've attached a sample offending .pdbqt file: https://gist.github.com/linminhtoo/5949437ae066fdd136709971dcc36220#file-bad-pdbqt-L26-L27
As you can see, some lines have the "G" type (macrocycle) and others have "CG0" (I am not sure whether this also causes problems). I tried brute-forcing it by replacing "G" with "C", but the template bond order assignment step then failed (it seems RDKit fails to parse the ring correctly).
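One possible explanation, which is only a guess on my side, is that the "G" lines are pseudo-atoms rather than real atoms, so renaming them to "C" adds extra carbons that no longer match the SMILES template. Below is a rough sketch of the alternative: drop the assumed pseudo-atom lines and relabel the assumed closure carbons before handing the block to RDKit. The function name clean_macrocycle_lines is hypothetical, and whether RDKit then closes the ring correctly from the coordinates is untested.

```python
import re

PSEUDO = re.compile(r'G\d*$')    # assumed Meeko pseudo-atom types: G, G0, G1, ...
CLOSURE = re.compile(r'CG\d*$')  # assumed ring-closure carbon types: CG0, CG1, ...

def clean_macrocycle_lines(pdbqt_block):
    """Drop assumed pseudo-atom lines and rename CGn types to C."""
    kept = []
    for line in pdbqt_block.splitlines():
        if line.startswith(('ATOM', 'HETATM')):
            atom_type = line.split()[-1]  # atom type is the last field of the line
            if PSEUDO.match(atom_type):
                continue  # skip the pseudo-atom entirely
            if CLOSURE.match(atom_type):
                # replace only the trailing type field, keep the rest of the line
                line = line[:line.rfind(atom_type)] + 'C'
        kept.append(line)
    return '\n'.join(kept)
```

All non-atom lines (REMARK, ROOT, BRANCH, and so on) pass through unchanged, so the output keeps the same layout as the input minus the dropped lines.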

Thanks,
Min Htoo
