Bio.PDB mmCIFParser parse exceptions #990

Closed
lennax opened this issue Nov 12, 2016 · 2 comments

@lennax
Contributor

lennax commented Nov 12, 2016

1alw, 1det, and 1tmy still fail using this script.

Redmine issue: https://redmine.open-bio.org/issues/2626

Chris Oldfield wrote:
I recently ran MMCIFParser over all of the PDB's mmCIF files and found that a large number of them failed to parse correctly (a short script to demonstrate is at the end). Of ~50k mmCIF files, 3891 failed to parse and another 1980 were missing fields in the mmCIF dictionary.
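The survey described above, which ran a check over every PDB mmCIF file and counted the failures, could be driven by a small loop like the one below. This is a hypothetical stdlib-only sketch, not the reporter's actual tooling: `read_cif_text`, `tally`, and the outcome labels are illustrative names, and `check` stands in for whatever per-file test is used (for example, one wrapping Biopython's parsers).

```python
import gzip
from collections import Counter
from pathlib import Path
from typing import Callable, Iterable


def read_cif_text(path: Path) -> str:
    """Read an mmCIF file, decompressing it first if it ends in .gz."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return handle.read()


def tally(paths: Iterable[Path], check: Callable[[str], str]) -> Counter:
    """Run `check` on every file and count the outcome labels it returns.

    `check` returns a label such as "ok" or "missing field" (hypothetical
    labels); any exception raised while reading or checking a file is
    counted as "parse failed" so that one bad file does not stop the survey.
    """
    counts: Counter = Counter()
    for path in paths:
        try:
            outcome = check(read_cif_text(path))
        except Exception:
            outcome = "parse failed"
        counts[outcome] += 1
    return counts
```

Catching every exception per file is deliberate here: the goal is a count of failures across the whole archive, not a stop at the first one.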

A few examples of files that failed to parse:
http://www.rcsb.org/pdb/files/1alw.cif.gz
http://www.rcsb.org/pdb/files/1det.cif.gz
http://www.rcsb.org/pdb/files/1tmy.cif.gz

A few with missing fields:
http://www.rcsb.org/pdb/files/1mfl.cif.gz
http://www.rcsb.org/pdb/files/1tfj.cif.gz
http://www.rcsb.org/pdb/files/1zn8.cif.gz

The problem seems to be that an error in one mmCIF table, such as an extra field, propagates through the rest of the parse.
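That kind of propagation is consistent with a reader losing its place in one `loop_` table, so that every later value is shifted. As a contrast, here is a minimal stdlib-only sketch that checks each `loop_` on its own and reports a value-count mismatch against that table's category only. It assumes a simplified subset of CIF syntax (single- or double-quoted values, `;` text fields whose closing `;` is on its own line, `#` comments), and `tokenize` and `check_loops` are hypothetical names, not Biopython's `MMCIF2Dict` API.

```python
import re

# Simplified CIF token pattern (assumption: common cases only). A quoted value
# must end with its quote followed by whitespace or end of line, so unquoted
# atom names such as O5' are still read as bare tokens.
TOKEN_RE = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(#.*)|(\S+)""")


def tokenize(text):
    """Yield (value, is_bare) pairs from mmCIF text."""
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(";"):
            # Semicolon text field: runs until the next line starting with ';'.
            block = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith(";"):
                block.append(lines[i])
                i += 1
            yield "\n".join(block), False
            i += 1  # skip the closing ';' line
            continue
        for match in TOKEN_RE.finditer(line):
            single, double, comment, bare = match.groups()
            if comment is not None:
                break  # rest of the line is a comment
            if bare is not None:
                yield bare, True
            else:
                yield (single if single is not None else double), False
        i += 1


def _is_keyword(value, bare):
    """True for bare tokens that start a new item, loop, block, or frame."""
    return bare and value.lower().startswith(("_", "loop_", "data_", "save_"))


def check_loops(text):
    """Return (category, n_tags, n_values) for each malformed loop_ table.

    Each loop_ is validated independently, so a bad table is reported
    without affecting how the tables after it are read.
    """
    tokens = list(tokenize(text))
    problems = []
    i = 0
    while i < len(tokens):
        value, bare = tokens[i]
        if not (bare and value.lower() == "loop_"):
            i += 1
            continue
        i += 1
        tags = []
        while i < len(tokens) and tokens[i][1] and tokens[i][0].startswith("_"):
            tags.append(tokens[i][0])
            i += 1
        n_values = 0
        while i < len(tokens) and not _is_keyword(*tokens[i]):
            n_values += 1
            i += 1
        if not tags or n_values % len(tags) != 0:
            category = tags[0].split(".")[0] if tags else "<no tags>"
            problems.append((category, len(tags), n_values))
    return problems
```

Because each table's values are counted only up to the next keyword, an extra value in `_chem_comp` is reported there and the following `_atom_site` loop is still read correctly.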

Environment: x86_64 Gentoo Linux (2008), Biopython installed from source.

```python
import sys

from Bio.PDB import MMCIFParser
from Bio.PDB.MMCIF2Dict import MMCIF2Dict

if len(sys.argv) != 2:
    print("usage: mmCifParseCheck.py <structFile>", file=sys.stderr)
    sys.exit(1)
struct_file = sys.argv[1]

result_lines = []

# Parse to a Structure object and count residues whose hetero flag
# does not start with "H_" (standard residues and waters are counted)
num_res = 0
parser = MMCIFParser()
try:
    structure = parser.get_structure("test", struct_file)
    for model in structure:
        for chain in model:
            for residue in chain:
                if not residue.id[0].startswith("H_"):
                    num_res += 1
except Exception as exc:
    result_lines.append(f"parse to structure object failed: {exc}")
else:
    result_lines.append("parse to structure object succeeded")

# Parse the whole mmCIF file to a dict
mmcif_dict = None
try:
    mmcif_dict = MMCIF2Dict(struct_file)
except Exception as exc:
    result_lines.append(f"parse to dict failed: {exc}")
else:
    result_lines.append("parse to dict succeeded")

# Look up a required entry (fails if the dict could not be built)
if mmcif_dict is not None and "_entry.id" in mmcif_dict:
    result_lines.append("key lookup succeeded")
else:
    result_lines.append("key lookup failed")

print("\n".join(result_lines))
print(f"number of non-het residues {num_res}")
```
@speleo3
Contributor

speleo3 commented Jul 20, 2020

I cannot reproduce this with current master; the given script succeeds for all six listed mmCIF files.

@peterjc
Member

peterjc commented Jul 20, 2020

Thanks Thomas. Given the time that has passed, I think we can just close this without identifying exactly when and how this was fixed.

peterjc closed this as completed Jul 20, 2020