PDB File Models Missing Element Column #537
Comments
@gcapitani noticed that the atom name column in PDB files is shifted one position to the left for calcium, with respect to C-alpha atoms, so the |
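For reference, the fixed-column PDB format keeps the atom name in columns 13-16, and a two-letter element symbol such as calcium starts in column 13 while a one-letter element such as carbon starts in column 14. A minimal sketch of how that alignment could be inspected on a raw record; the class and method names are hypothetical and are not part of BioJava:

```java
/** Hypothetical helper (not BioJava API): inspects the raw atom name field of a PDB record. */
final class AtomNameAlignment {

    /** Returns the untrimmed 4-character atom name field (columns 13-16, 1-based). */
    static String rawAtomName(String line) {
        if (line.length() < 16) {
            throw new IllegalArgumentException("Line too short for an atom name field: " + line);
        }
        return line.substring(12, 16);
    }

    /** True if "CA" starts in column 13, the alignment used for calcium. */
    static boolean looksLikeCalcium(String line) {
        return rawAtomName(line).equals("CA  ");
    }

    /** True if "CA" starts in column 14, the alignment used for a C-alpha carbon. */
    static boolean looksLikeCAlpha(String line) {
        return rawAtomName(line).equals(" CA ");
    }
}
```

This only works when the writing program respected the alignment convention and the parser keeps the whitespace of the name field, so it is a heuristic rather than a replacement for the element column.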
Guessing the elements for standard amino acids should be fine, but we need to print warnings. You don't even need to look at the column where "CA" is: standard amino acids don't contain calcium, so it's safe to always assign a "C". For any other molecule I'd say we shouldn't try guessing, and should instead fail with a clear error message such as "Element is missing for line ..." |
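A minimal sketch of that policy, assuming the element is returned as a plain symbol string rather than BioJava's `Element` enum. The class name, the method name, and the restriction to the 20 standard residues and to the elements C, N, O, S and H are assumptions made for illustration:

```java
import java.util.Locale;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical sketch (not BioJava API): guess elements only for standard amino acids. */
final class AminoAcidElementGuesser {

    private static final Logger LOGGER =
            Logger.getLogger(AminoAcidElementGuesser.class.getName());

    private static final Set<String> STANDARD_AMINO_ACIDS = Set.of(
            "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL");

    // Assumption: only these elements are accepted when guessing for standard residues.
    private static final String ALLOWED_ELEMENTS = "CNOSH";

    /** Returns the guessed element symbol, or throws if guessing is not considered safe. */
    static String guessElement(String resName, String atomName, String line) {
        String res = resName.trim().toUpperCase(Locale.ROOT);
        String name = atomName.trim().toUpperCase(Locale.ROOT);
        if (STANDARD_AMINO_ACIDS.contains(res)) {
            for (int i = 0; i < name.length(); i++) {
                char c = name.charAt(i);
                if (Character.isDigit(c)) {
                    continue; // skip leading digits of old-style hydrogen names such as 1HB
                }
                if (ALLOWED_ELEMENTS.indexOf(c) >= 0) {
                    LOGGER.warning("Element column missing, guessed " + c + " for line: " + line);
                    return String.valueOf(c);
                }
                break; // first letter is not an accepted element, do not guess
            }
        }
        throw new IllegalArgumentException("Element is missing for line: " + line);
    }
}
```

With this sketch, `guessElement("ALA", "CA", line)` yields carbon, while a calcium ion record (residue name `CA`) is rejected instead of being guessed.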
This relates to #305 |
Actually using the chemical component dictionary (CCD) for this would provide a general solution: if the residue's 3-letter code is that of a valid chem comp and the element is not present we can read it by looking up the atom name in the CCD. That would work for any kind of molecule. |
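The lookup could be sketched roughly as follows. In the CCD each component lists its atoms with an atom name (`_chem_comp_atom.atom_id`) and an element (`_chem_comp_atom.type_symbol`); here that data is assumed to be already loaded into a nested map, and the class and method names are hypothetical rather than BioJava API:

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch (not BioJava API): element lookup by component id and atom name. */
final class CcdElementLookup {

    // component id -> (atom name -> element symbol), assumed to be pre-loaded from the CCD
    private final Map<String, Map<String, String>> elementsByComponent;

    CcdElementLookup(Map<String, Map<String, String>> elementsByComponent) {
        this.elementsByComponent = elementsByComponent;
    }

    /** Returns the element for the atom, or empty if the component or atom is unknown. */
    Optional<String> elementFor(String compId, String atomName) {
        Map<String, String> atoms =
                elementsByComponent.get(compId.trim().toUpperCase(Locale.ROOT));
        if (atoms == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(atoms.get(atomName.trim().toUpperCase(Locale.ROOT)));
    }
}
```

Because the lookup is keyed by the residue code, the name CA resolves to carbon inside ALA and to calcium inside the calcium ion component, which is the ambiguity the element column normally resolves.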
Great! I think that the optimal solution then is to complete the |
BioJava used to include whitespace in atom names, so that |
I have been looking at the code and the

    // Parse element from the element field. If this field is
    // missing (i.e. misformatted PDB file), then parse the
    // name from the atom name.
    Element element = Element.R;
    if ( line.length() > 77 ) {
        // parse element from element field
        try {
            element = Element.valueOfIgnoreCase(line.substring (76, 78).trim());
        } catch (IllegalArgumentException e){}
    } else {
        // parse the name from the atom name
        String elementSymbol = null;
        // for atom names with 4 characters, the element is
        // at the first position, example HG23 in Valine
        if (fullname.trim().length() == 4) {
            elementSymbol = fullname.substring(0, 1);
        } else if ( fullname.trim().length() > 1){
            elementSymbol = fullname.substring(0, 2).trim();
        }
        try {
            if (elementSymbol!=null)
                element = Element.valueOfIgnoreCase(elementSymbol);
        } catch (IllegalArgumentException e){
            logger.warn("Element {} was not recognised. Assigning generic element R to it",
                    elementSymbol);
        }
    }
    atom.setElement(element);

There are three things to improve:
I would rewrite this handling, because I think that using the |
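One point visible in the quoted snippet is that a line shorter than 78 characters falls through to the atom-name guess, whereas a line padded with blanks up to column 78 or beyond takes the first branch, where the exception for the empty symbol appears to be swallowed and the generic `Element.R` is kept. A minimal sketch that treats both cases the same way, assuming the result is returned as a plain symbol string; the class and method names are hypothetical and not BioJava API:

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch (not BioJava API): reads the element field of a PDB ATOM/HETATM line. */
final class ElementColumnReader {

    /**
     * Returns the upper-cased content of columns 77-78 (1-based), or empty if the
     * line is too short to contain the field or the field is blank.
     */
    static Optional<String> readElementField(String line) {
        if (line.length() < 78) {
            return Optional.empty(); // element column missing
        }
        String field = line.substring(76, 78).trim();
        if (field.isEmpty()) {
            return Optional.empty(); // element column present but blank
        }
        return Optional.of(field.toUpperCase(Locale.ROOT));
    }
}
```

A caller could then fall back to a guess or a dictionary lookup whenever the result is empty, instead of distinguishing the two cases by line length alone.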
Fix #537 - handle missing and empty Element column in PDB files
We have recently noticed that some structural bioinformatics programs (structure refinement or modelling) generate PDB files where the `Element` column is missing. The element column is the last column, where the periodic table element of the `Atom` is indicated.

Parsing these files with BioJava currently does not allow the calculation of structural alignments or symmetry (or any other analysis using C-alpha atoms), because to extract the C-alpha atoms of a structure, the name (CA) and element (C) of the `Atoms` are checked (in `StructureTools.getRepresentativeAtoms()`).

The `Element` column is not completely redundant: in the case of a modified amino acid with calcium bound to it, the name CA alone does not distinguish the calcium from the C-alpha carbon, and the element column is needed to do so.

On the other hand, we could print a warning when parsing such models and guess and fill the `Element` column from the `Atom` names (at least for the `Atoms` in amino acids), in order to support the incomplete files.

Question: is there any drawback in guessing and filling the `Element` of the `Atoms`? Can we use the Chemical Components for that?