Description
PDB and mmCIF files use symmetry operators to reduce the number of atoms that need to be specified. These operators are used with non-crystallographic symmetry (NCS) to reconstruct the asymmetric unit (e.g. in viruses), as well as to specify biological assemblies (BAs).
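As a rough illustration of what applying such operators involves, the following sketch generates one transformed copy of the stored coordinates per operator. It assumes each operator is a 3x4 matrix [R | t] (rotation plus translation); the class SymmetryExpansion and its methods are hypothetical and are not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a BioJava class): expands coordinates with symmetry operators. */
final class SymmetryExpansion {

    /** Applies a 3x4 operator [R | t] to one xyz point and returns the new point. */
    static double[] apply(double[][] op, double[] xyz) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = op[i][0] * xyz[0] + op[i][1] * xyz[1] + op[i][2] * xyz[2] + op[i][3];
        }
        return out;
    }

    /** Returns one transformed copy of the stored atoms per operator, in operator order. */
    static List<double[][]> expand(double[][] atoms, List<double[][]> operators) {
        List<double[][]> copies = new ArrayList<>();
        for (double[][] op : operators) {
            double[][] copy = new double[atoms.length][];
            for (int a = 0; a < atoms.length; a++) {
                copy[a] = apply(op, atoms[a]);
            }
            copies.add(copy);
        }
        return copies;
    }

    public static void main(String[] args) {
        double[][] identity = {{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}};
        // Two-fold rotation about z followed by a shift of 10 along x (made-up operator).
        double[][] twoFold = {{-1, 0, 0, 10}, {0, -1, 0, 0}, {0, 0, 1, 0}};
        double[][] atoms = {{1.0, 2.0, 3.0}};
        List<double[][]> copies = expand(atoms, Arrays.asList(identity, twoFold));
        System.out.println(copies.size() + " copies; second copy: " + Arrays.toString(copies.get(1)[0]));
    }
}
```

The file only has to store the atoms once; every additional copy is derived from an operator, which is why the full assembly contains several chains that share the same chainID.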
BioJava is able to read and understand the symmetry operations required to generate the full structure, but the data model is far from ideal for this. Most analysis applications require all atom positions to be stored in an array, so we would like to be able to store a representation of the full biological assembly. The problem is that the current model assumes that chainIDs are unique within a particular model. Thus, dealing with BAs requires one of these work-arounds:
- Use multiple Structure objects. Cons: Can't use methods requiring a single Structure object. Some methods which take Atom[] assume that all atoms share a structure through the getParent() hierarchy, e.g. for cloning atom arrays. Structure metadata (e.g. header data) is lost, duplicated, or inconsistent.
- Rename chainIDs so that all are unique. Workable, now that mmCIF officially supports 4 character chains. Cons: Difficult to map back to original chainID and symmetry operations. No write support (pending Implement mmCIF file writing #188).
- Use multiple models within a single Structure. Current approach for RCSB-supplied BA files and for structure alignment files. Cons: Models are only intended for NMR structures. Difficult to map back to original symmetry operation. Other tools (specifically PyMOL, but also Jmol to a lesser extent) expect models to have identical contents and be superimposed.
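To make the chain-renaming work-around more concrete, here is a minimal sketch that assigns a fresh ID to every chain copy and records where it came from in a side table. The naming scheme (A..Z, then AA, AB, ...), the four-character limit check, and the ChainRenamer and Origin classes are assumptions made for illustration; none of them is BioJava API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a BioJava class): assigns unique chain IDs and remembers their origin. */
final class ChainRenamer {

    /** Where a renamed chain came from: original chainID plus the index of the operator applied. */
    static final class Origin {
        final String chainId;
        final int operatorIndex;

        Origin(String chainId, int operatorIndex) {
            this.chainId = chainId;
            this.operatorIndex = operatorIndex;
        }
    }

    /** Maps 0, 1, 2, ... to "A", "B", ..., "Z", "AA", "AB", ... */
    static String idFor(int n) {
        StringBuilder sb = new StringBuilder();
        for (int v = n + 1; v > 0; v = (v - 1) / 26) {
            sb.insert(0, (char) ('A' + (v - 1) % 26));
        }
        return sb.toString();
    }

    /** Creates one new ID per (operator, chain) pair; the map keys are the new IDs. */
    static Map<String, Origin> rename(List<String> chainIds, int operatorCount) {
        Map<String, Origin> origins = new LinkedHashMap<>();
        int next = 0;
        for (int op = 0; op < operatorCount; op++) {
            for (String chainId : chainIds) {
                String newId = idFor(next++);
                if (newId.length() > 4) {
                    throw new IllegalStateException("ran out of 4-character chain IDs");
                }
                origins.put(newId, new Origin(chainId, op));
            }
        }
        return origins;
    }

    public static void main(String[] args) {
        Map<String, Origin> origins = rename(Arrays.asList("A", "B"), 3);
        for (Map.Entry<String, Origin> e : origins.entrySet()) {
            System.out.println(e.getKey() + " <- chain " + e.getValue().chainId
                    + ", operator " + e.getValue().operatorIndex);
        }
    }
}
```

The side table has to travel alongside the structure, which illustrates the drawback noted above: the original chainID and operator are no longer recoverable from the structure itself.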
A long-term solution would be to associate each chain with a particular NCS and crystallographic symmetry (CS) operator. These could be additional objects in the Structure hierarchy, or could just be fields in Chain. For instance:
- Structure
  - Model
    - Unit Cell
      - Asymmetric Unit
        - Chain
          - Group
            - Atom
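The simpler "fields in Chain" variant could look roughly like the sketch below. It assumes that operators are referred to by integer index (0 meaning identity) and that a chain copy is identified by the triple (chainID, CS operator, NCS operator) rather than by chainID alone. SymChain and Assembly are hypothetical names, not the existing BioJava Chain or Structure types.

```java
import java.util.ArrayList;
import java.util.List;

final class SymChainDemo {

    /** Hypothetical chain that carries its symmetry operators as plain fields. */
    static final class SymChain {
        final String chainId;
        final int csOperator;   // index of the crystallographic symmetry operator, 0 = identity
        final int ncsOperator;  // index of the NCS operator, 0 = identity

        SymChain(String chainId, int csOperator, int ncsOperator) {
            this.chainId = chainId;
            this.csOperator = csOperator;
            this.ncsOperator = ncsOperator;
        }

        /** Two chains are the same copy only if the ID and both operators match. */
        boolean sameCopyAs(SymChain other) {
            return chainId.equals(other.chainId)
                    && csOperator == other.csOperator
                    && ncsOperator == other.ncsOperator;
        }
    }

    /** Hypothetical container in which chainIDs may repeat as long as the operators differ. */
    static final class Assembly {
        private final List<SymChain> chains = new ArrayList<>();

        void add(SymChain chain) {
            for (SymChain existing : chains) {
                if (existing.sameCopyAs(chain)) {
                    throw new IllegalArgumentException("duplicate chain copy: " + chain.chainId);
                }
            }
            chains.add(chain);
        }

        /** Returns every symmetry copy of the given original chainID. */
        List<SymChain> copiesOf(String chainId) {
            List<SymChain> result = new ArrayList<>();
            for (SymChain c : chains) {
                if (c.chainId.equals(chainId)) {
                    result.add(c);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Assembly assembly = new Assembly();
        assembly.add(new SymChain("A", 0, 0));
        assembly.add(new SymChain("A", 1, 0)); // same chainID, different CS operator
        assembly.add(new SymChain("B", 0, 0));
        System.out.println("copies of A: " + assembly.copiesOf("A").size());
    }
}
```

With this layout the original chainID is kept unchanged and the operator is read directly from the chain, so no renaming or extra models are needed. The nested-object variant from the list above would express the same information through parent objects instead of fields.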