Better support for symmetry in the Structure model #220

@sbliven

PDB and mmCIF files use symmetry operators to reduce the number of atoms that need to be specified explicitly. Non-crystallographic symmetry (NCS) operators are used to reconstruct the full asymmetric unit (e.g. in viruses), and symmetry operators are also used to specify biological assemblies (BAs).

BioJava can read and interpret the symmetry operations required to generate the full structure, but the data model is far from ideal for this. Most analysis applications require all atom positions to be stored in an array, so we would like to be able to store a representation of the full biological assembly. The problem is that the current model assumes that chain IDs are unique within a particular model. Thus, dealing with BAs requires one of these workarounds:

  1. Use multiple Structure objects. Cons: Can't use methods requiring a single Structure object. Some methods that take Atom[] assume that all atoms share a structure through the getParent() hierarchy, e.g. for cloning atom arrays. Structure metadata (e.g. header data) is lost, duplicated, or inconsistent.
  2. Rename chain IDs so that all are unique. Workable, now that mmCIF officially supports four-character chain IDs. Cons: Difficult to map back to the original chain ID and symmetry operation. No write support (pending "Implement mmCIF file writing", #188).
  3. Use multiple models within a single Structure. This is the current approach for RCSB-supplied BA files and for structure alignment files. Cons: Multiple models are intended only for NMR structures. Difficult to map back to the original symmetry operation. Other tools (specifically PyMOL, but also Jmol to a lesser extent) expect models to have identical contents and to be superimposed.
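As a rough illustration of workaround 2, the sketch below hands out unique chain IDs while keeping a lookup table back to the original chain ID and operator, which is the mapping that is otherwise hard to recover. It is plain Java and does not use the BioJava API; `ChainOrigin`, `UniqueChainNamer`, the `_` separator and the four-character cap are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical record of where a renamed chain came from. */
final class ChainOrigin {
    final String originalChainId;
    final int operatorIndex; // position of the symmetry operator in the assembly definition

    ChainOrigin(String originalChainId, int operatorIndex) {
        this.originalChainId = originalChainId;
        this.operatorIndex = operatorIndex;
    }
}

/** Hypothetical helper: hands out unique chain IDs and remembers the mapping back. */
final class UniqueChainNamer {
    // Assumption: new IDs must fit in four characters, as discussed for mmCIF above.
    static final int MAX_ID_LENGTH = 4;

    private final Map<String, ChainOrigin> origins = new LinkedHashMap<>();

    /** Builds a new ID from the original ID and operator index, rejecting clashes. */
    String assign(String originalChainId, int operatorIndex) {
        String newId = originalChainId + "_" + operatorIndex;
        if (newId.length() > MAX_ID_LENGTH) {
            throw new IllegalArgumentException("Chain ID too long: " + newId);
        }
        if (origins.containsKey(newId)) {
            throw new IllegalStateException("Chain ID already in use: " + newId);
        }
        origins.put(newId, new ChainOrigin(originalChainId, operatorIndex));
        return newId;
    }

    /** Returns the origin of a renamed chain, or null if the ID is unknown. */
    ChainOrigin originOf(String newId) {
        return origins.get(newId);
    }
}

final class ChainRenamingDemo {
    public static void main(String[] args) {
        UniqueChainNamer namer = new UniqueChainNamer();
        // Two chains copied by three operators give six uniquely named chains.
        for (int op = 1; op <= 3; op++) {
            for (String chain : new String[] {"A", "B"}) {
                String id = namer.assign(chain, op);
                ChainOrigin o = namer.originOf(id);
                System.out.println(id + " <- chain " + o.originalChainId
                        + ", operator " + o.operatorIndex);
            }
        }
    }
}
```

The side table makes the renaming reversible in memory, but it lives outside the Structure, so it would still be lost on writing unless the file format carries it.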

A long-term solution would be to associate each chain with a particular NCS operator and crystallographic symmetry (CS) operator. These could be additional levels in the Structure hierarchy, or could just be fields in Chain. For instance, the hierarchy could become:

  1. Structure
  2. Model
  3. Unit Cell
  4. Asymmetric Unit
  5. Chain
  6. Group
  7. Atom
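The "fields in Chain" variant could look roughly like the sketch below: each chain keeps its original ID together with the NCS and CS operators that generated it, and transformed coordinates are computed on demand. This is plain Java rather than BioJava code; `SymOp`, `SymChain`, `uniqueKey()` and the choice to apply the NCS operator before the CS operator are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical symmetry operator: a 3x3 rotation matrix plus a translation vector. */
final class SymOp {
    final String id;
    final double[][] rot;  // 3x3 rotation
    final double[] trans;  // translation, length 3

    SymOp(String id, double[][] rot, double[] trans) {
        this.id = id;
        this.rot = rot;
        this.trans = trans;
    }

    static SymOp identity(String id) {
        return new SymOp(id, new double[][] {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}},
                new double[] {0, 0, 0});
    }

    /** Returns rot * xyz + trans as a new array. */
    double[] apply(double[] xyz) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = rot[i][0] * xyz[0] + rot[i][1] * xyz[1] + rot[i][2] * xyz[2] + trans[i];
        }
        return out;
    }
}

/** Hypothetical chain that records which operators generated it. */
final class SymChain {
    final String chainId;            // original chain ID from the file; need not be unique
    final SymOp ncsOp;               // non-crystallographic operator
    final SymOp csOp;                // crystallographic operator
    final List<double[]> fileCoords; // atom positions as stored in the file

    SymChain(String chainId, SymOp ncsOp, SymOp csOp, List<double[]> fileCoords) {
        this.chainId = chainId;
        this.ncsOp = ncsOp;
        this.csOp = csOp;
        this.fileCoords = fileCoords;
    }

    /** Chain ID plus operator IDs identify a copy unambiguously. */
    String uniqueKey() {
        return chainId + "/" + ncsOp.id + "/" + csOp.id;
    }

    /** Assumption: the NCS operator is applied first, then the CS operator. */
    List<double[]> transformedCoords() {
        List<double[]> out = new ArrayList<>();
        for (double[] xyz : fileCoords) {
            out.add(csOp.apply(ncsOp.apply(xyz)));
        }
        return out;
    }
}

final class SymChainDemo {
    public static void main(String[] args) {
        List<double[]> coords = new ArrayList<>();
        coords.add(new double[] {1.0, 2.0, 3.0});
        SymOp twofold = new SymOp("ncs2", new double[][] {{-1, 0, 0}, {0, -1, 0}, {0, 0, 1}},
                new double[] {0, 0, 0});
        // Two copies share chain ID "A" but remain distinguishable by operator.
        SymChain original = new SymChain("A", SymOp.identity("ncs1"), SymOp.identity("cs1"), coords);
        SymChain copy = new SymChain("A", twofold, SymOp.identity("cs1"), coords);
        System.out.println(original.uniqueKey() + " and " + copy.uniqueKey());
    }
}
```

With this layout the original chain ID and symmetry operation never need to be reverse-engineered, at the cost of callers having to key chains on more than the chain ID alone.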

Labels: enhancement, new feature