CIF file outputs should preserve the original files' label_seq_id and auth_seq_id numbering and chain IDs (#214)

We have modified some of our input files (e.g., mnt/diffuse-private/raw/sampleworks/initial_dataset_40_occ_sweeps/processed/4OLE/4OLE_single_001_density_input.cif) so that the two numberings, _atom_site.label_seq_id and _atom_site.auth_seq_id, are identical. Downstream models may or may not respect this numbering; Protenix, for instance, renumbers everything starting from chain A and residue 1. We should preserve the original labeling throughout, restoring the numbering used on www.rcsb.org (or in the user's PDB-style mmCIF file, pre-deposition). That is, label_seq_id generally starts at 1, while auth_seq_id follows the numbering of the full biological protein.
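A minimal sketch of how the original identifiers could be restored after a model renumbers its output. It assumes, hypothetically, that the model keeps chain and residue order, letters chains A, B, C, ... in order of first appearance, and numbers each chain's residues from 1; that behavior would need to be confirmed for Protenix before relying on it. `ResidueId` and `build_restore_map` are illustrative names, not existing pipeline code, and the original identifiers would have to come from the unmodified RCSB (or user) file, since our modified inputs have label_seq_id and auth_seq_id forced equal.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass(frozen=True)
class ResidueId:
    """Original identifiers of one residue, as in the unmodified source file."""
    label_asym_id: str
    label_seq_id: int
    auth_asym_id: str
    auth_seq_id: int


def build_restore_map(original: list[ResidueId]) -> dict[tuple[str, int], ResidueId]:
    """Map (chain, residue number) as emitted by a renumbering model back to
    the original identifiers.

    Assumption (hypothetical): the model keeps chain and residue order, names
    chains A, B, C, ... by first appearance, and numbers each chain's residues
    1..n. Only 26 chains are supported by this naming scheme.
    """
    new_chain: dict[str, str] = {}
    count: dict[str, int] = {}
    restore: dict[tuple[str, int], ResidueId] = {}
    for res in original:  # residues in original file order
        key = res.label_asym_id
        if key not in new_chain:
            if len(new_chain) == len(ascii_uppercase):
                raise ValueError("more than 26 chains; extend the chain-naming scheme")
            new_chain[key] = ascii_uppercase[len(new_chain)]
            count[key] = 0
        count[key] += 1
        restore[(new_chain[key], count[key])] = res
    return restore


# Usage with made-up identifiers: label chain B is auth chain C, residue 101.
original = [
    ResidueId("A", 1, "A", 25),
    ResidueId("A", 2, "A", 26),
    ResidueId("B", 1, "C", 101),
]
restore = build_restore_map(original)
print(restore[("B", 1)].auth_asym_id, restore[("B", 1)].auth_seq_id)  # C 101
```

Building the map from the original file, rather than inferring it from the model output, keeps the original file as the single source of truth for both numberings.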

In particular, this means we need to propagate additional fields through the pipeline, beyond those automatically loaded by atomworks.io.parse, atomworks.io.utils.io_utils.load_any, etc.

As a corollary, we should define explicitly which numbering (label_seq_id or auth_seq_id) the selection strings used in evaluation refer to.
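A sketch of a selection syntax that forces the numbering to be stated, with a helper to translate between the two schemes. The `<label|auth>:<chain>:<start>-<end>` format, `parse_selection`, and `translate` are hypothetical illustrations, not an existing evaluation API; insertion codes (`pdbx_PDB_ins_code`) are ignored for brevity.

```python
import re
from dataclasses import dataclass
from typing import Literal

Scheme = Literal["label", "auth"]


@dataclass(frozen=True)
class ResidueSelection:
    """Chain plus inclusive residue range, tagged with the numbering it uses."""
    scheme: Scheme
    chain: str
    start: int
    end: int


def parse_selection(text: str) -> ResidueSelection:
    """Parse e.g. 'auth:A:25-30' or 'label:A:1-6'; the scheme prefix is required."""
    parts = text.split(":")
    if len(parts) != 3 or parts[0] not in ("label", "auth"):
        raise ValueError(f"expected '<label|auth>:<chain>:<start>-<end>', got {text!r}")
    m = re.fullmatch(r"(-?\d+)-(-?\d+)", parts[2])  # auth_seq_id may be negative
    if m is None:
        raise ValueError(f"bad residue range in {text!r}")
    return ResidueSelection(parts[0], parts[1], int(m.group(1)), int(m.group(2)))


def translate(sel: ResidueSelection, residues: list[dict[str, str]],
              target: Scheme) -> list[tuple[str, int]]:
    """Return the (chain, seq_id) pairs covered by `sel`, in the `target` numbering.

    `residues` holds one dict per residue with label_/auth_ asym and seq ids as
    strings (as read from _atom_site). Residues with '.' or '?' ids are skipped.
    """
    out = set()
    for r in residues:
        src_seq, dst_seq = r[f"{sel.scheme}_seq_id"], r[f"{target}_seq_id"]
        if src_seq in (".", "?") or dst_seq in (".", "?"):
            continue
        if r[f"{sel.scheme}_asym_id"] == sel.chain and sel.start <= int(src_seq) <= sel.end:
            out.add((r[f"{target}_asym_id"], int(dst_seq)))
    return sorted(out)


table = [
    {"label_asym_id": "A", "label_seq_id": "1", "auth_asym_id": "A", "auth_seq_id": "25"},
    {"label_asym_id": "A", "label_seq_id": "2", "auth_asym_id": "A", "auth_seq_id": "26"},
    {"label_asym_id": "A", "label_seq_id": "3", "auth_asym_id": "A", "auth_seq_id": "27"},
    {"label_asym_id": "B", "label_seq_id": "1", "auth_asym_id": "C", "auth_seq_id": "101"},
]
print(translate(parse_selection("auth:A:25-26"), table, "label"))  # [('A', 1), ('A', 2)]
```

Rejecting selections without a scheme prefix makes an ambiguous selection fail loudly instead of silently matching the wrong residues.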
