PDB files with >=100k atoms #226

Closed
GiovanniBussi opened this issue Apr 12, 2017 · 1 comment

@GiovanniBussi
Member

PDB format does not allow atom numbers >=100k.
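The limit comes from the fixed-column layout: the atom serial occupies columns 7-11 of an ATOM record, i.e. five characters. A minimal Python sketch of the overflow, assuming a writer that formats the serial as a width-5 decimal integer (the helper name `serial_field` is hypothetical, not part of PLUMED):

```python
def serial_field(serial: int) -> str:
    """Format an atom serial as a fixed-width PDB writer would (width 5).

    Hypothetical helper for illustration only.
    """
    return f"{serial:5d}"


# 99999 still fits in the five columns reserved for the serial.
assert len(serial_field(99999)) == 5
# 100000 needs six characters, so it would spill into the next column.
assert len(serial_field(100000)) == 6
```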

Currently we are bound to PDB format in two places:

  • MOLINFO
  • Reference structures for RMSD and similar variables

For MOLINFO we need to access residue numbers, chains, and atom types, so we are probably forced to stay with the PDB format. A possible solution would be to allow multiple PDBs (see also #134).

For reference structures, instead, splitting a system across multiple files could be complicated. Perhaps we could consider a different format?

We could look for an already existing extension of the PDB format, for instance the hybrid format mentioned here.

@GiovanniBussi
Member Author

This issue was discussed on the OpenMM GitHub:
openmm/openmm#659
openmm/openmm#664

It looks like they chose the VMD format, which switches to hexadecimal from 100k onwards. This is what I obtained by reading an xyz file into VMD and writing it back as PDB:

 ....
 ATOM  99997  Ar      X   1      45.349  38.631  15.116  0.00  0.00
 ATOM  99998  Ar      X   1      46.189  38.631  15.956  0.00  0.00
 ATOM  99999  Ar      X   1      46.189  39.471  15.116  0.00  0.00
 ATOM  186a0  Ar      X   1      45.349  39.471  15.956  0.00  0.00
 ATOM  186a1  Ar      X   1      45.349  38.631  16.796  0.00  0.00
 ATOM  186a2  Ar      X   1      46.189  38.631  17.636  0.00  0.00
 ....

This however would not be compatible with our needs, since we need the serial to be unique when we store non-consecutive atoms for RMSD calculation. In the VMD representation, some hex numbers past 100k contain only decimal digits and thus look exactly like decimal numbers. E.g., 18700 in hex is equivalent to 100096 in decimal. Since VMD does not mark hex numbers in any special way, there is no way to tell them apart. For instance, there would be no way to say whether this line

 ATOM  18700  Ar      X   1      45.349  41.150   5.879  0.00  0.00

corresponds to atom 18700 or to atom 100096.
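To make the ambiguity concrete, here is a small Python sketch that lists every atom number a given serial field could stand for. It assumes the convention seen in the VMD output above (decimal up to 99999, unmarked hexadecimal afterwards); the function name `vmd_serial_candidates` is hypothetical.

```python
def vmd_serial_candidates(field: str) -> set:
    """Return every atom number a 5-character serial field could mean.

    Assumption: serials up to 99999 are written in decimal and larger
    ones in hexadecimal, with no marker distinguishing the two.
    """
    text = field.strip()
    candidates = set()
    if text.isdigit():
        candidates.add(int(text))      # decimal reading
    as_hex = int(text, 16)
    if as_hex >= 100000:               # hex is only used past 99999
        candidates.add(as_hex)
    return candidates


print(vmd_serial_candidates("186a0"))  # contains a letter: only the hex reading
print(vmd_serial_candidates("18700"))  # digits only: two possible readings
```

When atoms are listed consecutively the reading can be inferred from the position in the file, but for a reference structure with non-consecutive atoms any digits-only string with more than one candidate cannot be resolved.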

The hybrid-36 format mentioned above seems more consistent, since there is a 1-to-1 map between numbers and strings. The sequence of strings would be something like

99998
99999
A0000
A0001

etc. In addition, past 99999 it uses base-36 digits with a letter as the leading character (26 uppercase choices, then 26 lowercase ones), which gives a maximum atom number of ~87M for 5-character strings (with the VMD syntax it is ~1M, so much less). However, I am not sure how easy it would be to produce such files. They provide small C and Python tools to manipulate these numbers.
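For illustration, this is a minimal Python sketch of the mapping for a 5-character field, written from my understanding of the hybrid-36 scheme and not taken from the reference tools. The names `hy36_encode` and `hy36_decode` are hypothetical, and negative numbers are not handled.

```python
DIGITS_UPPER = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS_LOWER = DIGITS_UPPER.lower()
WIDTH = 5
DECIMAL_SPAN = 10 ** WIDTH              # 0 .. 99999 stay plain decimal
LETTER_SPAN = 26 * 36 ** (WIDTH - 1)    # strings with a letter in front
OFFSET = 10 * 36 ** (WIDTH - 1)         # base-36 value of "A0000"


def _to_base36(value, digits):
    """Convert a positive integer to a base-36 string using `digits`."""
    out = ""
    while value:
        value, rem = divmod(value, 36)
        out = digits[rem] + out
    return out


def hy36_encode(value):
    """Encode a non-negative atom number as a 5-character string."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value < DECIMAL_SPAN:
        return f"{value:{WIDTH}d}"
    value -= DECIMAL_SPAN
    if value < LETTER_SPAN:             # uppercase block: A0000 .. ZZZZZ
        return _to_base36(value + OFFSET, DIGITS_UPPER)
    value -= LETTER_SPAN
    if value < LETTER_SPAN:             # lowercase block: a0000 .. zzzzz
        return _to_base36(value + OFFSET, DIGITS_LOWER)
    raise ValueError("value out of range for a 5-character field")


def hy36_decode(field):
    """Decode a 5-character field; the first character selects the block."""
    text = field.strip()
    if text[0].isdigit():
        return int(text)
    value = int(text, 36) - OFFSET + DECIMAL_SPAN
    if text[0].islower():
        value += LETTER_SPAN
    return value


print(hy36_encode(99999), hy36_encode(100000), hy36_encode(100001))
print(hy36_decode("zzzzz"))             # largest representable atom number
```

Since the first character alone tells which block a string belongs to, a digits-only string such as 18700 always decodes as decimal, which is what removes the ambiguity of the VMD syntax.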

Actually, supporting either the VMD format or hybrid-36 in PLUMED would be super easy and backward compatible (except for the ambiguity of the VMD format mentioned above).
