New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mmtf parser concatenates models #1496

Open
kain88-de opened this Issue Jul 17, 2017 · 10 comments

Comments

Projects
None yet
4 participants
@kain88-de
Member

kain88-de commented Jul 17, 2017

EDIT by @jbarnoud: The issue actually comes from the handling of models by the mmtf paser. See discussion.

Expected behaviour

both give the same number of segments for 1ubq

Actual behaviour

the mmtf parser gives two segments and the pdb parser gives one

Code to reproduce the behaviour

import MDAnalysis as mda

u = mda.fetch_mmtf('1UBQ')
print(u.atoms.segments)
u = mda.Universe('1ubq.pdb')
print(u.atoms.segments)

....

Currently version of MDAnalysis:

0.17.0dev (couple days old)

@kain88-de

This comment has been minimized.

Show comment
Hide comment
@kain88-de

kain88-de Jul 17, 2017

Member

Behavior is even worse for *1Q0W` where the mmtf parser finds 40 segments while the PDB parser finds only 2.

Member

kain88-de commented Jul 17, 2017

Behavior is even worse for *1Q0W` where the mmtf parser finds 40 segments while the PDB parser finds only 2.

@jbarnoud

This comment has been minimized.

Show comment
Hide comment
@jbarnoud

jbarnoud Jul 24, 2017

Contributor

It seems that the problem is not so much the segments, but the whole content of the files:

import MDAnalysis as mda

u_mmtf = mda.fetch_mmtf('1Q0W')
print(len(u_mmtf.atoms))  # 32420
u_pdb = mda.Universe('/home/jon/Downloads/1q0w.pdb')
print(len(u_pdb.atoms))  # 1621

The MMTF file contains 20 times the number of atoms the PDB file has. Looking at the PDB file, there are 20 models. The issue seems to be that we do not account for models in the same way in the PDB and the MMTF parsers.

Contributor

jbarnoud commented Jul 24, 2017

It seems that the problem is not so much the segments, but the whole content of the files:

import MDAnalysis as mda

u_mmtf = mda.fetch_mmtf('1Q0W')
print(len(u_mmtf.atoms))  # 32420
u_pdb = mda.Universe('/home/jon/Downloads/1q0w.pdb')
print(len(u_pdb.atoms))  # 1621

The MMTF file contains 20 times the number of atoms the PDB file has. Looking at the PDB file, there are 20 models. The issue seems to be that we do not account for models in the same way in the PDB and the MMTF parsers.

@richardjgowers

This comment has been minimized.

Show comment
Hide comment
@richardjgowers

richardjgowers Jul 24, 2017

Member

Yeah so PDB uses models to differentiate between frames in a trajectory. In MMTF we don't do this, but maybe we should? Ie we could use u.trajectory[i] to select the ith model. I think the problem with this was that models didn't necessarily have the same number of atoms, which breaks our trajectory iteration model.

Member

richardjgowers commented Jul 24, 2017

Yeah so PDB uses models to differentiate between frames in a trajectory. In MMTF we don't do this, but maybe we should? Ie we could use u.trajectory[i] to select the ith model. I think the problem with this was that models didn't necessarily have the same number of atoms, which breaks our trajectory iteration model.

@kain88-de

This comment has been minimized.

Show comment
Hide comment
@kain88-de

kain88-de Jul 24, 2017

Member

We could default to the first model though.

Member

kain88-de commented Jul 24, 2017

We could default to the first model though.

@jbarnoud

This comment has been minimized.

Show comment
Hide comment
@jbarnoud

jbarnoud Jul 25, 2017

Contributor

That is what we do with the PDB parser. Actually, we do not even read after the first model when reading the topology: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/PDBParser.py#L158.

Contributor

jbarnoud commented Jul 25, 2017

That is what we do with the PDB parser. Actually, we do not even read after the first model when reading the topology: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/PDBParser.py#L158.

@jbarnoud jbarnoud changed the title from mmtf parser gives different number of segments then pdb parser to mmtf parser concatenates models Jul 25, 2017

@richardjgowers

This comment has been minimized.

Show comment
Hide comment
@richardjgowers

richardjgowers Jul 25, 2017

Member

@jbarnoud for topology purposes, we read the first model and assume that all other models have the same topology, the coordinate reader will iterate (the coordinates) over models

Member

richardjgowers commented Jul 25, 2017

@jbarnoud for topology purposes, we read the first model and assume that all other models have the same topology, the coordinate reader will iterate (the coordinates) over models

@orbeckst

This comment has been minimized.

Show comment
Hide comment
@orbeckst

orbeckst Aug 21, 2017

Member

One issue is the handling of models, the other the handling of segments or did I misunderstand?

For PDB, segments are parts of a model. Is the same true for mmtf? Are there even segments defined in MMTF?

Assuming that a model broadly means the same thing in both formats then I would also go with the current approach to choose the first model for topology building. I think MMTFParser effectively switches the topology when another model is selected, doesn't it?

Member

orbeckst commented Aug 21, 2017

One issue is the handling of models, the other the handling of segments or did I misunderstand?

For PDB, segments are parts of a model. Is the same true for mmtf? Are there even segments defined in MMTF?

Assuming that a model broadly means the same thing in both formats then I would also go with the current approach to choose the first model for topology building. I think MMTFParser effectively switches the topology when another model is selected, doesn't it?

@richardjgowers

This comment has been minimized.

Show comment
Hide comment
@richardjgowers

richardjgowers Aug 22, 2017

Member

@orbeckst the problem is that different models can have a different number of atoms. We currently can't change the topology with frame (model) changes.

The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.

Member

richardjgowers commented Aug 22, 2017

@orbeckst the problem is that different models can have a different number of atoms. We currently can't change the topology with frame (model) changes.

The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.

@kain88-de

This comment has been minimized.

Show comment
Hide comment
@kain88-de

kain88-de Aug 22, 2017

Member

The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.

That is what most PDBs do. Actually I do not know of a rule in PDB that forbids that models have a different number of atoms. It's just what most people implicitly do, especially since it's abused as a trajectory format.

Member

kain88-de commented Aug 22, 2017

The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.

That is what most PDBs do. Actually I do not know of a rule in PDB that forbids that models have a different number of atoms. It's just what most people implicitly do, especially since it's abused as a trajectory format.

@orbeckst

This comment has been minimized.

Show comment
Hide comment
@orbeckst

orbeckst Aug 22, 2017

Member
Member

orbeckst commented Aug 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment