Chain ids not read correctly #73

Closed
ErikMarklund opened this issue Sep 27, 2022 · 12 comments

Comments

@ErikMarklund

I am trying to colour a dimeric protein based on chain id, but the chain ids in the pdb seem to get mangled during the import. Or maybe I am misunderstanding something. My setup is pretty simple (see image below). I take the chain_number and connect it to a Map Range node, mapping 1-2 to 0-1 (tried different ranges), which is then passed to a ColorRamp node. The colour from the ramp is passed to all elements of a MOL_style_colour node, which in turn connects to the geometry output. However, the resulting colouring seems more or less random, albeit anticorrelated between the two chains (see image). The chain labels for the two monomers in the pdb file are C and D. Pymol seems to handle this fine (third image), so I don't think the pdb file is broken.

Is this a bug? I will try and see if I can spot something in the source code and update if I find something suspicious.
Screenshot 2022-09-27 at 12 48 40
Screenshot 2022-09-27 at 12 48 46
Screenshot 2022-09-27 at 13 00 48
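
For reference, the remapping described above (chain_number 1-2 mapped to 0-1 before the ColorRamp) amounts to a linear interpolation. The following is a plain-Python sketch of that arithmetic only; `map_range` is a hypothetical helper written for illustration, not Blender's Map Range implementation, and it does no clamping.

```python
def map_range(value, from_min, from_max, to_min, to_max):
    """Linearly remap value from [from_min, from_max] to [to_min, to_max]."""
    t = (value - from_min) / (from_max - from_min)
    return to_min + t * (to_max - to_min)


# With two chains numbered 1 and 2, each chain lands on one end of the ramp.
factors = [map_range(n, 1, 2, 0.0, 1.0) for n in (1, 2)]
print(factors)
```

If the per-atom chain_number values are correct, every atom in one chain should get the same factor and therefore the same colour.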

@BradyAJohnston
Owner

Hmmm, at first glance it looks like it should be working. Are you able to provide a PDB ID or file I can test with? I've had trouble with Atomium (which powers the PDB parsing) messing up some chain ids sometimes, so you could see if just Atomium via Python can parse it OK.

Any custom AA or other out of the ordinary aspects to the file?

@ErikMarklund
Author

ErikMarklund commented Sep 27, 2022

Hi again,

I checked the source code and could not find anything obviously wrong with how the chain labels are handled.

Still, the spreadsheet in blender shows that the chain_numbers are scrambled.

Screenshot 2022-09-27 at 13 38 48

EDIT: This is actually probably correct. See comment below.

@ErikMarklund
Author

Yes. How can I send it to you? Can't drag&drop pdb files here it seems.

@ErikMarklund
Author

I will try atomium on its own too. I have never used it though, so we shall see what I get out of it.

@ErikMarklund
Author

Atomium assigns the chain ids correctly, although it does not assign chain ids to atoms explicitly. A simple test script, which uses logic similar to Molecular Nodes, gives the right chain ids:

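#!/usr/bin/env python

import atomium as atm

pdb_path = 'MS2_sym-C_C.pdb'

# Parse the file and take the first model.
struct = atm.open(pdb_path)
model = struct.models[0]

# Walk molecule -> residue -> atom and print each atom next to the id of
# the molecule it belongs to, which is used here as the chain id.
for mol in model.molecules():
    currid = mol.id

    for res in mol.residues():
        for atom in res.atoms():
            print(f'{atom.name:6s}{res.name:6s}{res.id.split(".")[1]:5s}{currid:5s}')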

@ErikMarklund
Author

ErikMarklund commented Sep 27, 2022

On closer inspection, the chain_numbers in the spreadsheet from my earlier comment seem correct. The atoms come in pairs, since the dimer is symmetric and the atom ids are the same in both chains. The chain_number is 1 and 2 for the two equivalent atoms in each pair (I think).

@ErikMarklund
Author

Ok, I solved it for my protein. The two chains in my pdb file had atom ids that both started from the same number. I changed the atom ids in my pdb file so that they form one contiguous range: 1..965 for the first chain and 966..1930 for the second. Now blender gets it (see image). I also used a slightly simpler node tree (lower image), but that doesn't matter.

Not sure if this is a bug or not, because I cannot find anything conclusive about whether the pdb format requires unique atom ids or not. But for structures with >99999 atoms this will be a problem because the atom id field is only 5 characters.

I am a bit surprised that the atom id from the pdb file matters when Molecular Nodes or blender tries to retrieve info about an atom, but if it does I can see how non-unique ids would cause this problem.

Screenshot 2022-09-27 at 23 22 44

Screenshot 2022-09-27 at 23 22 53
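
The renumbering workaround described above can be scripted. The following is a minimal sketch, assuming the standard fixed-column PDB layout in which the atom serial occupies columns 7-11 of `ATOM` and `HETATM` records; `renumber_serials` is a hypothetical helper, and the sample lines are truncated, made-up records. It does not touch `TER` or `CONECT` records, which also carry serial numbers, so files that rely on those would need extra handling.

```python
def renumber_serials(pdb_lines, start=1):
    """Return a copy of pdb_lines with ATOM/HETATM serials made contiguous."""
    out = []
    serial = start
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            if serial > 99999:
                # The serial field is only 5 characters wide.
                raise ValueError("serial does not fit the 5-character field")
            # Columns 7-11 (indices 6..10) hold the right-aligned serial.
            line = f"{line[:6]}{serial:>5d}{line[11:]}"
            serial += 1
        out.append(line)
    return out


# Two chains (C and D) whose serials both start from 1.
lines = [
    "ATOM      1  N   MET C   1",
    "ATOM      2  CA  MET C   1",
    "ATOM      1  N   MET D   1",
    "ATOM      2  CA  MET D   1",
]
renumbered = renumber_serials(lines)
for line in renumbered:
    print(line)
```

The `ValueError` reflects the limit mentioned above: once a structure has more than 99999 atoms, unique serials no longer fit in the field.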

@BradyAJohnston
Owner

Ah thanks for investigating! I see what is happening. I use the atom IDs to ensure the ordering is correct when creating the models for molecular nodes, which is why strange things are happening with the attributes.

Frustratingly, Atomium doesn't currently allow ordered access to the atoms in the order that they appear in the file. It is instead an unordered set, which I then have to reorder based on their atom IDs.
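
To illustrate why ordering by atom ID alone goes wrong when two chains reuse the same IDs, here is a small sketch. The records are made-up tuples, not Atomium objects, and this is not the Molecular Nodes code; it only shows the general effect of sorting on a non-unique key.

```python
# Hypothetical atom records: (atom_id, chain_id, atom_name).
# Both chains number their atoms from 1, as in the file discussed here.
atoms = [
    (1, "C", "N"),
    (2, "C", "CA"),
    (1, "D", "N"),
    (2, "D", "CA"),
]

# Sorting on atom_id only: atoms with the same id from different chains
# end up next to each other, so the chains are interleaved.
by_id = sorted(atoms, key=lambda atom: atom[0])
chains_in_order = [atom[1] for atom in by_id]
print(chains_in_order)
```

Starting from a list, Python's stable sort keeps the tied atoms in their input order. Starting from an unordered set, the order within each tie would be arbitrary, which is consistent with the paired-but-scrambled chain numbers reported earlier in the thread.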

Another way to get around the issue would be to read the structure via MDAnalysis using the Molecular Dynamics tab, using the .pdb for both the topology and the trajectory file. MDAnalysis does a nicer job of parsing the files, but doesn't give access to some other information like biological assemblies, and is harder to install, which is why I currently have a balance of the two.

@ErikMarklund
Author

Mm. The pdb format is deprecated for a reason...

Would it be possible to use both chain label and atom IDs for ordering?
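
A sketch of what ordering on both keys could look like, using made-up tuples rather than Atomium objects (the record layout is an assumption for illustration only):

```python
# Hypothetical atom records: (atom_id, chain_id, atom_name), deliberately
# shuffled to mimic reading from an unordered collection.
atoms = [
    (2, "D", "CA"),
    (1, "C", "N"),
    (1, "D", "N"),
    (2, "C", "CA"),
]

# Sort by chain label first, then by atom id within each chain.
ordered = sorted(atoms, key=lambda atom: (atom[1], atom[0]))
print(ordered)
```

One caveat with this approach: sorting on the chain label puts chains in alphabetical order, which may differ from the order in which they appear in the file.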

@BradyAJohnston
Owner

Should be able to! Will look into it when I am back from travelling in a couple of weeks.

@mruetter

Hi all, I had exactly the same problem and may have found the reason. The first chain in my PDB was missing the terminus row. Pymol and Chimera handled it, but Atomium did not.
So I just inserted it manually and it worked. Maybe it helps you.
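
Assuming the "terminus row" above refers to the `TER` record that closes a chain in the PDB format, a file can be checked for this before importing. The sketch below is a hypothetical helper that reports every place where the chain id (column 22) changes between consecutive `ATOM` records without a `TER` in between; the sample lines are truncated, made-up records, and `MODEL`/`ENDMDL` boundaries are ignored.

```python
def missing_ter_positions(pdb_lines):
    """Return (prev_chain, next_chain) pairs that have no TER record between them."""
    missing = []
    prev_chain = None
    seen_ter = False
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "TER":
            seen_ter = True
        elif record == "ATOM":
            chain = line[21]  # column 22 holds the chain id
            if prev_chain is not None and chain != prev_chain and not seen_ter:
                missing.append((prev_chain, chain))
            prev_chain = chain
            seen_ter = False
    return missing


without_ter = [
    "ATOM      1  N   MET C   1",
    "ATOM      2  N   MET D   1",
]
with_ter = [without_ter[0], "TER", without_ter[1]]

print(missing_ter_positions(without_ter))
print(missing_ter_positions(with_ter))
```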

@BradyAJohnston
Owner

Are you able to try with the new MolecularNodes 2.0 and see if this issue is resolved?
