Chain ids not read correctly #73

ErikMarklund · 2022-09-27T11:06:47Z

I am trying to colour a dimeric protein based on chain id, but the chain ids in the pdb seem to get mangled during the import. Or maybe I am misunderstanding something. My setup is pretty simple (see image below). I take the chain_number and connect it to a Map Range node, mapping 1-2 to 0-1 (tried different ranges), which is then passed to a ColorRamp node. The colour from the ramp is passed to all elements of a MOL_style_colour node, which in turn connects to the geometry output. However, the resulting colouring seems more or less random, albeit anticorrelated between the two chains (see image). The chain labels for the two monomers in the pdb file are C and D. Pymol seems to handle this fine (third image), so I don't think the pdb file is broken.

Is this a bug? I will try and see if I can spot something in the source code and update if I find something suspicious.

BradyAJohnston · 2022-09-27T11:48:49Z

Hmmm, at first glance it looks like it should be working. Are you able to provide a PDB ID or file I can test with? I've had trouble with Atomium (which powers the PDB parsing) messing up some chain ids sometimes, so you could check whether Atomium on its own, via Python, can parse it OK.

Any custom AA or other out of the ordinary aspects to the file?

ErikMarklund · 2022-09-27T11:49:35Z

Hi again,

I checked the source code and could not find anything obviously wrong with how the chain labels are handled.

Still, the spreadsheet in blender shows that the chain_numbers are scrambled.

EDIT: This is actually probably correct. See comment below.

ErikMarklund · 2022-09-27T11:51:24Z

Yes. How can I send it to you? Can't drag&drop pdb files here it seems.

ErikMarklund · 2022-09-27T11:52:12Z

I will try atomium only too. Have never used it though, so we shall see what I get out of it.

ErikMarklund · 2022-09-27T12:15:59Z

Atomium assigns the chain ids correctly, although it does not assign chain ids to atoms explicitly. A simple test script, which uses logic similar to Molecular Nodes, gives the right chain ids:

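#!/usr/bin/env python

import atomium as atm

pdb_path = 'MS2_sym-C_C.pdb'

struct = atm.open(pdb_path)

# Use the first model in the file
model = struct.models[0]
mols = model.molecules()
for mol in mols:
    # The chain id is taken from the molecule, not from the atom itself
    currid = mol.id

    for res in mol.residues():
        for atom in res.atoms():
            # Columns: atom name, residue name, residue number, chain id
            print(f'{atom.name:6s}{res.name:6s}{res.id.split(".")[1]:5s}{currid:5s}')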

ErikMarklund · 2022-09-27T12:37:34Z

> Hi again,
>
> I checked the source code and could not find anything obviously wrong with how the chain labels are handled.
>
> Still, the spreadsheet in blender shows that the chain_numbers are scrambled.

At a closer look this seems correct. The atoms come in pairs, since the dimer is symmetric and the atom ids are the same in both chains. The chain_number is 1 and 2 for the two equivalent atoms in each case (I think).

ErikMarklund · 2022-09-27T21:39:25Z

Ok I solved it for my protein. The chains in my pdb file had atom ids that both started from the same number. I changed the atom ids in my pdb file so that they formed a contiguous range 1..965,966...1930. Now blender gets it (see image). I also used a slightly simpler node tree (lower image), but that doesn't matter.

Not sure if this is a bug or not, because I cannot find anything conclusive about whether the pdb format requires unique atom ids or not. But for structures with >99999 atoms this will be a problem because the atom id field is only 5 characters.

I am a bit surprised that the atom id from the pdb file should matter when Molecular Nodes or blender tries to retrieve info about an atom, but if it does I can see how non-unique ids would cause this problem.
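For anyone who wants to apply the same workaround without editing the file by hand, the renumbering could be sketched roughly as below. This is only an illustration: the function name `renumber_atom_serials` is made up for this example, it assumes the standard fixed-column PDB layout (atom serial in columns 7-11 of ATOM/HETATM records), and it does not update CONECT or TER records, which would also need adjusting if the file relies on them.

```python
def renumber_atom_serials(lines, start=1):
    """Return a copy of `lines` with ATOM/HETATM serials made contiguous.

    Hypothetical helper: assumes fixed-column PDB records where the
    serial number occupies columns 7-11 (string indices 6 to 10).
    """
    out = []
    serial = start
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            if serial > 99999:
                # The serial field is only 5 characters wide
                raise ValueError("serial does not fit the 5-character PDB field")
            line = f"{line[:6]}{serial:>5d}{line[11:]}"
            serial += 1
        out.append(line)
    return out


# Two chains (C and D) whose serials both start from 1
example = [
    f"ATOM  {1:>5d}  N   MET C{1:>4d}",
    f"ATOM  {2:>5d}  CA  MET C{1:>4d}",
    "TER",
    f"ATOM  {1:>5d}  N   MET D{1:>4d}",
    f"ATOM  {2:>5d}  CA  MET D{1:>4d}",
]

for line in renumber_atom_serials(example):
    print(line)
```

To use it on a real file, read the lines with `open(path).read().splitlines()`, pass them through the function, and write the result to a new file rather than overwriting the original.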

BradyAJohnston · 2022-09-28T02:14:27Z

Ah thanks for investigating! I see what is happening. I use the atom IDs to ensure the ordering is correct when creating the models for molecular nodes, which is why strange things are happening with the attributes.

Frustratingly, Atomium doesn't currently allow access to the atoms in the order that they appear in the file. It instead returns an unordered set, which I then have to reorder based on their atom IDs.

Another way to get around the issue would be reading the structure via MDAnalysis using the Molecular Dynamics tab, using the .pdb for both the topology and the trajectory file. MDAnalysis does a nicer job of parsing the files, but doesn't give access to some other information like biological assemblies, and is harder to install, which is why I currently have a balance of the two.

ErikMarklund · 2022-09-28T14:24:31Z

Mm. The pdb format is deprecated for a reason...

Would it be possible to use both chain label and atom IDs for ordering?
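To make the suggestion concrete, here is a small self-contained sketch of the difference between the two orderings. It is not the Molecular Nodes code: the `Atom` tuple and both function names are invented for illustration, and sorting on the chain label assumes the chains should come out in label order, which may not match the file order in every PDB.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical stand-in for a parsed atom record."""
    atom_id: int
    chain: str
    name: str


def order_by_id_only(atoms):
    # Ordering on the atom ID alone: atoms that share an ID across
    # chains end up next to each other, so the chains interleave.
    return sorted(atoms, key=lambda a: a.atom_id)


def order_by_chain_then_id(atoms):
    # Ordering on (chain label, atom ID): each chain stays contiguous
    # even when both chains start counting from the same ID.
    return sorted(atoms, key=lambda a: (a.chain, a.atom_id))


# A symmetric dimer whose chains C and D both use atom IDs 1..3,
# listed in arbitrary order as if read from an unordered set.
atoms = [
    Atom(2, "D", "CA"),
    Atom(1, "C", "N"),
    Atom(3, "C", "C"),
    Atom(1, "D", "N"),
    Atom(3, "D", "C"),
    Atom(2, "C", "CA"),
]

print([a.chain for a in order_by_id_only(atoms)])
print([a.chain for a in order_by_chain_then_id(atoms)])
```

With the ID-only key the chain labels alternate within each pair of equal IDs, which matches the paired rows seen in the spreadsheet; with the combined key all of chain C comes before chain D.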

BradyAJohnston · 2022-10-03T06:46:53Z

Should be able to! Will look into it when I am back from travelling in a couple of weeks.

mruetter · 2022-11-15T08:48:24Z

Hi all, I had exactly the same problem and may have found the reason. The first chain in my PDB was missing the terminus row. Pymol and Chimera handled it, but Atomium did not.
So I just inserted it manually and it worked. Maybe it helps you.

BradyAJohnston · 2022-12-13T10:47:16Z

Are you able to try with the new MolecularNodes 2.0 and see if this issue is resolved?

BradyAJohnston closed this as completed Dec 31, 2022
