chain_selection returns KeyError: 'residue_name' #134

Closed
johnnytam100 opened this issue Mar 16, 2022 · 6 comments

@johnnytam100

johnnytam100 commented Mar 16, 2022

Hi Arian, I'm sorry for the frequent opening of threads!

I want to do chain selection of PDB 2VVI, which has chains A, B, C and D.

g = construct_graph(config=config, pdb_code='2VVI', chain_selection="A")

However, it returns a KeyError: 'residue_name':

DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 217 total nodes
INFO:graphein.protein.edges.distance:Found 411 hbond interactions.
INFO:graphein.protein.edges.distance:Found 42 hbond interactions.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-22-68026463de79> in <module>()
----> 1 g = construct_graph(config=config, pdb_code='2VVI', chain_selection="A")

3 frames
/usr/local/lib/python3.7/dist-packages/graphein/protein/edges/distance.py in <listcomp>(.0)
    154     """
    155     # Check for existence of at least two Cysteine residues
--> 156     residues = [d["residue_name"] for _, d in G.nodes(data=True)]
    157     if residues.count("CYS") < 2:
    158         log.debug(

KeyError: 'residue_name'

Any idea how to resolve this error?
Thanks!

@a-r-j
Owner

a-r-j commented Mar 16, 2022

Hi @johnnytam100, no worries!

Could you please share the config? It looks like the trouble occurs when the disulphide bonds are assigned. The KeyError says there is a node in the graph that doesn't have a residue_name attribute. Potentially we're not filtering out some unwanted HETATMs/ligands properly, which would be a bug. Are you including heteroatoms intentionally?
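A minimal sketch of how one might look for such nodes, assuming the graph's node data is available as (node_id, attribute_dict) pairs, which is what `G.nodes(data=True)` yields on a NetworkX graph. The helper name `nodes_missing_attribute` and the example node ids are hypothetical and not part of Graphein.

```python
def nodes_missing_attribute(node_data, key="residue_name"):
    """Return ids of nodes whose attribute dict lacks `key`.

    `node_data` is any iterable of (node_id, attribute_dict) pairs,
    e.g. `G.nodes(data=True)` on a NetworkX graph.
    """
    return [node for node, attrs in node_data if key not in attrs]


# Made-up example data; real node ids and attributes will differ.
example = [
    ("A:CYS:10", {"residue_name": "CYS", "chain_id": "A"}),
    ("B:ASP:7", {}),  # node created without metadata
]
missing = nodes_missing_attribute(example)
print(missing)
```

Since the failing call never returns a graph, this would have to be run on a graph built with only the edge functions that succeed.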

If you can share the config I can take a closer look this afternoon.

Also, could you please share the version of Graphein you're using? pip list should do the trick.

I just tried the following in a colab notebook and it seems to work as expected so I may be wrong about hetatms/ligands/waters causing the problem.

from graphein.protein.graphs import construct_graph
from graphein.protein.edges.distance import add_disulfide_interactions, add_hydrogen_bond_interactions
from graphein.protein.config import ProteinGraphConfig

config = ProteinGraphConfig(
    edge_construction_functions=[add_disulfide_interactions, add_hydrogen_bond_interactions],
    keep_hets=True,
    insertions=True,
    exclude_waters=False,
)

g = construct_graph(pdb_code="2vvi", chain_selection="A", config=config)
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 217 total nodes
INFO:graphein.protein.edges.distance:Found 12 disulfide interactions.
INFO:graphein.protein.edges.distance:Found 411 hbond interactions.
INFO:graphein.protein.edges.distance:Found 42 hbond interactions.

@johnnytam100
Author

johnnytam100 commented Mar 16, 2022

My graphein version is 1.2.0.

I tested your code and it worked! Then I found something strange: if you swap the positions of add_disulfide_interactions and add_hydrogen_bond_interactions, like this, it fails:

from graphein.protein.graphs import construct_graph
from graphein.protein.edges.distance import add_disulfide_interactions, add_hydrogen_bond_interactions
from graphein.protein.config import ProteinGraphConfig

config = ProteinGraphConfig(
    edge_construction_functions=[add_hydrogen_bond_interactions, add_disulfide_interactions],
    keep_hets=True,
    insertions=True,
    exclude_waters=False,
)

g = construct_graph(pdb_code="2vvi", chain_selection="A", config=config)
DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 217 total nodes
INFO:graphein.protein.edges.distance:Found 411 hbond interactions.
INFO:graphein.protein.edges.distance:Found 42 hbond interactions.
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-60-1e0f5c8e3c7b> in <module>()
     10      )
     11 
---> 12 g = construct_graph(pdb_code="2vvi", chain_selection="A", config=config)

3 frames
/usr/local/lib/python3.7/dist-packages/graphein/protein/edges/distance.py in <listcomp>(.0)
    154     """
    155     # Check for existence of at least two Cysteine residues
--> 156     residues = [d["residue_name"] for _, d in G.nodes(data=True)]
    157     if residues.count("CYS") < 2:
    158         log.debug(

KeyError: 'residue_name'

My original config was

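from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.edges.distance import (
    add_peptide_bonds, add_hydrogen_bond_interactions, add_disulfide_interactions,
    add_ionic_interactions, add_aromatic_interactions,
    add_aromatic_sulphur_interactions, add_cation_pi_interactions,
)

new_funcs = {"keep_hets": False,
             "edge_construction_functions": [add_peptide_bonds,
                                              add_hydrogen_bond_interactions,
                                              add_disulfide_interactions,
                                              add_ionic_interactions,
                                              add_aromatic_interactions,
                                              add_aromatic_sulphur_interactions,
                                              add_cation_pi_interactions],
             }

config = ProteinGraphConfig(**new_funcs)

# construct graph
g = construct_graph(config=config, pdb_code='2vvi', chain_selection="A")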

Swapping only add_hydrogen_bond_interactions and add_disulfide_interactions in this list didn't work.
So I wonder which order of these functions should work.

@a-r-j
Owner

a-r-j commented Mar 16, 2022

Huh, that is a bug. I’ll check it out

@a-r-j
Owner

a-r-j commented Mar 16, 2022

Ah, I see what’s going on. The hydrogen bonds are looking at the unfiltered dataframe and adding in new nodes from the other chains. These nodes don’t have any metadata attached because they weren’t added in the earlier steps where we add information to nodes and so we get the error.
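A minimal sketch of the failure mode described above, assuming a plain-dict stand-in for the graph (`TinyGraph` is hypothetical and is not Graphein or NetworkX code). Like NetworkX's `add_edge`, it silently creates any endpoint that does not exist yet, so an edge to a residue on another chain produces a node with no metadata, and the later `residue_name` lookup fails. The `only_existing` guard is one possible way to avoid that, not necessarily what the actual fix does.

```python
class TinyGraph:
    """Hypothetical dict-based stand-in for a graph with node attributes."""

    def __init__(self):
        self.nodes = {}     # node id -> attribute dict
        self.edges = set()  # each edge stored as frozenset({u, v})

    def add_node(self, node, **attrs):
        self.nodes.setdefault(node, {}).update(attrs)

    def add_edge(self, u, v, only_existing=False):
        if only_existing and not (u in self.nodes and v in self.nodes):
            return False  # skip edges that would pull in unknown nodes
        # Mimic NetworkX: missing endpoints are created with empty attributes.
        self.nodes.setdefault(u, {})
        self.nodes.setdefault(v, {})
        self.edges.add(frozenset((u, v)))
        return True


def residue_names(graph):
    # Same access pattern as the line shown in the traceback.
    return [d["residue_name"] for d in graph.nodes.values()]


g = TinyGraph()
g.add_node("A:CYS:10", residue_name="CYS")
g.add_node("A:SER:42", residue_name="SER")

# Guarded: the edge to chain B is skipped, so the lookup still works.
g.add_edge("A:CYS:10", "B:ASP:7", only_existing=True)
names_before = residue_names(g)

# Unguarded: chain B's residue appears as a node without metadata.
g.add_edge("A:SER:42", "B:ASP:7")
try:
    residue_names(g)
    raised = False
except KeyError:
    raised = True
```

This also suggests why the order mattered in the two configs: when the disulfide check ran first, the metadata-free nodes had not been added yet.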

This is definitely a bug and I'll figure out a fix. For now, I'd recommend not using chain_selection="A" and using chain_selection="all" instead. You can then use graphein.protein.subgraphs.extract_subgraph_from_chains() to select the chains.

Tutorial notebook here
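A rough sketch of that workaround, assuming `extract_subgraph_from_chains` takes the graph and a list of chain ids as its first two arguments. The wrapper name `graph_for_chains` is hypothetical, and the Graphein imports sit inside the function only so the snippet can be loaded without Graphein installed.

```python
def graph_for_chains(pdb_code, chains, config=None):
    """Build the full graph, then keep only the requested chains."""
    # Imported lazily so this sketch can be loaded without Graphein installed.
    from graphein.protein.config import ProteinGraphConfig
    from graphein.protein.graphs import construct_graph
    from graphein.protein.subgraphs import extract_subgraph_from_chains

    if config is None:
        config = ProteinGraphConfig()
    # Construct with every chain so no edge function sees a partial graph.
    full = construct_graph(config=config, pdb_code=pdb_code, chain_selection="all")
    return extract_subgraph_from_chains(full, chains)


# Example call (fetches the structure, so it needs network access):
# g_a = graph_for_chains("2vvi", ["A"])
```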

Good catch btw! Thanks for flagging this :)

@a-r-j a-r-j added the bug Something isn't working label Mar 16, 2022
@a-r-j a-r-j self-assigned this Mar 16, 2022
a-r-j added a commit that referenced this issue Mar 16, 2022
…132 #134 #135 (#136)

* fix edges functions adding nodes to graphs with chain selections #134

* change generator comprehension in coordinate update to list comprehension to allow pickling #135

* [docs] update changelog

* update conversion dosctrings #132

* update version to 1.2.1

* prevent execution in docs #131
@a-r-j
Owner

a-r-j commented Mar 16, 2022

Hey @johnnytam100 you should be able to proceed as normal now. v1.2.1 has fixes for this now and is on PyPI (pip install graphein==1.2.1). Would you mind giving it a try & letting me know if all's well so I can close the issues?
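A small sketch for checking whether the installed release is at least 1.2.1, assuming Python 3.8+ for `importlib.metadata`. The helper names are hypothetical, and the version parsing is deliberately simple: it keeps only the leading digits of each dot-separated part.

```python
from importlib import metadata
from itertools import takewhile


def version_tuple(version):
    """Turn '1.2.1' into (1, 2, 1), keeping only leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_chain_selection_fix(installed, fixed="1.2.1"):
    """True if `installed` is the release with the fix, or a later one."""
    return version_tuple(installed) >= version_tuple(fixed)


def installed_graphein_version():
    """Return the installed graphein version string, or None if absent."""
    try:
        return metadata.version("graphein")
    except metadata.PackageNotFoundError:
        return None


current = installed_graphein_version()
if current is not None:
    print(current, has_chain_selection_fix(current))
```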

@johnnytam100
Author

Thank you @a-r-j ! It is fine now with pip install graphein==1.2.1! :)

@a-r-j a-r-j closed this as completed Mar 17, 2022