Buggy graph encoding #205
Comments
Hi @Vincentx15, thanks for the bug report. I can reproduce it. I'll check it out this weekend.
Hello guys, do you have any updates on this? Or maybe an idea of how long this issue would take to solve? Thanks a lot for your work!
Hey @Vincentx15, apologies. It's a super busy time for me. This is very much on my to-do list. I will try to get to it later this week, if not early next. Thanks for your patience.
OK, no problem, I understand! Thanks a lot for the support :)
It looks like the This file is supposed to be used to embed amino acids, and currently it looks like is from https://link.springer.com/article/10.1007/s008940100038. |
OK, I found how to modify this embedding function: class graphein.protein.config.ProteinGraphConfig(*, granularity: typing.Union[typing.Literal['N', 'CA', 'C', 'O', 'CB', 'OG', 'CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ', 'OD1', 'ND2', 'CG1', 'CG2', 'CD', 'CE', 'NZ', 'OD2', 'OE1', 'NE2', 'OE2', 'OH', 'NE', 'NH1', 'NH2', 'OG1', 'SD', 'ND1', 'SG', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OXT'], typing.Literal['atom', 'centroids']] = 'CA', keep_hets: bool = False, insertions: bool = False, pdb_dir: pathlib.Path = PosixPath('../examples/pdbs'), verbose: bool = False, exclude_waters: bool = True, deprotonate: bool = False, protein_df_processing_functions: typing.List[typing.Callable] = None, edge_construction_functions: typing.List[typing.Union[typing.Callable, str]] = [<function add_peptide_bonds>], node_metadata_functions: typing.List[typing.Union[typing.Callable, str]] = [<function meiler_embedding>], edge_metadata_functions: typing.List[typing.Union[typing.Callable, str]] = None, graph_metadata_functions: typing.List[typing.Callable] = None, get_contacts_config: graphein.protein.config.GetContactsConfig = None, dssp_config: graphein.protein.config.DSSPConfig = None)[[source]](https://graphein.ai/_modules/graphein/protein/config.html#ProteinGraphConfig) by setting
Thanks @1511878618, that's super helpful. Looks like the embedding file wasn't packaged in the tarball on PyPI. I'll try to re-upload with the missing file.
OK, and I see some comments on the parameters of a function like below |
Hi @1511878618, could you open a separate issue for this? The docs do need to be updated for the Dataset utils and those comments removed. In the example you shared, the problem is that the |
OK, I'll do it.
@Vincentx15 I figured out the cause of the problem. The You can see this here:

import graphein.protein as gp
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.features.nodes.amino_acid import amino_acid_one_hot

new_funcs = {
    "granularity": 'CA',
    "keep_hets": [False],
    "edge_construction_functions": [
        gp.add_ionic_interactions,
    ],
    "node_metadata_functions": [amino_acid_one_hot],
}
config = ProteinGraphConfig(**new_funcs)
g = construct_graph(config=config, pdb_code="4f5s", chain_selection="A")
print(len(g.nodes))  # 583 nodes
gp.add_disulfide_interactions(g)
print(len(g.nodes))  # 617 nodes
print(g.nodes)  # here we can see the CYS residues on the B chain have been added to the graph.

It's a pretty straightforward fix & I'll push an update ASAP :)
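The jump from 583 to 617 nodes is consistent with how networkx treats edges: `Graph.add_edge` creates any endpoint that is not already in the graph, and such nodes carry no attributes. The following is only a minimal sketch of that mechanism. `ToyGraph` is a hypothetical stand-in for `nx.Graph`, and the node ids are made up for illustration; none of this is graphein code.

```python
class ToyGraph:
    """Toy stand-in for nx.Graph: adding an edge implicitly creates missing endpoints."""

    def __init__(self):
        self.nodes = {}     # node id -> attribute dict
        self.edges = set()  # undirected edges stored as frozensets

    def add_node(self, node_id, **attrs):
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, u, v):
        # Like networkx, create endpoints that do not exist yet (with no attributes).
        self.nodes.setdefault(u, {})
        self.nodes.setdefault(v, {})
        self.edges.add(frozenset((u, v)))


g = ToyGraph()
# Hypothetical node ids for two residues of the selected chain A.
g.add_node("A:CYS:53", residue_name="CYS")
g.add_node("A:CYS:62", residue_name="CYS")

# An edge between residues that were never selected (chain B)
# silently adds two attribute-less nodes to the graph.
g.add_edge("B:CYS:53", "B:CYS:62")

empty_nodes = [n for n, attrs in g.nodes.items() if not attrs]
print(len(g.nodes), empty_nodes)
```

Nodes created this way have no `residue_name` or coordinates, which would match the "empty nodes" described in the original report.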
And until I get around to the PR, here's a fixed disulfide function. I've tested it on the example you used above and it works as expected :)

import networkx as nx
import pandas as pd
from typing import Optional

from graphein.protein.resi_atoms import DISULFIDE_ATOMS, DISULFIDE_RESIS
from graphein.protein.edges.distance import add_interacting_resis, get_interacting_atoms, compute_distmat, filter_dataframe
from loguru import logger as log  # loguru exposes its logger as `loguru.logger`


def add_disulfide_interactions(
    G: nx.Graph, rgroup_df: Optional[pd.DataFrame] = None
):
    """
    Find all disulfide interactions between CYS residues
    (:const:`~graphein.protein.resi_atoms.DISULFIDE_RESIS`,
    :const:`~graphein.protein.resi_atoms.DISULFIDE_ATOMS`).
    Criteria: sulfur atom pairs are within 2.2A of each other.

    :param G: networkx protein graph
    :type G: nx.Graph
    :param rgroup_df: pd.DataFrame containing rgroup data, defaults to ``None``,
        which retrieves the df from the provided nx graph.
    :type rgroup_df: pd.DataFrame, optional
    """
    # Check for existence of at least two Cysteine residues
    residues = [d["residue_name"] for _, d in G.nodes(data=True)]
    if residues.count("CYS") < 2:
        log.debug(
            f"{residues.count('CYS')} CYS residues found. Cannot add disulfide interactions with fewer than two CYS residues."
        )
        return

    if rgroup_df is None:
        rgroup_df = G.graph["rgroup_df"]
    # Keep only the sulfur atoms of CYS residues
    disulfide_df = filter_dataframe(
        rgroup_df, "residue_name", DISULFIDE_RESIS, True
    )
    disulfide_df = filter_dataframe(
        disulfide_df, "atom_name", DISULFIDE_ATOMS, True
    )
    # Keep only atoms whose residue is already a node in the graph
    disulfide_df = filter_dataframe(disulfide_df, "node_id", list(G.nodes), True)  # this line is the fix
    distmat = compute_distmat(disulfide_df)
    interacting_atoms = get_interacting_atoms(2.2, distmat)
    add_interacting_resis(G, interacting_atoms, disulfide_df, ["disulfide"])
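The effect of the line marked as the fix can be illustrated without graphein. The sketch below is a simplified stand-in, not the library's implementation: the atom rows, the coordinates, and the `disulfide_pairs` helper are all hypothetical, and plain dicts replace the DataFrame. It shows that restricting candidate atoms to node ids already present in the graph keeps the distance search from pairing residues on an unselected chain.

```python
import math

# Hypothetical rows standing in for the R-group DataFrame:
# sulfur atoms of CYS residues on two chains, coordinates in angstroms.
sulfur_atoms = [
    {"node_id": "A:CYS:53", "coords": (0.0, 0.0, 0.0)},
    {"node_id": "A:CYS:62", "coords": (2.0, 0.0, 0.0)},
    {"node_id": "B:CYS:53", "coords": (10.0, 0.0, 0.0)},
    {"node_id": "B:CYS:62", "coords": (12.0, 0.0, 0.0)},
]
graph_nodes = {"A:CYS:53", "A:CYS:62"}  # only chain A was selected


def disulfide_pairs(atoms, cutoff=2.2):
    """Return node-id pairs whose sulfur atoms lie within `cutoff` angstroms."""
    pairs = []
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if math.dist(a["coords"], b["coords"]) <= cutoff:
                pairs.append((a["node_id"], b["node_id"]))
    return pairs


# Without the filter, the chain B pair is found even though it is not in the graph.
unfiltered = disulfide_pairs(sulfur_atoms)

# With the filter, only atoms belonging to existing graph nodes are considered.
in_graph = [a for a in sulfur_atoms if a["node_id"] in graph_nodes]
filtered = disulfide_pairs(in_graph)

print(unfiltered)
print(filtered)
```

Filtering before the distance computation, rather than dropping stray nodes afterwards, also avoids computing distances for atoms that can never contribute an edge.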
Hey guys, thanks a lot for the fix. Splitting into chains solved it for me! Best,
* fix disulphide bond node filtering #205
* Local dataset (#208) (#211)
* add load local pdb_files to ProteinGraphDataset
* load local pdb_files from a list
* test and black and isort and add CHANGELOG.md
* docstring formatting
* comment jupyter; just run and ssh or vscode to interact with container
* install dependency of jupyter nbextension
* Fixed Dockerfile and unit tests
* Update chage log
Co-authored-by: xutingfeng <xutingfeng@big.ac.cn>
Co-authored-by: Arian Jamasb <arjamasb@gmail.com>
Co-authored-by: ryan <ryan.greenhalgh@hotmail.co.uk>
Co-authored-by: Nicktf <49584439+1511878618@users.noreply.github.com>
* fix typo
Co-authored-by: Nicktf <49584439+1511878618@users.noreply.github.com>
Co-authored-by: xutingfeng <xutingfeng@big.ac.cn>
Co-authored-by: ryan <ryan.greenhalgh@hotmail.co.uk>
This should be fully resolved in 1.5.2 ( |
Hi folks, thanks for this package!
I'm experiencing bugs when creating graphs from a PDB file. First, if no node_metadata_functions are given (comment below), the config defaults to the Meiler embeddings, which rely on a file that cannot be found at graphein/protein/features/nodes/meiler_embeddings.csv. Second, some nodes are empty, which raises errors when trying to add edges.
I enclose a small script that raises the errors for a random PDB (please download the pdb data/4f5s.ent.gz).
This should return the following error :
I'm using Ubuntu 20, Python 3.10, graphein 1.5.1.
Thanks in advance!