In [None]:
%matplotlib widget
import logging
logging.basicConfig(level="INFO")

# Graphein - Atom Graph Tutorial

In this notebook, we'll run through atom-level graph construction in Graphein. We start by discussing the config and the high-level API, then spend the bulk of the tutorial running through the low-level API.

In [None]:
from graphein.protein.config import ProteinGraphConfig

config = ProteinGraphConfig()
config.dict()

Let's run through the config:

* `granularity`: specifies the granularity of the graph (i.e. what the nodes should be). Possible values are: atom identifiers (e.g. `"CA"` for $\alpha$ carbon, `"CB"` for $\beta$ carbon), `"centroid"` to use residue centroids (under the hood, this is the same as `"CA"`, except that each node is placed at the average x, y, z coordinates of the atoms in the residue) or `"atom"` for atom-level construction. This is discussed here.
* `keep_hets`: this is a boolean specifying whether or not to keep heteroatoms present in the .pdb file. Heteroatoms are typically non-protein atoms (waters, metal ions, ligands) but can sometimes contain non-standard or modified residues.
* `insertions`: boolean specifying whether or not to keep insertions in the PDB file
* `pdb_dir`: optional path to a folder in which to save PDB files. Otherwise, `/tmp/` will be used
* `verbose`: boolean controlling the amount of information printed
* `exclude_waters`: not implemented
* `deprotonate`: boolean indicating whether or not to remove hydrogen atoms
* `protein_df_processing_functions`: list of functions with which to process the PDB dataframe. Discussed in the low-level API.
* `edge_construction_functions`: list of functions to compute edges with
* `node_metadata_functions`: list of functions to annotate nodes with
* `edge_metadata_functions`: list of functions to annotate edges with
* `graph_metadata_functions`: list of functions to annotate the graph with
* `get_contacts_config`: A separate config object if using GetContacts edge construction functions
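To make the `"centroid"` option above concrete, here is a minimal sketch of averaging atom coordinates per residue. It uses plain Python and made-up atom records; the record layout and the `residue_centroids` helper are assumptions for illustration and are not part of Graphein.

```python
from collections import defaultdict

# Hypothetical atom records: (chain, residue_number, atom_name, x, y, z).
toy_atoms = [
    ("A", 1, "N", 0.0, 0.0, 0.0),
    ("A", 1, "CA", 1.0, 2.0, 0.0),
    ("A", 1, "C", 2.0, 1.0, 3.0),
    ("A", 2, "N", 4.0, 0.0, 0.0),
    ("A", 2, "CA", 6.0, 2.0, 2.0),
]


def residue_centroids(atom_records):
    """Average the x, y, z coordinates of the atoms in each residue."""
    grouped = defaultdict(list)
    for chain, resnum, _name, x, y, z in atom_records:
        grouped[(chain, resnum)].append((x, y, z))
    # zip(*coords) regroups the coordinates by axis: all x, all y, all z.
    return {
        key: tuple(sum(axis) / len(coords) for axis in zip(*coords))
        for key, coords in grouped.items()
    }


toy_centroids = residue_centroids(toy_atoms)
print(toy_centroids)
```

Each residue collapses to a single point, so a centroid graph has one node per residue, just as a `"CA"` graph does.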

In [None]:
from graphein.protein.edges.atomic import add_atomic_edges
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges]}

config = ProteinGraphConfig(**params_to_change)
config.dict()
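The idea of passing a list of edge construction functions through the config can be illustrated with a toy pipeline. This is only a sketch of the pattern, not Graphein's implementation: `ToyGraphConfig`, `build_toy_graph` and both edge functions are hypothetical names, and the graph is a plain dictionary rather than a NetworkX graph.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyGraphConfig:
    """Hypothetical stand-in for a config holding a list of edge functions."""
    granularity: str = "CA"
    edge_construction_functions: List[Callable[[dict], None]] = field(default_factory=list)


def add_chain_edges(graph: dict) -> None:
    """Connect consecutive nodes and label the edge kind as 'chain'."""
    nodes = graph["nodes"]
    for u, v in zip(nodes, nodes[1:]):
        graph["edges"].setdefault((u, v), set()).add("chain")


def add_closing_edge(graph: dict) -> None:
    """Connect the first and last node and label the edge kind as 'closing'."""
    nodes = graph["nodes"]
    graph["edges"].setdefault((nodes[0], nodes[-1]), set()).add("closing")


def build_toy_graph(nodes, config: ToyGraphConfig) -> dict:
    """Create the nodes, then apply each configured edge function in order."""
    graph = {"nodes": list(nodes), "edges": {}}
    for func in config.edge_construction_functions:
        func(graph)
    return graph


toy_config = ToyGraphConfig(
    granularity="atom",
    edge_construction_functions=[add_chain_edges, add_closing_edge],
)
toy_graph = build_toy_graph(["N", "CA", "C"], toy_config)
print(toy_graph["edges"])
```

Because each function only adds edges or labels to the same graph object, functions can be combined freely by changing the list in the config.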

## High-level API
Graphein features a high-level API which should be applicable for most simple graph constructions. This can be used on either a .pdb file (so you can run whatever pre-processing you wish), or we can provide a PDB accession code and retrieve a structure from the PDB itself. If a path is provided, it takes precedence over the PDB code.

To use it we do as follows:

In [None]:
from graphein.protein.graphs import construct_graph

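# Option 1: retrieve the structure from the PDB by accession code
g = construct_graph(config=config, pdb_code="3eiy")

# Option 2: read a local .pdb file instead (this overwrites the graph built above)
g = construct_graph(config=config, pdb_path="../examples/pdbs/3eiy.pdb")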
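The precedence rule described above (a path wins over an accession code) can be sketched as follows. The function `choose_structure_source` is a hypothetical helper written for this illustration; it does not exist in Graphein and only mirrors the behaviour stated in the text.

```python
from typing import Optional, Tuple


def choose_structure_source(
    pdb_code: Optional[str] = None, pdb_path: Optional[str] = None
) -> Tuple[str, str]:
    """Return which input would be used: a local file takes precedence."""
    if pdb_path is not None:
        return ("file", pdb_path)
    if pdb_code is not None:
        return ("download", pdb_code)
    raise ValueError("Provide either pdb_code or pdb_path.")


# Both arguments given: the path is used and the code is ignored.
print(choose_structure_source(pdb_code="3eiy", pdb_path="../examples/pdbs/3eiy.pdb"))
# Only the accession code given: the structure would be downloaded.
print(choose_structure_source(pdb_code="3eiy"))
```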

In [None]:
from graphein.protein.visualisation import plot_protein_structure_graph

p = plot_protein_structure_graph(G=g, angle=0, colour_edges_by="kind", colour_nodes_by="element_symbol", label_node_ids=False, node_size_min=2, node_alpha=0.5)

As you can see, all the bonds have the same type. To assign bond orders, we add the `add_bond_order` function to the list of edge construction functions. We do the same to assign ring status with `add_ring_status`.

In [None]:
from graphein.protein.edges.atomic import add_atomic_edges, add_bond_order, add_ring_status
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges, add_bond_order, add_ring_status]}

config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")
p = plot_protein_structure_graph(G=g, angle=0, colour_edges_by="kind", colour_nodes_by="element_symbol", label_node_ids=False, node_size_min=2, node_alpha=0.5)

# Print some edges to verify
for i, (u, v, a) in enumerate(g.edges(data=True)):
    if i % 30 == 0:
        print(u, v, a)
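Printing every 30th edge gives a spot check. For a summary, one option is to count how often each edge label occurs. The sketch below assumes that each edge's attribute dictionary stores its labels under a `kind` key as a set of strings; the node names and labels in `toy_edges` are made up for illustration, and `count_edge_kinds` is a hypothetical helper.

```python
from collections import Counter


def count_edge_kinds(edges):
    """Count edge labels over an iterable of (u, v, attribute_dict) triples."""
    counts = Counter()
    for _u, _v, attrs in edges:
        kinds = attrs.get("kind", set())
        # Tolerate a single string label as well as a collection of labels.
        if isinstance(kinds, str):
            kinds = {kinds}
        counts.update(kinds)
    return counts


# Made-up edges in the same (u, v, attrs) shape that g.edges(data=True) yields.
toy_edges = [
    ("A:MET:1:N", "A:MET:1:CA", {"kind": {"covalent", "SINGLE"}}),
    ("A:MET:1:CA", "A:MET:1:C", {"kind": {"covalent", "SINGLE"}}),
    ("A:MET:1:C", "A:MET:1:O", {"kind": {"covalent", "DOUBLE"}}),
]
kind_counts = count_edge_kinds(toy_edges)
print(kind_counts)
```

If the assumption about `kind` holds for the graph built above, the same helper could be called as `count_edge_kinds(g.edges(data=True))`.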

We can also include some of the distance-based functions we used in the residue graph construction. Here we add the Delaunay triangulation used for residue-level graphs to the atom graph.

In [None]:
from graphein.protein.edges.distance import add_delaunay_triangulation
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges, add_bond_order, add_delaunay_triangulation]}

config = ProteinGraphConfig(**params_to_change)
g = construct_graph(config=config, pdb_code="3eiy")
p = plot_protein_structure_graph(G=g, angle=0, colour_edges_by="kind", colour_nodes_by="element_symbol", label_node_ids=False, node_size_min=2, node_alpha=0.5)