# Using the graphs with COVID-19 data

In [12]:
import subprocess
import pickle
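# Start a local HTTP server in the background (serves the current directory, port 8000 by default).
subprocess.Popen(["python3","-m","http.server"])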
from pybiographs import InteractionGraph, OntologyGraph, Mappings, CovidData

First, let's load the graphs. If they have not been downloaded yet, the package downloads them for you. The graphs are wrapped in a class but expose the same methods as networkx graphs: https://networkx.github.io/documentation/stable/

In [2]:
directed_graph = InteractionGraph(directed=True)
undirected_graph = InteractionGraph(directed=False)
print("protein nodes in directed :", len(directed_graph.nodes()))
print("interaction edges in directed :", len(directed_graph.edges()))
print("protein nodes in undirected :", len(undirected_graph.nodes()))
print("interaction edges in undirected :", len(undirected_graph.edges()))

protein nodes in directed : 15531
interaction edges in directed : 1665684
protein nodes in undirected : 18553
interaction edges in undirected : 5542412


Let's load the helper dictionaries used to work with the data; they are described in the README:

In [3]:
maps = Mappings()
print(maps.names)

('biological_processes_union', 'cell_components_union', 'covid_go_to_name', 'metabolites_id_to_name', 'molecular_functions_union', 'gene_to_proteins', 'go_to_name', 'tissue_num_mapping')


Let's load the COVID-19 data:

In [13]:
covid_data = CovidData()

In [14]:
human_interacting_proteins = [p for p in covid_data.keys() if covid_data[p]["human"]]
print(human_interacting_proteins)

['O15393', 'Q92499', 'Q9BYF1', 'O43765', 'P20701', 'P35232', 'P84022', 'Q8N3R9', 'Q99623']


These are the human proteins listed at https://covid-19.uniprot.org/uniprotkb?query=*. The main ones are angiotensin-converting enzyme 2 (ACE2), UniProt ID Q9BYF1 (https://www.uniprot.org/uniprot/Q9BYF1), and transmembrane protease serine 2 (TMPRSS2), UniProt ID O15393 (https://www.uniprot.org/uniprot/O15393). We are interested in the interactions and the expression data for these proteins in the graphs.

In [6]:
undirected_graph.classify_tissue_by_node_expression(human_interacting_proteins, limit=20)

mucosa of transverse colon  :  0.4567447744774477
small intestine Peyer's patch  :  0.4547480748074808
jejunal mucosa  :  0.44589768976897687
right lung  :  0.4433212321232123
esophagus mucosa  :  0.4433168316831684
transverse colon  :  0.443035203520352
left lobe of thyroid gland  :  0.4423377337733774
colonic mucosa  :  0.44187458745874575
right lobe of thyroid gland  :  0.44181958195819576
cortex of kidney  :  0.4412145214521452
right lobe of liver  :  0.441002200220022
upper lobe of left lung  :  0.4408789878987899
body of pancreas  :  0.4405379537953795
lower esophagus mucosa  :  0.4395929592959296
minor salivary gland  :  0.4395819581958195
thyroid gland  :  0.4347623762376237
duodenum  :  0.43366336633663366
multi-cellular organism  :  0.4321221122112211
body of stomach  :  0.4315500550055006
kidney epithelium  :  0.43145324532453244


The method classify_tissue_by_node_expression looks up the expression values of the genes coding for these 9 proteins in all available tissues and averages them per tissue. It then ranks the tissues and prints the top 20. Tissues of the respiratory and digestive systems feature prominently at the top of the ranking.
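
The averaging and ranking step described above can be pictured with a small, self-contained sketch. This is not the pybiographs implementation: the function name `rank_tissues_by_mean_expression`, the nested-dictionary layout, and the toy expression values are all assumptions made for illustration.

```python
from statistics import mean

def rank_tissues_by_mean_expression(expression, proteins, limit=20):
    """Rank tissues by the mean expression of the given proteins.

    expression: hypothetical layout {protein_id: {tissue: value}}.
    A protein with no value for a tissue is simply skipped for that tissue.
    """
    per_tissue = {}
    for protein in proteins:
        for tissue, value in expression.get(protein, {}).items():
            per_tissue.setdefault(tissue, []).append(value)
    ranked = sorted(
        ((tissue, mean(values)) for tissue, values in per_tissue.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:limit]

# Toy values invented for the example (not real expression data).
toy_expression = {
    "Q9BYF1": {"right lung": 0.50, "spleen": 0.25},
    "O15393": {"right lung": 0.25, "spleen": 0.25},
}

ranking = rank_tissues_by_mean_expression(toy_expression, ["Q9BYF1", "O15393"], limit=2)
for tissue, score in ranking:
    print(tissue, " : ", score)
```

With these toy numbers, "right lung" comes first because its mean (0.375) is higher than that of "spleen" (0.25).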

Now let's take the first-degree neighbors of these 9 proteins in the upper lobe of the left lung, a tissue where we know the virus is quite harmful:

In [7]:
sub_graph = undirected_graph.sub_graph_from_node_propagation(human_interacting_proteins,
                                                diameter=1,
                                                tissue="upper lobe of left lung",
                                                score_threshold = 0.99,
                                                expression_threshold = 0.0)
len(sub_graph.nodes())

2641
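
One way to picture the call above is as a breadth-first search that stops after `diameter` hops and filters as it goes. The sketch below is only an illustration under stated assumptions: the function `neighbors_within`, the dictionary layouts, and the toy graph are hypothetical, and whether pybiographs applies `score_threshold` to edge scores in exactly this way is an assumption, not something the package documentation is quoted on here.

```python
from collections import deque

def neighbors_within(adjacency, expression, seeds, diameter=1,
                     score_threshold=0.0, expression_threshold=0.0):
    """Collect the nodes reachable from `seeds` in at most `diameter` hops.

    adjacency: hypothetical layout {node: {neighbor: edge_score}}.
    expression: hypothetical layout {node: expression in the chosen tissue}.
    Assumed filtering: an edge is followed only if its score is at least
    score_threshold, and a neighbor is kept only if its expression is
    strictly greater than expression_threshold. Seeds are always kept.
    """
    kept = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == diameter:
            continue  # do not expand past the requested number of hops
        for neighbor, score in adjacency.get(node, {}).items():
            if neighbor in kept or score < score_threshold:
                continue
            if expression.get(neighbor, 0.0) <= expression_threshold:
                continue
            kept.add(neighbor)
            queue.append((neighbor, depth + 1))
    return kept

# Toy undirected graph: each edge is listed in both directions.
adjacency = {
    "A": {"B": 0.995, "C": 0.5, "E": 0.995},
    "B": {"A": 0.995, "D": 0.999},
    "C": {"A": 0.5},
    "D": {"B": 0.999},
    "E": {"A": 0.995},
}
expression = {"A": 0.6, "B": 0.4, "C": 0.4, "D": 0.3, "E": 0.0}

print(sorted(neighbors_within(adjacency, expression, ["A"], diameter=1, score_threshold=0.99)))
# ['A', 'B']
print(sorted(neighbors_within(adjacency, expression, ["A"], diameter=2, score_threshold=0.99)))
# ['A', 'B', 'D']
```

In the toy graph, C is dropped because its edge score is below the threshold and E is dropped because it is not expressed, which mirrors the two filters passed to sub_graph_from_node_propagation.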

The method returns a subgraph containing the first-degree neighbors of the proteins that have an expression value greater than 0.0 in the upper lobe of the left lung. There are 2641 proteins in this subgraph. Let's look at the most affected biological processes and cellular components:

In [8]:
undirected_graph.most_present_biological_processes(sub_graph, 
                                                   tissue = "upper lobe of left lung",
                                                   bp_size_thresh=100,
                                                   limit=20)

Most affected biological processes :
	 nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
	 establishment of protein localization to endoplasmic reticulum
	 viral transcription
	 translational initiation
	 protein localization to endoplasmic reticulum
	 nuclear-transcribed mRNA catabolic process
	 mRNA catabolic process
	 protein targeting to membrane
	 RNA splicing, via transesterification reactions
	 RNA catabolic process
	 mRNA splicing, via spliceosome
	 RNA splicing, via transesterification reactions with bulged adenosine as nucleophile
	 translation
	 ribonucleoprotein complex biogenesis
	 translational elongation
	 mRNA export from nucleus
	 ribonucleoprotein complex assembly
	 RNA export from nucleus
	 cellular component biogenesis
	 ribonucleoprotein complex subunit organization


In [9]:
undirected_graph.most_present_cellular_components(sub_graph, 
                                                   tissue = "upper lobe of left lung",
                                                   cc_size_thresh=100,
                                                   limit=20)

Most affected cellular_components :
	 ribosomal subunit
	 large ribosomal subunit
	 ribosome
	 mitochondrial protein complex
	 inner mitochondrial membrane protein complex
	 oxidoreductase complex
	 ribonucleoprotein complex
	 spliceosomal complex
	 ficolin-1-rich granule lumen
	 chromosome, telomeric region
	 nuclear periphery
	 ficolin-1-rich granule
	 mitochondrial inner membrane
	 transferase complex, transferring phosphorus-containing groups
	 RNA polymerase II transcription factor complex
	 catalytic complex
	 organelle inner membrane
	 nuclear transcription factor complex
	 cytoplasmic ribonucleoprotein granule
	 mitochondrial matrix


Now, because we know how harmful the virus is to the upper lobe of the left lung, let's re-rank the tissues by the expression of all the neighbors of these proteins in that tissue:

In [10]:
undirected_graph.classify_tissue_by_node_expression(list(sub_graph.nodes()), limit=20)

upper lobe of left lung  :  0.43498759930669206
omental fat pad  :  0.43317966118194534
esophagus mucosa  :  0.4326616978767176
small intestine Peyer's patch  :  0.43108744994832554
thoracic aorta  :  0.43044631099081176
transverse colon  :  0.4301820969029879
lower esophagus  :  0.43003890423044555
body of stomach  :  0.4299655439421258
right atrium auricular region  :  0.4298833265102351
left coronary artery  :  0.42981789183765
ascending aorta  :  0.42928104024753183
left adrenal gland  :  0.42915422204810466
skin of leg  :  0.42906254176648223
minor salivary gland  :  0.4289113925618228
skin of abdomen  :  0.428841720257978
spleen  :  0.4286693639398014
lower esophagus muscularis layer  :  0.42853205305271114
esophagogastric junction muscularis propria  :  0.4282778460379169
multi-cellular organism  :  0.42810372296222426
muscle layer of sigmoid colon  :  0.4280449919459946


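Interestingly, the omental fat pad ranks second, which is consistent with obesity being a risk factor for complications. Cardiovascular tissues (thoracic aorta, right atrium auricular region, left coronary artery, ascending aorta) also appear in the top 20, and cardiac involvement is a known complication of the virus as well.

Let's repeat the ranking with the second-degree neighbors (diameter=2):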

In [11]:
sub_graph = undirected_graph.sub_graph_from_node_propagation(human_interacting_proteins,
                                                diameter=2,
                                                tissue="upper lobe of left lung",
                                                score_threshold = 0.99,
                                                expression_threshold = 0.0)
print("number proteins degree 2 :", len(sub_graph.nodes()))
undirected_graph.classify_tissue_by_node_expression(list(sub_graph.nodes()), limit=20)

number proteins degree 2 : 4744
upper lobe of left lung  :  0.42437042600971736
esophagus mucosa  :  0.4237791288561807
omental fat pad  :  0.42273617334245966
lower esophagus  :  0.42169340963607194
skin of leg  :  0.4215933015822669
body of stomach  :  0.42148918274395075
thoracic aorta  :  0.42136566687676325
skin of abdomen  :  0.42134532869033153
transverse colon  :  0.42127924295549196
right atrium auricular region  :  0.4210191761071136
left adrenal gland  :  0.42101194714602097
small intestine Peyer's patch  :  0.4209552976071204
lower esophagus muscularis layer  :  0.42054608158159784
esophagogastric junction muscularis propria  :  0.4202927177146465
muscle layer of sigmoid colon  :  0.4202878803171211
ascending aorta  :  0.4201601907081515
left coronary artery  :  0.4199956857924411
minor salivary gland  :  0.4198715230772634
right adrenal gland  :  0.41957741093143996
left adrenal gland cortex  :  0.41904059570122215
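
To compare the diameter=1 and diameter=2 rankings without eyeballing the two lists, the printed tissue names can be diffed directly. The sketch below copies the names, in order, from the two outputs above; the helper `rank_shifts` is a hypothetical name introduced for this example.

```python
# Tissue names copied, in order, from the diameter=1 output above.
diameter_1 = [
    "upper lobe of left lung", "omental fat pad", "esophagus mucosa",
    "small intestine Peyer's patch", "thoracic aorta", "transverse colon",
    "lower esophagus", "body of stomach", "right atrium auricular region",
    "left coronary artery", "ascending aorta", "left adrenal gland",
    "skin of leg", "minor salivary gland", "skin of abdomen", "spleen",
    "lower esophagus muscularis layer",
    "esophagogastric junction muscularis propria",
    "multi-cellular organism", "muscle layer of sigmoid colon",
]
# Tissue names copied, in order, from the diameter=2 output above.
diameter_2 = [
    "upper lobe of left lung", "esophagus mucosa", "omental fat pad",
    "lower esophagus", "skin of leg", "body of stomach", "thoracic aorta",
    "skin of abdomen", "transverse colon", "right atrium auricular region",
    "left adrenal gland", "small intestine Peyer's patch",
    "lower esophagus muscularis layer",
    "esophagogastric junction muscularis propria",
    "muscle layer of sigmoid colon", "ascending aorta",
    "left coronary artery", "minor salivary gland", "right adrenal gland",
    "left adrenal gland cortex",
]

def rank_shifts(before, after):
    """Map each tissue present in both lists to (rank before, rank after), 1-based."""
    position_after = {tissue: i for i, tissue in enumerate(after, start=1)}
    return {
        tissue: (i, position_after[tissue])
        for i, tissue in enumerate(before, start=1)
        if tissue in position_after
    }

shifts = rank_shifts(diameter_1, diameter_2)
dropped = [t for t in diameter_1 if t not in diameter_2]
added = [t for t in diameter_2 if t not in diameter_1]

print("shared tissues:", len(shifts))
print("only in diameter=1:", dropped)
print("only in diameter=2:", added)
print("omental fat pad rank (before, after):", shifts["omental fat pad"])
```

Eighteen of the twenty tissues are shared between the two rankings, and the upper lobe of the left lung stays in first place in both.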
