# Aging network exploration
This notebook explores the STRING database and the Gene2Vec embedding dataset. The primary aim is to find genes related to known longevity genes (LGs) using protein-protein interaction network information. The rationale is that gene networks describe how genes interact, which helps characterize gene function and identify intervention points by finding the genes that influence known targets. For example, if gene A is known to play a key role in aging when underexpressed, and the network reveals that gene B activates gene A, interventions can be investigated for both genes.
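
To make the idea concrete, here is a minimal sketch that collects the direct network neighbors of a set of seed genes from an undirected edge list. The gene names, the toy edge list, and the helper `neighbors_of_seeds` are hypothetical placeholders for illustration and are not taken from STRING.

```python
from collections import defaultdict


def neighbors_of_seeds(edges, seeds):
    """Return {seed: set of direct neighbors} for an undirected edge list."""
    adjacency = defaultdict(set)
    for a, b in edges:
        # Store each edge in both directions because the network is undirected
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Seeds that are absent from the network get an empty neighbor set
    return {seed: set(adjacency.get(seed, set())) for seed in seeds}


# Hypothetical toy network: GENE_A stands in for a known longevity gene
toy_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_A", "GENE_D")]
toy_seeds = {"GENE_A"}

candidates = neighbors_of_seeds(toy_edges, toy_seeds)
print(candidates)
```

An undirected neighbor list like this only says that two genes are linked. Whether a neighbor activates or inhibits the seed gene has to come from additional evidence.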

#### Data
**STRING** data used in this notebook comes from the STRING *Homo sapiens* dataset (v11.5).

**Gene2Vec** data consists of the embeddings created by Du et al. (2019) from 984 GEO datasets containing gene co-expression information. These embeddings will be used to help interpret the network results.
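
One common way to compare two gene embeddings is cosine similarity. The sketch below shows the computation on made-up three-dimensional vectors. The gene names and values are hypothetical, and this is not necessarily the comparison used later in this notebook.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, far shorter than real gene embeddings
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 1.0],
    "GENE_B": [2.0, 0.0, 2.0],  # same direction as GENE_A
    "GENE_C": [0.0, 1.0, 0.0],  # orthogonal to GENE_A
}

sim_ab = cosine_similarity(toy_embeddings["GENE_A"], toy_embeddings["GENE_B"])
sim_ac = cosine_similarity(toy_embeddings["GENE_A"], toy_embeddings["GENE_C"])
print(sim_ab, sim_ac)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, regardless of vector length.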

### Dataset import
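We use pandas to load three STRING v11.5 files for *Homo sapiens* (NCBI taxon 9606): the full protein links, the protein aliases, and the protein info table.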

In [9]:
import pandas as pd

# Directory holding the STRING v11.5 downloads
STRING_DIR = '/mnt/e/projects/aging_target_discovery/data/STRING'

# Protein-protein interaction network with per-channel evidence scores
string_human_network = pd.read_csv(f'{STRING_DIR}/9606.protein.links.full.v11.5.txt', sep=' ')
# Extensive list of aliases for each STRING protein
string_aliases = pd.read_csv(f'{STRING_DIR}/9606.protein.aliases.v11.5.txt', sep='\t')
# Detailed information for every protein, including its preferred name
string_protein_info = pd.read_csv(f'{STRING_DIR}/9606.protein.info.v11.5.txt', sep='\t')


# Map STRING protein IDs to preferred names, and build the reverse lookup
string_hgenes = string_protein_info['#string_protein_id'].to_list()
string_pref_name = string_protein_info['preferred_name'].to_list()
STRING_HGENE_PNAME = dict(zip(string_hgenes, string_pref_name))
STRING_PNAME_HGENE = {p_name: s_name for s_name, p_name in STRING_HGENE_PNAME.items()}

19566
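
One caveat when inverting a dictionary as in the cell above: if two STRING IDs shared the same preferred name, the later entry would silently overwrite the earlier one in the reverse lookup. Whether that happens in the v11.5 file is not checked here. A minimal sketch of such a check follows, using a hypothetical helper `find_duplicate_values` and made-up IDs and names.

```python
from collections import Counter


def find_duplicate_values(mapping):
    """Return {value: count} for every value that appears more than once."""
    counts = Counter(mapping.values())
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical toy mapping in which two IDs share one preferred name
toy_id_to_name = {
    "9606.ENSP_TOY_1": "NAME_X",
    "9606.ENSP_TOY_2": "NAME_Y",
    "9606.ENSP_TOY_3": "NAME_X",
}

duplicates = find_duplicate_values(toy_id_to_name)
toy_name_to_id = {name: pid for pid, name in toy_id_to_name.items()}

print(duplicates)
print(len(toy_id_to_name), len(toy_name_to_id))
```

In the notebook, the same check would be `find_duplicate_values(STRING_HGENE_PNAME)`. An empty result means the reverse dictionary loses no entries.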

In [2]:
string_human_network

| | protein1 | protein2 | neighborhood | neighborhood_transferred | fusion | cooccurence | homology | coexpression | coexpression_transferred | experiments | experiments_transferred | database | database_transferred | textmining | textmining_transferred | combined_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9606.ENSP00000000233 | 9606.ENSP00000379496 | 0 | 0 | 0 | 0 | 0 | 0 | 54 | 0 | 0 | 0 | 0 | 103 | 85 | 155 |
| 1 | 9606.ENSP00000000233 | 9606.ENSP00000314067 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 180 | 0 | 0 | 0 | 61 | 197 |
| 2 | 9606.ENSP00000000233 | 9606.ENSP00000263116 | 0 | 0 | 0 | 0 | 0 | 0 | 62 | 0 | 152 | 0 | 0 | 0 | 101 | 222 |
| 3 | 9606.ENSP00000000233 | 9606.ENSP00000361263 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 161 | 0 | 0 | 47 | 58 | 181 |
| 4 | 9606.ENSP00000000233 | 9606.ENSP00000409666 | 0 | 0 | 0 | 0 | 0 | 60 | 63 | 0 | 213 | 0 | 0 | 0 | 72 | 270 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11938493 | 9606.ENSP00000485678 | 9606.ENSP00000354800 | 0 | 0 | 0 | 0 | 872 | 213 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 213 |
| 11938494 | 9606.ENSP00000485678 | 9606.ENSP00000308270 | 0 | 0 | 0 | 0 | 899 | 152 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 151 |
| 11938495 | 9606.ENSP00000485678 | 9606.ENSP00000335660 | 0 | 0 | 0 | 0 | 0 | 182 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 181 |
| 11938496 | 9606.ENSP00000485678 | 9606.ENSP00000300127 | 0 | 0 | 0 | 0 | 843 | 155 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 154 |
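
The scores in the STRING flat files are integers scaled by 1000, so a `combined_score` of 155 corresponds to a confidence of 0.155. A typical next step is to keep only edges above a chosen confidence cutoff. The sketch below shows this on a toy frame built from the first five rows shown above. The cutoff `MIN_SCORE = 200` is an arbitrary value picked so the toy example keeps some rows, not a recommendation.

```python
import pandas as pd

# Toy frame built from the first five rows of the output above (three columns only)
toy_network = pd.DataFrame({
    "protein1": ["9606.ENSP00000000233"] * 5,
    "protein2": [
        "9606.ENSP00000379496",
        "9606.ENSP00000314067",
        "9606.ENSP00000263116",
        "9606.ENSP00000361263",
        "9606.ENSP00000409666",
    ],
    "combined_score": [155, 197, 222, 181, 270],
})

MIN_SCORE = 200  # assumption: arbitrary cutoff chosen for illustration

# Keep edges at or above the cutoff; copy so a new column can be added safely
filtered = toy_network[toy_network["combined_score"] >= MIN_SCORE].copy()
# Convert the scaled integer score back to a 0-1 confidence value
filtered["confidence"] = filtered["combined_score"] / 1000

print(filtered)
```

Applied to the full table, the same pattern would use `string_human_network` in place of `toy_network`, with a cutoff chosen to suit the analysis.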
