Project Idea     
Identify disease pairs that exhibit inverse comorbidity through network analysis of disease-gene and disease-function networks.

       
Inverse Comorbidities       
Comorbidity is common in many diseases. It refers to the presence of multiple diseases or medical conditions in one patient, often leading to a worse overall disease state. However, inverse comorbidity has also been observed: people with one disease show an unexpectedly low probability of developing another. This is intriguing because it may offer insight into the pathogenesis of certain diseases and help clarify the mechanisms behind disease pathways that are still poorly understood.
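To make "unexpectedly low probability" concrete, one common way to quantify it is to compare the observed number of patients with both diseases against the number expected if the two diseases occurred independently. The sketch below is illustrative only: the function name `cooccurrence_ratio` and the patient counts are hypothetical and are not part of this project's data.

```python
def cooccurrence_ratio(n_both, n_a, n_b, n_total):
    """Observed / expected co-occurrence of diseases A and B.

    Under independence, the expected number of patients with both
    diseases is n_a * n_b / n_total. A ratio well below 1 hints at
    inverse comorbidity; a ratio well above 1 hints at comorbidity.
    """
    expected = n_a * n_b / n_total
    return n_both / expected


# Hypothetical cohort: 10,000 patients, 500 with A, 400 with B, 5 with both
ratio = cooccurrence_ratio(n_both=5, n_a=500, n_b=400, n_total=10_000)
print(ratio)
```

In this made-up cohort, 20 patients would be expected to have both diseases under independence, so observing only 5 gives a ratio below 1.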

The code below looks for inverse comorbidities by finding the pathways/functions lost in a particular genetic disorder, collecting the genes annotated with those functions, and mapping the diseases associated with those genes in the disease-gene network.
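The traversal described above can be sketched on a toy example before running it on the real tables. Everything in this block is hypothetical (the node names, the three small edge lists, and the helper `affected_diseases`); it only illustrates the disease -> function -> gene -> disease walk, not the scoring step.

```python
import networkx as nx

# Hypothetical toy data: all names are made up for illustration only
disease_function_edges = [("D_query", "F1"), ("D_query", "F2"), ("D_other", "F3")]
gene_function_edges = [("F1", "G1"), ("F1", "G2"), ("F2", "G2"), ("F3", "G3")]
disease_gene_edges = [("D_x", "G1"), ("D_y", "G2"), ("D_y", "G3"), ("D_z", "G3")]

DF = nx.Graph(disease_function_edges)  # disease-function network
GF = nx.Graph(gene_function_edges)     # gene-function network
DG = nx.Graph(disease_gene_edges)      # disease-gene network


def affected_diseases(query):
    # Step 1: functions linked to the query disease
    functions = set(DF.neighbors(query))
    # Step 2: genes annotated with any of those functions
    genes = {g for f in functions if f in GF for g in GF.neighbors(f)}
    # Step 3: diseases linked to those genes in the disease-gene network
    return {d for g in genes if g in DG for d in DG.neighbors(g)}


candidates = affected_diseases("D_query")
print(sorted(candidates))
```

Here `D_z` is not reached because its only gene, `G3`, is annotated with a function that the query disease does not touch.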




In [1]:
import pandas as pd
import networkx as nx
from networkx.algorithms import bipartite

disease_gene_data = pd.read_csv('DG-AssocMiner_miner-disease-gene-edited.tsv', sep='\t')
disease_func_data = pd.read_csv('DF-Miner_miner-disease-function.tsv', sep='\t')
gene_func_data = pd.read_csv('GF-Miner_miner-gene-function.tsv', sep='\t')
disease_data = pd.read_csv('D-MeshMiner_miner-disease.tsv', sep='\t')

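# Strip the namespace prefix (the text before ':') from the MeSH and GO identifiers.
# GO_ID values lose their prefix here; 'GO:' is added back when querying the gene-function network.
disease_func_data['# MESH_ID'] = disease_func_data['# MESH_ID'].apply(lambda x: x.split(':')[1])
disease_func_data['GO_ID'] = disease_func_data['GO_ID'].apply(lambda x: x.split(':')[1])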


In [2]:
import json
import requests as req
disease_list = disease_gene_data['Disease Name'].unique()

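def get_mesh_id(disease):
    """Look up the MeSH descriptor ID for a disease label; return None if not found."""
    url = 'id.nlm.nih.gov/mesh'
    # Pass the label via params so spaces and punctuation are URL-encoded
    res = req.get(f'http://{url}/lookup/descriptor', params={'label': disease})
    try:
        mesh_id = json.loads(res.text)[0]['resource'].split('/')[-1]
    except (ValueError, IndexError, KeyError, TypeError):
        # Non-JSON response, empty result list, or unexpected response shape
        mesh_id = None
    return mesh_id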

len(disease_list)


519

In [3]:
disease_id = 'D004314' 
DF_network = nx.from_pandas_edgelist(disease_func_data, '# MESH_ID', 'GO_ID')
GF_network = nx.from_pandas_edgelist(gene_func_data, '# GO_ID', 'Gene')

disease_func_list = list(DF_network.neighbors(disease_id))
func_gene_list = []
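for func in disease_func_list:
    func = 'GO:' + func  # restore the prefix stripped from GO_ID earlier
    if func in GF_network:  # skip GO terms absent from the gene-function network
        func_gene_list += list(GF_network.neighbors(func))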

func_gene_list = list(set(func_gene_list))
len(func_gene_list)

333

In [4]:
gene_assoc_data = pd.read_csv('gene_associations.tsv', sep='\t')
gene_assoc_data = gene_assoc_data.iloc[:, [0, 1]]
# Map the first column (gene ID) to the second column (gene name)
mapping = dict(zip(gene_assoc_data.iloc[:, 0], gene_assoc_data.iloc[:, 1]))

In [5]:
# Gene IDs missing from the mapping become NaN and are dropped on the next line
disease_gene_data['Gene_Name'] = disease_gene_data['Gene ID'].map(mapping)
disease_gene_data = disease_gene_data.dropna()
# disease_gene_data.to_csv('DG-AssocMiner_miner-disease-gene-edited.tsv', sep='\t')

In [6]:
def get_GO_IDs(disease):
    """Return the GO IDs (without the 'GO:' prefix) linked to a disease."""
    return list(DF_network.neighbors(disease))

def get_genes(GO_ID):
    """Return the genes annotated with a GO term, or [] if the term is unknown."""
    func = 'GO:' + GO_ID
    if func not in GF_network:
        return []
    return list(GF_network.neighbors(func))


In [7]:
GO_list = get_GO_IDs(disease_id)
gene_list = []
for entry in GO_list:
    gene = get_genes(entry)
    gene_list += gene

gene_list = list(set(gene_list))


In [9]:
eps = 0.01  # minimum relative change in degree centrality needed to flag a disease
DG_network = nx.from_pandas_edgelist(disease_gene_data, 'Disease Name', 'Gene_Name')
# Take disease nodes straight from the edge list instead of relying on bipartite.sets ordering
diseases = set(disease_gene_data['Disease Name'])

# Degree centrality before removal, computed once for the whole network
init_cent = nx.degree_centrality(DG_network)

# Genes tied to the lost functions that are present in the disease-gene network,
# sorted by centrality (highest first)
centralities = {gene: init_cent[gene] for gene in gene_list if gene in init_cent}
sorted_centralities = sorted(centralities.items(), key=lambda x: x[1], reverse=True)

# Remove those genes to simulate the loss of the associated functions
DG_network.remove_nodes_from(gene for gene, _ in sorted_centralities)
new_cent = nx.degree_centrality(DG_network)

# Flag diseases whose degree centrality changed by more than eps
inv_cmb_cand = {}
for disease in diseases:
    relative_change = abs(new_cent[disease] - init_cent[disease]) / init_cent[disease]
    if relative_change > eps:
        inv_cmb_cand[disease] = relative_change

display(inv_cmb_cand)
print(len(inv_cmb_cand))

{'Precursor T-Cell Lymphoblastic Leukemia-Lymphoma': 0.0256375474769399,
 "Sjogren's Syndrome": 0.05983224814613845,
 'Body Weight': 0.04762227734284149,
 'Hepatitis, Chronic': 0.1608420066097765,
 'Amnesia': 0.09502569340270009,
 'Berylliosis': 0.025637547476939818,
 'Learning Disorders': 0.08832218002049784,
 'Adenocarcinoma': 0.05033560418801859,
 'Proteinuria': 0.05533383785018706,
 'Neoplasms, Experimental': 0.025637547476939797,
 'Non-alcoholic Fatty Liver Disease': 0.0769262072707542,
 'Hereditary Breast and Ovarian Cancer Syndrome': 0.025637547476939915,
 'Dystonia': 0.025637547476939915,
 'Necrotizing Enterocolitis': 0.15535731384252008,
 'Lobar Holoprosencephaly': 0.04762227734284149,
 'Juvenile Myelomonocytic Leukemia': 0.02834337607447813,
 'ovarian neoplasm': 0.05641345632121543,
 'Hydrops Fetalis, Non-Immune': 0.025637547476939887,
 'Transient Ischemic Attack': 0.17948996201844816,
 'Male infertility': 0.13390607101947308,
 'Neutropenia': 0.211048040402354,
 'Adenoma': 0.

488
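
A note on reading the output above: `nx.degree_centrality` divides each node's degree by n - 1, where n is the number of nodes in the graph. Removing nodes therefore shifts the centrality of every remaining node, even those that lost no edges, which may be why many diseases in the output share nearly the same value (about 0.0256). The toy graph below uses hypothetical node names to sketch this effect and compares it with raw degree, which only changes for nodes that actually lose a neighbor.

```python
import networkx as nx

# Hypothetical toy bipartite graph: diseases D1..D3, genes G1..G4
G = nx.Graph([("D1", "G1"), ("D1", "G2"), ("D2", "G2"), ("D2", "G3"), ("D3", "G4")])

before_cent = nx.degree_centrality(G)  # degree / (n - 1) with n = 7
before_deg = dict(G.degree())

G.remove_node("G1")  # only D1 loses an edge

after_cent = nx.degree_centrality(G)   # degree / (n - 1) with n = 6
after_deg = dict(G.degree())

rel_cent, rel_deg = {}, {}
for d in ["D1", "D2", "D3"]:
    # Relative change in normalized degree centrality
    rel_cent[d] = abs(after_cent[d] - before_cent[d]) / before_cent[d]
    # Relative change in raw degree
    rel_deg[d] = abs(after_deg[d] - before_deg[d]) / before_deg[d]
    print(d, round(rel_cent[d], 3), round(rel_deg[d], 3))
```

In this toy graph, D2 and D3 keep all their edges, yet their degree centrality still changes by 6/5 - 1 = 0.2 purely from the normalization, while their raw degree is unchanged. If the same effect applies to the real network, comparing raw degrees (or subtracting the shared baseline shift) might separate diseases that truly lost gene links from those that did not.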
