# Infer Dark GPCR Function from Clustering
This notebook infers the function of 'dark' GPCRs by hierarchically clustering them with GPCRs that have known functions. We cluster GPCRs based on their expression in the CCLE (Cancer Cell Line Encyclopedia). We begin by importing the required libraries and defining a few helper functions for later use.

In [7]:
import pandas as pd
from clustergrammer_widget import *
from scipy.spatial.distance import pdist, squareform
from copy import deepcopy
net = Network()

In [8]:
gene_info = net.load_json_to_dict('../grant_pois/gene_info_with_dark.json')
net.load_file('../hzome_data/my_CCLE_exp.txt')
ccle = net.export_df()

In [9]:
def add_cats(gene_class, genes):
    ''' Add categories to list of genes '''
    gene_title = class_titles[gene_class]
    genes_cat = []
    for inst_gene in genes:
        inst_name = gene_title + ': ' + inst_gene
        # Label each gene as dark or not, based on gene_info
        if inst_gene in gene_info[gene_class]['dark']:
            inst_cat = 'Dark Gene: True'
        else:
            inst_cat = 'Dark Gene: False'
        genes_cat.append((inst_name, inst_cat))
    return genes_cat

def filter_genes(ccle, gene_class):
    ''' Filter DataFrame to selected gene class '''
    ccle = ccle.transpose()
    all_genes = ccle.columns.tolist()
    all_gene_class = gene_info[gene_class]['all']
    found_genes = sorted(list(set(all_genes).intersection(all_gene_class)))
    ccle_filt = ccle[found_genes]
    ccle_filt = ccle_filt.transpose()
    return ccle_filt

class_titles = {}
class_titles['KIN'] = 'Kinases'
class_titles['IC'] = 'Ion Channels'
class_titles['GPCR'] = 'GPCRs'
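
As a quick illustration of what `filter_genes` and `add_cats` do, here is a minimal sketch on a made-up table. The `toy_info` dictionary, the expression values, and the cell line names are hypothetical stand-ins for `gene_info` and the CCLE data; only the `'all'` and `'dark'` keys mirror the structure used above.

```python
import pandas as pd

# Hypothetical stand-in for gene_info: 'all' lists class members, 'dark' the understudied ones
toy_info = {'GPCR': {'all': ['CCR1', 'GPR82', 'TSHR'], 'dark': ['GPR82']}}

# Toy expression table: genes as rows, cell lines as columns (values are made up)
toy_exp = pd.DataFrame(
    [[1.0, 2.0], [0.5, 0.1], [3.0, 4.0], [2.0, 2.5]],
    index=['CCR1', 'EGFR', 'GPR82', 'TSHR'],
    columns=['cell_line_A', 'cell_line_B'],
)

# Keep only rows whose gene belongs to the class
found = sorted(set(toy_exp.index).intersection(toy_info['GPCR']['all']))
toy_filt = toy_exp.loc[found]

# Build (name, category) tuples in the same form that add_cats returns
toy_cats = [
    ('GPCRs: ' + gene, 'Dark Gene: ' + str(gene in toy_info['GPCR']['dark']))
    for gene in found
]
print(toy_cats)
```

The tuple form matters because Clustergrammer reads the second element of each row or column label as a category, which is what produces the dark/non-dark color bar in the widgets below.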

### Calculate GPCR Similarity Matrix based on CCLE Expression
We will first filter the CCLE gene expression data for GPCRs. Then we will calculate a GPCR-GPCR similarity matrix based on CCLE expression and visualize it using Clustergrammer.

In [10]:
ccle_gpcr = filter_genes(ccle, 'GPCR')
net.load_df(ccle_gpcr)

Calculate Similarity Matrix

In [11]:
net.normalize(axis='row', norm_type='zscore', keep_orig=False)
ccle_gpcr = net.export_df()
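
For readers unfamiliar with row-wise Z-scoring, the following sketch shows the operation in plain pandas on a toy table. It is an assumption that this matches `net.normalize` in every detail (for example, in its choice of standard-deviation estimator); the data are made up.

```python
import pandas as pd

toy = pd.DataFrame(
    [[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]],
    index=['gene_a', 'gene_b'],
    columns=['line_1', 'line_2', 'line_3'],
)

# Subtract each row's mean and divide by its standard deviation.
# ddof=0 (population std) is an assumption; Clustergrammer may use a different convention.
row_mean = toy.mean(axis=1)
row_std = toy.std(axis=1, ddof=0)
toy_z = toy.sub(row_mean, axis=0).div(row_std, axis=0)
print(toy_z)
```

After this step every gene has mean zero and unit variance across cell lines, so genes with very different absolute expression levels become comparable.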

In [12]:
inst_dm = pdist(ccle_gpcr, metric='cosine')
inst_dm = squareform(inst_dm)
inst_dm = 1 - inst_dm
sim_cutoff = 0.15
inst_dm[ abs(inst_dm) < sim_cutoff] = 0
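
The same three steps (cosine distance, conversion to similarity, thresholding) can be checked on a small made-up array. The vectors below are hypothetical and chosen so that one pair is perfectly similar and another falls under the cutoff.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three toy row vectors: the first two point the same way,
# the third is nearly orthogonal to both
toy = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 0.0, -0.9],
])

# pdist returns condensed cosine distances; squareform expands them to a square matrix
toy_sim = 1 - squareform(pdist(toy, metric='cosine'))

# Zero out weak similarities, as done with sim_cutoff above
cutoff = 0.15
toy_sim[np.abs(toy_sim) < cutoff] = 0
print(toy_sim)
```

Because the rows were mean-centered by the Z-score step, cosine similarity on them is equivalent to Pearson correlation between expression profiles.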

In [13]:
genes = ccle_gpcr.index.tolist()
genes_cat = add_cats('GPCR', genes)
df_dm = pd.DataFrame(data=inst_dm, columns=genes_cat, index=genes_cat)
df_dm_copy = deepcopy(df_dm)
net.load_df(df_dm)
net.make_clust(views=[])

## Visualize GPCR Similarity Matrix based on CCLE Expression

In [14]:
clustergrammer_widget(network=net.widget())

We see about four well-defined clusters (at level 4 of the dendrogram) of GPCRs with similar expression across the CCLE. I have included Enrichr links for each of the clusters.
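
The clusters here were picked interactively from the Clustergrammer dendrogram. As a rough programmatic analogue, the sketch below cuts a hierarchical tree into four flat clusters with SciPy. The toy data, the average-linkage method, and the `maxclust` cut are all assumptions made for illustration and are not guaranteed to reproduce Clustergrammer's dendrogram levels.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Toy data: four groups of five "genes", each group sharing a distinct expression pattern
patterns = np.eye(4).repeat(5, axis=0)  # 20 x 4 block pattern
toy = patterns * 5 + rng.normal(scale=0.1, size=patterns.shape)

# Average-linkage clustering on cosine distance (assumed settings)
tree = linkage(toy, method='average', metric='cosine')

# Cut the tree so that at most four flat clusters remain
labels = fcluster(tree, t=4, criterion='maxclust')
print(labels)
```

Each entry of `labels` is the cluster assignment of the corresponding row, so gene lists per cluster can be collected and submitted for enrichment analysis.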

### Enrichr Links for Four Clusters identified from GPCR-CCLE Clustering
Level 4 Groups
* Cluster 1: https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1c2hg
* Cluster 2: https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1c2hh
* Cluster 3: https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1c2hi
* Cluster 4: https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1c2hj

Enrichment for terms from the KEGG 2016 and Wikipathways 2016 libraries appeared to show a distinct function for the third cluster.


### Cluster 3: Immune Pathway Enrichment
Cluster-3 showed enrichment for immune-related pathways (e.g. chemokine-cytokine signaling) using the KEGG and Panther libraries. See the Cluster-3 link above and the enrichment results below:

![GPCR_CCLE_cluster_3_KEGG_chemokine](GPCR_CCLE_cluster_3_KEGG_chemokine.png)
![GPCR_CCLE_cluster_3_Panther_chemokine](GPCR_CCLE_cluster_3_Panther_chemokine.png)

Below, we will look more closely into this cluster.

# Cluster 3: Immune Pathway
Here, we focus on Cluster-3, which is enriched for immune-related pathways. Below we have extracted the names of the GPCRs in Cluster-3 (using the interactive dendrogram):

In [16]:
clust_names = ["ADORA2A", "C3AR1", "CCR1", "CCR10", "CCR2", "CCR3", "CCR5", "CCR6", "CCR7", "CCR9", "CNR1", "CX3CR1", "CXCR2", "CXCR4", "CYSLTR1", "GPR132", "GPR15", "GPR171", "GPR174", "GPR18", "GPR183", "GPR34", "GPR63", "GPR65", "GPR82", "GPR83", "GPR84", "GRM3", "LPAR4", "LTB4R", "OR8B2", "P2RY10", "P2RY12", "P2RY13", "P2RY8", "PTGER4", "RRH", "S1PR4", "SUCNR1", "TAS2R10", "TAS2R14", "TSHR"]
clust_names_cat = add_cats('GPCR', clust_names)

df_dm = deepcopy(df_dm_copy)

df_dm = df_dm[clust_names_cat]
df_dm = df_dm.transpose()
df_dm = df_dm[clust_names_cat]

net.load_df(df_dm)
net.make_clust(views=[])
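
The select-columns, transpose, select-columns pattern above extracts a square sub-matrix. On a toy symmetric matrix with plain string labels (hypothetical names and values; the notebook's labels are tuples), the same subset can be taken in one step with `.loc`:

```python
import pandas as pd

names = ['g1', 'g2', 'g3', 'g4']
toy_sim = pd.DataFrame(
    [[1.0, 0.8, 0.0, 0.2],
     [0.8, 1.0, 0.3, 0.0],
     [0.0, 0.3, 1.0, 0.5],
     [0.2, 0.0, 0.5, 1.0]],
    index=names, columns=names,
)
keep = ['g1', 'g3', 'g4']

# Two-step pattern: select columns, transpose, select columns again
sub_a = toy_sim[keep].transpose()[keep]

# One-step equivalent for a symmetric matrix
sub_b = toy_sim.loc[keep, keep]
print(sub_b)
```

The two results agree only because the similarity matrix is symmetric; for a non-symmetric matrix the two-step pattern would return the transposed sub-matrix.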

## Cluster 3: Immune Pathway Similarity Matrix

In [17]:
clustergrammer_widget(network=net.widget())

Cluster-3 above includes several dark GPCRs (shown with the orange category). Based on co-expression and enrichment analysis, we can hypothesize that these dark genes are involved in immune-related pathways.

### Get expression data
Here we show the specific cell lines and tissue types in which these genes are commonly highly or lowly expressed. Below we keep the top 200 cell lines, ranked by the sum of their expression values across these genes.

In [28]:
# Select the Cluster-3 genes (rows); ccle_gpcr itself is left unchanged
ccle_gpcr_cluster = ccle_gpcr.loc[clust_names]

In [29]:
tmp_rows = ccle_gpcr_cluster.index.tolist()
tmp_rows = add_cats(gene_class='GPCR', genes=tmp_rows)
ccle_gpcr_cluster.index = tmp_rows

net.load_df(ccle_gpcr_cluster)
net.filter_N_top('col', 200, rank_type='sum')
net.make_clust()
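
To make the column filter concrete, here is a minimal pandas sketch of keeping the top N columns by sum. Whether `filter_N_top` ranks on raw or absolute sums is not confirmed here, so the use of raw sums is an assumption, and the table is made up.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        'line_1': [1.0, 2.0],
        'line_2': [5.0, 6.0],
        'line_3': [0.1, 0.2],
        'line_4': [3.0, 3.0],
    },
    index=['gene_a', 'gene_b'],
)
n_top = 2

# Rank columns by their summed values and keep the n_top largest
top_cols = toy.sum(axis=0).nlargest(n_top).index
toy_top = toy[top_cols]
print(toy_top)
```

On Z-scored data, a raw-sum ranking favors cell lines where the selected genes are jointly above their average expression.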

## Cluster 3: Immune Pathway Expression Data

In [30]:
clustergrammer_widget(network=net.widget())

We see that the genes in this cluster are highly expressed in haematopoietic and lymphoid tissues, which agrees with the pathway enrichment analysis results above. We can also see that cell lines cluster according to their histology: lymphoid neoplasms cluster into two separate groups and haematopoietic neoplasms cluster into their own group. Genes also form three large clusters with different expression across histology. 

From this we can infer that these dark GPCRs are likely involved in the immune system. For instance, the dark GPCR SUCNR1 clusters closely with CCR1 and CCR2, which are known members of the chemokine receptor family.
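
One way to make a co-clustering observation like this checkable is to rank a gene's neighbors by similarity. The sketch below does so on a toy matrix; the helper `top_neighbors`, the gene labels, and all similarity values are hypothetical and do not reproduce the notebook's results.

```python
import pandas as pd

def top_neighbors(sim, gene, n=2):
    """Return the n genes most similar to `gene`, excluding the gene itself."""
    return sim.loc[gene].drop(gene).nlargest(n)

names = ['dark_gene', 'known_1', 'known_2', 'known_3']
toy_sim = pd.DataFrame(
    [[1.0, 0.7, 0.6, 0.0],
     [0.7, 1.0, 0.5, 0.2],
     [0.6, 0.5, 1.0, 0.0],
     [0.0, 0.2, 0.0, 1.0]],
    index=names, columns=names,
)

neighbors = top_neighbors(toy_sim, 'dark_gene')
print(neighbors)
```

Applied to a similarity matrix with plain gene-name labels, the same lookup would list the characterized GPCRs that sit closest to a given dark GPCR.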