# Infer Dark Ion Channel Function from Clustering
This notebook infers the function of 'dark' ion channels (ICs) by hierarchically clustering them together with ICs that have known functions. We cluster ICs based on their expression in the CCLE. We begin by importing the required libraries and defining a few helper functions for later use.

In [12]:
import pandas as pd
from clustergrammer_widget import *
from scipy.spatial.distance import pdist, squareform
net = Network()

In [13]:
gene_info = net.load_json_to_dict('../grant_pois/gene_info_with_dark.json')
net.load_file('../hzome_data/my_CCLE_exp.txt')
ccle = net.export_df()

In [14]:
def add_cats(gene_class, genes):
    ''' Add categories to list of genes '''
    gene_title = class_titles[gene_class]
    genes_cat = []
    for inst_gene in genes:
        # row/column label, e.g. 'Ion Channels: KCNA1'
        inst_name = gene_title + ': ' + inst_gene
        if inst_gene in gene_info[gene_class]['dark']:
            inst_cat = 'Dark Gene: True'
        else:
            inst_cat = 'Dark Gene: False'
        genes_cat.append((inst_name, inst_cat))
    return genes_cat

def filter_genes(ccle, gene_class):
    ''' Filter DataFrame to selected gene class '''
    ccle = ccle.transpose()
    all_genes = ccle.columns.tolist()
    all_gene_class = gene_info[gene_class]['all']
    found_genes = sorted(list(set(all_genes).intersection(all_gene_class)))
    ccle_filt = ccle[found_genes]
    ccle_filt = ccle_filt.transpose()
    return ccle_filt

class_titles = {}
class_titles['KIN'] = 'Kinases'
class_titles['IC'] = 'Ion Channels'
class_titles['GPCR'] = 'GPCRs'

### Filter for ICs
We will first filter the CCLE gene expression data for ICs. Then we will calculate an IC-IC similarity matrix based on CCLE expression and visualize it using Clustergrammer.

In [15]:
ccle_ic = filter_genes(ccle, 'IC')
net.load_df(ccle_ic)
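
As a minimal sketch of the filtering step, the same row selection can be written without the double transpose. The DataFrame `toy_expr`, the list `toy_class_genes`, and all values below are hypothetical stand-ins for `ccle` and `gene_info['IC']['all']`, not the real data.

```python
import pandas as pd

# Hypothetical toy expression data: rows are genes, columns are cell lines
toy_expr = pd.DataFrame(
    {'line_1': [5.0, 1.0, 3.0, 0.5], 'line_2': [4.0, 2.0, 2.5, 0.1]},
    index=['KCNA1', 'EGFR', 'SCN1A', 'TP53'],
)

# Hypothetical gene class list; CACNA1A is deliberately absent from toy_expr
toy_class_genes = ['SCN1A', 'KCNA1', 'CACNA1A']

# Keep only the genes found in both the data and the gene class, sorted by name
found_genes = sorted(set(toy_expr.index).intersection(toy_class_genes))
toy_filt = toy_expr.loc[found_genes]
print(toy_filt)
```

Genes that are in the class list but missing from the expression data are dropped silently, which mirrors the intersection used in `filter_genes`.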

### Calculate Similarity Matrix

In [16]:
net.normalize(axis='row', norm_type='zscore', keep_orig=False)
ccle_ic = net.export_df()

In [17]:
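# pairwise cosine distance between the z-scored expression profiles of ICs (rows)
inst_dm = pdist(ccle_ic, metric='cosine')
inst_dm = squareform(inst_dm)
# convert distance to similarity; inst_dm holds similarities from here on
inst_dm = 1 - inst_dm
# set similarities whose absolute value is below the cutoff to zero
sim_cutoff = 0.15
inst_dm[abs(inst_dm) < sim_cutoff] = 0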
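
The normalization and similarity steps above can be sketched on a toy matrix. This is an illustration under stated assumptions, not the notebook's actual data: the gene names and values are made up, and the z-score is computed directly in pandas rather than through `net.normalize`. Cosine similarity does not depend on the scale of each row, so the exact standard deviation convention does not change the result.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Toy expression matrix (made-up values): rows are genes, columns are cell lines
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.5],
     [4.0, 3.0, 2.0, 1.0],
     [1.0, 3.0, 2.0, 3.5]],
    index=['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'],
    columns=['line_1', 'line_2', 'line_3', 'line_4'],
)

# Z-score each gene (row) across cell lines (assumption: ddof=0)
toy_z = toy.sub(toy.mean(axis=1), axis=0).div(toy.std(axis=1, ddof=0), axis=0)

# Cosine distance between rows, converted to a square similarity matrix
toy_sim = 1 - squareform(pdist(toy_z, metric='cosine'))

# Zero out similarities whose absolute value is below the cutoff
sim_cutoff = 0.15
toy_sim_cut = toy_sim.copy()
toy_sim_cut[np.abs(toy_sim_cut) < sim_cutoff] = 0

toy_sim_df = pd.DataFrame(toy_sim_cut, index=toy.index, columns=toy.index)
print(toy_sim_df.round(2))
```

Because each row is centered before the cosine step, the unthresholded similarity is the Pearson correlation between gene expression profiles, so `toy_sim` should match `np.corrcoef(toy.values)`.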

### Visualize Similarity Matrix

In [19]:
genes = ccle_ic.index.tolist()
genes_cat = add_cats('IC', genes)
df_dm = pd.DataFrame(data=inst_dm, columns=genes_cat, index=genes_cat)
net.load_df(df_dm)
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

We see about six well-defined clusters of ICs with similar expression across the CCLE. I have included Enrichr links for each of the six clusters. Enrichment for terms from the KEGG 2016 and WikiPathways 2016 libraries appeared to show distinct functions for some of these clusters.

## Enrichr Links for Clusters identified from IC-CCLE Clustering
* 1 https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1botj
* 2 https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1botl
* 3 https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1botm
* 4 https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1bq99
* 5 https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1botn
* 6 https://amp.pharm.mssm.edu/Enrichr/enrich?dataset=1boto

### Cluster 1 Cardiac Pathway Enrichment
For instance, Cluster-1 showed enrichment for cardiac and muscle contraction pathways (see link 1 and image below).

![IC_CCLE_Cluster-1_Pathways_Cardiac](IC_CCLE_cluster_1_pathways_cardiac.png)

Cluster-3 showed enrichment for addiction-related pathways (e.g., Nicotine addiction, Serotonergic/Dopaminergic synapse function, and Hypothetical Network for Drug addiction; see link 3 and image below).

![IC_CCLE_Cluster-3_Pathways_Addiction](IC_CCLE_cluster_3_pathways_addiction.png)


We are going to focus on Cluster-3, which appears to be associated with addiction. Below we have extracted the names of the ICs in this cluster (using the interactive dendrogram):

In [11]:
clust_names = ["ANO5", "BEST3", "CACNA1A", "CACNA1B", "CACNA1D", "CACNA1E", "CACNA2D1", "CACNA2D2", "CACNB2", "CACNB3", "CHRNA1", "CHRNA3", "CLCN5", "CNGA3", "GABRA5", "GABRB2", "GABRB3", "GABRG2", "GRIA1", "GRIA2", "GRIA4", "GRIK3", "GRIN3A", "HCN1", "HCN3", "HTR3A", "KCNA1", "KCNB1", "KCNB2", "KCNC1", "KCND2", "KCNG3", "KCNH2", "KCNH5", "KCNH6", "KCNH7", "KCNH8", "KCNIP4", "KCNJ11", "KCNJ3", "KCNJ6", "KCNK10", "KCNK3", "KCNT2", "SCN1A", "SCN2A", "SCN3A", "SCN3B", "SCN8A"]
clust_names_cat = add_cats('IC', clust_names)

df_dm = df_dm[clust_names_cat]
df_dm = df_dm.transpose()
df_dm = df_dm[clust_names_cat]

net.load_df(df_dm)
net.make_clust(views=[])
clustergrammer_widget(network=net.widget())

Cluster-3 above includes several dark ICs (shown with the orange category). Based on co-expression and enrichment analysis, we can hypothesize that these dark genes may be involved in addiction.
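
The step that restricts the similarity matrix to the Cluster-3 genes selects the same labels on both rows and columns. A minimal sketch of that operation, assuming plain string labels and made-up similarity values (the names `toy_sim` and `toy_cluster` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric similarity matrix with plain string labels
labels = ['g1', 'g2', 'g3', 'g4']
toy_sim = pd.DataFrame(
    np.array([[1.0, 0.8, 0.0, 0.3],
              [0.8, 1.0, 0.2, 0.0],
              [0.0, 0.2, 1.0, 0.6],
              [0.3, 0.0, 0.6, 1.0]]),
    index=labels,
    columns=labels,
)

# Genes picked out of one cluster (hypothetical)
toy_cluster = ['g3', 'g4', 'g1']

# Select the same labels on both axes so the result stays square and symmetric
toy_sub = toy_sim.loc[toy_cluster, toy_cluster]
print(toy_sub)
```

Selecting rows and columns in one `.loc` call gives the same square sub-matrix as subsetting columns, transposing, and subsetting columns again.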

# Visualizing Cluster-3 Addiction-Related IC Expression Data
The similarity matrix above shows us that these ICs are co-expressed, but we are probably also interested in the specific cell lines and tissue types in which these genes are commonly expressed at high or low levels. Below we show the 100 cell lines with the highest or lowest expression of these genes.

In [34]:
# select the Cluster-3 genes (rows) from the z-scored IC expression data
ccle_ic_cluster = ccle_ic.loc[clust_names]

In [35]:
tmp_rows = ccle_ic_cluster.index.tolist()
tmp_rows = add_cats(gene_class='IC', genes=tmp_rows)
ccle_ic_cluster.index = tmp_rows

net.load_df(ccle_ic_cluster)
net.filter_N_top('col', 100, rank_type='sum')
net.make_clust()
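
The column filtering step can be sketched in plain pandas. This assumes that ranking by `'sum'` orders columns by the absolute value of their column sum, which would be consistent with keeping cell lines with either the highest or the lowest expression of z-scored data. That reading is an assumption about the filter's behavior, and `top_n_columns_by_abs_sum` is a hypothetical helper, not part of Clustergrammer.

```python
import pandas as pd

def top_n_columns_by_abs_sum(df, n):
    """Hypothetical helper: keep the n columns with the largest |column sum|."""
    ranked = df.sum(axis=0).abs().sort_values(ascending=False)
    return df[ranked.index[:n]]

# Toy z-scored expression (made-up values): rows are genes, columns are cell lines
toy_z = pd.DataFrame(
    {'line_hi': [2.0, 1.5, 1.0],
     'line_mid': [0.1, -0.2, 0.0],
     'line_lo': [-1.8, -1.2, -0.9],
     'line_mixed': [1.0, -1.0, 0.1]},
    index=['g1', 'g2', 'g3'],
)

# Both the strongly positive and the strongly negative column are kept
toy_top = top_n_columns_by_abs_sum(toy_z, 2)
print(toy_top)
```

Under this assumption, a cell line where the genes are uniformly low ranks as highly as one where they are uniformly high, while a cell line with mixed values ranks low.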

In [36]:
clustergrammer_widget(network=net.widget())

From the cell line categories (column categories) we can see that these genes are highly expressed mainly in lung and autonomic ganglia cell lines. There are also a few central nervous system cancer cell lines in which these genes are highly expressed. This seems to agree broadly with the enrichment analysis results.

The genes are also expressed at a low level in several cell lines, including lung cell lines. Among the lung cell lines, those with the sub-histology small-cell-carcinoma appear to have high expression of these genes, while those with the sub-histology adenocarcinoma have low expression. Small-cell carcinomas are thought to have some characteristics of neuronal cells ([link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2361510/)), which might explain why they have high expression of these likely neuronal ICs.