# Objective

The objective of this analysis is to verify the conclusions of our previous analysis, which used the Stockel dataset.

# Steps

1. We'll use the microarray expression dataset obtained by Toepel et al.; link: https://jb.asm.org/content/190/11/3904
2. We'll select the genes that are part of any two-component system, together with the circadian clock genes, using the genomic annotations of Cyanothece from the study of Welsh et al.; link: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2567498/
3. We'll use scikit-learn's mutual information regression to estimate the dependence between the primary and secondary components and also between the clock genes.
4. Using a list of probable sensor-regulator pairs sorted by the mutual information value between them, we can predict which sensor is most likely to interact with a given regulator (and vice versa), and which of them are most likely to interact with the clock.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import scipy as sp
import numpy as np
import multiprocessing as mp
from sklearn.feature_selection import mutual_info_regression

# Data Preprocessing

In [2]:
df_toepel = pd.read_csv('MicroarrayData/ToepelProcessed.csv')

In [4]:
df_toepel.head()

Unnamed: 0,Contig,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10
0,Contig0.10_10_14370_15446,cce_4187,0.119984,0.574008,-0.497268,-0.946784,-0.912688,-0.555053,0.389711,0.755742,-0.552432,-0.62071,0.161136,0.618904
1,Contig0.10_11_16336_15422,cce_4186,-0.006948,0.377326,0.281436,0.096787,0.101432,-0.168668,0.030779,0.289256,0.232399,-0.285271,0.064953,0.00184
2,Contig0.10_12_16503_17552,cce_4185,0.334171,-0.366012,0.161679,-0.743794,-0.841029,-0.762609,0.36821,-0.270551,0.121346,0.252908,0.181769,0.212853
3,Contig0.10_13_17679_17981,cce_4184,0.418948,-0.146916,-0.60593,-0.099442,-0.115359,0.13087,0.194944,-0.082366,-0.290507,0.052271,0.094243,0.111787
4,Contig0.10_14_18000_18698,cce_4183,0.32485,0.036866,-0.001216,-0.144987,0.029886,0.111226,0.162255,-0.062873,0.064338,0.09079,0.173198,0.289154


A quick look at the dataframe above shows that the ORF column contains repeated entries.

In [6]:
len(df_toepel.ORF)==len(set(df_toepel.ORF))

False

We need to find the ORFs that are duplicated.

In [7]:
duplicates = df_toepel.loc[df_toepel.duplicated(subset='ORF',keep=False)].sort_values(by=['ORF'])

In [10]:
duplicates.head()

Unnamed: 0,Contig,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10
4089,Contig62.1_1_1274_207,cce_0015,0.358243,-0.130405,-0.188194,-0.260832,0.128074,0.250259,0.330951,-0.166504,0.013392,0.032084,-0.062092,-0.097793
3925,Contig6.1_2_481_873,cce_0015,0.395011,-0.25826,-0.303179,-0.419339,-0.122089,0.180624,0.424158,-0.302026,-0.04141,0.037448,-0.164989,-0.173628
4098,Contig62.2_1_424_44,cce_0027,-0.032668,0.067178,0.19846,-0.063797,-0.13241,0.091539,-0.109167,0.014529,0.125467,-0.069524,0.183824,0.212867
4088,Contig62.1_14_11081_10533,cce_0027,-0.069485,0.150048,0.193104,-0.299913,-0.566255,0.079567,-0.139078,-0.138654,0.194732,-0.063706,0.163336,0.179056
4698,Contig80.1_14_16018_15353,cce_0067,-0.078798,-0.036012,-0.018563,0.115044,0.136906,0.086466,-0.023025,-0.015671,-0.02026,-0.037982,0.069587,0.027299


We'll take the mean of the expression values of the duplicates.

In [11]:
mean_columns = list(duplicates.columns)
mean_columns.remove('Contig')
mean_columns.remove('ORF')

In [12]:
df_expression = df_toepel.groupby('ORF')[mean_columns].mean().reset_index()
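
As a minimal sketch of the duplicate handling above, the toy frame below (hypothetical ORF names and made-up values, not taken from the Toepel data) shows how `duplicated(keep=False)` flags every copy of a repeated ORF and how `groupby(...).mean()` collapses them to one row per ORF.

```python
import pandas as pd

# Toy data: 'orf_a' appears twice, mimicking a repeated ORF (values are made up).
toy = pd.DataFrame({
    'Contig': ['c1', 'c2', 'c3'],
    'ORF': ['orf_a', 'orf_a', 'orf_b'],
    'T1': [0.2, 0.4, -0.1],
    'T2': [1.0, 0.0, 0.5],
})

# keep=False marks all copies of a duplicated ORF, not just the later ones.
toy_duplicates = toy.loc[toy.duplicated(subset='ORF', keep=False)]

# Average the expression columns within each ORF, giving one row per ORF.
toy_mean = toy.groupby('ORF')[['T1', 'T2']].mean().reset_index()

print(toy_duplicates)
print(toy_mean)
```

After the aggregation the `ORF` column is unique, which is what the later `merge` on `ORF` relies on.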

Now we load the Cyanothece database with genomic annotations.

In [13]:
GenCyanoDB = pd.read_excel('GenCyanoDB.xlsx',index_col=0,usecols=[0,1,2,3])

In [14]:
GenCyanoDB.head()

Unnamed: 0,ORF,Function,CommonName
0,cce_0001,hypothetical protein,cce_0001
1,cce_0002,alcohol dehydrogenase,cce_0002
2,cce_0003,hypothetical protein,cce_0003
3,cce_0004,cation efflux system membrane protein,czcA
4,cce_0005,conserved hypothetical protein,cce_0005


First, let's find how many genes have the keyword regulator, two-component, kinase, sensor or circadian in their functional annotation. 

In [15]:
twoComponents = GenCyanoDB[GenCyanoDB['Function'].str.contains("two-component|kinase|regulator|sensor|circadian")]
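
The keyword filter relies on `str.contains` treating the pattern as a regular expression, so `|` acts as OR. Here is a minimal sketch on a hypothetical annotation table (the rows are invented for illustration). The `na=False` argument is an optional safeguard, assumed here, for rows whose annotation is missing.

```python
import pandas as pd

# Hypothetical annotations; the last entry is missing on purpose.
toy_db = pd.DataFrame({
    'ORF': ['orf_1', 'orf_2', 'orf_3', 'orf_4'],
    'Function': ['two-component response regulator',
                 'hypothetical protein',
                 'acetate kinase',
                 None],
})

pattern = "two-component|kinase|regulator|sensor|circadian"

# na=False treats a missing annotation as "no match", so the mask is purely boolean.
mask = toy_db['Function'].str.contains(pattern, na=False)
print(toy_db.loc[mask, 'ORF'].tolist())  # ['orf_1', 'orf_3']
```

Note that the keyword `kinase` also matches metabolic kinases such as acetate kinase, so the resulting table is broader than the two-component systems alone.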

In [16]:
len(twoComponents),len(twoComponents.loc[twoComponents.ORF.isin(df_toepel.ORF)])

(211, 204)

It's clear from above that we do not have expression profiles for 7 genes. Let's find which ones.

In [18]:
twoComponents.loc[~twoComponents.ORF.isin(df_toepel.ORF)]

Unnamed: 0,ORF,Function,CommonName
11,cce_0012,two-component response regulator,cce_0012
798,cce_0800,acetate kinase,ackA1
4029,cce_4034,adenylate kinase,adk
4070,cce_4075,putative ATP-NAD/AcoX kinase,cce_4075
4586,cce_4591,guanylate kinase,gmk
4654,cce_4659,two-component response regulator,cce_4659
4710,cce_4715,putative circadian clock protein,kaiB2


As in the previous analysis, expression data for these same 7 genes are missing here as well.

Next, we need to merge the two dataframes.

In [19]:
twoComponentsExp = df_expression.merge(twoComponents,on='ORF',how='inner')

First, we will check the length of the dataframe.

In [21]:
assert len(twoComponentsExp)==len(twoComponents.loc[twoComponents.ORF.isin(df_toepel.ORF)])
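
An alternative way to run the same consistency check is to let `merge` report unmatched rows itself. The sketch below uses hypothetical toy frames; `indicator=True` and `validate='one_to_one'` are standard `DataFrame.merge` options, but using them here is a suggestion rather than part of the original analysis.

```python
import pandas as pd

# Toy stand-ins for df_expression and twoComponents (values are made up).
expr = pd.DataFrame({'ORF': ['orf_1', 'orf_2'], 'T1': [0.1, -0.3]})
annot = pd.DataFrame({'ORF': ['orf_1', 'orf_2', 'orf_3'],
                      'Function': ['sensor', 'regulator', 'kinase']})

# how='right' keeps every annotated gene; indicator=True adds a '_merge' column;
# validate='one_to_one' raises if an ORF is duplicated on either side.
merged = expr.merge(annot, on='ORF', how='right',
                    indicator=True, validate='one_to_one')

# 'right_only' rows are annotated genes without an expression profile.
missing = merged.loc[merged['_merge'] == 'right_only', 'ORF'].tolist()
print(missing)  # ['orf_3']
```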

In [22]:
twoComponentsExp.head()

Unnamed: 0,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10,Function,CommonName
0,cce_0016,0.020815,0.02528,0.0359,0.010091,0.106032,0.021714,-0.047783,0.005781,0.025963,-0.010045,0.080937,0.070583,two-component sensor histidine kinase,cce_0016
1,cce_0115,0.297869,0.266111,0.055197,-0.602793,-0.45415,-0.128624,0.084665,0.146252,0.081213,-0.313614,-0.080785,0.040876,response regulator,cce_0115
2,cce_0123,-0.022049,-0.025439,-0.181873,0.103217,0.331547,0.159714,-0.03617,-0.051452,-0.372445,0.032771,0.074082,0.339362,thiamine monophosphate kinase,thiL
3,cce_0145,0.290729,-0.048554,-0.441629,-0.067762,-0.256788,0.030223,0.144654,-0.331048,-0.321781,-0.254037,-0.458068,-0.559191,putative circadian clock protein,kaiB4
4,cce_0164,0.192083,0.08388,-0.161234,-0.223011,-0.179374,0.195019,0.058844,-0.029044,0.066837,-0.054009,-0.024873,-0.001319,two-component sensor histidine kinase,cce_0164


# Mutual Information Calculation

Next, we need to build a ranked list of the interactions among the genes in the twoComponentsExp dataframe. We hope to find a primary component with a high dependence on a secondary component, and vice versa, which would support the conclusion that they belong to the same two-component regulation system. As the association metric we use mutual information, since it can capture non-linear interactions.
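
To illustrate why mutual information is preferred over a linear correlation here, the sketch below uses synthetic data (not the microarray data). `y` depends on `x` quadratically, so the Pearson correlation is near zero while the mutual information estimate is clearly larger than for an unrelated noise variable. The sample size of 500 is an assumption chosen to make the estimate stable.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500  # far more samples than the 12 time points in the microarray data

x = rng.uniform(-1, 1, size=n)
noise = rng.uniform(-1, 1, size=n)  # independent of y
y = x ** 2                          # depends on x, but not linearly

# Pearson correlation is close to zero for this symmetric, non-linear relation.
pearson = np.corrcoef(x, y)[0, 1]

# One MI estimate per feature column: column 0 is x, column 1 is the noise.
features = np.column_stack([x, noise])
mi_x, mi_noise = mutual_info_regression(
    features, y, discrete_features=False, random_state=7
)

print(f"Pearson r(x, y) = {pearson:.3f}")
print(f"MI(x; y) = {mi_x:.3f}, MI(noise; y) = {mi_noise:.3f}")
```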

In [23]:
class Interaction:
    def __init__(self,Exp_data,gene='all',mi_thresh=0):
        self.Exp_data = Exp_data
        if self.Exp_data.isnull().values.any():
            self.Exp_df = self.Exp_data.iloc[:,:-2].set_index('ORF').interpolate(method='linear',axis=1,limit_direction='both').T
        else:
            self.Exp_df = self.Exp_data.iloc[:,:-2].set_index('ORF').T
        if gene=='all':
            self.mi_dict = self._get_dict()
        else:
            self.gene_orf = gene
            self.mi_list = self._miscorelist(self.gene_orf)
            self.mi_thresh = mi_thresh
            self.df = self._get_df(self.mi_list,self.mi_thresh)
           
    
    def _get_dict(self):
        all_genes = list(self.Exp_df.columns)
        # Use a context manager so the worker processes are released after mapping.
        with mp.Pool(mp.cpu_count()) as pool:
            results = pool.map(self._miscorelist,all_genes)
        fast_dict= dict(zip(all_genes,results))
        return fast_dict

    
    def _miscorelist(self,gene):
        all_other_genes_df = self.Exp_df.loc[:,self.Exp_df.columns!=gene]
        all_other_genes = np.array(all_other_genes_df.columns)
        this_gene_df = self.Exp_df[gene]
        mi_score = mutual_info_regression(all_other_genes_df,this_gene_df,discrete_features=False,random_state=7)
        miscore_genes = list(zip(all_other_genes,mi_score))
        sorted_miscore = sorted(miscore_genes,key = lambda x:x[1],reverse=True)
        return sorted_miscore
    
    def _get_df(self,mi_list,mi_thresh):
        # Look up annotations in the frame passed to the constructor, not a global variable.
        annotations = self.Exp_data.set_index('ORF')
        my_dict = {'orf':[],'function':[],'CommonName':[],'mi':[]}
        for orf,mi in mi_list:
            # mi_list is sorted in descending order, so stop at the first score at or below the threshold.
            if mi<=mi_thresh:
                break

            my_dict['orf'].append(orf)
            my_dict['function'].append(annotations.loc[orf,'Function'])
            my_dict['CommonName'].append(annotations.loc[orf,'CommonName'])
            my_dict['mi'].append(mi)

        return pd.DataFrame(my_dict)
    
    def get_twoComponentHybrids(self):
        return self.df.loc[self.df.function.str.contains('two-component') & self.df.function.str.contains('hybrid')]
    
    def get_twoComponentSensors(self):
        return self.df.loc[self.df.function.str.contains('two-component') & self.df.function.str.contains('sensor') & ~self.df.function.str.contains('hybrid')]
    
    def get_twoComponentRegulators(self):
        return self.df.loc[self.df.function.str.contains('two-component') & self.df.function.str.contains('regulator') & ~self.df.function.str.contains('hybrid')]

    def get_other_clock(self):
        return self.df.loc[self.df.function.str.contains('clock protein')]

In [24]:
mi = Interaction(twoComponentsExp)

# Cyanothece clock genes interaction

In [25]:
clock_genes = {i:GenCyanoDB.loc[GenCyanoDB.CommonName==i].ORF.values[0] for i in ['kaiA','kaiB1','kaiB3',
                                                                                  'kaiB4','kaiC1','kaiC2']}

## KaiA

In [26]:
kA = Interaction(twoComponentsExp,clock_genes['kaiA'],mi_thresh=0)

In [27]:
kA.get_other_clock()

Unnamed: 0,orf,function,CommonName,mi
51,cce_4716,circadian clock protein,kaiC2,0.072255
53,cce_0145,putative circadian clock protein,kaiB4,0.07034
87,cce_0422,circadian clock protein,kaiC1,0.009392


## KaiB1

In [28]:
kB1 = Interaction(twoComponentsExp,clock_genes['kaiB1'],mi_thresh=0)

In [29]:
kB1.get_other_clock()

Unnamed: 0,orf,function,CommonName,mi
80,cce_0422,circadian clock protein,kaiC1,0.092596
119,cce_0435,circadian clock protein,kaiB3,0.030393


## KaiB3

In [30]:
kB3 = Interaction(twoComponentsExp,clock_genes['kaiB3'],mi_thresh=0)

In [31]:
kB3.get_other_clock()

Unnamed: 0,orf,function,CommonName,mi
28,cce_0145,putative circadian clock protein,kaiB4,0.256484
35,cce_4716,circadian clock protein,kaiC2,0.219183
92,cce_0423,circadian clock protein,kaiB1,0.030393


## KaiB4

In [32]:
kB4 = Interaction(twoComponentsExp,clock_genes['kaiB4'],mi_thresh=0)

In [33]:
kB4.get_other_clock()

Unnamed: 0,orf,function,CommonName,mi
7,cce_0422,circadian clock protein,kaiC1,0.258369
8,cce_0435,circadian clock protein,kaiB3,0.256484
71,cce_0424,circadian clock protein,kaiA,0.07034


## KaiC1

In [34]:
kC1 = Interaction(twoComponentsExp,clock_genes['kaiC1'],mi_thresh=0)

In [35]:
kC1.get_other_clock()

Unnamed: 0,orf,function,CommonName,mi
45,cce_0145,putative circadian clock protein,kaiB4,0.258369
87,cce_0423,circadian clock protein,kaiB1,0.092596
121,cce_4716,circadian clock protein,kaiC2,0.018389
129,cce_0424,circadian clock protein,kaiA,0.009392


## KaiC2

In [36]:
kC2 = Interaction(twoComponentsExp,clock_genes['kaiC2'],mi_thresh=0)

In [37]:
kC2.get_other_clock()

Unnamed: 0,orf,function,CommonName,mi
15,cce_0435,circadian clock protein,kaiB3,0.219183
44,cce_0424,circadian clock protein,kaiA,0.072255
81,cce_0422,circadian clock protein,kaiC1,0.018389


# Finding the sensors and regulators that interact with any clock gene.

## Sensors

In [38]:
kA_sensors = kA.get_twoComponentSensors().orf.values
kB1_sensors = kB1.get_twoComponentSensors().orf.values
kB3_sensors = kB3.get_twoComponentSensors().orf.values
kB4_sensors = kB4.get_twoComponentSensors().orf.values
kC1_sensors = kC1.get_twoComponentSensors().orf.values
kC2_sensors = kC2.get_twoComponentSensors().orf.values

In [39]:
all_sensors = np.concatenate((kA_sensors,kB1_sensors,kB3_sensors,kB4_sensors,kC1_sensors,kC2_sensors))

In [40]:
sensor_set = set(all_sensors)
twoComponentsExp.loc[twoComponentsExp.ORF.isin(sensor_set)]

Unnamed: 0,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10,Function,CommonName
0,cce_0016,0.020815,0.02528,0.0359,0.010091,0.106032,0.021714,-0.047783,0.005781,0.025963,-0.010045,0.080937,0.070583,two-component sensor histidine kinase,cce_0016
4,cce_0164,0.192083,0.08388,-0.161234,-0.223011,-0.179374,0.195019,0.058844,-0.029044,0.066837,-0.054009,-0.024873,-0.001319,two-component sensor histidine kinase,cce_0164
7,cce_0220,0.031592,0.05607,-0.256886,0.885688,0.093192,0.106482,-0.177097,-0.164165,-0.237005,0.177112,-0.23706,-0.508654,two-component sensor histidine kinase,cce_0220
8,cce_0257,-0.081631,-0.054697,0.026898,0.14562,0.164885,0.028361,0.064635,0.016028,0.069793,0.060811,0.051079,0.08316,two-component sensor histidine kinase,cce_0257
10,cce_0297,-0.068525,0.123662,0.283411,-0.036259,-0.102682,0.023727,-0.12395,0.038421,0.106656,-0.108214,0.242866,0.25197,two-component sensor histidine kinase,cce_0297
41,cce_0888,-0.07112,0.134621,0.02225,0.171566,0.129925,0.056146,-0.172637,-0.00673,-0.009374,0.000966,0.187232,0.084869,two-component sensor histidine kinase,nblS
44,cce_0969,-0.038093,-0.05336,0.013366,0.326075,0.000128,0.071643,0.10019,0.046618,0.04886,0.158621,-0.050152,0.099762,two-component sensor histidine kinase,cce_0969
56,cce_1280,0.232973,-0.118961,-0.019373,0.133719,0.065886,0.000379,0.242465,-0.053988,-0.115554,0.125421,-0.092709,-0.076253,two-component sensor histidine kinase,cce_1280
63,cce_1467,-0.079229,-0.000405,0.028495,0.248306,0.192404,0.098452,0.011408,0.115528,-0.065861,-0.064349,-0.038547,0.011425,two-component sensor histidine kinase,cce_1467
65,cce_1519,0.016529,-0.1536,-0.076803,0.238766,0.51286,0.382623,0.034715,0.059536,0.293938,0.277991,-0.114741,-0.314579,two-component sensor histidine kinase,cce_1519


## Regulators

In [41]:
kA_regulators = kA.get_twoComponentRegulators().orf.values
kB1_regulators = kB1.get_twoComponentRegulators().orf.values
kB3_regulators = kB3.get_twoComponentRegulators().orf.values
kB4_regulators = kB4.get_twoComponentRegulators().orf.values
kC1_regulators = kC1.get_twoComponentRegulators().orf.values
kC2_regulators = kC2.get_twoComponentRegulators().orf.values

In [42]:
all_regulators = np.concatenate((kA_regulators,kB1_regulators,kB3_regulators,kB4_regulators,kC1_regulators,kC2_regulators))

In [43]:
regulator_set = set(all_regulators)
twoComponentsExp.loc[twoComponentsExp.ORF.isin(regulator_set)]

Unnamed: 0,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10,Function,CommonName
5,cce_0165,0.167916,-0.040029,-0.119393,0.058448,-0.007087,-0.032999,0.119196,0.012788,-0.101795,-0.000547,-0.077687,-0.137936,two-component response regulator,cce_0165
9,cce_0289,0.42769,-0.633335,-0.543692,-0.453485,0.020178,-0.0058,0.522106,-0.732386,-0.508574,0.389071,-0.167385,-0.367987,two-component response regulator,cce_0289
11,cce_0298,-0.859906,0.302983,0.888468,-2.067169,-1.23346,-1.10627,-0.604962,0.062011,0.47738,-1.001964,0.030049,0.222085,two-component response regulator,rpaA
17,cce_0446,0.088962,-0.539088,-0.440255,-0.055116,0.656495,0.482188,0.159705,-0.335296,0.075567,0.222504,-0.3872,-1.171355,two-component response regulator,cce_0446
26,cce_0657,0.209038,-0.029466,-0.017365,-0.044108,-0.001327,0.069888,0.244357,0.041029,0.018494,0.208102,0.029098,0.220373,two-component response regulator,cce_0657
28,cce_0678,0.729695,-0.004934,-0.347867,-1.221355,-0.678034,-0.012713,0.576448,-0.210719,-0.430487,-0.013558,-0.146894,-0.110686,two-component response regulator,cce_0678
32,cce_0712,-0.263031,-0.347852,-0.134602,0.015084,0.996096,0.227458,0.011033,-0.294074,-0.027673,0.178137,-0.178267,-0.323043,two-component response regulator,cce_0712
33,cce_0713,0.40452,-0.565923,-0.445753,-0.552652,0.577719,-0.124507,0.747937,-0.199326,-0.618193,0.33207,-0.591838,-0.495129,two-component response regulator,cce_0713
37,cce_0754,0.688477,-0.608896,-0.380871,-0.712311,-0.624534,-0.486799,0.767455,-0.461739,-0.434447,0.762226,-0.358481,-0.089696,two-component response regulator,cce_0754
45,cce_0970,-0.174724,-0.706983,-0.519383,0.07776,0.934947,0.62377,0.072982,-0.568899,0.009641,0.223051,-0.345968,-1.105309,two-component transcription regulator,cce_0970


# Finding the most common sensors and regulators that interact with all the clock genes

## Sensors

In [44]:
sensors = [kA_sensors,kB1_sensors,kB3_sensors,kB4_sensors,kC1_sensors,kC2_sensors]

def most_common_elements(given_set):
    main_set = set(given_set[0])

    for sarray in given_set[1:]:
        sset = set(sarray)
        main_set.intersection_update(sset)
    return main_set
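
Despite its name, `most_common_elements` returns the ORFs shared by *all* of the given arrays, i.e. a set intersection. A compact equivalent is sketched below; the helper name `shared_elements` and the toy ORF names are hypothetical.

```python
import numpy as np

def shared_elements(arrays):
    """Return the elements present in every array (set intersection)."""
    return set.intersection(*(set(a) for a in arrays))

# Hypothetical partner lists for three clock genes.
toy_partners = [
    np.array(['orf_1', 'orf_2', 'orf_3']),
    np.array(['orf_2', 'orf_3']),
    np.array(['orf_3', 'orf_4']),
]

# Only 'orf_3' appears in all three lists.
print(sorted(str(orf) for orf in shared_elements(toy_partners)))  # ['orf_3']
```

An empty result, as for the sensors here, simply means that no single ORF passes the threshold for every clock gene.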

In [45]:
twoComponentsExp.loc[twoComponentsExp.ORF.isin(most_common_elements(sensors))]

Unnamed: 0,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10,Function,CommonName


## Regulators

In [46]:
regulators = [kA_regulators,kB1_regulators,kB3_regulators,kB4_regulators,kC1_regulators,kC2_regulators]

In [47]:
twoComponentsExp.loc[twoComponentsExp.ORF.isin(most_common_elements(regulators))]

Unnamed: 0,ORF,Day1_L2,Day1_L6,Day1_L10,Day1_D2,Day1_D6,Day1_D10,Day2_LL2,Day2_LL6,Day2_LL10,Day2_LD2,Day2_LD6,Day2_LD10,Function,CommonName
28,cce_0678,0.729695,-0.004934,-0.347867,-1.221355,-0.678034,-0.012713,0.576448,-0.210719,-0.430487,-0.013558,-0.146894,-0.110686,two-component response regulator,cce_0678


# Finding the kaiA, kaiB, kaiC combination that interacts with the maximum number of sensors and regulators

In [48]:
from itertools import product
clock_gene_copies = list(clock_genes.keys())
kaiA_copies = clock_gene_copies[0:1]
kaiB_copies = clock_gene_copies[1:4]
kaiC_copies = clock_gene_copies[4:]
print(list(product(kaiA_copies,kaiB_copies,kaiC_copies)))

[('kaiA', 'kaiB1', 'kaiC1'), ('kaiA', 'kaiB1', 'kaiC2'), ('kaiA', 'kaiB3', 'kaiC1'), ('kaiA', 'kaiB3', 'kaiC2'), ('kaiA', 'kaiB4', 'kaiC1'), ('kaiA', 'kaiB4', 'kaiC2')]


In [49]:
clock_gene_sensors = dict(zip(clock_gene_copies,sensors))
clock_gene_regulators = dict(zip(clock_gene_copies,regulators))

number_dict = {}

for a,b,c in product(kaiA_copies,kaiB_copies,kaiC_copies):
    sensors_list = [clock_gene_sensors[i] for i in (a,b,c)]
    regulators_list = [clock_gene_regulators[i] for i in (a,b,c)]
    sensor_len = len(twoComponentsExp.loc[twoComponentsExp.ORF.isin(most_common_elements(sensors_list))])
    regulator_len = len(twoComponentsExp.loc[twoComponentsExp.ORF.isin(most_common_elements(regulators_list))])
    total_len = sensor_len+regulator_len
    number_dict[(a,b,c)] = [sensor_len,regulator_len,total_len]
    
{k:v for k,v in sorted(number_dict.items(),key=lambda x: x[1][2],reverse=True)}

{('kaiA', 'kaiB1', 'kaiC2'): [5, 5, 10],
 ('kaiA', 'kaiB4', 'kaiC1'): [3, 6, 9],
 ('kaiA', 'kaiB3', 'kaiC2'): [3, 5, 8],
 ('kaiA', 'kaiB1', 'kaiC1'): [2, 5, 7],
 ('kaiA', 'kaiB4', 'kaiC2'): [2, 5, 7],
 ('kaiA', 'kaiB3', 'kaiC1'): [2, 4, 6]}
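
The ranking above can be read as follows: for each (kaiA, kaiB, kaiC) triple, count the partners shared by all three copies, then sort the triples by that count. Here is a stripped-down sketch of the same idea, with hypothetical partner sets (`s1`, `s2`, `s3` are invented labels, not real ORFs):

```python
from itertools import product

# Hypothetical partner sets per clock gene copy.
partners = {
    'kaiA':  {'s1', 's2', 's3'},
    'kaiB1': {'s1', 's2'},
    'kaiB3': {'s3'},
    'kaiC1': {'s1', 's2', 's3'},
}

counts = {}
for combo in product(['kaiA'], ['kaiB1', 'kaiB3'], ['kaiC1']):
    # Partners that every member of the triple has in common.
    shared = set.intersection(*(partners[gene] for gene in combo))
    counts[combo] = len(shared)

# Triples with the most shared partners come first.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```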

# cce_0678 case study

In [50]:
cce_0678 = Interaction(twoComponentsExp,'cce_0678',mi_thresh=0)
cce_0678.get_twoComponentSensors()

Unnamed: 0,orf,function,CommonName,mi
6,cce_0164,two-component sensor histidine kinase,cce_0164,0.398052
19,cce_4426,two-component sensor histidine kinase,cce_4426,0.24213
25,cce_1280,two-component sensor histidine kinase,cce_1280,0.224904
45,cce_2232,two-component sensor histidine kinase,cce_2232,0.192331
48,cce_0220,two-component sensor histidine kinase,cce_0220,0.188958
55,cce_1983,"probable phytochrome A, two-component sensor p...",aphA,0.171895
56,cce_2546,probable two-component sensor histidine kinase,cce_2546,0.16409
81,cce_1878,two-component sensor histidine kinase,cce_1878,0.128905
83,cce_1535,two-component sensor histidine kinase,cce_1535,0.125863
85,cce_0888,two-component sensor histidine kinase,nblS,0.120969


# rpaA case study

In [51]:
cce_0298 = Interaction(twoComponentsExp,'cce_0298',mi_thresh=0)
cce_0298.get_twoComponentSensors()

Unnamed: 0,orf,function,CommonName,mi
11,cce_4097,two-component sensor serine/threonine kinase,cce_4097,0.523945
17,cce_0220,two-component sensor histidine kinase,cce_0220,0.444084
18,cce_1280,two-component sensor histidine kinase,cce_1280,0.427351
19,cce_0297,two-component sensor histidine kinase,cce_0297,0.425929
27,cce_3379,two-component sensor histidine kinase,cce_3379,0.391207
30,cce_4426,two-component sensor histidine kinase,cce_4426,0.34577
48,cce_3327,two-component sensor histidine kinase,cce_3327,0.229004
64,cce_1519,two-component sensor histidine kinase,cce_1519,0.165512
102,cce_2546,probable two-component sensor histidine kinase,cce_2546,0.086544
103,cce_4204,two-component sensor histidine kinase,cce_4204,0.085351


# sasA case study

In [52]:
cce_1751 = Interaction(twoComponentsExp,'cce_1751',mi_thresh=0)
cce_1751.get_twoComponentRegulators()

Unnamed: 0,orf,function,CommonName,mi
3,cce_1952,two-component response regulator receiver protein,cce_1952,0.46667
11,cce_4002,two-component response regulator,rpaB,0.369447
12,cce_0712,two-component response regulator,cce_0712,0.339289
15,cce_4578,two-component response regulator,cce_4578,0.291141
19,cce_4714,"two-component response regulator, NarL subfamily",cce_4714,0.275003
38,cce_2376,two-component response regulator,cce_2376,0.168753
42,cce_1695,two-component response regulator,cce_1695,0.154762
48,cce_2365,two-component response regulator,cce_2365,0.127119
57,cce_3895,two-component response regulator,cce_3895,0.09706
58,cce_3714,putative two-component system response regulator,cce_3714,0.094877


# rpaB case study

In [54]:
cce_4002 = Interaction(twoComponentsExp,'cce_4002',mi_thresh=0)
cce_4002.df.head(10)

Unnamed: 0,orf,function,CommonName,mi
0,cce_1775,nitrogen regulatory protein P-II,glnB,0.50083
1,cce_3901,putative serine/threonine protein kinase,cce_3901,0.476028
2,cce_4716,circadian clock protein,kaiC2,0.44084
3,cce_3420,pyruvate kinase,pykF1,0.434361
4,cce_1878,two-component sensor histidine kinase,cce_1878,0.426623
5,cce_4714,"two-component response regulator, NarL subfamily",cce_4714,0.384725
6,cce_1751,adaptive-response sensory histidine kinase,sasA,0.369447
7,cce_0921,ribose-phosphate pyrophosphokinase,prsA,0.368488
8,cce_2505,two-component hybrid sensor and regulator,cce_2505,0.368191
9,cce_3723,two-component hybrid sensor and regulator,cce_3723,0.344679


# Verifying Previous Conclusions

1. *cce_1983/aphA might be a photoreceptor that regulates the clock genes and the other TFs.* - Seen here as well.
2. *cce_0888/nblS is another interesting component that interacts with the clock genes. In 7942, it is shown to be a gene involved in photosynthesis-related gene expression during high light and nutrient stress.* - Seen here as well.
3. *In the previous literature review study report, cce_0678 was proposed to interact with the RuBisCO genes. In this study, it was shown that there is a very high mutual information score between cce_0678 and the probable photoreceptor aphA discussed above. This further highlights its importance as a regulator.* - Seen here as well.
4. *rpaA and rpaB are equally important in the Cyanothece signaling network according to this analysis.* - Seen here as well.
5. *sasA may not be the kinase that interacts with rpaA in Cyanothece. On the other hand, rpaB may be the regulator that interacts with sasA.* - Seen here as well.
6. *The KaiB copies may be present not just to maintain robustness but they may play important roles in the signaling network.* - Seen here as well.

# New Conclusion

1. We are probably missing some interactions because of the low sampling frequency of the data in each case. We need more frequently sampled data; otherwise we cannot postulate new interactions. At the moment our analysis is strongly biased towards interactions that have already been reported in other cyanobacteria such as 7942 or Nostoc.
2. The interactions among the clock genes are not consistent and are extremely erratic. We need more frequently sampled data to conclude anything about the clock genes. However, it is clear that the clock gene copies are present not just to maintain robustness.
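
One reason the number of time points matters is that mutual information estimates become noisy when only a few samples are available. The sketch below uses purely synthetic, independent Gaussian variables (true mutual information is zero) and compares the estimates at 12 samples, matching the 12 time points used here, with a hypothetical denser sampling of 200 points. The helper `null_mi` and all parameter choices are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def null_mi(n_samples, n_trials=100, seed=0):
    """MI estimates between two independent Gaussian variables, over n_trials repeats."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_trials):
        x = rng.normal(size=(n_samples, 1))
        y = rng.normal(size=n_samples)  # independent of x, so the true MI is 0
        estimates.append(mutual_info_regression(x, y, random_state=7)[0])
    return np.array(estimates)

mi_12 = null_mi(12)    # as many samples as time points in this dataset
mi_200 = null_mi(200)  # a hypothetical, much denser sampling

print(f"n=12:  mean={mi_12.mean():.3f}, max={mi_12.max():.3f}")
print(f"n=200: mean={mi_200.mean():.3f}, max={mi_200.max():.3f}")
```

With few samples, unrelated variables can reach sizeable scores by chance, so low-to-moderate mutual information values in this notebook should be interpreted with caution.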