## Objective: 
### Present the same data as shown in the ReadMe.pptx file, but in a clear, simple format that is easy to present.

#### 1. Gene Expression Over Specific Time Frames
    - **annotation data**
#### 2. The Mapping of Gene Clusters to Biological Functions
    - **gene cluster description data**
#### 3. Process of Creating Gene Clusters 
    - **expression data**
        1. took the median gene expression from responder groups
        2. retained genes with log2 fold change > 2 between any two points
        3. retained genes that correlated with at least two other genes, with correlation > 0.8
        4. clustered genes based on their expression profile with the k-means algorithm (k=10)
    - **gene clusters data**
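
The four steps listed under expression data can be sketched roughly as follows. This is an illustration, not the code that produced the provided clusters. The function name `cluster_genes`, its parameters, the use of Pearson correlation, treating "fold change between any two points" as the row maximum minus the row minimum on log2 data, and the plain Lloyd's k-means loop are all assumptions made here.

```python
import numpy as np
import pandas as pd


def cluster_genes(expr, groups, min_lfc=2.0, min_corr=0.8,
                  min_partners=2, k=10, n_iter=100, seed=0):
    """Return a Series mapping each retained gene to a cluster index.

    expr   : genes x samples DataFrame, assumed to be on a log2 scale
    groups : Series indexed by sample ID giving each sample's group label
    """
    empty = pd.Series(dtype=int, name="cluster")

    # Step 1: median expression of every gene within each group.
    med = expr.T.groupby(groups).median().T

    # Step 2: on a log2 scale, the largest fold change between any two
    # points is the row maximum minus the row minimum (assumption).
    med = med.loc[(med.max(axis=1) - med.min(axis=1)) > min_lfc]
    if len(med) <= min_partners:
        return empty  # too few genes for any to have enough partners

    # Step 3: keep genes whose Pearson correlation exceeds min_corr with
    # at least min_partners other genes (minus 1 removes the self-match).
    corr = med.T.corr()
    partners = (corr > min_corr).sum(axis=1) - 1
    med = med.loc[partners >= min_partners]
    if med.empty:
        return empty

    # Step 4: plain Lloyd's k-means on the retained profiles.
    x = med.to_numpy()
    k = min(k, len(x))
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return pd.Series(labels, index=med.index, name="cluster")
```

A library implementation of k-means would normally be preferable; the hand-written loop only keeps the sketch free of extra dependencies. Results depend on the random initialisation, so the labels are not expected to reproduce the provided clusters.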

In [10]:
# Import necessary modules
import csv
import pandas as pd
from pandas import DataFrame
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
pandas2ri.activate()

In [25]:
# Import R files data
readRDS = robjects.r['readRDS']
r_rds_df = readRDS('data/JNJ2297.rds')

with localconverter(robjects.default_converter + pandas2ri.converter):
    pd_rdf_df = robjects.conversion.rpy2py(r_rds_df)

pd_rdf_df

0,1
expression,[RTYPES.REALSXP]
annotation,[RTYPES.VECSXP]
gene.clusters,[RTYPES.VECSXP]
gene.cluster.discription,[RTYPES.VECSXP]


In [26]:
# Import expression data
expression_df = pd.read_csv("data/JNJ2297.expression.tsv", sep="\t")
print(expression_df.shape)
print('Each row in the expression table represents a gene and each column represents a sample.')
print('Each value is the expression level of a gene in a sample.')
expression_df.head()

(106, 95)
Each row in the expression table represents a gene and each column represents a sample.
Each value is the expression level of a gene in a sample.


Unnamed: 0,PID.600198,PID.600203,PID.600215,PID.700507,PID.700517,PID.700522,PID.700532,PID.700535,PID.700548,PID.700563,...,PID.701909,PID.701944,PID.701979,PID.701993,PID.701994,PID.702002,PID.702023,PID.702048,PID.702056,PID.702064
Gid.1,-2.981974,-2.600316,-1.329146,-0.782551,-2.466272,-2.262907,-0.109869,-2.017782,-0.804588,-1.398005,...,-1.266799,0.536325,-2.369271,-0.833726,-1.753441,-0.064273,-1.024963,-1.750789,-0.954474,-0.863653
Gid.2,5.091374,6.319749,7.230259,6.24932,4.970128,5.045313,6.631676,5.837953,5.950087,5.76003,...,7.66941,4.717194,5.588064,5.855133,5.724978,5.440656,5.237794,4.863286,6.563976,4.98586
Gid.3,-1.507883,-1.650258,-0.999301,-2.944621,-0.974871,-2.145208,-2.425205,-1.898438,-0.90134,-1.48648,...,-0.713068,-2.12972,-2.332149,0.006302,-0.771941,-1.676827,-2.282308,-2.564208,-1.308401,-1.781719
Gid.4,-3.753588,-3.260971,-2.854492,-2.113714,-1.885751,-3.488628,-2.617938,-2.075582,-3.372451,-2.112743,...,-2.286875,-2.227281,-2.177518,-1.537582,-2.417876,-2.906556,-2.522521,-3.333705,-1.881001,-2.034995
Gid.5,6.493748,8.070056,9.4638,8.495926,6.354211,6.786793,8.870514,7.494251,8.308216,6.677745,...,9.899105,8.291938,7.643939,7.83622,8.736446,8.820776,7.869138,7.853277,9.226687,7.529547


In [27]:
# Import gene clusters data
gene_clusters_df = pd.read_csv("data/JNJ2297.gene.clusters.tsv", sep="\t")
print(gene_clusters_df.shape)
print('This table maps each gene to one of four main clusters: [M.1, M.2, M.3, M.4].')
gene_clusters_df.head()

(106, 1)
This table maps each gene to one of four main clusters: [M.1, M.2, M.3, M.4].


Unnamed: 0,module
Gid.1,M.1
Gid.2,M.2
Gid.3,M.2
Gid.4,M.1
Gid.5,M.2
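
A quick way to see how genes are distributed across clusters is to count the rows per `module` value. The sketch below uses only the five rows shown above as a toy stand-in. It assumes the gene IDs have been moved into the index, whereas in the notebook they sit in the `Unnamed: 0` column. The names `toy_clusters`, `genes_per_module` and `members` are illustrative.

```python
import pandas as pd

# Toy stand-in for gene_clusters_df: the five rows shown above,
# with gene IDs used as the index (an assumption for this sketch).
toy_clusters = pd.DataFrame(
    {"module": ["M.1", "M.2", "M.2", "M.1", "M.2"]},
    index=["Gid.1", "Gid.2", "Gid.3", "Gid.4", "Gid.5"],
)

# Number of genes assigned to each cluster, ordered by cluster label.
genes_per_module = toy_clusters["module"].value_counts().sort_index()

# Gene IDs belonging to each cluster (mapping: label -> Index of genes).
members = toy_clusters.groupby("module").groups

print(genes_per_module)
print(list(members["M.1"]))
```

On the full table the same two calls would apply after something like `gene_clusters_df.set_index("Unnamed: 0")`.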


In [28]:
# Import annotation data
annotation_df = pd.read_csv("data/JNJ2297.annotation.tsv", sep="\t")
print(annotation_df.shape)
print('Each row in the annotation dataframe represents a sample from a specific patient at a specific time point.')
print('Genes are grouped into four clusters [M.1, M.2, M.3, M.4]; columns M.1 to M.4 hold the mean value of each cluster.')
annotation_df.head()

(95, 9)
Each row in the annotation dataframe represents a sample from a specific patient at a specific time point.
Genes are grouped into four clusters [M.1, M.2, M.3, M.4]; columns M.1 to M.4 hold the mean value of each cluster.


Unnamed: 0,Gender,Age,Height.cm,Weight.kg,M.1,M.2,M.3,M.4,time
1,Female,46.65,,,-2.981974,1.40933,-0.217659,1.772206,9
2,Female,30.33,,,-2.600316,1.359622,0.023615,1.944382,9
3,Female,83.81,149.3,47.6,-2.74058,1.506242,0.162396,2.574374,2
4,Female,43.13,168.0,61.2,-2.113714,0.859283,0.506133,2.652673,5
5,Male,18.13,170.18,86.0,-1.885751,-0.938905,1.797638,6.198975,1
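
The printed note describes the `M.1` to `M.4` columns as per-cluster mean values. One way such per-sample cluster summaries could be derived from the expression table and the gene-to-cluster mapping is sketched below. Whether the provided annotation columns were computed exactly this way is not confirmed here, so the aggregation is left as a parameter. `summarize_modules` and the toy data are hypothetical.

```python
import pandas as pd


def summarize_modules(expression, modules, how="mean"):
    """Aggregate gene-level expression into one value per cluster and sample.

    expression : genes x samples DataFrame
    modules    : Series indexed by gene ID holding each gene's cluster label
    how        : aggregation name understood by pandas, e.g. "mean" or "median"
    """
    # Align cluster labels to the expression rows, then aggregate per cluster.
    labels = modules.reindex(expression.index)
    return expression.groupby(labels).agg(how).T  # samples x clusters


# Made-up toy data: three genes, two samples.
toy_expr = pd.DataFrame(
    {"s1": [1.0, 2.0, 4.0], "s2": [3.0, 6.0, 8.0]},
    index=["g1", "g2", "g3"],
)
toy_modules = pd.Series({"g1": "M.1", "g2": "M.2", "g3": "M.2"})

summary = summarize_modules(toy_expr, toy_modules)
print(summary)
```

Comparing such a summary against the `M.1` to `M.4` columns, for both `how="mean"` and `how="median"`, would be one way to check which aggregation the annotation table actually uses.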


In [29]:
# Import gene cluster description data
gene_cluster_discription_df = pd.read_csv("data/JNJ2297.gene.cluster.discription.tsv", sep="\t")
print(gene_cluster_discription_df.shape)
print('This table maps the gene clusters [M.1, M.2, M.3, M.4] to biological functions.')
print('Each row is a biological function and includes the p-values for the cluster-to-function mapping.')
gene_cluster_discription_df.head()

(36, 11)
This table maps the gene clusters [M.1, M.2, M.3, M.4] to biological functions.
Each row is a biological function and includes the p-values for the cluster-to-function mapping.


Unnamed: 0,Cluster,module,ID,Description,GeneRatio,BgRatio,pvalue,p.adjust,qvalue,geneID,Count
1,M.1,M,R-HSA-8957275,Post-translational protein phosphorylation,2/3,108/10554,0.000309,0.009528,0.001962,2335/255738,2
2,M.1,M,R-HSA-381426,Regulation of Insulin-like Growth Factor (IGF)...,2/3,125/10554,0.000414,0.009528,0.001962,2335/255738,2
3,M.1,M,R-HSA-8866427,VLDLR internalisation and degradation,1/3,12/10554,0.003407,0.029671,0.006111,255738,1
4,M.1,M,R-HSA-354194,GRB2:SOS provides linkage to MAPK signaling fo...,1/3,15/10554,0.004258,0.029671,0.006111,2335,1
5,M.1,M,R-HSA-372708,p130Cas linkage to MAPK signaling for integrins,1/3,15/10554,0.004258,0.029671,0.006111,2335,1
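
`GeneRatio` and `BgRatio` are stored as strings such as `2/3`, so they need parsing before they can be compared numerically. The sketch below uses two of the rows shown above (a subset of columns) and ranks terms by `p.adjust`. Reading the two ratios as "fraction of cluster genes in the term" and "fraction of background genes in the term" is an assumption based on the column names. The helper `ratio_to_float` and the derived `fold_enrichment` column are illustrative and not part of the source data.

```python
import pandas as pd


def ratio_to_float(ratio):
    """Turn a string such as '2/3' into the fraction it denotes."""
    num, den = ratio.split("/")
    return int(num) / int(den)


# Rows 1 and 3 of the table above, restricted to a few columns.
toy = pd.DataFrame({
    "Cluster": ["M.1", "M.1"],
    "Description": [
        "Post-translational protein phosphorylation",
        "VLDLR internalisation and degradation",
    ],
    "GeneRatio": ["2/3", "1/3"],
    "BgRatio": ["108/10554", "12/10554"],
    "p.adjust": [0.009528, 0.029671],
})

toy["gene_frac"] = toy["GeneRatio"].map(ratio_to_float)
toy["bg_frac"] = toy["BgRatio"].map(ratio_to_float)
# Hypothetical derived column: how over-represented the term is.
toy["fold_enrichment"] = toy["gene_frac"] / toy["bg_frac"]

# Most significant term per cluster by adjusted p-value.
top_terms = toy.sort_values("p.adjust").groupby("Cluster").head(1)
print(top_terms[["Cluster", "Description", "p.adjust"]])
```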
