# GSE19804 - Notebook Version 1.0
This is a series-specific notebook that modifies the GSE19804 matrix imported through FIT and exports only the data needed for FaST processing. See [GSE19804](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19804) for more information on the series. The code is written in R.

From the series data:
>RNA was extracted from paired tumor and normal tissues for gene expression analysis.

A total of **120** samples: **60** in the cancer class and **60** in the normal class.

### Get/Create Directories
Assumes this notebook is in `GenClass-Stability/importTools/notebooks/`

In [28]:
notebook_dir <- getwd() # get the working directory
main_dir <- dirname(dirname(notebook_dir)) # get two levels up
gse_dir <- file.path(main_dir, "GSE", "GSE19804") # series data directory

In [29]:
setwd(gse_dir)
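
As a quick illustration of the path logic above, the following sketch applies the same `dirname(dirname(...))` step to a made-up path. The `/home/user` prefix and the `example_*` names are assumptions for illustration only and are not part of the notebook.

```r
# Hypothetical location of this notebook (the /home/user prefix is an assumption)
example_notebook_dir <- "/home/user/GenClass-Stability/importTools/notebooks"

# dirname() drops the last path component, so two calls go up two levels
example_main_dir <- dirname(dirname(example_notebook_dir))

# Build the series directory the same way as the cell above
example_gse_dir <- file.path(example_main_dir, "GSE", "GSE19804")
print(example_gse_dir)
```

If the notebook is started from a different directory, `getwd()` returns that directory instead, and the derived paths will not point at the `GSE` folder.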

### Import Matrix
Assumes this notebook is in `GenClass-Stability/importTools/notebooks/` and GSE data from SIT is stored in `GenClass-Stability/GSE`.

In [30]:
matrix <- read.table("filteredRMA.txt",header=TRUE,row.names=1)

### Modify Matrix
Remove `Lung.` and everything from the second `.` onward from the column names, leaving only the class label.

Note: the remaining code is specific to this GSE and to the data you want to test. However, the output format for classes and expressions should always be the same.

In [31]:
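# Drop the literal "Lung." text; fixed = TRUE stops "." from matching any character
classes <- gsub("Lung.", "", colnames(matrix), fixed = TRUE)
# Drop the first remaining "." and everything after it
classes <- gsub("\\..*", "", classes)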
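
The two substitutions can be tried on a few made-up column names. This is a minimal sketch: the names in `example_cols` are assumed to resemble the real headers in `filteredRMA.txt`, which are not shown in this notebook.

```r
# Hypothetical column names in the style the notebook appears to expect
example_cols <- c("Lung.Cancer.1", "Lung.Normal.1", "Lung.Cancer.2")

# Drop the literal "Lung." prefix (fixed = TRUE treats "." as a plain dot)
example_classes <- sub("Lung.", "", example_cols, fixed = TRUE)

# Drop the first remaining "." and everything after it
example_classes <- sub("\\..*", "", example_classes)
print(example_classes)
```

After both steps, each name is reduced to its class label, so several columns end up sharing the same name.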

Replace the column names with the class labels.

In [32]:
colnames(matrix) <- classes

Select the expression columns whose class is `Cancer` or `Normal`.

In [33]:
patterns <- c("Cancer", "Normal")
# Keep only the columns whose name matches "Cancer|Normal"
expressions <- matrix[, grepl(paste(patterns, collapse = "|"), names(matrix))]
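
The selection step can be sketched on a small made-up table. The data, the extra `Other` column, and the `example_*` names below are hypothetical and only illustrate how the pattern match filters columns.

```r
# Made-up table with duplicate class labels plus one column to be excluded
example_df <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6), d = c(7, 8))
colnames(example_df) <- c("Cancer", "Normal", "Cancer", "Other")

# Build one alternation pattern, "Cancer|Normal", from the class names
example_pattern <- paste(c("Cancer", "Normal"), collapse = "|")

# Keep only the columns whose name matches the pattern
example_selected <- example_df[, grepl(example_pattern, names(example_df))]

print(ncol(example_selected))   # number of columns kept
print(names(example_selected))  # duplicated names may come back with numeric suffixes
```

Because `grepl` matches substrings, any column name that merely contains `Cancer` or `Normal` would also be kept. Anchoring the pattern would be stricter, but this sketch keeps the notebook's approach.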

### Write Classes
First, remove the numeric suffixes that the data frame added to duplicate column names.

In [34]:
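# Strip the ".1", ".2", ... suffixes so only the class label remains
classes <- gsub("\\..*", "", colnames(expressions))
classes <- as.matrix(classes)
# Write one class label per line, with no header and no quotes
write.table(classes, file.path(gse_dir, "classes.txt"), sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)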

### Write Expressions

In [35]:
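# Transpose so each row is a sample and each column is a gene
expressions <- t(expressions)
# Write tab-separated values with no row or column names
write.table(expressions, file.path(gse_dir, "exprs.txt"), sep = "\t", row.names = FALSE, col.names = FALSE)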
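
A useful check on the two output files is that they line up sample for sample: one class label per row of expressions. The sketch below reproduces both writes on made-up data in a temporary directory and reads them back. The values, `example_dir`, and the other `example_*` names are assumptions, and nothing here touches the real `gse_dir`.

```r
# Small made-up expression table: 3 genes (rows) x 4 samples (columns)
example_expr <- data.frame(
  Cancer   = c(7.1, 5.2, 9.3),
  Normal   = c(6.8, 5.0, 8.9),
  Cancer.1 = c(7.4, 5.5, 9.0),
  Normal.1 = c(6.9, 4.8, 9.1)
)

example_dir <- tempdir()  # scratch directory, not the real gse_dir

# One class label per sample, with numeric suffixes stripped
example_labels <- sub("\\..*", "", colnames(example_expr))
write.table(as.matrix(example_labels), file.path(example_dir, "classes.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

# Transpose so each row is a sample and each column is a gene
write.table(t(example_expr), file.path(example_dir, "exprs.txt"),
            sep = "\t", row.names = FALSE, col.names = FALSE)

# Read both files back and confirm they line up sample for sample
labels_back <- readLines(file.path(example_dir, "classes.txt"))
exprs_back <- read.table(file.path(example_dir, "exprs.txt"), sep = "\t")
stopifnot(length(labels_back) == nrow(exprs_back))
print(dim(exprs_back))
```

The same read-back check could be run against the real `classes.txt` and `exprs.txt` after the cells above have executed.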