# Create dataset for directed classification

Script used to create a dataset containing selected gene attributes and each patient's survival status, which serves as the classification target.

## Read in z scores and format

In [1]:
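### Read in z_scores and format ###
# Genes are rows: column 1 holds the gene symbol, column 2 a gene ID (unused),
# and every remaining column is one sample
z_scores = read.csv("data_RNA_Seq_mRNA_median_all_sample_Zscores.txt", sep = "\t", stringsAsFactors = FALSE, header = TRUE)

map = z_scores[,c(1,2)]  # gene symbol / gene ID lookup

inst_names = colnames(z_scores)[3:ncol(z_scores)]  # sample IDs

# Drop the gene ID column and transpose so samples are rows and genes are columns
z_scores = z_scores[,-2]
z_scores = as.data.frame(t(z_scores), stringsAsFactors = FALSE)
names(z_scores) = map[,1]
z_scores = z_scores[-1,]  # after transposing, the first row holds the gene symbols

# t() returned character values; convert every column back to numeric
z_scores = as.data.frame(apply(z_scores, 2, as.numeric))

# Trim each sample ID to its patient ID: drop the first 10 characters
# and the last 3 (the sample suffix)
name_fix = substr(inst_names, 11, nchar(inst_names) - 3)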


row.names(z_scores) = name_fix
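The reshaping above can be illustrated on a tiny hypothetical table laid out like the z-score file: a gene-symbol column, a gene-ID column, then one column per sample. This is a sketch under assumptions: the gene values, placeholder gene IDs and sample IDs are invented, and the sample IDs assume the same 10-character prefix and 3-character suffix that the trimming step removes. Unlike the cell above, it drops both ID columns before transposing, so t() works on a numeric matrix and no as.numeric() conversion is needed.

```r
# Hypothetical toy table in the assumed layout:
# gene symbol, gene ID (placeholder values), then one column per sample
raw <- data.frame(
  Hugo_Symbol         = c("MYCN", "ALK"),
  Entrez_Gene_Id      = c(101, 102),
  TARGET.30.PAAAAA.01 = c(1.5, -0.2),
  TARGET.30.PBBBBB.01 = c(-0.7, 0.9),
  stringsAsFactors = FALSE
)

# Keep only the numeric sample columns and transpose: samples become rows
expr <- as.data.frame(t(as.matrix(raw[, -(1:2)])))
colnames(expr) <- raw$Hugo_Symbol

# Same trimming rule as the notebook: drop the 10-character prefix
# and the 3-character sample suffix
sample_ids <- colnames(raw)[-(1:2)]
rownames(expr) <- substr(sample_ids, 11, nchar(sample_ids) - 3)

print(expr)
```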


## Read in Clinical Data and format

In [2]:
### Read in Clinical Data ###
# skip = 4 skips the lines that precede the column header
patient = read.csv("data_clinical_patient.txt", sep = "\t", stringsAsFactors = FALSE, header = TRUE, skip = 4)
p_nam = patient[,1]  # patient IDs

# Drop the first 10 characters so the IDs match the trimmed sample IDs
p_nam_fix = substr(p_nam, 11, nchar(p_nam))

patient = patient[, -1, drop = FALSE]  # remove the ID column

row.names(patient) = p_nam_fix
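The two trimming rules are meant to give a sample and its patient the same key. A small sketch with hypothetical TARGET-style IDs (an assumption; the real ID format is not shown in this notebook) shows why a position-based substr() works even though read.csv() rewrites the hyphens in column headers to dots via make.names() (check.names = TRUE), while IDs stored as cell values keep their hyphens: both prefixes are still 10 characters long.

```r
# Hypothetical IDs: sample IDs arrive as column headers (hyphens -> dots),
# patient IDs arrive as cell values (hyphens kept)
sample_cols <- make.names(c("TARGET-30-PAAAAA-01", "TARGET-30-PBBBBB-01"))
patient_ids <- c("TARGET-30-PAAAAA", "TARGET-30-PCCCCC")

# Same rules as the notebook: skip 10 characters, and for samples
# also drop the 3-character suffix
sample_key  <- substr(sample_cols, 11, nchar(sample_cols) - 3)
patient_key <- substr(patient_ids, 11, nchar(patient_ids))

print(sample_key)
print(intersect(sample_key, patient_key))  # patients present in both tables
```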

## Merge Sets

Isolate the overall survival status (OS_STATUS) from the clinical data and merge it with the z-scores on patient ID.

In [3]:
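# Survival status as a one-column data frame indexed by patient ID
status = data.frame(STATUS = patient[,"OS_STATUS"], stringsAsFactors = FALSE)
rownames(status) = p_nam_fix

# Inner join on row names: only patients with both a status and z-scores are kept,
# and their IDs are stored in a new "Row.names" column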
mg = merge.data.frame(status,z_scores, by="row.names")
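merge.data.frame() with by = "row.names" performs an inner join by default (all = FALSE), so patients missing from either table are dropped, and the result gets fresh row names (1, 2, ...). A sketch on made-up data; the IDs, status labels and values are hypothetical:

```r
# Hypothetical status table (patient IDs as row names)
status <- data.frame(STATUS = c("DECEASED", "LIVING", ""),
                     row.names = c("PAAAAA", "PBBBBB", "PCCCCC"),
                     stringsAsFactors = FALSE)

# Hypothetical z-score table; only PBBBBB appears in both tables
expr <- data.frame(MYCN = c(1.5, -0.7), ALK = c(-0.2, 0.9),
                   row.names = c("PBBBBB", "PDDDDD"))

# Inner join on row names; the key becomes a "Row.names" column
mg <- merge.data.frame(status, expr, by = "row.names")
print(mg)
```

Because the patient IDs move into Row.names, any later column selection drops them unless "Row.names" is included.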

## Determine attributes to add to dataset
Function used to check whether each attribute is a column of the dataset:

In [4]:
check_attrib = function(df, attrib){ # TRUE for each element of attrib that is a column of df
  ret = attrib %in% colnames(df)
  return(ret)
}
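Because %in% is vectorised, check_attrib() returns one logical per requested attribute, in the order given. A quick check on a hypothetical one-row data frame:

```r
check_attrib = function(df, attrib){
  attrib %in% colnames(df)  # one TRUE/FALSE per requested attribute
}

# Hypothetical data frame standing in for the merged set
toy <- data.frame(STATUS = "LIVING", MYCN = 1.5, ALK = -0.2)

found <- check_attrib(toy, c("MYCN", "CD-133", "STATUS"))
print(found)
```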

Use this cell to pick attributes. It prints the attributes that are NOT columns of the merged set; edit the list and rerun until the output is empty (character(0)) before moving on.

In [5]:
attributes = c("NTRK1", "MYCN", "MDM2", "ALK", "CHD5", "CADM1",
               "CD44", "CD-133", "KIT", "NTRK2", "DLK1","STATUS")


attributes[!check_attrib(mg, attributes)]# return attributes not included in set
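An equivalent way to list the missing attributes, and the ones that will be kept, is base R's setdiff() and intersect(); unlike logical indexing, both also drop duplicate entries. This is an alternative sketch, not what the notebook uses, and the column names below are hypothetical. A marker given by a protein name rather than a gene symbol (for example "CD-133", whose gene symbol is PROM1) will be reported as missing if the columns are named by gene symbol.

```r
# Hypothetical attribute list and column names of the merged set
wanted <- c("NTRK1", "MYCN", "CD-133", "STATUS")
cols   <- c("Row.names", "STATUS", "MYCN", "NTRK1")

absent  <- setdiff(wanted, cols)    # attributes not in the data
present <- intersect(wanted, cols)  # attributes to keep, in the listed order

print(absent)
print(present)
```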

Build a dataset containing only the selected attributes that are present in the merged set, drop individuals with an empty status, and write it to dir_class.csv.

In [6]:
select_attributes = attributes[check_attrib(mg,attributes)]

dr = mg[!is.na(mg$STATUS) & mg$STATUS != "", select_attributes] # Drop individuals with a missing or empty status

write.csv(dr, "dir_class.csv")
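Two details of the last cell can be checked on toy data. Comparing an NA status with "" yields NA, and indexing rows with NA returns a row of NAs, so an explicit !is.na() test is safer than mg$STATUS != "" alone. Also, write.csv() writes the row names (here the merge's row indices, not patient IDs) as an unnamed first column, which read.csv(..., row.names = 1) restores. The data and the temporary file are hypothetical.

```r
# Hypothetical merged data with one valid, one empty and one missing status
mg <- data.frame(Row.names = c("PAAAAA", "PBBBBB", "PCCCCC"),
                 STATUS    = c("DECEASED", "", NA),
                 MYCN      = c(1.5, -0.7, 0.3),
                 stringsAsFactors = FALSE)

keep <- !is.na(mg$STATUS) & mg$STATUS != ""  # FALSE (not NA) for the NA row
dr <- mg[keep, c("MYCN", "STATUS")]

# Round trip through a temporary CSV file
path <- tempfile(fileext = ".csv")
write.csv(dr, path)  # first column holds the row names
back <- read.csv(path, row.names = 1, stringsAsFactors = FALSE)
print(back)
```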