# Extract Genes from GSEA Output

## How to use this notebook:
Place the GSEA output file and the counts file in the working directory. Set the filenames and `only_core` in the next cell, then run all cells to produce a dimensionality-reduced counts file that contains only the genes selected from the GSEA output.

In [28]:
GSEA_output_filename = "ASGHARZADEH_NEUROBLASTOMA_POOR_SURVIVAL_DN.tsv"
count_filename = "data_expression_median.txt"
output_count_reduced_filename = "nb_poor_survival.csv"
only_core = TRUE # set to FALSE to include all genes in the set, not just the core enrichment

In [29]:
GE_output = read.csv(GSEA_output_filename, sep = "\t")

GE_output = GE_output[,c("SYMBOL", "CORE.ENRICHMENT")]

# Keep core-enrichment genes only, or every gene in the set when only_core is FALSE
core = if (only_core) GE_output[GE_output$CORE.ENRICHMENT == "Yes",] else GE_output
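
The selection step can be tried on a small made-up table before running it on real GSEA output. This is only a sketch: `toy_gsea`, `only_core_toy`, `toy_core`, and the gene symbols are hypothetical stand-ins, not values from the real file.

```r
# Toy stand-in for the GSEA table; symbols and flags are made up for illustration.
toy_gsea <- data.frame(
  SYMBOL = c("GENE_A", "GENE_B", "GENE_C"),
  CORE.ENRICHMENT = c("Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)
only_core_toy <- TRUE

# Keep only core-enrichment rows when requested, otherwise keep every gene in the set.
toy_core <- if (only_core_toy) {
  toy_gsea[toy_gsea$CORE.ENRICHMENT == "Yes", ]
} else {
  toy_gsea
}

# Gene symbols that would be carried forward to the count-filtering step.
toy_core$SYMBOL
```

Setting `only_core_toy` to `FALSE` returns all three rows, which mirrors how `only_core` is meant to behave in this notebook.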

Run the next cell to see the genes selected from the GSEA output:

In [30]:
core

Unnamed: 0,SYMBOL,CORE.ENRICHMENT
30,LARS2,Yes
31,PABPC1,Yes
32,AMBP,Yes
33,RPL37A,Yes
34,RPS24,Yes
35,RPLP0,Yes
36,APOA2,Yes
37,RPL17,Yes
38,RPS27A,Yes
39,FAU,Yes


Read in count data:

In [31]:
count = read.csv(count_filename, sep = "\t")

gene_names = count[,1]

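Run the next cell to check for duplicate gene names. It returns FALSE if any gene name appears more than once. Duplicates must be resolved before continuing, because the gene names are used as row names below and R does not allow duplicate row names in a data frame.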

In [32]:
#Check for duplicate gene_name entries!
length(gene_names) == length(unique(gene_names))
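
If the check returns FALSE, it helps to know which names are repeated. A minimal sketch on made-up names follows; `toy_names`, `dup_names`, and `keep` are hypothetical, and keeping the first occurrence is only one possible way to resolve duplicates, not necessarily the right one for a given dataset.

```r
# Made-up gene names with one repeated entry.
toy_names <- c("GENE_A", "GENE_B", "GENE_A", "GENE_C")

# Same test as in the notebook: FALSE means at least one duplicate exists.
all_unique <- length(toy_names) == length(unique(toy_names))

# List each repeated name once.
dup_names <- unique(toy_names[duplicated(toy_names)])

# One possible resolution (an assumption): keep only the first occurrence of each name.
keep <- !duplicated(toy_names)
toy_names_unique <- toy_names[keep]
```

Applied to the real data, the same logical vector `keep` could be used to subset the rows of `count` before the row names are assigned.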

In [33]:
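# Drop the first two columns (column 1 holds the gene names) and keep the remaining columns
count = count[,3:ncol(count)]
# Use the gene names as row names (fails if any gene name is duplicated)
row.names(count) = gene_names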

inst_names = colnames(count)

# Trim each column name: drop the first 10 characters and the last 3 (substr is vectorized)
name_fix = substr(inst_names, 11, nchar(inst_names) - 3)

colnames(count) = name_fix

select = count[gene_names %in% core$SYMBOL,]
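
The column-name trimming and the gene filter can be checked on a tiny example. This sketch assumes column names of the form `TARGET.30.PAAPFA.01`, that is, a 10-character prefix and a 3-character suffix around the sample ID. That format is a guess based on the trimmed IDs shown in the output, and all `toy_*` names and values are hypothetical.

```r
# Hypothetical raw column names: 10-character prefix, sample ID, 3-character suffix.
toy_cols <- c("TARGET.30.PAAPFA.01", "TARGET.30.PACLJN.01")

# Keep characters 11 through (length - 3), which leaves only the sample ID.
toy_short <- substr(toy_cols, 11, nchar(toy_cols) - 3)

# Made-up expression values, one row per gene and one column per sample.
toy_count <- data.frame(c(1.0, 2.0, 3.0), c(4.0, 5.0, 6.0))
colnames(toy_count) <- toy_short
row.names(toy_count) <- c("GENE_A", "GENE_B", "GENE_C")

# Keep only rows whose gene name is in the selected set; unknown names are ignored.
toy_selected_genes <- c("GENE_A", "GENE_C", "GENE_X")
toy_select <- toy_count[row.names(toy_count) %in% toy_selected_genes, ]
```

Because `%in%` simply returns FALSE for names that are absent, genes from the GSEA list that do not occur in the count file are dropped silently. Comparing `nrow(select)` with `nrow(core)` is a quick way to notice that.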


Run the next cell to see the data frame before it is written to file:

In [34]:
select 

Unnamed: 0,PAAPFA,PACLJN,PACPJG,PACRYY,PACRZM,PACSNL,PACSSR,PACUGP,PACVNB,PACYGY,...,PASCKI,PASCLP,PASCUF,PASDZJ,PASFRV,PASGPY,PASJRT,PASJYB,PASKJX,PASLGS
AHSG,5.096412,5.25545,5.609652,5.091222,6.460192,5.218087,5.328534,5.025275,5.387869,5.068619,...,5.236501,5.201537,5.716132,5.199523,5.08412,4.966972,5.170553,5.085419,5.200529,5.045098
ALB,5.101793,4.531768,4.333602,4.55608,4.344362,4.029106,4.411971,4.176784,4.239106,4.635178,...,4.068029,4.205821,4.234127,4.324647,3.944992,4.063642,4.340299,4.160839,4.252368,3.853929
AMBP,6.222121,6.461928,6.400747,6.122267,6.54979,5.877912,6.548087,6.312594,6.052339,6.607948,...,5.977461,6.197083,6.375751,6.155474,6.126246,6.391919,6.246756,6.213895,6.399987,5.953752
APOA1,6.824961,7.147871,6.871821,6.687666,7.291929,6.580556,7.214306,6.587791,7.059619,7.254345,...,6.520644,6.781994,6.720137,6.914521,6.970325,7.087758,7.063235,6.924598,7.11686,6.723874
APOA2,5.470557,4.954743,4.906363,5.358824,5.456729,5.033963,5.368371,5.3967,5.246746,5.616757,...,5.149314,5.102743,5.167847,5.221074,5.091614,5.351821,5.342564,5.149549,5.235384,5.079143
CFB,5.798694,6.10489,5.664659,6.092027,5.732362,6.259788,5.916759,5.909288,5.928758,5.916618,...,5.87181,5.003521,5.691859,6.570744,5.867128,5.349502,5.449962,5.36992,5.629864,5.421731
FABP1,3.738233,3.865052,3.87038,4.498427,4.128958,4.128548,4.053543,4.125933,3.995767,4.440808,...,4.263008,3.83669,4.093822,4.265953,3.746865,3.936625,4.058847,4.385527,4.15796,3.890042
FAU,8.482239,8.44974,8.513563,8.447882,8.335683,8.31009,7.251564,7.652688,8.03279,8.433779,...,8.207772,8.177934,8.251571,7.89963,8.064148,8.120758,8.210707,8.394911,7.822526,8.338585
FGA,4.409271,4.393173,4.536432,4.198303,4.612787,3.72303,4.569873,4.40243,4.513883,4.286931,...,4.2455,4.073124,4.174636,4.293077,4.06378,4.188782,4.36612,4.281995,4.293495,4.038053
FGB,4.271566,4.310536,4.436717,4.093239,4.332373,3.981626,4.679871,4.303806,4.311187,4.359809,...,3.944006,4.294608,4.225439,4.156103,4.052537,4.22896,4.335856,4.382237,4.302189,4.134434


In [35]:
write.csv(select, file=output_count_reduced_filename)
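
To confirm that a written file can be read back with gene names as row names, a round trip on a small made-up data frame can be sketched as follows. The names `toy`, `tmp`, and `back` and all values are hypothetical; a temporary file is used so nothing in the working directory is touched.

```r
# Made-up reduced count table with gene names as row names.
toy <- data.frame(
  PAAPFA = c(5.1, 4.5),
  PACLJN = c(5.3, 4.3),
  row.names = c("GENE_A", "GENE_B")
)

# Write to a temporary CSV; write.csv stores the row names in the first column.
tmp <- tempfile(fileext = ".csv")
write.csv(toy, file = tmp)

# Read it back, telling read.csv that the first column holds the row names.
back <- read.csv(tmp, row.names = 1)

# Compare the original and the re-read table.
same <- isTRUE(all.equal(toy, back))
unlink(tmp)
```

The same `read.csv(..., row.names = 1)` call should work for the file named in `output_count_reduced_filename`, assuming it was written with `write.csv` as above.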