Apply the processing methods to a given dataset. These scripts process both gene expression and methylation data.
Rscript pre_processing.R GSE disease outName
# Arguments, in order: the GSE accession, the disease, and the name of the resulting RData object
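Because the script takes three positional arguments in a fixed order, a small wrapper can catch a missing argument before R starts. The sketch below is only an illustration: `run_pre_processing` is a hypothetical helper name, not part of this repository, and it prints the command instead of executing it so it can be dry-run without R installed.

```shell
# Hypothetical wrapper (not part of the repository): checks that exactly
# three arguments are given, then prints the Rscript command line.
run_pre_processing() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_pre_processing GSE disease outName" >&2
        return 1
    fi
    # Dry run: remove the leading 'echo' to actually launch the script.
    echo Rscript pre_processing.R "$1" "$2" "$3"
}

run_pre_processing GSE23117 SjS GSE23117
```

Removing the `echo` turns the dry run into a real invocation of `pre_processing.R`.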
The output is an RData object. For gene expression data, it contains the processed expression matrix, the differential expression results, the HiPathia results and the KEGG results. For methylation data, it contains the processed beta-value matrix.
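To see which objects a result file holds, the names can be listed from the command line. This is a minimal sketch: `list_rdata` is a hypothetical helper, and the file name `GSE23117.RData` is an assumption about how the output name is turned into a file name. It relies only on base R's `load()`, which returns the names of the objects it restores.

```shell
# Hypothetical helper: print the names of the objects stored in an RData file.
list_rdata() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # load() returns (invisibly) a character vector of restored object names.
    Rscript -e 'args <- commandArgs(trailingOnly = TRUE); print(load(args[1]))' "$1"
}

# Assumed output file name; adjust to whatever the script actually wrote.
list_rdata GSE23117.RData || true
```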
We provide four examples:
- GSE23117 is a dataset generated on an Affymetrix platform
- GSE24706 is a dataset generated on an Illumina gene expression array platform
- GSE57869 is a dataset generated on an Illumina methylation array platform
- GSE110914 is a dataset generated on an Illumina RNA-Seq platform
Each platform requires a different pipeline. The pipelines are implemented as separate functions in the lib.R file.
The GCF_000001405.38_GRCh38.p12_map.csv file is a tab-separated table that maps gene symbols to Entrez identifiers for the KEGG pathway analysis.
The KEGG_genes.tsv file is a tab-separated table listing the full universe of genes collected in KEGG.
The GSEXXXXX.txt files are tab-separated tables containing the phenodata of each dataset.
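Since all of these support files are tab-separated, their layout can be checked quickly before running the pipeline. The helper below is a sketch, not part of the repository: `peek_tsv` is a hypothetical name, and the choice of showing at most the first four columns is arbitrary.

```shell
# Hypothetical helper: show the first N lines (default 5) and the first
# four tab-separated columns of a file.
peek_tsv() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # cut splits on tabs by default, matching the format of these tables.
    head -n "${2:-5}" "$1" | cut -f 1-4
}

peek_tsv KEGG_genes.tsv 3 || true
```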
For example, to process the Affymetrix dataset GSE23117 with SjS as the disease:
Rscript pre_processing.R GSE23117 SjS GSE23117