Gene Expression Data Pre-processing

Because the raw gene expression data contain missing and repeated gene Entrez IDs, the dataset was pre-processed. Missing and repeated data can lead to poor survival analysis and incorrect interpretation of predictors such as the diagnosis stage [1]. Following [2,3], entries with missing gene Entrez IDs were removed, and the gene expression values of repeated gene Entrez IDs were averaged across all of the samples. Table 1 presents the details of the gene expression data used in this research.
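
The two steps described above could be sketched with pandas as follows. This is only an illustration, not the code used in this research. It assumes a table with one row per probe or gene, and it reads "averaged across all of the samples" as taking the mean of the duplicated rows within each sample column. The function name `preprocess_expression` and the column names `entrez_id`, `sample_1`, and `sample_2` are hypothetical.

```python
import pandas as pd


def preprocess_expression(df: pd.DataFrame, id_col: str = "entrez_id") -> pd.DataFrame:
    """Drop rows without an Entrez ID and average rows that share one.

    Assumes every column other than `id_col` holds numeric expression
    values for a single sample (hypothetical layout).
    """
    # Step 1: remove rows whose Entrez ID is missing.
    cleaned = df.dropna(subset=[id_col])
    # Step 2: for each repeated Entrez ID, take the mean of its rows,
    # computed separately for every sample column.
    return cleaned.groupby(id_col, sort=True).mean()


# Toy data: Entrez IDs are kept as strings so a missing value does not
# force the column to a floating-point type.
example = pd.DataFrame(
    {
        "entrez_id": ["7157", "7157", None, "672"],
        "sample_1": [2.0, 4.0, 9.0, 1.0],
        "sample_2": [6.0, 8.0, 9.0, 3.0],
    }
)

result = preprocess_expression(example)
print(result)
```

In this toy input, the row with no Entrez ID is dropped and the two rows for ID `7157` collapse into one row holding their per-sample means, so the result is indexed by unique Entrez ID.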

References

1. Nur, U.; Shack, L.G.; Rachet, B.; Carpenter, J.R.; Coleman, M.P. Modelling relative survival in the presence of incomplete data: A tutorial. Int. J. Epidemiol. 2009, 39, 118–128.
2. Liu, W.; Wang, W.; Tian, G.; Xie, W.; Lei, L.; Liu, J.; Huang, W.; Xu, L.; Li, E. Topologically inferring pathway activity for precise survival outcome prediction: Breast cancer as a case. Mol. Biosyst. 2017, 13, 537–548.
3. Mohammed, A.; Biegert, G.; Adamec, J.; Helikar, T. Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers. Oncotarget 2017, 8, 85692–85715, doi:10.18632/oncotarget.21127.

NiesHW/gene_expression_preprocess
