This project uses a dataset of cancer genes measured in different cancer cell lines and tissues. I clustered similar cancer genes (genes with similar expression profiles across the samples) using hierarchical and k-means clustering.
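The README does not state the distance metric, linkage method, or number of clusters used. The sketch below is a minimal pure-Python illustration of both techniques and rests on several assumptions: Euclidean distance between expression profiles, average linkage for the hierarchical step, and k-means centroids seeded with the first k genes. The gene names and values in the usage block are invented. It is not the author's code; for a dataset of real size, `scipy.cluster.hierarchy` and `sklearn.cluster.KMeans` are the usual library choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; centroids are seeded with the first k genes (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hierarchical(points: List[Vector], k: int) -> List[int]:
    """Agglomerative clustering with average linkage, stopped once k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(euclidean(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the closest pair of clusters (j > i, so deleting j is safe).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels


if __name__ == "__main__":
    # Invented toy profiles: two low-expression and two high-expression genes.
    genes = {
        "GENE_A": [1.0, 1.1, 0.9],
        "GENE_B": [1.2, 1.0, 1.0],
        "GENE_C": [8.0, 7.9, 8.2],
        "GENE_D": [7.8, 8.1, 8.0],
    }
    names = list(genes)
    points = [genes[n] for n in names]
    print(dict(zip(names, kmeans(points, 2))))        # {'GENE_A': 0, 'GENE_B': 0, 'GENE_C': 1, 'GENE_D': 1}
    print(dict(zip(names, hierarchical(points, 2))))  # {'GENE_A': 0, 'GENE_B': 0, 'GENE_C': 1, 'GENE_D': 1}
```

The naive hierarchical loop recomputes all pairwise cluster distances on every merge, which is fine for illustration but slow for thousands of genes; that is one reason the library routines are preferred in practice.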
Data sources:
Platform file (row names, i.e. gene names): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL14924
Dataset: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30034 (download the Series File(s) from the "Download family" section on this page)
The Cancer data.txt file was extracted from GSE30034-GPL14924_series_matrix.txt.
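The README does not describe how the table was pulled out of the series matrix. GEO series matrix files keep study and sample metadata on lines beginning with `!` and place the data table between the `!series_matrix_table_begin` and `!series_matrix_table_end` marker lines. Below is a minimal standard-library sketch of that extraction, assuming a tab-delimited table whose first column holds the row IDs and whose header row holds the sample accessions. The function name `read_series_matrix` and the toy input are hypothetical.

```python
import csv
import math
from typing import Dict, List, Tuple


def _to_float(cell: str) -> float:
    """Parse a numeric cell; blanks or non-numeric markers become NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan


def read_series_matrix(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Return (sample IDs, {row ID: values}) from the table block of a GEO series matrix."""
    lines = text.splitlines()
    start = next(i for i, ln in enumerate(lines) if ln.startswith("!series_matrix_table_begin")) + 1
    end = next(i for i, ln in enumerate(lines) if ln.startswith("!series_matrix_table_end"))
    # The csv module strips the double quotes GEO places around IDs and sample names.
    reader = csv.reader(lines[start:end], delimiter="\t")
    header = next(reader)
    samples = header[1:]  # column names (sample accessions)
    rows: Dict[str, List[float]] = {}
    for record in reader:
        if record:
            rows[record[0]] = [_to_float(v) for v in record[1:]]
    return samples, rows


if __name__ == "__main__":
    # Invented toy input in series-matrix layout.
    toy = (
        '!Series_title\t"example"\n'
        "!series_matrix_table_begin\n"
        '"ID_REF"\t"GSM_A"\t"GSM_B"\n'
        '"P1"\t1.5\t2\n'
        '"P2"\t\t3.25\n'
        "!series_matrix_table_end\n"
    )
    samples, rows = read_series_matrix(toy)
    print(samples)     # ['GSM_A', 'GSM_B']
    print(rows["P1"])  # [1.5, 2.0]
```

For the real file, the text would come from reading the decompressed GSE30034-GPL14924_series_matrix.txt (GEO distributes it gzipped).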
Data type: gene expression profiling by RT-PCR.
The row names and column names were taken from the source files GPL14924-tbl-1.txt and GSE30034-GPL14924_series_matrix.txt, respectively, and merged into Cancer data.txt to create a clean, complete copy of the dataset, which was not available from the source. I have uploaded these name files here, with "row-names" and "colnames", respectively, added in front of the source file names.
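The merge step can be sketched as: build an ID-to-gene-name map from the platform table, swap each data row's ID for its gene name, and write the matrix back out with the sample names as column headers. This is a minimal illustration, not the author's procedure. The layout of GPL14924-tbl-1.txt is assumed (one header row, row ID in the first column, gene name in the second), and the helper names `load_id_to_gene`, `relabel_rows`, and `to_tsv`, the `gene` header label, and the toy values are all hypothetical.

```python
import csv
import io
from typing import Dict, List


def load_id_to_gene(platform_tsv: str, id_col: int = 0, gene_col: int = 1) -> Dict[str, str]:
    """Map platform row IDs to gene names from a tab-delimited table with one header row.

    The column positions are assumptions; check the header of the real platform file.
    """
    reader = csv.reader(io.StringIO(platform_tsv), delimiter="\t")
    next(reader, None)  # skip the header row
    mapping: Dict[str, str] = {}
    for record in reader:
        if len(record) > max(id_col, gene_col):
            mapping[record[id_col]] = record[gene_col]
    return mapping


def relabel_rows(rows: Dict[str, List[float]], id_to_gene: Dict[str, str]) -> Dict[str, List[float]]:
    """Swap row IDs for gene names; unknown IDs are kept, duplicate names get the ID appended."""
    out: Dict[str, List[float]] = {}
    for row_id, values in rows.items():
        name = id_to_gene.get(row_id) or row_id
        if name in out:
            name = f"{name}|{row_id}"  # avoid silently overwriting a gene measured twice
        out[name] = values
    return out


def to_tsv(samples: List[str], rows: Dict[str, List[float]]) -> str:
    """Serialise the merged matrix: a header of column names, then one row per gene."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", *samples])
    for name, values in rows.items():
        writer.writerow([name, *values])
    return buf.getvalue()


if __name__ == "__main__":
    platform = "ID\tGENE_NAME\nP1\tTP53\nP2\tBRCA1\n"  # invented toy platform table
    merged = relabel_rows({"P1": [1.5, 2.0], "P2": [0.5, 3.25]}, load_id_to_gene(platform))
    print(to_tsv(["GSM_A", "GSM_B"], merged), end="")
```

Appending the row ID on a duplicate name is one design choice among several; averaging duplicate probes per gene is another common option.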
I believe applying data analytics techniques in the medical and healthcare field can help save lives, and because of my interest in healthcare analytics, I did this project to cluster cancer genes. This is original work. More research is needed in this area. Connect with me on https://www.linkedin.com/in/vasi-rahman