This repository accompanies the paper "MGRFE: multilayer recursive feature elimination based on embedded genetic algorithm for cancer classification". Detailed information is available in the file MGRFE.pdf.
In this repository, you can find the complete source code, the generated results, and all 19 cancer classification data sets used by MGRFE.
We propose MGRFE, a novel multilayer recursive feature elimination algorithm based on an embedded variable-length-encoding genetic algorithm. It aims to select a minimal set of discriminatory genes that are closely associated with the phenotypes in micro-array gene datasets. The method combines the evolutionary search of an embedded genetic algorithm with the explicit feature reduction of recursive feature elimination into a unit called GaRFE, which serves as the feature selection unit at each layer of MGRFE.
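To make the idea of a single GaRFE-style unit more concrete, here is a toy sketch in pure Python. It is not the implementation in this repository: the function names (`loo_accuracy`, `ga_rfe_sketch`), the nearest-centroid classifier, the leave-one-out fitness, and all parameter values are assumptions chosen only for illustration. It shows variable-length individuals (tuples of feature indices) evolving under selection and crossover, with every child losing one feature as a stand-in for the explicit feature reduction step.

```python
import random


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`."""
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        # Build class centroids from every sample except sample i.
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(feats))
            for k, f in enumerate(feats):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            return sum((X[i][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(feats))

        correct += min(sums, key=dist) == y[i]
    return correct / len(X)


def ga_rfe_sketch(X, y, pop_size=10, generations=10, seed=0):
    """Toy GA over variable-length feature subsets (illustrative assumption)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(ind):
        # Prefer higher accuracy first, then fewer features.
        return (loo_accuracy(X, y, ind), -len(ind))

    # Variable-length encoding: each individual is a tuple of feature indices.
    pop = [tuple(sorted(rng.sample(range(n), rng.randint(2, n))))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            pool = sorted(set(a) | set(b))
            # Crossover: draw the child from the union of both parents.
            child = rng.sample(pool, min(len(a), len(b)))
            # Explicit feature reduction: every child drops one feature.
            if len(child) > 1:
                child.remove(rng.choice(child))
            children.append(tuple(sorted(child)))
        pop = elite + children
        best = max(pop + [best], key=fitness)
    return best


# Synthetic data: feature 0 separates the classes, features 1..7 are noise.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[label * 3.0 + data_rng.gauss(0, 0.5)]
     + [data_rng.gauss(0, 1) for _ in range(7)] for label in y]

selected = ga_rfe_sketch(X, y)
print("selected features:", selected, "LOO accuracy:", loo_accuracy(X, y, selected))
```

The sketch covers one layer only; how MGRFE stacks several such units and which classifier it uses are described in MGRFE.pdf and in the commented scripts.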
A total of 19 widely used benchmark micro-array datasets, including multi-class and imbalanced datasets, are divided into two large groups (Dataset One and Dataset Two). They are used to validate the proposed method and to make a comprehensive comparison with other popular feature selection methods for cancer classification. MGRFE obtained many promising results on these datasets. It reaches 100% Acc with at most 5 genes on 10 (52.6%) of the 19 datasets, and Acc higher than 90% with at most 10 genes on all 19 datasets. MGRFE is also robust on multi-class and imbalanced datasets according to the Sn, Sp, Avc, and MCC metrics. In a classification performance comparison with 20 other methods on the two groups, MGRFE outperforms most currently popular feature selection methods, achieving better classification accuracy with fewer genes.
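For readers unfamiliar with these metrics, the following minimal sketch computes them for a binary problem from true and predicted labels. The function name `binary_metrics` and the example labels are hypothetical, and Avc is assumed here to be the mean of Sn and Sp; please check MGRFE.pdf for the exact definitions used in the paper.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return Acc, Sn, Sp, Avc and MCC for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    acc = (tp + tn) / len(pairs)
    avc = (sn + sp) / 2                       # assumed definition of Avc
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "Avc": avc, "MCC": mcc}


# Made-up labels for an imbalanced toy case: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, Acc alone can look good even when the minority class is mostly misclassified, which is why Sn, Sp, and MCC are reported alongside it.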
Furthermore, a biological function analysis of the predicted bio-markers, based on literature mining, confirmed that the genes selected by MGRFE are biologically relevant to cancer phenotypes.
MGRFE can serve as a complementary feature selection algorithm for high-dimensional bio-data analysis and is relevant to cancer diagnosis and further biomedical research.
- The code in this project is written in Python 3.6. The following Python packages are also required and should be installed.
- Currently, you should edit the paths (e.g., the project path `D:\\codes\\python\\MGRFE\\`) in the scripts based on your settings.
- An example showing how to use the existing code with your own micro-array gene expression data set is available in the Demo folder.
- Note that thorough comments have been added to the scripts to improve readability; they may give you a better understanding of the proposed algorithm and its implementation.
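
Regarding the path note in the list above, one possible way to avoid editing the path in several places is to resolve it once and derive other locations from it. This is only a sketch: the helper `resolve_project_path` and the environment variable `MGRFE_PROJECT_PATH` are hypothetical and are not used by the scripts in this repository.

```python
import os
from pathlib import Path


def resolve_project_path(default="D:\\codes\\python\\MGRFE\\"):
    """Return the project path, letting an environment variable override it.

    MGRFE_PROJECT_PATH is a hypothetical variable name used for illustration.
    """
    return Path(os.environ.get("MGRFE_PROJECT_PATH", default))


project_path = resolve_project_path()
demo_dir = project_path / "Demo"  # the Demo folder mentioned above
print(project_path, demo_dir)
```

With this pattern, a different machine only needs to set the environment variable (or change the single default) instead of editing each script.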
For any bugs, implementation doubts, or anything else, feel free to contact chengpengeace@gmail.com or open an issue.