This project investigates the effectiveness of several machine learning models for classifying DNA sequences. The goal is to assign each sequence in the dataset to one of seven gene classes. The models used are Random Forest, Support Vector Machine, and Logistic Regression. The data was processed with the k-mer counting method using K values of 3, 5, and 7. The final results show that a maximum F1 score of 0.963 can be achieved on this dataset with the Logistic Regression model. Furthermore, the experiments suggest that the Random Forest model can be used with various K values, while the other two models work well only with higher K values.
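As an illustration of the k-mer counting step, the following is a minimal sketch and not the repository's actual preprocessing code. The function names `kmer_counts` and `kmer_vector`, the A/C/G/T alphabet, and the example sequence are assumptions made for this example. It counts overlapping k-mers in a sequence and maps them onto a fixed-length feature vector of the kind a classifier could consume.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count the overlapping substrings of length k in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Return k-mer counts as a fixed-length list, one entry per possible k-mer.

    K-mers containing characters outside the alphabet (for example N)
    are not part of the vocabulary and are therefore ignored.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[kmer] for kmer in vocabulary]


# Hypothetical example sequence, not taken from the dataset.
example = "ATGCATGC"
counts = kmer_counts(example, 3)
vector = kmer_vector(example, 3)
```

With K = 3 the vector has 4^3 = 64 entries, and the length grows to 4^5 = 1024 for K = 5 and 4^7 = 16384 for K = 7, so higher K values give the models a much larger and sparser feature space.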
Moeinh77/Gene-Classification-Python
Classification of genes from DNA sequence data using Python and SKLearn