Built a classifier for cervical cancer screening prediction using the Kaggle dataset (https://www.kaggle.com/datasets/loveall/cervical-cancer-risk-classification).
Performed binary classification separately for each of the 4 target variables (Hinselmann, Schiller, Cytology and Biopsy) using two classifiers: SVM and KNN.
- Dealt with missing values
- Identified and removed the outliers
- Normalized the data
- Used SMOTE to balance the classes
- Identified useful features and eliminated redundant features
- Reduced the dimensionality of the data
- Used PCA to extract the principal components
- Used two classifiers for this task: SVM and KNN
- Tuned the hyperparameters of each classifier and compared the results, e.g., the regularization weight of the soft margin in SVM and the value of k in KNN
- Evaluated each classifier with three metrics: Accuracy, Precision and Recall
- Plotted a confusion matrix for each target variable
- Visualized the distribution of the normalized data using box plots
- Identified correlated features using a correlation heatmap
- Plotted the confusion matrix
- Used seaborn and matplotlib libraries for visualization.
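
A minimal sketch of the KNN tuning and evaluation steps listed above, written in plain Python on made-up toy data. It is not the project's code, which presumably relied on library implementations. The function names (`knn_predict`, `confusion_counts`, `evaluate`), the toy points and the candidate values of k are all illustrative assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy, already-normalized 2-D points (hypothetical, not from the dataset).
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.9, 1.0), (1.0, 0.8), (0.8, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.15, 0.1), (0.95, 0.9), (0.3, 0.3), (0.7, 0.8)]
test_y = [0, 1, 0, 1]

# Try several odd values of k (odd avoids ties in a binary vote).
results = {}
for k in (1, 3, 5):
    preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
    results[k] = evaluate(test_y, preds)
    print(f"k={k}: {results[k]}")
```

In the project this loop would run once per target variable (Hinselmann, Schiller, Cytology, Biopsy). Precision and recall are reported alongside accuracy because accuracy alone can look high on imbalanced screening data.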