This dataset is about blood donation prediction. It contains donation records collected during regular blood donation visits to a university. The target variable records whether a donor gave blood in March 2007; the goal is to use it to predict whether a donor will give blood at the next donation campaign held on the university campus.
Basic understanding of features:
1. Unnamed: 0: The donor's unique ID.
2. Months since Last Donation: The number of months since the donor's most recent donation.
3. Number of Donations: The total number of donations the donor has made.
4. Total Volume Donated (c.c.): The total amount of blood the donor has donated, in cubic centimetres.
5. Months since First Donation: The number of months since the donor's first donation.
6. Made Donation in March 2007: A binary variable indicating whether the donor gave blood in March 2007 (1 = donated blood; 0 = did not donate blood).
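The column layout above can be sketched in code as follows. This is only an illustration: it assumes the file uses a pandas-style `Unnamed: 0` header for the ID column, the two data rows are made-up placeholders rather than real records, and `split_features_target` is a hypothetical helper name.

```python
import csv
import io

# Column names follow the feature list above. The two data rows are
# made-up placeholders, NOT records from the real dataset.
SAMPLE_CSV = """Unnamed: 0,Months since Last Donation,Number of Donations,Total Volume Donated (c.c.),Months since First Donation,Made Donation in March 2007
1,2,4,1000,20,1
2,10,1,250,10,0
"""

ID_COLUMN = "Unnamed: 0"  # assumed header spelling for the donor ID
TARGET_COLUMN = "Made Donation in March 2007"


def split_features_target(rows):
    """Drop the ID column and separate the numeric features from the binary target."""
    features, target = [], []
    for row in rows:
        target.append(int(row[TARGET_COLUMN]))
        features.append(
            {
                name: float(value)
                for name, value in row.items()
                if name not in (ID_COLUMN, TARGET_COLUMN)
            }
        )
    return features, target


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
X, y = split_features_target(rows)
print(list(X[0]))  # the four predictor names
print(y)           # the binary target values
```

The donor ID is dropped because it identifies a row rather than describing donation behaviour, which leaves four numeric predictors and one binary target.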
Model accuracy scores:
- 'Logistic Regression': 0.6458333333333334
- 'KNN': 0.7013888888888888
- 'SVC': 0.6597222222222222
- 'Decision Tree Classifier': 0.7013888888888888
- 'RandomForestClassifier': 0.6944444444444444
- 'XGBClassifier': 0.7222222222222222
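To make the comparison easier to read, the scores above can be ranked with a few lines of Python. This is a minimal sketch, not the project's own code. The 144-row test split is an assumption: it is not stated in this write-up, but every score listed is a whole number of correct predictions out of 144, so that size is at least consistent with the reported values.

```python
# Accuracy scores copied from the comparison above.
scores = {
    "Logistic Regression": 0.6458333333333334,
    "KNN": 0.7013888888888888,
    "SVC": 0.6597222222222222,
    "Decision Tree Classifier": 0.7013888888888888,
    "RandomForestClassifier": 0.6944444444444444,
    "XGBClassifier": 0.7222222222222222,
}

# Rank models from highest to lowest accuracy (ties keep their listed order).
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_model, best_score = ranking[0]

# ASSUMPTION: a 144-row test split (not stated in the write-up).
ASSUMED_TEST_ROWS = 144
correct_counts = {
    name: round(acc * ASSUMED_TEST_ROWS) for name, acc in scores.items()
}

for name, acc in ranking:
    print(f"{name:<26} {acc:.3f}  (~{correct_counts[name]}/{ASSUMED_TEST_ROWS} correct)")
print(f"Best: {best_model} ({best_score:.1%})")
```

Under that assumption, the gap between XGBClassifier and the next-best models (KNN and the decision tree) is only three test predictions, so the ranking may not be stable across different train/test splits.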
We compared the performance of several binary classification algorithms. The XGBoost classifier fit the data best, with a reported accuracy of 77% (its score in the comparison above is 0.722). XGBoost also gave balanced results on every measure, so we consider it the best model for our business case. Because the dataset is small, imbalanced, and has very few features, we conclude that this score is about the best achievable with this data.
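Because the dataset is imbalanced, plain accuracy can look reasonable even for a model that never predicts a donation. The sketch below illustrates the difference between accuracy and balanced accuracy (the mean of per-class recall). The labels are illustrative only, with an assumed 75/25 class split, and are not the project's data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# ILLUSTRATIVE labels (assumed 75% non-donors, 25% donors), not real data.
y_true = [0] * 75 + [1] * 25
always_zero = [0] * 100  # a "model" that never predicts a donation

print(accuracy(y_true, always_zero))           # 0.75
print(balanced_accuracy(y_true, always_zero))  # 0.5
```

A baseline like this is a useful reference point: a classifier is only adding value if it beats the majority-class accuracy and scores clearly above 0.5 on balanced accuracy.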