- The dataset used for this project is an extract of the 1994 US Census database. It includes demographic and employment features such as age, workclass, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, and native-country, plus final-weight (fnlwgt), a census sampling weight derived from a combination of demographic characteristics. Our main goal is to predict whether a person's income exceeds $50K/yr based on the census data.
- For this machine learning project, we used several supervised learning models: Gradient Boosting, SVM, Logistic Regression, Naive Bayes, and Decision Tree.
- Dataset: http://archive.ics.uci.edu/ml/datasets/Census+Income
- Task type: Supervised Learning / Binary Classification
- After pre-processing and the other preparation steps, we computed ROC-AUC scores for each model.
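
As a reminder of what the ROC-AUC score measures, here is a minimal pure-Python sketch. It is not the code used in the notebook; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration. It computes the score as the probability that a randomly chosen positive example (income above $50K) receives a higher model score than a randomly chosen negative one, counting ties as half.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    y_true: iterable of 0/1 labels; scores: model scores for the positive class.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (made-up labels and scores, not results from this project).
labels = [0, 0, 1, 1]
model_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, model_scores))  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which makes the metric convenient for comparing models that output scores on different scales.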
- Finally, we compared the scores obtained under different setups:
- Binary Encoding vs One-Hot Encoding
- Simple & Distribution Based Imputation Comparison
- Column Drop Comparison
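
To illustrate the encoding comparison listed above, the following sketch shows how the two schemes differ in the number of columns they produce for a categorical feature. It is a simplified, pure-Python illustration, not the notebook's implementation: the helper names `one_hot_encode` and `binary_encode` are hypothetical, and the binary scheme here assumes categories are numbered from 1 in sorted order before being written out as bits.

```python
def one_hot_encode(values):
    """One column per distinct category; exactly one 1 per row."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def binary_encode(values):
    """Number the categories 1..k, then write each number as fixed-width bits."""
    categories = sorted(set(values))
    code = {c: i + 1 for i, c in enumerate(categories)}  # assumed: codes start at 1
    width = len(categories).bit_length()  # bits needed to represent the largest code
    return [[(code[v] >> b) & 1 for b in reversed(range(width))] for v in values]


# Sample workclass values in the style of the census data.
workclass = ["Private", "Self-emp", "Private", "State-gov", "Federal-gov", "Local-gov"]

one_hot = one_hot_encode(workclass)
binary = binary_encode(workclass)

print(len(one_hot[0]))  # 5 columns, one per category
print(len(binary[0]))   # 3 columns, enough bits for codes 1..5
```

One-hot encoding grows linearly with the number of categories, while binary encoding grows logarithmically, which matters for high-cardinality columns such as native-country.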
*See the Jupyter notebook for a more detailed explanation of the project and comparisons of the models.
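
The imputation comparison listed above can be sketched as follows. This is a hedged illustration only: it assumes that "simple" imputation means filling gaps with the most frequent value and that "distribution based" imputation means sampling replacements in proportion to the observed frequencies, which may differ from what the notebook does. The function names are hypothetical, and missing entries are represented as `None` here (the raw census file marks them with `?`).

```python
import random
from collections import Counter


def impute_mode(values):
    """Simple imputation: replace every missing entry with the most frequent value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_from_distribution(values, seed=0):
    """Distribution-based imputation: draw each replacement from the observed values,
    so categories are chosen in proportion to how often they appear."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


# Made-up sample column with two missing entries.
workclass = ["Private", None, "Private", "State-gov", None, "Private", "Self-emp"]

print(impute_mode(workclass))
print(impute_from_distribution(workclass))
```

Mode imputation inflates the share of the dominant category, whereas sampling from the observed distribution roughly preserves the original category proportions.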