Create a classification model to predict whether a person makes over $50k a year.
This data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics).
The dataset contains the following columns:
- age
- workclass
- fnlwgt
- Education
- education_num
- marital_status
- occupation
- relationship
- race
- sex
- capital_gain
- capital_loss
- native_country
- income
https://drive.google.com/file/d/1iT33AiIyE2_vg8eMCtIDPJPJfKkjvdQh/view?usp=sharing
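
As a starting point, the file could be loaded along the following lines. This is only a minimal sketch: it assumes the download is a comma-separated file with no header row, that the columns appear in the order listed above (names lowercased here), and that missing entries are written as `?`. The helper name `load_census` is hypothetical, and the two sample rows are invented so the snippet runs without the real file.

```python
import io

import pandas as pd

# Column names follow the list above (lowercased); the order is an assumption.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "native_country", "income",
]


def load_census(source):
    """Read the census data from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(
        source,
        header=None,            # assumption: the file has no header row
        names=COLUMNS,
        na_values="?",          # assumption: "?" marks a missing value
        skipinitialspace=True,  # tolerate a space after each comma
    )


# Two invented rows (not real records) standing in for the downloaded file.
sample = io.StringIO(
    "41, Private, 100000, Bachelors, 13, Never-married, Sales, "
    "Not-in-family, White, Female, 0, 0, United-States, <=50K\n"
    "52, ?, 120000, HS-grad, 9, Married-civ-spouse, Exec-managerial, "
    "Husband, White, Male, 0, 0, United-States, >50K\n"
)
df = load_census(sample)

print(df.shape)         # rows, columns
print(df.isna().sum())  # missing values per column
```

With the real file, the call would be something like `load_census("adult.csv")`, where the file name is a placeholder for wherever the download was saved. If the file does include a header row, `header=None` and `names=COLUMNS` should be dropped.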
- Data Collection: Gather a dataset that includes features such as age, education, occupation, marital status, and capital gain/loss, as well as a label indicating whether the person makes over $50k a year.
- Data Preprocessing: Preprocess the dataset by removing missing values, converting categorical variables to numerical values, and scaling or normalizing the features as needed.
- Model Selection: Choose a classification algorithm for the model, such as logistic regression, a decision tree, a random forest, k-nearest neighbors (KNN), or a support vector machine (SVM).
- Model Training: Split the preprocessed dataset into training and testing sets, and use the training set to train the chosen classification model.
- Model Evaluation: Use the trained model to make predictions on the testing set, and evaluate its performance using metrics such as accuracy, precision, recall, and F1 score.
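
The steps above can be sketched end to end with pandas and scikit-learn. The snippet below is one possible arrangement, not a prescribed solution. It uses logistic regression only because it is one of the listed options, a synthetic stand-in table with a handful of the listed columns (the values and the rule that generates `income` are invented), and an 80/20 split chosen arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the census data (NOT real records), so the sketch
# runs without the downloaded file. Replace with the loaded DataFrame.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "education_num": rng.integers(1, 17, n),
    "capital_gain": rng.integers(0, 5000, n),
    "occupation": rng.choice(["Sales", "Tech-support", "Craft-repair"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
# Invented labelling rule, used only to give the model something to learn.
score = df["education_num"] + df["age"] / 10 + rng.normal(0, 2, n)
df["income"] = np.where(score > score.median(), ">50K", "<=50K")
# Blank out a few entries to imitate missing values.
df.loc[rng.choice(n, 20, replace=False), "occupation"] = np.nan

# Data preprocessing: remove rows with missing values.
df = df.dropna()
X = df.drop(columns="income")
y = (df["income"] == ">50K").astype(int)  # 1 = makes over $50k

numeric = ["age", "education_num", "capital_gain"]
categorical = ["occupation", "sex"]
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),                            # scale numeric features
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # categories to numbers
])

# Model selection: logistic regression, one of the listed candidates.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Model training: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)

# Model evaluation on the held-out test set.
y_pred = model.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Keeping the scaler and encoder inside the `Pipeline` means they are fitted on the training rows only, so no information from the test set leaks into preprocessing. Swapping in another classifier from the list (for example `RandomForestClassifier` or `KNeighborsClassifier`) only requires changing the `"clf"` step.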