Predicting Income with Random Forest Classifier

Introduction

This project predicts an individual's income level with a Random Forest Classifier. The dataset includes attributes such as age, workclass, education, marital status, and occupation, which serve as the features for the prediction. The project uses the Python libraries Pandas, Seaborn, Matplotlib, and Scikit-learn for data preprocessing, analysis, and modeling.

Dataset

The dataset used in this project contains the following attributes:

Age: Age of the individual
Workclass: Type of employer or employment (e.g., Private, Self-emp-not-inc, Self-emp-inc, etc.)
Fnlwgt: Final sampling weight, an estimate of how many people in the population the record represents
Education: Highest level of education achieved
Educational-num: Numeric representation of education level
Marital-status: Marital status of the individual
Occupation: Type of occupation
Relationship: Role of the individual within the household (distinct from marital status)
Race: Race of the individual
Gender: Gender of the individual
Capital-gain: Capital gain earned
Capital-loss: Capital loss incurred
Hours-per-week: Number of working hours per week
Native-country: Country of origin
Income: Income level (target attribute)
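
As a rough illustration of how these attributes might be loaded and checked with Pandas, the sketch below reads a CSV and verifies that the expected columns are present. The lowercase, hyphenated column names, the `load_income_data` helper, and the two sample rows are assumptions made for illustration; the actual header of the project's data file may differ.

```python
import io

import pandas as pd

# Column names assumed from the attribute list above (lowercase, hyphenated).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "educational-num",
    "marital-status", "occupation", "relationship", "race", "gender",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]

# Two illustrative rows standing in for the real file so the sketch runs alone.
SAMPLE_CSV = ",".join(COLUMNS) + "\n" + (
    "25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,"
    "Black,Male,0,0,40,United-States,<=50K\n"
    "44,Self-emp-inc,160323,Bachelors,13,Married-civ-spouse,Exec-managerial,"
    "Husband,White,Male,7688,0,50,United-States,>50K\n"
)


def load_income_data(source):
    """Read the CSV (path or file-like object) and keep the expected columns."""
    df = pd.read_csv(source, skipinitialspace=True)
    missing = set(COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df[COLUMNS]


# Replace the StringIO object with a file path to load the real data.
df = load_income_data(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(df["income"].value_counts())
```

Checking the columns up front makes a mismatch between the assumed and the actual header fail early, before any preprocessing runs.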

Preprocessing

Data preprocessing is crucial for building an accurate predictive model. The following preprocessing steps were performed:

Cleaning the data: Handling missing values and removing outliers.
Converting strings to numeric values: Encoding categorical variables as numbers using techniques such as one-hot encoding or label encoding.
Correlation analysis: Examining how the attributes relate to one another and to the target variable.
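
The steps above could be sketched roughly as follows. This is a minimal example under stated assumptions, not the project's actual code: the tiny frame is made up, `"?"` as the missing-value marker and `">50K"` as the positive label are assumptions, and the `preprocess` helper is hypothetical. Outlier removal is not shown.

```python
import numpy as np
import pandas as pd

# Small made-up frame; "?" as the missing-value marker is an assumption.
raw = pd.DataFrame({
    "age": [25, 38, 52, 44, 31, 60],
    "workclass": ["Private", "?", "Self-emp-inc", "Private",
                  "Private", "Self-emp-not-inc"],
    "hours-per-week": [40, 50, 60, 40, 45, 20],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K", ">50K"],
})


def preprocess(df, target="income", positive_label=">50K"):
    """Clean the frame and return (numeric features, binary target)."""
    # Treat "?" as missing and drop incomplete rows (one of several options).
    cleaned = df.replace("?", np.nan).dropna()
    # Label-encode the target: 1 for the positive label, 0 otherwise.
    y = (cleaned[target] == positive_label).astype(int)
    # One-hot encode the remaining string columns; numeric columns pass through.
    X = pd.get_dummies(cleaned.drop(columns=[target]), dtype=int)
    return X, y


X, y = preprocess(raw)

# Pearson correlation of each encoded feature with the target.
correlations = X.corrwith(y).sort_values(ascending=False)
print(correlations)
```

One-hot encoding is used here for the features because it does not impose an artificial order on categories; label encoding is a reasonable alternative for the binary target.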

Model Building

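The Random Forest Classifier was chosen for predicting income levels because it works well on tabular data that mixes numeric and encoded categorical features, and because averaging many decision trees makes it less prone to overfitting than a single tree. Scikit-learn's implementation requires numeric input, which is why the categorical columns are encoded during preprocessing. The following steps were taken for model building: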

Data splitting: Splitting the dataset into training and testing sets.
Model training: Training the Random Forest Classifier model on the training data.
Model evaluation: Measuring the model's accuracy on the held-out testing data.
Fine-tuning: Tuning the hyperparameters of the model to optimize its performance further.
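
A minimal sketch of these four steps with Scikit-learn might look like the following. The synthetic data, the 80/20 split, the `random_state` values, and the small parameter grid are all assumptions chosen for illustration; they are not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded feature matrix and binary income label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Data splitting: hold out 20% for testing, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Model training: fit a forest with default hyperparameters.
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Model evaluation: accuracy on the held-out testing data.
baseline_accuracy = accuracy_score(y_test, model.predict(X_test))

# Fine-tuning: small illustrative grid, 3-fold cross-validation on the
# training set only, so the testing data stays untouched during the search.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
tuned_accuracy = accuracy_score(y_test, search.predict(X_test))

print(f"Baseline accuracy: {baseline_accuracy:.3f}")
print(f"Best parameters:   {search.best_params_}")
print(f"Tuned accuracy:    {tuned_accuracy:.3f}")
```

Running the search on the training set alone keeps the testing accuracy an honest estimate, since no hyperparameter is chosen by looking at the testing data.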

Results

The Random Forest Classifier achieved promising accuracy on the testing data, suggesting that the provided attributes are informative for predicting an individual's income level.

Dependencies

Python 3.x
Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
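
One way to confirm that these libraries are available before running the project is a short check like the one below. The distribution names are assumed to be the usual PyPI names for the libraries listed above, and `installed_versions` is a hypothetical helper, not part of the project.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed to match the list above).
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```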

Usage

To replicate or further explore this project, follow these steps:

Clone this repository to your local machine.
Ensure that all dependencies are installed.
Run the provided notebook (Income_Prediction.ipynb), which contains the code for data preprocessing, model building, and evaluation.
Analyze the results and modify the code as needed for experimentation or improvement.

Contributors

KishoreMuruganantham

License

This project is licensed under the MIT License.
