Skip to content

This project utilizes a Random Forest Classifier to predict individuals' income levels based on various attributes such as age, education, and occupation. Through data preprocessing and model training, it achieves accurate predictions, demonstrating the power of machine learning.

License

Notifications You must be signed in to change notification settings

KishoreMuruganantham/Income-Prediction-Model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Predicting Income with Random Forest Classifier

Introduction

This project focuses on predicting individuals' income using machine learning techniques, particularly employing a Random Forest Classifier. The dataset comprises various attributes such as age, workclass, education, marital status, occupation, etc., which are utilized to predict the income level of individuals. The project utilizes Python libraries such as Pandas, Seaborn, Matplotlib, and Scikit-learn for data preprocessing, analysis, and modeling.

Dataset

The dataset used in this project contains the following attributes:

  • Age: Age of the individual
  • Workclass: Type of workclass (e.g., Private, Self-emp-not-inc, Self-emp-inc, etc.)
  • Fnlwgt: Final weight
  • Education: Highest level of education achieved
  • Educational-num: Numeric representation of education level
  • Marital-status: Marital status of the individual
  • Occupation: Type of occupation
  • Relationship: Relationship status
  • Race: Race of the individual
  • Gender: Gender of the individual
  • Capital-gain: Capital gain earned
  • Capital-loss: Capital loss incurred
  • Hours-per-week: Number of working hours per week
  • Native-country: Country of origin
  • Income: Income level (target attribute)

Preprocessing

Data preprocessing is crucial for building an accurate predictive model. The following preprocessing steps were performed:

  1. Cleaning the data: Handling missing values, removing outliers.
  2. Converting strings to numericals: Encoding categorical variables into numerical representations using techniques like one-hot encoding or label encoding.
  3. Correlation analysis: Understanding the relationship between different attributes and the target variable.

Model Building

The Random Forest Classifier algorithm was chosen for predicting income levels due to its ability to handle categorical variables effectively and its robustness against overfitting. The following steps were taken for model building:

  1. Data splitting: Splitting the dataset into training and testing sets.
  2. Model training: Training the Random Forest Classifier model on the training data.
  3. Model evaluation: Evaluating the model's performance on the testing data using accuracy scores.
  4. Fine-tuning: Tuning the hyperparameters of the model to optimize its performance further.

Results

The Random Forest Classifier model achieved promising accuracy scores, indicating its effectiveness in predicting individuals' income levels based on the provided attributes.

Dependencies

  • Python 3.x
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn

Usage

To replicate or further explore this project, follow these steps:

  1. Clone this repository to your local machine.
  2. Ensure that all dependencies are installed.
  3. Run the provided Python script, which contains the code for data preprocessing, model building, and evaluation.
  4. Analyze the results and modify the code as needed for experimentation or improvement.

Contributors

License

This project is licensed under the MIT License.

About

This project utilizes a Random Forest Classifier to predict individuals' income levels based on various attributes such as age, education, and occupation. Through data preprocessing and model training, it achieves accurate predictions, demonstrating the power of machine learning.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published