DEV270201/Employee-Salary-Classifier

Employee Salary Classifier

This project trains classification models that estimate the likelihood that an individual earns an annual income of $80,000 or more. The models draw on work experience, industry, job title, state, and several other features to make the prediction.

Dataset

The dataset contains 17 columns and around 28,000 rows. Its features include work experience, industry, job title, state, and education degree, among others.

Data Preprocessing

  1. Used FuzzyWuzzy to match the different spellings of "USA", since the data was entered manually by different users.
  2. Grouped education degrees into the 4 most common degree categories. Gender and Race were grouped in a similar way.
  3. Kept the top 10 industries and replaced all other industries with "Other".
  4. Kept the top 500 job titles and replaced all other job titles with "Other".
  5. Scaled the Bonus column using RobustScaler.
  6. Used frequency encoding for State, Industry, Job title, and Race because of their high cardinality.
  7. Used label encoding for the target variable.
  8. Used ordinal encoding for Education degree, Work experience, and Age.
  9. Used one-hot encoding for Gender.
  10. Removed outliers from the data.
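
As a rough illustration of steps 1, 3, and 6, here is a minimal standard-library sketch. It is not the project's actual code: `difflib.SequenceMatcher` stands in for FuzzyWuzzy, and the function names, the alias list, the 0.8 threshold, and the use of relative frequency (rather than raw counts) are all assumptions made for the example.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical alias list; the real project matched variants with FuzzyWuzzy.
USA_ALIASES = ("usa", "us", "united states", "united states of america")


def normalize_country(raw, threshold=0.8):
    """Return "USA" if the text is close to any known alias, else the trimmed input."""
    cleaned = raw.strip().lower().replace(".", "")
    best = max(SequenceMatcher(None, cleaned, alias).ratio() for alias in USA_ALIASES)
    return "USA" if best >= threshold else raw.strip()


def keep_top_n(values, n, other="Other"):
    """Keep the n most frequent categories and replace the rest with `other`."""
    top = {value for value, _ in Counter(values).most_common(n)}
    return [value if value in top else other for value in values]


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[value] / total for value in values]


# Toy data, invented for the example.
industries = ["Tech", "Tech", "Tech", "Health", "Health", "Law"]
grouped = keep_top_n(industries, n=2)
encoded = frequency_encode(grouped)
```

In the project itself, the same grouping would use `n=10` for industries and `n=500` for job titles, as described in the list above.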

Model Training

On this dataset, Random Forest achieved the highest accuracy at 81%. Decision Tree followed at 78%, while Naive Bayes, Logistic Regression, and KNN each scored around 76%.
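
A comparison like this could be set up with scikit-learn roughly as follows. This is a sketch under stated assumptions, not the project's training script: it runs on synthetic data from `make_classification` rather than the salary dataset, and the Gaussian Naive Bayes variant, the hyperparameters, the 80/20 split, and the `compare_models` helper are all chosen for illustration. Its accuracies will not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features and binary target
# (assumption: NOT the real salary data).
X, y = make_classification(
    n_samples=1000, n_features=16, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The five model families named in this section, with illustrative settings.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(models, X_train, X_test, y_train, y_test)
for name, accuracy in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {accuracy:.2%}")
```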

Developed with ❤️ by Devansh Shah, Smit Vora & Harsh Shah
