# **Income Prediction Model**

## **Introduction**

Income inequality, the uneven distribution of income across a population, is a growing problem in developing nations. With the rapid rise of artificial intelligence (AI) and workplace automation, the gap could widen further unless steps are taken to address it.

The objective of this project is to create a machine learning model that predicts whether an individual earns above or below 50,000 USD. Such a model could reduce the cost, and improve the accuracy, of monitoring key population indicators such as income levels between census years.

Accurate income predictions can give policymakers valuable insights for managing and reducing income inequality. By applying machine learning techniques, we aim to provide a robust tool for analyzing income distribution, which is crucial for developing effective socioeconomic policies and interventions.

The project follows the CRISP-DM methodology, which consists of the following phases:
- Business Understanding 
- Data Understanding
- Data Preparation
- Modelling
- Evaluation
- Deployment

## **Business Understanding**

#### **Business Objective**
To address income inequality by providing accurate and timely predictions of individual income levels, aiding policymakers in their efforts to manage and reduce income disparity effectively.

**Business Needs:** A model to accurately predict if an individual earns above 50,000 USD.
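
Framed this way, the task is binary classification. The following is a minimal sketch of turning a numeric income figure into the binary target, assuming a hypothetical `income` column in USD (the actual dataset may already provide a categorical income label instead):

```python
import pandas as pd

# Hypothetical example rows; the column name `income` is an assumption.
df = pd.DataFrame({"income": [32_000, 50_000, 50_001, 87_500]})

# Target: 1 if the individual earns strictly above 50,000 USD, else 0.
INCOME_THRESHOLD = 50_000
df["above_50k"] = (df["income"] > INCOME_THRESHOLD).astype(int)

print(df["above_50k"].tolist())  # [0, 0, 1, 1]
```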

#### **Project Goals**
- To create a machine learning model to predict if an individual earns above 50,000 USD.
- To create an API for user interaction.

#### **Hypotheses**
 
Null Hypothesis (H0): There is no significant relationship between wage per hour and whether an individual earns above 50,000 USD.

Alternative Hypothesis (H1): There is a significant relationship between wage per hour and whether an individual earns above 50,000 USD.
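
One way to test H0 is a two-sample permutation test. If wage per hour is unrelated to the income label, randomly shuffling the labels should produce a difference in group means at least as large as the observed one fairly often. The sketch below uses synthetic data and hypothetical variable names (`wage_per_hour`, `above_50k`). It illustrates the procedure and is not the project's actual test or results.

```python
import numpy as np

def permutation_test_mean_diff(values, labels, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    values: 1-D array of a numeric feature (e.g. wage per hour)
    labels: 1-D array of 0/1 income labels (1 = above 50,000 USD)
    Returns (observed_difference, p_value).
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)

    # Observed difference: mean of high earners minus mean of the rest.
    observed = values[labels == 1].mean() - values[labels == 0].mean()

    # Count shuffles whose absolute difference is at least as extreme.
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(labels)
        diff = values[shuffled == 1].mean() - values[shuffled == 0].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Synthetic illustration only: high earners are given higher hourly wages.
rng = np.random.default_rng(42)
above_50k = np.array([0] * 200 + [1] * 50)
wage_per_hour = np.where(above_50k == 1,
                         rng.normal(30, 5, size=250),
                         rng.normal(15, 5, size=250))

diff, p = permutation_test_mean_diff(wage_per_hour, above_50k, n_permutations=2_000)
print(f"mean difference = {diff:.2f}, p = {p:.4f}")
# Reject H0 at the 5% significance level if p < 0.05.
```

A permutation test makes no normality assumption, which suits skewed variables such as hourly wages. A Mann-Whitney U test would be a reasonable alternative.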

#### **Analytical Questions**
1. What are the key factors influencing whether an individual earns above 50,000 USD?
2. How does the number of hours worked per week impact income level predictions?
3. What role do education level and occupation play in determining income above the specified threshold?
4. Are there significant regional differences in the proportion of individuals earning above 50,000 USD?
5. How does industry type impact income level?
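
Questions 2 to 5 can each be explored by comparing the share of high earners across groups. The following is a minimal pandas sketch. It assumes hypothetical column names (`hours_per_week`, `education`, `above_50k`) and uses a handful of made-up rows, so the numbers carry no meaning beyond the illustration. Regional and industry columns could be grouped in the same way.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "hours_per_week": [20, 38, 40, 45, 50, 60, 35, 55],
    "education": ["High school", "Bachelors", "Bachelors", "Masters",
                  "High school", "Masters", "High school", "Bachelors"],
    "above_50k": [0, 0, 1, 1, 0, 1, 0, 1],
})

# Q2: share of high earners within bins of weekly hours worked.
hour_bins = pd.cut(df["hours_per_week"], bins=[0, 35, 45, 100],
                   labels=["<=35", "36-45", ">45"])
share_by_hours = df.groupby(hour_bins, observed=False)["above_50k"].mean()

# Q3: share of high earners by education level.
share_by_education = df.groupby("education")["above_50k"].mean()

print(share_by_hours)
print(share_by_education)
```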

#### **Timeline**

**Week One:**
- Data Collection & Cleaning
- Exploratory Data Analysis
- Feature Engineering
- Model Building & Evaluation

**Week Two:**
- Model Tuning & Optimization
- Final Model Evaluation
- App Deployment
- Documentation & Reporting

## **Data Understanding**

Load Necessary Packages

```python
# Data handling packages
import pandas as pd
import numpy as np

# Visualization packages
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

# Feature processing packages
from sklearn.preprocessing import (
    StandardScaler,
    MinMaxScaler,
    RobustScaler,
    LabelEncoder,
    OneHotEncoder,
    FunctionTransformer,
)
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif as Mutual_Info_Classif

# Class-imbalance handling (requires the imbalanced-learn package)
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as imbPipeline

# Evaluation metrics
from sklearn.metrics import classification_report, confusion_matrix

# Machine learning packages
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Hyperparameter fine-tuning packages
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score, StratifiedKFold

# Model persistence
import joblib
```
