# Income Prediction from Census Data

## Motivation for the Prediction Task

Predicting whether a person has a high income (in this dataset, an income above $50K per year) can provide useful insights for several kinds of organization:

- **Government Aid Programs**: In the UK, the government does not have a centralized income tracking system. Understanding which features predict income could help determine eligibility for welfare programs such as the Winter Fuel Payment.

- **Charities**: Some door-to-door charities use age as a proxy for income when identifying potential donors. This analysis could validate such assumptions and reveal additional predictive features. (See [this discussion](https://www.reddit.com/r/Scotland/comments/1deg1fg/charity_not_allowed_to_speak_to_anyone_under_the/) for context.)

- **Financial Services**: Accountants and financial advisers typically see lower demand from individuals with lower incomes, so income prediction could help them target their services more effectively, e.g. through ads on sites that hold detailed personal information, such as Facebook.
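Framed this way, the task is binary classification. A minimal sketch of turning the `income` label into a 0/1 target follows, assuming "high income" means the `>50K` label. `encode_income` is a hypothetical helper, not part of the `income_predict` package, and the labels are made-up examples rather than rows from the dataset. The whitespace and trailing-period handling is an assumption to guard against inconsistently formatted labels.

```python
import pandas as pd

# Hypothetical example labels, not rows from the census data.
labels = pd.Series(["<=50K", ">50K", " >50K.", "<=50K."])


def encode_income(income: pd.Series) -> pd.Series:
    """Map income labels to 1 for '>50K' (high income) and 0 for '<=50K'."""
    # Assumption: labels may carry stray whitespace or a trailing period.
    cleaned = income.str.strip().str.rstrip(".")
    # Fail loudly rather than silently encoding an unexpected label as 0.
    unexpected = set(cleaned) - {"<=50K", ">50K"}
    if unexpected:
        raise ValueError(f"Unexpected income labels: {sorted(unexpected)}")
    return (cleaned == ">50K").astype(int)


y = encode_income(labels)
print(y.tolist())  # [0, 1, 1, 0]
```

Raising on unknown labels keeps a data-cleaning problem from quietly turning into a mislabelled training example.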

## Exploratory Data Analysis

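```python
import sys
from pathlib import Path

import pandas as pd

# The notebook is assumed to run from a subdirectory of the project root;
# add that parent directory to sys.path so the local package can be imported.
current_dir = Path.cwd()
src_directory = current_dir.parent
sys.path.append(str(src_directory))

from income_predict import preprocessing, cleaning
```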

```python
# Load the raw census data from the project's data directory
parquet_path = src_directory / "data" / "census_income.parquet"

df_raw = pd.read_parquet(parquet_path)
print(df_raw.head())
```


   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   
2   38           Private  215646    HS-grad              9   
3   53           Private  234721       11th              7   
4   28           Private  338409  Bachelors             13   

       marital-status         occupation   relationship   race     sex  \
0       Never-married       Adm-clerical  Not-in-family  White    Male   
1  Married-civ-spouse    Exec-managerial        Husband  White    Male   
2            Divorced  Handlers-cleaners  Not-in-family  White    Male   
3  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male   
4  Married-civ-spouse     Prof-specialty           Wife  Black  Female   

   capital-gain  capital-loss  hours-per-week native-country income  
0          2174             0              40  United-States  <=50K  
1             0             0             