# Investigation of California Socioeconomic Relations Dataset

This chapter describes how we analysed the data and summarises our findings.

- [Importing required libraries](#library-imports)
- [Training the Model](#model)
- [Testing the model](#test)

## Importing required libraries<a class="anchor" id="library-imports"></a>

In [2]:
# Standard python packages
import os
import sys
from pathlib import Path

# Other package imports
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

In [4]:
data_folder = Path("../data/processed")

In [17]:
income_df = pd.read_csv(data_folder / "X19_INCOME.csv")

print(income_df.columns[3:80])

Index(['HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Total: Households -- (Estimate)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Total: Households -- (Margin of Error)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Less than $10,000: Households -- (Estimate)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Less than $10,000: Households -- (Margin of Error)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): $10,000 to $14,999: Households -- (Estimate)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): $10,000 to $14,999: Households -- (Margin of Error)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): $15,000 to $19,999: Households -- (Estimate)',
       'HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS)
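
The column labels printed above appear to follow a `TABLE TITLE: category: universe -- (measure)` pattern. The sketch below shows one possible way to split such a label into its parts so that columns can be filtered by category or measure. The helper `split_label` and the key names it returns are hypothetical, and the parsing assumes every label contains the ` -- ` separator and that the table title itself has no `: ` inside it.

```python
import pandas as pd


def split_label(label: str) -> dict:
    """Split 'TITLE: category: universe -- (measure)' into its parts.

    Hypothetical helper: assumes the label contains ' -- ' and that the
    first ': '-separated part is the table title.
    """
    # Everything after the last ' -- ' is the measure, e.g. '(Estimate)'.
    body, _, measure = label.rpartition(" -- ")
    parts = body.split(": ")
    return {
        "table": parts[0],
        # Nested categories (if any) are kept joined with ': '.
        "category": ": ".join(parts[1:-1]),
        "universe": parts[-1],
        "measure": measure.strip("()"),
    }


# Two labels copied from the printed column index.
labels = [
    "HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Total: Households -- (Estimate)",
    "HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Less than $10,000: Households -- (Margin of Error)",
]

parsed = pd.DataFrame([split_label(label) for label in labels])
print(parsed[["category", "universe", "measure"]])
```

With the parts in a data frame, selecting only the estimate columns becomes a filter on `measure` rather than a match on the full label text.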

In [14]:
income_total = income_df["HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Total: Households -- (Estimate)"]
income_total.head()

0    1292
1     443
2     370
3     483
4     717
Name: HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS): Total: Households -- (Estimate), 
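
Household totals in this column can be zero, which matters if bracket counts are later turned into shares of the total. The sketch below is one way to guard that division, not necessarily what the analysis does. The short column names `total` and `lt_10k` are hypothetical, and the bracket counts are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        # First two totals match the first rows of the series; the zero
        # stands in for an area with no households.
        "total": [1292, 443, 0],
        # Made-up bracket counts, for illustration only.
        "lt_10k": [100, 50, 0],
    }
)

# Replace zero totals with NaN so the division yields NaN rather than
# inf or an undefined 0/0 result.
share = toy["lt_10k"] / toy["total"].replace(0, np.nan)
print(share)
```

Rows with a `NaN` share can then be dropped or handled explicitly before any modelling step.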

In [21]:
earnings_df = pd.read_csv(data_folder / "X20_EARNINGS.csv")

print(earnings_df.columns[3:])

Index(['SEX BY EARNINGS IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) FOR THE POPULATION 16 YEARS AND OVER WITH EARNINGS IN THE PAST 12 MONTHS: Total: Population 16 years and over with earnings -- (Estimate)',
       'SEX BY EARNINGS IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) FOR THE POPULATION 16 YEARS AND OVER WITH EARNINGS IN THE PAST 12 MONTHS: Total: Population 16 years and over with earnings -- (Margin of Error)',
       'SEX BY EARNINGS IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) FOR THE POPULATION 16 YEARS AND OVER WITH EARNINGS IN THE PAST 12 MONTHS: Male: Population 16 years and over with earnings -- (Estimate)',
       'SEX BY EARNINGS IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) FOR THE POPULATION 16 YEARS AND OVER WITH EARNINGS IN THE PAST 12 MONTHS: Male: Population 16 years and over with earnings -- (Margin of Error)',
       'SEX BY EARNINGS IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) FOR THE POPUL