# Investigation of California Socioeconomic Relations Dataset

This notebook contains the chapter on how we initially parsed and manipulated the dataset.

- [Requirements](#library-imports)
- [Introduction](#intro)
- [Data processing](#data-processing)

## Importing required libraries<a class="anchor" id="library-imports"></a>

In [8]:
# Standard python packages
import os
import sys

# Other package imports
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

## Introduction<a class="anchor" id="intro"></a>

We began by examining the BG_METADATA_2016 csv file alongside the other csv files, to decide which areas we wanted to look for correlations between.

In [9]:
## CODE GOES HERE

## Obtain and process data<a class="anchor" id="data-processing"></a>

We used pandas and Python dictionaries to map the coded column names in each csv (`Short_Name` in the metadata) to their descriptive names (`Full_Name`), giving more readable tables in which we could identify the different pieces of data.

In [10]:
metadata = pd.read_csv("../data/raw/california/train/BG_METADATA_2016.csv")

In [11]:
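def replace_columns(df):
    """Rename the coded columns of df to their full names, using the global metadata table."""
    # Map each Short_Name code to its Full_Name description
    labels = pd.Series(metadata["Full_Name"].values, index=metadata["Short_Name"]).to_dict()
    # Columns without a metadata entry keep their original name
    return df.rename(columns=labels)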
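
To make the renaming step concrete, here is a minimal sketch on a toy table. The codes `A001e1` and `A001m1`, their labels, and the helper name `rename_with_metadata` are made up for illustration and are not taken from the real metadata file; only the `Short_Name` and `Full_Name` column names come from the code above.

```python
import pandas as pd

# Hypothetical stand-in for BG_METADATA_2016.csv; these codes and labels are made up.
toy_metadata = pd.DataFrame({
    "Short_Name": ["A001e1", "A001m1"],
    "Full_Name": ["TOTAL: (Estimate)", "TOTAL: (Margin of Error)"],
})


def rename_with_metadata(df, metadata):
    """Rename coded columns to full names; columns not in the metadata are unchanged."""
    labels = dict(zip(metadata["Short_Name"], metadata["Full_Name"]))
    return df.rename(columns=labels)


toy_table = pd.DataFrame({"GEOID": ["x1"], "A001e1": [10], "A001m1": [2]})
renamed = rename_with_metadata(toy_table, toy_metadata)
print(list(renamed.columns))
# ['GEOID', 'TOTAL: (Estimate)', 'TOTAL: (Margin of Error)']
```

Passing the metadata as an argument, rather than reading a global, is a design choice of this sketch; it makes the function easier to test on small inputs.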

In [12]:
dfs = {}

path = "../data/raw/california/train/"
# Load every file in the folder, keyed by file name, with readable column names
for f in os.listdir(path):
    print(f)
    df = pd.read_csv(os.path.join(path, f))
    df = replace_columns(df)
    dfs[f] = df

X02_RACE.csv
X99_IMPUTATION.csv
X00_COUNTS.csv
X20_EARNINGS.csv
X01_AGE_AND_SEX.csv
X03_HISPANIC_OR_LATINO_ORIGIN.csv
X21_VETERAN_STATUS.csv
X17_POVERTY.csv
X12_MARITAL_STATUS_AND_HISTORY.csv
X16_LANGUAGE_SPOKEN_AT_HOME.csv
X22_FOOD_STAMPS.csv
X08_COMMUTING.csv
X09_CHILDREN_HOUSEHOLD_RELATIONSHIP.csv
X27_HEALTH_INSURANCE.csv
X11_HOUSEHOLD_FAMILY_SUBFAMILIES.csv
BG_METADATA_2016.csv
X19_INCOME.csv
X23_EMPLOYMENT_STATUS.csv
X14_SCHOOL_ENROLLMENT.csv
X15_EDUCATIONAL_ATTAINMENT.csv
X07_MIGRATION.csv
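
The listing above includes `BG_METADATA_2016.csv` itself, and `os.listdir` returns names in arbitrary order. If that is not wanted, one possible variant is sketched below. It assumes the metadata file should be skipped and that only `.csv` files should be read; the function name `load_tables` and the throwaway folder contents are hypothetical, and the renaming step is left out to keep the example short.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_tables(folder, skip=("BG_METADATA_2016.csv",)):
    """Read every .csv in folder into a dict keyed by file name, skipping names in skip."""
    tables = {}
    # sorted() gives a reproducible order regardless of the file system
    for csv_path in sorted(Path(folder).glob("*.csv")):
        if csv_path.name in skip:
            continue
        tables[csv_path.name] = pd.read_csv(csv_path)
    return tables


# Demonstration on a temporary folder with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"GEOID": ["g1"], "v": [1]}).to_csv(
        Path(tmp, "X00_COUNTS.csv"), index=False)
    pd.DataFrame({"Short_Name": ["v"], "Full_Name": ["Value"]}).to_csv(
        Path(tmp, "BG_METADATA_2016.csv"), index=False)
    Path(tmp, "notes.txt").write_text("not a table")
    tables = load_tables(tmp)

print(sorted(tables))
# ['X00_COUNTS.csv']
```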


In [14]:
dfs['X27_HEALTH_INSURANCE.csv']

Unnamed: 0.1,Unnamed: 0,GEOID,TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Total: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Total: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Under 18 years: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Under 18 years: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Under 18 years: With one type of health insurance coverage: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Under 18 years: With one type of health insurance coverage: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Under 18 years: With one type of health insurance coverage: With employer-based health insurance only: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: Under 18 years: With one type of health insurance coverage: With employer-based health insurance only: Civilian noninstitutionalized population -- (Margin of Error),...,TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: With Medicare and Medicaid/means-tested public coverage: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: Other private only combinations: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: Other private only combinations: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: Other public only combinations: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: Other public only combinations: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: Other coverage combinations: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: With two or more types of health insurance coverage: Other coverage combinations: Civilian noninstitutionalized population -- (Margin of Error),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: No health insurance coverage: Civilian noninstitutionalized population -- (Estimate),TYPES OF HEALTH INSURANCE COVERAGE BY AGE: 65 years and over: No health insurance coverage: Civilian noninstitutionalized population -- (Margin of Error),OBJECTID
0,0,15000US060014001001,3018,195,387,90,380,90,256,67,...,18,0,12,9,14,257,87,0,12,3
1,1,15000US060014002001,1105,103,220,47,220,47,203,47,...,9,0,12,0,12,39,25,0,12,4
2,2,15000US060014002002,855,101,115,42,115,42,94,40,...,19,0,12,0,12,38,28,0,12,5
3,3,15000US060014003001,1466,533,477,220,463,217,463,217,...,12,0,12,0,12,0,12,0,12,6
4,4,15000US060014003002,1265,248,105,73,102,73,84,68,...,12,0,12,0,12,0,12,0,12,7
5,5,15000US060014003003,985,259,110,86,110,86,73,55,...,100,0,12,0,12,13,20,0,12,8
6,6,15000US060014003004,1520,321,230,138,230,138,169,114,...,70,0,12,0,12,62,88,0,12,9
7,7,15000US060014004001,1387,209,184,80,184,80,163,77,...,12,0,12,0,12,0,12,0,12,10
8,8,15000US060014004002,1187,236,120,75,120,75,109,72,...,17,0,12,6,10,18,21,0,12,11
9,9,15000US060014004003,1591,270,345,104,337,103,290,104,...,25,0,12,11,26,13,22,0,12,12
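
The output above shows two `Unnamed` columns, which most likely are index columns saved by an earlier export, and every measure appears twice, once as `(Estimate)` and once as `(Margin of Error)`. A possible clean-up is sketched below on a made-up miniature table. It assumes the `Unnamed` columns carry no information and that only the estimates are needed; neither assumption is confirmed by the analysis so far.

```python
import pandas as pd

# Made-up miniature of the table above; the column names only imitate its pattern.
toy = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "GEOID": ["g1", "g2"],
    "Total -- (Estimate)": [3018, 1105],
    "Total -- (Margin of Error)": [195, 103],
})

# Drop the leftover index columns (assumed to be redundant).
cleaned = toy.loc[:, ~toy.columns.str.startswith("Unnamed")]

# Keep the GEOID key plus the estimate columns only.
keep = [c for c in cleaned.columns if c == "GEOID" or c.endswith("(Estimate)")]
estimates = cleaned[keep]

print(list(estimates.columns))
# ['GEOID', 'Total -- (Estimate)']
```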
