# American Community Survey - California - 2017 Analysis

In this notebook, I practice running statistical tests and visualizing data in Python.

The data come from the 2017 American Community Survey (ACS) 1-year estimates, restricted to California. The table I am using is S1703, "Selected Characteristics of People at Specified Levels of Poverty in the Past 12 Months." It shows how poverty levels vary across California counties and among different populations.

In [8]:
# Libraries
import pandas as pd

## Data Preparation

Let's import and clean up the data.

In [10]:
# Import data file
data = pd.read_csv('ACS_17_1YR_S1703_with_ann.csv')
metadata = pd.read_csv('ACS_17_1YR_S1703_metadata.csv')

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Columns: 283 entries, GEO.id to HC04_MOE_VC51
dtypes: object(283)
memory usage: 35.5+ KB
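
Every column is reported as `object`, most likely because the first data row of the file holds text descriptions rather than values. Below is a minimal sketch of one way to handle that, using a small made-up frame rather than the real file: drop the descriptor row, then convert the estimate columns with `pd.to_numeric`. The frame `toy`, the shortened descriptions, and the choice of `errors='coerce'` (which turns any non-numeric entry into `NaN`) are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the ACS file: row 0 is a text descriptor row, so every
# column holds strings. Descriptions are shortened placeholders.
toy = pd.DataFrame({
    'GEO.id2': ['Id2', '06001', '06013'],
    'HC01_EST_VC01': ['Total; Estimate', '1636780', '1138850'],
    'HC02_EST_VC01': ['Less than 50 percent; Estimate', '4.4', '4.2'],
})

# Drop the descriptor row and renumber the remaining rows from 0.
clean = toy.iloc[1:].reset_index(drop=True)

# Convert only the estimate columns; the county ID stays a string so its
# leading zero is preserved.
est_cols = ['HC01_EST_VC01', 'HC02_EST_VC01']
clean[est_cols] = clean[est_cols].apply(pd.to_numeric, errors='coerce')

print(clean.dtypes)
```

Leaving `GEO.id2` as text is deliberate here: converting it to an integer would strip the leading zero from the county code.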


In [24]:
data.head(5)

Unnamed: 0,GEO.id,GEO.id2,GEO.display-label,HC01_EST_VC01,HC01_MOE_VC01,HC02_EST_VC01,HC02_MOE_VC01,HC03_EST_VC01,HC03_MOE_VC01,HC04_EST_VC01,...,HC04_EST_VC50,HC04_MOE_VC50,HC01_EST_VC51,HC01_MOE_VC51,HC02_EST_VC51,HC02_MOE_VC51,HC03_EST_VC51,HC03_MOE_VC51,HC04_EST_VC51,HC04_MOE_VC51
0,Id,Id2,Geography,Total; Estimate; Population for whom poverty s...,Total; Margin of Error; Population for whom po...,Less than 50 percent of the poverty level; Est...,Less than 50 percent of the poverty level; Mar...,Less than 100 percent of the poverty level; Es...,Less than 100 percent of the poverty level; Ma...,Less than 125 percent of the poverty level; Es...,...,Less than 125 percent of the poverty level; Es...,Less than 125 percent of the poverty level; Ma...,Total; Estimate; WORK STATUS - Population 16 t...,Total; Margin of Error; WORK STATUS - Populati...,Less than 50 percent of the poverty level; Est...,Less than 50 percent of the poverty level; Mar...,Less than 100 percent of the poverty level; Es...,Less than 100 percent of the poverty level; Ma...,Less than 125 percent of the poverty level; Es...,Less than 125 percent of the poverty level; Ma...
1,0500000US06001,06001,"Alameda County, California",1636780,2648,4.4,0.4,9.2,0.5,12.1,...,17.1,1.5,231484,7735,14.8,1.5,24.6,1.5,29.2,1.6
2,0500000US06013,06013,"Contra Costa County, California",1138850,1328,4.2,0.6,9.3,0.9,12.4,...,14.1,1.7,168044,7131,13.6,1.9,22.6,2.8,28.2,3.0
3,0500000US06019,06019,"Fresno County, California",972580,1778,9.0,0.9,21.1,1.3,27.9,...,32.9,2.5,177421,6349,19.1,2.3,36.6,2.4,43.9,2.5
4,0500000US06029,06029,"Kern County, California",860951,2581,9.8,1.0,21.4,1.6,28.9,...,29.6,3.0,179140,7436,19.4,2.3,35.5,3.0,44.9,3.2


In [25]:
metadata

Unnamed: 0,GEO.id,Id
0,GEO.id2,Id2
1,GEO.display-label,Geography
2,HC01_EST_VC01,Total; Estimate; Population for whom poverty s...
3,HC01_MOE_VC01,Total; Margin of Error; Population for whom po...
4,HC02_EST_VC01,Less than 50 percent of the poverty level; Est...
5,HC02_MOE_VC01,Less than 50 percent of the poverty level; Mar...
6,HC03_EST_VC01,Less than 100 percent of the poverty level; Es...
7,HC03_MOE_VC01,Less than 100 percent of the poverty level; Ma...
8,HC04_EST_VC01,Less than 125 percent of the poverty level; Es...
9,HC04_MOE_VC01,Less than 125 percent of the poverty level; Ma...


This file is very wide (283 columns), its column names are opaque codes, and its first data row holds column descriptions rather than values. Let's separate it into smaller DataFrames with descriptive column names to make our analysis easier.
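
The metadata table pairs each column code with its description, so it can act as a lookup when deciding what a column means. The sketch below builds such a lookup from a small inline CSV rather than the real metadata file. It assumes the file has no header line of its own (the `metadata` output above shows the `GEO.id`, `Id` pair being used as column names), so it passes `header=None`; the names `meta`, `labels`, `code`, and `description` are hypothetical.

```python
import io
import pandas as pd

# Toy metadata in a two-column code,description layout.
# Descriptions are shortened placeholders, not the full ACS labels.
metadata_csv = io.StringIO(
    "GEO.id,Id\n"
    "GEO.id2,Id2\n"
    "HC01_EST_VC01,Total; Estimate\n"
    "HC01_MOE_VC01,Total; Margin of Error\n"
)

# header=None keeps the first line as data instead of using it as column names.
meta = pd.read_csv(metadata_csv, header=None, names=['code', 'description'])

# Build a code -> description dictionary for quick lookups.
labels = dict(zip(meta['code'], meta['description']))

print(labels['HC01_MOE_VC01'])
```

With the lookup in hand, a column code can be checked against its description before it is given a shorter name.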

### Poverty by County

In [21]:
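# Select the ID, name, and overall poverty columns; .copy() avoids modifying a view of `data`
data_county = data[['GEO.id2', 'GEO.display-label', 'HC01_EST_VC01', 'HC01_MOE_VC01', 'HC02_EST_VC01', 'HC02_MOE_VC01', 'HC03_EST_VC01', 'HC03_MOE_VC01', 'HC04_EST_VC01', 'HC04_MOE_VC01']].copy()
# Replace the ACS column codes with descriptive names
data_county.columns = ['county_id', 'county', 'total_population', 'total_population_error', 'county_50_poverty', 'county_50_poverty_error', 'county_100_poverty', 'county_100_poverty_error', 'county_125_poverty', 'county_125_poverty_error']
# Display the table without the descriptor row (the result is not assigned back to data_county)
data_county[data_county.county_id != 'Id2'].reset_index()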

Unnamed: 0,index,county_id,county,total_population,total_population_error,county_50_poverty,county_50_poverty_error,county_100_poverty,county_100_poverty_error,county_125_poverty,county_125_poverty_error
0,1,6001,"Alameda County, California",1636780,2648,4.4,0.4,9.2,0.5,12.1,0.7
1,2,6013,"Contra Costa County, California",1138850,1328,4.2,0.6,9.3,0.9,12.4,1.1
2,3,6019,"Fresno County, California",972580,1778,9.0,0.9,21.1,1.3,27.9,1.5
3,4,6029,"Kern County, California",860951,2581,9.8,1.0,21.4,1.6,28.9,1.8
4,5,6037,"Los Angeles County, California",10015695,5187,6.1,0.2,14.9,0.3,20.0,0.4
5,6,6059,"Orange County, California",3154479,3139,4.9,0.3,11.5,0.5,15.3,0.6
6,7,6065,"Riverside County, California",2385745,4745,6.0,0.5,12.9,0.7,17.8,0.9
7,8,6067,"Sacramento County, California",1509927,2925,5.9,0.6,14.1,0.8,19.3,1.0
8,9,6071,"San Bernardino County, California",2093857,5512,6.8,0.5,16.2,0.8,21.7,1.0
9,10,6073,"San Diego County, California",3256674,4245,5.4,0.4,11.8,0.6,15.6,0.7
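
Each estimate in this table comes with a margin of error, which the Census Bureau publishes for ACS data at the 90 percent confidence level. A rough sketch of how the two columns can be combined into interval bounds is shown below. It uses three rows copied from the table above; the frame `counties` and the `lower` and `upper` columns are hypothetical names, and the values are entered as strings to mirror the `object` dtype of the real data.

```python
import pandas as pd

# Three rows copied from the county table: percent of people below 100% of
# the poverty level, and the published margin of error for that percent.
counties = pd.DataFrame({
    'county': ['Alameda County, California',
               'Fresno County, California',
               'Kern County, California'],
    'county_100_poverty': ['9.2', '21.1', '21.4'],
    'county_100_poverty_error': ['0.5', '1.3', '1.6'],
})

# The values are strings, so convert them before doing arithmetic.
num_cols = ['county_100_poverty', 'county_100_poverty_error']
counties[num_cols] = counties[num_cols].apply(pd.to_numeric)

# Estimate minus/plus the margin of error gives the interval bounds.
counties['lower'] = counties['county_100_poverty'] - counties['county_100_poverty_error']
counties['upper'] = counties['county_100_poverty'] + counties['county_100_poverty_error']

# Show counties from highest to lowest estimated poverty rate.
print(counties.sort_values('county_100_poverty', ascending=False))
```

Comparing the bounds gives a quick, informal sense of which county differences are large relative to sampling error; it is not a substitute for a formal test.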
