# Data Reduction

## Goals for this notebook:


- Show the definition of South King County in terms of PUMAs (Public Use Microdata Areas)
- Import PSQL query results and load them as DataFrames
- Find the total number of persons in South King County we can identify as opportunity youth (OY)

## Method

We use Pandas to import PSQL query results and load them as DataFrames. We compute totals with the 'sum()' method. Finally, we display the total number of persons in South King County we can identify as OY.

## Detailed Steps

Import the necessary packages

In [8]:
import pandas as pd
from sqlalchemy import create_engine

Create a SQLAlchemy engine that connects to the PSQL database

In [9]:
engine = create_engine("postgresql:///opportunity_youth")

Import a PSQL table as a pandas DataFrame and display it. The result is the sum of all people sampled in King County.

In [13]:
# Total number of people sampled in King County (a single-row table)
tot_people_kc = pd.read_sql(sql="SELECT * FROM tot_people_in_KC;", con=engine)
tot_people_kc

Unnamed: 0,sum
0,2118268.0


List the OY sample results for every PUMA in King County.

In [78]:
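# pd.read_sql already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
puma_totals_df = pd.read_sql(sql="SELECT * FROM total_by_puma_f;", con=engine)
# Hide the index for display only; puma_totals_df itself stays a DataFrame so it can be summed below
# (on pandas 1.4+, use .style.hide(axis="index") instead of .style.hide_index())
puma_totals_df.style.hide_index()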

puma,sum
11601,657
11602,1325
11603,1032
11604,916
11605,908
11606,812
11607,926
11608,1086
11609,755
11610,1853


So, we have data from 16 PUMAs. These are persons who are:

- between ages 16 and 24,
- not enrolled in school,
- unemployed or have not worked.

The total sample across all PUMAs is:

In [74]:
puma_totals_df['sum'].sum()

19984.0

That's ~20,000 OY in King County from our sample.
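
The three OY criteria can be expressed as a single aggregate query. The sketch below is only an illustration, not the query behind the tables in this notebook: it uses an in-memory SQLite database in place of PostgreSQL, and the table `persons`, its columns (`puma`, `age`, `in_school`, `employed`, `weight`) and the toy rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (puma INTEGER, age INTEGER, in_school INTEGER, "
    "employed INTEGER, weight REAL)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        (11610, 19, 0, 0, 20.0),  # counted: 16-24, not in school, not working
        (11610, 22, 0, 0, 15.0),  # counted
        (11610, 22, 1, 0, 30.0),  # excluded: enrolled in school
        (11611, 17, 0, 0, 12.0),  # counted
        (11611, 30, 0, 0, 25.0),  # excluded: outside the age range
        (11611, 20, 0, 1, 18.0),  # excluded: employed
    ],
)

# Sum the person weights of everyone meeting all three criteria, per PUMA.
query = """
    SELECT puma, SUM(weight) AS oy_total
    FROM persons
    WHERE age BETWEEN 16 AND 24
      AND in_school = 0
      AND employed = 0
    GROUP BY puma
    ORDER BY puma;
"""
oy_by_puma = conn.execute(query).fetchall()
print(oy_by_puma)
```

Summing a weight column rather than counting rows is what makes the result a weighted total; with real survey data the column names and coding of school and employment status would differ.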


We define South King County by the following six PUMAs:

In [79]:
# Force the rightmost column to display at full width
# (None means "no limit" in pandas 1.0+; older versions used -1)
pd.set_option('display.max_colwidth', None)

puma_names_list = pd.read_sql(sql="SELECT * FROM puma_names_finder0;", con=engine)
puma_names_df = pd.DataFrame(puma_names_list).style.hide_index()
puma_names_df

puma,puma_name
11610,"King County (Central)--Renton City, Fairwood, Bryn Mawr & Skyway"
11611,"King County (West Central)--Burien, SeaTac, Tukwila Cities & White Center"
11612,"King County (Far Southwest)--Federal Way, Des Moines Cities & Vashon Island"
11613,King County (Southwest Central)--Kent City
11614,King County (Southwest)--Auburn City & Lakeland
11615,"King County (Southeast)--Maple Valley, Covington & Enumclaw Cities"
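
Once South King County is fixed as a list of PUMA codes, the same definition can be applied to any per-PUMA table with a membership filter. The following is a minimal sketch, assuming a DataFrame with `puma` and `sum` columns like the ones in this notebook. The constant name `SOUTH_KING_PUMAS` is introduced here for illustration, and the rows are typed in from this notebook's outputs rather than read from the database.

```python
import pandas as pd

# PUMA codes listed in the table above (hypothetical constant name).
SOUTH_KING_PUMAS = [11610, 11611, 11612, 11613, 11614, 11615]

# A few per-PUMA OY totals copied from this notebook's outputs.
totals = pd.DataFrame(
    {
        "puma": [11608, 11609, 11610, 11611, 11612, 11613, 11614, 11615],
        "sum": [1086, 755, 1853, 2038, 1977, 2006, 1530, 1210],
    }
)

# Keep only the rows whose PUMA code belongs to South King County.
south_king = totals[totals["puma"].isin(SOUTH_KING_PUMAS)]
south_king_total = int(south_king["sum"].sum())
print(south_king_total)
```

Keeping the PUMA list in one place means the regional definition can be changed without touching the filtering code.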


The weighted sum of persons for each PUMA in South King County is given by:

In [80]:
puma_oy_totals = pd.read_sql(sql="SELECT * FROM OY_by_puma0;", con=engine)
puma_oy_totals_df = pd.DataFrame(puma_oy_totals).style.hide_index()
puma_oy_totals_df

puma,sum
11610,1853
11611,2038
11612,1977
11613,2006
11614,1530
11615,1210


Summing the 'sum' column gives the South King County total:

In [76]:
print('In South King County there are ' + str(puma_oy_totals['sum'].sum()) + ' persons we can identify as OY.')

In South King County there are 10614.0 persons we can identify as OY.
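
For context, the South King County figure can be set against the county-wide OY total computed earlier. This is plain arithmetic on the two numbers reported in this notebook; the variable names are illustrative.

```python
# Totals reported earlier in this notebook.
king_county_oy = 19984.0
south_king_oy = 10614.0

# Fraction of the county-wide OY total that falls in South King County.
share = south_king_oy / king_county_oy
print(f"South King County accounts for {share:.1%} of the OY counted in King County.")
```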
