## County Stats Normalized: 2012 - 2019
---
#### All variables are proportions, medians, averages, ordinals, or per-capita values (plus 2 text categoricals). Because none of them scales with total population, counties of different sizes can be compared directly and the columns can be used as model features. Columns ending in `_prop` are stored as fractions between 0 and 1, even where the descriptions below say "percent".

- **age, gend**: Age and gender
  - `age_median`: (Existing) Median age
  - `age_undr19_prop`: What percent of the population is under 19?
  - `gend_m_prop`: What percent of the population is male?
  - `age_undr19_gend_m_prop`: What percent of people under 19 are male?
  - `gend_m_age_undr19_prop`: What percent of males are under 19?
  - (Complementary columns also exist for the above, e.g. `age_over18_prop`, `gend_f_prop`, `gend_f_age_over18_prop`)
- **inc**: Income
  - `inc_hh_median`: (Existing) Median household income
  - `inc_per_cap`: (Existing) Per capita income
- **hh**: Households
  - `hh_size_avg`: (Existing) Average household size
- **race**: Race
  - `race_{x}_prop`: What percent of the population is race x?
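  - `race_prop_stdev`: Standard deviation of the `race_{x}_prop` values within a county. It works as a diversity measure that is not affected by total population: a lower value means the population is spread more evenly across the race categories (more diverse), while a higher value means one or two categories dominate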
- **hsgrad**: Adults who graduated high school
  - `hsgrad_graduated_prop`: What percent of adults (age 25+) have a high school diploma or equivalent?
- **civ_lf**: Civilian labor force
  - `civ_lf_prop`: What percent of the population is in the civilian labor force?
  - `civ_lf_employed_prop`: What percent of the civilian labor force is employed?
- **ps**: Poverty status
  - `ps_total_prop`: What percent of people whose poverty status is known are below the poverty line?
  - `ps_undr18_total_prop`: What percent of under-18 people whose poverty status is known are below the poverty line?
  - `ps_undr18_prop`: What percent of people below the poverty line are under 18?
- **stud**: Current students
  - `stud_enrolled_prop`: Percent of people who could be enrolled in school that actually are enrolled
  - `stud_hs_prop`: What percent of grade-school students (grades 1-12) are high schoolers? (A lower number may indicate dropouts, which may in turn be associated with crime)
  - `stud_undergrad_prop`: What percent of enrolled students are undergraduates?
- **citz**: U.S. Citizens
  - `citz_per_cap`: What percent of the population are U.S. citizens?
  - `citz_birth_prop`: What percent of U.S. citizens were born in the U.S.?
  - `citz_co_prop`: What percent of citizens were born in Colorado?
- **hu**: Housing units
  - `hu_per_cap`: Housing units per capita (num housing units / total population)
  - `hu_occ_prop`: Percent of housing units which are occupied
  - `hu_blt_1950_1959_prop` (7 bins, from `hu_blt_lt_1950_prop` to `hu_blt_2000_plus_prop`): Percent of housing units which were built between 1950 and 1959
  - `hu_blt_freq_yr`: (categorical) The name of the year range in which the greatest number of housing units were built, e.g. `b1970_1979`
  - `hu_blt_freq_yr_ord`: (ordinal, 1-7) `hu_blt_freq_yr` converted to ordinal, in which a higher number means a more recent year range (better)
- **hu_oo**: Housing units: owner-occupied
  - `hu_oo_prop`: Percent of **occupied** housing units that are **owner-occupied**. The remaining proportion is renter-occupied
  - `hu_oo_150_250_prop` (6 bins, from `hu_oo_lt_50_prop` to `hu_oo_750_plus_prop`): Percent of **owner-occupied** units worth between $150k and $250k
  - `hu_oo_freq_val`: (categorical) The name of the dollar value range in which the greatest number of owner-occupied homes fall, e.g. `v150k_250k`
  - `hu_oo_freq_val_ord`: (ordinal, 1-6) `hu_oo_freq_val` converted to ordinal, in which a higher number means a higher value range
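
As a rough illustration of how `race_prop_stdev` could be computed, the sketch below takes the population standard deviation across the five race columns for each county. The use of `ddof=0` is an assumption: it is not stated in this notebook, although the three sample rows in the race table further down agree with it to about four decimal places.

```python
import pandas as pd

# Race proportions copied from the first three rows of the race table in this notebook
race = pd.DataFrame(
    {
        "race_hispanic_prop": [0.363882, 0.4604, 0.1764],
        "race_white_prop": [0.512502, 0.497693, 0.611792],
        "race_black_prop": [0.028167, 0.007049, 0.093302],
        "race_asian_prop": [0.033236, 0.003781, 0.047074],
        "race_other_prop": [0.024269, 0.040305, 0.034754],
    },
    index=["ADAMS", "ALAMOSA", "ARAPAHOE"],
)

# Assumption: population standard deviation (ddof=0) taken row-wise over the race columns
race_prop_stdev = race.std(axis=1, ddof=0)
print(race_prop_stdev.round(6))
```

Because the inputs are proportions rather than counts, the result depends only on how evenly the categories are split, not on how many people live in the county.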

In [1]:
import pandas as pd
from grouped_df import GroupedDF # simple class to split a df into multiple dfs based on its column names

GroupedDF.groups = {'gend':'Gender', 'age':'Age', 'race':'Race', 'inc':'Income', 'citz':'Citizenship', 'hh':'Households',
    'hsgrad':'Adults with high school deg', 'civ_lf':'Civilian laborforce', 'ps':'Poverty status',
    'stud':'Current students', 'hu':'Housing units', 'hu_blt':'Housing units - Year built',
    'hu_oo':'Housing units - Owner-Occupied'}

df = pd.read_csv('output/county_stats_normalized.csv', index_col=0)
df.head(3)

|  | year | county | pop | age_over18_prop | age_undr19_prop | gend_m_prop | gend_f_prop | age_undr19_gend_m_prop | age_undr19_gend_f_prop | age_over18_gend_m_prop | ... | inc_per_cap | citz_per_cap | hu_per_cap | age_median | inc_hh_median | hh_size_avg | hu_oo_freq_val_ord | hu_oo_freq_val | hu_blt_freq_yr_ord | hu_blt_freq_yr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 460468.0 | 0.705276 | 0.294724 | 0.502904 | 0.497107 | 0.511838 | 0.488162 | 0.49917 | ... | 24357 | 0.860368 | 0.35452 | 32.4 | 56633 | 2.91 | 3 | v150k_250k | 7 | b2000_2009 |
| 1 | 2012.0 | ALAMOSA | 15606.0 | 0.728374 | 0.271626 | 0.501282 | 0.499295 | 0.519226 | 0.482189 | 0.49459 | ... | 19657 | 0.968986 | 0.42112 | 32.2 | 38045 | 2.49 | 2 | v50k_150k | 1 | b1949_e |
| 2 | 2012.0 | ARAPAHOE | 596226.0 | 0.737078 | 0.262922 | 0.490666 | 0.509325 | 0.512506 | 0.487487 | 0.482876 | ... | 32845 | 0.872052 | 0.399446 | 35.7 | 60400 | 2.55 | 3 | v150k_250k | 4 | b1970_1979 |
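
Since every `_prop` column should lie between 0 and 1, and complementary columns should sum to roughly 1, a quick sanity check is possible right after loading. This is a minimal sketch on values copied from the rows above; `check_props` and its tolerance `tol` are hypothetical and not part of the notebook's own code.

```python
import pandas as pd

# Values copied from the df.head(3) output above
sample = pd.DataFrame(
    {
        "county": ["ADAMS", "ALAMOSA", "ARAPAHOE"],
        "gend_m_prop": [0.502904, 0.501282, 0.490666],
        "gend_f_prop": [0.497107, 0.499295, 0.509325],
        "age_over18_prop": [0.705276, 0.728374, 0.737078],
        "age_undr19_prop": [0.294724, 0.271626, 0.262922],
    }
)


def check_props(df, pairs, tol=1e-3):
    """Hypothetical helper: range check plus complement check for *_prop columns."""
    prop_cols = [c for c in df.columns if c.endswith("_prop")]
    # Every proportion must fall inside [0, 1]
    in_range = bool(df[prop_cols].apply(lambda s: s.between(0, 1)).all().all())
    # Largest absolute deviation from 1 for each complementary pair
    pair_gaps = {f"{a}+{b}": float((df[a] + df[b] - 1).abs().max()) for a, b in pairs}
    ok = in_range and all(gap <= tol for gap in pair_gaps.values())
    return {"in_range": in_range, "pair_gaps": pair_gaps, "ok": ok}


report = check_props(
    sample,
    pairs=[("gend_m_prop", "gend_f_prop"), ("age_over18_prop", "age_undr19_prop")],
)
print(report)
```

The tolerance is deliberately loose because complementary columns in the sample do not sum to exactly 1 (for example, the two gender proportions for ALAMOSA).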


In [2]:
g = GroupedDF(df, custom={'hu': ['year', 'county', 'hu_per_cap', 'hu_occ_prop']})

g.display()

gend: Gender


|  | year | county | gend_m_prop | gend_f_prop | gend_m_age_undr19_prop | gend_m_age_over18_prop | gend_f_age_undr19_prop | gend_f_age_over18_prop |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.502904 | 0.497107 | 0.29996 | 0.70004 | 0.289421 | 0.710579 |
| 1 | 2012.0 | ALAMOSA | 0.501282 | 0.499295 | 0.28135 | 0.71865 | 0.26232 | 0.73768 |
| 2 | 2012.0 | ARAPAHOE | 0.490666 | 0.509325 | 0.274625 | 0.725375 | 0.251649 | 0.748351 |



age: Age


|  | year | county | age_over18_prop | age_undr19_prop | age_undr19_gend_m_prop | age_undr19_gend_f_prop | age_over18_gend_m_prop | age_over18_gend_f_prop | age_median |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.705276 | 0.294724 | 0.511838 | 0.488162 | 0.49917 | 0.500845 | 32.4 |
| 1 | 2012.0 | ALAMOSA | 0.728374 | 0.271626 | 0.519226 | 0.482189 | 0.49459 | 0.505674 | 32.2 |
| 2 | 2012.0 | ARAPAHOE | 0.737078 | 0.262922 | 0.512506 | 0.487487 | 0.482876 | 0.517115 | 35.7 |



race: Race


|  | year | county | race_hispanic_prop | race_white_prop | race_black_prop | race_asian_prop | race_other_prop | race_prop_stdev |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.363882 | 0.512502 | 0.028167 | 0.033236 | 0.024269 | 0.206129 |
| 1 | 2012.0 | ALAMOSA | 0.4604 | 0.497693 | 0.007049 | 0.003781 | 0.040305 | 0.227001 |
| 2 | 2012.0 | ARAPAHOE | 0.1764 | 0.611792 | 0.093302 | 0.047074 | 0.034754 | 0.215377 |



inc: Income


|  | year | county | inc_per_cap | inc_hh_median |
| --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 24357 | 56633 |
| 1 | 2012.0 | ALAMOSA | 19657 | 38045 |
| 2 | 2012.0 | ARAPAHOE | 32845 | 60400 |



citz: Citizenship


|  | year | county | citz_birth_prop | citz_co_prop | citz_per_cap |
| --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.950229 | 0.565176 | 0.860368 |
| 1 | 2012.0 | ALAMOSA | 0.983203 | 0.631001 | 0.968986 |
| 2 | 2012.0 | ARAPAHOE | 0.937754 | 0.429728 | 0.872052 |



hh: Households


|  | year | county | hh_size_avg |
| --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 2.91 |
| 1 | 2012.0 | ALAMOSA | 2.49 |
| 2 | 2012.0 | ARAPAHOE | 2.55 |



hsgrad: Adults with high school deg


|  | year | county | hsgrad_graduated_prop |
| --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.604913 |
| 1 | 2012.0 | ALAMOSA | 0.630942 |
| 2 | 2012.0 | ARAPAHOE | 0.525874 |



civ_lf: Civilian laborforce


|  | year | county | civ_lf_prop | civ_lf_employed_prop |
| --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.512761 | 0.905485 |
| 1 | 2012.0 | ALAMOSA | 0.459503 | 0.899317 |
| 2 | 2012.0 | ARAPAHOE | 0.533424 | 0.9184 |



ps: Poverty status


|  | year | county | ps_total_prop | ps_undr18_total_prop | ps_undr18_prop |
| --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.141516 | 0.20324 | 0.407657 |
| 1 | 2012.0 | ALAMOSA | 0.218233 | 0.198585 | 0.237543 |
| 2 | 2012.0 | ARAPAHOE | 0.117654 | 0.159459 | 0.344372 |



stud: Current students


|  | year | county | stud_enrolled_prop | stud_hs_prop | stud_undergrad_prop |
| --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.279257 | 0.305236 | 0.164248 |
| 1 | 2012.0 | ALAMOSA | 0.359793 | 0.366708 | 0.426147 |
| 2 | 2012.0 | ARAPAHOE | 0.279887 | 0.340967 | 0.191012 |



hu: Housing units


|  | year | county | hu_per_cap | hu_occ_prop |
| --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.35452 | 0.925198 |
| 1 | 2012.0 | ALAMOSA | 0.42112 | 0.890596 |
| 2 | 2012.0 | ARAPAHOE | 0.399446 | 0.939482 |



hu_blt: Housing units - Year built


|  | year | county | hu_blt_2000_plus_prop | hu_blt_lt_1950_prop | hu_blt_1950_1959_prop | hu_blt_1960_1969_prop | hu_blt_1970_1979_prop | hu_blt_1980_1989_prop | hu_blt_1990_1999_prop | hu_blt_freq_yr_ord | hu_blt_freq_yr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.236957 | 0.037722 | 0.124776 | 0.120157 | 0.184906 | 0.12477 | 0.169059 | 7 | b2000_2009 |
| 1 | 2012.0 | ALAMOSA | 0.098904 | 0.233719 | 0.089927 | 0.099513 | 0.213786 | 0.131163 | 0.131771 | 1 | b1949_e |
| 2 | 2012.0 | ARAPAHOE | 0.165498 | 0.030085 | 0.069361 | 0.093458 | 0.261392 | 0.235182 | 0.142715 | 4 | b1970_1979 |
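
The `hu_blt_freq_yr_ord` values in the table above can be reproduced from the seven bin columns by picking the largest bin per row and taking its chronological position. The sketch below is one way to do that, not necessarily how the column was originally built; it returns bin column names rather than the `b1970_1979`-style labels, because the full label set is not shown in this notebook.

```python
import pandas as pd

# Year-built bins in chronological order; list position (1-7) is the ordinal
BLT_COLS = [
    "hu_blt_lt_1950_prop",
    "hu_blt_1950_1959_prop",
    "hu_blt_1960_1969_prop",
    "hu_blt_1970_1979_prop",
    "hu_blt_1980_1989_prop",
    "hu_blt_1990_1999_prop",
    "hu_blt_2000_plus_prop",
]

# Values copied from the hu_blt table above, reordered chronologically
blt = pd.DataFrame(
    [
        [0.037722, 0.124776, 0.120157, 0.184906, 0.12477, 0.169059, 0.236957],
        [0.233719, 0.089927, 0.099513, 0.213786, 0.131163, 0.131771, 0.098904],
        [0.030085, 0.069361, 0.093458, 0.261392, 0.235182, 0.142715, 0.165498],
    ],
    columns=BLT_COLS,
    index=["ADAMS", "ALAMOSA", "ARAPAHOE"],
)

# Name of the bin holding the largest share of housing units in each county
most_common_bin = blt[BLT_COLS].idxmax(axis=1)

# Map each bin name to its 1-based chronological position
ordinal = most_common_bin.map({col: i for i, col in enumerate(BLT_COLS, start=1)})
print(pd.DataFrame({"most_common_bin": most_common_bin, "ordinal": ordinal}))
```

The same pattern would apply to the six `hu_oo_*` value bins to obtain `hu_oo_freq_val_ord`.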



hu_oo: Housing units - Owner-Occupied


|  | year | county | hu_oo_prop | hu_oo_lt_50_prop | hu_oo_50_150_prop | hu_oo_150_250_prop | hu_oo_250_400_prop | hu_oo_400_750_prop | hu_oo_750_plus_prop | hu_oo_freq_val_ord | hu_oo_freq_val |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2012.0 | ADAMS | 0.662818 | 0.085687 | 0.198166 | 0.475317 | 0.177598 | 0.054211 | 0.00902 | 3 | v150k_250k |
| 1 | 2012.0 | ALAMOSA | 0.632496 | 0.117504 | 0.431929 | 0.290924 | 0.107239 | 0.047812 | 0.004592 | 2 | v50k_150k |
| 2 | 2012.0 | ARAPAHOE | 0.639821 | 0.029387 | 0.154892 | 0.390722 | 0.266929 | 0.114133 | 0.043937 | 3 | v150k_250k |
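
The source of `GroupedDF` is not included in this notebook, so the following is only a guess at the kind of prefix-based split that would produce the groups displayed above. `split_by_prefix` is a hypothetical stand-in, and the `keys` and `custom` parameters are assumptions modelled on the `custom=` argument used earlier.

```python
import pandas as pd


def split_by_prefix(df, groups, keys=("year", "county"), custom=None):
    """Hypothetical stand-in for GroupedDF: one sub-DataFrame per column prefix."""
    custom = custom or {}
    parts = {}
    for prefix in groups:
        if prefix in custom:
            # Explicit column list overrides prefix matching
            cols = list(custom[prefix])
        else:
            # Key columns first, then every column that starts with "<prefix>_"
            cols = list(keys) + [c for c in df.columns if c.startswith(prefix + "_")]
        parts[prefix] = df[cols]
    return parts


# One row with values copied from the tables above
demo = pd.DataFrame(
    {
        "year": [2012.0],
        "county": ["ADAMS"],
        "hu_per_cap": [0.35452],
        "hu_occ_prop": [0.925198],
        "hu_oo_prop": [0.662818],
        "hh_size_avg": [2.91],
    }
)
groups = {"hh": "Households", "hu": "Housing units", "hu_oo": "Housing units - Owner-Occupied"}
parts = split_by_prefix(
    demo, groups, custom={"hu": ["year", "county", "hu_per_cap", "hu_occ_prop"]}
)

for prefix, part in parts.items():
    print(f"{prefix}: {groups[prefix]}")
    print(part.to_string(index=False))
```

In this sketch the `custom` entry for `hu` matters because a plain `hu_` prefix match would also pull in every `hu_blt_*` and `hu_oo_*` column.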



