### Top 25 Most Affordable Places to Raise a Family

<p>Analysis by Alex Mahadevan</p>

In [128]:
import pandas as pd

<p>I used a combination of pandas and Excel to clean and merge all of the data herein. Sources include the [U.S. Census Bureau's American Community Survey](https://www.census.gov/acs/www/data/data-tables-and-tools/data-profiles/2015/), the [University of Michigan's Institute for Social Research](http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/35019), the [U.S. Bureau of Labor Statistics](https://data.bls.gov/map/MapToolServlet?survey=la&map=county&seasonal=u) and the [Robert Wood Johnson Foundation](http://www.countyhealthrankings.org/reports/2017-county-health-rankings-key-findings-report).</p>

<p>Let's read it all in.</p>

In [129]:
df = pd.read_csv("/Users/alexmahadevan/Code/raise_a_family/county data.csv" , index_col=0)

<p>Here is a list of the variables we are considering. For now, I'm not going to analyze crime, because Florida and Alabama were not included in the crime data. I'll eventually hardcode those values in after an initial culling of the herd.</p>

In [130]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Float64Index: 3220 entries, 1.0 to nan
Data columns (total 31 columns):
SUMLEV                        3141 non-null float64
REGION                        3141 non-null float64
DIVISION                      3141 non-null float64
STATE                         3141 non-null float64
COUNTY                        3141 non-null float64
STNAME                        3141 non-null object
CTY                           3141 non-null object
Unnamed: 8                    3135 non-null object
POPESTIMATE2016               3141 non-null float64
AVG_NETMIG                    3141 non-null float64
POP_CHANGE                    3141 non-null float64
BIRTHS_PER_CAPITA             3141 non-null float64
MEDIAN_HOUSEHOLD_INCOME       3141 non-null float64
PERCENT_W_HEALTH_INSURANCE    3141 non-null float64
PERCENTAGE_IN_POVERTY         3141 non-null float64
PERCENT_WITH_KIDS             3141 non-null float64
BACHELORS_DEGREE              3141 non-null float64
MEDIAN_MO
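The `df.info()` output reports 3,220 index entries but only 3,141 non-null values in most columns, and the index runs "to nan", which suggests the CSV contains blank rows. A minimal sketch of one way to drop them, using a small made-up frame rather than the real file (the values below are invented for illustration; only the column names come from the output above):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the county file: two populated rows and one fully blank row.
# The numbers are invented, not taken from the real data.
toy = pd.DataFrame(
    {
        "STNAME": ["Alabama", None, "Alaska"],
        "POPESTIMATE2016": [55000.0, np.nan, 3000.0],
    },
    index=[1.0, np.nan, 2.0],
)

# Drop rows where every column is missing; rows with partial data are kept.
cleaned = toy.dropna(how="all")

print(len(toy), len(cleaned))
```

Whether the blank rows in the real file should be dropped before computing means and standard deviations is a judgment call; pandas skips missing values in `mean()` and `std()` by default, so the z-scores themselves are not affected.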

<p>Now I'm going to tweak pandas so it doesn't display every number in scientific notation.</p>

In [131]:
pd.set_option('display.float_format', lambda x: '%.3f' % x)
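As a quick illustration of what this option changes, the sketch below applies the same formatter temporarily with `pd.option_context`, so the global setting is left alone. The two numbers are made up for the example.

```python
import pandas as pd

# Made-up values: one large, one very small.
s = pd.Series([1234567.891, 0.000123])

# Apply the same '%.3f' formatter only inside this block.
with pd.option_context("display.float_format", lambda x: "%.3f" % x):
    fixed = s.to_string()

print(fixed)
```

Note that the formatter only affects how values are displayed; the underlying floats keep their full precision.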

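<p>Now let's compute the initial z-scores. To normalize all the data so we can compare apples to apples, we'll use z = (x - μ) / σ, where x is a county's value, μ is the mean across all counties and σ is the standard deviation. Poverty and unemployment are multiplied by -1 so that, for those two variables, a higher z-score means a better outcome.</p>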
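To make the arithmetic concrete, here is a small worked example on made-up numbers. One detail worth knowing: `Series.std()` in pandas defaults to the sample standard deviation (`ddof=1`), so that is what σ means in the cells of this notebook.

```python
import pandas as pd

# Made-up values, only to show the arithmetic.
x = pd.Series([2.0, 4.0, 6.0])

mu = x.mean()         # mean of the three values
sigma = x.std()       # sample standard deviation (ddof=1 is the pandas default)
z = (x - mu) / sigma  # one z-score per value

print(mu, sigma, z.tolist())
```

Passing `ddof=0` to `std()` would give the population standard deviation instead; with more than 3,000 counties the difference is negligible.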

In [132]:
df['zFAMKIDSCHANGE'] = (df.CHANGE_IN_FAM_WITH_KIDS - df.CHANGE_IN_FAM_WITH_KIDS.mean())/ df.CHANGE_IN_FAM_WITH_KIDS.std()

In [133]:
df['zPERCENT_WITH_KIDS'] = (df.PERCENT_WITH_KIDS - df.PERCENT_WITH_KIDS.mean())/ df.PERCENT_WITH_KIDS.std()

In [134]:
df['zBIRTHS_PER_CAPITA'] = (df.BIRTHS_PER_CAPITA - df.BIRTHS_PER_CAPITA.mean())/ df.BIRTHS_PER_CAPITA.std()

In [135]:
df['zAVG_NETMIG'] = (df.AVG_NETMIG - df.AVG_NETMIG.mean())/ df.AVG_NETMIG.std()

In [136]:
df['zPOP_CHANGE'] = (df.POP_CHANGE - df.POP_CHANGE.mean())/ df.POP_CHANGE.std()

In [137]:
df['zINCOME_AS_PERCENT_OF_STATE'] = (df.INCOME_AS_PERCENT_OF_STATE - df.INCOME_AS_PERCENT_OF_STATE.mean())/ df.INCOME_AS_PERCENT_OF_STATE.std()

In [138]:
df['zPERCENTAGE_IN_POVERTY'] = -1*((df.PERCENTAGE_IN_POVERTY - df.PERCENTAGE_IN_POVERTY.mean())/ df.PERCENTAGE_IN_POVERTY.std())

In [139]:
df['zPERCENT_W_HEALTH_INSURANCE'] = (df.PERCENT_W_HEALTH_INSURANCE - df.PERCENT_W_HEALTH_INSURANCE.mean())/ df.PERCENT_W_HEALTH_INSURANCE.std()

In [140]:
df['zUNEMPLOYMENT'] = -1*((df.UNEMPLOYMENT - df.UNEMPLOYMENT.mean())/ df.UNEMPLOYMENT.std())

In [141]:
df['zBACHELORS_DEGREE'] = (df.BACHELORS_DEGREE - df.BACHELORS_DEGREE.mean())/ df.BACHELORS_DEGREE.std()

In [142]:
df['zMORTGAGE_INCOME_RATIO'] = (df.MORTGAGE_INCOME_RATIO - df.MORTGAGE_INCOME_RATIO.mean())/ df.MORTGAGE_INCOME_RATIO.std()

In [143]:
df['zRENT_INCOME_RATIO'] = (df.RENT_INCOME_RATIO - df.RENT_INCOME_RATIO.mean())/ df.RENT_INCOME_RATIO.std()
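The twelve cells above repeat the same expression. As an optional alternative, the sketch below wraps it in a helper that loops over column names and flips the sign where lower raw values are better. The helper name `add_zscores` and its `lower_is_better` argument are hypothetical, and the toy frame stands in for `df`. It also names every new column `"z" + column`, which differs from the notebook's `zFAMKIDSCHANGE`.

```python
import pandas as pd


def add_zscores(frame, columns, lower_is_better=()):
    """Return a copy of frame with a 'z' + name column for each listed column.

    Columns named in lower_is_better get their z-score multiplied by -1.
    """
    out = frame.copy()
    for col in columns:
        z = (out[col] - out[col].mean()) / out[col].std()
        out["z" + col] = -z if col in lower_is_better else z
    return out


# Toy data with invented values; only the column names come from the notebook.
toy = pd.DataFrame(
    {
        "UNEMPLOYMENT": [3.0, 5.0, 7.0],
        "BACHELORS_DEGREE": [10.0, 20.0, 30.0],
    }
)

scored = add_zscores(
    toy,
    ["UNEMPLOYMENT", "BACHELORS_DEGREE"],
    lower_is_better={"UNEMPLOYMENT"},
)
print(scored)
```

Keeping the sign flips in one argument makes it easier to see at a glance which variables are treated as "lower is better".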

<p>We now have normalized values for 12 variables, which we will use to narrow the list down to 11 or 200 cities. Then I'll add crime, school and pollution data to the mix.</p>
<p>First, I'm going to drop all of the counties that have populations smaller than 100,000.</p>
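A minimal sketch of that population filter, assuming the 2016 estimate in `POPESTIMATE2016` is the population figure meant here. The rows are hypothetical; only the column names come from `df.info()`.

```python
import pandas as pd

# Hypothetical rows standing in for the county data.
toy = pd.DataFrame(
    {
        "CTY": ["County A", "County B", "County C"],
        "POPESTIMATE2016": [250000.0, 99999.0, 100000.0],
    }
)

MIN_POPULATION = 100_000  # threshold stated in the text

# Keep counties at or above the threshold. Rows with a missing population
# are dropped as well, because a comparison with NaN evaluates to False.
large = toy[toy["POPESTIMATE2016"] >= MIN_POPULATION].reset_index(drop=True)

print(large)
```

If the z-scores are meant to rank only the larger counties against each other, the filter would need to run before the z-score cells rather than after them, since the mean and standard deviation depend on which rows are present.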

In [144]:
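# `cities` is a DataFrame of places loaded in a cell not shown here.
# Build a "name, county" key to match against the CTY column in df.
cities['CTY'] = cities['name'] + ", " + cities['county']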

In [156]:
merged = pd.merge(cities, df, how='outer', on='CTY', copy=True)

In [157]:
merged.head(5)

Unnamed: 0,id,name,county,state_code,state,zip_codes,type,latitude,longitude,area_code,...,zBIRTHS_PER_CAPITA,zAVG_NETMIG,zPOP_CHANGE,zINCOME_AS_PERCENT_OF_STATE,zPERCENTAGE_IN_POVERTY,zPERCENT_W_HEALTH_INSURANCE,zUNEMPLOYMENT,zBACHELORS_DEGREE,zMORTGAGE_INCOME_RATIO,zRENT_INCOME_RATIO
0,1.0,Adak,Aleutians West Census Area,AK,Alaska,99546,City,51.88,-176.658,907,...,,,,,,,,,,
1,2.0,Akhiok,Kodiak Island Borough,AK,Alaska,99615,City,56.946,-154.17,907,...,,,,,,,,,,
2,3.0,Akiachak,Bethel Census Area,AK,Alaska,99551,CDP,60.909,-161.431,907,...,,,,,,,,,,
3,4.0,Akiak,Bethel Census Area,AK,Alaska,99552,City,60.912,-161.214,907,...,,,,,,,,,,
4,5.0,Akutan,Aleutians East Borough,AK,Alaska,99553,City,54.136,-165.773,907,...,,,,,,,,,,
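The z-score columns are empty for these first five rows, which means those city rows found no matching `CTY` value in `df`; an outer merge keeps unmatched rows from both sides. One way to measure how many keys actually matched is the `indicator=True` option of `pd.merge`, sketched here on hypothetical stand-ins for `cities` and `df` (the frame names and values are invented; only the key name `CTY` comes from the notebook).

```python
import pandas as pd

# Hypothetical stand-ins for `cities` and `df`.
cities_toy = pd.DataFrame(
    {
        "CTY": ["Adak, Aleutians West Census Area", "Example County"],
        "name": ["Adak", "Example"],
    }
)
county_toy = pd.DataFrame(
    {
        "CTY": ["Example County", "Other County"],
        "zPOP_CHANGE": [0.5, -0.2],
    }
)

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
check = pd.merge(cities_toy, county_toy, how="outer", on="CTY", indicator=True)

matched = int((check["_merge"] == "both").sum())
left_only = int((check["_merge"] == "left_only").sum())
right_only = int((check["_merge"] == "right_only").sum())

print(matched, left_only, right_only)
```

A large `left_only` count on the real data would point to a key format mismatch between the two tables rather than genuinely missing counties.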
