# Top 25 Most Affordable Places to Raise a Family

Analysis by Alex Mahadevan  
Data journalist, [The Penny Hoarder](https://www.thepennyhoarder.com)

### Data Cleaning

<p>We begin by importing the pandas data analysis library.</p>

In [1]:
import pandas as pd

<p>We used a combination of pandas and Excel to clean and merge all of the data used here. Sources include the [U.S. Census Bureau's American Community Survey](https://www.census.gov/acs/www/data/data-tables-and-tools/data-profiles/2015/), the [University of Michigan's Institute for Social Research](http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/35019), the [U.S. Bureau of Labor Statistics](https://data.bls.gov/map/MapToolServlet?survey=la&map=county&seasonal=u) and the [Robert Wood Johnson Foundation](http://www.countyhealthrankings.org/reports/2017-county-health-rankings-key-findings-report).</p>
<p>The data cleaning process was daunting, dirty and rigorous, so we omitted it from the final analysis.</p>

<p>Now, we'll read in the initial dataset we'll be working from.</p>
<p>Notes:</p>
* The crime data for some smaller counties may be missing or underreported.
* We used counties instead of metropolitan statistical area or city data to get the most robust analysis with the widest range of variables, and to capture suburbs.
* Some of the data points were not available for Alaska and Hawaii. That's OK, since those states are pretty expensive to live in anyway (they'll likely be thrown out of the analysis).

In [2]:
df = pd.read_csv("/Users/alexmahadevan/Code/raise_a_family/county data.csv" , index_col=0)


<p>Here is a list of the variables we are considering.</p>

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Float64Index: 3220 entries, 1.0 to nan
Data columns (total 40 columns):
SUMLEV                        3141 non-null float64
REGION                        3141 non-null float64
DIVISION                      3141 non-null float64
STATE                         3141 non-null float64
COUNTY                        3141 non-null float64
STNAME                        3141 non-null object
CTY                           3141 non-null object
HEALTHCARE_COST               3134 non-null float64
PERCENT_FOOD_INSECURE         3135 non-null float64
CHILD_MORTALITY_RATE          1943 non-null float64
MENTAL_DISTRESS               3135 non-null float64
CHILDREN_UNINSURED            3135 non-null float64
DISCONNECTED_YOUTH            2046 non-null float64
DAILY_POLLUTION               3108 non-null float64
WATER_VIOLATION               3077 non-null float64
HOUSING_PROBLEMS              3135 non-null float64
POPESTIMATE2016               3141 non-null float64
AVG_NETM

<p>As you can see, for most variables, we have more than 3,100 counties to consider.</p>
<p>Since most people prefer to live where other people are living (and not in an igloo in Yakutat, Alaska), let's cull this list to only the counties with a population above the average (mean).</p>
<p>First let's find the average.</p>

In [4]:
df.POPESTIMATE2016.mean()

102874.06080865966

<p>As you can see, the average population of a U.S. county as of 2016 was 102,874 (plus a small fraction of a person).</p>
<p>Let's drop the counties with a population less than that amount.</p>

In [5]:
# Keep only counties above the mean population (computed rather than pasted in by hand)
df = df[df.POPESTIMATE2016 > df.POPESTIMATE2016.mean()]

In [6]:
df.COUNTY.describe()

count    584.000000
mean      90.041096
std      104.140746
min        1.000000
25%       25.000000
50%       67.000000
75%      113.000000
max      810.000000
Name: COUNTY, dtype: float64

<p>Now I'm going to tweak pandas so it doesn't display every number in scientific notation.</p>

In [7]:
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [8]:
df.fillna(0)

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTY,HEALTHCARE_COST,PERCENT_FOOD_INSECURE,CHILD_MORTALITY_RATE,...,INCOME_AS_PERCENT_OF_STATE,UNEMPLOYMENT,MORTGAGE_INCOME_RATIO,RENT_INCOME_RATIO,HEALTHCARE_INCOME_RATIO,DOCTORS,EXERCISE_ACCESS,FOOD_ENVIRONMENT_INDEX,CIVIC_ASSOCIATIONS,CRIME_RATE
2.000,50.000,3.000,6.000,1.000,3.000,Alabama,"Baldwin County, Alabama",9413.000,14.000,48.000,...,116.800,5.400,38.187,57.172,5.339,11.000,72.000,7.500,231.000,217.000
8.000,50.000,3.000,6.000,1.000,15.000,Alabama,"Calhoun County, Alabama",10673.000,18.000,70.000,...,93.900,6.700,40.449,65.468,3.907,14.000,59.000,6.100,170.000,574.000
35.000,50.000,3.000,6.000,1.000,69.000,Alabama,"Houston County, Alabama",9997.000,18.000,78.000,...,97.200,5.900,40.011,60.161,4.158,31.000,53.000,6.400,135.000,390.000
37.000,50.000,3.000,6.000,1.000,73.000,Alabama,"Jefferson County, Alabama",9495.000,20.000,90.000,...,108.000,5.900,35.857,56.309,4.803,0.000,78.000,5.700,970.000,797.000
41.000,50.000,3.000,6.000,1.000,81.000,Alabama,"Lee County, Alabama",8563.000,18.000,67.000,...,103.600,5.300,36.774,56.275,5.205,31.000,63.000,6.000,137.000,275.000
45.000,50.000,3.000,6.000,1.000,89.000,Alabama,"Madison County, Alabama",9793.000,16.000,65.000,...,131.900,5.200,47.574,75.808,5.922,7.000,82.000,6.900,386.000,584.000
49.000,50.000,3.000,6.000,1.000,97.000,Alabama,"Mobile County, Alabama",9860.000,20.000,83.000,...,94.900,6.900,37.669,55.595,4.443,6.000,69.000,5.700,483.000,550.000
51.000,50.000,3.000,6.000,1.000,101.000,Alabama,"Montgomery County, Alabama",9046.000,23.000,87.000,...,95.200,5.900,39.545,54.507,4.905,65.000,76.000,5.000,342.000,408.000
52.000,50.000,3.000,6.000,1.000,103.000,Alabama,"Morgan County, Alabama",9990.000,14.000,78.000,...,105.400,5.600,43.739,76.379,4.580,3.000,71.000,7.600,150.000,192.000
59.000,50.000,3.000,6.000,1.000,117.000,Alabama,"Shelby County, Alabama",9866.000,11.000,48.000,...,158.100,4.400,48.911,74.351,7.114,6.000,73.000,8.400,222.000,180.000
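
One detail worth keeping in mind: `DataFrame.fillna` returns a new DataFrame by default, so a cell that calls `df.fillna(0)` without assigning the result displays the filled table but leaves `df` itself untouched. Here is a minimal sketch on made-up data (the `toy` frame and its values are illustrative and do not come from the county file):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the county table (values are made up).
toy = pd.DataFrame({"DOCTORS": [11.0, np.nan, 31.0]})

# fillna returns a new DataFrame; `toy` keeps its missing value.
filled = toy.fillna(0)

missing_before = int(toy["DOCTORS"].isna().sum())
missing_after = int(filled["DOCTORS"].isna().sum())
print(missing_before, missing_after)
```

Missing values therefore stay missing in later calculations unless the result is assigned back, for example with `df = df.fillna(0)`.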


### Normalizing Data

<p>Now let's start with the initial Z-scores. Here's the formula we'll use to normalize all the data so we can compare apples to apples: z = (x – μ) / σ.</p>
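
To make the formula concrete, here is a small sketch of it as a reusable function. The name `zscore` and the `lower_is_better` flag are our own illustrative choices, not part of pandas; the flag mirrors the `-1*` multiplication that some of the cells that follow apply. The sample values are made up.

```python
import pandas as pd

def zscore(series, lower_is_better=False):
    """Return (x - mean) / std for each value, optionally flipped in sign."""
    # pandas' std() uses the sample standard deviation (ddof=1) by default.
    z = (series - series.mean()) / series.std()
    return -z if lower_is_better else z

# Made-up values: mean is 5, sample standard deviation is 1.
toy = pd.Series([4.0, 5.0, 6.0])
print(zscore(toy).tolist())  # [-1.0, 0.0, 1.0]

# For "less is better" measures, the sign is flipped so higher is always better.
flipped = zscore(toy, lower_is_better=True)
```

A value of 1.0 means the county sits one standard deviation above the mean of the group being compared, whatever the original unit was.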

In [9]:
df['zFAMKIDSCHANGE'] = (df.CHANGE_IN_FAM_WITH_KIDS - df.CHANGE_IN_FAM_WITH_KIDS.mean())/ df.CHANGE_IN_FAM_WITH_KIDS.std()

In [10]:
df['zPERCENT_WITH_KIDS'] = (df.PERCENT_WITH_KIDS - df.PERCENT_WITH_KIDS.mean())/ df.PERCENT_WITH_KIDS.std()

In [11]:
df['zBIRTHS_PER_CAPITA'] = (df.BIRTHS_PER_CAPITA - df.BIRTHS_PER_CAPITA.mean())/ df.BIRTHS_PER_CAPITA.std()

In [12]:
df['zAVG_NETMIG'] = (df.AVG_NETMIG - df.AVG_NETMIG.mean())/ df.AVG_NETMIG.std()

In [13]:
df['zPOP_CHANGE'] = (df.POP_CHANGE - df.POP_CHANGE.mean())/ df.POP_CHANGE.std()

In [14]:
df['zINCOME_AS_PERCENT_OF_STATE'] = (df.INCOME_AS_PERCENT_OF_STATE - df.INCOME_AS_PERCENT_OF_STATE.mean())/ df.INCOME_AS_PERCENT_OF_STATE.std()

In [15]:
df['zPERCENTAGE_IN_POVERTY'] = -1*((df.PERCENTAGE_IN_POVERTY - df.PERCENTAGE_IN_POVERTY.mean())/ df.PERCENTAGE_IN_POVERTY.std())

In [16]:
df['zPERCENT_W_HEALTH_INSURANCE'] = (df.PERCENT_W_HEALTH_INSURANCE - df.PERCENT_W_HEALTH_INSURANCE.mean())/ df.PERCENT_W_HEALTH_INSURANCE.std()

In [17]:
df['zUNEMPLOYMENT'] = -1*((df.UNEMPLOYMENT - df.UNEMPLOYMENT.mean())/ df.UNEMPLOYMENT.std())

In [18]:
df['zBACHELORS_DEGREE'] = (df.BACHELORS_DEGREE - df.BACHELORS_DEGREE.mean())/ df.BACHELORS_DEGREE.std()

In [19]:
df['zMORTGAGE_INCOME_RATIO'] = (df.MORTGAGE_INCOME_RATIO - df.MORTGAGE_INCOME_RATIO.mean())/ df.MORTGAGE_INCOME_RATIO.std()

In [20]:
df['zRENT_INCOME_RATIO'] = (df.RENT_INCOME_RATIO - df.RENT_INCOME_RATIO.mean())/ df.RENT_INCOME_RATIO.std()

In [21]:
df['zHEALTHCARE_INCOME_RATIO'] = (df.HEALTHCARE_INCOME_RATIO - df.HEALTHCARE_INCOME_RATIO.mean())/ df.HEALTHCARE_INCOME_RATIO.std()

In [22]:
df['zDOCTORS'] = (df.DOCTORS - df.DOCTORS.mean())/ df.DOCTORS.std()

In [23]:
df['zEXERCISE_ACCESS'] = (df.EXERCISE_ACCESS - df.EXERCISE_ACCESS.mean())/ df.EXERCISE_ACCESS.std()

In [24]:
df['zFOOD_ENVIRONMENT_INDEX'] = (df.FOOD_ENVIRONMENT_INDEX - df.FOOD_ENVIRONMENT_INDEX.mean())/ df.FOOD_ENVIRONMENT_INDEX.std()

In [25]:
df['zCIVIC_ASSOCIATIONS'] = (df.CIVIC_ASSOCIATIONS - df.CIVIC_ASSOCIATIONS.mean())/ df.CIVIC_ASSOCIATIONS.std()

In [26]:
df['zCRIME_RATE'] = -1*((df.CRIME_RATE - df.CRIME_RATE.mean())/ df.CRIME_RATE.std())

In [27]:
df['zDAILY_POLLUTION'] = -1*((df.DAILY_POLLUTION - df.DAILY_POLLUTION.mean())/ df.DAILY_POLLUTION.std())

In [28]:
df['zCHILD_MORTALITY_RATE'] = -1*((df.CHILD_MORTALITY_RATE - df.CHILD_MORTALITY_RATE.mean())/ df.CHILD_MORTALITY_RATE.std())

In [29]:
df['zCHILDREN_UNINSURED'] = -1*((df.CHILDREN_UNINSURED - df.CHILDREN_UNINSURED.mean())/ df.CHILDREN_UNINSURED.std())

In [30]:
df['zDISCONNECTED_YOUTH'] = -1*((df.DISCONNECTED_YOUTH - df.DISCONNECTED_YOUTH.mean())/ df.DISCONNECTED_YOUTH.std())

In [31]:
df['zHOUSING_PROBLEMS'] = -1*((df.HOUSING_PROBLEMS - df.HOUSING_PROBLEMS.mean())/ df.HOUSING_PROBLEMS.std())

### Ranking Counties
<p>Now that we have normalized scores for all 23 of our variables, we can average them into a single score for each county. You'll notice we multiplied some of them by -1, because families want less pollution, crime, etc.</p>

In [32]:
df['zSCORE_TOTAL'] = (df['zPERCENT_WITH_KIDS'] + df['zBIRTHS_PER_CAPITA'] + df['zFAMKIDSCHANGE'] + df['zAVG_NETMIG'] + df['zPOP_CHANGE']  + df['zINCOME_AS_PERCENT_OF_STATE'] + df['zPERCENTAGE_IN_POVERTY'] + df['zPERCENT_W_HEALTH_INSURANCE'] + df['zUNEMPLOYMENT'] + df['zBACHELORS_DEGREE'] + df['zMORTGAGE_INCOME_RATIO'] + df['zRENT_INCOME_RATIO'] + df['zHEALTHCARE_INCOME_RATIO'] + df['zDOCTORS'] + df['zEXERCISE_ACCESS'] + df['zFOOD_ENVIRONMENT_INDEX'] + df['zCIVIC_ASSOCIATIONS'] + df['zCRIME_RATE'] + df['zDAILY_POLLUTION'] + df['zCHILD_MORTALITY_RATE'] + df['zCHILDREN_UNINSURED'] + df['zDISCONNECTED_YOUTH'] + df['zHOUSING_PROBLEMS'])/23
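
The averaging step can also be sketched more compactly. The example below uses made-up z-scores and a hypothetical `z_columns` list. It also shows one behavior to be aware of: adding columns with `+` yields `NaN` for any county missing even one value, whereas `mean(axis=1)` skips missing values by default, so the two approaches are not interchangeable.

```python
import numpy as np
import pandas as pd

# Made-up z-scores for three counties; column names mirror the notebook's style.
toy = pd.DataFrame({
    "zDOCTORS": [1.0, -1.0, 0.0],
    "zCRIME_RATE": [0.5, 0.5, -1.0],
    "zUNEMPLOYMENT": [0.0, np.nan, 1.0],
})

# Collect the z-score columns (do this before adding any total column that also starts with "z").
z_columns = [c for c in toy.columns if c.startswith("z")]

# Summing with + and dividing by the count: a single NaN makes the whole row NaN.
summed = sum(toy[c] for c in z_columns) / len(z_columns)

# mean(axis=1) skips NaN by default and averages over the values that are present.
row_mean = toy[z_columns].mean(axis=1)

print(summed.tolist())
print(row_mean.tolist())
```

Which behavior is preferable depends on whether a county with missing data should be dropped from the ranking or scored on what is available.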

In [33]:
df_sorted = df.sort_values('zSCORE_TOTAL', ascending=False)

In [34]:
df_sorted.head(50)

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTY,HEALTHCARE_COST,PERCENT_FOOD_INSECURE,CHILD_MORTALITY_RATE,...,zEXERCISE_ACCESS,zFOOD_ENVIRONMENT_INDEX,zCIVIC_ASSOCIATIONS,zCRIME_RATE,zDAILY_POLLUTION,zCHILD_MORTALITY_RATE,zCHILDREN_UNINSURED,zDISCONNECTED_YOUTH,zHOUSING_PROBLEMS,zSCORE_TOTAL
2919.0,50.0,3.0,5.0,51.0,107.0,Virginia,"Loudoun County, Virginia",8272.0,4.0,30.0,...,0.433,2.771,-0.286,1.187,-0.18,1.311,0.324,1.403,1.171,1.582
268.0,50.0,4.0,8.0,8.0,35.0,Colorado,"Douglas County, Colorado",9521.0,9.0,25.0,...,1.294,1.524,-0.412,1.2,1.699,1.633,1.069,1.403,1.385,1.361
2564.0,50.0,3.0,6.0,47.0,187.0,Tennessee,"Williamson County, Tennessee",8545.0,8.0,26.0,...,-1.459,1.628,-0.248,0.98,-0.305,1.568,0.697,0.951,1.171,1.296
740.0,50.0,2.0,3.0,18.0,57.0,Indiana,"Hamilton County, Indiana",9303.0,9.0,28.0,...,0.605,1.317,-0.171,1.407,-1.307,1.44,0.324,1.177,1.815,1.264
2099.0,50.0,2.0,3.0,39.0,41.0,Ohio,"Delaware County, Ohio",9350.0,9.0,26.0,...,0.433,1.628,-0.453,1.205,-1.369,1.568,0.697,1.403,1.385,1.221
455.0,50.0,3.0,5.0,13.0,117.0,Georgia,"Forsyth County, Georgia",10233.0,7.0,28.0,...,-0.255,1.94,-0.539,1.26,-0.43,1.44,-0.048,0.499,0.956,1.179
2895.0,50.0,3.0,5.0,51.0,59.0,Virginia,"Fairfax County, Virginia",7799.0,6.0,33.0,...,1.38,2.356,1.378,1.168,0.572,1.119,-0.048,1.403,0.527,1.141
1226.0,50.0,3.0,5.0,24.0,27.0,Maryland,"Howard County, Maryland",8563.0,8.0,32.0,...,1.207,1.94,-0.197,0.66,-0.618,1.183,1.069,1.177,0.956,1.069
1407.0,50.0,2.0,4.0,27.0,139.0,Minnesota,"Scott County, Minnesota",8093.0,6.0,24.0,...,0.949,2.044,-0.61,1.118,-0.618,1.697,1.069,1.177,1.385,1.057
2873.0,50.0,3.0,5.0,51.0,13.0,Virginia,"Arlington County, Virginia",7469.0,8.0,42.0,...,1.38,1.94,-0.133,0.907,-0.242,0.541,0.324,1.403,0.527,1.016


In [35]:
df_sorted.to_csv("/Users/alexmahadevan/Code/raise_a_family/county data sorted.csv")

<p>Just glancing at the top few counties, you can see affordability kind of got lost in there, even though we did include healthcare, rent and mortgage costs in the analysis.</p>
<p>To do the next part of the analysis, I'm only picking out the top 200 counties.</p> 
<p>Then I'll hard-code the school grades from [Niche](https://www.niche.com/k12/search/best-school-districts/), and re-run the analysis putting more weight on affordability. That should produce our final list.</p>
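
The notebook does not show how the top-200 file was produced. One plausible way, sketched here on made-up data, is to take the first rows of the sorted frame and drop the z-score columns so they can be recomputed on the smaller group. The frame `toy_sorted`, the variable `n_keep` and all values are assumptions for illustration only.

```python
import pandas as pd

# Stand-in for the sorted county table (already ordered by zSCORE_TOTAL, descending).
toy_sorted = pd.DataFrame({
    "CTY": ["A County", "B County", "C County"],
    "CRIME_RATE": [85, 217, 574],
    "zCRIME_RATE": [1.2, 0.3, -1.5],
    "zSCORE_TOTAL": [1.2, 0.3, -1.5],
})

n_keep = 2  # the analysis keeps 200; 2 is used here so the toy example is visible

# Keep the best-scoring rows, then drop every column whose name starts with "z".
top = toy_sorted.head(n_keep)
raw_only = top[[c for c in top.columns if not c.startswith("z")]]

print(raw_only)
```

Recomputing the z-scores afterwards matters because the mean and standard deviation of the top group differ from those of the full list.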

### Final Analysis
<p>Now that we have comprehensive school grades, we can re-rank the top 200 counties.</p>

In [36]:
df = pd.read_csv("/Users/alexmahadevan/Code/raise_a_family/final 200 no z.csv")

In [37]:
df.head(1)

Unnamed: 0.1,Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTY,SCHOOL_GRADE,HEALTHCARE_COST,...,INCOME_AS_PERCENT_OF_STATE,UNEMPLOYMENT,MORTGAGE_INCOME_RATIO,RENT_INCOME_RATIO,HEALTHCARE_INCOME_RATIO,DOCTORS,EXERCISE_ACCESS,FOOD_ENVIRONMENT_INDEX,CIVIC_ASSOCIATIONS,CRIME_RATE
0,2919,50,3,5,51,107,Virginia,"Loudoun County, Virginia",4.0,8272,...,190.0,3.2,46.428,74.013,14.923,5,89,10.0,239,85


<p>Now we will re-run the z-score analysis on the final 200 counties.</p>

In [38]:
df['zFAMKIDSCHANGE'] = (df.CHANGE_IN_FAM_WITH_KIDS - df.CHANGE_IN_FAM_WITH_KIDS.mean())/ df.CHANGE_IN_FAM_WITH_KIDS.std()

In [39]:
df['zPERCENT_WITH_KIDS'] = (df.PERCENT_WITH_KIDS - df.PERCENT_WITH_KIDS.mean())/ df.PERCENT_WITH_KIDS.std()

In [40]:
df['zBIRTHS_PER_CAPITA'] = (df.BIRTHS_PER_CAPITA - df.BIRTHS_PER_CAPITA.mean())/ df.BIRTHS_PER_CAPITA.std()

In [41]:
df['zAVG_NETMIG'] = (df.AVG_NETMIG - df.AVG_NETMIG.mean())/ df.AVG_NETMIG.std()

In [42]:
df['zPOP_CHANGE'] = (df.POP_CHANGE - df.POP_CHANGE.mean())/ df.POP_CHANGE.std()

In [43]:
df['zINCOME_AS_PERCENT_OF_STATE'] = (df.INCOME_AS_PERCENT_OF_STATE - df.INCOME_AS_PERCENT_OF_STATE.mean())/ df.INCOME_AS_PERCENT_OF_STATE.std()

In [44]:
df['zPERCENTAGE_IN_POVERTY'] = -1*((df.PERCENTAGE_IN_POVERTY - df.PERCENTAGE_IN_POVERTY.mean())/ df.PERCENTAGE_IN_POVERTY.std())

In [45]:
df['zPERCENT_W_HEALTH_INSURANCE'] = (df.PERCENT_W_HEALTH_INSURANCE - df.PERCENT_W_HEALTH_INSURANCE.mean())/ df.PERCENT_W_HEALTH_INSURANCE.std()

In [46]:
df['zUNEMPLOYMENT'] = -1*((df.UNEMPLOYMENT - df.UNEMPLOYMENT.mean())/ df.UNEMPLOYMENT.std())

In [47]:
df['zBACHELORS_DEGREE'] = (df.BACHELORS_DEGREE - df.BACHELORS_DEGREE.mean())/ df.BACHELORS_DEGREE.std()

In [93]:
df['zMEDIAN_MORTAGE_PAYMENTS'] = -1*((df.MEDIAN_MORTAGE_PAYMENTS - df.MEDIAN_MORTAGE_PAYMENTS.mean())/ df.MEDIAN_MORTAGE_PAYMENTS.std())

In [94]:
df['zMEDIAN_RENT_PAYMENT'] = -1*((df.MEDIAN_RENT_PAYMENT - df.MEDIAN_RENT_PAYMENT.mean())/ df.MEDIAN_RENT_PAYMENT.std())

In [95]:
df['zHEALTHCARE_COST'] = -1*((df.HEALTHCARE_COST - df.HEALTHCARE_COST.mean())/ df.HEALTHCARE_COST.std())

In [51]:
df['zDOCTORS'] = (df.DOCTORS - df.DOCTORS.mean())/ df.DOCTORS.std()

In [52]:
df['zEXERCISE_ACCESS'] = (df.EXERCISE_ACCESS - df.EXERCISE_ACCESS.mean())/ df.EXERCISE_ACCESS.std()

In [53]:
df['zFOOD_ENVIRONMENT_INDEX'] = (df.FOOD_ENVIRONMENT_INDEX - df.FOOD_ENVIRONMENT_INDEX.mean())/ df.FOOD_ENVIRONMENT_INDEX.std()

In [73]:
df['zCIVIC_ASSOCIATIONS'] = (df.CIVIC_ASSOCIATIONS - df.CIVIC_ASSOCIATIONS.mean())/ df.CIVIC_ASSOCIATIONS.std()

In [55]:
df['zCRIME_RATE'] = -1*((df.CRIME_RATE - df.CRIME_RATE.mean())/ df.CRIME_RATE.std())

In [56]:
df['zDAILY_POLLUTION'] = -1*((df.DAILY_POLLUTION - df.DAILY_POLLUTION.mean())/ df.DAILY_POLLUTION.std())

In [57]:
df['zCHILD_MORTALITY_RATE'] = -1*((df.CHILD_MORTALITY_RATE - df.CHILD_MORTALITY_RATE.mean())/ df.CHILD_MORTALITY_RATE.std())

In [58]:
df['zCHILDREN_UNINSURED'] = -1*((df.CHILDREN_UNINSURED - df.CHILDREN_UNINSURED.mean())/ df.CHILDREN_UNINSURED.std())

In [59]:
df['zDISCONNECTED_YOUTH'] = -1*((df.DISCONNECTED_YOUTH - df.DISCONNECTED_YOUTH.mean())/ df.DISCONNECTED_YOUTH.std())

In [60]:
df['zHOUSING_PROBLEMS'] = -1*((df.HOUSING_PROBLEMS - df.HOUSING_PROBLEMS.mean())/ df.HOUSING_PROBLEMS.std())

In [61]:
df['zGRADE'] = (df.SCHOOL_GRADE - df.SCHOOL_GRADE.mean())/df.SCHOOL_GRADE.std()

<p>Now, let's create four categories that group some of these variables together. Then we can give affordability the most weight, and still rank the counties on the other categories.</p>
<p>Here are the categories: Education, Health, Family Friendly and Affordability.</p>

In [96]:
df['EDUCATION'] = (df.zBACHELORS_DEGREE + df.zGRADE)/2

In [97]:
df['FAMILY_FRIENDLY'] = (df.zPERCENT_WITH_KIDS + df.zFAMKIDSCHANGE + df.zCRIME_RATE + df.zFOOD_ENVIRONMENT_INDEX + df.zBIRTHS_PER_CAPITA + df.zCIVIC_ASSOCIATIONS)/6

In [103]:
df['AFFORDABILITY'] = (df.zMEDIAN_MORTAGE_PAYMENTS + df.zMEDIAN_RENT_PAYMENT + df.zHEALTHCARE_COST + df.zUNEMPLOYMENT + df.zINCOME_AS_PERCENT_OF_STATE)/5

In [104]:
df['HEALTH_INDEX'] = (df.zDOCTORS + df.zDAILY_POLLUTION + df.zCHILD_MORTALITY_RATE + df.zCHILDREN_UNINSURED + df.zPERCENT_W_HEALTH_INSURANCE + df.zEXERCISE_ACCESS + df.zPERCENTAGE_IN_POVERTY)/7

In [105]:
df['RANK'] = 0.6*df.AFFORDABILITY + 0.2*df.EDUCATION + 0.1*df.HEALTH_INDEX + 0.1*df.FAMILY_FRIENDLY
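
To see how the 0.6 / 0.2 / 0.1 / 0.1 weighting plays out, here is a small sketch with two made-up counties. The `weights` dictionary is our own way of writing the same weights, and the category scores are invented for illustration.

```python
import pandas as pd

# Category weights from the RANK cell; they sum to 1.
weights = {
    "AFFORDABILITY": 0.6,
    "EDUCATION": 0.2,
    "HEALTH_INDEX": 0.1,
    "FAMILY_FRIENDLY": 0.1,
}

# Made-up category scores for two hypothetical counties.
toy = pd.DataFrame({
    "CTY": ["A County", "B County"],
    "AFFORDABILITY": [1.0, -0.5],
    "EDUCATION": [0.0, 2.0],
    "HEALTH_INDEX": [0.5, 0.5],
    "FAMILY_FRIENDLY": [-0.5, 1.0],
})

# Weighted sum of the category scores, then sort best-first.
toy["RANK"] = sum(w * toy[col] for col, w in weights.items())
ranked = toy.sort_values("RANK", ascending=False)

print(ranked[["CTY", "RANK"]])
```

In this toy case the more affordable county comes out on top even though the other one scores far higher on education and family friendliness, which is the effect the heavier affordability weight is meant to produce.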

In [106]:
df.sort_values('RANK' , ascending=False)

Unnamed: 0.1,Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTY,SCHOOL_GRADE,HEALTHCARE_COST,...,zGRADE,EDUCATION,AFFORDABILITY,HEALTH_INDEX,zCIVIC_ASSOCIATION,FAMILY_FRIENDLY,RANK,zMEDIAN_MORTAGE_PAYMENTS,zMEDIAN_RENT_PAYMENT,zHEALTHCARE_COST
2,2564,50,3,6,47,187,Tennessee,"Williamson County, Tennessee",4.000,8545,...,1.122,1.513,0.885,0.134,-0.326,0.520,0.899,-0.422,-0.544,0.586
3,740,50,2,3,18,57,Indiana,"Hamilton County, Indiana",4.000,9303,...,1.122,1.523,0.784,0.260,-0.228,0.736,0.875,0.423,0.119,-0.114
19,1392,50,2,4,27,109,Minnesota,"Olmsted County, Minnesota",3.800,6668,...,0.323,0.347,0.916,0.501,-0.539,0.258,0.695,0.738,0.879,2.319
60,856,50,2,4,19,103,Iowa,"Johnson County, Iowa",3.800,7036,...,0.323,0.879,0.860,0.206,-0.700,-0.305,0.682,0.679,0.706,1.979
4,2099,50,2,3,39,41,Ohio,"Delaware County, Ohio",3.800,9350,...,0.323,0.906,0.665,0.439,-0.588,0.552,0.679,-0.404,0.372,-0.158
49,3109,50,2,3,55,25,Wisconsin,"Dane County, Wisconsin",4.000,7353,...,1.122,1.097,0.675,0.336,0.880,-0.108,0.647,0.108,0.493,1.686
11,950,50,2,4,20,91,Kansas,"Johnson County, Kansas",4.000,9732,...,1.122,1.358,0.458,0.451,0.435,0.440,0.636,0.237,0.396,-0.511
76,2855,50,1,1,50,7,Vermont,"Chittenden County, Vermont",4.000,7303,...,1.122,1.140,0.627,0.598,-0.490,-0.399,0.624,0.005,-0.227,1.732
0,2919,50,3,5,51,107,Virginia,"Loudoun County, Virginia",4.000,8272,...,1.122,1.646,0.066,0.381,-0.375,1.489,0.556,-1.858,-2.502,0.838
41,118,50,3,7,5,7,Arkansas,"Benton County, Arkansas",4.000,9231,...,1.122,0.155,0.879,-0.673,-0.442,0.570,0.548,1.230,1.004,-0.048


In [102]:
df.to_csv("/Users/alexmahadevan/Desktop/test.csv")