# CS498 Data Visualisation - Final Project Data Pre-processing
## United States Health Inequality data pre-processing

**Author: Graham Chester**

**Date: 20-Jul-2018**

This Jupyter notebook takes two main datasets (health by city, life expectancy by county) and two value-mapping datasets, and wrangles them into a single CSV file (health.csv) suitable for use by a D3 visualisation.

The datasets used are:

1) Centers for Disease Control and Prevention (CDC) 500 Cities health data, sourced from the CDC 500 Cities portal: https://chronicdata.cdc.gov/500-Cities/500-Cities-Local-Data-for-Better-Health-2017-relea/6vp6-wxuq

2) Life Expectancy by income dataset, sourced from the Health Inequality Project https://healthinequality.org/data/

3) City to lat, long, FIPS mapping dataset, sourced from SimpleMaps https://simplemaps.com/data/us-cities

4) Commuting Zone to FIPS mapping dataset, sourced from IPUMS census data https://usa.ipums.org/usa/volii/1990LMAascii.txt

## Imports and convenient display settings

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import seaborn as sns

%matplotlib inline

# set Jupyter to display ALL output from a cell (not just last output)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# set pandas and numpy options to make print format nicer
pd.set_option('display.width',110); pd.set_option('display.max_columns',100)
pd.set_option('display.max_colwidth', 200); pd.set_option('display.max_rows', 500)
np.set_printoptions(linewidth=100, threshold=5000, edgeitems=10, suppress=True)

## Read CDC 500 Cities health dataset, filter and reshape 
1) Read dataset with only required fields (for performance as dataset is 230MB)

2) Clean up data types, names and filter for required rows

3) Rename columns with more information for better display in d3

4) Pivot so there is one row per city with many measures
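
As a rough illustration of the pivot in step 4, the sketch below reshapes a small long-format frame (one row per city and measure) into a wide frame (one row per city). The column names follow the CDC dataset, but the cities and values are invented for the example.

```python
import pandas as pd

# Toy long-format data: one row per (city, measure).
# Values are made up for illustration only.
long_df = pd.DataFrame({
    'StateDesc': ['California', 'California', 'Texas', 'Texas'],
    'CityName': ['Irvine', 'Irvine', 'Austin', 'Austin'],
    'Short_Question_Text': ['Obesity', 'Diabetes', 'Obesity', 'Diabetes'],
    'Data_Value': [20.0, 6.3, 25.0, 8.0],
})

# Pivot so each measure becomes a column and each city is a single row.
wide_df = pd.pivot_table(long_df, values='Data_Value',
                         index=['StateDesc', 'CityName'],
                         columns='Short_Question_Text',
                         aggfunc='sum').reset_index()
wide_df.columns.name = None  # drop the leftover label on the columns axis

print(wide_df)
```

Because each (city, measure) pair occurs only once after filtering, the choice of aggregation function should not change the values; `'sum'` simply passes the single value through.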

In [3]:
# only read in the required columns as dataset is large
columns = ['StateAbbr','StateDesc','CityName','GeographicLevel','Data_Value','Data_Value_Type','PopulationCount',
           'CityFIPS','Short_Question_Text', 'Category', 'Measure']

health_R = pd.read_csv('data/500_Cities.csv', dtype={'CityFIPS': str}, usecols=columns, na_filter=True)

# change the United States total row so it behaves like a state when visualising
health_R.loc[health_R.StateDesc=='United States', "CityName"] = "United States"
health_R.loc[health_R.StateDesc=='United States', "StateDesc"] = "Average"
health_R.shape
# health_R.head(1)

# we only need non-adjusted (crude) numbers, and only for cities and the US total, not census tracts
health_F = health_R[(health_R.Data_Value_Type=='Crude prevalence') & 
                    (health_R.GeographicLevel.isin(['City','US'])) ]
health_F.shape

# pivot so that we have columns for all the health measures, instead of a row for each
health_P = pd.pivot_table(health_F, values='Data_Value', index=['StateDesc', 'CityName'], 
                          columns=['Short_Question_Text'], aggfunc=np.sum).reset_index()
health_P.shape

# merge with fields from original dataset to get overall health row
health = pd.merge(health_P, health_F[['StateDesc','CityName','CityFIPS','PopulationCount']].drop_duplicates(),
                  how='left', on=['StateDesc','CityName'])
health.shape
# health.head(2)

# subjectively assign weights to each preventative measure (negative), unhealthy behaviour and health outcome
weights = {
           'Health Insurance':        6,
           'Arthritis':               3,
           'Binge Drinking':          6,
           'High Blood Pressure':     7,
           'Taking BP Medication':    3,
           'Cancer (except skin)':   10,
           'Current Asthma':          4,
           'Coronary Heart Disease': 10,
           'Annual Checkup':         -3,
           'Cholesterol Screening':  -3,
           'Colorectal Cancer Screening': -2,
           'COPD'                       : 10,
           'Core preventive services for older men':   -3,
           'Core preventive services for older women': -3,
           'Current Smoking':         7,
           'Dental Visit':           -1,
           'Diabetes':                8,
           'High Cholesterol':        6,
           'Chronic Kidney Disease':  8,
           'Physical Inactivity':     3,
           'Mammography':            -2,
           'Mental Health':           6,
           'Obesity':                 7,
           'Pap Smear Test':         -2,
           'Sleep < 7 hours':         2,
           'Physical Health':         5,
           'Stroke':                  9,
           'Teeth Loss':              1,
          }

# calculate health score for a city as a rating from 1 (worst) to 99 (best)
columns = list(health_F.Short_Question_Text.unique())
health['health_score'] = health[columns].mul(pd.Series(weights), axis=1).sum(axis=1)

health['health_score'] = health.health_score - health.health_score.min()
health['health_score'] = (98 * (1 - health.health_score / health.health_score.max()) +1).astype(int)

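# the CDC measure is *lack* of insurance; flip it to coverage here
# (it is flipped back to 'No Health Insurance' with the other prevention measures before export)
health['Health Insurance'] = 100 - health['Health Insurance']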

# population should be integer
health['PopulationCount'] = health.PopulationCount.astype(int)
# health.head(2)


(810103, 11)

(14028, 11)

(501, 30)

(501, 32)
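
The scoring step in the cell above can be checked on a tiny example. This is a minimal sketch with two measures and three hypothetical cities (`A`, `B`, `C`); the prevalence figures are invented, and only the two weights are taken from the `weights` dictionary.

```python
import pandas as pd

# Hypothetical prevalence figures (percent) for three made-up cities.
toy = pd.DataFrame({'Obesity': [15.0, 30.0, 25.0],
                    'Annual Checkup': [70.0, 60.0, 65.0]},
                   index=['A', 'B', 'C'])
toy_weights = pd.Series({'Obesity': 7, 'Annual Checkup': -3})

# Weighted sum per city: a higher raw value means a less healthy city.
raw = toy.mul(toy_weights, axis=1).sum(axis=1)

# Shift so the healthiest city sits at 0, then flip and rescale into 1..99.
shifted = raw - raw.min()
score = (98 * (1 - shifted / shifted.max()) + 1).astype(int)

print(score)  # A -> 99, B -> 1, C -> 37
```

The healthiest city always receives 99 and the least healthy 1, so the score is relative to the cities in the frame rather than an absolute measure.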

## Add City and county details to health dataframe for each city
1) Read US Cities data file. 

2) Append latitude, longitude, county FIPS to the above health dataframe (so can be joined with life dataframe)

In [4]:
# read us cities data
uscities = pd.read_csv('data/uscitiesLatLongFIPS.csv', dtype={'PlaceFP':str, 'county_fips':str})
# uscities.head(2)

# reformat fields to not lose leading zeroes, and to correct spelling of prefixes
uscities['county_fips'] = uscities.county_fips.apply('{:0>5}'.format)
uscities['city_ascii']  = uscities.city_ascii.str.replace('Saint', 'St.')

# merge city details with health dataframe
health = pd.merge(health, uscities[['city_ascii', 'state_name','lat','lng', 'county_fips','county_name']], 
                  how='left', left_on=['StateDesc','CityName'], right_on=['state_name','city_ascii'])

# change lat and long on the United States row so d3 can place it properly on screen
health.loc[health.CityName=='United States', 'lat'] =  35
health.loc[health.CityName=='United States', 'lng'] = -71
health.shape
# health.head(2)
# 36.846815	-76.285218

(501, 39)
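
The key clean-up and merge in the cell above can be sketched on two rows. The frames below are toy stand-ins, and the assumption that the health frame spells the prefix as 'St.' is made for the example only.

```python
import pandas as pd

# Toy city lookup; the second FIPS code has lost its leading zero.
cities = pd.DataFrame({'city_ascii': ['Saint Paul', 'Irvine'],
                       'state_name': ['Minnesota', 'California'],
                       'county_fips': ['27123', '6059']})

# Left-pad to five characters with zeroes, as in the cell above.
cities['county_fips'] = cities.county_fips.apply('{:0>5}'.format)
# Use the same prefix spelling as the (assumed) health frame.
cities['city_ascii'] = cities.city_ascii.str.replace('Saint', 'St.', regex=False)

health_toy = pd.DataFrame({'StateDesc': ['Minnesota', 'California'],
                           'CityName': ['St. Paul', 'Irvine']})

# Left merge keeps every health row, matched on differently named key columns.
merged = pd.merge(health_toy, cities, how='left',
                  left_on=['StateDesc', 'CityName'],
                  right_on=['state_name', 'city_ascii'])
print(merged[['CityName', 'county_fips']])
```

Reading the FIPS columns as strings and padding them matters because the later join on county FIPS compares text, so `'6059'` and `'06059'` would not match.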

## Read life expectancy by income dataset and pre-process
1) Read life expectancy by income range dataset

2) Rename columns for ease, and calculate various average life expectancies and disparity with income

In [5]:
# read into dataframe only the required columns
life = pd.read_csv('data/health_ineq_all_online_tables.csv', skiprows=6,
                   usecols=['cz','czname','statename','stateabbrv', 'le_agg_q1_F', 'le_agg_q2_F', 'le_agg_q3_F', 
                            'le_agg_q4_F', 'le_agg_q1_M', 'le_agg_q2_M', 'le_agg_q3_M', 'le_agg_q4_M', ])

# better names for columns for easier understanding in d3
life = life.rename(columns={'le_agg_q1_F': 'life_q1_f', 'le_agg_q2_F': 'life_q2_f', 
                            'le_agg_q3_F': 'life_q3_f', 'le_agg_q4_F': 'life_q4_f',
                            'le_agg_q1_M': 'life_q1_m', 'le_agg_q2_M': 'life_q2_m',
                            'le_agg_q3_M': 'life_q3_m', 'le_agg_q4_M': 'life_q4_m',})

# calculate average life expectancies for male, female, overall, and for top and bottom 25% income earners
life['expectancy_f'] = life[['life_q1_f', 'life_q2_f', 'life_q3_f', 'life_q4_f']].mean(axis=1).round(1)
life['expectancy_m'] = life[['life_q1_m', 'life_q2_m', 'life_q3_m', 'life_q4_m']].mean(axis=1).round(1)
life['expectancy_top25'] = (life['life_q4_f']/2 + life['life_q4_m']/2).round(1)
life['expectancy_bot25'] = (life['life_q1_f']/2 + life['life_q1_m']/2).round(1)
life['expectancy_avg'] = (life['expectancy_f']/2 + life['expectancy_m']/2).round(1)


# calculate the disparity in life expectancy between top and bottom income earners
life['disparity_m']  = (life['life_q4_m'] - life['life_q1_m']).round(1)
life['disparity_f']  = (life['life_q4_f'] - life['life_q1_f']).round(1)
life['disparity_avg']  = (life['expectancy_top25'] - life['expectancy_bot25']).round(1)
life = life.dropna(axis=0, how='all')
life.shape
# life.head(2)

(595, 20)
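
To make the derived columns concrete, here is a small sketch that applies the same arithmetic to one row. The quartile values are round numbers chosen for the example, not figures from the dataset.

```python
import pandas as pd

# One made-up commuting zone: life expectancy for the bottom (q1) and
# top (q4) income quartiles, female and male.
life_toy = pd.DataFrame({'life_q1_f': [84.0], 'life_q4_f': [88.0],
                         'life_q1_m': [79.0], 'life_q4_m': [86.0]})

# Average of female and male expectancy within each income quartile.
life_toy['expectancy_top25'] = (life_toy.life_q4_f / 2 + life_toy.life_q4_m / 2).round(1)
life_toy['expectancy_bot25'] = (life_toy.life_q1_f / 2 + life_toy.life_q1_m / 2).round(1)

# Disparity: years gained by the top quartile relative to the bottom quartile.
life_toy['disparity_m'] = (life_toy.life_q4_m - life_toy.life_q1_m).round(1)
life_toy['disparity_f'] = (life_toy.life_q4_f - life_toy.life_q1_f).round(1)
life_toy['disparity_avg'] = (life_toy.expectancy_top25 - life_toy.expectancy_bot25).round(1)

print(life_toy.iloc[0])
```

Note that `disparity_avg` is computed from the already rounded quartile averages, so it can differ by 0.1 from the mean of `disparity_m` and `disparity_f` on real data.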

## Add county FIPS code to life expectancy dataset
1) Read commuting zone to county FIPS mapping dataset

2) Add county FIPS code to life expectancy data

3) Drop non-required fields

In [6]:
cz_FIPS = pd.read_csv('data/1990LMAascii.csv',sep='\t', dtype={'FIPS': str})
cz_FIPS = cz_FIPS[cz_FIPS['County Name'] != 'Market Area Total']
cz_FIPS['stateabbrv'] = cz_FIPS['County Name'].str[-2:]
cz_FIPS.shape
# cz_FIPS.head(2)

# life = pd.merge(life, cz_FIPS, how='left', left_on=['stateabbrv','cz'], right_on=['stateabbrv','LMA/CZ'])
life = pd.merge(life, cz_FIPS, how='left', left_on=['cz'], right_on=['LMA/CZ'])
# life.head(1)
life = life.drop(['Total Population', 'Labor Force','County Name','LMA/CZ'], axis=1)
life.shape
# life.head(2)

(3141, 6)

(2857, 22)
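
The row count grows in this step because one commuting zone covers several counties, so the merge is one-to-many. The sketch below shows the effect on a few zone and county codes taken from the output table at the end of this notebook; the `validate` argument is an optional safeguard added for the example and is not used in the cell above.

```python
import pandas as pd

# Two commuting zones with their average life expectancy.
zones = pd.DataFrame({'cz': [37800, 38300],
                      'expectancy_avg': [84.4, 84.2]})

# County FIPS codes per commuting zone: zone 37800 spans two counties.
counties = pd.DataFrame({'LMA/CZ': [37800, 37800, 38300],
                         'FIPS': ['06013', '06001', '06059']})

# Each zone row is repeated once per county it contains.
# validate raises if a zone code appears more than once on the left side.
expanded = pd.merge(zones, counties, how='left',
                    left_on='cz', right_on='LMA/CZ',
                    validate='one_to_many')
expanded = expanded.drop(['LMA/CZ'], axis=1)

print(expanded)
```

Every county in a zone therefore inherits the zone-level life expectancy figures, which is why neighbouring cities in the final table can share identical values.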

## Merge health and life expectancy datasets

1) Merge datasets by county. City-level life expectancy data was not available; however, in most relevant cases US counties are small and cover one or just a few similar cities.

2) Calculate the mean life expectancies for the United States row

3) Save to CSV

4) Check for any NaN values (should just be the US overall row)
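
One way to see which rows fail to match in a left merge, and which rows carry NaN values afterwards, is sketched below. The frames are toy stand-ins, and `indicator=True` is an addition for the example rather than something the notebook itself uses.

```python
import pandas as pd

# Toy health rows: the national row has no county code.
health_toy = pd.DataFrame({'CityName': ['Irvine', 'United States'],
                           'county_fips': ['06059', None]})
life_toy = pd.DataFrame({'FIPS': ['06059'], 'expectancy_avg': [84.2]})

# indicator=True adds a '_merge' column saying where each row came from.
checked = pd.merge(health_toy, life_toy, how='left',
                   left_on='county_fips', right_on='FIPS', indicator=True)

# Rows present only on the left found no life expectancy match.
unmatched = checked[checked['_merge'] == 'left_only']

# Same style of NaN check as in step 4: keep rows with any missing value.
with_nans = checked[checked.isnull().any(axis=1)]

print(unmatched.CityName.tolist())
print(with_nans.CityName.tolist())
```

If the NaN check returns more rows than expected, the `_merge` column helps separate rows that never matched from rows whose source data was already incomplete.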

In [17]:
# merge health and life datasets by county FIPS code
total = pd.merge(health, life, how='left', left_on=['county_fips'], right_on=['FIPS'])
total = total.sort_values(['CityName','StateDesc'])

# calculate mean life expectancy data for United States overall
usa_means = life.mean(axis=0) 
total.loc[total.CityName=='United States', usa_means.index[1:-1]] = np.round(usa_means[1:-1].values,1)
total.loc[total.CityName=='United States'] = total.loc[total.CityName=='United States'].fillna('')
total.shape
# total.head(3)
               
total['Annual Checkup']        = 100 - total['Annual Checkup']
total['Cholesterol Screening'] = 100 - total['Cholesterol Screening']
total['Colorectal Cancer Screening']              = 100 - total['Colorectal Cancer Screening']
total['Core preventive services for older men']   = 100 - total['Core preventive services for older men']
total['Core preventive services for older women'] = 100 - total['Core preventive services for older women']
total['Dental Visit']          = 100 - total['Dental Visit']
total['Health Insurance']      = 100 - total['Health Insurance']
total['Mammography']           = 100 - total['Mammography']
total['Pap Smear Test']        = 100 - total['Pap Smear Test']
total['Taking BP Medication']  = 100 - total['Taking BP Medication']


# rename columns to include more details for d3 to display
new_columns = {'Annual Checkup':                          '1P:No Annual Checkup', 
               'Cholesterol Screening':                   '1P:No Cholesterol Screening',
               'Colorectal Cancer Screening' :            '1P:No Colorectal Cancer Screening', 
               'Core preventive services for older men':  '1P:No Preventive services-older men',
               'Core preventive services for older women':'1P:No Preventive services-older women',
               'Dental Visit':                            '1P:No Dental Visit',
               'Health Insurance':                        '1P:No Health Insurance',
               'Mammography':                             '1P:No Mammography',
               'Pap Smear Test':                          '1P:No Pap Smear Test',
               'Taking BP Medication':                    '1P:Not Taking BP Medication',
               'Binge Drinking':         '2B:Binge Drinking',
               'Current Smoking':        '2B:Current Smoking',
               'Obesity':                '2B:Obesity',
               'Physical Inactivity':    '2B:Physical Inactivity',
               'Sleep < 7 hours':        '2B:Sleep < 7 hours',
               'Arthritis':              '3O:Arthritis',
               'COPD':                   '3O:Chronic Pulmonary Disease',
               'Cancer (except skin)':   '3O:Cancer',
               'Chronic Kidney Disease': '3O:Chronic Kidney Disease',
               'Coronary Heart Disease': '3O:Coronary Heart Disease',
               'Current Asthma':         '3O:Asthma',
               'Diabetes':               '3O:Diabetes',
               'High Blood Pressure':    '3O:High Blood Pressure',
               'High Cholesterol':       '3O:High Cholesterol',
               'Mental Health':          '3O:Mental Health',
               'Physical Health':        '3O:Physical Health',
               'Stroke':                 '3O:Stroke',
               'Teeth Loss':             '3O:All Teeth Lost',
               'expectancy_top25':  '4L:Life Expectancy Top 25%:Life expectancy in years, Top Quartile Income',
               'expectancy_bot25':  '4L:Life Expectancy Bottom 25%:Life expectancy in years, Bottom Quartile Income',
               'expectancy_avg':    '4L:Life Expectancy Average: Average Life Expectancy in Years',
                }
total = total.rename(columns=new_columns)

# append each measure's detailed description to the column names assigned above
lookup = health_F[['Short_Question_Text','Measure']] \
                    .drop_duplicates().sort_values(['Short_Question_Text'])
lookup['Short_Question_Text'] = lookup.Short_Question_Text.map(new_columns)
lookup['Measure'] = lookup['Short_Question_Text'] + ':' + lookup.Measure
lookup = lookup.set_index('Short_Question_Text')['Measure'].to_dict()
total = total.rename(columns=lookup)

# Add Health Rank
total = total.sort_values("health_score",ascending=False).reset_index(drop=True)
total['health_rank'] = total.index+1
total = total.sort_values("PopulationCount",ascending=False).reset_index(drop=True)

total = pd.concat([total.loc[0:2],total]) # fix for d3 skipping first 3 rows
total.index.name = 'id'

# save to CSV file for d3
total.to_csv('health.csv')

# check for any rows with NaNs, should just be a few fields on the United States row
total[total.isnull().any(axis=1)]

(501, 61)

id,StateDesc,CityName,1P:No Annual Checkup:Visits to doctor for routine checkup within the past Year among adults aged >=18 Years,3O:Arthritis:Arthritis among adults aged >=18 Years,2B:Binge Drinking:Binge drinking among adults aged >=18 Years,3O:Chronic Pulmonary Disease:Chronic obstructive pulmonary disease among adults aged >=18 Years,3O:Cancer:Cancer (excluding skin cancer) among adults aged >=18 Years,1P:No Cholesterol Screening:Cholesterol screening among adults aged >=18 Years,3O:Chronic Kidney Disease:Chronic kidney disease among adults aged >=18 Years,"1P:No Colorectal Cancer Screening:Fecal occult blood test, sigmoidoscopy, or colonoscopy among adults aged 50–75 Years","1P:No Preventive services-older men:Older adult men aged >=65 Years who are up to date on a core set of clinical preventive services: Flu shot past Year, PPV shot ever, Colorectal cancer screening","1P:No Preventive services-older women:Older adult women aged >=65 Years who are up to date on a core set of clinical preventive services: Flu shot past Year, PPV shot ever, Colorectal cancer screening, and Mammogram past 2 Years",3O:Coronary Heart Disease:Coronary heart disease among adults aged >=18 Years,3O:Asthma:Current asthma among adults aged >=18 Years,2B:Current Smoking:Current smoking among adults aged >=18 Years,1P:No Dental Visit:Visits to dentist or dental clinic among adults aged >=18 Years,3O:Diabetes:Diagnosed diabetes among adults aged >=18 Years,1P:No Health Insurance:Current lack of health insurance among adults aged 18–64 Years,3O:High Blood Pressure:High blood pressure among adults aged >=18 Years,3O:High Cholesterol:High cholesterol among adults aged >=18 Years who have been screened in the past 5 Years,1P:No Mammography:Mammography use among women aged 50–74 Years,3O:Mental Health:Mental health not good for >=14 days among adults aged >=18 Years,2B:Obesity:Obesity among adults aged >=18 Years,1P:No Pap Smear Test:Papanicolaou smear use among adult women aged 21–65 Years,3O:Physical Health:Physical health not good for >=14 days among adults aged >=18 Years,2B:Physical Inactivity:No leisure-time physical activity among adults aged >=18 Years,2B:Sleep < 7 hours:Sleeping less than 7 hours among adults aged >=18 Years,3O:Stroke:Stroke among adults aged >=18 Years,1P:Not Taking BP Medication:Taking medicine for high blood pressure control among adults aged >=18 Years with high blood pressure,3O:All Teeth Lost:All teeth lost among adults aged >=65 Years,CityFIPS,PopulationCount,health_score,city_ascii,state_name,lat,lng,county_fips,county_name,cz,czname,statename,stateabbrv_x,life_q1_f,life_q2_f,life_q3_f,life_q4_f,life_q1_m,life_q2_m,life_q3_m,life_q4_m,expectancy_f,expectancy_m,"4L:Life Expectancy Top 25%:Life expectancy in years, Top Quartile Income","4L:Life Expectancy Bottom 25%:Life expectancy in years, Bottom Quartile Income",4L:Life Expectancy Average: Average Life Expectancy in Years,disparity_m,disparity_f,disparity_avg,FIPS,stateabbrv_y,health_rank


In [16]:
total

id,StateDesc,CityName,1P:No Annual Checkup:Visits to doctor for routine checkup within the past Year among adults aged >=18 Years,3O:Arthritis:Arthritis among adults aged >=18 Years,2B:Binge Drinking:Binge drinking among adults aged >=18 Years,3O:Chronic Pulmonary Disease:Chronic obstructive pulmonary disease among adults aged >=18 Years,3O:Cancer:Cancer (excluding skin cancer) among adults aged >=18 Years,1P:No Cholesterol Screening:Cholesterol screening among adults aged >=18 Years,3O:Chronic Kidney Disease:Chronic kidney disease among adults aged >=18 Years,"1P:No Colorectal Cancer Screening:Fecal occult blood test, sigmoidoscopy, or colonoscopy among adults aged 50–75 Years","1P:No Preventive services-older men:Older adult men aged >=65 Years who are up to date on a core set of clinical preventive services: Flu shot past Year, PPV shot ever, Colorectal cancer screening","1P:No Preventive services-older women:Older adult women aged >=65 Years who are up to date on a core set of clinical preventive services: Flu shot past Year, PPV shot ever, Colorectal cancer screening, and Mammogram past 2 Years",3O:Coronary Heart Disease:Coronary heart disease among adults aged >=18 Years,3O:Asthma:Current asthma among adults aged >=18 Years,2B:Current Smoking:Current smoking among adults aged >=18 Years,1P:No Dental Visit:Visits to dentist or dental clinic among adults aged >=18 Years,3O:Diabetes:Diagnosed diabetes among adults aged >=18 Years,1P:No Health Insurance:Current lack of health insurance among adults aged 18–64 Years,3O:High Blood Pressure:High blood pressure among adults aged >=18 Years,3O:High Cholesterol:High cholesterol among adults aged >=18 Years who have been screened in the past 5 Years,1P:No Mammography:Mammography use among women aged 50–74 Years,3O:Mental Health:Mental health not good for >=14 days among adults aged >=18 Years,2B:Obesity:Obesity among adults aged >=18 Years,1P:No Pap Smear Test:Papanicolaou smear use among adult women aged 21–65 Years,3O:Physical Health:Physical health not good for >=14 days among adults aged >=18 Years,2B:Physical Inactivity:No leisure-time physical activity among adults aged >=18 Years,2B:Sleep < 7 hours:Sleeping less than 7 hours among adults aged >=18 Years,3O:Stroke:Stroke among adults aged >=18 Years,1P:Not Taking BP Medication:Taking medicine for high blood pressure control among adults aged >=18 Years with high blood pressure,3O:All Teeth Lost:All teeth lost among adults aged >=65 Years,CityFIPS,PopulationCount,health_score,city_ascii,state_name,lat,lng,county_fips,county_name,cz,czname,statename,stateabbrv_x,life_q1_f,life_q2_f,life_q3_f,life_q4_f,life_q1_m,life_q2_m,life_q3_m,life_q4_m,expectancy_f,expectancy_m,"4L:Life Expectancy Top 25%:Life expectancy in years, Top Quartile Income","4L:Life Expectancy Bottom 25%:Life expectancy in years, Bottom Quartile Income",4L:Life Expectancy Average: Average Life Expectancy in Years,disparity_m,disparity_f,disparity_avg,FIPS,stateabbrv_y,health_rank
0,California,San Ramon,33.1,15.8,17.4,2.8,5.4,17.7,1.8,30.2,68.2,66.5,3.1,7.3,8.8,21.4,6.2,5.3,21.2,29.9,19.1,8.1,16.8,14.6,7.2,13.3,33.2,1.5,32.7,4.9,0668378,72148,99,San Ramon,California,37.779927,-121.978015,06013,Contra Costa,37800,San Francisco,California,CA,84.56,85.18,86.35,88.11,79.71,81.49,83.31,85.79,86.1,82.6,87.0,82.1,84.4,6.1,3.5,4.9,06013,CA,1
1,California,Irvine,35.1,12.8,16.6,3.0,4.8,24.4,1.8,34.4,71.9,67.3,3.2,7.0,9.5,26.6,6.3,7.3,20.4,29.7,17.8,8.9,15.0,20.0,7.7,14.4,31.0,1.6,32.8,5.9,0636770,212375,95,Irvine,California,33.669465,-117.823111,06059,Orange,38300,Los Angeles,California,CA,84.94,84.93,85.97,87.61,80.51,81.42,82.80,85.53,85.9,82.6,86.6,82.7,84.2,5.0,2.7,3.9,06059,CA,2
2,California,Mountain View,36.7,13.4,17.4,3.0,5.3,21.3,2.0,33.6,70.3,67.2,3.4,7.1,9.0,24.4,6.4,7.8,20.3,27.7,21.5,8.5,18.4,16.1,7.7,14.9,31.7,1.7,34.1,6.0,0649670,74066,95,Mountain View,California,37.386052,-122.083851,06085,Santa Clara,37500,San Jose,California,CA,85.70,85.77,86.49,88.26,81.18,82.22,83.97,86.16,86.6,83.4,87.2,83.4,85.0,5.0,2.6,3.8,06085,CA,3
0,California,San Ramon,33.1,15.8,17.4,2.8,5.4,17.7,1.8,30.2,68.2,66.5,3.1,7.3,8.8,21.4,6.2,5.3,21.2,29.9,19.1,8.1,16.8,14.6,7.2,13.3,33.2,1.5,32.7,4.9,0668378,72148,99,San Ramon,California,37.779927,-121.978015,06013,Contra Costa,37800,San Francisco,California,CA,84.56,85.18,86.35,88.11,79.71,81.49,83.31,85.79,86.1,82.6,87.0,82.1,84.4,6.1,3.5,4.9,06013,CA,1
1,California,Irvine,35.1,12.8,16.6,3.0,4.8,24.4,1.8,34.4,71.9,67.3,3.2,7.0,9.5,26.6,6.3,7.3,20.4,29.7,17.8,8.9,15.0,20.0,7.7,14.4,31.0,1.6,32.8,5.9,0636770,212375,95,Irvine,California,33.669465,-117.823111,06059,Orange,38300,Los Angeles,California,CA,84.94,84.93,85.97,87.61,80.51,81.42,82.80,85.53,85.9,82.6,86.6,82.7,84.2,5.0,2.7,3.9,06059,CA,2
2,California,Mountain View,36.7,13.4,17.4,3.0,5.3,21.3,2.0,33.6,70.3,67.2,3.4,7.1,9.0,24.4,6.4,7.8,20.3,27.7,21.5,8.5,18.4,16.1,7.7,14.9,31.7,1.7,34.1,6.0,0649670,74066,95,Mountain View,California,37.386052,-122.083851,06085,Santa Clara,37500,San Jose,California,CA,85.70,85.77,86.49,88.26,81.18,82.22,83.97,86.16,86.6,83.4,87.2,83.4,85.0,5.0,2.6,3.8,06085,CA,3
3,California,Sunnyvale,35.8,13.4,15.3,3.0,5.3,21.4,2.0,34.7,70.4,67.0,3.5,6.8,8.8,25.6,6.9,8.2,21.1,28.6,21.8,8.2,16.9,18.4,7.7,15.7,32.9,1.7,32.7,6.2,0677000,140081,94,Sunnyvale,California,37.368830,-122.036350,06085,Santa Clara,37500,San Jose,California,CA,85.70,85.77,86.49,88.26,81.18,82.22,83.97,86.16,86.6,83.4,87.2,83.4,85.0,5.0,2.6,3.8,06085,CA,4
4,California,Pleasanton,34.0,18.8,17.4,3.6,6.3,17.2,2.0,29.9,64.1,67.4,4.0,8.1,9.4,23.5,6.8,5.7,23.5,30.1,21.5,8.7,17.3,14.6,8.2,14.3,29.9,1.9,29.3,6.1,0657792,70285,93,Pleasanton,California,37.662431,-121.874679,06001,Alameda,37800,San Francisco,California,CA,84.56,85.18,86.35,88.11,79.71,81.49,83.31,85.79,86.1,82.6,87.0,82.1,84.4,6.1,3.5,4.9,06001,CA,5
5,Massachusetts,Cambridge,26.8,14.8,19.0,3.4,4.5,27.0,1.8,26.3,66.6,68.2,3.1,9.7,12.5,26.8,5.5,6.4,20.4,26.7,16.2,10.5,20.0,15.5,7.0,19.9,33.6,1.6,29.2,12.4,2511000,105162,93,Cambridge,Massachusetts,42.375097,-71.105608,25017,Middlesex,20500,Boston,Massachusetts,MA,83.18,84.25,86.02,87.92,78.15,80.49,82.94,85.63,85.3,81.8,86.8,80.7,83.6,7.5,4.7,6.1,25017,MA,6
6,Colorado,Boulder,44.9,15.8,19.6,3.7,4.7,35.3,1.9,35.5,61.4,63.6,3.4,9.0,13.4,28.0,4.1,10.7,17.5,25.5,27.1,10.1,15.9,18.6,7.9,13.7,25.2,1.7,36.5,8.6,0807850,97385,92,Boulder,Colorado,40.014986,-105.270546,08013,Boulder,28900,Denver,Colorado,CO,82.53,84.81,86.40,88.43,77.78,81.14,83.83,86.51,85.5,82.3,87.5,80.2,83.9,8.7,5.9,7.3,08013,CO,7
