# Replication of Spielman et al.’s 2020 Evaluation of the Social Vulnerability Index: Analysis Plan


In [1]:
.659**2 + .136**2 + (-.398)**2 + .160**2 + (-.064)**2 + (.568)**2 + (-.177)**2 + .06**2

0.9984299999999999

In [1]:
import pandas as pd
pd.options.display.max_colwidth = 1000

In [2]:
# Variable names
variables = [
    "B01002_001E", # median age
    "B03002_001E", # total population of respondents to race/ethnicity
    "B03002_004E", # total black
    "B03002_005E", # total native american
    "B03002_006E", # total asian
    "B03002_012E", # total latinx
    "B06001_002E", # total under 5
    "B09020_001E", # total 65 and over
    "B01003_001E", # total population
    "B25008_001E", # total population in occupied housing units
    "B25002_002E", # total occupied housing units
    "B25003_003E", # total renter occupied housing units
    "B25002_001E", # total housing units for which occupancy status is known
    "B09020_021E", # total 65+ living in group quarters
    "B01001_026E", # total female
    "B11001_006E", # total female-headed family households
    "B11001_001E", # total households
    "B25002_003E", # total vacant housing units
    "B19025_001E", # aggregate household income
    "B23022_025E", # total male (16 to 64) who did not work in the past 12 months
    "B23022_049E", # total female (16 to 64) who did not work in the past 12 months
    "B23022_001E", # total for work status by sex stats (population 16 to 64)
    "B17021_002E", # total pop below poverty level
    "B17021_001E", # total pop for which poverty info available
    "B25024_010E", # number of mobile home housing units in structure
    "B25024_001E", # total units in structure
    "C24010_038E", # total female employed
    "C24010_001E", # total for which sex and occupation known
    "B19055_002E", # total households with social security income
    "B19055_001E", # total households for which social security income status known
    "B09002_002E", # total children in married couple families
    "B09002_001E", # total children by family type and age
    "B19001_017E", # total households over 200k income
    "B06007_005E", # total speak spanish, speak english less than very well
    "B06007_008E", # total speak another language, speak english less than very well
    "B06007_001E", # total for which language spoken at home and english ability known
    "B16010_002E", # total less than high school
    "B16010_001E", # total for which education, employment, language at home known
    "C24050_002E", # total in extractive industries
    "C24050_001E", # total for which industry known
    "C24050_029E", # total in service occupations
    "B08201_002E", # total households no vehicle available
    "B08201_001E", # total households for which vehicle status and family size known
    "B25064_001E", # median gross rent
    "B25077_001E"  # median home value
]

# Aliases
aliases = [
    "median age",
    "total population of respondents to race/ethnicity",
    "total Black population",
    "total Native American population",
    "total Asian population",
    "total Latinx population",
    "total population under 5 years of age",
    "total population over 65 years of age",
    "total population",
    "total population in occupied housing units",
    "total occupied housing units",
    "total renter occupied housing units",
    "total housing units for which occupancy status is known",
    "total 65+ living in group quarters",
    "total female population",
    "total female-headed family households",
    "total households for which household type is known",
    "total vacant housing units",
    "aggregate household income",
    "total males unemployed for last 12 months",
    "total females unemployed for last 12 months",
    "total population for which unemployment and sex cross-tabulations known",
    "total population below poverty level",
    "total population for which poverty information available",
    "number of mobile home housing units in structure",
    "total housing units in structure",
    "total female employed",
    "total population for which sex and occupation known",
    "total households with social security income",
    "total households for which social security income status known",
    "total children in married couple families",
    "total children for which family type and age are known",
    "total households with over 200k income",
    "total Spanish-speakers who speak english less than very well",
    "total people who speak another language and speak English less than very well",
    "total population with known language spoken at home and English ability",
    "total population with less than a high school graduate education",
    "total for which education, employment, language at home known",
    "total population in extractive industries",
    "total population for which industry known",
    "total people in service occupations",
    "total households with no available vehicle",
    "total households for which vehicle status and family size known",
    "median gross rent",
    "median home value"
]
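
The two lists above are matched purely by position, so one missing or extra entry would silently shift every alias after it. A minimal sketch of one way to guard against that, using a three-entry excerpt of the lists; the names `variables_excerpt`, `aliases_excerpt`, and `variable_aliases` are illustrative and not part of the notebook:

```python
# Three-entry excerpt of the parallel lists above, for illustration only.
variables_excerpt = ["B01002_001E", "B03002_001E", "B03002_004E"]
aliases_excerpt = [
    "median age",
    "total population of respondents to race/ethnicity",
    "total Black population",
]

# Positional pairing only makes sense when both lists have the same length.
assert len(variables_excerpt) == len(aliases_excerpt), "lists are out of step"

# Hypothetical alternative structure: one mapping keeps each code next to its alias.
variable_aliases = dict(zip(variables_excerpt, aliases_excerpt))
print(variable_aliases["B03002_004E"])
```

The same length check could be run on the full `variables` and `aliases` lists before they are combined into a data frame.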

**Census data website:**
- 1-year 2005-2021: https://www.census.gov/data/developers/data-sets/acs-1year.html
- 3-year 2007-2013: https://www.census.gov/data/developers/data-sets/acs-3year.html
- 5-year 2009-2021: https://www.census.gov/data/developers/data-sets/acs-5year.html

**TEMPORAL EXTENT APPROACH**

1-year ACS datasets:
- https://api.census.gov/data/2009/acs/acs1/variables.html
- https://api.census.gov/data/2010/acs/acs1/variables.html
- https://api.census.gov/data/2011/acs/acs1/variables.html
- https://api.census.gov/data/2012/acs/acs1/variables.html
- https://api.census.gov/data/2013/acs/acs1/variables.html

3-year ACS datasets:
- I'm ignoring 2007 and 2008 because the 5-year starts in 2009 
- https://api.census.gov/data/2009/acs/acs3/variables.html
- https://api.census.gov/data/2010/acs/acs3/variables.html does not work: for some reason the detailed tables are not included for this release, see https://www.census.gov/data/developers/data-sets/acs-3year.2010.html#list-tab-1929707922
- https://api.census.gov/data/2011/acs/acs3/variables.html
- https://api.census.gov/data/2012/acs/acs3/variables.html
- https://api.census.gov/data/2013/acs/acs3/variables.html

5-year ACS datasets:
- https://api.census.gov/data/2009/acs/acs5/variables.html
- https://api.census.gov/data/2010/acs/acs5/variables.html
- https://api.census.gov/data/2011/acs/acs5/variables.html
- https://api.census.gov/data/2012/acs/acs5/variables.html
- https://api.census.gov/data/2013/acs/acs5/variables.html
- I'm ignoring 2014-2021 because the 3-year ends in 2013
- How many independent 5-year samples could we get? Just 3, for example 2009, 2014, 2019
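
The count of independent 5-year samples can be checked with a little arithmetic: each 5-year release covers the release year and the four years before it, so back-to-back samples that share no survey year are five years apart. A small sketch, assuming the 2009-2021 range of 5-year releases listed at the top; the function name `independent_end_years` is made up for illustration:

```python
def independent_end_years(first_end_year, last_end_year, span=5):
    """End years of back-to-back multi-year samples that share no survey year."""
    return list(range(first_end_year, last_end_year + 1, span))

span = 5
end_years = independent_end_years(2009, 2021, span)

# Print the survey years each sample covers, then the number of samples.
for end in end_years:
    print(f"{end - span + 1}-{end}")
print(len(end_years))
```

This prints the windows 2005-2009, 2010-2014, and 2015-2019, followed by a count of 3. Starting from a different release year shifts the windows, but with releases available from 2009 through 2021 no choice yields more than three.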

**TIME SERIES APPROACH**
- https://api.census.gov/data/2005/acs/acs1/variables.html
- https://api.census.gov/data/2006/acs/acs1/variables.html
- https://api.census.gov/data/2007/acs/acs1/variables.html
- https://api.census.gov/data/2008/acs/acs1/variables.html
- https://api.census.gov/data/2009/acs/acs1/variables.html
- https://api.census.gov/data/2010/acs/acs1/variables.html
- https://api.census.gov/data/2011/acs/acs1/variables.html
- https://api.census.gov/data/2012/acs/acs1/variables.html
- https://api.census.gov/data/2013/acs/acs1/variables.html
- https://api.census.gov/data/2014/acs/acs1/variables.html
- https://api.census.gov/data/2015/acs/acs1/variables.html
- https://api.census.gov/data/2016/acs/acs1/variables.html
- https://api.census.gov/data/2017/acs/acs1/variables.html
- https://api.census.gov/data/2018/acs/acs1/variables.html
- https://api.census.gov/data/2019/acs/acs1/variables.html
- https://api.census.gov/data/2020/acs/acs1/variables.html does not work: not released because of COVID
- https://api.census.gov/data/2021/acs/acs1/variables.html not included below because 2020 is unavailable
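
All of the variable pages listed above follow one URL pattern, so the lists could be generated rather than typed out. A rough sketch under that assumption; `URL_TEMPLATE` and `variable_page_urls` are hypothetical names, and the `skip` argument is there for releases noted above as not working:

```python
URL_TEMPLATE = "https://api.census.gov/data/{year}/acs/{dataset}/variables.html"

def variable_page_urls(dataset, first_year, last_year, skip=()):
    """Build variable-page URLs for one ACS dataset, leaving out skipped years."""
    return [
        URL_TEMPLATE.format(year=year, dataset=dataset)
        for year in range(first_year, last_year + 1)
        if year not in skip
    ]

# Time series approach: 1-year ACS, 2005 through 2019.
time_series_urls = variable_page_urls("acs1", 2005, 2019)

# 3-year ACS for 2009 through 2013, without the 2010 page that does not work.
three_year_urls = variable_page_urls("acs3", 2009, 2013, skip={2010})

print(len(time_series_urls), len(three_year_urls))
```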

In [3]:
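# try to get this to run online
# perhaps I should do an equivalency check on the variable Label and Concept as well (combine for definition?)
from IPython.display import display  # implicit in Jupyter; imported so the cell also runs as a plain script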
html_list = [
    "https://api.census.gov/data/2005/acs/acs1/variables.html",
    "https://api.census.gov/data/2006/acs/acs1/variables.html",
    "https://api.census.gov/data/2007/acs/acs1/variables.html",
    "https://api.census.gov/data/2008/acs/acs1/variables.html",
    "https://api.census.gov/data/2009/acs/acs1/variables.html",
    "https://api.census.gov/data/2010/acs/acs1/variables.html",
    "https://api.census.gov/data/2011/acs/acs1/variables.html",
    "https://api.census.gov/data/2012/acs/acs1/variables.html",
    "https://api.census.gov/data/2013/acs/acs1/variables.html",
    "https://api.census.gov/data/2014/acs/acs1/variables.html",
    "https://api.census.gov/data/2015/acs/acs1/variables.html",
    "https://api.census.gov/data/2016/acs/acs1/variables.html",
    "https://api.census.gov/data/2017/acs/acs1/variables.html",
    "https://api.census.gov/data/2018/acs/acs1/variables.html",
    "https://api.census.gov/data/2019/acs/acs1/variables.html",
    "https://api.census.gov/data/2009/acs/acs3/variables.html",
    "https://api.census.gov/data/2011/acs/acs3/variables.html",
    "https://api.census.gov/data/2012/acs/acs3/variables.html",
    "https://api.census.gov/data/2013/acs/acs3/variables.html",
    "https://api.census.gov/data/2009/acs/acs5/variables.html",
    "https://api.census.gov/data/2010/acs/acs5/variables.html",
    "https://api.census.gov/data/2011/acs/acs5/variables.html",
    "https://api.census.gov/data/2013/acs/acs5/variables.html"    
]

ref_var_list = pd.read_html("https://api.census.gov/data/2012/acs/acs5/variables.html")[0]
ref_meta = pd.DataFrame( {"Name": variables,
                          "Alias": aliases} )
ref_meta = ref_meta.merge(ref_var_list, on = "Name", how = "left")[["Name", "Label", "Concept", "Alias"]]

ref_meta["Definition_reference"] = ref_meta["Concept"] + ' ' + ref_meta["Label"]
ref_meta["Label_reference"] = ref_meta["Label"]
ref_meta = ref_meta.drop( ["Concept", "Label"], axis = 1 )

ref_meta[['Definition_reference']] = ref_meta[['Definition_reference']].replace([r"(?<!\d)\d{4}(?!\d)", '!', ':', ' --'], '', regex=True)
ref_meta['Definition_reference'] = ref_meta['Definition_reference'].str.lower()
ref_meta[['Label_reference']] = ref_meta[['Label_reference']].replace([r"(?<!\d)\d{4}(?!\d)", '!', ':'], '', regex=True)
ref_meta['Label_reference'] = ref_meta['Label_reference'].str.lower()


for link in html_list:
    # The first table on each variables page holds the Name, Label and Concept columns
    var_list = pd.read_html(link)[0]
    acs_meta = pd.DataFrame( {"Name": variables,
                              "Alias": aliases} )
    acs_meta = acs_meta.merge(var_list, on = "Name", how = "left")[["Name", "Label", "Concept", "Alias"]]

    acs_meta["Definition"] = acs_meta["Concept"] + ' ' + acs_meta["Label"]

    # Some releases list no Concept for any of our variables; compare on Label alone in that case
    no_concepts = acs_meta["Concept"].isnull().sum() == len(variables)

    # Strip standalone 4-digit years and punctuation, then lowercase, so cosmetic differences are ignored
    if no_concepts:
        acs_meta[['Label']] = acs_meta[['Label']].replace([r"(?<!\d)\d{4}(?!\d)", '!', ':'], '', regex=True)
        acs_meta['Label'] = acs_meta['Label'].str.lower()
    else:
        acs_meta[['Definition']] = acs_meta[['Definition']].replace([r"(?<!\d)\d{4}(?!\d)", '!', ':', ' --'], '', regex=True)
        acs_meta['Definition'] = acs_meta['Definition'].str.lower()

    joined_meta = acs_meta.merge(ref_meta, on = ["Name", "Alias"])

    # Keep only the rows whose normalized text differs from the 2012 5-year reference
    if no_concepts:
        differs = ~joined_meta['Label_reference'].eq(joined_meta['Label'])
        issues = joined_meta.loc[differs][['Name', 'Alias', 'Label_reference', 'Label']]
    else:
        differs = ~joined_meta['Definition_reference'].eq(joined_meta['Definition'])
        issues = joined_meta.loc[differs][['Name', 'Alias', 'Definition_reference', 'Definition']]
    if len(issues) > 0:
        print("--------------------------------------------------------------------------------------------------------------------------------\nVariable definitions in the HTML link", link, "have the following discrepancies with variable definitions in the 2012 5-year ACS link:")
        display(issues)
    else:
        print("--------------------------------------------------------------------------------------------------------------------------------\nThe following link contains all the right vars:", link)

--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2005/acs/acs1/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,
19,B23022_025E,total males unemployed for last 12 months,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotalmaledid not work in the past 12 months,
20,B23022_049E,total females unemployed for last 12 months,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotalfemaledid not work in the past 12 months,
21,B23022_001E,total population for which unemployment and sex cross-tabulations known,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotal,
22,B17021_002E,total population below poverty level,poverty status of individuals in the past 12 months by living arrangement estimatetotalincome in the past 12 months below poverty level,poverty status in the past 12 months of individuals by household type estimatetotalincome in the past 12 months below poverty level
23,B17021_001E,total population for which poverty information available,poverty status of individuals in the past 12 months by living arrangement estimatetotal,poverty status in the past 12 months of individuals by household type estimatetotal
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2006/acs/acs1/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,
19,B23022_025E,total males unemployed for last 12 months,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotalmaledid not work in the past 12 months,
20,B23022_049E,total females unemployed for last 12 months,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotalfemaledid not work in the past 12 months,
21,B23022_001E,total population for which unemployment and sex cross-tabulations known,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotal,
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2007/acs/acs1/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,
19,B23022_025E,total males unemployed for last 12 months,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotalmaledid not work in the past 12 months,
20,B23022_049E,total females unemployed for last 12 months,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotalfemaledid not work in the past 12 months,
21,B23022_001E,total population for which unemployment and sex cross-tabulations known,sex by work status in the past 12 months by usual hours worked per week in the past 12 months by weeks worked in the past 12 months for the population 16 to 64 years estimatetotal,
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2008/acs/acs1/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2009/acs/acs1/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"


--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2010/acs/acs1/variables.html
--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2011/acs/acs1/variables.html
--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2012/acs/acs1/variables.html
--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2013/acs/acs1/variables.html
----------------------------------------------------------------

Unnamed: 0,Name,Alias,Definition_reference,Definition
15,B11001_006E,total female-headed family households,"household type (including living alone) estimatetotalfamily householdsother familyfemale householder, no husband present","household type (including living alone) estimatetotalfamily householdsother familyfemale householder, no spouse present"


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2009/acs/acs3/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"


--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2011/acs/acs3/variables.html
--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2012/acs/acs3/variables.html
--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2013/acs/acs3/variables.html
--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2009/acs/acs5/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:

Unnamed: 0,Name,Alias,Definition_reference,Definition
6,B06001_002E,total population under 5 years of age,place of birth by age in the united states estimatetotalunder 5 years,
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,
26,C24010_038E,total female employed,sex by occupation for the civilian employed population 16 years and over estimatetotalfemale,"sex by occupation for the civilian employed population 16 years and over estimatetotalmaleproduction, transportation, and material moving occupationstransportation and material moving occupationsmaterial moving workers"
33,B06007_005E,total Spanish-speakers who speak english less than very well,"place of birth by language spoken at home and ability to speak english in the united states estimatetotalspeak spanishspeak english less than ""very well""",
34,B06007_008E,total people who speak another language and speak English less than very well,"place of birth by language spoken at home and ability to speak english in the united states estimatetotalspeak other languagesspeak english less than ""very well""",
35,B06007_001E,total population with known language spoken at home and English ability,place of birth by language spoken at home and ability to speak english in the united states estimatetotal,
36,B16010_002E,total population with less than a high school graduate education,educational attainment and employment status by language spoken at home for the population 25 years and over estimatetotalless than high school graduate,
37,B16010_001E,"total for which education, employment, language at home known",educational attainment and employment status by language spoken at home for the population 25 years and over estimatetotal,
38,C24050_002E,total population in extractive industries,"industry by occupation for the civilian employed population 16 years and over estimatetotalagriculture, forestry, fishing and hunting, and mining",


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2010/acs/acs5/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,


--------------------------------------------------------------------------------------------------------------------------------
Variable definitions in the HTML link https://api.census.gov/data/2011/acs/acs5/variables.html have the following discrepancies with variable definitions in the 2012 5-year ACS link:


Unnamed: 0,Name,Alias,Definition_reference,Definition
7,B09020_001E,total population over 65 years of age,relationship by household type (including living alone) for the population 65 years and over estimatetotal,
13,B09020_021E,total 65+ living in group quarters,relationship by household type (including living alone) for the population 65 years and over estimatetotalin group quarters,


--------------------------------------------------------------------------------------------------------------------------------
The following link contains all the right vars: https://api.census.gov/data/2013/acs/acs5/variables.html
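
The run-together strings in the tables above, such as `estimatetotalfemale`, come from the normalization step that strips punctuation before comparing definitions. A standalone sketch of that step using Python's `re` module in place of the pandas `replace` call; the raw `concept` and `label` strings below are assumed examples written in the style of the Census variable pages, not values copied from them:

```python
import re

# Same patterns the comparison cell strips: a standalone four-digit number
# (a year), "!", ":" and " --".
PATTERNS = [r"(?<!\d)\d{4}(?!\d)", "!", ":", " --"]

def normalize(concept, label):
    """Join concept and label, strip the patterns, and lowercase the result."""
    text = f"{concept} {label}"
    for pattern in PATTERNS:
        text = re.sub(pattern, "", text)
    return text.lower()

# Assumed raw metadata strings (illustrative).
concept = "SEX BY OCCUPATION FOR THE CIVILIAN EMPLOYED POPULATION 16 YEARS AND OVER"
label = "Estimate!!Total!!Female"
print(normalize(concept, label))
```

Because the separators are removed rather than replaced with a space, the label levels fuse together. That is harmless for an equality check but makes the printed definitions harder to read.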
