# Broadband Upload Speeds in VT & NH

_By Carter Stowell, Feb 2019_  

This notebook provides a summary analysis of internet upload speeds by county across Vermont and New Hampshire.  

__Why look at upload speeds?__  

Businesses and entrepreneurs rely on strong upload and download speeds for basic operations and communications, such as video conferencing (Zoom, GoToMeeting, Google Hangouts, Skype), hard drive backups, cloud applications (Google Docs, Dropbox, iCloud), voice over IP (VoIP) telephone service, and attachments on outgoing email, to name a few. By contrast, home internet users care most about download speeds to satisfy typical consumer needs, such as watching movies on demand. The lines blur for remote employees and for businesses seeking workers and offices in well-connected areas.  

__How good is broadband in VT and NH overall with respect to other states?__  

U.S. News & World Report's 2018 analysis ["Internet Access Rankings"](https://www.usnews.com/news/best-states/rankings/infrastructure/internet-access) placed Vermont and New Hampshire 13th and 14th overall, based on composite measures of broadband internet access. The two states ranked lower, 22nd for VT and 34th for NH, on access to ultra-fast internet of at least 1 gigabit per second.  

__Which counties across VT and NH provide the best upload speeds?__  

To provide this analysis, the following script acquires, cleans and blends two data sources, one for county-level population data from the U.S. Census, and another for broadband availability, specifically Maximum Upload Speed by County. The output dataset(s) are used to develop mapping applications in CARTO. A derived measure of "Population-weighted upload speed" supports a per-capita comparison analysis by county, potentially useful when considering shared-bandwidth technologies like DSL.  

Weighted upload speeds can be calculated in various ways. Is a larger population better or worse? Should the weighted metric support comparisons between counties in different states, or only within a given state? The following are two of many possible weighting calculations. The first is a max upload speed (Mbps) per capita based on county population size (county_pop_size) which can be used to highlight counties with smaller populations and higher upload speeds. The second is based on max upload speed (Mbps) by the county percentage of state population (county_pop_pct) which would best support within-state but not between-state comparisons:

> $max\_upload\_wt = max\_upload / county\_pop\_size$

or

> $max\_upload\_wt\_pct = max\_upload / county\_pop\_pct$

Between these two, per capita weighting by population size is preferred for between-state comparisons of upload speeds relative to county populations.  
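
As a worked illustration of the two formulas, the sketch below applies them to two Vermont counties, using population and `MaxAdUp` figures that appear in the tables of this notebook. The column names simply mirror the symbols in the formulas, and treating `MaxAdUp` as `max_upload` is an assumption made for the example.

```python
import pandas as pd

# Two VT counties, with populations and max advertised upload speeds (Mbps)
# taken from the tables in this notebook.
df = pd.DataFrame({
    "county": ["Chittenden", "Caledonia"],
    "county_pop_size": [159711, 31012],
    "max_upload": [1000.0, 40.0],  # assumption: MaxAdUp stands in for max_upload
})
vt_totalpop = 626604  # VT total population as printed in this notebook

# Share of the state population living in each county
df["county_pop_pct"] = df["county_pop_size"] / vt_totalpop

# Weighting 1: Mbps per resident
df["max_upload_wt"] = df["max_upload"] / df["county_pop_size"]

# Weighting 2: Mbps divided by the county's share of state population
df["max_upload_wt_pct"] = df["max_upload"] / df["county_pop_pct"]

print(df)
```

The second measure equals the first multiplied by the state's total population, so it ranks counties identically within a state but shifts the scale from one state to another.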

**Goals & Significance.**  This small exercise could be expanded into a scoring algorithm for identifying locations based on multiple criteria. It could be enhanced by modeling covariate relationships that take into account measures of people and place--both objective and subjective--that can collectively represent opportunities for rural innovation.

## Data Sources

- US Census American Community Survey (ACS) 5-year estimates, 2011-2015 (the `acs5` 2015 release requested in the code below)
- Broadband data from FCC Form 477
  - VT: https://transition.fcc.gov/form477/BroadbandData/Fixed/Dec17/Version%201/VT-Fixed-Dec2017.zip
  - NH: https://transition.fcc.gov/form477/BroadbandData/Fixed/Dec17/Version%201/NH-Fixed-Dec2017.zip
- VT broadband internet, maps for comparison, see https://publicservice.vermont.gov/content/broadband-availability

## Setup
__Import Python packages used for analysis__  

Learn more about the CensusData package for Python at https://pypi.org/project/CensusData/

In [1]:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)

In [2]:
import censusdata  # interface to US Census data

## Import Census Data

Notes

- `33` is the state code for New Hampshire
- `50` is the state code for Vermont
- `050` is the summary level code for Counties. Read more in [CensusData package documentation](https://jtleider.github.io/censusdata/geographies.html).
- `B01001_001` is the table UniqueID for population size. Sourced from [Table Cells in ACS Summary docs](https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.html).
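
The notes above give the cell ID as `B01001_001`, while the download calls request `B01001_001E`. The trailing `E` marks the estimate and `M` marks the margin of error, as the search output in the exploratory section of this notebook shows. The helpers below are hypothetical, pure-Python conveniences (they are not part of the CensusData package) that only show how the codes fit together.

```python
# State FIPS codes listed in the notes above.
STATE_FIPS = {"NH": "33", "VT": "50"}


def county_geo_spec(state_abbr):
    """Return the (level, code) pairs that select every county in a state."""
    return [("state", STATE_FIPS[state_abbr]), ("county", "*")]


def acs_variable(cell_id, kind="E"):
    """Append the ACS suffix: 'E' for estimate, 'M' for margin of error."""
    if kind not in ("E", "M"):
        raise ValueError("kind must be 'E' or 'M'")
    return cell_id + kind


print(county_geo_spec("VT"))       # [('state', '50'), ('county', '*')]
print(acs_variable("B01001_001"))  # B01001_001E
```

The list returned by `county_geo_spec` has the same shape as the argument passed to `censusdata.censusgeo` in the cells below.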

In [3]:
# Vermont population by county
vtco = censusdata.download('acs5', 2015,
                             censusdata.censusgeo([('state', '50'), ('county', '*')]),
                             ['B01001_001E'])
vtco = vtco.assign(max_upload_speed = np.nan)
vtco = vtco.rename(columns={'B01001_001E': 'population_size'})

In [4]:
vtco.sort_values('population_size', ascending=False, inplace=True)
vtco.head(30)

Unnamed: 0,population_size,max_upload_speed
"Chittenden County, Vermont: Summary level: 050, state:50> county:007",159711,
"Rutland County, Vermont: Summary level: 050, state:50> county:021",60530,
"Washington County, Vermont: Summary level: 050, state:50> county:023",59132,
"Windsor County, Vermont: Summary level: 050, state:50> county:027",56150,
"Franklin County, Vermont: Summary level: 050, state:50> county:011",48418,
"Windham County, Vermont: Summary level: 050, state:50> county:025",43858,
"Addison County, Vermont: Summary level: 050, state:50> county:001",36943,
"Bennington County, Vermont: Summary level: 050, state:50> county:003",36589,
"Caledonia County, Vermont: Summary level: 050, state:50> county:005",31012,
"Orange County, Vermont: Summary level: 050, state:50> county:017",28929,



### Import VT and NH Broadband Data

In [57]:
vt_fcc477 = pd.read_csv('https://transition.fcc.gov/form477/BroadbandData/Fixed/Dec17/Version%201/VT-Fixed-Dec2017.zip')
nh_fcc477 = pd.read_csv('https://transition.fcc.gov/form477/BroadbandData/Fixed/Dec17/Version%201/NH-Fixed-Dec2017.zip')

In [30]:
vt_fcc477.head()

Unnamed: 0,LogRecNo,Provider_Id,FRN,ProviderName,DBAName,HoldingCompanyName,HocoNum,HocoFinal,StateAbbr,BlockCode,TechCode,Consumer,MaxAdDown,MaxAdUp,Business,MaxCIRDown,MaxCIRUp
0,220639,34287,4335584,MCI Communications Corporation,MCI,Verizon Communications Inc.,131425,Verizon Communications Inc.,VT,500019603001003,30,0,0.0,0.0,1,1.5,1.5
1,220640,34287,4335584,MCI Communications Corporation,MCI,Verizon Communications Inc.,131425,Verizon Communications Inc.,VT,500019608002021,30,0,0.0,0.0,1,1.5,1.5
2,220641,34287,4335584,MCI Communications Corporation,MCI,Verizon Communications Inc.,131425,Verizon Communications Inc.,VT,500039704002021,30,0,0.0,0.0,1,1.5,1.5
3,220642,34287,4335584,MCI Communications Corporation,MCI,Verizon Communications Inc.,131425,Verizon Communications Inc.,VT,500059575001069,30,0,0.0,0.0,1,3.0,3.0
4,220643,34287,4335584,MCI Communications Corporation,MCI,Verizon Communications Inc.,131425,Verizon Communications Inc.,VT,500059575002021,30,0,0.0,0.0,1,0.77,0.77


In [35]:
str(vt_fcc477['BlockCode'][0])[2:5]

'001'

In [58]:
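# Extract the county code from a census block code.
# The county code is the 3rd through 5th characters of the 15-digit block code.
# zfill restores a leading zero lost when the code is read as an integer
# (a no-op for VT and NH, whose state codes are 50 and 33).
def fn_county_in_block(x):
    return str(x).zfill(15)[2:5]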

In [59]:
# add column with CountyCode
vt_fcc477['CountyCode'] = vt_fcc477['BlockCode'].apply(fn_county_in_block)
nh_fcc477['CountyCode'] = nh_fcc477['BlockCode'].apply(fn_county_in_block)
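
The same extraction can also be written with pandas' vectorized string methods instead of `apply`. This is a minimal sketch on three block codes copied from the VT sample rows above; the `blocks` frame is a stand-in for the full FCC table.

```python
import pandas as pd

# Block codes copied from the VT sample rows shown above.
blocks = pd.DataFrame(
    {"BlockCode": [500019603001003, 500039704002021, 500059575001069]}
)

# Convert to text, left-pad to the full 15 digits, then take characters 3-5.
# zfill is a no-op for VT (50) and NH (33); it only matters for states whose
# FIPS code starts with 0 and was read as an integer.
blocks["CountyCode"] = blocks["BlockCode"].astype(str).str.zfill(15).str[2:5]

print(blocks["CountyCode"].tolist())  # ['001', '003', '005']
```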

In [60]:
nh_fcc477.head()

Unnamed: 0,LogRecNo,Provider_Id,FRN,ProviderName,DBAName,HoldingCompanyName,HocoNum,HocoFinal,StateAbbr,BlockCode,TechCode,Consumer,MaxAdDown,MaxAdUp,Business,MaxCIRDown,MaxCIRUp,CountyCode
0,10360,34057,3784063,"MetroCast Cablevision of New Hampshire, LLC",Metrocast,Harron Communications LP,130591,Harron Communications LP,NH,330019651001005,42,1,105.0,10.0,1,150.0,10.0,1
1,10361,34057,3784063,"MetroCast Cablevision of New Hampshire, LLC",Metrocast,Harron Communications LP,130591,Harron Communications LP,NH,330019651001006,42,1,105.0,10.0,1,150.0,10.0,1
2,10362,34057,3784063,"MetroCast Cablevision of New Hampshire, LLC",Metrocast,Harron Communications LP,130591,Harron Communications LP,NH,330019651001007,42,1,105.0,10.0,1,150.0,10.0,1
3,10363,34057,3784063,"MetroCast Cablevision of New Hampshire, LLC",Metrocast,Harron Communications LP,130591,Harron Communications LP,NH,330019651001008,42,1,105.0,10.0,1,150.0,10.0,1
4,10364,34057,3784063,"MetroCast Cablevision of New Hampshire, LLC",Metrocast,Harron Communications LP,130591,Harron Communications LP,NH,330019651001013,42,1,105.0,10.0,1,150.0,10.0,1


In [62]:
# Do VT county codes seem OK? How many uniques? Expecting 14 counties. OK
vt_fcc477['CountyCode'].describe()

count     183407
unique        14
top          027
freq       22306
Name: CountyCode, dtype: object

In [63]:
# Do NH county codes seem OK? How many uniques?  Expecting 10 counties. OK
nh_fcc477['CountyCode'].describe()

count     274631
unique        10
top          011
freq       56297
Name: CountyCode, dtype: object
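
Counting unique values confirms how many county codes there are, but not which ones. A stricter check, sketched below, compares the codes against the expected set. It relies on the observation from the Census listings in this notebook that county codes in both states are the odd numbers from `001` upward; the function name and the sample series are hypothetical.

```python
import pandas as pd


def unexpected_county_codes(codes, expected):
    """Return (codes present but not expected, codes expected but absent)."""
    seen = set(codes)
    return sorted(seen - expected), sorted(expected - seen)


# County FIPS codes in both states are the odd numbers starting at 001.
VT_COUNTIES = {f"{n:03d}" for n in range(1, 28, 2)}  # 001..027, 14 counties
NH_COUNTIES = {f"{n:03d}" for n in range(1, 20, 2)}  # 001..019, 10 counties

# Small stand-in for vt_fcc477['CountyCode'], with one deliberately bad value.
sample = pd.Series(["001", "027", "27", "007"])
extra, missing = unexpected_county_codes(sample, VT_COUNTIES)

print(extra)         # ['27']
print(len(missing))  # 11
```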

In [50]:
# columns needed for upload analysis
select_cols = ['StateAbbr','CountyCode','MaxAdUp','MaxCIRUp']

# dictionary of aggregations
aggregations = {
    'MaxAdUp': ['max'],
    'MaxCIRUp': ['max']  # agg fcn to perform
}

Data descriptions, excerpted from the FCC's [Explanation of Broadband Deployment Data](https://www.fcc.gov/general/explanation-broadband-deployment-data):
- **Consumer**: (0/1) where 1 = Provider can or does offer consumer/mass market/residential service in the block
- **MaxAdDown**: Maximum advertised downstream speed/bandwidth offered by the provider in the block for Consumer service
- **MaxAdUp**: Maximum advertised upstream speed/bandwidth offered by the provider in the block for Consumer service
- **Business**: (0/1) where 1 = Provider can or does offer business/government service in the block
- **MaxCIRDown**: Maximum contractual downstream bandwidth offered by the provider in the block for Business service (filer directed to report 0 if the contracted service is sold on a "best efforts" basis without a guaranteed data-throughput rate)
- **MaxCIRUp**: Maximum contractual upstream bandwidth offered by the provider in the block for Business service (filer directed to report 0 if the contracted service is sold on a "best efforts" basis without a guaranteed data-throughput rate)
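
One consequence of these definitions is that a `0` in a speed column can mean "service not offered" or "no guaranteed rate" rather than a measured speed of zero. A per-county maximum is unaffected, but summaries such as a mean or median would be pulled down. The sketch below shows one possible way to mask such rows; the `_clean` column names are hypothetical and the last sample row is invented to represent a best-efforts business service.

```python
import pandas as pd

# Values in the style of the sample rows in this notebook; the last row is
# hypothetical and stands for a business service sold on a best-efforts basis.
rows = pd.DataFrame({
    "Consumer": [0, 1, 1],
    "MaxAdUp": [0.0, 10.0, 10.0],
    "Business": [1, 1, 1],
    "MaxCIRUp": [1.5, 10.0, 0.0],
})

# Keep MaxAdUp only where consumer service is offered; other rows become NaN.
rows["MaxAdUp_clean"] = rows["MaxAdUp"].where(rows["Consumer"] == 1)

# Keep MaxCIRUp only where business service has a guaranteed (non-zero) rate.
guaranteed = (rows["Business"] == 1) & (rows["MaxCIRUp"] > 0)
rows["MaxCIRUp_clean"] = rows["MaxCIRUp"].where(guaranteed)

print(rows["MaxAdUp_clean"].mean())   # 10.0
print(rows["MaxCIRUp_clean"].mean())  # 5.75
```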

In [64]:
# new dataframe
vt_uploads = vt_fcc477[select_cols].groupby(['StateAbbr','CountyCode']).agg(aggregations)
nh_uploads = nh_fcc477[select_cols].groupby(['StateAbbr','CountyCode']).agg(aggregations)
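
Aggregating with a dictionary of lists produces two-level column headers, which can be awkward to merge or export. An alternative, sketched here on a few illustrative rows, is pandas' named aggregation, which yields flat column names. The output names `max_ad_up` and `max_cir_up` are arbitrary choices for this example.

```python
import pandas as pd

# A few rows in the shape of the FCC data; speeds are illustrative only.
sample = pd.DataFrame({
    "StateAbbr": ["VT", "VT", "VT", "NH"],
    "CountyCode": ["001", "001", "005", "001"],
    "MaxAdUp": [0.0, 1000.0, 40.0, 10.0],
    "MaxCIRUp": [1.5, 1000.0, 3.0, 10.0],
})

# Named aggregation: each keyword becomes a flat output column.
uploads = (
    sample.groupby(["StateAbbr", "CountyCode"], as_index=False)
    .agg(max_ad_up=("MaxAdUp", "max"), max_cir_up=("MaxCIRUp", "max"))
)

print(uploads)
```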

In [65]:
vt_uploads   # MaxAdUp occasionally lower than MaxCIRUp

StateAbbr,CountyCode,MaxAdUp (max),MaxCIRUp (max)
VT,1,1000.0,1000.0
VT,3,1000.0,1000.0
VT,5,40.0,1000.0
VT,7,1000.0,1000.0
VT,9,50.0,100.0
VT,11,1000.0,1000.0
VT,13,1000.0,1000.0
VT,15,1000.0,1000.0
VT,17,700.0,1000.0
VT,19,75.0,1000.0


In [67]:
nh_uploads   # MaxAdUp typically lower than MaxCIRUp

StateAbbr,CountyCode,MaxAdUp (max),MaxCIRUp (max)
NH,1,120.0,1000.0
NH,3,75.0,1000.0
NH,5,200.0,1000.0
NH,7,75.0,1000.0
NH,9,50.0,1000.0
NH,11,1000.0,1000.0
NH,13,400.0,1000.0
NH,15,1000.0,1000.0
NH,17,1000.0,1000.0
NH,19,120.0,1000.0


In [23]:
# Total population as sum of counties
vt_totalpop = vtco.population_size.sum()  # total of counties
print('VT total population: ', format(vt_totalpop, ','))

VT total population:  626,604


In [6]:
# Same as VT above, with location data abstracted to a variable, moving toward reusable code.

# Represents the Census geography for all counties in New Hampshire
loca_nhco = censusdata.censusgeo([('state', '33'), ('county', '*')])

# Download table(s) by county using the 5-year ACS ending in 2015 (2011-2015)
nhco = censusdata.download('acs5', 2015, loca_nhco,
                           ['B01001_001E'])  # table of total population for each geography

# Rename column to be human-readable
nhco = nhco.rename(columns={'B01001_001E': 'population_size'})

# Create empty column for max_upload_speed
nhco = nhco.assign(max_upload_speed = np.nan)

In [7]:
# List NH counties sorted by population_size
nhco.sort_values('population_size', ascending=False, inplace=True)
nhco.head(30)

Unnamed: 0,population_size,max_upload_speed
"Hillsborough County, New Hampshire: Summary level: 050, state:33> county:011",403972,
"Rockingham County, New Hampshire: Summary level: 050, state:33> county:015",299006,
"Merrimack County, New Hampshire: Summary level: 050, state:33> county:013",147262,
"Strafford County, New Hampshire: Summary level: 050, state:33> county:017",125273,
"Grafton County, New Hampshire: Summary level: 050, state:33> county:009",89341,
"Cheshire County, New Hampshire: Summary level: 050, state:33> county:005",76430,
"Belknap County, New Hampshire: Summary level: 050, state:33> county:001",60399,
"Carroll County, New Hampshire: Summary level: 050, state:33> county:003",47513,
"Sullivan County, New Hampshire: Summary level: 050, state:33> county:019",43135,
"Coos County, New Hampshire: Summary level: 050, state:33> county:007",31870,


In [21]:
# Total population as sum of counties
nh_totalpop = nhco.population_size.sum()  # total of counties
print('NH total population: ', format(nh_totalpop, ','))

NH total population:  1,324,201


In [24]:
nhco['pct_of_totalpop'] = nhco['population_size']/nh_totalpop
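
The `max_upload_speed` column created earlier is an empty placeholder. One way it might be filled is to join the county populations to the per-county upload maxima on the 3-digit county code. The sketch below does this for three NH counties, using figures from the tables above. It assumes the county code has already been pulled out of the Census index into a `county_code` column, and that `MaxAdUp` is the speed of interest; both are assumptions for illustration.

```python
import pandas as pd

# NH populations from the county table above, keyed by 3-digit county code.
nh_pop = pd.DataFrame({
    "county_code": ["011", "009", "007"],
    "county": ["Hillsborough", "Grafton", "Coos"],
    "population_size": [403972, 89341, 31870],
})

# Matching MaxAdUp maxima (Mbps) from the nh_uploads table.
nh_up = pd.DataFrame({
    "county_code": ["007", "009", "011"],
    "max_upload_speed": [75.0, 50.0, 1000.0],
})

# validate="one_to_one" raises if a county code is duplicated on either side.
nh = nh_pop.merge(nh_up, on="county_code", how="left", validate="one_to_one")

# Per-capita weighting: Mbps per resident
nh["max_upload_wt"] = nh["max_upload_speed"] / nh["population_size"]

print(nh.sort_values("max_upload_wt", ascending=False))
```

A left join keeps every county in the population table, so any county without broadband records would show up as `NaN` rather than silently disappearing.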

In [25]:
nhco.head(30)

Unnamed: 0,population_size,max_upload_speed,pct_of_totalpop
"Hillsborough County, New Hampshire: Summary level: 050, state:33> county:011",403972,,0.31
"Rockingham County, New Hampshire: Summary level: 050, state:33> county:015",299006,,0.23
"Merrimack County, New Hampshire: Summary level: 050, state:33> county:013",147262,,0.11
"Strafford County, New Hampshire: Summary level: 050, state:33> county:017",125273,,0.09
"Grafton County, New Hampshire: Summary level: 050, state:33> county:009",89341,,0.07
"Cheshire County, New Hampshire: Summary level: 050, state:33> county:005",76430,,0.06
"Belknap County, New Hampshire: Summary level: 050, state:33> county:001",60399,,0.05
"Carroll County, New Hampshire: Summary level: 050, state:33> county:003",47513,,0.04
"Sullivan County, New Hampshire: Summary level: 050, state:33> county:019",43135,,0.03
"Coos County, New Hampshire: Summary level: 050, state:33> county:007",31870,,0.02


## Appendix

#### County-Level Maps

- Broadband Speeds and Availability in the United States, http://www.governing.com/gov-data/broadband-speeds-availability.html

#### Potential data sources

- M-Lab (https://www.measurementlab.net/data/) is an open-source project jointly run by Google, Princeton University, and several other public entities. M-Lab data are used by BroadBandNow, as reported at https://broadbandnow.com/report/us-states-internet-coverage-speed-2018/

- OOKLA speed test data are used by the FCC, according to BroadBandNow. https://www.speedtest.net/reports/united-states/2018/fixed/

#### Useful Websites

- https://broadbandnow.com/Vermont
- https://broadbandnow.com/New-Hampshire
- [DSLreports.com - Good, Bad, Ugly](http://www.dslreports.com/gbu) for consumer ISP reviews of Satellite, Cable, Fiber, Mobile. For example, [Comcast Xfinity upload speeds by plan](http://www.dslreports.com/faq/15643).

#### Related Articles

- [*Internet Access Rankings*](https://www.usnews.com/news/best-states/rankings/infrastructure/internet-access) by U.S. News & World Report, 2018.
- [*Gov. Sununu signs broadband infrastructure bill into law*](https://www.sentinelsource.com/news/local/gov-sununu-signs-broadband-infrastructure-bill-into-law/article_4c07060c-d209-54fc-96fb-049ffc0a8a7e.html) by William Holt, Keene Sentinel, Jun 1, 2018
- [*Best Internet Plans & Providers in Vermont*](https://www.whistleout.com/Internet/Guides/best-internet-providers-in-vermont) by Ella Wagner, WhistleOut, Oct 19, 2018.
- [*How to compare Internet service providers — by upload speed*](https://www.usatoday.com/story/tech/columnist/2016/06/26/how-compare-internet-service-providers-upload-speed/86361172/) by Rob Pegoraro, Special for USA TODAY, June 26, 2016.


============================================================================

### Exploring examples from CensusData package

Based on example at https://jtleider.github.io/censusdata/example1.html

To download data, we need to identify the relevant tables containing the variables of interest to us. 

In [9]:
censusdata.search('acs5', 2015, 'label', 'unemploy')[160:162]

[('B23024_023E',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_023M',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Margin of Error for!!Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed')]

Once we have identified a table of interest, we can use `censusdata.printtable` to show all variables included in the table:

In [10]:
censusdata.printtable(censusdata.censustable('acs5', 2015, 'B23025'))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B23025_001E  | B23025.  Employment Status for | Total:                                                   | int  
B23025_002E  | B23025.  Employment Status for | In labor force:                                          | int  
B23025_003E  | B23025.  Employment Status for | !! In labor force: Civilian labor force:                 | int  
B23025_004E  | B23025.  Employment Status for | !! !! In labor force: Civilian labor force: Employed     | int  
B23025_005E  | B23025.  Employment Status for | !! !! In labor force: Civilian labor force: Unemployed   | int  
B23025_006E  | B23025.  Employment Status for | !! In labor force: Armed Forces                          | int  
B23025_007E  | B23025.  Employment Status for | Not in labor force                           

After identifying relevant variables, we then need to identify the geographies of interest. We are interested in counties in VT and NH. First we look for the geographic identifier (FIPS code) for each state, then the identifiers for counties.

In [11]:
censusdata.geographies(censusdata.censusgeo([('state', '*')]), 'acs5', 2015)

{'Alabama': censusgeo((('state', '01'),)),
 'Alaska': censusgeo((('state', '02'),)),
 'Arizona': censusgeo((('state', '04'),)),
 'Arkansas': censusgeo((('state', '05'),)),
 'California': censusgeo((('state', '06'),)),
 'Colorado': censusgeo((('state', '08'),)),
 'Connecticut': censusgeo((('state', '09'),)),
 'Delaware': censusgeo((('state', '10'),)),
 'District of Columbia': censusgeo((('state', '11'),)),
 'Florida': censusgeo((('state', '12'),)),
 'Georgia': censusgeo((('state', '13'),)),
 'Hawaii': censusgeo((('state', '15'),)),
 'Idaho': censusgeo((('state', '16'),)),
 'Illinois': censusgeo((('state', '17'),)),
 'Indiana': censusgeo((('state', '18'),)),
 'Iowa': censusgeo((('state', '19'),)),
 'Kansas': censusgeo((('state', '20'),)),
 'Kentucky': censusgeo((('state', '21'),)),
 'Louisiana': censusgeo((('state', '22'),)),
 'Maine': censusgeo((('state', '23'),)),
 'Maryland': censusgeo((('state', '24'),)),
 'Massachusetts': censusgeo((('state', '25'),)),
 'Michigan': censusgeo((('stat

__What are all county codes in Vermont?__

In [12]:
censusdata.geographies(censusdata.censusgeo([('state', '50'), ('county', '*')]), 'acs5', 2015)

{'Addison County, Vermont': censusgeo((('state', '50'), ('county', '001'))),
 'Bennington County, Vermont': censusgeo((('state', '50'), ('county', '003'))),
 'Caledonia County, Vermont': censusgeo((('state', '50'), ('county', '005'))),
 'Chittenden County, Vermont': censusgeo((('state', '50'), ('county', '007'))),
 'Essex County, Vermont': censusgeo((('state', '50'), ('county', '009'))),
 'Franklin County, Vermont': censusgeo((('state', '50'), ('county', '011'))),
 'Grand Isle County, Vermont': censusgeo((('state', '50'), ('county', '013'))),
 'Lamoille County, Vermont': censusgeo((('state', '50'), ('county', '015'))),
 'Orange County, Vermont': censusgeo((('state', '50'), ('county', '017'))),
 'Orleans County, Vermont': censusgeo((('state', '50'), ('county', '019'))),
 'Rutland County, Vermont': censusgeo((('state', '50'), ('county', '021'))),
 'Washington County, Vermont': censusgeo((('state', '50'), ('county', '023'))),
 'Windham County, Vermont': censusgeo((('state', '50'), ('count

__What are all county codes in New Hampshire?__

In [13]:
censusdata.geographies(censusdata.censusgeo([('state', '33'), ('county', '*')]), 'acs5', 2015)

{'Belknap County, New Hampshire': censusgeo((('state', '33'), ('county', '001'))),
 'Carroll County, New Hampshire': censusgeo((('state', '33'), ('county', '003'))),
 'Cheshire County, New Hampshire': censusgeo((('state', '33'), ('county', '005'))),
 'Coos County, New Hampshire': censusgeo((('state', '33'), ('county', '007'))),
 'Grafton County, New Hampshire': censusgeo((('state', '33'), ('county', '009'))),
 'Hillsborough County, New Hampshire': censusgeo((('state', '33'), ('county', '011'))),
 'Merrimack County, New Hampshire': censusgeo((('state', '33'), ('county', '013'))),
 'Rockingham County, New Hampshire': censusgeo((('state', '33'), ('county', '015'))),
 'Strafford County, New Hampshire': censusgeo((('state', '33'), ('county', '017'))),
 'Sullivan County, New Hampshire': censusgeo((('state', '33'), ('county', '019')))}

Now that we have identified the variables and geographies of interest, we can download the data using `censusdata.download`. Following the package example, the cell below uses block groups in Cook County, Illinois (state `17`, county `031`) and computes the percent unemployed and the percent with no high school degree:

In [14]:
cookbg = censusdata.download('acs5', 2015,
                             censusdata.censusgeo([('state', '17'), ('county', '031'), ('block group', '*')]),
                             ['B23025_003E', 'B23025_005E', 'B15003_001E', 'B15003_002E', 'B15003_003E',
                              'B15003_004E', 'B15003_005E', 'B15003_006E', 'B15003_007E', 'B15003_008E',
                              'B15003_009E', 'B15003_010E', 'B15003_011E', 'B15003_012E', 'B15003_013E',
                              'B15003_014E', 'B15003_015E', 'B15003_016E'])
cookbg['percent_unemployed'] = cookbg.B23025_005E / cookbg.B23025_003E * 100
cookbg['percent_nohs'] = (cookbg.B15003_002E + cookbg.B15003_003E + cookbg.B15003_004E
                          + cookbg.B15003_005E + cookbg.B15003_006E + cookbg.B15003_007E + cookbg.B15003_008E
                          + cookbg.B15003_009E + cookbg.B15003_010E + cookbg.B15003_011E + cookbg.B15003_012E
                          + cookbg.B15003_013E + cookbg.B15003_014E +
                          cookbg.B15003_015E + cookbg.B15003_016E) / cookbg.B15003_001E * 100
cookbg = cookbg[['percent_unemployed', 'percent_nohs']]
cookbg.describe()

Unnamed: 0,percent_unemployed,percent_nohs
count,3983.0,3984.0
mean,12.0,15.19
std,10.09,13.23
min,0.0,0.0
25%,4.86,4.75
50%,9.24,11.66
75%,16.28,22.46
max,91.86,77.43
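
The long chain of additions used for `percent_nohs` can be written more compactly by generating the column names and summing across them. This is a sketch on a tiny synthetic table (the values are made up), not a change to the package example.

```python
import pandas as pd

# Column names B15003_002E .. B15003_016E, generated instead of typed out.
nohs_cols = [f"B15003_{i:03d}E" for i in range(2, 17)]

# Tiny synthetic stand-in for the downloaded block-group table.
demo = pd.DataFrame({col: [1, 2] for col in nohs_cols})
demo["B15003_001E"] = [60, 120]  # table total for each row

# skipna=False keeps the behaviour of chained "+": any missing cell gives NaN.
nohs_total = demo[nohs_cols].sum(axis=1, skipna=False)
demo["percent_nohs"] = nohs_total / demo["B15003_001E"] * 100

print(demo["percent_nohs"].tolist())  # [25.0, 25.0]
```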


Next, we show the 30 block groups in Cook County with the highest rate of unemployment, and the percent with no high school degree in those block groups.

In [15]:
cookbg.sort_values('percent_unemployed', ascending=False).head(30)

Unnamed: 0,percent_unemployed,percent_nohs
"Block Group 1, Census Tract 8357, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:835700> block group:1",91.86,0.0
"Block Group 2, Census Tract 6805, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:680500> block group:2",66.27,19.54
"Block Group 3, Census Tract 5103, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:510300> block group:3",64.07,16.97
"Block Group 2, Census Tract 6809, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:680900> block group:2",61.46,42.33
"Block Group 1, Census Tract 4913, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:491300> block group:1",56.4,14.64
"Block Group 5, Census Tract 2315, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:231500> block group:5",55.58,44.72
"Block Group 3, Census Tract 8346, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:834600> block group:3",54.96,17.85
"Block Group 2, Census Tract 6706, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:670600> block group:2",54.13,9.57
"Block Group 2, Census Tract 8386, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:838600> block group:2",53.78,48.41
"Block Group 5, Census Tract 4910, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:491000> block group:5",53.57,38.23


Finally, we show the correlation between these two variables across all Cook County block groups:

In [16]:
cookbg.corr()

Unnamed: 0,percent_unemployed,percent_nohs
percent_unemployed,1.0,0.29
percent_nohs,0.29,1.0


There is a weak positive correlation of 0.29 between the percent unemployed and the percent with no high school degree.
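
To put the size of that coefficient in perspective, squaring it gives the share of variance that a simple linear fit of one variable on the other would account for:

> $r^2 = 0.29^2 \approx 0.084$

That is roughly 8% of the variance across block groups, which is consistent with describing the relationship as weak.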