Example 1: Downloading Block Group Data and Exporting to CSV
============================================================

As a first example, let's suppose we're interested in unemployment and high school dropout rates
for block groups in Cook County, Illinois, which contains Chicago, IL.

We begin by importing the censusdata and pandas modules, and setting some display options in pandas for
nicer output:

In [1]:
import pandas as pd
import censusdata
pd.set_option('display.expand_frame_repr', False)
pd.set_option('display.precision', 2)

To download data, we first need to identify the tables that contain the variables of interest.
One way to do this is to consult the ACS documentation, in particular the Table Shells
(https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.html). Alternatively, we can search from within Python: `censusdata.search` searches a chosen field (here, the variable label or the table concept) for a given text pattern. The downside is that the output can be voluminous, because ACS frequently provides a large number of different tabulations for a given topic area. In the searches below, we slice the results to show only the relevant variables:

In [2]:
censusdata.search('acs5', '2015', 'label', 'unemploy')[160:170]

[('B23024_023E',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_023M',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Margin of Error for!!Income in the past 12 months at or above poverty level:!!With a disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_030E',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Income in the past 12 months at or above poverty level:!!No disability:!!In labor force:!!Civilian:!!Unemployed'),
 ('B23024_030M',
  'B23024.  Poverty Status in the Past 12 Months by Disability Status by Employment Status for the Population 20 to 64 Years',
  'Margin of Error for!!Income in the pas

In [3]:
censusdata.search('acs5', '2015', 'concept', 'education')[730:790]

[('B15002_035E',
  'B15002.  Sex by Educational Attainment for the Population 25 Years and over',
  'Female:!!Doctorate degree'),
 ('B15002_035M',
  'B15002.  Sex by Educational Attainment for the Population 25 Years and over',
  'Margin of Error for!!Female:!!Doctorate degree'),
 ('B15003_001E',
  'B15003.  Educational Attainment for the Population 25 Years and Over',
  'Total:'),
 ('B15003_001M',
  'B15003.  Educational Attainment for the Population 25 Years and Over',
  'Margin of Error for!!Total:'),
 ('B15003_002E',
  'B15003.  Educational Attainment for the Population 25 Years and Over',
  'No schooling completed'),
 ('B15003_002M',
  'B15003.  Educational Attainment for the Population 25 Years and Over',
  'Margin of Error for!!No schooling completed'),
 ('B15003_003E',
  'B15003.  Educational Attainment for the Population 25 Years and Over',
  'Nursery school'),
 ('B15003_003M',
  'B15003.  Educational Attainment for the Population 25 Years and Over',
  'Margin of Error for!!Nu
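
Slicing by position, as in the two searches above, only works once we already know where the relevant rows fall. Since each result is a (variable, concept, label) tuple, one alternative is to filter the list by table name. The following is a minimal sketch, assuming the results have the tuple shape shown in the output above; `estimates_for_table` is a hypothetical helper, not part of censusdata, and the sample list is copied from that output rather than fetched:

```python
# Sample of the (variable, concept, label) tuples printed above. In practice
# this list would be the return value of censusdata.search(...).
results = [
    ('B15002_035E',
     'B15002.  Sex by Educational Attainment for the Population 25 Years and over',
     'Female:!!Doctorate degree'),
    ('B15002_035M',
     'B15002.  Sex by Educational Attainment for the Population 25 Years and over',
     'Margin of Error for!!Female:!!Doctorate degree'),
    ('B15003_001E',
     'B15003.  Educational Attainment for the Population 25 Years and Over',
     'Total:'),
    ('B15003_001M',
     'B15003.  Educational Attainment for the Population 25 Years and Over',
     'Margin of Error for!!Total:'),
    ('B15003_002E',
     'B15003.  Educational Attainment for the Population 25 Years and Over',
     'No schooling completed'),
]


def estimates_for_table(results, table):
    """Hypothetical helper: keep estimate variables (names ending in 'E') of one table."""
    prefix = table + '_'
    return [r for r in results if r[0].startswith(prefix) and r[0].endswith('E')]


b15003 = estimates_for_table(results, 'B15003')
for name, concept, label in b15003:
    print(name, '|', label)
```

Filtering on the variable name drops the margin-of-error rows (names ending in `M`) and the rows from other tables, so the result does not depend on slice positions.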

(Note that searching Census variables and printing a single table both rely on information previously downloaded from the Census API; otherwise, each such call would have to download information for all variables.) Once we have identified a table of interest, we can use `censusdata.printtable` to show all variables
included in the table:

In [4]:
censusdata.printtable(censusdata.censustable('acs5', '2015', 'B23025'))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B23025_001E  | B23025.  Employment Status for | Total:                                                   | int  
B23025_002E  | B23025.  Employment Status for | In labor force:                                          | int  
B23025_003E  | B23025.  Employment Status for | !! In labor force: Civilian labor force:                 | int  
B23025_004E  | B23025.  Employment Status for | !! !! In labor force: Civilian labor force: Employed     | int  
B23025_005E  | B23025.  Employment Status for | !! !! In labor force: Civilian labor force: Unemployed   | int  
B23025_006E  | B23025.  Employment Status for | !! In labor force: Armed Forces                          | int  
B23025_007E  | B23025.  Employment Status for | Not in labor force                           

In [5]:
censusdata.printtable(censusdata.censustable('acs5', '2015', 'B15003'))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B15003_001E  | B15003.  Educational Attainmen | Total:                                                   | int  
B15003_002E  | B15003.  Educational Attainmen | No schooling completed                                   | int  
B15003_003E  | B15003.  Educational Attainmen | Nursery school                                           | int  
B15003_004E  | B15003.  Educational Attainmen | Kindergarten                                             | int  
B15003_005E  | B15003.  Educational Attainmen | 1st grade                                                | int  
B15003_006E  | B15003.  Educational Attainmen | 2nd grade                                                | int  
B15003_007E  | B15003.  Educational Attainmen | 3rd grade                                    

After identifying relevant variables, we then need to identify the geographies of interest. We are interested in block groups in Cook County, Illinois, so first we look up the geographic identifier (FIPS code)
for Illinois, then the identifiers for all counties within Illinois to find Cook County:

In [6]:
censusdata.geographies(censusdata.censusgeo([('state', '*')]), 'acs5', '2015')

{'Alabama': censusgeo((('state', '01'),)),
 'Alaska': censusgeo((('state', '02'),)),
 'Arizona': censusgeo((('state', '04'),)),
 'Arkansas': censusgeo((('state', '05'),)),
 'California': censusgeo((('state', '06'),)),
 'Colorado': censusgeo((('state', '08'),)),
 'Connecticut': censusgeo((('state', '09'),)),
 'Delaware': censusgeo((('state', '10'),)),
 'District of Columbia': censusgeo((('state', '11'),)),
 'Florida': censusgeo((('state', '12'),)),
 'Georgia': censusgeo((('state', '13'),)),
 'Hawaii': censusgeo((('state', '15'),)),
 'Idaho': censusgeo((('state', '16'),)),
 'Illinois': censusgeo((('state', '17'),)),
 'Indiana': censusgeo((('state', '18'),)),
 'Iowa': censusgeo((('state', '19'),)),
 'Kansas': censusgeo((('state', '20'),)),
 'Kentucky': censusgeo((('state', '21'),)),
 'Louisiana': censusgeo((('state', '22'),)),
 'Maine': censusgeo((('state', '23'),)),
 'Maryland': censusgeo((('state', '24'),)),
 'Massachusetts': censusgeo((('state', '25'),)),
 'Michigan': censusgeo((('stat

In [7]:
censusdata.geographies(censusdata.censusgeo([('state', '17'), ('county', '*')]), 'acs5', '2015')

{'Adams County, Illinois': censusgeo((('state', '17'), ('county', '001'))),
 'Alexander County, Illinois': censusgeo((('state', '17'), ('county', '003'))),
 'Bond County, Illinois': censusgeo((('state', '17'), ('county', '005'))),
 'Boone County, Illinois': censusgeo((('state', '17'), ('county', '007'))),
 'Brown County, Illinois': censusgeo((('state', '17'), ('county', '009'))),
 'Bureau County, Illinois': censusgeo((('state', '17'), ('county', '011'))),
 'Calhoun County, Illinois': censusgeo((('state', '17'), ('county', '013'))),
 'Carroll County, Illinois': censusgeo((('state', '17'), ('county', '015'))),
 'Cass County, Illinois': censusgeo((('state', '17'), ('county', '017'))),
 'Champaign County, Illinois': censusgeo((('state', '17'), ('county', '019'))),
 'Christian County, Illinois': censusgeo((('state', '17'), ('county', '021'))),
 'Clark County, Illinois': censusgeo((('state', '17'), ('county', '023'))),
 'Clay County, Illinois': censusgeo((('state', '17'), ('county', '025')))
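
Because the result is a dictionary keyed by geography name, we do not have to scan the printed output by eye to find Cook County. The sketch below is an illustration only: plain tuples stand in for the `censusgeo` objects so that it runs without a call to the Census API, and `find_geographies` is a hypothetical helper, not part of censusdata. The code 031 for Cook County is the one used in the download step that follows.

```python
# Stand-in for the dictionary returned by censusdata.geographies(...):
# plain tuples replace the censusgeo objects (an assumption for illustration).
counties = {
    'Adams County, Illinois': (('state', '17'), ('county', '001')),
    'Alexander County, Illinois': (('state', '17'), ('county', '003')),
    'Cook County, Illinois': (('state', '17'), ('county', '031')),
}


def find_geographies(geos, text):
    """Hypothetical helper: entries whose name contains `text`, ignoring case."""
    text = text.lower()
    return {name: geo for name, geo in geos.items() if text in name.lower()}


matches = find_geographies(counties, 'cook')
print(matches)
```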

Now that we have identified the variables and geographies of interest, we can download the data using `censusdata.download` and compute variables for the percent unemployed and the percent with no high school degree:

In [8]:
cookbg = censusdata.download('acs5', '2015',
                             censusdata.censusgeo([('state', '17'), ('county', '031'), ('block group', '*')]),
                             ['B23025_003E', 'B23025_005E', 'B15003_001E', 'B15003_002E', 'B15003_003E',
                              'B15003_004E', 'B15003_005E', 'B15003_006E', 'B15003_007E', 'B15003_008E',
                              'B15003_009E', 'B15003_010E', 'B15003_011E', 'B15003_012E', 'B15003_013E',
                              'B15003_014E', 'B15003_015E', 'B15003_016E'])
cookbg['percent_unemployed'] = cookbg.B23025_005E / cookbg.B23025_003E * 100
cookbg['percent_nohs'] = (cookbg.B15003_002E + cookbg.B15003_003E + cookbg.B15003_004E
                          + cookbg.B15003_005E + cookbg.B15003_006E + cookbg.B15003_007E + cookbg.B15003_008E
                          + cookbg.B15003_009E + cookbg.B15003_010E + cookbg.B15003_011E + cookbg.B15003_012E
                          + cookbg.B15003_013E + cookbg.B15003_014E +
                          cookbg.B15003_015E + cookbg.B15003_016E) / cookbg.B15003_001E * 100
cookbg = cookbg[['percent_unemployed', 'percent_nohs']]
cookbg.describe()

       percent_unemployed  percent_nohs
count              3983.0        3984.0
mean                 12.0         15.19
std                 10.09         13.23
min                   0.0           0.0
25%                  4.86          4.75
50%                  9.24         11.66
75%                 16.28         22.46
max                 91.86         77.43
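
The long sum of B15003 columns above can also be written by generating the column names and summing across them. The sketch below uses a made-up two-row table (not Census data) in place of the downloaded data frame. It also illustrates one possible reason the two counts above differ: a row whose denominator is zero produces NaN, which pandas leaves out of its counts. Whether that is what happens in the Cook County data is an assumption, not something the output confirms.

```python
import pandas as pd

# The fifteen columns summed above: B15003_002E through B15003_016E.
nohs_vars = ['B15003_{:03d}E'.format(i) for i in range(2, 17)]

# Made-up toy data: the second row mimics a block group with no population
# in the table universe, so its denominator is zero.
toy = pd.DataFrame({v: [1, 0] for v in nohs_vars})
toy['B15003_001E'] = [60, 0]

# Same computation as above, written as a row-wise sum over the column list.
toy['percent_nohs'] = toy[nohs_vars].sum(axis=1) / toy['B15003_001E'] * 100

print(toy['percent_nohs'])
print(toy['percent_nohs'].count())  # counts non-missing values only
```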


Next, we show the 30 block groups in Cook County with the highest rate of unemployment, and the percent with no high school degree in those block groups.

In [9]:
cookbg.sort_values('percent_unemployed', ascending=False).head(30)

                                                                                                                                percent_unemployed  percent_nohs
Block Group 1, Census Tract 8357, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:835700> block group:1               91.86           0.0
Block Group 2, Census Tract 6805, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:680500> block group:2               66.27         19.54
Block Group 3, Census Tract 5103, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:510300> block group:3               64.07         16.97
Block Group 2, Census Tract 6809, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:680900> block group:2               61.46         42.33
Block Group 1, Census Tract 4913, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:491300> block group:1                56.4         14.64
Block Group 5, Census Tract 2315, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:231500> block group:5               55.58         44.72
Block Group 3, Census Tract 8346, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:834600> block group:3               54.96         17.85
Block Group 2, Census Tract 6706, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:670600> block group:2               54.13          9.57
Block Group 2, Census Tract 8386, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:838600> block group:2               53.78         48.41
Block Group 5, Census Tract 4910, Cook County, Illinois: Summary level: 150, state:17> county:031> tract:491000> block group:5               53.57         38.23


Finally, we show the correlation between these two variables across all Cook County block groups:

In [10]:
cookbg.corr()

                    percent_unemployed  percent_nohs
percent_unemployed                 1.0          0.29
percent_nohs                      0.29           1.0
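
By default, `DataFrame.corr` computes the Pearson correlation coefficient. To make concrete what the 0.29 above measures, the following sketch computes the same statistic by hand on made-up values (not Census data) and compares it with the pandas result; the column names `x` and `y` are placeholders.

```python
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                    'y': [2.0, 1.0, 4.0, 3.0]})

# Pearson r: sum of products of deviations from the mean, divided by the
# square root of the product of the summed squared deviations.
dx = toy['x'] - toy['x'].mean()
dy = toy['y'] - toy['y'].mean()
r_manual = (dx * dy).sum() / ((dx ** 2).sum() * (dy ** 2).sum()) ** 0.5

# The off-diagonal entry of the correlation matrix is the same quantity.
r_pandas = toy.corr().loc['x', 'y']

print(r_manual, r_pandas)
```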
