## Finding a package

We ended up going with the censusdata package. We found a clear write-up about it at https://towardsdatascience.com/accessing-census-data-with-python-3e2f2b56e20d and found its functions clear and useful. We originally looked at the census package, but it required API access that often failed, and its functions for retrieving data were less clear.

In [1]:
import pandas as pd
import censusdata

## Searching for relevant tables

This package contains a search function that allows us to search a data set for relevant tables.

We searched the American Community Survey (ACS) 5-year estimates. The ACS is more in-depth but slightly less accurate than the 10-year census, and it is conducted on a rolling basis.

Here we search the 2019 data set for tables whose concept relates to marriage.

In [6]:
tables = censusdata.search('acs5', 2019,'concept', 'marriage')
print(len(tables))
tables[:3]

56


[('B12007A_001E',
  'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)',
  'Estimate!!Median age at first marriage --!!Male'),
 ('B12007A_002E',
  'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)',
  'Estimate!!Median age at first marriage --!!Female'),
 ('B12007B_001E',
  'MEDIAN AGE AT FIRST MARRIAGE (BLACK OR AFRICAN AMERICAN ALONE)',
  'Estimate!!Median age at first marriage --!!Male')]

The search returns a list of tuples, each holding the variable code, the table name, and the variable label.
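The labels in these tuples use `!!` to separate levels of the variable hierarchy. As a minimal sketch, the hypothetical helper `split_label` below (not part of censusdata) breaks a label into its parts. The sample rows are copied from the output above, so the snippet runs without a network call.

```python
def split_label(label):
    """Split a variable label on '!!' and drop padding such as a trailing ' --'."""
    parts = [part.strip(" -") for part in label.split("!!")]
    # Discard any pieces that are empty after stripping
    return [part for part in parts if part]


# Sample rows copied from the search output above
sample_rows = [
    ("B12007A_001E",
     "MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)",
     "Estimate!!Median age at first marriage --!!Male"),
    ("B12007A_002E",
     "MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)",
     "Estimate!!Median age at first marriage --!!Female"),
]

for variable, table_name, label in sample_rows:
    print(variable, split_label(label))
```

The last element of each split label ("Male", "Female") is usually the piece that distinguishes one variable from another within a table.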

We then reduced the results to a sorted list of unique table names, which makes it easier to find the table we want.

In [5]:
# Collect the unique table names (element 1 of each tuple) and sort them
table_names = sorted({row[1] for row in tables})
table_names

['MARRIAGES ENDING IN WIDOWHOOD IN THE LAST YEAR BY SEX BY MARITAL STATUS FOR THE POPULATION 15 YEARS AND OVER',
 'MARRIAGES IN THE LAST YEAR BY SEX BY MARITAL STATUS FOR THE POPULATION 15 YEARS AND OVER',
 'MEDIAN AGE AT FIRST MARRIAGE',
 'MEDIAN AGE AT FIRST MARRIAGE (AMERICAN INDIAN AND ALASKA NATIVE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (ASIAN ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (BLACK OR AFRICAN AMERICAN ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (HISPANIC OR LATINO)',
 'MEDIAN AGE AT FIRST MARRIAGE (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (SOME OTHER RACE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (TWO OR MORE RACES)',
 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE, NOT HISPANIC OR LATINO)',
 'MEDIAN DURATION OF CURRENT MARRIAGE IN YEARS BY SEX BY MARITAL STATUS FOR THE MARRIED POPULATION 15 YEARS AND OVER']
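Another way to browse the results is to group the variable codes under each table name, so that one lookup shows every variable a table offers. The sketch below assumes the same three-element tuples that the search returns. `group_by_table` is a hypothetical helper, and the sample rows are copied from the earlier output.

```python
from collections import defaultdict


def group_by_table(rows):
    """Map each table name to the list of variable codes that belong to it."""
    grouped = defaultdict(list)
    for variable, table_name, _label in rows:
        grouped[table_name].append(variable)
    return dict(grouped)


# Sample rows copied from the search output shown earlier
sample_rows = [
    ("B12007A_001E",
     "MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)",
     "Estimate!!Median age at first marriage --!!Male"),
    ("B12007A_002E",
     "MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)",
     "Estimate!!Median age at first marriage --!!Female"),
    ("B12007B_001E",
     "MEDIAN AGE AT FIRST MARRIAGE (BLACK OR AFRICAN AMERICAN ALONE)",
     "Estimate!!Median age at first marriage --!!Male"),
]

for table_name, variables in sorted(group_by_table(sample_rows).items()):
    print(table_name, variables)
```

In the notebook, the same call could be made on the full `tables` list instead of the sample rows.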

## Getting data from a specific table

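To get more information about a table, copy its name from the list above and print every search result that matches it. The table code is the part of the variable code before the underscore.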

In [8]:
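# Table name copied from the list above
desc = 'MEDIAN AGE AT FIRST MARRIAGE'
for item in tables:
    if item[1] == desc:
        print(item)
        # Take everything before the underscore rather than a fixed-width slice
        code = item[0].split('_')[0]
code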

('B12007_001E', 'MEDIAN AGE AT FIRST MARRIAGE', 'Estimate!!Median age at first marriage --!!Male')
('B12007_002E', 'MEDIAN AGE AT FIRST MARRIAGE', 'Estimate!!Median age at first marriage --!!Female')


'B12007'

For a cleaner view of every variable in the table, along with its label and type, we print the table by its code.

In [10]:
censusdata.printtable(censusdata.censustable('acs5', 2019, 'B12007'))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B12007_001E  | MEDIAN AGE AT FIRST MARRIAGE   | !! !! Estimate Median age at first marriage -- Male      | float
B12007_002E  | MEDIAN AGE AT FIRST MARRIAGE   | !! !! Estimate Median age at first marriage -- Female    | float
-------------------------------------------------------------------------------------------------------------------


Finally, we download both variables for every state. The result is a pandas DataFrame with one row per state.

In [11]:
censusdata.download('acs5', 2019,
                    censusdata.censusgeo([('state', '*')]),
                    ['B12007_001E', 'B12007_002E'])

| | B12007_001E | B12007_002E |
|---|---|---|
| Alabama: Summary level: 040, state:01 | 28.5 | 26.7 |
| Alaska: Summary level: 040, state:02 | 29.2 | 26.4 |
| Arizona: Summary level: 040, state:04 | 29.9 | 27.8 |
| Arkansas: Summary level: 040, state:05 | 27.2 | 25.7 |
| California: Summary level: 040, state:06 | 30.8 | 29.0 |
| Colorado: Summary level: 040, state:08 | 29.7 | 27.7 |
| Delaware: Summary level: 040, state:10 | 30.4 | 29.0 |
| District of Columbia: Summary level: 040, state:11 | 30.9 | 30.6 |
| Connecticut: Summary level: 040, state:09 | 31.1 | 29.5 |
| Florida: Summary level: 040, state:12 | 30.7 | 28.9 |
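The row labels bundle the state name, the summary level, and the state FIPS code into one string, which is awkward for merging with other data. The sketch below pulls the name and FIPS code apart with a regular expression. It assumes that the string form of each row label matches the output above exactly. `parse_state_label` and `GEO_PATTERN` are hypothetical names, not part of censusdata.

```python
import re

# Assumed label format, taken from the output above:
#   "<state name>: Summary level: <level>, state:<FIPS code>"
GEO_PATTERN = re.compile(
    r"^(?P<name>.+?): Summary level: (?P<level>\d+), state:(?P<fips>\d+)$"
)


def parse_state_label(label):
    """Return (state name, FIPS code) parsed from a row label."""
    match = GEO_PATTERN.match(str(label))
    if match is None:
        raise ValueError(f"Unexpected geography label: {label!r}")
    return match.group("name"), match.group("fips")


# Labels copied from the output above
for label in [
    "Alabama: Summary level: 040, state:01",
    "District of Columbia: Summary level: 040, state:11",
]:
    print(parse_state_label(label))
```

Because the helper calls `str()` on its input, it could be mapped over the DataFrame index to build separate name and FIPS columns, provided the labels print in the format assumed here.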
