# Looking at changing family structures

To look at changing family structures, we need data from the U.S. Census Bureau. The first part of our project was dedicated to retrieving it.

## Finding a package

We ended up going with the censusdata package. We found a clear write-up about it at https://towardsdatascience.com/accessing-census-data-with-python-3e2f2b56e20d and found its functions clear and useful. We originally looked at the census package, but it required an API key that often failed, and its functions for getting the data were less clear.

In [1]:
import pandas as pd
import censusdata

## Searching for relevant tables

This package contains a search function that allows us to search a data set for relevant tables.

In this search we looked at the American Community Survey (ACS) 5-year estimates. The ACS asks more detailed questions than the decennial census, but it is based on a sample, so its figures are estimates rather than full counts. It is conducted on a rolling basis.

Here we search the 2019 data set for tables related to marriage.

In [2]:
tables = censusdata.search('acs5', 2019,'concept', 'marriage')

This returns a list of tuples, each containing a variable code, the table name, and the variable label.

We then pulled out just the unique table names to make it easier to find the table we wanted.

In [3]:
table_names = list(set([row[1] for row in tables[:len(tables)-1]]))
table_names.sort()
table_names

['MARRIAGES ENDING IN WIDOWHOOD IN THE LAST YEAR BY SEX BY MARITAL STATUS FOR THE POPULATION 15 YEARS AND OVER',
 'MARRIAGES IN THE LAST YEAR BY SEX BY MARITAL STATUS FOR THE POPULATION 15 YEARS AND OVER',
 'MEDIAN AGE AT FIRST MARRIAGE',
 'MEDIAN AGE AT FIRST MARRIAGE (AMERICAN INDIAN AND ALASKA NATIVE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (ASIAN ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (BLACK OR AFRICAN AMERICAN ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (HISPANIC OR LATINO)',
 'MEDIAN AGE AT FIRST MARRIAGE (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (SOME OTHER RACE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (TWO OR MORE RACES)',
 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE, NOT HISPANIC OR LATINO)',
 'MEDIAN DURATION OF CURRENT MARRIAGE IN YEARS BY SEX BY MARITAL STATUS FOR THE MARRIED POPULATION 15 YEARS AND OVER']
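
As a small offline illustration of the de-duplication step, the sketch below applies the same idea to a hand-written list of tuples. The three sample tuples are illustrative stand-ins shaped like the search results described above; the labels and the second table's variable code are placeholders, not actual output from `censusdata.search`.

```python
# Hand-written stand-ins for search results: (variable code, table name, label).
# The labels and the 'B12007A_001E' code are placeholders, not real API output.
sample_results = [
    ('B12007_001E', 'MEDIAN AGE AT FIRST MARRIAGE', 'placeholder label 1'),
    ('B12007_002E', 'MEDIAN AGE AT FIRST MARRIAGE', 'placeholder label 2'),
    ('B12007A_001E', 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)', 'placeholder label 3'),
]

# A set comprehension removes duplicate table names; sorted() returns a sorted list.
unique_names = sorted({row[1] for row in sample_results})
print(unique_names)
```

Using `sorted` on a set gives the same result as building a list from the set and sorting it in place, in one step.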

## Getting data from a specific table

Once we had picked the table name, we needed to find its code.

In [4]:
name = 'MEDIAN AGE AT FIRST MARRIAGE'
for item in tables:
    if item[1] == name:
        code = item[0][:6]
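
The slice `item[0][:6]` relies on the table code being exactly six characters long. A slightly more defensive variant, sketched here on hand-written variable codes, splits on the underscore instead. `table_code_from_variable` is a hypothetical helper name, not part of `censusdata`, and the second code in the usage lines is only an assumed example of a longer table code.

```python
def table_code_from_variable(variable_code):
    """Return the part of a variable code before the first underscore."""
    return variable_code.split('_', 1)[0]

print(table_code_from_variable('B12007_001E'))
# An assumed example of a table code longer than six characters:
print(table_code_from_variable('B12007A_001E'))
```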

Then we can use the printtable function to get a look at the structure of the table.

In [5]:
censusdata.printtable(censusdata.censustable('acs5', 2019, code))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B12007_001E  | MEDIAN AGE AT FIRST MARRIAGE   | !! !! Estimate Median age at first marriage -- Male      | float
B12007_002E  | MEDIAN AGE AT FIRST MARRIAGE   | !! !! Estimate Median age at first marriage -- Female    | float
-------------------------------------------------------------------------------------------------------------------


Then we can download the variables we choose into a pandas data frame by using the codes above.

In [6]:
marriage_2019 = censusdata.download('acs5', 2019,
                   censusdata.censusgeo([('state', '*')]),
                    ['B12007_001E', 'B12007_002E'])
marriage_2019.head()

| | B12007_001E | B12007_002E |
|---|---|---|
| Alabama: Summary level: 040, state:01 | 28.5 | 26.7 |
| Alaska: Summary level: 040, state:02 | 29.2 | 26.4 |
| Arizona: Summary level: 040, state:04 | 29.9 | 27.8 |
| Arkansas: Summary level: 040, state:05 | 27.2 | 25.7 |
| California: Summary level: 040, state:06 | 30.8 | 29.0 |


## Streamlining this process

To reformat this table and others, we wrote the following function. We can follow the same process to find other tables and variables of interest, and then plug that information into this function to get a nicer table.

In [2]:
def get_tables(codes, names, years):
    """
    Download the specified variables for the specified years and reformat them.

    Inputs: codes - a list of variable codes (e.g. 'B12007_001E')
            names - the new column name for each variable code
            years - the years to download

    Output: one dataframe combining the requested data for all years
    """
    # Map each variable code to its new name; the old index becomes 'State'
    name_dict = dict(zip(codes, names))
    name_dict['index'] = 'State'

    tables = []
    for year in years:
        # Get the table for every state
        df = censusdata.download('acs5', year,
                                 censusdata.censusgeo([('state', '*')]),
                                 codes)

        # Rename columns
        df = df.reset_index()  # Turns the row labels into a column named 'index'
        df = df.rename(columns=name_dict)

        # Shorten the State column to just the state name
        df = df.astype({'State': 'str'})
        df['State'] = df['State'].str.split(':').str.get(0)

        # Add column for year
        df['Year'] = year

        tables.append(df)
    return pd.concat(tables)
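
Because each yearly table keeps its own 0-based row numbers, the concatenated result repeats index values across years, which is visible in the outputs in this notebook. The sketch below uses two tiny hand-built frames, with values copied from outputs shown in this notebook, to illustrate how `ignore_index=True` would give one continuous index instead. Whether that is preferable depends on how the rows are used later.

```python
import pandas as pd

# Two small per-year frames; the values are copied from outputs in this notebook.
df_2009 = pd.DataFrame({'State': ['Alaska', 'Alabama'], 'Male age': [27.2, 26.8]})
df_2009['Year'] = 2009
df_2019 = pd.DataFrame({'State': ['Alabama', 'Alaska'], 'Male age': [28.5, 29.2]})
df_2019['Year'] = 2019

# Default concat keeps each frame's own index, so labels 0 and 1 appear twice.
stacked = pd.concat([df_2009, df_2019])
# ignore_index=True renumbers the rows from 0 to n-1.
renumbered = pd.concat([df_2009, df_2019], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```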

In [30]:
marriage = get_tables(['B12007_001E', 'B12007_002E'], 
                      ['Male age', 'Female age'], 
                      [2009, 2014, 2019])
marriage.head()

| | State | Male age | Female age | Year |
|---|---|---|---|---|
| 0 | Alaska | 27.2 | 25.2 | 2009 |
| 1 | Alabama | 26.8 | 25.3 | 2009 |
| 2 | Arkansas | 25.8 | 24.3 | 2009 |
| 3 | Arizona | 27.8 | 25.8 | 2009 |
| 4 | California | 28.8 | 26.8 | 2009 |
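
The clean `State` values above come from cutting each raw row label at its first colon. The same label also carries the state's FIPS code, which the function discards. As a rough sketch, assuming a label string of the form shown in the earlier `marriage_2019` output, both pieces can be recovered like this (`parse_state_label` is a hypothetical helper, not part of `censusdata`):

```python
def parse_state_label(label):
    """Split a label such as 'Alabama: Summary level: 040, state:01'
    into the state name and the state FIPS code (kept as a string)."""
    name = label.split(':', 1)[0].strip()
    fips_code = label.rsplit('state:', 1)[1].strip()
    return name, fips_code

print(parse_state_label('Alabama: Summary level: 040, state:01'))
```

Keeping the FIPS code as a string preserves the leading zero in codes such as `01`.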


In [16]:
household_type = get_tables(['B11001_001E', 'B11001_002E','B11001_003E','B11001_004E','B11001_007E','B11001_008E','B11001_009E'], 
                            ['Total', 'Total Family','Married-couple Family', 'Single Householder, no spouse','Total Nonfamily','Nonfamily Living Alone','Nonfamily Not Alone'], 
                            [2009, 2014, 2019])
household_type

| | State | Total | Total Family | Married-couple Family | Single Householder, no spouse | Total Nonfamily | Nonfamily Living Alone | Nonfamily Not Alone | Year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Alaska | 234779 | 159319 | 118716 | 40603 | 75460 | 57718 | 17742 | 2009 |
| 1 | Alabama | 1819441 | 1236035 | 894351 | 341684 | 583406 | 508317 | 75089 | 2009 |
| 2 | Arkansas | 1109635 | 754486 | 563199 | 191287 | 355149 | 305252 | 49897 | 2009 |
| 3 | Arizona | 2248170 | 1492544 | 1115833 | 376711 | 755626 | 603300 | 152326 | 2009 |
| 4 | California | 12187191 | 8333690 | 6085094 | 2248596 | 3853501 | 2993951 | 859550 | 2009 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 47 | Washington | 2848396 | 1841954 | 1430460 | 411494 | 1006442 | 759370 | 247072 | 2019 |
| 48 | West Virginia | 732585 | 473856 | 356024 | 117832 | 258729 | 217699 | 41030 | 2019 |
| 49 | Wisconsin | 2358156 | 1482213 | 1148844 | 333369 | 875943 | 696118 | 179825 | 2019 |
| 50 | Wyoming | 230101 | 148652 | 119353 | 29299 | 81449 | 64997 | 16452 | 2019 |


In [19]:
household_mc = household_type[["State", "Married-couple Family", "Total"]]
household_mc

| | State | Married-couple Family | Total |
|---|---|---|---|
| 0 | Alaska | 118716 | 234779 |
| 1 | Alabama | 894351 | 1819441 |
| 2 | Arkansas | 563199 | 1109635 |
| 3 | Arizona | 1115833 | 2248170 |
| 4 | California | 6085094 | 12187191 |
| ... | ... | ... | ... |
| 47 | Washington | 1430460 | 2848396 |
| 48 | West Virginia | 356024 | 732585 |
| 49 | Wisconsin | 1148844 | 2358156 |
| 50 | Wyoming | 119353 | 230101 |


In [22]:
mc_percentage = household_type["Married-couple Family"] / household_type["Total"]
year = household_type["Year"]
state = household_type["State"]

household_mc_percentage = pd.DataFrame({
    'State': state,
    'Married-couple Family Percentage': mc_percentage,
    'Year': year
})
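
Note that the column computed above holds a fraction between 0 and 1, even though its name says percentage. A minimal sketch of the conversion, using the Alaska 2009 row from the `household_type` output (118,716 married-couple family households out of 234,779 households):

```python
married_couple = 118716   # Alaska, 2009, married-couple family households
total = 234779            # Alaska, 2009, total households

share = married_couple / total   # fraction between 0 and 1
percent = share * 100            # the same value expressed as a percentage

print(f"{share:.4f} -> {percent:.1f}%")
```

Leaving the values as fractions works for the maps, as long as the color bar is read on a 0 to 1 scale.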

In [5]:
divorces = get_tables(['B12503_001E','B12503_003E','B12503_005E','B12503_006E','B12503_008E','B12503_010E','B12503_011E'],
                      ['Total','Male Never Married', 'Male Married; Divorced Last Year', 'Male Married; Not Divorced Last Year','Female Never Married','Female Married; Divorced Last Year','Female Married; Not Divorced Last Year'],
                      [2012, 2019])

In [31]:
div_percentage = (divorces["Male Married; Divorced Last Year"] + divorces["Female Married; Divorced Last Year"]) / divorces["Total"]
nm_percentage = (divorces["Male Never Married"] + divorces["Female Never Married"]) / divorces["Total"]
year = divorces["Year"]
state = divorces["State"]

divorce_percentage = pd.DataFrame({
    'State': state, 
    'Percentage Never Married': nm_percentage,
    'Percentage Divorced in Last Year': div_percentage,
    'Year': year
})

In [6]:
divorces

| | State | Total | Male Never Married | Male Married; Divorced Last Year | Male Married; Not Divorced Last Year | Female Never Married | Female Married; Divorced Last Year | Female Married; Not Divorced Last Year | Year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | 3844391 | 584355 | 22564 | 1234437 | 517693 | 26151 | 1459191 | 2012 |
| 1 | Alaska | 556204 | 105333 | 3389 | 181330 | 74461 | 3330 | 188361 | 2012 |
| 2 | Arizona | 5056561 | 876661 | 24077 | 1595535 | 711047 | 27608 | 1821633 | 2012 |
| 3 | Arkansas | 2325562 | 330745 | 13975 | 784152 | 276122 | 16462 | 904106 | 2012 |
| 4 | California | 29700084 | 5778554 | 112136 | 8772035 | 4819717 | 129478 | 10088164 | 2012 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 47 | Washington | 6031108 | 1042361 | 23783 | 1934478 | 826429 | 26890 | 2177167 | 2019 |
| 48 | West Virginia | 1512469 | 232652 | 7665 | 502692 | 183487 | 7214 | 578759 | 2019 |
| 49 | Wisconsin | 4734360 | 825525 | 15745 | 1497749 | 695245 | 15502 | 1684594 | 2019 |
| 50 | Wyoming | 466549 | 72140 | 2069 | 163131 | 53486 | 2687 | 173036 | 2019 |
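
As a worked check of the `divorce_percentage` calculation, the Alabama 2012 row above can be computed by hand. Both results are fractions of the table's total, so a value near 0.0127 means roughly 1.27 percent.

```python
# Alabama, 2012, copied from the `divorces` output above.
total = 3844391
male_divorced, female_divorced = 22564, 26151
male_never_married, female_never_married = 584355, 517693

# Same formulas as the div_percentage and nm_percentage lines.
divorced_share = (male_divorced + female_divorced) / total
never_married_share = (male_never_married + female_never_married) / total

print(round(divorced_share, 6), round(never_married_share, 6))
```

These values agree with the Alabama 2012 row of the merged `divorce` table shown later in this notebook.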


## Data visualization

Now we can make charts from the data.

In [6]:
from matplotlib import pyplot as plt
import plotly.io as pio
from plotly import express as px

In [7]:
import plotly.figure_factory as ff

In [8]:
fips = pd.read_csv("https://gist.githubusercontent.com/dantonnoriega/bf1acd2290e15b91e6710b6fd3be0a53/raw/11d15233327c8080c9646c7e1f23052659db251d/us-state-ansi-fips.csv")
fips["State"] = fips["stname"]
fips = fips.drop(columns=['stname'])

fips.loc[len(fips.index)] = [72, 'PR', 'Puerto Rico']


fips

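| | st | stusps | State |
|---|---|---|---|
| 0 | 1 | AL | Alabama |
| 1 | 2 | AK | Alaska |
| 2 | 4 | AZ | Arizona |
| 3 | 5 | AR | Arkansas |
| 4 | 6 | CA | California |
| 5 | 8 | CO | Colorado |
| 6 | 9 | CT | Connecticut |
| 7 | 10 | DE | Delaware |
| 8 | 11 | DC | District of Columbia |
| 9 | 12 | FL | Florida |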


In [32]:
marriage = pd.merge(marriage, fips, on = ["State"])
household = pd.merge(household_mc_percentage, fips, on = ["State"])
divorce = pd.merge(divorce_percentage, fips, on = ["State"])
divorce

| | State | Percentage Never Married | Percentage Divorced in Last Year | Year | st | stusps |
|---|---|---|---|---|---|---|
| 0 | Alabama | 0.286664 | 0.012672 | 2012 | 1 | AL |
| 1 | Alabama | 0.308004 | 0.009545 | 2019 | 1 | AL |
| 2 | Alaska | 0.323252 | 0.012080 | 2012 | 2 | AK |
| 3 | Alaska | 0.341092 | 0.008258 | 2019 | 2 | AK |
| 4 | Arizona | 0.313990 | 0.010221 | 2012 | 4 | AZ |
| ... | ... | ... | ... | ... | ... | ... |
| 99 | Wisconsin | 0.321220 | 0.006600 | 2019 | 55 | WI |
| 100 | Wyoming | 0.259772 | 0.012571 | 2012 | 56 | WY |
| 101 | Wyoming | 0.269266 | 0.010194 | 2019 | 56 | WY |
| 102 | Puerto Rico | 0.379767 | 0.008400 | 2012 | 72 | PR |
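
`pd.merge` defaults to an inner join, so any state whose name has no exact match in `fips` is dropped without warning. One way to check for that, sketched here on small hand-built frames, is a left join with `indicator=True`. The misspelled 'Alabamma' row is a deliberate, made-up example of a non-matching name.

```python
import pandas as pd

# Hand-built stand-ins for a data table and the FIPS lookup table.
data = pd.DataFrame({'State': ['Alabama', 'Alaska', 'Alabamma'],
                     'Year': [2019, 2019, 2019]})
lookup = pd.DataFrame({'State': ['Alabama', 'Alaska'],
                       'stusps': ['AL', 'AK']})

# how='left' keeps every data row; the _merge column records whether a match was found.
checked = pd.merge(data, lookup, on='State', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'State'].tolist()
print(unmatched)
```

An empty `unmatched` list would mean the inner join in the cell above lost no rows.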


In [10]:
# The abbreviations from the FIPS file carry a leading space; strip it
locations = marriage[" stusps"].str.strip().tolist()

locations

['AK',
 'AK',
 'AK',
 'AL',
 'AL',
 'AL',
 'AR',
 'AR',
 'AR',
 'AZ',
 'AZ',
 'AZ',
 'CA',
 'CA',
 'CA',
 'CO',
 'CO',
 'CO',
 'CT',
 'CT',
 'CT',
 'DC',
 'DC',
 'DC',
 'DE',
 'DE',
 'DE',
 'FL',
 'FL',
 'FL',
 'GA',
 'GA',
 'GA',
 'HI',
 'HI',
 'HI',
 'IA',
 'IA',
 'IA',
 'ID',
 'ID',
 'ID',
 'IL',
 'IL',
 'IL',
 'IN',
 'IN',
 'IN',
 'KS',
 'KS',
 'KS',
 'KY',
 'KY',
 'KY',
 'LA',
 'LA',
 'LA',
 'MA',
 'MA',
 'MA',
 'MD',
 'MD',
 'MD',
 'ME',
 'ME',
 'ME',
 'MI',
 'MI',
 'MI',
 'MN',
 'MN',
 'MN',
 'MO',
 'MO',
 'MO',
 'MS',
 'MS',
 'MS',
 'MT',
 'MT',
 'MT',
 'NC',
 'NC',
 'NC',
 'ND',
 'ND',
 'ND',
 'NE',
 'NE',
 'NE',
 'NH',
 'NH',
 'NH',
 'NJ',
 'NJ',
 'NJ',
 'NM',
 'NM',
 'NM',
 'NV',
 'NV',
 'NV',
 'NY',
 'NY',
 'NY',
 'OH',
 'OH',
 'OH',
 'OK',
 'OK',
 'OK',
 'OR',
 'OR',
 'OR',
 'PA',
 'PA',
 'PA',
 'PR',
 'PR',
 'PR',
 'RI',
 'RI',
 'RI',
 'SC',
 'SC',
 'SC',
 'SD',
 'SD',
 'SD',
 'TN',
 'TN',
 'TN',
 'TX',
 'TX',
 'TX',
 'UT',
 'UT',
 'UT',
 'VA',
 'VA',
 'VA',
 'VT',
 'VT',

In [36]:
# The abbreviations from the FIPS file carry a leading space; strip it
locations_2 = divorce[" stusps"].str.strip().tolist()

locations_2

['AL',
 'AL',
 'AK',
 'AK',
 'AZ',
 'AZ',
 'AR',
 'AR',
 'CA',
 'CA',
 'CO',
 'CO',
 'DE',
 'DE',
 'DC',
 'DC',
 'CT',
 'CT',
 'FL',
 'FL',
 'GA',
 'GA',
 'ID',
 'ID',
 'HI',
 'HI',
 'IL',
 'IL',
 'IN',
 'IN',
 'IA',
 'IA',
 'KS',
 'KS',
 'KY',
 'KY',
 'LA',
 'LA',
 'ME',
 'ME',
 'MD',
 'MD',
 'MA',
 'MA',
 'MI',
 'MI',
 'MN',
 'MN',
 'MS',
 'MS',
 'MO',
 'MO',
 'MT',
 'MT',
 'NE',
 'NE',
 'NV',
 'NV',
 'NH',
 'NH',
 'NJ',
 'NJ',
 'NM',
 'NM',
 'NY',
 'NY',
 'NC',
 'NC',
 'ND',
 'ND',
 'OH',
 'OH',
 'OK',
 'OK',
 'OR',
 'OR',
 'PA',
 'PA',
 'RI',
 'RI',
 'SC',
 'SC',
 'SD',
 'SD',
 'TN',
 'TN',
 'TX',
 'TX',
 'VT',
 'VT',
 'UT',
 'UT',
 'VA',
 'VA',
 'WA',
 'WA',
 'WV',
 'WV',
 'WI',
 'WI',
 'WY',
 'WY',
 'PR',
 'PR']

In [52]:
fig = px.choropleth(locations = locations, 
                    locationmode = "USA-states", 
                    color = marriage["Male age"],
                    title = "Median Age of First Marriage (Male)",
                    scope="usa",
                    animation_frame = marriage["Year"],
                    range_color = [min(marriage["Male age"]), max(marriage["Male age"])])
fig.show()
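
Fixing `range_color` to the overall minimum and maximum keeps the color scale identical in every animation frame, so a change in a state's color reflects a change in its value. Python's built-in `min` and `max` can give order-dependent results when the data contains NaN, so a slightly more defensive version, assuming missing values might be present, could look like this (`shared_color_range` is a hypothetical helper, not part of plotly):

```python
import math

def shared_color_range(values):
    """Return [lowest, highest] over all non-missing values."""
    clean = [v for v in values if v is not None and not math.isnan(v)]
    return [min(clean), max(clean)]

# Illustrative values only, with one missing entry.
print(shared_color_range([27.2, float('nan'), 30.8, 25.8]))
```

The returned list could be passed as `range_color` in place of the inline `[min(...), max(...)]` expression.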

In [53]:
from plotly.io import write_html
write_html(fig, "median_age_male_choro.html")

In [54]:
fig = px.choropleth(locations = locations, 
                    locationmode = "USA-states", 
                    color = marriage["Female age"],
                    title = "Median Age of First Marriage (Female)",
                    scope="usa",
                    animation_frame = marriage["Year"],
                    range_color = [min(marriage["Female age"]), max(marriage["Female age"])])
fig.show()

In [55]:
from plotly.io import write_html
write_html(fig, "median_age_female_choro.html")

In [56]:
fig = px.choropleth(locations = locations, 
                    locationmode = "USA-states", 
                    color = household["Married-couple Family Percentage"],
                    title = "Share of Households That Are Married-Couple Families",
                    scope="usa",
                    animation_frame = household["Year"],
                    range_color = [min(household["Married-couple Family Percentage"]), max(household["Married-couple Family Percentage"])])
fig.show()

In [57]:
from plotly.io import write_html
write_html(fig, "married_couple_choro.html")

In [58]:
fig = px.choropleth(locations = locations_2, 
                    locationmode = "USA-states", 
                    color = divorce["Percentage Divorced in Last Year"],
                    title = "Percentage of Population Divorced in the Last Year",
                    scope="usa",
                    animation_frame = divorce["Year"],
                    range_color = [0, 0.03])
fig.show()

In [59]:
from plotly.io import write_html
write_html(fig, "divorces_choro.html")

In [60]:
fig = px.choropleth(locations = locations_2, 
                    locationmode = "USA-states", 
                    color = divorce["Percentage Never Married"],
                    title = "Percentage of Population Never Married",
                    scope="usa",
                    animation_frame = divorce["Year"],
                    range_color = [min(divorce["Percentage Never Married"]), max(divorce["Percentage Never Married"])])
fig.show()

In [61]:
from plotly.io import write_html
write_html(fig, "never_married_choro.html")