# Looking at changing family structures

To study changing family structures, we need census data, so the first part of our project was dedicated to retrieving it.

## Finding a package

We ended up going with the censusdata package. We found a clear write-up about it at https://towardsdatascience.com/accessing-census-data-with-python-3e2f2b56e20d, and its functions were clear and useful. We originally looked at the census package, but it required an API key to access the data, which often failed, and its functions for getting the data were less clear.

In [1]:
import pandas as pd
import censusdata

## Searching for relevant tables

This package contains a search function that allows us to search a data set for relevant tables.

We searched the American Community Survey (ACS) 5-year estimates. The ACS is more detailed than the 10-year census but slightly less accurate, since it is based on a sample, and it is conducted on a rolling basis.

Here we search the 2019 data set for tables whose concept mentions marriage.

In [2]:
tables = censusdata.search('acs5', 2019,'concept', 'marriage')

This returns a list of tuples, each holding a variable code, the name of the table it belongs to, and the variable label.
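
To make the tuple layout concrete, here is a small sketch that loads search-style results into a data frame. The two rows are hand-written stand-ins based on the table printed later in this notebook, not a live query, and the column names `variable`, `table`, and `label` are our own choice rather than anything the package defines.

```python
import pandas as pd

# Hand-written stand-in for search results:
# one (variable code, table name, variable label) tuple per variable.
sample_results = [
    ("B12007_001E", "MEDIAN AGE AT FIRST MARRIAGE", "Median age at first marriage -- Male"),
    ("B12007_002E", "MEDIAN AGE AT FIRST MARRIAGE", "Median age at first marriage -- Female"),
]

# Column names are our own labels for the three tuple positions.
results_df = pd.DataFrame(sample_results, columns=["variable", "table", "label"])

# Each position can now be read by name instead of by index.
variable_codes = results_df["variable"].tolist()
print(results_df)
```

Reading positions by name makes it harder to mix up `row[0]` (the variable code) and `row[1]` (the table name).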

We then pulled out just the unique table names so that we could more easily find the table we wanted.

In [3]:
# Unique table names (the last search result is skipped), sorted alphabetically
table_names = sorted({row[1] for row in tables[:-1]})
table_names

['MARRIAGES ENDING IN WIDOWHOOD IN THE LAST YEAR BY SEX BY MARITAL STATUS FOR THE POPULATION 15 YEARS AND OVER',
 'MARRIAGES IN THE LAST YEAR BY SEX BY MARITAL STATUS FOR THE POPULATION 15 YEARS AND OVER',
 'MEDIAN AGE AT FIRST MARRIAGE',
 'MEDIAN AGE AT FIRST MARRIAGE (AMERICAN INDIAN AND ALASKA NATIVE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (ASIAN ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (BLACK OR AFRICAN AMERICAN ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (HISPANIC OR LATINO)',
 'MEDIAN AGE AT FIRST MARRIAGE (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (SOME OTHER RACE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (TWO OR MORE RACES)',
 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)',
 'MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE, NOT HISPANIC OR LATINO)',
 'MEDIAN DURATION OF CURRENT MARRIAGE IN YEARS BY SEX BY MARITAL STATUS FOR THE MARRIED POPULATION 15 YEARS AND OVER']

## Getting data from a specific table

Once we picked a table name, we needed to find its code.

In [4]:
name = 'MEDIAN AGE AT FIRST MARRIAGE'
for item in tables:
    if item[1] == name:
        code = item[0][:6]
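
The loop above relies on the table code being exactly six characters long. A slightly more defensive sketch splits the variable code at the underscore instead and stops at the first match. `find_table_code` is a hypothetical helper of ours, the sample tuples stand in for real search results, and the `B12007A` code in the second row is an assumed value used only for illustration.

```python
def find_table_code(search_results, table_name):
    """Return the table code of the first variable whose table name matches."""
    for variable, table, _label in search_results:
        if table == table_name:
            # 'B12007_001E' -> 'B12007'
            return variable.split("_")[0]
    raise KeyError(f"No table named {table_name!r} in the search results")


# Stand-in search results; the second code is assumed, for illustration only.
sample_results = [
    ("B12007_001E", "MEDIAN AGE AT FIRST MARRIAGE", "Male"),
    ("B12007A_001E", "MEDIAN AGE AT FIRST MARRIAGE (WHITE ALONE)", "Male"),
]

code = find_table_code(sample_results, "MEDIAN AGE AT FIRST MARRIAGE")
print(code)
```

Splitting at the underscore keeps any suffix letter attached to the table code, which a fixed six-character slice would drop.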

Then we can use the printtable function to get a look at the structure of the table.

In [5]:
censusdata.printtable(censusdata.censustable('acs5', 2019, code))

Variable     | Table                          | Label                                                    | Type 
-------------------------------------------------------------------------------------------------------------------
B12007_001E  | MEDIAN AGE AT FIRST MARRIAGE   | !! !! Estimate Median age at first marriage -- Male      | float
B12007_002E  | MEDIAN AGE AT FIRST MARRIAGE   | !! !! Estimate Median age at first marriage -- Female    | float
-------------------------------------------------------------------------------------------------------------------
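
The variable codes above end in `E`, which marks an estimate. As far as we know, ACS detailed tables pair each estimate with a margin-of-error variable that ends in `M` instead, which is worth knowing given that these are survey estimates. The sketch below only builds those codes as strings. `moe_code` is a hypothetical helper, and whether a given `M` variable exists should be checked against the table listing before downloading it.

```python
def moe_code(estimate_code):
    """Swap the trailing 'E' (estimate) for 'M' (margin of error)."""
    if not estimate_code.endswith("E"):
        raise ValueError(f"{estimate_code!r} does not look like an estimate variable")
    return estimate_code[:-1] + "M"


estimates = ["B12007_001E", "B12007_002E"]

# Request each estimate together with its assumed margin-of-error counterpart.
with_moe = estimates + [moe_code(c) for c in estimates]
print(with_moe)
```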


Then we can download the variables we choose into a pandas data frame by using the codes above.

In [6]:
marriage_2019 = censusdata.download('acs5', 2019,
                   censusdata.censusgeo([('state', '*')]),
                    ['B12007_001E', 'B12007_002E'])
marriage_2019.head()

Unnamed: 0,B12007_001E,B12007_002E
"Alabama: Summary level: 040, state:01",28.5,26.7
"Alaska: Summary level: 040, state:02",29.2,26.4
"Arizona: Summary level: 040, state:04",29.9,27.8
"Arkansas: Summary level: 040, state:05",27.2,25.7
"California: Summary level: 040, state:06",30.8,29.0

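The row labels above pack the state name, summary level, and state code into one string. As a rough sketch of pulling them apart with pandas string methods, the example below uses a hand-copied two-row stand-in for the download, with plain strings as row labels. It assumes the real row labels convert to the same string form that is printed above, and the column names `label` and `FIPS` are our own.

```python
import pandas as pd

# Two rows copied by hand from the output above, with string row labels.
raw = pd.DataFrame(
    {"B12007_001E": [28.5, 29.2], "B12007_002E": [26.7, 26.4]},
    index=["Alabama: Summary level: 040, state:01",
           "Alaska: Summary level: 040, state:02"],
)

# Move the row labels into a regular column and make sure they are strings.
tidy = raw.reset_index().rename(columns={"index": "label"})
tidy["label"] = tidy["label"].astype(str)

# Everything before the first colon is the state name.
tidy["State"] = tidy["label"].str.split(":").str.get(0)

# The digits after 'state:' are the state code, kept as text to preserve the leading zero.
tidy["FIPS"] = tidy["label"].str.extract(r"state:(\d+)$", expand=False)

print(tidy[["State", "FIPS", "B12007_001E", "B12007_002E"]])
```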

## Streamlining this process

To reformat this table and others, we wrote the following function. We can follow the same process to find other tables and variables of interest, and then plug that information into this function to get a nicer table.

In [7]:
def get_tables(codes, names, years):
    """
    Download the specified variables for the specified years and reformat them.

    Inputs: codes - a list of the variable codes to download
            names - what to rename the column for each variable code
            years - years to get a table from

    Output: one dataframe with the requested data, compiling the different years
    """
    tables = []
    for year in years:
        # Get table
        df = censusdata.download('acs5', year,
                                 censusdata.censusgeo([('state', '*')]),
                                 codes)

        # Rename columns
        name_dict = dict(zip(codes, names))
        name_dict['index'] = 'State'
        df = df.reset_index()  # Turns the row labels into a column named 'index'
        df = df.rename(columns=name_dict)

        # Shorten states column to state name
        df = df.astype({'State': 'str'})
        df['State'] = df['State'].str.split(':').str.get(0)

        # Add column for year
        df['Year'] = year

        tables.append(df)
    return pd.concat(tables)
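
One thing to watch in the renaming step: `zip` stops at the shorter of its two inputs, so a missing name would leave a column with its raw census code and raise no error. A minimal sketch of a guard, where `build_name_map` is a hypothetical helper rather than part of the function above:

```python
def build_name_map(codes, names):
    """Map census variable codes to readable column names, one-to-one."""
    if len(codes) != len(names):
        raise ValueError(f"Got {len(codes)} codes but {len(names)} names")
    name_map = dict(zip(codes, names))
    # reset_index() calls the former row labels 'index'; rename that too.
    name_map["index"] = "State"
    return name_map


name_map = build_name_map(["B12007_001E", "B12007_002E"], ["Male age", "Female age"])
print(name_map)
```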

In [8]:
marriage = get_tables(['B12007_001E', 'B12007_002E'], ['Male age', 'Female age'], [2009, 2014, 2019])
marriage.head()

Unnamed: 0,State,Male age,Female age,Year
0,Alaska,27.2,25.2,2009
1,Alabama,26.8,25.3,2009
2,Arkansas,25.8,24.3,2009
3,Arizona,27.8,25.8,2009
4,California,28.8,26.8,2009
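
Because the combined frame has one row per state and year, it pivots easily into a state-by-year grid. The sketch below uses a hand-copied subset of values shown in this notebook (Alabama and California, 2009 and 2019) so that it runs without a download. The column name `Change` is our own.

```python
import pandas as pd

# Hand-copied subset of the downloaded values.
sample = pd.DataFrame({
    "State": ["Alabama", "California", "Alabama", "California"],
    "Male age": [26.8, 28.8, 28.5, 30.8],
    "Female age": [25.3, 26.8, 26.7, 29.0],
    "Year": [2009, 2009, 2019, 2019],
})

# One row per state, one column per year.
male_by_year = sample.pivot(index="State", columns="Year", values="Male age")

# Change in the median age between the two years, rounded to one decimal.
male_by_year["Change"] = (male_by_year[2019] - male_by_year[2009]).round(1)
print(male_by_year)
```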


In [9]:
household_type = get_tables(
    ['B11001_001E', 'B11001_002E', 'B11001_003E', 'B11001_004E',
     'B11001_007E', 'B11001_008E', 'B11001_009E'],
    ['Total', 'Total Family', 'Married-couple Family', 'Single Householder, no spouse',
     'Total Nonfamily', 'Nonfamily Living Alone', 'Nonfamily Not Alone'],
    [2009, 2014, 2019])

In [10]:
divorces = get_tables(
    ['B12503_001E', 'B12503_003E', 'B12503_005E', 'B12503_006E',
     'B12503_008E', 'B12503_010E', 'B12503_011E'],
    ['Total', 'Male Never Married', 'Male Married; Divorced Last Year',
     'Male Married; Not Divorced Last Year', 'Female Never Married',
     'Female Married; Divorced Last Year', 'Female Married; Not Divorced Last Year'],
    [2012, 2019])

## Data visualization

Now we can make charts from the data.

In [11]:
from matplotlib import pyplot as plt
import plotly.io as pio
from plotly import express as px

### 1. Median Age at First Marriage in California

In [12]:
mar_cal = marriage[marriage['State'] == "California"]
mar_cal

Unnamed: 0,State,Male age,Female age,Year
4,California,28.8,26.8,2009
4,California,29.9,27.9,2014
4,California,30.8,29.0,2019


In [24]:
fig = px.scatter(data_frame = mar_cal, 
                x = "Year", 
                y = ["Male age", "Female age"],
                title = "Median Age of First Marriage (CA)",
                trendline = "ols", # ordinary least squares regression trendline
                width = 800,
                height = 600)

fig.show()
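
Plotly's `trendline = "ols"` option depends on the statsmodels package being installed. To read off the slope of the trend directly, here is a small sketch that fits the same kind of straight line with NumPy, using the three California values from the table above. Using `np.polyfit` is our substitution for illustration, not what Plotly runs internally.

```python
import numpy as np

years = np.array([2009, 2014, 2019])
male_age = np.array([28.8, 29.9, 30.8])
female_age = np.array([26.8, 27.9, 29.0])

# Centre the years so the fit is well conditioned; the slope is unchanged.
centred = years - years.mean()

# Degree-1 least-squares fit; the first coefficient is the slope.
male_slope = np.polyfit(centred, male_age, 1)[0]
female_slope = np.polyfit(centred, female_age, 1)[0]

print(f"Male: {male_slope:.2f} years of age per calendar year")
print(f"Female: {female_slope:.2f} years of age per calendar year")
```

With only three points per series, the slope is a rough summary rather than a reliable forecast.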

In [None]:
from plotly.io import write_html
write_html(fig, "geo_scatter.html")

### 2. Frequency of Different Household Types in California

In [14]:
household_cal = household_type[household_type['State']=="California"]
household_cal

Unnamed: 0,State,Total,Total Family,Married-couple Family,"Single Householder, no spouse",Total Nonfamily,Nonfamily Living Alone,Nonfamily Not Alone,Year
4,California,12187191,8333690,6085094,2248596,3853501,2993951,859550,2009
4,California,12617280,8666286,6195938,2470348,3950994,3041390,909604,2014
4,California,13044266,8958436,6491236,2467200,4085830,3106104,979726,2019


In [15]:
mc_percentage = household_cal['Married-couple Family'] / household_cal['Total']
sh_percentage = household_cal['Single Householder, no spouse'] / household_cal['Total']
nla_percentage = household_cal['Nonfamily Living Alone'] / household_cal['Total']
nna_percentage = household_cal['Nonfamily Not Alone'] / household_cal['Total']
year = household_cal['Year']

household_percentage = pd.DataFrame({
    'Married-couple Family': mc_percentage,
    'Single Householder': sh_percentage, 
    'Nonfamily Living Alone': nla_percentage, 
    'Nonfamily Not Alone': nna_percentage,
    'Year': year
})

household_percentage

Unnamed: 0,Married-couple Family,Single Householder,Nonfamily Living Alone,Nonfamily Not Alone,Year
4,0.499302,0.184505,0.245664,0.070529,2009
4,0.491068,0.195791,0.24105,0.072092,2014
4,0.497631,0.189141,0.23812,0.075108,2019
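
The four divisions above can also be written as one call to `DataFrame.div`, which divides each selected column by the total, row by row. This sketch rebuilds the California counts by hand from the table above so that it runs on its own. The names `household_sample` and `share_cols` are ours.

```python
import pandas as pd

# California counts copied by hand from the household table.
household_sample = pd.DataFrame({
    "Total": [12187191, 12617280, 13044266],
    "Married-couple Family": [6085094, 6195938, 6491236],
    "Single Householder, no spouse": [2248596, 2470348, 2467200],
    "Nonfamily Living Alone": [2993951, 3041390, 3106104],
    "Nonfamily Not Alone": [859550, 909604, 979726],
    "Year": [2009, 2014, 2019],
})

share_cols = [
    "Married-couple Family",
    "Single Householder, no spouse",
    "Nonfamily Living Alone",
    "Nonfamily Not Alone",
]

# axis=0 aligns the divisor with the rows, so each row is divided by its own total.
shares = household_sample[share_cols].div(household_sample["Total"], axis=0)
shares["Year"] = household_sample["Year"]
print(shares.round(4))
```

These four categories add up to the total in every year, so each row of shares sums to 1, which is a handy check that no category was left out.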


In [16]:
household_percentage = household_percentage.round(decimals = 4)

In [22]:
fig = px.bar(household_percentage,
             x="Year", 
             y=["Married-couple Family","Single Householder", "Nonfamily Living Alone","Nonfamily Not Alone"],  
             title="Household Types (CA)")
fig.show()

In [23]:
from plotly.io import write_html
write_html(fig, "household_type.html")

### 3. Divorces in the Last Year in California

In [18]:
divorces_cal = divorces[divorces['State']=='California']
divorces_cal

Unnamed: 0,State,Total,Male Never Married,Male Married; Divorced Last Year,Male Married; Not Divorced Last Year,Female Never Married,Female Married; Divorced Last Year,Female Married; Not Divorced Last Year,Year
4,California,29700084,5778554,112136,8772035,4819717,129478,10088164,2012
4,California,31788280,6343459,95968,9255935,5389443,110737,10592738,2019


In [19]:
never_married_total = divorces_cal["Male Never Married"] + divorces_cal["Female Never Married"]
div_last_year = divorces_cal["Male Married; Divorced Last Year"] + divorces_cal["Female Married; Divorced Last Year"]
married_not_div = divorces_cal["Male Married; Not Divorced Last Year"] + divorces_cal["Female Married; Not Divorced Last Year"]
total = divorces_cal["Total"]
year = divorces_cal["Year"]

divorces_cal_totals = pd.DataFrame({
    'Total': total,
    'Never Married Total': never_married_total,
    'Ever Married; Divorced Last Year': div_last_year, 
    'Ever Married; Did Not Divorce Last Year': married_not_div, 
    'Year': year
})


In [20]:
never_married_per = divorces_cal_totals["Never Married Total"] / divorces_cal_totals["Total"]
married_div_per = divorces_cal_totals["Ever Married; Divorced Last Year"] / divorces_cal_totals["Total"]
married_not_div_per = divorces_cal_totals["Ever Married; Did Not Divorce Last Year"] / divorces_cal_totals["Total"]

divorces_cal_percentages = pd.DataFrame({
    'Never Married': never_married_per,
    'Ever Married; Divorced Last Year': married_div_per, 
    'Ever Married; Did Not Divorce Last Year': married_not_div_per, 
    'Year': year
})

divorces_cal_percentages = divorces_cal_percentages.round(decimals = 4)
divorces_cal_percentages

Unnamed: 0,Never Married,Ever Married; Divorced Last Year,Ever Married; Did Not Divorce Last Year,Year
4,0.3568,0.0081,0.635,2012
4,0.3691,0.0065,0.6244,2019
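
The shares above use the table's total as the denominator, so the never-married group dilutes the divorce figure. As one possible alternative (our own choice of denominator, not something the census table reports), the sketch below divides by the ever-married population only, using the California counts from the table above. The column names are ours.

```python
import pandas as pd

# Male + female counts for California, copied by hand from the divorce table.
divorce_sample = pd.DataFrame({
    "Divorced Last Year": [112136 + 129478, 95968 + 110737],
    "Not Divorced Last Year": [8772035 + 10088164, 9255935 + 10592738],
    "Year": [2012, 2019],
})

# Ever-married population = divorced last year + not divorced last year.
ever_married = divorce_sample["Divorced Last Year"] + divorce_sample["Not Divorced Last Year"]

# Share of the ever-married population that divorced in the last year.
divorce_sample["Divorce Share of Ever Married"] = (
    divorce_sample["Divorced Last Year"] / ever_married
).round(4)

print(divorce_sample[["Year", "Divorce Share of Ever Married"]])
```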


In [21]:
fig = px.scatter(divorces_cal_percentages, 
                x = "Year", 
                y = ["Never Married", "Ever Married; Divorced Last Year", "Ever Married; Did Not Divorce Last Year"],
                title = "Divorces in the Past Year in CA",
                trendline = "ols", # ordinary least squares regression trendline
                width = 800,
                height = 600)

fig.show()