# Final Project Ideas 1: Transportation to Work

The American Community Survey (ACS) includes questions on a wide range
of demographic, social, and economic topics. The results are published in
groups of variables, each of which slices the data in a different way.

In this notebook, we will discover that there are many groups that have
to do with how people get to work. We'll look briefly at some of this data
and then propose a number of possible projects you could choose to do,
using this data along with some of the tools you learned about in the lessons
and practiced in the exercises.

## Background: ACS Data on Transportation to Work

In [1]:
import censusdis.data as ced
from censusdis.datasets import ACS5

import pandas as pd

pd.set_option("max_colwidth", 500)
pd.set_option("display.max_rows", 100)

### What Groups Deal with Transportation to Work?

As we are about to see, there are a lot of different groups that combine data
on how people get to work with other data.

In [2]:
df_transportation_to_work_groups = ced.variables.search_groups(
    ACS5, 2022, pattern="transportation to work", case=False
)

In [3]:
df_transportation_to_work_groups

Unnamed: 0,DATASET,YEAR,GROUP,DESCRIPTION
0,acs/acs5,2022,B08006,Sex of Workers by Means of Transportation to Work
1,acs/acs5,2022,B08101,Means of Transportation to Work by Age
2,acs/acs5,2022,B08103,Median Age by Means of Transportation to Work
3,acs/acs5,2022,B08105A,Means of Transportation to Work (White Alone)
4,acs/acs5,2022,B08105B,Means of Transportation to Work (Black or African American Alone)
5,acs/acs5,2022,B08105C,Means of Transportation to Work (American Indian and Alaska Native Alone)
6,acs/acs5,2022,B08105D,Means of Transportation to Work (Asian Alone)
7,acs/acs5,2022,B08105E,Means of Transportation to Work (Native Hawaiian and Other Pacific Islander Alone)
8,acs/acs5,2022,B08105F,Means of Transportation to Work (Some Other Race Alone)
9,acs/acs5,2022,B08105G,Means of Transportation to Work (Two or More Races)


### A Slight Diversion on "for Workplace Geography"

You may have noticed in the groups above that there are often pairs of groups with similar
names, but one ends in "for Workplace Geography." When the data is aggregated
by geography, such as by state or county, those groups count people by where
they work. The groups without this suffix on their names count people by where they live.

Let's look at an example to make this clear. First, compare the two groups `B08301` and `B08601`.
They measure the same thing, but the former aggregates by where people live and the latter by
where they work.

In [4]:
df_transportation_to_work_groups[
    df_transportation_to_work_groups["GROUP"].isin(["B08301", "B08601"])
]

Unnamed: 0,DATASET,YEAR,GROUP,DESCRIPTION
26,acs/acs5,2022,B08301,Means of Transportation to Work
52,acs/acs5,2022,B08601,Means of Transportation to Work for Workplace Geography


The two groups have essentially identical variables except they are aggregated differently.

In [5]:
ced.variables.search(ACS5, 2022, group_name="B08301")

Unnamed: 0,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,2022,acs/acs5,B08301,B08301_001E,Estimate!!Total:,,
1,2022,acs/acs5,B08301,B08301_002E,"Estimate!!Total:!!Car, truck, or van:",,
2,2022,acs/acs5,B08301,B08301_003E,"Estimate!!Total:!!Car, truck, or van:!!Drove alone",,
3,2022,acs/acs5,B08301,B08301_004E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:",,
4,2022,acs/acs5,B08301,B08301_005E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 2-person carpool",,
5,2022,acs/acs5,B08301,B08301_006E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 3-person carpool",,
6,2022,acs/acs5,B08301,B08301_007E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 4-person carpool",,
7,2022,acs/acs5,B08301,B08301_008E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 5- or 6-person carpool",,
8,2022,acs/acs5,B08301,B08301_009E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 7-or-more-person carpool",,
9,2022,acs/acs5,B08301,B08301_010E,Estimate!!Total:!!Public transportation (excluding taxicab):,,


In [6]:
ced.variables.search(ACS5, 2022, group_name="B08601")

Unnamed: 0,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,2022,acs/acs5,B08601,B08601_001E,Estimate!!Total:,,
1,2022,acs/acs5,B08601,B08601_002E,"Estimate!!Total:!!Car, truck, or van:",,
2,2022,acs/acs5,B08601,B08601_003E,"Estimate!!Total:!!Car, truck, or van:!!Drove alone",,
3,2022,acs/acs5,B08601,B08601_004E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:",,
4,2022,acs/acs5,B08601,B08601_005E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 2-person carpool",,
5,2022,acs/acs5,B08601,B08601_006E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 3-person carpool",,
6,2022,acs/acs5,B08601,B08601_007E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 4-person carpool",,
7,2022,acs/acs5,B08601,B08601_008E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 5- or 6-person carpool",,
8,2022,acs/acs5,B08601,B08601_009E,"Estimate!!Total:!!Car, truck, or van:!!Carpooled:!!In 7-or-more-person carpool",,
9,2022,acs/acs5,B08601,B08601_010E,Estimate!!Total:!!Public transportation (excluding taxicab):,,
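
One way to check that the two groups line up is to compare labels by variable suffix. The sketch below hard-codes a few rows copied from the two tables above instead of querying the API; with the full DataFrames, the same comparison could be made on the `VARIABLE` and `LABEL` columns. The helper `suffix` is our own name, not part of `censusdis`.

```python
# A few (VARIABLE, LABEL) pairs copied from the two tables above.
b08301 = {
    "B08301_001E": "Estimate!!Total:",
    "B08301_002E": "Estimate!!Total:!!Car, truck, or van:",
    "B08301_003E": "Estimate!!Total:!!Car, truck, or van:!!Drove alone",
    "B08301_010E": "Estimate!!Total:!!Public transportation (excluding taxicab):",
}
b08601 = {
    "B08601_001E": "Estimate!!Total:",
    "B08601_002E": "Estimate!!Total:!!Car, truck, or van:",
    "B08601_003E": "Estimate!!Total:!!Car, truck, or van:!!Drove alone",
    "B08601_010E": "Estimate!!Total:!!Public transportation (excluding taxicab):",
}


def suffix(variable: str) -> str:
    """Return the part of a variable name after the group, e.g. '001E'."""
    return variable.split("_", 1)[1]


# Re-key both groups by suffix so they can be compared directly.
labels_by_residence = {suffix(v): label for v, label in b08301.items()}
labels_by_workplace = {suffix(v): label for v, label in b08601.items()}

# Collect any suffix whose label differs or is missing in the other group.
mismatches = {
    s
    for s in labels_by_residence
    if labels_by_residence[s] != labels_by_workplace.get(s)
}
print(mismatches)  # an empty set means every compared label matches
```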


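Now let's query the total from each group for Newark, New Jersey. Note that this query uses
the 2020 vintage, whereas the variable searches above used 2022.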

In [7]:
from censusdis.states import NJ
from censusdis.places.new_jersey import NEWARK_CITY

In [8]:
ced.download(
    ACS5, 2020, ["NAME", "B08301_001E", "B08601_001E"], state=NJ, place=NEWARK_CITY
)

Unnamed: 0,STATE,PLACE,NAME,B08301_001E,B08601_001E
0,34,51000,"Newark city, New Jersey",115068,163511


The `B08301_001E` column tells us that 115,068 workers live in Newark.
The `B08601_001E` column tells us that 163,511 workers have a workplace in Newark.
Both totals cover every category in the group, including people who work at home.

Some of these are the same people, and some are not. People may commute to or from Newark.
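
The gap between the two totals gives a rough measure of net in-commuting. Here is a minimal worked example using the two numbers returned by the query above; the variable names are our own.

```python
# Totals copied from the query output above (ACS5, 2020, Newark city).
workers_living_in_newark = 115_068   # B08301_001E, counted by residence
workers_working_in_newark = 163_511  # B08601_001E, counted by workplace

# Positive means more workers come to Newark than Newark residents who work.
net_inflow = workers_working_in_newark - workers_living_in_newark
ratio = workers_working_in_newark / workers_living_in_newark

print(f"Net inflow of workers: {net_inflow:,}")
print(f"Workplace-to-residence ratio: {ratio:.2f}")
```

This is only a net figure. It does not tell us how many individual people commute into or out of Newark, because inbound and outbound commuters partly cancel each other out.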

On the other hand, the people who work at home and live in Newark are exactly the same 
people who work at home and work in Newark, so the numbers should be the same.

In [9]:
ced.download(
    ACS5, 2020, ["NAME", "B08301_021E", "B08601_021E"], state=NJ, place=NEWARK_CITY
)

Unnamed: 0,STATE,PLACE,NAME,B08301_021E,B08601_021E
0,34,51000,"Newark city, New Jersey",3053,3053
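
Because the work-at-home count is the same in both groups, subtracting it from each total leaves only the people who actually travel to a workplace. A short sketch using the numbers from the two queries above, with variable names of our own choosing:

```python
# Values copied from the query outputs above (ACS5, 2020, Newark city).
total_by_residence = 115_068  # B08301_001E
total_by_workplace = 163_511  # B08601_001E
worked_at_home = 3_053        # B08301_021E, identical to B08601_021E

# Remove people who work at home to count only those who travel to work.
travelers_living_in_newark = total_by_residence - worked_at_home
travelers_working_in_newark = total_by_workplace - worked_at_home

# Share of Newark's resident workers who work at home.
home_share = worked_at_home / total_by_residence

print(travelers_living_in_newark, travelers_working_in_newark)
print(f"Work-at-home share of resident workers: {home_share:.1%}")
```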


## Project Ideas

Using the groups of variables above, we can answer a lot of different questions.
Here are some that you might like to explore in your project. You can dig deep on 
one or tackle several related questions. They are in no particular order.

1. What is the rate of bicycle commuting by state?
   - For each state you will need to know how many people commute by bicycle and
     how many total commuters there are.
   - Plot the results on a map.
2. In (pick the state of your choice), is there a difference between the types of
   transportation used by low-income workers and high-income workers to get to work?
   - First, answer the question at the state level.
   - Second, answer the question at the county level.
3. In (pick the state or county of your choice), who is most likely to use public transit
   to get to work?
   - What age group is most likely to use public transit to get to work?
   - What gender is most likely to use public transit to get to work?
   - What race or ethnicity is most likely to use public transit to get to work?
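
As a starting point for the first idea, the rate calculation could be sketched as follows. Several things here are assumptions rather than anything shown above: `B08301_018E` is assumed to be the bicycle variable (confirm it with `ced.variables.search`), the helpers `add_bicycle_rate` and `download_state_rows` are hypothetical names, and the example rows are made-up placeholders, not real ACS values.

```python
TOTAL = "B08301_001E"
BICYCLE = "B08301_018E"  # assumption: verify this label before relying on it


def add_bicycle_rate(rows):
    """Return copies of rows with a BICYCLE_RATE key (None if total is 0)."""
    result = []
    for row in rows:
        total = row[TOTAL]
        rate = row[BICYCLE] / total if total else None
        result.append({**row, "BICYCLE_RATE": rate})
    return result


def download_state_rows(year=2022):
    """Fetch one row per state. Not called below; needs censusdis and network."""
    import censusdis.data as ced
    from censusdis.datasets import ACS5

    df = ced.download(ACS5, year, ["NAME", TOTAL, BICYCLE], state="*")
    return df.to_dict("records")


# Made-up placeholder rows that only illustrate the expected shape.
example_rows = [
    {"NAME": "State A", TOTAL: 1000, BICYCLE: 5},
    {"NAME": "State B", TOTAL: 0, BICYCLE: 0},
]

rated_rows = add_bicycle_rate(example_rows)
for row in rated_rows:
    print(row["NAME"], row["BICYCLE_RATE"])
```

In a notebook you would more likely keep the DataFrame and compute the rate as one column divided by another. Passing `with_geometry=True` to `ced.download` should return state boundaries along with the data, which is what the mapping step needs.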