# Solutions to Exercise 4

This notebook contains solutions to [Exercise 4](../Exercise%204.ipynb).

<br><br><br><br>

In [1]:
import censusdis.data as ced
import censusdis.maps as cem
from censusdis.datasets import CBP
import censusdis.states as states

import pandas as pd
import matplotlib.pyplot as plt

pd.set_option("max_colwidth", 500)

<br><br><br><br>

## a) How many people were employed in motor vehicle manufacturing in 2022, by state?

Use data from the `CBP` data set to answer this question.

First, we need the NAICS2017 code for motor vehicle manufacturing. There are two ways to find it:
walk down the NAICS tree one level at a time, or download the whole tree and search the labels for the text.

### Option 1: Walk the tree

In [2]:
# Start at the top of the NAICS2017 tree: download the 2-digit sectors (INDLEVEL 2).
df_us_2 = ced.download(
    CBP,
    2022,
    ["NAICS2017", "NAICS2017_LABEL", "NAME", "EMP"],
    us='*',
    query_filter={'INDLEVEL': 2}
)

In [3]:
df_us_2

Unnamed: 0,INDLEVEL,US,NAICS2017,NAICS2017_LABEL,NAME,EMP
0,2,1,00,Total for all sectors,United States,135748407
1,2,1,11,"Agriculture, forestry, fishing and hunting",United States,168634
2,2,1,21,"Mining, quarrying, and oil and gas extraction",United States,508023
3,2,1,22,Utilities,United States,645214
4,2,1,23,Construction,United States,7361847
5,2,1,31-33,Manufacturing,United States,12188330
6,2,1,42,Wholesale trade,United States,6143230
7,2,1,44-45,Retail trade,United States,15922438
8,2,1,48-49,Transportation and warehousing,United States,6108121
9,2,1,51,Information,United States,3634012


In [4]:
# Descend one level: 3-digit subsectors whose codes start with 3 (Manufacturing is 31-33).
df_us_3 = ced.download(
    CBP,
    2022,
    ["NAICS2017", "NAICS2017_LABEL", "NAME", "EMP"],
    us='*',
    query_filter={'INDLEVEL': 3, 'NAICS2017': '3*'}
)

In [5]:
df_us_3

Unnamed: 0,INDLEVEL,US,NAICS2017,NAICS2017.1,NAICS2017_LABEL,NAME,EMP
0,3,1,311,311,Food manufacturing,United States,1652378
1,3,1,312,312,Beverage and tobacco product manufacturing,United States,280521
2,3,1,313,313,Textile mills,United States,83740
3,3,1,314,314,Textile product mills,United States,105699
4,3,1,315,315,Apparel manufacturing,United States,70315
5,3,1,316,316,Leather and allied product manufacturing,United States,25106
6,3,1,321,321,Wood product manufacturing,United States,446052
7,3,1,322,322,Paper manufacturing,United States,355323
8,3,1,323,323,Printing and related support activities,United States,389036
9,3,1,324,324,Petroleum and coal products manufacturing,United States,100614


In [6]:
# Descend again: 4-digit industry groups under 336 (transportation equipment manufacturing).
df_us_4 = ced.download(
    CBP,
    2022,
    ["NAICS2017", "NAICS2017_LABEL", "NAME", "EMP"],
    us='*',
    query_filter={'INDLEVEL': 4, 'NAICS2017': '336*'}
)

In [7]:
df_us_4

Unnamed: 0,INDLEVEL,US,NAICS2017,NAICS2017.1,NAICS2017_LABEL,NAME,EMP
0,4,1,3361,3361,Motor vehicle manufacturing,United States,271838
1,4,1,3362,3362,Motor vehicle body and trailer manufacturing,United States,178268
2,4,1,3363,3363,Motor vehicle parts manufacturing,United States,572859
3,4,1,3364,3364,Aerospace product and parts manufacturing,United States,421482
4,4,1,3365,3365,Railroad rolling stock manufacturing,United States,24409
5,4,1,3366,3366,Ship and boat building,United States,146282
6,4,1,3369,3369,Other transportation equipment manufacturing,United States,39881


Motor vehicle manufacturing appears at level 4, and its code is 3361.
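The walk works because NAICS codes are hierarchical: each child code extends its parent's code by one digit, and the Manufacturing sector is written as the range 31-33. The following is a minimal offline sketch of that idea, using a handful of rows copied from the outputs above. The helper names `sector_prefixes` and `children` are hypothetical and are not part of censusdis, and the sketch assumes every child code is exactly one digit longer than its parent.

```python
import pandas as pd

# A few (code, label) rows copied from the outputs in this notebook.
naics = pd.DataFrame(
    {
        "NAICS2017": ["31-33", "311", "312", "3361", "3362", "3363", "33611"],
        "NAICS2017_LABEL": [
            "Manufacturing",
            "Food manufacturing",
            "Beverage and tobacco product manufacturing",
            "Motor vehicle manufacturing",
            "Motor vehicle body and trailer manufacturing",
            "Motor vehicle parts manufacturing",
            "Automobile and light duty motor vehicle manufacturing",
        ],
    }
)


def sector_prefixes(code):
    """Expand a range code such as '31-33' into ['31', '32', '33'] (hypothetical helper)."""
    if "-" in code:
        low, high = code.split("-")
        return [str(n) for n in range(int(low), int(high) + 1)]
    return [code]


def children(df, parent):
    """Return rows whose code is one digit longer than the parent and starts with it."""
    prefixes = tuple(sector_prefixes(parent))
    child_length = len(prefixes[0]) + 1
    mask = df["NAICS2017"].map(
        lambda c: len(c) == child_length and c.startswith(prefixes)
    )
    return df[mask]


# One step down from the Manufacturing sector, then one step down from 336.
print(children(naics, "31-33"))
print(children(naics, "336"))
```

This only filters rows that are already in memory. In the cells above, the same narrowing is done on the server side through `query_filter`.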

### Option 2: Search all nodes in the tree

Alternatively, we can download the whole tree in one call
and then search the labels for the text we want.

In [8]:
# Download every NAICS2017 node at the US level (no INDLEVEL filter).
df_us = ced.download(
    CBP,
    2022,
    ["NAICS2017", "NAICS2017_LABEL", "NAME", "EMP"],
    us='*',
)

In [9]:
df_us[df_us['NAICS2017_LABEL'].str.contains('motor vehicle manufacturing', case=False)]

Unnamed: 0,US,NAICS2017,NAICS2017_LABEL,NAME,EMP
807,1,3361,Motor vehicle manufacturing,United States,271838
808,1,33611,Automobile and light duty motor vehicle manufacturing,United States,233453
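The substring search returns two rows because the label for 33611 also contains the phrase "motor vehicle manufacturing". The sketch below, built on a few rows copied from the outputs above, contrasts a substring match with an exact, case-insensitive label match. The variable names are illustrative only.

```python
import pandas as pd

# Rows copied from the outputs above.
labels = pd.DataFrame(
    {
        "NAICS2017": ["3361", "33611", "3362", "3363"],
        "NAICS2017_LABEL": [
            "Motor vehicle manufacturing",
            "Automobile and light duty motor vehicle manufacturing",
            "Motor vehicle body and trailer manufacturing",
            "Motor vehicle parts manufacturing",
        ],
    }
)

phrase = "motor vehicle manufacturing"

# Substring match: treat the phrase as plain text (regex=False) and
# count missing labels as non-matches (na=False).
substring_hits = labels[
    labels["NAICS2017_LABEL"].str.contains(phrase, case=False, regex=False, na=False)
]

# Exact match: the whole label must equal the phrase, ignoring case.
exact_hits = labels[labels["NAICS2017_LABEL"].str.lower() == phrase]

print(substring_hits)
print(exact_hits)
```

The substring form is the more forgiving choice when the exact wording of the label is not known in advance; the exact form is useful once the wording is known and only one node is wanted.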


### Now query code 3361 for all states

In [10]:
# Employment in NAICS2017 code 3361 for every state.
df_states = ced.download(
    CBP,
    2022,
    ["NAICS2017_LABEL", "NAME", "EMP"],
    state="*",
    query_filter={"NAICS2017": "3361"},
)

In [11]:
df_states

Unnamed: 0,NAICS2017,STATE,NAICS2017_LABEL,NAME,EMP
0,3361,1,Motor vehicle manufacturing,Alabama,18355
1,3361,4,Motor vehicle manufacturing,Arizona,2103
2,3361,17,Motor vehicle manufacturing,Illinois,15284
3,3361,55,Motor vehicle manufacturing,Wisconsin,2274
4,3361,5,Motor vehicle manufacturing,Arkansas,23
5,3361,6,Motor vehicle manufacturing,California,26220
6,3361,8,Motor vehicle manufacturing,Colorado,116
7,3361,12,Motor vehicle manufacturing,Florida,1464
8,3361,13,Motor vehicle manufacturing,Georgia,4563
9,3361,16,Motor vehicle manufacturing,Idaho,149


<br><br><br><br>

## b) What are the top 10 states?

In [12]:
df_states.nlargest(10, "EMP")

Unnamed: 0,NAICS2017,STATE,NAICS2017_LABEL,NAME,EMP
14,3361,26,Motor vehicle manufacturing,Michigan,41251
5,3361,6,Motor vehicle manufacturing,California,26220
10,3361,18,Motor vehicle manufacturing,Indiana,23400
13,3361,21,Motor vehicle manufacturing,Kentucky,22926
23,3361,39,Motor vehicle manufacturing,Ohio,22432
0,3361,1,Motor vehicle manufacturing,Alabama,18355
29,3361,48,Motor vehicle manufacturing,Texas,15461
27,3361,45,Motor vehicle manufacturing,South Carolina,15358
2,3361,17,Motor vehicle manufacturing,Illinois,15284
28,3361,47,Motor vehicle manufacturing,Tennessee,11690
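To put the ranking in context, each state's employment can be expressed as a share of the national figure. The sketch below is offline and uses only numbers copied from the outputs above: the ten state rows and the US-level `EMP` value for code 3361. The names `top10`, `US_EMP_3361`, and `SHARE` are illustrative.

```python
import pandas as pd

# The ten rows shown above, copied by hand.
top10 = pd.DataFrame(
    {
        "NAME": [
            "Michigan", "California", "Indiana", "Kentucky", "Ohio",
            "Alabama", "Texas", "South Carolina", "Illinois", "Tennessee",
        ],
        "EMP": [
            41251, 26220, 23400, 22926, 22432,
            18355, 15461, 15358, 15284, 11690,
        ],
    }
)

# National employment for NAICS 3361, from the US-level output above.
US_EMP_3361 = 271838

# Share of national motor vehicle manufacturing employment per state.
top10["SHARE"] = top10["EMP"] / US_EMP_3361

print(top10.to_string(index=False, formatters={"SHARE": "{:.1%}".format}))
print(f"Top 10 combined share: {top10['SHARE'].sum():.1%}")
```

In the notebook itself, the same column could be computed directly on `df_states.nlargest(10, "EMP")` rather than on hand-copied values.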
