# TASK 3: LAUS data


Steps 1-9 were done manually, producing 4 data files stored in the `unemployment_data/` folder. This notebook reads those files and performs steps 10 and 11.
 
Navigate to [https://www.bls.gov/lau/data.htm](https://www.bls.gov/lau/data.htm).
Under Databases, for Local Area Unemployment Statistics, select “Multiscreen” (the yellow icon).
 
1. (Screen 1 of 6) For each of the following groups, select the states (we're covering every entry except the last 2 -- Census Regions & Puerto Rico):
    - states 1-20:
    - states 21-35: Kentucky -- New Mexico
    - states 36-50: New York -- Vermont
    - states 51-56: Virginia -- Wyoming

2. (Screen 2 of 6) Select “F Counties and Equivalents”; click “Next form”.

3. (Screen 3 of 6) Select all counties; click “Next form”.

4. (Screen 4 of 6) Select “03 Unemployment rate”; click “Next form”.

5. (Screen 5 of 6) Check the box marked “Not Seasonally adjusted”; click “Next form”.

6. (Screen 6 of 6) Select all series IDs available and click “Retrieve data”.
 
 
7. On the screen labeled “Databases, Tables, & Calculators by Subject”, select the hyperlinked “More formatting options” in the upper right of the screen:

    - Specify the year range 2003-2020.
    - Under “Select view of the data”, change to “Multi-series table” and leave “Original data value” checked.
    - Click “Retrieve data”.
 
8. Download as an .xlsx file

9. Repeat steps 1-8 until all the states in step 1 are covered.
10. At that point all state-county-month-years from 2003-2020 are accounted for. Aggregate the files into one dataset.

11. The series ID contains the FIPS code: it is the first 5 digits after the letters “LAUCN”. Extracting it is optional.
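
As an illustration of step 11, here is a minimal sketch of pulling the FIPS code out of a single series ID. The helper name `fips_from_series_id` is hypothetical (it is not part of the notebook), and the check on the prefix is an added assumption about what a county-level ID should look like.

```python
def fips_from_series_id(series_id: str) -> str:
    """Return the 5-digit county FIPS code that follows "LAUCN" in a series ID.

    Hypothetical helper for illustration; the notebook itself does this
    with a pandas string slice.
    """
    prefix = "LAUCN"
    if not series_id.startswith(prefix):
        raise ValueError(f"not a county-level LAUS series ID: {series_id!r}")
    # The 5 characters right after the prefix: 2-digit state + 3-digit county
    fips = series_id[len(prefix):len(prefix) + 5]
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError(f"no 5-digit FIPS code found in {series_id!r}")
    return fips


example = "LAUCN210010000000003"  # first series ID in the Kentucky -- New Mexico file
print(fips_from_series_id(example))  # 21001
```

The FIPS code is kept as a string rather than converted to an integer so that leading zeros in state codes below 10 are preserved.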

## Steps 10 & 11: Aggregate the files into one file, add FIPS column

In [30]:
import pandas as pd
import glob
import os

path = './unemployment_data/' # use your path
all_files = glob.glob(os.path.join(path , "*.xlsx"))

print(all_files)

li = []

for filename in all_files:
    # The column headers sit on the fourth row of each BLS export
    df = pd.read_excel(filename, header=3)
    # Characters 5-9 of the series ID (right after "LAUCN") are the county FIPS code
    df['FIPS'] = df['Series ID'].str[5:10]
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

frame

['./unemployment_data/SeriesReport-21-35.xlsx', './unemployment_data/SeriesReport-1-20.xlsx', './unemployment_data/SeriesReport-51-56.xlsx', './unemployment_data/SeriesReport-states-36-50.xlsx']


  warn("Workbook contains no default style, apply openpyxl's default")
  warn("Workbook contains no default style, apply openpyxl's default")
  warn("Workbook contains no default style, apply openpyxl's default")
  warn("Workbook contains no default style, apply openpyxl's default")


Unnamed: 0,Series ID,Jan\n2003,Feb\n2003,Mar\n2003,Apr\n2003,May\n2003,Jun\n2003,Jul\n2003,Aug\n2003,Sep\n2003,...,Apr\n2020,May\n2020,Jun\n2020,Jul\n2020,Aug\n2020,Sep\n2020,Oct\n2020,Nov\n2020,Dec\n2020,FIPS
0,LAUCN210010000000003,6.2,6.5,6.1,5.3,5.3,6.2,7.5,5.1,4.7,...,16.5,12.9,6.0,6.2,5.0,4.9,4.4,4.2,4.7,21001
1,LAUCN210030000000003,7.2,7.1,6.8,5.9,6.9,7.5,6.6,6.9,6.8,...,13.7,10.7,4.8,5.2,3.9,3.7,3.5,3.1,3.7,21003
2,LAUCN210050000000003,5.0,5.2,5.1,4.6,4.5,4.9,5.7,4.3,4.1,...,15.0,11.8,5.4,5.6,4.4,4.5,3.9,3.6,3.9,21005
3,LAUCN210070000000003,7.9,8.4,8.1,7.8,9.2,8.6,7.3,7.1,7.8,...,15.1,11.2,5.1,5.7,4.5,4.3,4.0,3.9,4.6,21007
4,LAUCN210090000000003,5.6,5.7,5.4,5.4,5.0,5.7,6.6,5.2,4.4,...,18.6,14.5,6.3,6.7,5.4,5.4,5.3,4.9,5.6,21009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3141,LAUCN500190000000003,8.8,8.8,8.2,8.1,5.4,5.6,5.3,5.1,4.8,...,19.4,13.9,9.3,8.3,6.2,5.7,3.9,4.5,5.5,50019
3142,LAUCN500210000000003,4.8,4.6,4.9,5.1,5.2,5.5,5.0,4.8,4.8,...,17.6,12.7,9.8,8.2,6.1,5.9,4.1,4.5,5.1,50021
3143,LAUCN500230000000003,6.0,5.6,5.3,4.9,4.0,4.6,4.3,3.8,4.0,...,12.4,9.1,7.0,6.1,4.3,4.0,2.7,3.1,3.7,50023
3144,LAUCN500250000000003,3.8,3.5,3.7,4.4,4.2,4.6,4.1,3.9,4.1,...,15.4,12.8,10.3,8.9,6.7,6.4,4.4,5.0,5.5,50025


### Let's also clean up the column titles

This removes the "\n" characters from the column names and moves the new FIPS column to the front.

In [35]:
# import pip
# pip.main(["install", "openpyxl"])

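# Column names such as "Jan\n2003" contain a line break; replace it with a space
frame.columns = frame.columns.str.replace("\n", " ", regex=False)

# Move the FIPS column (currently last) to the front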
cols = frame.columns.tolist()
cols = cols[-1:] + cols[:-1]
frame = frame[cols]

frame

Unnamed: 0,FIPS,Series ID,Jan 2003,Feb 2003,Mar 2003,Apr 2003,May 2003,Jun 2003,Jul 2003,Aug 2003,...,Mar 2020,Apr 2020,May 2020,Jun 2020,Jul 2020,Aug 2020,Sep 2020,Oct 2020,Nov 2020,Dec 2020
0,21001,LAUCN210010000000003,6.2,6.5,6.1,5.3,5.3,6.2,7.5,5.1,...,6.7,16.5,12.9,6.0,6.2,5.0,4.9,4.4,4.2,4.7
1,21003,LAUCN210030000000003,7.2,7.1,6.8,5.9,6.9,7.5,6.6,6.9,...,4.9,13.7,10.7,4.8,5.2,3.9,3.7,3.5,3.1,3.7
2,21005,LAUCN210050000000003,5.0,5.2,5.1,4.6,4.5,4.9,5.7,4.3,...,4.9,15.0,11.8,5.4,5.6,4.4,4.5,3.9,3.6,3.9
3,21007,LAUCN210070000000003,7.9,8.4,8.1,7.8,9.2,8.6,7.3,7.1,...,6.5,15.1,11.2,5.1,5.7,4.5,4.3,4.0,3.9,4.6
4,21009,LAUCN210090000000003,5.6,5.7,5.4,5.4,5.0,5.7,6.6,5.2,...,5.5,18.6,14.5,6.3,6.7,5.4,5.4,5.3,4.9,5.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3141,50019,LAUCN500190000000003,8.8,8.8,8.2,8.1,5.4,5.6,5.3,5.1,...,5.2,19.4,13.9,9.3,8.3,6.2,5.7,3.9,4.5,5.5
3142,50021,LAUCN500210000000003,4.8,4.6,4.9,5.1,5.2,5.5,5.0,4.8,...,2.6,17.6,12.7,9.8,8.2,6.1,5.9,4.1,4.5,5.1
3143,50023,LAUCN500230000000003,6.0,5.6,5.3,4.9,4.0,4.6,4.3,3.8,...,2.4,12.4,9.1,7.0,6.1,4.3,4.0,2.7,3.1,3.7
3144,50025,LAUCN500250000000003,3.8,3.5,3.7,4.4,4.2,4.6,4.1,3.9,...,2.2,15.4,12.8,10.3,8.9,6.7,6.4,4.4,5.0,5.5
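
If one row per county-month is more convenient than the wide layout above, the table can be reshaped with `melt`. The following is only a sketch on a toy frame whose values are copied from the first two rows shown above; the names `wide`, `tidy`, `month`, and `unemployment_rate` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for `frame`: two counties, two months, values from the output above
wide = pd.DataFrame({
    "FIPS": ["21001", "21003"],
    "Series ID": ["LAUCN210010000000003", "LAUCN210030000000003"],
    "Jan 2003": [6.2, 7.2],
    "Feb 2003": [6.5, 7.1],
})

# One row per (county, month) instead of one column per month
tidy = wide.melt(
    id_vars=["FIPS", "Series ID"],
    var_name="month",
    value_name="unemployment_rate",
)

# "Jan 2003" -> Timestamp for the first day of that month
tidy["month"] = pd.to_datetime(tidy["month"], format="%b %Y")
tidy = tidy.sort_values(["FIPS", "month"]).reset_index(drop=True)

print(tidy)
```

Parsing with `%b %Y` assumes the column names have already had their "\n" replaced by a space, as in the cell above.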


In [32]:
len(frame.FIPS.unique())
# there are 3146 fips codes represented in the data; Puerto Rico has 78 counties.

3146
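
A few extra sanity checks can help when interpreting a count like this. The sketch below runs on a small made-up frame (`toy`), not on the real data; the variable names are illustrative. One point worth knowing is that `unique()` counts a missing value as its own entry, while `nunique()` ignores it by default.

```python
import pandas as pd

# Made-up frame with one duplicated code and one missing value
toy = pd.DataFrame({"FIPS": ["21001", "21003", "50019", "50019", None]})
fips = toy["FIPS"]

# unique() includes the missing value; nunique() drops it by default
print(len(fips.unique()), fips.nunique())

# Rows whose FIPS is missing or not exactly 5 digits
malformed = toy[~fips.fillna("").str.fullmatch(r"\d{5}")]

# Rows whose FIPS code appears more than once
duplicated = toy[fips.duplicated(keep=False) & fips.notna()]

# Number of rows per 2-digit state prefix
per_state = fips.dropna().str[:2].value_counts().sort_index()

print(malformed)
print(duplicated)
print(per_state)
```

Running the same checks against `frame` would show whether every row carries a well-formed code and how many counties each state contributes.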

## Export the file 

In [37]:
frame.to_csv("clean/laus_data_2003-2020.csv", index=False)
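
One thing to watch when reading the exported file back: by default pandas infers an integer type for the FIPS column, which drops leading zeros. The sketch below demonstrates this with an in-memory CSV and a made-up two-row frame (the rate values are invented for illustration); passing `dtype={"FIPS": str}` keeps the codes intact.

```python
import io

import pandas as pd

# Made-up frame; "01001" starts with a zero, as FIPS codes for low-numbered states do
toy = pd.DataFrame({"FIPS": ["01001", "21001"], "Jan 2003": [5.1, 6.2]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

buffer.seek(0)
naive = pd.read_csv(buffer)                      # FIPS inferred as integers
buffer.seek(0)
safe = pd.read_csv(buffer, dtype={"FIPS": str})  # FIPS kept as text

print(naive["FIPS"].tolist())  # [1001, 21001]
print(safe["FIPS"].tolist())   # ['01001', '21001']
```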

