# Notebook for creating the figures and data for "US Jobs Streak: Now Second Longest in 86 Years"

This notebook provides the code for replicating the figures and data from the article "[US Jobs Streak: Now Second Longest in 86 Years](https://rickecon.substack.com/p/us-jobs-streak-now-second-longest)" by [Richard W. Evans](https://sites.google.com/site/rickecon) (GitHub: [@rickecon](https://github.com/rickecon), X: [@RickEcon](https://twitter.com/RickEcon), Substack: [Econosseur](https://rickecon.substack.com/)). A GitHub repository for the analyses in this article has been created at https://github.com/OpenSourceEcon/USempl-Streaks-2025-01.

## 0. Preliminaries: Different ways to replicate the analyses

### 0.1. (Easiest, least flexible option) Install the usempl-plots package from PyPI.org

The easiest way to run the analyses in the article is to install the [`usempl-plots`](https://pypi.org/project/usempl-plots/) Python package from PyPI.org. This is the approach we use in this notebook.

One drawback to this approach is that you are limited to the options in the functions and modules in the `usempl-plots` package. But these options are sufficient for the analyses in the article.

The approach described in Section 0.2 is more flexible and is likely the better option for someone who knows how to program in Python and wants to customize the output.

### 0.2. (Hardest, most flexible option) Fork and clone the repository, create the environment, and install the package from your hard drive

This approach allows you to update and customize the functions and modules in the `usempl-plots` package. This is also the best way to contribute fixes and updates back to the `usempl-plots` package repository (https://github.com/OpenSourceEcon/usempl-plots) to improve and expand it (see Section 0.3).

1. Make sure you have Python installed on your computer.
    - Navigate to your terminal and type: `python --version`
    - If you do not have Python, download the free Anaconda distribution of Python from Anaconda.com. Follow the instructions at their download page, https://www.anaconda.com/download. This will require you to give them your email address.
2. Make sure you have Git version control software on your computer.
    - Navigate to your terminal and type: `git version`
    - If nothing comes up, install Git on your computer by following the correct instructions for your operating system at https://git-scm.com/book/en/v2/Getting-Started-Installing-Git.
    - After Git is installed, set up some basic configuration settings such as your name: `git config --global user.name "Your Name"`; your email: `git config --global user.email yourname@example.com`; and an easy editor that you can use from your terminal: `git config --global core.editor vim`.
3. If you don't already have one, sign up for a GitHub account by going to https://github.com/, selecting "Sign Up", and following the instructions.
    - I recommend that you choose a GitHub handle that is relatively short (less than 10 characters) and isn't too wild. This is the handle by which you will be known across all of your open-source interactions. For example, my GitHub handle is [@rickecon](https://github.com/rickecon). Other people use versions of their first and/or last name.
4. Fork the `usempl-plots` GitHub repository, which simply means that you are making a copy of it in the cloud on your personal GitHub account. In your internet browser, go to the URL https://github.com/OpenSourceEcon/usempl-plots and click the "Fork" button in the upper-right area of the screen. In the "Owner*" dropdown on that page, make sure your personal GitHub account is selected. This will make a copy of the repository on your GitHub account in the cloud.
5. Clone your forked repository in the cloud to your local computer. Cloning is simply the Git term for downloading the code in the repository to your local computer in a way that sets it up as a Git repository on your machine, in which the Git software tracks any changes you make.
    - Go to your terminal and navigate to a folder where you want to place a Git repository. Make sure this is not in a Google Drive, Dropbox, or iCloud folder that is copied back and forth from the web.
    - Once you are in that directory on your computer, type the following. Note that you are copying the code from your account's copy (fork) of the repository: `git clone https://github.com/[YourGitHubHandle]/usempl-plots.git`
6. Navigate to the new directory on your local hard drive for this new Git repository: `cd usempl-plots`
7. Create a conda environment for this repository. This is a set of packages and versions that is constant across operating systems and hardware: `conda env create -f environment.yml`
8. Activate the conda environment: `conda activate usempl-plots-dev`
9. Install the `usempl-plots` package from your hard drive: `pip install -e .`

Now you are ready to run the analyses below, and you do not need to install the package from PyPI with `pip install usempl-plots`.

### 0.3. Contributing to the usempl-plots package

If you follow the approach of Section 0.2 of forking the [`usempl-plots` repository](https://github.com/OpenSourceEcon/usempl-plots), you might find errors or inefficiencies in the code. You may also find ways to make the code more useful or to expand its functionality.

I encourage you to ask any questions about the code or make any suggested changes by either submitting an issue to the GitHub repository (https://github.com/OpenSourceEcon/usempl-plots/issues) or submitting a pull request of code changes (https://github.com/OpenSourceEcon/usempl-plots/pulls).

In [1]:
# Import packages
import pandas as pd
import numpy as np
import os
from usempl_plots import usempl_streaks
from usempl_plots.tseries_payems import gen_payems_tseries
from usempl_plots.usempl_npp import usempl_npp
from usempl_plots.usempl_industry import usempl_ind_chg
from bokeh.io import output_notebook

In [2]:
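# Set repo_path to the location of your local copy of the
# USempl-Streaks-2025-01 repository
repo_path = (
    "/Users/richardevans/Docs/Economics/OSE/Substack/USempl-Streaks-2025-01"
)
image_dir = os.path.join(repo_path, "images")  # passed to save_plot below
data_dir = os.path.join(repo_path, "data")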

## 1. Create statistics in paragraphs before Figure 1

## 2. Create Figure 1 (Time series of US total nonfarm employment)

In [5]:
output_notebook()
# fig1_title_str = None
fig1_title_str = (
    "Figure 1. US Total Monthly Nonfarm Payroll Employment (PAYEMS), 1919-2024"
)
end_date_str = "2025-01-17"
fig1, beg_date_str, end_date_str2 = gen_payems_tseries(
    end_date=end_date_str,
    fig_title_str=fig1_title_str,
    save_plot=image_dir
)

Beginning date of U.S. employment series is 1919-07-01
End date of U.S. employment series is 2024-12-01
PAYEMS data downloaded on 2025-01-17 and has most recent PAYEMS data month of 2024-12-01.


## 1. Create Figure 1 (US employment streaks), Figure 2 (US employment streaks scatterplot), and Table 1 (Top 7 streaks)

The following cell generates Figures 1 and 2. The `usempl_streaks()` function also prints the output for Table 1 when the cell is executed.

In [3]:
output_notebook()
fig_lst, beg_date_str, end_date_str = usempl_streaks()

Beginning date of U.S. employment series is 1939-01-01
End date of U.S. employment series is 2024-04-01
PAYEMS data downloaded on 2024-05-03 and has most recent PAYEMS data month of 2024-04-01.
Number of streaks: 83

Print a table of streaks with +40 months or +10,000,000 jobs.

    strk_num start_date end_date  months_in_streak  total_emp_gain  \
80        81    2010-10  2020-02               113      21969000.0   
59        60    1986-07  1990-06                48      10702000.0   
76        77    2003-09  2007-06                46       7919000.0   
53        54    1975-07  1979-03                45      12958000.0   
82        83    2021-01  2024-04                40      15768000.0   
4          5    1940-08  1943-04                33      10705000.0   
81        82    2020-05  2020-11                 7      12340000.0   

    avg_monthly_emp_gain  
80          1.944159e+05  
59          2.229583e+05  
76          1.721522e+05  
53          2.879556e+05  
82          3.942000e+05
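The streak table can be illustrated with a minimal sketch in plain Python. It assumes that a streak is a run of consecutive months with a strictly positive month-over-month change in employment (a flat or negative month ends the streak) and that the table keeps streaks with at least 40 months or at least 10,000,000 jobs, sorted from longest to shortest. This is an illustration under those assumptions, not the internal logic of `usempl_streaks()`, and the names `Streak`, `find_streaks`, and `notable_streaks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Streak:
    """A run of consecutive months with positive employment gains (hypothetical helper)."""

    start: int  # index of the first month with a positive change
    end: int  # index of the last month with a positive change
    months: int  # number of months in the streak
    total_gain: float  # sum of the monthly changes over the streak

    @property
    def avg_monthly_gain(self) -> float:
        return self.total_gain / self.months


def find_streaks(levels):
    """Split a monthly employment series into streaks of positive changes."""
    streaks = []
    start = None
    gain = 0.0
    for i in range(1, len(levels)):
        change = levels[i] - levels[i - 1]
        if change > 0:
            if start is None:  # a new streak begins this month
                start, gain = i, 0.0
            gain += change
        elif start is not None:  # a flat or negative month ends the streak
            streaks.append(Streak(start, i - 1, i - start, gain))
            start = None
    if start is not None:  # a streak still running at the end of the data
        streaks.append(Streak(start, len(levels) - 1, len(levels) - start, gain))
    return streaks


def notable_streaks(streaks, min_months=40, min_gain=10_000_000):
    """Keep streaks that are long or large, longest first (assumed rule)."""
    keep = [s for s in streaks if s.months >= min_months or s.total_gain >= min_gain]
    return sorted(keep, key=lambda s: s.months, reverse=True)


# Toy series for illustration only; these are not PAYEMS data.
toy = [10, 11, 12, 12, 13, 15, 14, 20]
for s in find_streaks(toy):
    print(s, s.avg_monthly_gain)
```

The average column is simply total gain divided by months in the streak. For the first row of the table above, 21,969,000 / 113 is about 194,416 jobs per month, which matches the reported `1.944159e+05`.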

## 2. Create Figure 3 (Time series of PAYEMS US nonfarm employment)

In [4]:
fig3, beg_date_str, end_date_str = gen_payems_tseries()

Beginning date of U.S. employment series is 1919-07-01
End date of U.S. employment series is 2024-04-01
PAYEMS data downloaded on 2024-05-03 and has most recent PAYEMS data month of 2024-04-01.


## 3. Create Figure 4 (Normalized peak plot)

In [5]:
fig4, end_date_str = usempl_npp()

You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  usempl_df["PAYEMS"].iloc[:242] = (
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  usempl_df["PAYEMS"].iloc[:24

End date of U.S. employment series is 2024-04-01
peak_val 0 is 31324.0 on date 1929-07-01 (Beg. rec. month: Aug 1929 )
peak_val 1 is 31011.0 on date 1937-07-01 (Beg. rec. month: May 1937 )
peak_val 2 is 41897.0 on date 1945-02-01 (Beg. rec. month: Feb 1945 )
peak_val 3 is 45294.0 on date 1948-09-01 (Beg. rec. month: Nov 1948 )
peak_val 4 is 50536.0 on date 1953-07-01 (Beg. rec. month: Jul 1953 )
peak_val 5 is 53128.0 on date 1957-08-01 (Beg. rec. month: Aug 1957 )
peak_val 6 is 54813.0 on date 1960-04-01 (Beg. rec. month: Apr 1960 )
peak_val 7 is 71451.0 on date 1970-03-01 (Beg. rec. month: Dec 1969 )
peak_val 8 is 78636.0 on date 1974-07-01 (Beg. rec. month: Nov 1973 )
peak_val 9 is 90994.0 on date 1980-03-01 (Beg. rec. month: Jan 1980 )
peak_val 10 is 91601.0 on date 1981-07-01 (Beg. rec. month: Jul 1981 )
peak_val 11 is 109857.0 on date 1990-06-01 (Beg. rec. month: Jul 1990 )
peak_val 12 is 132786.0 on date 2001-02-01 (Beg. rec. month: Mar 2001 )
peak_val 13 is 138397.0 on date 2008
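The normalization behind a normalized peak plot can be sketched in plain Python. This sketch assumes, as a hypothesis rather than a description of `usempl_npp()`, that each recession's peak is the highest employment level within a fixed window of months around the NBER recession start month (the output above shows peak dates both before and after the "Beg. rec. month"), and that every month from the peak onward is divided by that peak level. The functions `find_peak` and `normalize_from_peak` and their `window` and `horizon` parameters are hypothetical.

```python
def find_peak(levels, rec_start, window=12):
    """Index of the highest level within `window` months of the recession start (hypothetical rule)."""
    lo = max(0, rec_start - window)
    hi = min(len(levels), rec_start + window + 1)
    # max() returns the first index that attains the highest level
    return max(range(lo, hi), key=lambda i: levels[i])


def normalize_from_peak(levels, peak_idx, horizon):
    """Divide each level from the peak through `horizon` months later by the peak level."""
    peak = levels[peak_idx]
    last = min(len(levels), peak_idx + horizon + 1)
    return [levels[i] / peak for i in range(peak_idx, last)]


# Toy monthly employment levels (thousands) for illustration only.
toy = [100.0, 102.0, 105.0, 104.0, 101.0, 99.0, 100.0, 103.0]
peak_idx = find_peak(toy, rec_start=3, window=2)
print(peak_idx, normalize_from_peak(toy, peak_idx, horizon=3))
```

Plotting each recession's normalized list against months since its peak puts all recessions on a common scale, where 1.0 marks the pre-recession employment peak.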

## 4. Calculate average time (in months) between recessions since 1945

From the [recession_data.csv](https://github.com/OpenSourceEcon/usempl-plots/blob/main/usempl_plots/data/recession_data.csv) data, which is taken from the [NBER Business Cycle Dating Committee peak and trough data](https://www.nber.org/research/data/us-business-cycle-expansions-and-contractions), we have the following economic peaks and troughs.

| Peak date  | Trough date | Months to next recession |
| :---: | :---:  | :---: |
| 1945-02 | 1945-10 |  37 |
| 1948-11 | 1949-10 |  45 |
| 1953-07 | 1954-05 |  39 |
| 1957-08 | 1958-04 |  24 |
| 1960-04 | 1961-02 | 106 |
| 1969-12 | 1970-11 |  36 |
| 1973-11 | 1975-03 |  58 |
| 1980-01 | 1980-07 |  12 |
| 1981-07 | 1982-11 |  92 |
| 1990-07 | 1991-03 | 120 |
| 2001-03 | 2001-11 |  73 |
| 2007-12 | 2009-06 | 128 |
| 2020-02 | 2020-04 |  48 |


We calculate the months between recessions as the number of months between the current recession trough (end of current recession) and next recession peak (beginning of next recession). For example, the number of months between the 1945-02 to 1945-10 recession and the 1948-11 to 1949-10 recession is 37 months (1948-11 peak minus 1945-10 trough).

The months to next recession in the last row of the table is the most recent month of employment data (2024-04) minus the trough of the last recession (2020-04).

In [6]:
mths_btw_lst = [37, 45, 39, 24, 106, 36, 58, 12, 92, 120, 73, 128, 48]
avg_months = sum(mths_btw_lst) / len(mths_btw_lst)
print("Average months between recessions: {:.1f}".format(avg_months))

Average months between recessions: 62.9
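The trough-to-next-peak rule described above can be checked by computing each gap directly from the peak and trough dates in the table rather than typing the month counts by hand. The following is a minimal sketch in plain Python, assuming dates are written as `YYYY-MM` strings; the functions `months_between` and `gaps_to_next_peak` are hypothetical and not part of `usempl-plots`.

```python
def months_between(start: str, end: str) -> int:
    """Whole months from start to end, with both dates given as 'YYYY-MM'."""
    y0, m0 = map(int, start.split("-"))
    y1, m1 = map(int, end.split("-"))
    return (y1 - y0) * 12 + (m1 - m0)


# (peak, trough) pairs from the NBER-based table above
cycles = [
    ("1945-02", "1945-10"), ("1948-11", "1949-10"), ("1953-07", "1954-05"),
    ("1957-08", "1958-04"), ("1960-04", "1961-02"), ("1969-12", "1970-11"),
    ("1973-11", "1975-03"), ("1980-01", "1980-07"), ("1981-07", "1982-11"),
    ("1990-07", "1991-03"), ("2001-03", "2001-11"), ("2007-12", "2009-06"),
    ("2020-02", "2020-04"),
]
latest_data_month = "2024-04"  # most recent PAYEMS month in this section


def gaps_to_next_peak(cycles, last_month):
    """Months from each trough to the next peak; the last gap runs to last_month."""
    gaps = [
        months_between(trough, next_peak)
        for (_, trough), (next_peak, _) in zip(cycles, cycles[1:])
    ]
    gaps.append(months_between(cycles[-1][1], last_month))
    return gaps


gaps = gaps_to_next_peak(cycles, latest_data_month)
print(gaps)
print("Average months between recessions: {:.1f}".format(sum(gaps) / len(gaps)))
```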


## 5. Create Table 2 (job growth by industry)
The output from the following command produces both the changes in the number of jobs and the percent changes in jobs by industry from September 2003 to April 2024.

In [7]:
usempl_ind_chg()

Total jobs created from Sep. 2003 to Apr. 2024:  28034000

Percent change in jobs from Sep. 2003 to Apr. 2024: 0    21.522894
Name: pctchg_Sep03_Apr24, dtype: float64

                                     Industry      Sep03      Apr24  \
0                               Total nonfarm  130252000  158286000   
1                               Total private  108748000  135015000   
2                             Goods Producing   21700000   21821000   
3                          Mining and Logging     570000     641000   
4                                Construction    6783000    8219000   
5                               Manufacturing   14347000   12961000   
6                   Private Service Providing   87048000  113194000   
7                             Wholesale Trade    5537400    6169700   
8                                Retail Trade   14911800   15677900   
9              Transportation and Warehousing    4162100    6575800   
10                                  Utilities     5


|   | Industry | Sep03 | Apr24 | Total change | Pct change | diff_Sep03_Apr24 | pctchg_Sep03_Apr24 |
| ---: | :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | Total nonfarm | 130252000 | 158286000 | 28034000 | 0.215229 | 28034000 | 21.522894 |
| 1 | Total private | 108748000 | 135015000 | 26267000 | 0.24154 | 26267000 | 24.154007 |
| 2 | Goods Producing | 21700000 | 21821000 | 121000 | 0.005576 | 121000 | 0.557604 |
| 3 | Mining and Logging | 570000 | 641000 | 71000 | 0.124561 | 71000 | 12.45614 |
| 4 | Construction | 6783000 | 8219000 | 1436000 | 0.211706 | 1436000 | 21.170573 |
| 5 | Manufacturing | 14347000 | 12961000 | -1386000 | -0.096606 | -1386000 | -9.660556 |
| 6 | Private Service Providing | 87048000 | 113194000 | 26146000 | 0.300363 | 26146000 | 30.036302 |
| 7 | Wholesale Trade | 5537400 | 6169700 | 632300 | 0.114187 | 632300 | 11.418716 |
| 8 | Retail Trade | 14911800 | 15677900 | 766100 | 0.051375 | 766100 | 5.137542 |
| 9 | Transportation and Warehousing | 4162100 | 6575800 | 2413700 | 0.579924 | 2413700 | 57.99236 |
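The two change columns in the table follow from the September 2003 and April 2024 levels: the level change is `Apr24 - Sep03`, and the percent change is that difference divided by the September 2003 level, times 100. The following is a minimal sketch in plain Python using a few rows copied from the table. The helper `change_and_pct` is hypothetical and is not the implementation inside `usempl_ind_chg()`.

```python
def change_and_pct(base: int, current: int) -> tuple[int, float]:
    """Level change and percent change from base to current (hypothetical helper)."""
    diff = current - base
    return diff, 100.0 * diff / base


# (Sep03, Apr24) employment levels copied from the table above
industries = {
    "Total nonfarm": (130_252_000, 158_286_000),
    "Manufacturing": (14_347_000, 12_961_000),
    "Transportation and Warehousing": (4_162_100, 6_575_800),
}

for name, (sep03, apr24) in industries.items():
    diff, pct = change_and_pct(sep03, apr24)
    print(f"{name}: {diff:+,} jobs ({pct:+.2f}%)")
```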
