In [1]:
# import libraries
import pandas as pd
import numpy as np
# import seaborn as sns
# import matplotlib.pyplot as plt

## Project: <a style="color:purple;">**Macroeconomic Analysis of the US Economy**</a> - part 2

#### Author: @**engine**, Date: November 2024

### Abstract:
TODO

### 1. **Defining the problem and main assumptions**

In the **first part** of this study, we looked at how funds from the US budget are distributed and did not see good prospects for the future. The money is invested not in the development of the USA but in wars, their consequences, and the interest on loans already taken.
The share of interest payments on borrowed funds is also very large. We also showed an almost linear dependence of economic expansion on inflation (printing money) and borrowing. Because the dollar is a world currency, borrowing in effect means printing money again. From all of this, we can summarize that the American economy is dependent on printing new money. The trade balance is strongly negative, with a worsening trend. Science is not a priority. If for some reason the printing of new money stops and/or the world's dollars go "home" to the USA, we can expect a "Great Depression 2".

From here, we can ask ourselves the following question: <a style="color:purple;">**Can we use ML to predict approximately when the US economy will go into recession?**</a><br>The **second part** of the present study attempts to answer this question.

There are **2 main theoretical assumptions** at play:<br>
1. U.S. recessions exhibit markers / early warning signs.<br>
There are plenty of recession “signals” in the form of individual economic or market data series. Individually, these signals have limited information value, but they may become more useful when combined.
2. Future recessions will be similar to historical recessions.<br>
This assumption is much shakier, but the risk can be mitigated by choosing features that remain significant despite the changing economic landscape. For example, manufacturing data may have been relevant historically, but may be less relevant going forward as the world goes digital.

**What Have Others Tried?**

 - **Guggenheim Partners** has 2 recession-related indicators: a Recession Probability Model and a Recession Dashboard, both driven by a combination of economic and market indicators. They try to predict recession probabilities across 3 different time frames.
 - **New York FED** predicts recession probabilities. Its limitations are that it only provides a 12-month forecast, and it only relies on 1 variable (the spread between the 10-year and 3-month Treasury rates).
 - **Rabobank** has a recession probability model, but it is also based on only 1 variable (the spread between the 10-year and 1-year Treasury rates), and covers only one time period (17 months).
 - **Wells Fargo Economics** has a few recession probability models that use a combination of economic and market data. However, they limit their forecast to a 6-month horizon. [1]

**Model Benchmarking / Comparison**

Ideally, I would compare my model performance to each of the alternatives above. Currently, I cannot do this for the following reasons:

 - **Guggenheim model**: Model performance data is not publicly released.
 - **New York FED** model: Upon closer inspection, their model is built to answer the question “what is the probability that the U.S. will be in a recession X months from now?”, whereas my model is built to answer the question “what is the probability that the U.S. will be in a recession within the next X months?”.
 - **Rabobank model**: The same reason as for the New York FED model. Additionally, the Rabobank model covers a 17-month time period, whereas my model covers 6-month and 12-month time periods.
 - **Wells Fargo Economics** model: Model performance data is not publicly released.
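The difference between the two questions ("in a recession X months from now" versus "in a recession within the next X months") comes down to how the target label is built. The following is a minimal sketch, assuming a monthly 0/1 recession indicator as input. The function name `recession_labels` and the column names are hypothetical and are not taken from any of the models above.

```python
import pandas as pd


def recession_labels(in_recession: pd.Series, horizon: int) -> pd.DataFrame:
    """Build two forward-looking labels from a monthly 0/1 recession indicator."""
    # "At" label: is the economy in recession exactly `horizon` months from now?
    at_horizon = in_recession.shift(-horizon)

    # "Within" label: is there at least one recession month in the next
    # `horizon` months? Reversing the series turns a backward-looking rolling
    # window into a forward-looking one; the final shift(-1) excludes the
    # current month from the window.
    within_horizon = (
        in_recession[::-1]
        .rolling(window=horizon, min_periods=horizon)
        .max()[::-1]
        .shift(-1)
    )
    return pd.DataFrame(
        {"at_horizon": at_horizon, "within_horizon": within_horizon}
    )


# Illustrative indicator: a two-month recession in months 4 and 5.
indicator = pd.Series([0, 0, 0, 0, 1, 1, 0, 0])
print(recession_labels(indicator, horizon=3))
```

Months near the end of the sample have no complete future window, so both labels are left as NaN there rather than guessed.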

### 2. **Preparing the Data**

#### 2.1 Criteria according to which the data for the present study were collected.

Some things I had to consider when getting the data:

 - **Economic data are released at different frequencies** (weekly, monthly, quarterly, etc.). To time-match data points, I settled on using only data that could be sampled monthly. As a result, all predictions are made at a monthly frequency. This means that GDP data, which is released quarterly, has to be approximated at a monthly frequency.
 - **Varying data history lengths**. Some data has been released since 1919, while other data only goes back a few years. This means I had to exclude potentially useful data that just didn’t have enough history.
 - **Speaking of history**, I needed enough data to encompass as many recessions as I could. In the end, the full data set included 8 recessions since 1966.
 - **Economic data gets revised often**. FRED (Federal Reserve Economic Data) does not provide the original figures. It only provides the revised figures (no matter how far after-the-fact those revisions are made).
 - **Rare-event prediction**: Recessions are rare, so recession months make up only a small share of the observations.
 - **Small data set**: Because I am using economic data (which is updated at a frequency of months or quarters), I will only have a few hundred data points to work with.
 - For practical reasons, I used mostly **public domain data** available through **[FRED](https://fred.stlouisfed.org/)** and **[Yahoo Finance](https://finance.yahoo.com/)**. I did not use potentially useful data that is stuck behind a paywall, such as [The Conference Board Leading Economic Index](https://www.conference-board.org/topics/us-leading-indicators).
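One way to approximate quarterly GDP at a monthly frequency, as mentioned in the list above, is linear interpolation between quarterly observations. This is only a sketch under that assumption: the study does not state which approximation it uses, and the helper name `quarterly_to_monthly` is hypothetical. The sample values are the first three quarters of GDPC1, in billions of dollars.

```python
import pandas as pd


def quarterly_to_monthly(quarterly: pd.Series) -> pd.Series:
    """Upsample a quarterly series to month-start frequency.

    Expects a DatetimeIndex of quarter-start dates. The two missing months
    inside each quarter are filled by linear interpolation.
    """
    # Insert the missing months as NaN, then interpolate between known points.
    monthly = quarterly.resample("MS").asfreq()
    return monthly.interpolate(method="linear")


quarterly_gdp = pd.Series(
    [3758.147, 3792.149, 3838.776],
    index=pd.to_datetime(["1962-01-01", "1962-04-01", "1962-07-01"]),
)
print(quarterly_to_monthly(quarterly_gdp))
```

Note that interpolation uses the next quarter's value to fill the months before it, which would not be known in real time. Forward-filling the last released value (`monthly.ffill()`) is a simple alternative that avoids this look-ahead.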

#### 2.2 Feature Selection.

Some project-specific considerations for feature selection: <br> 
- **Curse of Dimensionality**. Since the data set is so small (only a few hundred data points), one cannot include too many features in the final model. Otherwise, the model will fail to generalize to out-of-sample data. Therefore, features must be carefully chosen for the incremental value that each feature provides.
- **Domain knowledge is key**. Since the underlying process is a complex time series, automated feature selection methods have a high risk of over-fitting to prior data. Therefore, feature selection must be guided by a solid understanding of economic fundamentals.

**A general outline of the feature-selection process**:

 - Define the data set on which to perform exploratory data analysis (Jan 1966 to Apr 1968, May 1968 to Sep 1977, Oct 1977 to Mar 1972) to ensure no intersection with cross-validation periods.
 - Organize potential features into buckets, based on economic / theoretical characteristics (domain knowledge).
 - Plot pairwise correlations between each individual feature and each output type (6-month ahead, 12-month ahead, 24-month ahead) using the exploratory data set only (no peeking ahead!).
 - Move sequentially from feature bucket to feature bucket, such that each bucket has at least one feature in the final data set. For tie-breakers, pick features that have low correlations to features that have already been “accepted” into the final data set.
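The bucket-by-bucket procedure outlined above can be sketched as a small greedy loop. This is an illustration under stated assumptions, not the procedure actually used in the study (which relied on plots and domain knowledge): the function name `select_features`, the tie tolerance `tie_tol`, and the rule of picking exactly one feature per bucket are all hypothetical. The inputs should be restricted to the exploratory data set.

```python
import pandas as pd


def select_features(
    features: pd.DataFrame,
    target: pd.Series,
    buckets: dict,
    tie_tol: float = 0.05,
) -> list:
    """Pick one feature per bucket, preferring low redundancy on near-ties."""
    accepted = []
    for candidates in buckets.values():
        # Absolute correlation of each candidate with the target.
        relevance = features[candidates].corrwith(target).abs()
        # Candidates within `tie_tol` of the best one are treated as tied.
        tied = relevance[relevance >= relevance.max() - tie_tol].index.tolist()
        if accepted and len(tied) > 1:
            # Tie-breaker: lowest maximum absolute correlation with the
            # features that have already been accepted.
            redundancy = {
                name: features[accepted].corrwith(features[name]).abs().max()
                for name in tied
            }
            choice = min(redundancy, key=redundancy.get)
        else:
            choice = relevance.idxmax()
        accepted.append(choice)
    return accepted
```

A call would look like `select_features(X_explore, y_explore, {"bond": ["a", "b"], "labor": ["c"]})`, where the data frames and bucket names are placeholders.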

Here is how it went:

First, a sneak peek at the final feature list. Note that only 8 features made it to the final list:

<table>
  <caption>
    Feature table
  </caption>
  <thead style="background-color: purple; text-align: left">
    <tr>
      <th scope="col" style="color: white">Type</th>
      <th scope="col" style="color: white">Base indicator</th>
      <th scope="col" style="color: white">Index</th>     
      <th scope="col" style="color: white">Modification</th>
    </tr>
  </thead>
  <tbody style="text-align: left">
    <tr>
      <th scope="row">Market</th>
      <td>Personal Consumption Expenditures</td>
      <td>PCEPI</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Market</th>
      <td>Industrial Production Index</td>
      <td>INDPRO</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Stock Market</th>
      <td>S&P 500 Index</td>
      <td>SPY500</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Bond Market</th>
      <td>10 year Treasury Bond</td>
      <td>TR10</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Bond Market</th>
      <td>Slope of the yield curve</td>
      <td>T10YFF</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Economic activity</th>
      <td>Real Gross Domestic Product</td>
      <td>GDPC1</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Unemployment</th>
      <td>Monthly Unemployment Rate</td>
      <td>UNRATE</td>
      <td>12 month change</td>
    </tr>
    <tr>
      <th scope="row">Economic activity</th>
      <td>Composite Leading Indicator</td>
      <td>CLI</td>
      <td>12 month change</td>
    </tr>
  </tbody></table>
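The "12 month change" modification listed in the table can be computed from a monthly series as sketched below. The table does not say whether the change is relative or absolute, so both variants are shown. The function name `twelve_month_change` and the `relative` flag are hypothetical.

```python
import pandas as pd


def twelve_month_change(series: pd.Series, relative: bool = True) -> pd.Series:
    """12-month change of a monthly series.

    relative=True  -> fractional change versus the value 12 months earlier
                      (a natural choice for levels such as an index)
    relative=False -> simple difference versus the value 12 months earlier
                      (a natural choice for rates such as unemployment)
    """
    previous_year = series.shift(12)
    if relative:
        return series / previous_year - 1.0
    return series - previous_year
```

The first 12 observations have no value from a year earlier, so they come out as NaN and would need to be dropped before modelling.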

1. <a style="color:purple;">**Real Gross Domestic Product**:</a><br>

The inclusion of Real Gross Domestic Product (GDP) in the ML model for predicting recessions in the US economy can be justified in several ways:<br>
**Economic Indicator:** GDP is one of the most important indicators of the health of the economy and is widely used to assess economic development. Changes in the value of GDP can be indicative of changes in economic activity and warn of an impending recession.<br>
**Correlation with other factors:** GDP is correlated with many other economic indicators such as unemployment, inflation, investment and consumption. Including GDP in the model can help capture these complex interactions and improve the predictive power of the model.<br>
**Long-Term Trends:** Research shows that changes in the value of GDP can be harbingers of future economic recessions. Including this key indicator in the model can help detect long-term trends and cycles in the economy.<br>
**Financial Market:** Investors often use GDP data to make investment decisions. Recession forecasts from an ML model that takes GDP into account can be useful for investors in assessing risk and profit opportunities.

2. <a style="color:purple;">**Industrial Production Index**:</a><br>

The Industrial Production Index measures the output of the industrial sector, which is a key driver of economic growth. Changes in industrial production can signal shifts in overall economic activity and provide early warnings of potential recessions.

3. <a style="color:purple;">**Monthly Unemployment Rate**:</a><br>

The monthly unemployment rate is a key indicator of labor market conditions and can provide insights into the overall health of the economy. Rising unemployment rates can signal economic weakness and potential recessions.

4. <a style="color:purple;">**S&P 500 Index**:</a><br>

The S&P 500 Index is a widely followed benchmark for the performance of the US stock market. Changes in the S&P 500 Index can reflect investor sentiment and expectations for future economic growth.

5. <a style="color:purple;">**US 10-year Treasury Bond**:</a><br>

The US 10-year Treasury bond is often seen as a safe-haven asset, and its yield can provide insights into investor expectations for future economic conditions. Changes in the yield can signal shifts in market sentiment and potential economic risks.

6. <a style="color:purple;">**Slope of the yield curve**:</a><br>

The slope of the yield curve, particularly the difference between short-term and long-term interest rates, can provide insights into market expectations for future economic conditions. Inverted yield curves, where short-term rates are higher than long-term rates, have historically been a reliable predictor of recessions.
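As a minimal sketch of the idea, the slope can be computed as a long-term rate minus a short-term rate, with a negative value marking an inversion. The function and column names here are hypothetical, and which pair of rates to use is an assumption to be checked against the chosen data series.

```python
import pandas as pd


def yield_curve_slope(long_rate: pd.Series, short_rate: pd.Series) -> pd.DataFrame:
    """Slope of the yield curve and an inversion flag, in the units of the inputs."""
    # Positive slope: long-term rates are above short-term rates (normal curve).
    slope = long_rate - short_rate
    # Inverted curve: short-term rates are higher than long-term rates.
    return pd.DataFrame({"slope": slope, "inverted": slope < 0})


# Illustrative rates in percent (made-up numbers, not real data).
long_rate = pd.Series([4.0, 3.5, 3.0])
short_rate = pd.Series([2.0, 3.5, 4.5])
print(yield_curve_slope(long_rate, short_rate))
```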

7. <a style="color:purple;">**Personal Consumption Expenditures**:</a><br>

Personal consumption expenditures are a key component of GDP and can provide insights into consumer spending patterns and overall economic activity. Changes in personal consumption expenditures can signal shifts in consumer confidence and potential risks for the economy.

8. <a style="color:purple;">**Composite Leading Indicator**:</a><br>

The Composite Leading Indicator (CLI) combines several components, which may include new orders, building permits, stock prices, consumer confidence, and other indicators that are sensitive to economic changes. Composite leading indicators are designed to provide early warnings of potential turning points in the business cycle. By combining multiple economic variables into one measure, they can help identify potential risks for the economy.

In principle, more features could be included in the model, but they lack continuous data going back to January 1962, so including them would hurt the accuracy of the model.

#### 2.3 Data collection.

1. **Real Gross Domestic Product**:

From the **[FEDERAL RESERVE BANK of ST. LOUIS](https://fred.stlouisfed.org/series/GDPC1)** site, we download the data for the period of our study: **Real Gross Domestic Product (GDPC1)**. [2]
Units: <a style="color:purple;">**Billions of Dollars**</a>, Seasonally Adjusted Annual Rate.
Frequency: Quarterly.


In [21]:
gdp = pd.read_csv('data/GDPC1.csv') # Read the CSV file
gdp.columns = ["date", "gdp"]
gdp['gdp'] = gdp['gdp'] * 1_000_000_000 # convert to dollars
gdp.head() # Show the first five rows.
# __________ DataFrame (DF) with real data! __________

Unnamed: 0,date,gdp
0,1962-01-01,3758147000000.0
1,1962-04-01,3792149000000.0
2,1962-07-01,3838776000000.0
3,1962-10-01,3851421000000.0
4,1963-01-01,3893482000000.0


### 5. **Conclusions**

1. **TODO**

### Resources:
1. **Recession Prediction using Machine Learning** https://towardsdatascience.com/recession-prediction-using-machine-learning-de6eee16ca94
2. **Real Gross Domestic Product** https://fred.stlouisfed.org/series/GDPC1
3. **...**
4. **...**
5. **...**
6. **...**
7. **...**
8. **...**
9. **...**
10. **...**
11. **...**
12. **...**
13. **...**
14. **...**