# Data Collection for Economic Downturn Detection

This notebook collects the economic data needed for the recession prediction model. It pulls series from three sources and combines them into a single dataset for analysis.

## Data Sources

1. **Federal Reserve Economic Data (FRED)**: Core economic indicators like GDP, unemployment, inflation, consumer sentiment
2. **National Bureau of Economic Research (NBER)**: Official recession dates and periods
3. **University of Michigan**: Consumer sentiment index and inflation expectations, retrieved through FRED

## Data Coverage

**Data Cutoff Date**: May 2024

We collect data from January 1970 through May 2024, covering 8 recession periods and multiple economic cycles. This gives us enough historical data for model training while including recent economic conditions.
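
As a quick sanity check on the stated coverage, the number of calendar months in the collection window can be computed directly. This is a minimal sketch; the variable names are illustrative and not part of the project code.

```python
import pandas as pd

# Monthly periods from the first to the last month of the collection window
months = pd.period_range('1970-01', '2024-05', freq='M')

# 54 full years * 12 months + 5 months of 2024
n_months = len(months)
print(f"Months in collection window: {n_months}")
```

A monthly series with full coverage should therefore contain 653 observations, which is the count reported for the monthly series in the fetch logs later in this notebook.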

## Requirements

- FRED API key (set in .env file as FRED_API_KEY)
- Internet connection for data fetching
- Sufficient disk space for data storage

In [1]:
# Initialize notebook with all necessary imports and setup
from notebook_utils import init_notebook
init_notebook()

# Import required libraries
import os
import sys
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import logging
from dotenv import load_dotenv
from fredapi import Fred
import warnings
warnings.filterwarnings('ignore')

# Import the econ_downturn functions we need
from econ_downturn import (
    get_fred_data, get_nber_data, get_all_data, get_umich_data,
    setup_logger, load_environment
)

# Set up logging
logger = setup_logger('data_collection')

print("Data collection notebook initialized successfully!")
print(f"Current working directory: {os.getcwd()}")

Initializing notebook environment...
✓ Added c:\Users\Admin\economic-downturn-detector\Copy of Economic Downturn\economic-downturn-detector\matt-version-downturn-detector\src to Python path
✓ econ_downturn package imported successfully
✓ Notebook environment configured
✓ Environment variables loaded

Available data paths:
  fred_dir: c:\Users\Admin\economic-downturn-detector\Copy of Economic Downturn\economic-downturn-detector\matt-version-downturn-detector\data\fred
  nber_dir: c:\Users\Admin\economic-downturn-detector\Copy of Economic Downturn\economic-downturn-detector\matt-version-downturn-detector\data\nber
  processed_dir: c:\Users\Admin\economic-downturn-detector\Copy of Economic Downturn\economic-downturn-detector\matt-version-downturn-detector\data\processed
  fred_all_indicators: c:\Users\Admin\economic-downturn-detector\Copy of Economic Downturn\economic-downturn-detector\matt-version-downturn-detector\data\fred\all_indicators.csv
  nber_recession_indicator: c:\Users\Admin\e

## 1. FRED API Setup and Validation

Let's check that the FRED API key is set up correctly and test the connection.

In [2]:
# Check FRED API key
fred_api_key = os.getenv('FRED_API_KEY')

if not fred_api_key:
    print("FRED API key not found!")
    print("Please set the FRED_API_KEY environment variable in your .env file.")
    print("You can get a free API key from: https://fred.stlouisfed.org/")
    sys.exit(1)
else:
    print("FRED API key found")
    
# Test FRED API connection
try:
    fred = Fred(api_key=fred_api_key)
    # Request a single observation as a connectivity check.
    # FRED returns observations in ascending date order by default, so
    # limit=1 yields the earliest UNRATE value, not the latest.
    test_data = fred.get_series('UNRATE', limit=1)
    print("FRED API connection successful")
    print(f"Earliest unemployment rate on record: {test_data.iloc[0]:.1f}% ({test_data.index[0].strftime('%Y-%m')})")
except Exception as e:
    print(f"FRED API connection failed: {e}")
    sys.exit(1)

FRED API key found
FRED API connection successful
Earliest unemployment rate on record: 3.4% (1948-01)
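
An alternative to calling `sys.exit(1)` when the key is missing is to raise an exception with an actionable message, which behaves the same way in scripts and in notebooks. The sketch below illustrates the idea; the helper name `require_env` and the placeholder variable are hypothetical and not part of `econ_downturn`.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "before running this notebook."
        )
    return value

# Demonstration with a placeholder variable (hypothetical name and value)
os.environ["EXAMPLE_API_KEY"] = "abc123"
example_key = require_env("EXAMPLE_API_KEY")
```

In this notebook the call would be `require_env('FRED_API_KEY')`, made after the environment has been loaded.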


## 2. Define Data Collection Parameters

Set the date range and output directories for data collection.

In [3]:
# Define data collection parameters
START_DATE = '1970-01-01'
END_DATE = '2024-05-31'  # Data cutoff date

# Create output directories
DATA_DIR = '../data'
FRED_DIR = os.path.join(DATA_DIR, 'fred')
NBER_DIR = os.path.join(DATA_DIR, 'nber')
UMICH_DIR = os.path.join(DATA_DIR, 'umich')
PROCESSED_DIR = os.path.join(DATA_DIR, 'processed')

# Create directories if they don't exist
for directory in [FRED_DIR, NBER_DIR, UMICH_DIR, PROCESSED_DIR]:
    os.makedirs(directory, exist_ok=True)
    
print(f"Data collection period: {START_DATE} to {END_DATE}")
print(f"Output directories created in: {DATA_DIR}")

Data collection period: 1970-01-01 to 2024-05-31
Output directories created in: ../data
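
The same directory layout can also be built with `pathlib`, which keeps path separators consistent on Windows. This is a sketch only: the function name `make_data_dirs` is an assumption, and the example writes to a temporary directory rather than `../data`.

```python
import tempfile
from pathlib import Path

SOURCES = ("fred", "nber", "umich", "processed")

def make_data_dirs(base_dir):
    """Create one subdirectory per data source and return them by name."""
    base = Path(base_dir)
    dirs = {name: base / name for name in SOURCES}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs

# Build the layout under a throwaway directory and list what was created
with tempfile.TemporaryDirectory() as tmp:
    data_dirs = make_data_dirs(Path(tmp) / "data")
    created = sorted(p.name for p in (Path(tmp) / "data").iterdir())
```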


## 3. Fetch FRED Economic Indicators

Get the main economic indicators from the Federal Reserve Economic Data (FRED) database.

In [4]:
print("Fetching FRED economic indicators...")
print("This may take a few minutes depending on your internet connection.")

# Fetch FRED data using the existing function
fred_data = get_fred_data(
    api_key=fred_api_key,
    start_date=START_DATE,
    end_date=END_DATE,
    output_dir=FRED_DIR
)

if fred_data is not None:
    print(f"FRED data collected successfully!")
    print(f"   Shape: {fred_data.shape}")
    print(f"   Date range: {fred_data.index.min()} to {fred_data.index.max()}")
    print(f"   Indicators: {list(fred_data.columns)}")
else:
    print("Failed to fetch FRED data")
    sys.exit(1)

Fetching FRED economic indicators...
This may take a few minutes depending on your internet connection.
2025-06-11 18:22:55,258 - econ_downturn.data.fred - INFO - Fetching GDP (Series ID: GDPC1)
2025-06-11 18:22:55,432 - econ_downturn.data.fred - INFO - Successfully fetched GDP with 218 observations
2025-06-11 18:22:55,433 - econ_downturn.data.fred - INFO - Fetching UNEMPLOYMENT (Series ID: UNRATE)
2025-06-11 18:22:55,726 - econ_downturn.data.fred - INFO - Successfully fetched UNEMPLOYMENT with 653 observations
2025-06-11 18:22:55,727 - econ_downturn.data.fred - INFO - Fetching CPI (Series ID: CPIAUCSL)
2025-06-11 18:22:55,945 - econ_downturn.data.fred - INFO - Successfully fetched CPI with 653 observations
2025-06-11 18:22:55,946 - econ_downturn.data.fred - INFO - Fetching FED_FUNDS (Series ID: FEDFUNDS)
2025-06-11 18:22:56,177 - econ_downturn.data.fred - INFO - Successfully fetched FED_FUNDS with 653 observations
2025-06-11 18:22:56,178 - econ_downturn.data.fred - INFO - Fetching YIE
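
The log shows 218 observations for GDP against 653 for the monthly series, which is consistent with GDPC1 being a quarterly series. How `get_fred_data` reconciles the two frequencies is not shown here. One common approach, sketched below with made-up values, is to upsample the quarterly series to monthly and carry each value forward.

```python
import pandas as pd

# Synthetic quarterly series standing in for GDPC1 (values are made up)
quarterly = pd.Series(
    [100.0, 101.0, 102.5],
    index=pd.to_datetime(['2023-01-01', '2023-04-01', '2023-07-01']),
    name='GDP',
)

# Upsample to month-start frequency and carry each quarterly value forward
monthly = quarterly.resample('MS').ffill()
print(monthly)
```

Forward filling only uses values that were already published, so it does not pull later quarters into earlier months.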

## 4. Fetch NBER Recession Data

Get official recession dates from the National Bureau of Economic Research.

In [5]:
print("Fetching NBER recession data...")

# Fetch NBER data using the existing function
nber_data = get_nber_data(
    start_date=START_DATE,
    end_date=END_DATE,
    output_dir=NBER_DIR
)

if nber_data is not None:
    print(f"NBER recession data collected successfully!")
    print(f"   Shape: {nber_data.shape}")
    print(f"   Date range: {nber_data.index.min()} to {nber_data.index.max()}")
    print(f"   Recession periods: {nber_data['recession'].sum()} months")
    print(f"   Non-recession periods: {(nber_data['recession'] == 0).sum()} months")
else:
    print("Failed to fetch NBER data")
    sys.exit(1)

Fetching NBER recession data...
2025-06-11 18:22:59,437 - econ_downturn.data.nber - INFO - Creating NBER recession data
2025-06-11 18:22:59,445 - econ_downturn.data.nber - INFO - Successfully created data for 8 recessions
2025-06-11 18:22:59,464 - econ_downturn.data.nber - INFO - Saved recession dates to ../data\nber\recession_dates.csv
2025-06-11 18:22:59,471 - econ_downturn.data.nber - INFO - Saved recession indicator to ../data\nber\recession_indicator.csv
2025-06-11 18:22:59,472 - econ_downturn.data.nber - INFO - NBER data processing completed successfully
NBER recession data collected successfully!
   Shape: (653, 1)
   Date range: 1970-01-31 00:00:00 to 2024-05-31 00:00:00
   Recession periods: 84 months
   Non-recession periods: 569 months
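
To illustrate how a monthly recession indicator can be derived from peak and trough dates, here is a minimal sketch. It uses only two NBER-dated recessions, and it marks both the peak and the trough month as recession months; that convention is an assumption, since the log does not show which one `get_nber_data` applies. The function name `recession_indicator` is hypothetical.

```python
import pandas as pd

# Two NBER-dated recessions as (peak month, trough month); this is a partial list
RECESSIONS = [('2007-12', '2009-06'), ('2020-02', '2020-04')]

def recession_indicator(start, end, recessions):
    """Return a monthly 0/1 series that is 1 from peak through trough, inclusive."""
    months = pd.period_range(start, end, freq='M')
    indicator = pd.Series(0, index=months, name='recession')
    for peak, trough in recessions:
        in_recession = (
            (months >= pd.Period(peak, freq='M'))
            & (months <= pd.Period(trough, freq='M'))
        )
        indicator[in_recession] = 1
    return indicator

indicator = recession_indicator('2007-01', '2020-12', RECESSIONS)
share = indicator.mean() * 100
print(f"{indicator.sum()} recession months out of {len(indicator)} ({share:.1f}%)")
```

Changing the convention (for example, starting the month after the peak) changes the recession-month count, so it is worth fixing one convention before comparing totals.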


## 5. Fetch University of Michigan Consumer Sentiment Data

Fetch the University of Michigan consumer sentiment index (UMCSENT) and inflation expectations (MICH) through FRED.

In [6]:
print("Fetching University of Michigan Consumer Sentiment data...")

# Fetch UMich series with get_umich_data (already imported in the first cell)

umich_data = get_umich_data(
    api_key=fred_api_key,
    start_date=START_DATE,
    end_date=END_DATE,
    output_dir=UMICH_DIR
)

if umich_data is not None:
    print(f"UMich sentiment data collected successfully!")
    print(f"   Shape: {umich_data.shape}")
    print(f"   Date range: {umich_data.index.min()} to {umich_data.index.max()}")
    print(f"   Indicators: {list(umich_data.columns)}")
else:
    print("Failed to fetch UMich data")
    sys.exit(1)

Fetching University of Michigan Consumer Sentiment data...
2025-06-11 18:22:59,488 - econ_downturn.data.umich - INFO - Fetching SENTIMENT (Series ID: UMCSENT)
2025-06-11 18:22:59,863 - econ_downturn.data.umich - INFO - Successfully fetched SENTIMENT with 653 observations
2025-06-11 18:22:59,864 - econ_downturn.data.umich - INFO - Fetching INFLATION_EXPECTATION (Series ID: MICH)
2025-06-11 18:23:00,251 - econ_downturn.data.umich - INFO - Successfully fetched INFLATION_EXPECTATION with 557 observations
2025-06-11 18:23:00,256 - econ_downturn.data.umich - INFO - Saved SENTIMENT to ../data\umich\sentiment.csv
2025-06-11 18:23:00,261 - econ_downturn.data.umich - INFO - Saved INFLATION_EXPECTATION to ../data\umich\inflation_expectation.csv
2025-06-11 18:23:00,271 - econ_downturn.data.umich - INFO - Saved merged UMich data to ../data\umich\all_sentiment.csv
2025-06-11 18:23:00,272 - econ_downturn.data.umich - INFO - UMich sentiment data processing completed successfully
UMich sentiment data c
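
The log shows 557 observations for INFLATION_EXPECTATION against 653 for SENTIMENT, so the two series do not cover the same span. Before any gaps are filled, it can help to record where each column's data begins and how many values are missing. The sketch below uses made-up values, and the function name `coverage_report` is hypothetical.

```python
import numpy as np
import pandas as pd

def coverage_report(df):
    """Summarize the first and last valid date and the missing count per column."""
    return pd.DataFrame({
        'first_valid': df.apply(lambda col: col.first_valid_index()),
        'last_valid': df.apply(lambda col: col.last_valid_index()),
        'n_missing': df.isna().sum(),
    })

# Synthetic frame: the second column starts three months later (values are made up)
index = pd.date_range('1977-10-01', periods=6, freq='MS')
sample = pd.DataFrame(
    {
        'SENTIMENT': [83.0, 84.0, 85.0, 84.5, 84.0, 83.5],
        'INFLATION_EXPECTATION': [np.nan, np.nan, np.nan, 5.2, 5.4, 5.3],
    },
    index=index,
)

report = coverage_report(sample)
print(report)
```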

## 6. Integrate All Data Sources

Combine all the data sources we've collected into one dataset for analysis.

In [7]:
print("Integrating all data sources...")

# Use the existing get_all_data function to integrate FRED, NBER, and UMich data
try:
    integrated_data = get_all_data()
    print(f"Integrated data loaded successfully: {integrated_data.shape}")
except Exception as e:
    print(f"get_all_data() failed: {e}")
    print("Manually combining data sources...")
    
    # Manually combine the data if get_all_data fails
    data_sources = []
    
    if fred_data is not None:
        data_sources.append(fred_data)
        print(f"   Added FRED data: {fred_data.shape}")
    if nber_data is not None:
        data_sources.append(nber_data)
        print(f"   Added NBER data: {nber_data.shape}")
    if umich_data is not None:
        data_sources.append(umich_data)
        print(f"   Added UMich data: {umich_data.shape}")
    
    if data_sources:
        integrated_data = pd.concat(data_sources, axis=1)
        print(f"   Manually integrated data: {integrated_data.shape}")
    else:
        print("No data sources available for integration")
        sys.exit(1)

# Handle missing values: forward fill first, then backward fill any leading gaps.
# Note: the backward fill copies later values into earlier dates.
integrated_data = integrated_data.ffill().bfill()

# Save the integrated dataset
integrated_path = os.path.join(PROCESSED_DIR, 'integrated_data.csv')
integrated_data.to_csv(integrated_path)

print(f"\nFinal integrated dataset:")
print(f"   Shape: {integrated_data.shape}")
print(f"   Date range: {integrated_data.index.min()} to {integrated_data.index.max()}")
print(f"   Saved to: {integrated_path}")
print(f"   Columns: {list(integrated_data.columns)}")

Integrating all data sources...
2025-06-11 18:23:00,335 - econ_downturn.data.data_loader - INFO - Loaded FRED data with shape: (15510, 10)
2025-06-11 18:23:00,352 - econ_downturn.data.data_loader - INFO - Loaded NBER recession data with shape: (653, 1)
2025-06-11 18:23:00,366 - econ_downturn.data.data_loader - INFO - Loaded UMich data with shape: (653, 2)
2025-06-11 18:23:00,367 - econ_downturn.data.data_loader - INFO - Initialized merged dataset with 'FRED' data
2025-06-11 18:23:00,372 - econ_downturn.data.data_loader - INFO - Added 'NBER' data to merged dataset
2025-06-11 18:23:00,377 - econ_downturn.data.data_loader - INFO - Added 'UMICH' data to merged dataset
2025-06-11 18:23:00,378 - econ_downturn.data.data_loader - INFO - Merged dataset shape: (15657, 13)
2025-06-11 18:23:00,501 - econ_downturn.data.data_loader - INFO - Saved merged dataset to c:\Users\Admin\economic-downturn-detector\Copy of Economic Downturn\economic-downturn-detector\matt-version-downturn-detector\data\proces
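
The row counts in the log (15,510 for FRED and 15,657 after merging, against 653 for the monthly NBER indicator) suggest that at least one series is sampled more often than monthly and that the sources do not all share the same date stamps. One way to handle this, sketched below on synthetic data, is to collapse every source to one row per calendar month before concatenating. The helper `to_monthly` is hypothetical, and taking the last value in each month is an assumption rather than the project's method.

```python
import pandas as pd

def to_monthly(df):
    """Collapse a datetime-indexed frame to one row per calendar month,
    keeping the last non-missing value observed in each month."""
    return df.groupby(df.index.to_period('M')).last()

# Synthetic inputs (values are made up): one daily series, one series stamped
# at month start, and one stamped at month end
daily = pd.DataFrame(
    {'YIELD_SPREAD': range(90)},
    index=pd.date_range('2024-01-01', periods=90, freq='D'),
)
month_start = pd.DataFrame(
    {'UNEMPLOYMENT': [4.0, 4.1, 4.2]},
    index=pd.to_datetime(['2024-01-01', '2024-02-01', '2024-03-01']),
)
month_end = pd.DataFrame(
    {'recession': [0, 0, 0]},
    index=pd.to_datetime(['2024-01-31', '2024-02-29', '2024-03-31']),
)

# Concatenating on raw timestamps keeps every distinct date
naive = pd.concat([daily, month_start, month_end], axis=1)

# Converting each source to monthly periods first yields one row per month
aligned = pd.concat(
    [to_monthly(frame) for frame in (daily, month_start, month_end)], axis=1
)

print(naive.shape, aligned.shape)
```

With the sources aligned this way, the integrated frame has one row per month and far fewer gaps left for the fill step to cover.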

## 7. Data Collection Summary

Summary of all the data we've collected.

In [8]:
print("\n" + "="*60)
print("DATA COLLECTION SUMMARY")
print("="*60)

print(f"\nCollection Period: {START_DATE} to {END_DATE}")
print(f"Total Data Points: {len(integrated_data)} time periods")
print(f"Total Indicators: {len(integrated_data.columns)} variables")

print("\nData Sources Collected:")

# FRED data summary
if fred_data is not None:
    print(f"   FRED Economic Indicators: {fred_data.shape[1]} indicators")
    print(f"      - GDP, Unemployment, CPI, Fed Funds Rate, etc.")
else:
    print(f"   FRED Economic Indicators: Failed")

# NBER data summary
if nber_data is not None:
    recession_months = nber_data['recession'].sum()
    total_months = len(nber_data)
    recession_pct = (recession_months / total_months) * 100
    print(f"   NBER Recession Data: {recession_months}/{total_months} recession months ({recession_pct:.1f}%)")
else:
    print(f"   NBER Recession Data: Failed")

# UMich data summary
if umich_data is not None:
    print(f"   UMich Consumer Sentiment: {umich_data.shape[1]} indicators")
    print("      - Consumer Sentiment, Inflation Expectations")
else:
    print(f"   UMich Consumer Sentiment: Failed")

print(f"\nData Storage:")
print(f"   Raw data saved in: {DATA_DIR}/[source]/")
print(f"   Integrated data saved in: {integrated_path}")

print(f"\nData collection completed successfully!")
print(f"   The integrated dataset is ready for feature engineering and analysis.")
print(f"   Next step: Run notebook 01_data_exploration.ipynb")


DATA COLLECTION SUMMARY

Collection Period: 1970-01-01 to 2024-05-31
Total Data Points: 15657 time periods
Total Indicators: 13 variables

Data Sources Collected:
   FRED Economic Indicators: 10 indicators
      - GDP, Unemployment, CPI, Fed Funds Rate, etc.
   NBER Recession Data: 84/653 recession months (12.9%)
   UMich Consumer Sentiment: 2 indicators
      - Consumer Sentiment, Inflation Expectations

Data Storage:
   Raw data saved in: ../data/[source]/
   Integrated data saved in: ../data\processed\integrated_data.csv

Data collection completed successfully!
   The integrated dataset is ready for feature engineering and analysis.
   Next step: Run notebook 01_data_exploration.ipynb
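
Before moving on, a few structural checks on the integrated dataset can catch problems early, for instance a row count that differs from the number of months in the collection window. The sketch below is illustrative: the function name `check_monthly_dataset` is hypothetical, and it assumes the dataset is meant to hold exactly one row per month.

```python
import pandas as pd

def check_monthly_dataset(df, start, end):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    expected = len(pd.period_range(start, end, freq='M'))
    if len(df) != expected:
        problems.append(f"expected {expected} monthly rows, found {len(df)}")
    if df.index.has_duplicates:
        problems.append("index contains duplicate dates")
    if not df.index.is_monotonic_increasing:
        problems.append("index is not sorted")
    n_missing = int(df.isna().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} missing values remain")
    return problems

# Synthetic example: a clean five-month frame and a copy with a repeated row
good = pd.DataFrame(
    {'recession': [0, 0, 0, 0, 0]},
    index=pd.date_range('2024-01-01', periods=5, freq='MS'),
)
bad = pd.concat([good, good.iloc[[0]]])

good_problems = check_monthly_dataset(good, '2024-01-01', '2024-05-31')
bad_problems = check_monthly_dataset(bad, '2024-01-01', '2024-05-31')
print(good_problems)
print(bad_problems)
```

Applied to this notebook, the call would take `integrated_data`, `START_DATE`, and `END_DATE` as arguments.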
