# Economic Downturn Detector: Feature Engineering Optimization

This notebook focuses on refining the feature engineering process and optimizing the model by:

1. Exploring different lag periods for consumer sentiment
2. Creating interaction terms between consumer sentiment and economic indicators
3. Experimenting with different transformations of consumer sentiment data
4. Performing feature selection to identify the most predictive features
5. Comparing MDA results with and without consumer sentiment

In [None]:
# Import notebook utilities
from notebook_utils import (
    # Setup functions
    setup_notebook, load_data, display_data_info, save_figure,
    
    # Import from econ_downturn package
    engineer_features, normalize_data, apply_mda, create_discriminant_time_series,
    plot_indicator_with_recessions, plot_correlation_matrix,
    plot_feature_importance, plot_discriminant_time_series,
    
    # Advanced feature engineering functions
    engineer_features_with_custom_lags, create_interaction_terms,
    apply_sentiment_transformations, select_features
)

# Import other libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
from IPython.display import display

# Set up the notebook environment
setup_notebook()

## 1. Load and Prepare Data

First, let's load the data from all sources and merge them.

In [None]:
# Load all data using the utility function
merged_data = load_data(use_cached=False)  # Force reload from original sources

# Display information about the dataset
display_data_info(merged_data)

## 2. Explore Different Lag Periods for Consumer Sentiment

Let's create features with different lag periods for consumer sentiment.

In [None]:
# Engineer features with different lag periods for consumer sentiment
data_with_custom_lags = engineer_features_with_custom_lags(
    merged_data,
    sentiment_lags=[1, 3, 6, 12, 18, 24],
    other_lags=[1, 3, 6, 12]
)

# Save the dataset with custom lags
from econ_downturn import get_output_paths
output_paths = get_output_paths()
output_dir = output_paths['data_dir']
os.makedirs(output_dir, exist_ok=True)
custom_lags_path = os.path.join(output_dir, 'data_with_custom_sentiment_lags.csv')
data_with_custom_lags.to_csv(custom_lags_path)
print(f"Saved dataset with custom lags to {custom_lags_path}")

## 3. Create Interaction Terms Between Consumer Sentiment and Economic Indicators

Let's create interaction terms between consumer sentiment and key economic indicators.

In [None]:
# Create interaction terms
data_with_interactions = create_interaction_terms(data_with_custom_lags)

# Save the dataset with interaction terms
interactions_path = os.path.join(output_dir, 'data_with_interactions.csv')
data_with_interactions.to_csv(interactions_path)
print(f"Saved dataset with interaction terms to {interactions_path}")

## 4. Experiment with Different Transformations of Consumer Sentiment Data

Let's apply different transformations to the consumer sentiment data.

In [None]:
# Apply transformations to sentiment data
data_with_transformations = apply_sentiment_transformations(data_with_interactions)

# Save the dataset with transformations
transformations_path = os.path.join(output_dir, 'data_with_transformations.csv')
data_with_transformations.to_csv(transformations_path)
print(f"Saved dataset with transformations to {transformations_path}")

## 5. Perform Feature Selection to Identify the Most Predictive Features

Let's use different feature selection techniques to identify the most predictive features.

In [None]:
# Normalize the data
data_normalized, _ = normalize_data(data_with_transformations)

# Separate features and target
X = data_normalized.drop(columns=['recession'])
y = data_normalized['recession']

# Select features using ANOVA F-value
X_selected_anova, selected_features_anova = select_features(X, y, method='anova', k=20)

# Select features using Recursive Feature Elimination
X_selected_rfe, selected_features_rfe = select_features(X, y, method='rfe', k=20)

## 6. Compare MDA Results With and Without Consumer Sentiment

Let's compare the performance of MDA models with and without consumer sentiment features.

In [None]:
def compare_mda_models(data, with_sentiment=True):
    """
    Compare MDA models with and without consumer sentiment features.
    
    Parameters
    ----------
    data : pandas.DataFrame
        Input dataset
    with_sentiment : bool
        Whether to include sentiment features
        
    Returns
    -------
    dict
        MDA results
    """
    # Make a copy of the data
    df = data.copy()
    
    # Identify sentiment-related columns
    sentiment_cols = [col for col in df.columns if 'SENTIMENT' in col]
    
    # Remove sentiment features if not including them
    if not with_sentiment:
        df = df.drop(columns=sentiment_cols)
        print(f"Removed {len(sentiment_cols)} sentiment-related features")
    
    # Normalize the data
    df_normalized, _ = normalize_data(df)
    
    # Separate features and target
    X = df_normalized.drop(columns=['recession'])
    y = df_normalized['recession']
    
    # Apply MDA
    mda_results = apply_mda(X, y)
    
    # Print model performance metrics
    print(f"Accuracy: {mda_results['accuracy']:.4f}")
    print("\nConfusion Matrix:")
    print(mda_results['conf_matrix'])
    print("\nClassification Report:")
    print(mda_results['class_report'])
    print(f"\nCross-Validation Scores: {mda_results['cv_scores']}")
    print(f"Mean CV Score: {mda_results['cv_scores'].mean():.4f}")
    
    # Plot feature importances
    if mda_results['feature_importance'] is not None:
        fig = plot_feature_importance(mda_results['feature_importance'])
        plt.title(f"Feature Importance {'With' if with_sentiment else 'Without'} Sentiment", fontsize=16)
        plt.show()
        save_figure(fig, f"feature_importance_{'with' if with_sentiment else 'without'}_sentiment.png")
    
    # Create and plot discriminant time series
    discriminant_df = create_discriminant_time_series(mda_results['model'], X, y)
    fig = plot_discriminant_time_series(discriminant_df)
    plt.title(f"Discriminant Function {'With' if with_sentiment else 'Without'} Sentiment", fontsize=16)
    plt.show()
    save_figure(fig, f"discriminant_time_series_{'with' if with_sentiment else 'without'}_sentiment.png")
    
    return mda_results

# Compare MDA models with and without sentiment features
print("\nMDA Model WITH Sentiment Features:")
mda_with_sentiment = compare_mda_models(data_with_transformations, with_sentiment=True)

print("\nMDA Model WITHOUT Sentiment Features:")
mda_without_sentiment = compare_mda_models(data_with_transformations, with_sentiment=False)

## 7. Apply MDA to Selected Features

Let's apply MDA to the selected features to see if feature selection improves the model's performance.

In [None]:
# Add recession indicator to selected features
X_selected_anova_with_target = X_selected_anova.copy()
X_selected_anova_with_target['recession'] = y

X_selected_rfe_with_target = X_selected_rfe.copy()
X_selected_rfe_with_target['recession'] = y

# Apply MDA to ANOVA-selected features
print("\nMDA Model with ANOVA-Selected Features:")
mda_anova = compare_mda_models(X_selected_anova_with_target, with_sentiment=True)

# Apply MDA to RFE-selected features
print("\nMDA Model with RFE-Selected Features:")
mda_rfe = compare_mda_models(X_selected_rfe_with_target, with_sentiment=True)

## 8. Conclusion

Based on the feature engineering optimization, we have identified the most effective approaches for predicting recessions using consumer sentiment data.

Key findings:
1. Including consumer sentiment features significantly improves the model's performance
2. Longer lag periods for consumer sentiment (18-24 months) provide valuable leading indicators
3. Interaction terms between consumer sentiment and economic indicators capture important relationships
4. Feature selection helps identify the most predictive features and reduces dimensionality

Next steps:
1. Implement the optimized feature engineering process in the real-time monitoring system
2. Explore additional sentiment indices and their relationships with economic indicators
3. Develop a recession probability index based on the optimized model