# GDP (PIB) Filtering and Trend-Cycle Decomposition
---

This notebook scans the `../data/raw/` directory for the **latest available GDP series** from IBGE. The series can be downloaded with the `IBGE.ipynb` notebook.

It detects CSV files whose names follow the project's standard naming pattern and, for each unique series (identified by table and variable), keeps only the most recent file.
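
A minimal sketch of the "keep only the most recent file per series" step is shown below. The filename pattern `ibge_t<table>_v<variable>_<YYYYMMDD>_<HHMMSS>.csv`, the helper name `latest_per_series`, and the table and variable numbers are assumptions made for illustration; the project's actual pattern may differ.

```python
import re

# Hypothetical naming pattern (an assumption, not taken from the project):
#   ibge_t<table>_v<variable>_<YYYYMMDD>_<HHMMSS>.csv
PATTERN = re.compile(
    r"^ibge_t(?P<table>\d+)_v(?P<variable>\d+)_(?P<stamp>\d{8}_\d{6})\.csv$"
)

def latest_per_series(filenames):
    """Return {(table, variable): filename}, keeping the newest timestamp per series."""
    latest = {}
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            continue  # ignore files that do not follow the pattern
        key = (m["table"], m["variable"])
        # Fixed-width timestamps sort chronologically as plain strings
        if key not in latest or m["stamp"] > PATTERN.match(latest[key])["stamp"]:
            latest[key] = name
    return latest

# Made-up filenames for illustration
files = [
    "ibge_t1111_v222_20250301_101500.csv",
    "ibge_t1111_v222_20250331_213815.csv",
    "ibge_t3333_v444_20250110_090000.csv",
    "notes.txt",
]
selected = latest_per_series(files)
```

Comparing the timestamp as a string works here only because the assumed format is fixed-width and ordered from year down to second.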

These files are matched with IBGE metadata to present the user with an intuitive interface to select:

- Real or Nominal GDP  
- Seasonally Adjusted or Non-Adjusted  
- Quarterly or Annual Frequency  

Once a series is selected, the notebook loads and processes the data. The following transformations and filters are applied:

- **Natural Log** — computed using **NumPy**
- **First Difference** — computed using **pandas**
- **Percentage Change** — computed using **pandas**
- **Hodrick-Prescott Filter** — using **statsmodels**
- **Baxter-King Filter** — using **statsmodels**
- **Christiano-Fitzgerald Filter** — using **statsmodels**

All of these operations rely on well-established Python libraries for time series and econometric analysis (NumPy, pandas, and statsmodels).

Finally, the notebook provides an interactive plotting interface so you can visually explore trends, cycles, and transformations of the GDP series with ease.

This environment is ideal for filtering, comparing smoothing methods, and preparing data for macroeconomic analysis and visualization.


## Notebook Setup and Dependencies Loading
---

Run the cell below to load the dependencies and start the logging session. It also displays a widget for deleting old raw files.

In [1]:
# Importing external libraries and functions
import os
import re
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import ipywidgets as widgets
import matplotlib.pyplot as plt

from datetime import datetime
from IPython.display import display, Markdown, clear_output

# Add the 'src' folder to the Python path so project-specific modules can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..", "src")))

# Import project-specific functions
from logger import start_logger
from ibge import load_ibge_series_metadata
from utils import compute_file_hash
from ui import file_explorer, raw_cleanup_widget

# Enable automatic reloading of modules when their source code changes
%reload_ext autoreload
%autoreload 2

# Define Session ID
session_type = "Filtering"
session_ID = datetime.now().strftime("%Y%m%d_%H%M%S")

# Setup Logging
log_file_name = f"../logs/{session_type}_{session_ID}.log"
logger_name = "root"
logger = start_logger(logger_name, log_file_name)

raw_cleanup_widget()

2025-03-31 21:38:15,807 - INFO - Logger started. File path: ../logs/Filtering_20250331_213815.log


VBox(children=(Button(button_style='danger', description='🧹 Delete old raw CSV and JSON files', style=ButtonSt…

## Select GDP data from available series
---

In [2]:
# Load IBGE metadata and open the file explorer to pick a GDP series
df_ibge_series_metadata = load_ibge_series_metadata()
GDP_file_explorer_refs = file_explorer(df_ibge_series_metadata)

2025-03-31 21:38:18,073 - INFO - Loaded IBGE Metadata from file: ../data/metadata/ibge_series.json


VBox(children=(HTML(value='<h3>🔎 Analyze All Files by Source and Series</h3>'), Dropdown(description='Source:'…

## Filter data
---

In [4]:
# Function to get data from Widget selection
def get_data(selected_filename): 
    # Load the DataFrame
    df = pd.read_csv(selected_filename)

    # Convert columns if present
    if "data" in df.columns:
        df["data"] = pd.to_datetime(df["data"], errors="coerce")

    if "valor" in df.columns:
        df["valor"] = pd.to_numeric(df["valor"], errors="coerce")
    return df

# Get data according to the widget selection
df = get_data(GDP_file_explorer_refs["get_selected_file"]())
df.rename(columns={"valor": "gdp"}, inplace=True)

# Natural log of gdp (non-positive values become NaN)
df["log_gdp"] = df["gdp"].apply(lambda x: np.log(x) if x > 0 else np.nan)

# First difference of log_gdp (approximate growth rate); the "trend" is the lagged log level
df["fdiff_cycle"] = df["log_gdp"].diff()
df["fdiff_trend"] = df["log_gdp"] - df["fdiff_cycle"]

# Percentage change of gdp (computed on the level, not on the log)
df["pct_change_cycle"] = df["gdp"].pct_change()
df["pct_change_trend"] = df["log_gdp"] - df["pct_change_cycle"]

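# HP filter (lambda = 1600, the usual value for quarterly data); returns (cycle, trend)
df["hp_cycle"], df["hp_trend"] = sm.tsa.filters.hpfilter(df["log_gdp"], lamb=1600)

# BK filter: 6-32 period band, K = 12 leads/lags (the first and last 12 rows become NaN)
df["bk_cycle"] = sm.tsa.filters.bkfilter(df["log_gdp"], low=6, high=32, K=12)
df["bk_trend"] = df["log_gdp"] - df["bk_cycle"]

# CF filter: 6-32 period band, no drift adjustment; returns (cycle, trend)
df["ck_cycle"], df["ck_trend"] = sm.tsa.filters.cffilter(df["log_gdp"], low=6, high=32, drift=False)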

# OLS Regression
#--------------------------
# Independent variable (x): time
x = df['data'].apply(lambda d: d.toordinal())
x = sm.add_constant(x)  # Adds intercept term

# Dependent variable (y): value
y = df['log_gdp']

# Fit model
model = sm.OLS(y, x).fit()

# Get all coefficients
coefficients = model.params

# Add predicted values (trend) to the DataFrame
df['OLS_trend'] = model.predict(x)

# Calculate the cycle (residual)
df['OLS_cycle'] = df['log_gdp'] - df['OLS_trend']

# Print summary
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                log_gdp   R-squared:                       0.899
Model:                            OLS   Adj. R-squared:                  0.898
Method:                 Least Squares   F-statistic:                     1014.
Date:                Mon, 31 Mar 2025   Prob (F-statistic):           1.45e-58
Time:                        21:40:11   Log-Likelihood:                 155.99
No. Observations:                 116   AIC:                            -308.0
Df Residuals:                     114   BIC:                            -302.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -40.1654      1.418    -28.332      0.0

### Plot filtered GDP data
---

In [5]:
# Create the widget to select columns (except 'data')
column_selector = widgets.SelectMultiple(
    options=[col for col in df.columns if col != "data"],
    description="Y Columns:",
    layout=widgets.Layout(width="400px", height="200px")
)

# Output area for the plot
plot_output = widgets.Output()

# Function to update the plot
def update_plot(change):
    with plot_output:
        clear_output()
        selected = list(column_selector.value)

        if not selected:
            print("Select at least one column to plot.")
            return

        # Plot
        sns.set_theme()
        sns.set_context("notebook")
        plt.figure(figsize=(12, 6))

        for col in selected:
            plt.plot(df["data"], df[col], label=col)

        plt.xlabel("Date")
        plt.ylabel("Value")
        plt.title("Selected Columns Over Time")
        plt.legend()
        sns.despine()
        plt.tight_layout()
        plt.show()

# Connect widget to function
column_selector.observe(update_plot, names="value")

# Display UI
display(widgets.HTML("<b>Select columns to plot (X axis is always 'data'):</b> Use CTRL or CMD to select multiple rows"))
display(
    widgets.HBox([
        column_selector,
        plot_output
    ])
)

# Initial plot
update_plot({"new": column_selector.value})


HTML(value="<b>Select columns to plot (X axis is always 'data'):</b> Use CTRL or CMD to select multiple rows")

HBox(children=(SelectMultiple(description='Y Columns:', layout=Layout(height='200px', width='400px'), options=…

## Select Inflation Data
---

In [8]:
# Open a second file explorer to pick the IPCA (inflation) series
IPCA_file_explorer_refs = file_explorer(df_ibge_series_metadata)

VBox(children=(HTML(value='<h3>🔎 Analyze All Files by Source and Series</h3>'), Dropdown(description='Source:'…

In [74]:
# Load selected IPCA file from the file explorer widget
dfa = get_data(IPCA_file_explorer_refs["get_selected_file"]())

# Convert monthly percent change to a gross growth factor (for compounding)
dfa["decimal"] = 1 + dfa["valor"] / 100

# Set date as index and resample to quarterly using the compounded product
dfa.set_index("data", inplace=True)
dfa = dfa.resample("QE").prod()  # 'QE' = quarter end (alias available in pandas >= 2.2)

# Add one day to each quarter-end date, which moves it to the first day of the following quarter
dfa = dfa.reset_index()[["data", "decimal"]]
dfa["data"] = dfa["data"] + pd.Timedelta(days=1)

# Convert decimal back to percent change and drop intermediate column
dfa["pi"] = (dfa["decimal"] - 1)*100
dfa = dfa[["data", "pi"]]
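
The compounding step can be checked on a small example. The sketch below uses made-up monthly rates and groups by calendar quarter with `to_period("Q")` instead of `resample("QE")`, a substitution made here so the example does not depend on the pandas version.

```python
import pandas as pd

# Made-up monthly inflation rates, in percent
monthly = pd.Series(
    [1.0, 0.5, -0.2, 0.3, 0.4, 0.1],
    index=pd.date_range("2024-01-01", periods=6, freq="MS"),
)

# Compound gross factors within each calendar quarter
factors = 1 + monthly / 100
quarterly = (factors.groupby(factors.index.to_period("Q")).prod() - 1) * 100

# Label each quarter by its own first day
quarterly.index = quarterly.index.to_timestamp(how="start")
```

For the first quarter this gives (1.01 × 1.005 × 0.998 − 1) × 100 ≈ 1.302, slightly more than the simple sum of 1.3. Note that this labelling dates the first quarter as 2024-01-01, whereas adding one day to a quarter-end date yields 2024-04-01; which convention matches the GDP file's dates is worth confirming before merging.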

In [75]:
# Parameters
a1l = 0.24
a1i = 0.38
a4 = 0.12

# Compute required lead and lags
dfa["pi_lead1"] = dfa["pi"].shift(-1)
dfa["pi_t"]   = dfa["pi"]
dfa["pi_lag1"] = dfa["pi"].shift(1)
dfa["pi_lag2"] = dfa["pi"].shift(2)
dfa["pi_lag3"] = dfa["pi"].shift(3)
dfa["pi_lag4"] = dfa["pi"].shift(4)

# Apply formula
dfa["GDP_gap_calc"] = (1/a4)*(
    dfa["pi_t"] 
    - a1l * dfa["pi_lag1"] 
    - (a1i / 4) * (dfa["pi_lag1"] + dfa["pi_lag2"] + dfa["pi_lag3"] + dfa["pi_lag4"])
    - (1-a1l-a1i)*dfa["pi_lead1"]
    )
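
The formula can be wrapped in a small function and checked against the notebook's own numbers. In the sketch below, the function name `implied_gap` is hypothetical, and the input values are the first six quarterly `pi` values shown in the output table further down; only the fifth quarter has all four lags and the lead available.

```python
import numpy as np
import pandas as pd

def implied_gap(pi, a1l=0.24, a1i=0.38, a4=0.12):
    """Inflation net of the lag, inertia and lead terms, scaled by 1/a4."""
    inertia = sum(pi.shift(k) for k in range(1, 5)) / 4   # mean of lags 1..4
    lead = (1 - a1l - a1i) * pi.shift(-1)                 # one-quarter lead term
    return (pi - a1l * pi.shift(1) - a1i * inertia - lead) / a4

# First six quarterly pi values from the notebook's output
pi = pd.Series([1.315487, 0.691076, -0.579176, 0.279785, 2.682942, 1.033315])
gap = implied_gap(pi)
```

The value at position 4 comes out at roughly 17.17, in line with the `GDP_gap_calc` entry for 1999-04-01. All other positions are NaN because a lag or the lead is missing.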

In [79]:
dfa

|  | data | pi | pi_lead1 | pi_t | pi_lag1 | pi_lag2 | pi_lag3 | pi_lag4 | GDP_gap_calc |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1998-04-01 | 1.315487 | 0.691076 | 1.315487 | NaN | NaN | NaN | NaN | NaN |
| 1 | 1998-07-01 | 0.691076 | -0.579176 | 0.691076 | 1.315487 | NaN | NaN | NaN | NaN |
| 2 | 1998-10-01 | -0.579176 | 0.279785 | -0.579176 | 0.691076 | 1.315487 | NaN | NaN | NaN |
| 3 | 1999-01-01 | 0.279785 | 2.682942 | 0.279785 | -0.579176 | 0.691076 | 1.315487 | NaN | NaN |
| 4 | 1999-04-01 | 2.682942 | 1.033315 | 2.682942 | 0.279785 | -0.579176 | 0.691076 | 1.315487 | 17.174601 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 104 | 2024-04-01 | 1.243767 | 1.103918 | 1.243767 | 0.952786 | 0.802091 | 0.841452 | 1.942020 | 1.370553 |
| 105 | 2024-07-01 | 1.103918 | 1.083302 | 1.103918 | 1.243767 | 0.952786 | 0.802091 | 0.841452 | 0.241248 |
| 106 | 2024-10-01 | 1.083302 | 1.315618 | 1.083302 | 1.103918 | 1.243767 | 0.952786 | 0.802091 | -0.594302 |
| 107 | 2025-01-01 | 1.315618 | 1.380792 | 1.315618 | 1.083302 | 1.103918 | 1.243767 | 0.952786 | 0.953884 |


In [80]:
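# Join inflation and GDP data on the shared date column (inner join keeps matching dates only)
dfb = dfa.merge(df, on="data", how="inner")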
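
An inner merge silently drops dates that appear in only one frame, so it can help to audit the join first. The sketch below uses made-up frames; only the `data` column name mirrors the notebook.

```python
import pandas as pd

# Made-up frames for illustration
inflation = pd.DataFrame({
    "data": pd.to_datetime(["2024-01-01", "2024-04-01", "2024-07-01"]),
    "pi": [1.3, 0.8, 1.1],
})
gdp = pd.DataFrame({
    "data": pd.to_datetime(["2024-04-01", "2024-07-01", "2024-10-01"]),
    "gdp": [100.0, 101.0, 102.5],
})

# Outer join with an indicator column; validate raises if a date is duplicated
audit = inflation.merge(gdp, on="data", how="outer", validate="one_to_one", indicator=True)

unmatched = audit[audit["_merge"] != "both"]                      # dates present in one frame only
merged = audit[audit["_merge"] == "both"].drop(columns="_merge")  # equivalent to the inner join
```

If `unmatched` is larger than expected, the two series probably use different date conventions for the same quarter.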

In [81]:
def plot_columns_selector(df):
    # Ensure 'data' column is datetime
    df = df.copy()
    df["data"] = pd.to_datetime(df["data"])
    df = df.sort_values("data")

    # Extract Y columns (exclude 'data')
    y_columns = df.columns.drop("data")

    # Create SelectMultiple widget
    column_selector = widgets.SelectMultiple(
        options=y_columns,
        description="Y columns:",
        layout=widgets.Layout(width='250px', height='300px')
    )

    # Output area for the plot
    plot_output = widgets.Output()

    # Define the update function
    def update_plot(change):
        with plot_output:
            clear_output(wait=True)
            selected_cols = list(column_selector.value)
            if selected_cols:
                plt.figure(figsize=(12, 6))
                sns.set_theme()
                for col in selected_cols:
                    plt.plot(df["data"], df[col], label=col)
                plt.xlabel("Date")
                plt.ylabel("Value")
                plt.title("Selected Series")
                plt.legend()
                plt.tight_layout()
                plt.show()
            else:
                print("Select at least one column to display.")

    # Attach the observer
    column_selector.observe(update_plot, names="value")

    # Trigger an initial plot
    update_plot({"new": column_selector.value})

    # Layout side-by-side
    ui = widgets.HBox([
        column_selector,
        plot_output
    ])
    display(ui)

In [82]:
plot_columns_selector(dfb)

HBox(children=(SelectMultiple(description='Y columns:', layout=Layout(height='300px', width='250px'), options=…

## Compare Forecast Errors
---

In [None]:
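# Get data according to the widget selection and rebuild log_gdp
df = get_data(GDP_file_explorer_refs["get_selected_file"]())
df.rename(columns={"valor": "gdp"}, inplace=True)
df["log_gdp"] = np.log(df["gdp"].where(df["gdp"] > 0))

# Set window size (4*10 = 40 quarters = 10 years of quarterly data)
ws = 4*10

# Set forecast size (4 = 4 quarters of forecast)
fs = 4

# Last valid window start index; windows start at i = 0, ..., nw
nw = len(df)-ws-fs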

In [71]:
# Window counter, from 0 to nw (inclusive)
i = 0

# Get Window Data
dfa = df[i:i+ws]

# Get Data to be forecasted
dfx = df[i+ws:i+ws+fs]

In [None]:
# Independent variable (x): time, as ordinal day numbers of the window dates
x = dfa['data'].apply(lambda d: d.toordinal())
x = sm.add_constant(x)  # Adds intercept term

# Dependent variable (y): value
y = dfa['log_gdp']

# Fit model on the window only
model = sm.OLS(y, x).fit()

# Get all coefficients
coefficients = model.params

# Add in-sample fitted values (trend); index alignment leaves rows outside the window as NaN
df['OLS_trend'] = model.predict(x)

# Calculate the cycle (residual), defined for the window rows only
df['OLS_cycle'] = df['log_gdp'] - df['OLS_trend']