# GDP (PIB) Filtering and Trend-Cycle Decomposition
---

This notebook scans the `../data/raw/` directory for the **latest available GDP series** from IBGE (the series can be downloaded with the `IBGE.ipynb` notebook).

It detects CSV files whose names follow the pattern `IBGE_<8 digits>_<6 digits>_T<table>-V<variable>_<6 digits>.csv` and, for each unique series (identified by table and variable), keeps only the file with the latest trailing six-digit timestamp.
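
As a minimal sketch of this selection rule, the snippet below applies the same filename pattern the notebook uses further down to a few made-up filenames and keeps one file per (table, variable) pair. The names in `example_files` and the helper `pick_latest` are hypothetical, and the sketch assumes, as the notebook's own code does, that the trailing six-digit field sorts correctly as a string.

```python
import re

# Same pattern the notebook uses to recognise raw IBGE files
filename_pattern = re.compile(
    r"IBGE_\d{8}_\d{6}_T(?P<table>\d+)-V(?P<variable>\d+)_(?P<timestamp>\d{6})\.csv"
)

# Hypothetical filenames, for illustration only
example_files = [
    "IBGE_20250301_101500_T1620-V583_202403.csv",
    "IBGE_20250329_214500_T1620-V583_202404.csv",
    "IBGE_20250329_214600_T1846-V585_202404.csv",
    "notes.txt",
]

def pick_latest(filenames):
    """Return {(table, variable): filename} keeping the greatest timestamp per key."""
    latest = {}
    for fname in filenames:
        match = filename_pattern.fullmatch(fname)
        if match is None:
            continue  # not an IBGE raw file
        key = (int(match["table"]), int(match["variable"]))
        stamp = match["timestamp"]
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, fname)
    return {key: fname for key, (stamp, fname) in latest.items()}

latest_files = pick_latest(example_files)
```

The sketch uses `fullmatch`, which also rejects names with trailing characters after `.csv`; the notebook's cell uses `match`, which only anchors the start of the name.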

These files are matched with IBGE metadata to present the user with an intuitive interface to select:

- Real or Nominal GDP  
- Seasonally Adjusted or Non-Adjusted  
- Quarterly or Annual Frequency  

Once a series is selected, the notebook loads and processes the data. The following transformations and filters are applied:

- **Natural Log** — computed using **NumPy**
- **First Difference** — computed using **pandas**
- **Percentage Change** — computed using **pandas**
- **Hodrick-Prescott Filter** — using **statsmodels**
- **Baxter-King Filter** — using **statsmodels**
- **Christiano-Fitzgerald Filter** — using **statsmodels**

All of these operations rely on well-established Python libraries for time series and econometric analysis: NumPy, pandas, and statsmodels.
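
To make the three basic transformations concrete, here is a small sketch on a made-up series (the numbers are illustrative, not IBGE data). It also computes the first difference of the log, which is not one of the notebook's columns but is a common approximation to the percentage change for small growth rates.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only
valor = pd.Series([100.0, 102.0, 101.0, 105.0], name="valor")

log_valor = np.log(valor)               # natural log
diff_valor = valor.diff()               # x_t - x_{t-1}
pct_change_valor = valor.pct_change()   # (x_t - x_{t-1}) / x_{t-1}, as a fraction
log_diff = log_valor.diff()             # log(x_t / x_{t-1}), close to pct_change for small changes

summary = pd.DataFrame({
    "valor": valor,
    "diff": diff_valor,
    "pct_change": pct_change_valor,
    "log_diff": log_diff,
})
print(summary)
```

The first row of every differenced column is `NaN`, because there is no previous observation to compare with.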

Finally, the notebook provides an interactive plotting interface for exploring the trend, cycle, and transformed columns of the selected GDP series.

This environment is ideal for filtering, comparing smoothing methods, and preparing data for macroeconomic analysis and visualization.


## Notebook Setup and Dependencies Loading
---

Run the cell below to load the dependencies, import the project helpers, and start the logging session.

In [24]:
# Importing external libraries and functions
import os
import re
import sys
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import ipywidgets as widgets
import matplotlib.pyplot as plt

from datetime import datetime
from IPython.display import display, Markdown, clear_output


# Add the 'src' folder to the Python path so project-specific modules can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "..", "src")))

# Import project-specific functions
from logger import start_logger
from ibge import load_ibge_series_metadata

# Enable automatic reloading of modules when their source code changes
%reload_ext autoreload
%autoreload 2

# Define Session ID
session_type = "Filtering"
session_ID = datetime.now().strftime("%Y%m%d_%H%M%S")

# Setup Logging
log_file_name = f"../logs/{session_type}_{session_ID}.log"
logger_name = "root"
logger = start_logger(logger_name, log_file_name)

2025-03-29 21:46:48,767 - INFO - Logger started. File path: ../logs/Filtering_20250329_214648.log


## Select GDP data from available series
---

In [None]:
# Load metadata
df_ibge_series_metadata = load_ibge_series_metadata()

# Name the index "name" and turn it into a regular column for the merge below
df_ibge_series_metadata.index.name = "name"
df_ibge_series_metadata = df_ibge_series_metadata.reset_index()

# ───────────────────────────────────────────────
# Step 1: Scan directory and find latest CSV for each (table, variable)

raw_data_dir = "../data/raw/"
filename_pattern = re.compile(
    r"IBGE_\d{8}_\d{6}_T(?P<table>\d+)-V(?P<variable>\d+)_(?P<timestamp>\d{6})\.csv"
)

latest_files = {}

for fname in os.listdir(raw_data_dir):
    match = filename_pattern.match(fname)
    if match:
        table = int(match.group("table"))
        variable = int(match.group("variable"))
        timestamp = match.group("timestamp")
        key = (table, variable)
        if key not in latest_files or timestamp > latest_files[key]["timestamp"]:
            latest_files[key] = {
                "filename": os.path.join(raw_data_dir, fname),
                "timestamp": timestamp
            }

# ───────────────────────────────────────────────
# Step 2: Join metadata with file info

# Convert to DataFrame for easy display
df_files = pd.DataFrame([
    {"table": k[0], "variable": k[1], "filename": v["filename"], "timestamp": v["timestamp"]}
    for k, v in latest_files.items()
])

# Merge with metadata
df_merged = pd.merge(
    df_ibge_series_metadata,
    df_files,
    on=["table", "variable"],
    how="inner"
).set_index("name")

# ───────────────────────────────────────────────
# Step 3: Create widget to display and select a file

# Dropdown with series names
series_dropdown = widgets.Dropdown(
    options=df_merged.index.tolist(),
    description="Series:",
    layout=widgets.Layout(width='500px')
)

# Output area for metadata preview
file_preview_out = widgets.Output()

def update_preview(change):
    with file_preview_out:
        clear_output()
        selected = change["new"]
        row = df_merged.loc[selected]

        # Display metadata
        display(Markdown(f"**Selected file:** `{row['filename']}`"))
        display(Markdown(f"**Table:** {row['table']} &nbsp;&nbsp; **Variable:** {row['variable']}"))
        display(Markdown(f"**Frequency:** {row['frequency']}  &nbsp;&nbsp; **Category:** {row['category']}"))

        # Try to load and plot data
        try:
            df = pd.read_csv(row['filename'])

            # Basic checks or parsing
            if "data" in df.columns:
                df["data"] = pd.to_datetime(df["data"], errors="coerce")
            if "valor" in df.columns:
                df["valor"] = pd.to_numeric(df["valor"], errors="coerce")

            # Plot
            sns.set_theme()
            sns.set_context("notebook")
            plt.figure(figsize=(12, 6))
            sns.lineplot(data=df, x="data", y="valor")
            plt.xlabel("Date")
            plt.ylabel("Value")
            plt.title(f"IBGE Series: {selected}")
            sns.despine()
            plt.tight_layout()
            plt.show()
        except Exception as e:
            display(Markdown(f"**Error loading or plotting data:** {e}"))

# Initial display
update_preview({"new": series_dropdown.value})
series_dropdown.observe(update_preview, names="value")

# Display
display(Markdown("### 📁 Select one of the available IBGE series from disk"))
display(series_dropdown, file_preview_out)

# ➕ You can later get the selected file like this:
# selected_filename = df_merged.loc[series_dropdown.value, "filename"]


2025-03-29 21:32:05,786 - INFO - Loaded IBGE Metadata from file: ../data/metadata/ibge_series.json



## Filter data
---

In [31]:
# Get selected filename from dropdown
selected_name = series_dropdown.value
selected_filename = df_merged.loc[selected_name, "filename"]

# Load the DataFrame
df = pd.read_csv(selected_filename)

# Convert columns if present
if "data" in df.columns:
    df["data"] = pd.to_datetime(df["data"], errors="coerce")

if "valor" in df.columns:
    df["valor"] = pd.to_numeric(df["valor"], errors="coerce")

# Natural log of valor (non-positive values become NaN)
df["log_valor"] = np.log(df["valor"].where(df["valor"] > 0))

# First difference of valor
df["diff_cycle"] = df["valor"].diff()

# Period-over-period percentage change of valor (a fraction, not multiplied by 100)
df["pct_change_valor"] = df["valor"].pct_change()

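# Hodrick-Prescott filter (lamb=1600 is the conventional value for quarterly data)
df["hp_cycle"], df["hp_trend"] = sm.tsa.filters.hpfilter(df["valor"], lamb=1600)

# Baxter-King band-pass filter: keeps cycles of 6 to 32 periods, with K=12 leads and lags
# (the first and last 12 observations are lost and appear as NaN)
df["bk_cycle"] = sm.tsa.filters.bkfilter(df["valor"], low=6, high=32, K=12)

# Christiano-Fitzgerald band-pass filter: 6 to 32 periods, no drift removal
df["cf_cycle"], df["cf_trend"] = sm.tsa.filters.cffilter(df["valor"], low=6, high=32, drift=False)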




### Plot filtered data
---

In [32]:
# Create the widget to select columns (except 'data')
column_selector = widgets.SelectMultiple(
    options=[col for col in df.columns if col != "data"],
    description="Y Columns:",
    layout=widgets.Layout(width="400px", height="200px")
)

# Output area for the plot
plot_output = widgets.Output()

# Function to update the plot
def update_plot(change):
    with plot_output:
        clear_output()
        selected = list(column_selector.value)

        if not selected:
            print("Select at least one column to plot.")
            return

        # Plot
        sns.set_theme()
        sns.set_context("notebook")
        plt.figure(figsize=(12, 6))

        for col in selected:
            plt.plot(df["data"], df[col], label=col)

        plt.xlabel("Date")
        plt.ylabel("Value")
        plt.title("Selected Columns Over Time")
        plt.legend()
        sns.despine()
        plt.tight_layout()
        plt.show()

# Connect widget to function
column_selector.observe(update_plot, names="value")

# Display UI
display(widgets.HTML("<b>Select columns to plot (X axis is always 'data'):</b> Use CTRL or CMD to select multiple rows"))
display(column_selector, plot_output)

# Initial plot
update_plot({"new": column_selector.value})

