## 1. Imports

First, we import our necessary libraries. We'll import `pandas`, `numpy`, `matplotlib`, and `seaborn`. We also import our local `RiskLabAI` modules for differentiation, bar generation, and our new plot styling utility.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Import from our RiskLabAI package
import RiskLabAI.data.differentiation.differentiation as diff
import RiskLabAI.data.structures as st
import RiskLabAI.utils.publication_plots as pub_plots

# Set plotting backend for pandas (though we'll use matplotlib directly)
pd.options.plotting.backend = "matplotlib"

## 2. Setup Publication Style

We call our utility function to set up the `matplotlib` and `seaborn` styles for the entire notebook. This ensures all plots are consistent and publication-ready.

In [None]:
pub_plots.setup_publication_style()

## 3. Load and Prepare Data

We load tick-level trade data for 'IVE' during 2020. Raw ticks arrive at irregular intervals, so we aggregate them into regularly spaced bars before differentiating.

We use the `generate_time_bar` function from our library to create **1-Business-Day** OHLCV bars. This aggregates the tick data into a daily series.
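
To make the aggregation concrete, here is a minimal sketch of daily bar construction using plain `pandas` resampling. It is not the `RiskLabAI` implementation of `generate_time_bar`; the column names `price` and `volume` and the five synthetic ticks are assumptions made only for illustration.

```python
import pandas as pd

# Synthetic ticks over two business days (hypothetical values and column names).
tick_times = pd.to_datetime([
    "2020-01-02 09:30:00",
    "2020-01-02 12:00:00",
    "2020-01-02 15:59:00",
    "2020-01-03 09:31:00",
    "2020-01-03 15:00:00",
])
ticks = pd.DataFrame(
    {
        "price": [100.0, 101.5, 101.0, 100.5, 102.0],
        "volume": [200, 100, 300, 150, 250],
    },
    index=tick_times,
)

# "B" is the pandas business-day frequency: one bar per business day.
bars = ticks["price"].resample("B").ohlc()
bars["volume"] = ticks["volume"].resample("B").sum()

print(bars)
```

Each row of `bars` summarizes one business day: the first, highest, lowest, and last traded price, plus the total traded volume.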

In [None]:
# Load tick data
dir_url = "https://raw.githubusercontent.com/risk-labratory/data/main/"
file_url = dir_url + "IVE_2020.csv"

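df = pd.read_csv(file_url, header=0)
df['dates'] = pd.to_datetime(df['dates'])
# Drop repeated rows while 'dates' is still a column: drop_duplicates ignores
# the index, so running it after set_index would also discard distinct trades
# that share the same values at different timestamps.
df.drop_duplicates(inplace=True)
df.set_index('dates', inplace=True, drop=True)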

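# Filter for standard 9:30-16:00 market hours
# (testing hour >= 9 alone would also keep ticks from 9:00 to 9:29)
minutes = df.index.hour * 60 + df.index.minute
df = df[(minutes >= 9 * 60 + 30) & (minutes < 16 * 60)]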

# Generate daily OHLCV bars
ohlcv = st.generate_time_bar(df, frequency="1B")
ohlcv.dropna(inplace=True)
close = ohlcv['close'].to_frame() # Use a DataFrame for compatibility

print("Daily OHLCV data:")
ohlcv.head()

## 4. Snippet 5.1: Plotting Fractional Weights

Before differentiating, let's visualize what the weights look like. Fractional differentiation gives us a way to control the "memory" of a time series.

* **d = 0:** Full memory (original series).
* **d = 1:** No memory (first differences; applied to log prices, these are the standard log returns).
* **0 < d < 1:** Decaying memory.

We will create a 1x2 subplot to show both `d` ranges.
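
The weights behind these three cases can be sketched with the standard binomial-series recursion, $w_0 = 1$ and $w_k = -w_{k-1}(d - k + 1)/k$. The helper name `frac_diff_weights` below is ours, chosen for illustration; it is not a `RiskLabAI` function, and `diff.plot_weights` may compute its weights differently.

```python
import numpy as np


def frac_diff_weights(d: float, size: int) -> np.ndarray:
    """Return the weights w_0 .. w_{size-1} for differencing order d."""
    weights = [1.0]
    for k in range(1, size):
        # Each weight is obtained from the previous one.
        weights.append(-weights[-1] * (d - k + 1) / k)
    return np.array(weights)


# Compare full memory (d=0), decaying memory (d=0.4), and no memory (d=1).
for d in (0.0, 0.4, 1.0):
    print(f"d={d}:", frac_diff_weights(d, size=5))
```

For `d = 0` only the current observation carries weight, and for `d = 1` the weights reduce to `1, -1` followed by zeros, which is a plain first difference. For `d = 0.4` every lag keeps a small negative weight that shrinks slowly, which is the decaying memory described above.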

In [None]:
# Create a figure with two subplots side-by-side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

# Plot weights for d in [0, 1]
diff.plot_weights((0.0, 1.0), 11, 6, ax=ax1)
# Apply our custom style
pub_plots.apply_plot_style(
    ax1, 
    'FracDiff Weights (d=[0,1])', 
    'Lag', 
    'Weight', 
    legend_title='Degree (d)'
)

# Plot weights for d in [1, 2]
diff.plot_weights((1, 2), 11, 6, ax=ax2)
# Apply our custom style
pub_plots.apply_plot_style(
    ax2, 
    'FracDiff Weights (d=[1,2])', 
    'Lag', 
    'Weight', 
    legend_title='Degree (d)'
)

plt.tight_layout()
plt.show()

## 5. Snippet 5.2: Standard (Expanding Window) FracDiff

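This method computes each differentiated value from all observations available up to that point, so the window expands over time. The earliest values are built from only a few lags, so the `threshold` parameter controls how long a "warm-up" period is discarded from the start of the output; we compare two settings, 0.01 and 0.1. We'll use a twin-Y-axis plot to show the original price and the differentiated series on different scales.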
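
As a rough sketch of the idea, assuming a weight-loss rule for the warm-up period: an output value is kept only once the share of total absolute weight that falls on lags not yet available drops to `threshold` or below. The function `expanding_frac_diff` and the synthetic random-walk prices are illustrative assumptions, not the `RiskLabAI` implementation of `fractional_difference_std`.

```python
import numpy as np
import pandas as pd


def expanding_frac_diff(series: pd.Series, d: float, threshold: float = 0.01) -> pd.Series:
    """Expanding-window fractional differencing with a weight-loss warm-up."""
    values = series.to_numpy(dtype=float)
    n = len(values)

    # Weights w_0 .. w_{n-1} from the recursion w_k = -w_{k-1} * (d - k + 1) / k.
    w = np.ones(n)
    for k in range(1, n):
        w[k] = -w[k - 1] * (d - k + 1) / k

    # weight_loss[t]: share of total |weight| on lags beyond t (not yet observable).
    cum = np.cumsum(np.abs(w))
    weight_loss = 1.0 - cum / cum[-1]

    out = {}
    for t in range(n):
        if weight_loss[t] > threshold:
            continue  # still in the warm-up period
        # w_0 * x_t + w_1 * x_{t-1} + ... + w_t * x_0
        out[series.index[t]] = float(np.dot(w[: t + 1], values[t::-1]))
    return pd.Series(out, dtype=float, name=series.name)


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
prices = pd.Series(
    100 + rng.normal(size=200).cumsum(),
    index=pd.bdate_range("2020-01-01", periods=200),
)
strict = expanding_frac_diff(prices, d=0.4, threshold=0.01)
loose = expanding_frac_diff(prices, d=0.4, threshold=0.1)
print(len(strict), len(loose))
```

Under this rule the smaller threshold discards more of the early observations, so `strict` is the shorter of the two series.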

In [None]:
# Calculate standard fractional differentiation
close_fd_std_short = diff.fractional_difference_std(
    close, 
    degree=0.4, 
    threshold=0.01
).iloc[:, 0]

close_fd_std_long = diff.fractional_difference_std(
    close, 
    degree=0.4, 
    threshold=0.1
).iloc[:, 0]

# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))

# Plot 1 (Primary Y-axis)
ax.plot(close.index, close['close'], label='Original Close', color='C0')

# Create secondary Y-axis
ax2 = ax.twinx()
ax2.plot(close_fd_std_short.index, close_fd_std_short, label='FracDiff (Thresh=0.01)', color='C1', linestyle='--')
ax2.plot(close_fd_std_long.index, close_fd_std_long, label='FracDiff (Thresh=0.1)', color='C2', linestyle=':')

# Apply styling
pub_plots.apply_plot_style(
    ax, 
    'Standard (Expanding Window) FracDiff', 
    'Date', 
    'Price ($)'
)
ax2.set_ylabel('Differentiated Value')

# Combine legends from both axes
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper left')

# Turn off the grid for the secondary axis to avoid clutter
ax2.grid(False)

plt.show()

## 6. Snippet 5.3: Fixed-Width Window (FFD) FracDiff

The Fixed-Width Window (FFD) method is more efficient. It drops every weight whose magnitude falls below `threshold`, so each output value is computed from the same fixed number of lags. This is the most practical method and the one we recommend.
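
A minimal sketch of the fixed-width idea follows, assuming the weights are generated by the usual recursion and truncated at the first weight smaller than `threshold` in absolute value. The names `ffd_weights` and `ffd`, the `max_lags` cap, and the synthetic prices are hypothetical and not part of `RiskLabAI`.

```python
import numpy as np
import pandas as pd


def ffd_weights(d: float, threshold: float = 1e-3, max_lags: int = 10_000) -> np.ndarray:
    """Generate weights until the next one is smaller than threshold in magnitude."""
    weights = [1.0]
    for k in range(1, max_lags):
        w_k = -weights[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        weights.append(w_k)
    return np.array(weights)


def ffd(series: pd.Series, d: float, threshold: float = 1e-3) -> pd.Series:
    """Apply the same truncated weight vector to every full window."""
    w = ffd_weights(d, threshold)
    width = len(w)
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    for t in range(width - 1, len(values)):
        # Reverse the window so it reads x_t, x_{t-1}, ..., matching w_0, w_1, ...
        window = values[t - width + 1 : t + 1][::-1]
        out[t] = np.dot(w, window)
    return pd.Series(out, index=series.index, name=series.name).dropna()


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
prices = pd.Series(
    100 + rng.normal(size=300).cumsum(),
    index=pd.bdate_range("2020-01-01", periods=300),
)
prices_ffd = ffd(prices, d=0.4, threshold=1e-3)
print(len(ffd_weights(0.4)), len(prices_ffd))
```

Because the window width is fixed, the only observations lost are the first `width - 1`, and every remaining value is built from the same weights.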

In [None]:
# Calculate FFD
close_ffd = diff.fractional_difference_fixed(
    close, 
    degree=0.4, 
    threshold=1e-3
).iloc[:, 0]

# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))

# Plot 1 (Primary Y-axis)
ax.plot(close.index, close['close'], label='Original Close', color='C0')

# Create secondary Y-axis
ax2 = ax.twinx()
ax2.plot(close_ffd.index, close_ffd, label='FFD (d=0.4)', color='C1', linestyle='--')

# Apply styling
pub_plots.apply_plot_style(
    ax, 
    'Fixed-Width Window (FFD) FracDiff', 
    'Date', 
    'Price ($)'
)
ax2.set_ylabel('Differentiated Value')

# Combine legends
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper left')
ax2.grid(False)

plt.show()

## 7. Snippet 5.4: Finding the Optimal 'd'

The main goal of differentiation is to make a time series **stationary** while preserving as much memory (correlation) as possible.

We iterate through `d` values and find the *minimum `d`* that passes the Augmented Dickey-Fuller (ADF) test for stationarity (i.e., `adfStat` < `95% conf`).
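
The selection rule itself can be sketched in a few lines. The table below is a toy example with made-up numbers (not the IVE results); it only reuses the column names `adfStat` and `95% conf` from the output of `find_optimal_ffd_simple`. The helper `minimum_stationary_d` is hypothetical.

```python
import pandas as pd


def minimum_stationary_d(results: pd.DataFrame):
    """Return the smallest d whose ADF statistic is below the 95% critical value."""
    passing = results[results["adfStat"] < results["95% conf"]]
    return passing.index.min() if not passing.empty else None


# Toy table indexed by d; values are invented purely to illustrate the rule.
toy_results = pd.DataFrame(
    {
        "adfStat": [-1.0, -2.0, -3.0, -4.0],
        "95% conf": [-2.5, -2.5, -2.5, -2.5],
    },
    index=pd.Index([0.0, 0.1, 0.2, 0.3], name="d"),
)

best_d = minimum_stationary_d(toy_results)
print(best_d)
```

Taking the minimum passing `d`, rather than any passing `d`, is what keeps as much memory as possible in the differentiated series.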

In [None]:
# Find the optimal 'd' by testing for stationarity
out = diff.find_optimal_ffd_simple(ohlcv, p_value_threshold=0.05)

print("ADF Test Results per 'd':")
print(out)

Now, let's plot the results. We want to find the lowest `d` where the **ADF Statistic** (the line with `x` markers, read on the right-hand axis) drops *below* the **95% confidence level** (green dashed line).

In [None]:
# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))

# Plot Correlation (Primary Y-axis)
ax.plot(out.index, out['corr'], label='Correlation', color='C0', marker='o')

# Create secondary Y-axis for ADF Stat
ax2 = ax.twinx()
ax2.plot(out.index, out['adfStat'], label='ADF Statistic', color='C1', marker='x')

# Add 95% confidence line
ax2.axhline(
    y=out['95% conf'].mean(), 
    color='green', 
    linestyle='--', 
    linewidth=2, 
    label='95% Confidence'
)

# Apply styling
pub_plots.apply_plot_style(
    ax, 
    "Finding Minimum 'd' for Stationarity", 
    'Degree (d)', 
    'Correlation'
)
ax2.set_ylabel('ADF Statistic')

# Combine legends
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper right')
ax2.grid(False)

plt.show()

**Analysis:** Based on the table and plot, a `d` value of **0.3** is the first to pass the test: its `adfStat` (-3.04) is less than the `95% conf` (-2.87), and its `pVal` (0.03) is below our 0.05 threshold.

## 8. Applying the Optimal 'd'

Finally, we use the `fractionally_differentiated_log_price` function to automate the process of finding the minimum `d` and applying it to our log-price series.

In [None]:
# This function automatically finds the minimum 'd' and returns the series
log_price_series = np.log(ohlcv['close'])

optimal_series = diff.fractionally_differentiated_log_price(
    log_price_series, 
    step=0.1,  # We can use a coarser step for speed
    p_value_threshold=0.05
)

print(f"Optimal series has {optimal_series.shape[0]} observations.")

# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))

# Plot Log Price (Primary Y-axis)
ax.plot(log_price_series.index, log_price_series, label='Log Price (Non-Stationary)', color='C0')

# Create secondary Y-axis
ax2 = ax.twinx()
ax2.plot(optimal_series.index, optimal_series, label='Optimal FFD Series (Stationary)', color='C1', linestyle='--')

# Apply styling
pub_plots.apply_plot_style(
    ax, 
    'Original Log-Price vs. Optimal Stationary Series', 
    'Date', 
    'Log Price'
)
ax2.set_ylabel('Stationary Series')

# Combine legends
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper left')
ax2.grid(False)

plt.show()