# ⭐ Tutorial: Fractional Differentiation with RiskLabAI

This notebook is a tutorial for the fractional differentiation functions in the `RiskLabAI` library, based on Chapter 5 of 'Advances in Financial Machine Learning' by Marcos López de Prado.

We will demonstrate:
1.  **Visualize Weights:** Plot the fractional differentiation weights for various `d` values.
2.  **Standard FracDiff:** Apply the standard (expanding window) method.
3.  **FFD FracDiff:** Apply the more efficient Fixed-Width Window (FFD) method.
4.  **Find Optimal 'd':** Use the `find_optimal_ffd_simple` function to find the minimum `d` that makes a series stationary (via the ADF test).
5.  **Automated FracDiff:** Use the helper function `fractionally_differentiated_log_price` to automatically find and apply the optimal `d`.

## 0. Setup and Imports

First, we import our libraries and the necessary modules from `RiskLabAI`, including our plotting utility.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Import from our RiskLabAI package
import RiskLabAI.data.differentiation.differentiation as diff
import RiskLabAI.data.structures as st
import RiskLabAI.utils.publication_plots as pub_plots

# Set plotting backend for pandas
pd.options.plotting.backend = "matplotlib"

# Apply global publication style
pub_plots.setup_publication_style()

Matplotlib style updated for publication.


## 1. Load and Prepare Data

We load tick-level trade data for 'IVE' during 2020. Tick data is too granular and irregularly spaced for this analysis, so we use the `generate_time_bar` function to aggregate it into **1-Business-Day** OHLCV bars.

In [2]:
# Load tick data
dir_url = "https://raw.githubusercontent.com/risk-labratory/data/main/"
file_url = dir_url + "IVE_2020.csv"

df = pd.read_csv(file_url, header=0)
df['dates'] = pd.to_datetime(df['dates'])
df.set_index('dates', inplace=True, drop=True)
df.drop_duplicates(inplace=True)

# Filter for standard 9:30-16:00 market hours
# (filtering on index.hour alone would also keep 9:00-9:29)
df = df.between_time("09:30", "16:00")

# Generate daily OHLCV bars
ohlcv = st.generate_time_bar(df, frequency="1B")
ohlcv.dropna(inplace=True)
close = ohlcv['close'].to_frame() # Use a DataFrame for compatibility

print("Daily OHLCV data:")
ohlcv.head()

AttributeError: module 'RiskLabAI.data.structures' has no attribute 'generate_time_bar'
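The `AttributeError` above means the installed `RiskLabAI.data.structures` module does not expose `generate_time_bar`. As a possible fallback, daily bars can be built with plain pandas. The sketch below is not the library's method: it runs on synthetic ticks, and the `price` and `volume` column names, the `daily_bars` helper, and the `*_demo` variables are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic one-minute "ticks" standing in for the real file.
# The 'price' and 'volume' column names are assumptions, not taken from IVE_2020.csv.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-02 00:00", "2020-01-08 23:59", freq="1min")
ticks = pd.DataFrame(
    {
        "price": 100 + rng.normal(0, 0.05, len(idx)).cumsum(),
        "volume": rng.integers(1, 500, len(idx)),
    },
    index=idx,
)

# Keep weekdays and regular trading hours only.
ticks = ticks[ticks.index.dayofweek < 5].between_time("09:30", "16:00")


def daily_bars(ticks):
    """Aggregate ticks into business-day OHLCV bars using pandas only (hypothetical helper)."""
    bars = ticks["price"].resample("B").ohlc()          # open, high, low, close
    bars["volume"] = ticks["volume"].resample("B").sum()
    return bars.dropna()


ohlcv_demo = daily_bars(ticks)
close_demo = ohlcv_demo[["close"]]  # DataFrame, matching the shape used later in the notebook
print(ohlcv_demo)
```

With the real data, the same helper would be called on `df` after renaming its columns to match, and the result assigned to `ohlcv` so the later cells can run.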

## 2. Snippet 5.1: Plotting Fractional Weights

Before differentiating, let's visualize the weights. Fractional differentiation controls the "memory" of a time series:
* **d = 0:** Full memory (original series).
* **d = 1:** No memory (first differences, which are log returns when the input is a log-price series).
* **0 < d < 1:** Decaying memory.
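The three cases above follow from the weight recursion in Chapter 5: starting from `w_0 = 1`, each weight is `w_k = -w_{k-1} * (d - k + 1) / k`. A minimal sketch in plain Python shows this; `fracdiff_weights` is a hypothetical helper written for this illustration, not a `RiskLabAI` function.

```python
def fracdiff_weights(d, size):
    """Return the first `size` fractional-differentiation weights for degree d.

    Hypothetical helper: weights[k] multiplies the observation k steps in the past.
    """
    weights = [1.0]  # w_0 is always 1
    for k in range(1, size):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


for d in (0.0, 0.4, 1.0):
    print(d, [round(w, 4) for w in fracdiff_weights(d, 5)])
```

For `d = 0` only `w_0` is non-zero, so the series is unchanged. For `d = 1` the weights are `1, -1` followed by zeros, which is the ordinary first difference. For `d = 0.4` the weights begin `1, -0.4, -0.12, -0.064` and decay slowly, which is what "decaying memory" means.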

In [None]:
# Create a figure with two subplots side-by-side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 6))

# Plot weights for d in [0, 1]
diff.plot_weights((0.0, 1.0), 11, 6, ax=ax1)
pub_plots.apply_plot_style(
    ax1, 
    'FracDiff Weights (d=[0,1])', 
    'Lag', 
    'Weight', 
    legend_title='Degree (d)'
)

# Plot weights for d in [1, 2]
diff.plot_weights((1, 2), 11, 6, ax=ax2)
pub_plots.apply_plot_style(
    ax2, 
    'FracDiff Weights (d=[1,2])', 
    'Lag', 
    'Weight', 
    legend_title='Degree (d)'
)

plt.tight_layout()
plt.show()

## 3. Snippet 5.2: Standard (Expanding Window) FracDiff

This method calculates the differentiated series using an expanding window. We'll use a twin-Y-axis plot to show the original price and the differentiated series on different scales.
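In the book's expanding-window snippet, `threshold` is a tolerance on the relative weight lost at the start of the sample: early observations have too little history behind them and are skipped. The sketch below reproduces that skip count under the assumption that `fractional_difference_std` follows the same logic. The helper names are hypothetical.

```python
import numpy as np


def expanding_weights(d, size):
    """Weights for an expanding window, ordered from the oldest lag to the current one."""
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w[::-1])


def skipped_observations(d, n_obs, threshold):
    """Number of initial observations dropped for a given weight-loss threshold."""
    w = expanding_weights(d, n_obs)
    cumulative = np.cumsum(np.abs(w))
    cumulative /= cumulative[-1]          # relative weight, ends at exactly 1.0
    return int((cumulative > threshold).sum())


n_obs = 250  # roughly one year of daily bars
for threshold in (0.01, 0.1, 1.0):
    print(threshold, skipped_observations(0.4, n_obs, threshold))
```

A smaller threshold discards more of the initial observations, so under this assumption the `threshold=0.01` series is shorter than the `threshold=0.1` series, and `threshold=1.0` keeps every observation.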

In [None]:
# Calculate standard fractional differentiation
close_fd_std_short = diff.fractional_difference_std(
    close, 
    degree=0.4, 
    threshold=0.01
).iloc[:, 0]

close_fd_std_long = diff.fractional_difference_std(
    close, 
    degree=0.4, 
    threshold=0.1
).iloc[:, 0]

# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))
ax.plot(close.index, close['close'], label='Original Close', color='C0')

# Create secondary Y-axis
ax2 = ax.twinx()
ax2.plot(close_fd_std_short.index, close_fd_std_short, label='FracDiff (Thresh=0.01)', color='C1', linestyle='--')
ax2.plot(close_fd_std_long.index, close_fd_std_long, label='FracDiff (Thresh=0.1)', color='C2', linestyle=':')

# Apply styling
pub_plots.apply_plot_style(
    ax, 
    'Standard (Expanding Window) FracDiff', 
    'Date', 
    'Price ($)'
)
ax2.set_ylabel('Differentiated Value')

lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper left')
ax2.grid(False)

plt.show()

## 4. Snippet 5.3: Fixed-Width Window (FFD) FracDiff

The Fixed-Width Window (FFD) method drops every weight whose absolute value falls below `threshold`, so the same fixed-length weight vector is applied to every observation. This makes it more efficient than the expanding window, and it is the most practical and recommended method. Our library implements it efficiently using `np.convolve`.
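To make the idea concrete, here is a minimal NumPy sketch of a fixed-width window computed with `np.convolve`. It is an illustration under stated assumptions, not the code behind `fractional_difference_fixed`: the function names are hypothetical, `d` is assumed non-negative, and the input is assumed to be longer than the weight vector.

```python
import numpy as np


def ffd_weights(d, threshold=1e-3):
    """Weights w_0, w_1, ... truncated at the first |w_k| below `threshold`."""
    if d < 0:
        raise ValueError("this sketch assumes d >= 0 so the weights decay")
    w = [1.0]
    k = 1
    while True:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
        k += 1
    return np.array(w)  # w[0] applies to the current observation


def ffd(values, d, threshold=1e-3):
    """Fixed-width fractional difference of a 1-D array (assumes len(values) >= len(weights))."""
    w = ffd_weights(d, threshold)
    # np.convolve flips the kernel, so out[t] = sum_k w[k] * values[t - k];
    # mode="valid" keeps only points that have a full window of history.
    return np.convolve(values, w, mode="valid")


rng = np.random.default_rng(0)
prices = 100 + rng.normal(0, 1, 500).cumsum()  # synthetic series for illustration
out = ffd(prices, d=0.4)
print(len(ffd_weights(0.4)), len(out))
```

The output is shorter than the input by the window length minus one, because the first observations lack a full window. With `d = 1` the sketch reduces to `np.diff`, and with `d = 0` it returns the input unchanged.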

In [None]:
# Calculate FFD
close_ffd = diff.fractional_difference_fixed(
    close, 
    degree=0.4, 
    threshold=1e-3
).iloc[:, 0]

# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))
ax.plot(close.index, close['close'], label='Original Close', color='C0')

ax2 = ax.twinx()
ax2.plot(close_ffd.index, close_ffd, label='FFD (d=0.4)', color='C1', linestyle='--')

pub_plots.apply_plot_style(
    ax, 
    'Fixed-Width Window (FFD) FracDiff', 
    'Date', 
    'Price ($)'
)
ax2.set_ylabel('Differentiated Value')

lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper left')
ax2.grid(False)

plt.show()

## 5. Snippet 5.4: Finding the Optimal 'd'

The main goal is to make a time series **stationary** while preserving as much memory as possible. We iterate through `d` values to find the *minimum `d`* that passes the Augmented Dickey-Fuller (ADF) test for stationarity.

In [None]:
# Find the optimal 'd' by testing for stationarity
out = diff.find_optimal_ffd_simple(ohlcv, p_value_threshold=0.05)

print("ADF Test Results per 'd':")
print(out)

Let's plot the results. We want the lowest `d` at which the **ADF statistic** (`x` markers, right-hand axis) drops *below* the **95% critical value** (green dashed line, labelled '95% Confidence').

In [None]:
# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))

ax.plot(out.index, out['corr'], label='Correlation', color='C0', marker='o')

ax2 = ax.twinx()
ax2.plot(out.index, out['adfStat'], label='ADF Statistic', color='C1', marker='x')
ax2.axhline(
    y=out['95% conf'].mean(), 
    color='green', 
    linestyle='--', 
    linewidth=2, 
    label='95% Confidence'
)

pub_plots.apply_plot_style(
    ax, 
    "Finding Minimum 'd' for Stationarity", 
    'Degree (d)', 
    'Correlation'
)
ax2.set_ylabel('ADF Statistic')

lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper right')
ax2.grid(False)

plt.show()

**Analysis:** Based on the table and plot, a `d` value of **0.3** is the first to pass the test: its `adfStat` (-3.04) is less than the `95% conf` (-2.87), and its `pVal` (0.03) is below our 0.05 threshold.

## 6. Applying the Optimal 'd'

Finally, we use the `fractionally_differentiated_log_price` function, which automates this search process.

In [None]:
# This function automatically finds the minimum 'd' and returns the series
log_price_series = np.log(ohlcv['close'])

optimal_series = diff.fractionally_differentiated_log_price(
    log_price_series, 
    step=0.1,  # We can use a coarser step for speed
    p_value_threshold=0.05
)

print(f"Optimal series has {optimal_series.shape[0]} observations.")

# --- Plotting ---
fig, ax = plt.subplots(figsize=(14, 7))
ax.plot(log_price_series.index, log_price_series, label='Log Price (Non-Stationary)', color='C0')

ax2 = ax.twinx()
ax2.plot(optimal_series.index, optimal_series, label='Optimal FFD Series (Stationary)', color='C1', linestyle='--')

pub_plots.apply_plot_style(
    ax, 
    'Original Log-Price vs. Optimal Stationary Series', 
    'Date', 
    'Log Price'
)
ax2.set_ylabel('Stationary Series')

lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax.legend(lines + lines2, labels + labels2, loc='upper left')
ax2.grid(False)

plt.show()

## 7. Conclusion

This notebook demonstrates the core concepts of fractional differentiation from Chapter 5. We have shown:

1.  **Why it's needed:** Standard price series are non-stationary, while standard returns (`d=1`) lose all memory. 
2.  **How it works:** The FFD method (`fractional_difference_fixed`) provides an efficient way to compute a fractionally differentiated series.
3.  **The Goal:** The `find_optimal_ffd_simple` function confirms we can find a minimum `d` (e.g., `d=0.3`) that achieves stationarity while preserving memory.

This makes the resulting series a much better feature for use in machine learning models.