# Getting Started with the MFE Toolbox

This notebook introduces the MFE Toolbox, a Python suite for financial econometrics, time series analysis, and risk modeling. The toolbox is a modernization of the original MATLAB-based toolbox, reimplemented in Python 3.12 with a focus on performance, usability, and integration with the Python scientific ecosystem.

## What is the MFE Toolbox?

The MFE Toolbox provides a wide range of tools for financial econometrics and time series analysis, including:

- **Univariate volatility modeling**: GARCH, EGARCH, TARCH, and other variants
- **Multivariate volatility modeling**: BEKK, DCC, RARCH, and related models
- **ARMA/ARMAX time series modeling and forecasting**
- **Bootstrap methods for dependent data**
- **Non-parametric volatility estimation** (realized volatility)
- **Classical statistical tests and distributions**
- **Vector autoregression (VAR) analysis**
- **Principal component analysis and cross-sectional econometrics**

This notebook will guide you through the installation process and demonstrate basic usage of the toolbox's core functionality.

## Installation

The MFE Toolbox can be installed using pip, Python's package installer. It's recommended to install the toolbox in a virtual environment to avoid conflicts with other packages.

### Using pip

```bash
pip install mfe-toolbox
```

### Using a virtual environment

```bash
# Create a virtual environment
python -m venv mfe_env

# Activate the environment (Windows)
mfe_env\Scripts\activate

# Activate the environment (Unix/Linux/macOS)
source mfe_env/bin/activate

# Install MFE Toolbox in the virtual environment
pip install mfe-toolbox
```

### Dependencies

The MFE Toolbox depends on several Python packages that will be automatically installed:

- NumPy (≥1.26.0): For efficient array operations and linear algebra
- SciPy (≥1.11.3): For scientific computing and optimization routines
- Pandas (≥2.1.1): For time series data handling and analysis
- Statsmodels (≥0.14.0): For econometric modeling and statistical analysis
- Numba (≥0.58.0): For JIT compilation of performance-critical functions
- matplotlib (≥3.8.0): For visualization
- PyQt6 (≥6.5.0): For GUI components (optional, only needed for the ARMAX interface)
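
To see which versions actually ended up in the active environment, a small check along these lines can help. It is a sketch that relies only on the standard library's `importlib.metadata`. The distribution names are assumed to match the PyPI names of the packages listed above, and `installed_versions` is a hypothetical helper, not part of the toolbox.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed to match the dependency list above
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "numba", "matplotlib", "PyQt6"]

def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:12s} {ver if ver is not None else 'not installed'}")
```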

## Importing the MFE Toolbox

Once installed, you can import the MFE Toolbox in your Python code. Let's start by importing the main package and checking the version:

In [None]:
import mfe

print(f"MFE Toolbox version: {mfe.__version__}")

The MFE Toolbox is organized into several subpackages:

- `mfe.core`: Core functionality and base classes
- `mfe.models`: Implementation of econometric models
- `mfe.utils`: Helper functions for data transformation and analysis
- `mfe.ui`: User interface components

Let's import some commonly used components:

In [None]:
# Import commonly used models directly from the mfe package
from mfe import GARCH, ARMA, DCC, BlockBootstrap, RealizedVariance

# Import specific modules as needed
from mfe.models.univariate import EGARCH, TARCH
from mfe.models.time_series import VAR
from mfe.models.bootstrap import StationaryBootstrap
from mfe.models.distributions import StudentT

# Import utility functions
from mfe.utils import matrix_ops, covariance

# Import NumPy and Pandas for data handling
import numpy as np
import pandas as pd

# Import matplotlib for visualization
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

## Working with Data

The MFE Toolbox works with NumPy arrays and Pandas DataFrames for data handling. Let's create some sample financial data to work with:

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Create a date range for our time series
dates = pd.date_range(start='2020-01-01', periods=1000, freq='D')

# Generate simulated returns with volatility clustering
# (a simple AR(1)-GARCH(1,1) process)
n = len(dates)
returns = np.zeros(n)
innovations = np.zeros(n)  # GARCH shocks: returns net of the AR(1) mean
volatility = np.zeros(n)
volatility[0] = 0.01

# Parameters
omega = 0.00001
alpha = 0.1
beta = 0.85
phi = 0.2  # AR(1) coefficient

for t in range(1, n):
    # Variance equation: driven by the lagged squared shock, not the lagged squared return
    volatility[t] = np.sqrt(omega + alpha * innovations[t-1]**2 + beta * volatility[t-1]**2)

    # Return equation with AR(1) component
    innovations[t] = volatility[t] * np.random.standard_normal()
    returns[t] = phi * returns[t-1] + innovations[t]

# Create a Pandas DataFrame
df = pd.DataFrame({
    'returns': returns,
    'volatility': volatility
}, index=dates)

# Display the first few rows
df.head()
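
As a quick sanity check on the parameters, recall that a standard GARCH(1,1) with `alpha + beta < 1` has unconditional variance `omega / (1 - alpha - beta)`. The short calculation below works this out for the values used in the simulation. It uses only the standard library, and the half-life formula is the usual geometric-decay result rather than anything specific to the toolbox.

```python
import math

omega, alpha, beta = 0.00001, 0.1, 0.85  # same values as in the simulation above

persistence = alpha + beta
uncond_variance = omega / (1.0 - persistence)
uncond_volatility = math.sqrt(uncond_variance)

# Number of periods for a variance shock to decay by half
half_life = math.log(0.5) / math.log(persistence)

print(f"Persistence:              {persistence:.2f}")
print(f"Unconditional volatility: {uncond_volatility:.6f}")
print(f"Shock half-life (days):   {half_life:.1f}")
```

The long-run volatility comes out at roughly 0.014, so the starting value of 0.01 sits a little below the level the simulated series settles around.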

In [None]:
# Plot the returns and volatility
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

ax1.plot(df.index, df['returns'])
ax1.set_title('Simulated Returns')
ax1.set_ylabel('Returns')

ax2.plot(df.index, df['volatility'])
ax2.set_title('True Volatility')
ax2.set_ylabel('Volatility')
ax2.set_xlabel('Date')

plt.tight_layout()
plt.show()

## Example 1: Univariate Volatility Modeling with GARCH

Let's start with a basic example of fitting a GARCH(1,1) model to our simulated returns data:

In [None]:
# Create a GARCH(1,1) model
garch_model = GARCH(p=1, q=1, mean='constant')

# Fit the model to our returns data
garch_results = garch_model.fit(df['returns'])

# Display the model summary
print(garch_results)

In [None]:
# Extract the estimated conditional volatility
df['garch_volatility'] = np.sqrt(garch_results.conditional_variance)

# Plot the true volatility vs. estimated volatility
plt.figure(figsize=(12, 6))
plt.plot(df.index, df['volatility'], label='True Volatility')
plt.plot(df.index, df['garch_volatility'], label='GARCH(1,1) Volatility', alpha=0.7)
plt.title('True vs. GARCH(1,1) Estimated Volatility')
plt.xlabel('Date')
plt.ylabel('Volatility')
plt.legend()
plt.show()
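
To make the fitting step less of a black box, here is a minimal NumPy sketch of what a GARCH(1,1) estimator evaluates for one candidate parameter vector: the variance recursion followed by a Gaussian log-likelihood. This is not the toolbox's implementation. The function names are hypothetical, and starting the recursion at the sample variance is just one common convention.

```python
import numpy as np

def garch11_variance(resid, omega, alpha, beta):
    """Filter conditional variances for given GARCH(1,1) parameters."""
    resid = np.asarray(resid, dtype=float)
    sigma2 = np.empty_like(resid)
    sigma2[0] = resid.var()  # start-up value: sample variance (an assumption)
    for t in range(1, resid.size):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def gaussian_neg_loglik(resid, sigma2):
    """Negative Gaussian log-likelihood that an optimizer would minimize."""
    resid = np.asarray(resid, dtype=float)
    return 0.5 * np.sum(np.log(2.0 * np.pi) + np.log(sigma2) + resid ** 2 / sigma2)

rng = np.random.default_rng(0)
demo_resid = 0.01 * rng.standard_normal(500)  # placeholder residuals
demo_sigma2 = garch11_variance(demo_resid, omega=1e-5, alpha=0.1, beta=0.85)
print(gaussian_neg_loglik(demo_resid, demo_sigma2))
```

An estimator would wrap these two functions in a numerical optimizer and search over `omega`, `alpha`, and `beta` subject to positivity and stationarity constraints.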

### Forecasting with GARCH

Now let's generate volatility forecasts using our fitted GARCH model:

In [None]:
# Generate 30-day ahead forecasts
forecast_horizon = 30
forecasts = garch_results.forecast(horizon=forecast_horizon)

# Create a date range for the forecast period
forecast_dates = pd.date_range(start=df.index[-1] + pd.Timedelta(days=1), periods=forecast_horizon, freq='D')

# Create a DataFrame for the forecasts
forecast_df = pd.DataFrame({
    'volatility_forecast': np.sqrt(forecasts.variance),
    'volatility_lower': np.sqrt(forecasts.variance_lower),
    'volatility_upper': np.sqrt(forecasts.variance_upper)
}, index=forecast_dates)

# Plot the historical volatility and forecasts
plt.figure(figsize=(12, 6))

# Plot historical volatility (last 90 days)
plt.plot(df.index[-90:], df['garch_volatility'].iloc[-90:], label='Historical Volatility')

# Plot forecasts
plt.plot(forecast_df.index, forecast_df['volatility_forecast'], label='Volatility Forecast', color='red')
plt.fill_between(forecast_df.index, 
                 forecast_df['volatility_lower'], 
                 forecast_df['volatility_upper'], 
                 color='red', alpha=0.2, label='95% Confidence Interval')

plt.title('GARCH(1,1) Volatility Forecast')
plt.xlabel('Date')
plt.ylabel('Volatility')
plt.legend()
plt.show()
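
For a GARCH(1,1), multi-step variance forecasts have a closed form: they decay geometrically from the one-step forecast toward the unconditional variance at rate `alpha + beta`. The sketch below illustrates that textbook formula. The function name and the input values are made up for illustration, and the toolbox's `forecast` method may compute its output differently.

```python
import numpy as np

def garch11_variance_forecast(omega, alpha, beta, last_resid, last_sigma2, horizon):
    """Analytic h-step-ahead variance forecasts for a GARCH(1,1)."""
    persistence = alpha + beta
    long_run = omega / (1.0 - persistence)
    one_step = omega + alpha * last_resid ** 2 + beta * last_sigma2
    steps = np.arange(horizon)  # 0 for the one-step forecast, 1 for two steps, ...
    return long_run + persistence ** steps * (one_step - long_run)

variance_path = garch11_variance_forecast(
    omega=1e-5, alpha=0.1, beta=0.85,
    last_resid=0.03, last_sigma2=0.0004, horizon=30,
)
print(np.sqrt(variance_path[[0, 9, 29]]))  # volatility at horizons 1, 10 and 30
```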

## Example 2: Time Series Analysis with ARMA

Now let's fit an ARMA model to our returns data to capture the serial correlation:

In [None]:
# Create an ARMA(1,1) model
arma_model = ARMA(ar_order=1, ma_order=1, include_constant=True)

# Fit the model to our returns data
arma_results = arma_model.fit(df['returns'])

# Display the model summary
print(arma_results)

In [None]:
# Plot the ACF and PACF of the returns
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

# ACF
lags = 20
acf_values = arma_results.acf(lags=lags)
ax1.bar(range(lags+1), acf_values, width=0.3)
ax1.axhline(y=0, linestyle='-', color='black', alpha=0.3)
ax1.axhline(y=1.96/np.sqrt(len(df)), linestyle='--', color='blue', alpha=0.7)
ax1.axhline(y=-1.96/np.sqrt(len(df)), linestyle='--', color='blue', alpha=0.7)
ax1.set_title('Autocorrelation Function (ACF)')
ax1.set_xlabel('Lag')
ax1.set_ylabel('Correlation')

# PACF
pacf_values = arma_results.pacf(lags=lags)
ax2.bar(range(lags+1), pacf_values, width=0.3)
ax2.axhline(y=0, linestyle='-', color='black', alpha=0.3)
ax2.axhline(y=1.96/np.sqrt(len(df)), linestyle='--', color='blue', alpha=0.7)
ax2.axhline(y=-1.96/np.sqrt(len(df)), linestyle='--', color='blue', alpha=0.7)
ax2.set_title('Partial Autocorrelation Function (PACF)')
ax2.set_xlabel('Lag')
ax2.set_ylabel('Partial Correlation')

plt.tight_layout()
plt.show()
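
The dashed lines in these plots sit at ±1.96/√n, the approximate 95% band for the sample autocorrelations of white noise. As a cross-check that does not depend on the toolbox, the sample ACF can be computed directly with NumPy. `sample_acf` below is a hypothetical helper written for this notebook.

```python
import numpy as np

def sample_acf(x, lags):
    """Sample autocorrelations for lags 0..lags (autocovariances scaled by lag 0)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    acf = np.empty(lags + 1)
    for k in range(lags + 1):
        acf[k] = np.dot(centered[k:], centered[: x.size - k]) / denom
    return acf

rng = np.random.default_rng(1)
noise = rng.standard_normal(1000)
band = 1.96 / np.sqrt(noise.size)
acf_noise = sample_acf(noise, lags=20)
print(f"Lags outside the band: {np.sum(np.abs(acf_noise[1:]) > band)} of 20")
```

For pure noise, about one lag in twenty is expected to fall outside the band by chance, so an isolated exceedance is not evidence of serial correlation.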

In [None]:
# Generate forecasts from the ARMA model
arma_forecasts = arma_results.forecast(horizon=30)

# Create a DataFrame for the forecasts
arma_forecast_df = pd.DataFrame({
    'point_forecast': arma_forecasts.mean,
    'lower_bound': arma_forecasts.mean_lower,
    'upper_bound': arma_forecasts.mean_upper
}, index=forecast_dates)

# Plot the historical returns and forecasts
plt.figure(figsize=(12, 6))

# Plot historical returns (last 90 days)
plt.plot(df.index[-90:], df['returns'].iloc[-90:], label='Historical Returns')

# Plot forecasts
plt.plot(arma_forecast_df.index, arma_forecast_df['point_forecast'], label='Return Forecast', color='green')
plt.fill_between(arma_forecast_df.index, 
                 arma_forecast_df['lower_bound'], 
                 arma_forecast_df['upper_bound'], 
                 color='green', alpha=0.2, label='95% Confidence Interval')

plt.title('ARMA(1,1) Return Forecast')
plt.xlabel('Date')
plt.ylabel('Returns')
plt.legend()
plt.show()
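
The shape of the point forecast follows from the ARMA recursion itself: the moving-average term only affects the first step, after which the forecast decays toward the long-run mean at rate `phi`. Here is a small sketch under the assumed parameterization `y_t = const + phi*y_{t-1} + eps_t + theta*eps_{t-1}`. The toolbox may define the constant differently, and the numbers are purely illustrative.

```python
import numpy as np

def arma11_forecast(const, phi, theta, last_obs, last_resid, horizon):
    """Point forecasts for y_t = const + phi*y_{t-1} + eps_t + theta*eps_{t-1}."""
    forecasts = np.empty(horizon)
    # One step ahead: the last residual is known, the future shock has mean zero
    forecasts[0] = const + phi * last_obs + theta * last_resid
    # Further ahead: all shocks have mean zero, leaving only the AR recursion
    for h in range(1, horizon):
        forecasts[h] = const + phi * forecasts[h - 1]
    return forecasts

path = arma11_forecast(const=0.0005, phi=0.2, theta=0.1,
                       last_obs=0.02, last_resid=0.015, horizon=30)
print(path[:3], path[-1])
```

With `phi = 0.2` the forecast is essentially at the long-run mean `const / (1 - phi)` within a handful of steps, which is why return forecasts from such models look flat.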

## Example 3: Bootstrap Methods

The MFE Toolbox provides bootstrap methods for dependent data. Let's use the block bootstrap to estimate the standard error of the mean return:

In [None]:
# Create a block bootstrap object
block_size = 20  # Block size for the bootstrap
n_bootstraps = 1000  # Number of bootstrap samples

bootstrap = BlockBootstrap(block_size=block_size)

# Generate bootstrap samples
bootstrap_samples = bootstrap.generate_samples(df['returns'].values, n_bootstraps)

# Calculate the mean for each bootstrap sample
bootstrap_means = np.mean(bootstrap_samples, axis=1)

# Calculate the standard error of the mean
bootstrap_std_error = np.std(bootstrap_means, ddof=1)

# Calculate the 95% confidence interval
bootstrap_ci_lower = np.percentile(bootstrap_means, 2.5)
bootstrap_ci_upper = np.percentile(bootstrap_means, 97.5)

# Display the results
print(f"Sample Mean: {np.mean(df['returns']):.6f}")
print(f"Bootstrap Standard Error: {bootstrap_std_error:.6f}")
print(f"95% Confidence Interval: [{bootstrap_ci_lower:.6f}, {bootstrap_ci_upper:.6f}]")

In [None]:
# Plot the bootstrap distribution of the mean
plt.figure(figsize=(10, 6))
plt.hist(bootstrap_means, bins=50, alpha=0.7, color='blue')
plt.axvline(x=np.mean(df['returns']), color='red', linestyle='--', label='Sample Mean')
plt.axvline(x=bootstrap_ci_lower, color='green', linestyle='--', label='2.5% Percentile')
plt.axvline(x=bootstrap_ci_upper, color='green', linestyle='--', label='97.5% Percentile')
plt.title('Bootstrap Distribution of the Mean Return')
plt.xlabel('Mean Return')
plt.ylabel('Frequency')
plt.legend()
plt.show()
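
To show what a block bootstrap does under the hood, the sketch below resamples a series by gluing together randomly chosen runs of consecutive observations, which preserves short-range dependence within each block. It is a plain NumPy version of the moving block bootstrap with a hypothetical function name. The toolbox's `BlockBootstrap` may choose blocks or handle the series end differently.

```python
import numpy as np

def moving_block_bootstrap(x, block_size, n_bootstraps, rng):
    """Resample x by concatenating randomly chosen overlapping blocks."""
    x = np.asarray(x)
    n = x.size
    n_blocks = -(-n // block_size)  # ceiling division
    # Starting index of every block, one row per bootstrap replication
    starts = rng.integers(0, n - block_size + 1, size=(n_bootstraps, n_blocks))
    # Expand each start into block_size consecutive indices, then trim to length n
    offsets = np.arange(block_size)
    indices = (starts[:, :, None] + offsets).reshape(n_bootstraps, -1)[:, :n]
    return x[indices]

rng = np.random.default_rng(42)
series = rng.standard_normal(500)
samples = moving_block_bootstrap(series, block_size=20, n_bootstraps=1000, rng=rng)
print(samples.shape, samples.mean(axis=1).std(ddof=1))
```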

## Example 4: Multivariate Volatility Modeling

Let's generate some multivariate return data and fit a DCC-GARCH model:

In [None]:
# Generate multivariate return data (2 assets)
n_assets = 2
n_obs = 1000

# Set correlation between assets
correlation = 0.5
cov_matrix = np.array([[1.0, correlation], [correlation, 1.0]])

# Generate correlated random returns
np.random.seed(42)
multi_returns = np.random.multivariate_normal(mean=np.zeros(n_assets), cov=cov_matrix, size=n_obs)

# Apply GARCH effects to each series
volatility = np.ones((n_obs, n_assets)) * 0.01
returns = np.zeros((n_obs, n_assets))

# GARCH parameters for each asset
omega = np.array([0.00001, 0.00002])
alpha = np.array([0.1, 0.15])
beta = np.array([0.85, 0.8])

for t in range(1, n_obs):
    for i in range(n_assets):
        volatility[t, i] = np.sqrt(omega[i] + alpha[i] * returns[t-1, i]**2 + beta[i] * volatility[t-1, i]**2)
    
    # Apply volatility to the correlated returns
    returns[t] = multi_returns[t] * volatility[t]

# Create a Pandas DataFrame
multi_df = pd.DataFrame(
    returns, 
    columns=['Asset1', 'Asset2'],
    index=pd.date_range(start='2020-01-01', periods=n_obs, freq='D')
)

# Display the first few rows
multi_df.head()

In [None]:
# Plot the multivariate returns
plt.figure(figsize=(12, 6))
plt.plot(multi_df.index, multi_df['Asset1'], label='Asset 1')
plt.plot(multi_df.index, multi_df['Asset2'], label='Asset 2')
plt.title('Multivariate Returns')
plt.xlabel('Date')
plt.ylabel('Returns')
plt.legend()
plt.show()

In [None]:
# Fit a DCC-GARCH model
dcc_model = DCC()
dcc_results = dcc_model.fit(multi_df.values)

# Display the model summary
print(dcc_results)

In [None]:
# Extract the estimated conditional correlations
correlations = dcc_results.conditional_correlations

# Plot the time-varying correlation between the two assets
plt.figure(figsize=(12, 6))
plt.plot(multi_df.index, correlations[:, 0, 1])
plt.axhline(y=correlation, color='red', linestyle='--', label='True Correlation')
plt.title('DCC-GARCH: Time-Varying Correlation between Assets')
plt.xlabel('Date')
plt.ylabel('Correlation')
plt.legend()
plt.show()
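
The correlation path above comes from the DCC recursion, which updates a quasi-correlation matrix `Q_t` from lagged standardized residuals and then rescales it to unit diagonal. The sketch below runs that recursion for fixed parameters. The values `a=0.05` and `b=0.90` are illustrative rather than estimated, the function is hypothetical, and the input is assumed to be already standardized by univariate volatilities.

```python
import numpy as np

def dcc_correlations(std_resid, a, b):
    """Run the DCC(1,1) recursion on standardized residuals of shape (T, N)."""
    z = np.asarray(std_resid, dtype=float)
    n_obs, n_assets = z.shape
    q_bar = (z.T @ z) / n_obs  # unconditional target (correlation targeting)
    q = q_bar.copy()
    corr = np.empty((n_obs, n_assets, n_assets))
    for t in range(n_obs):
        if t > 0:
            q = (1.0 - a - b) * q_bar + a * np.outer(z[t - 1], z[t - 1]) + b * q
        scale = 1.0 / np.sqrt(np.diag(q))
        corr[t] = q * np.outer(scale, scale)  # rescale Q_t to unit diagonal
    return corr

rng = np.random.default_rng(42)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000)
rho = dcc_correlations(z, a=0.05, b=0.90)[:, 0, 1]
print(rho.mean(), rho.min(), rho.max())
```

Because the simulated correlation is constant, the estimated path should fluctuate around 0.5. Any variation in it reflects sampling noise rather than genuine dynamics.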

## Example 5: Realized Volatility

Let's simulate some high-frequency data and compute realized volatility measures:

In [None]:
# Simulate high-frequency price data
n_days = 20
n_intraday = 100  # 100 observations per day
n_total = n_days * n_intraday

# Generate a random walk with drift
np.random.seed(42)
daily_vol = 0.01  # Daily volatility
intraday_vol = daily_vol / np.sqrt(n_intraday)  # Intraday volatility
drift = 0.0001  # Small drift term

# Generate log prices
log_returns = np.random.normal(drift, intraday_vol, n_total)
log_prices = np.cumsum(log_returns)
prices = np.exp(log_prices)

# Create a time index (seconds within the trading day)
seconds_per_day = 23400  # 6.5 hours = 390 minutes = 23400 seconds
seconds_per_obs = seconds_per_day / n_intraday

# Create the time array: the same within-day grid of seconds, repeated for every day
times = np.tile(np.arange(n_intraday) * seconds_per_obs, n_days)

# Create a DataFrame with time and price data
high_freq_data = pd.DataFrame({
    'day': np.repeat(np.arange(n_days), n_intraday),
    'time': times,
    'price': prices
})

# Display the first few rows
high_freq_data.head()

In [None]:
# Plot the high-frequency price data for the first 3 days
plt.figure(figsize=(12, 6))
for day in range(3):
    day_data = high_freq_data[high_freq_data['day'] == day]
    plt.plot(day_data['time'], day_data['price'], label=f'Day {day+1}')
    
plt.title('High-Frequency Price Data')
plt.xlabel('Seconds (within trading day)')
plt.ylabel('Price')
plt.legend()
plt.show()

In [None]:
# Compute realized volatility for each day
realized_vol = []

for day in range(n_days):
    # Extract data for this day
    day_data = high_freq_data[high_freq_data['day'] == day]
    
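    # Calculate within-day log returns; np.diff yields n_intraday - 1 returns,
    # so the return from the previous day's last price is excluded
    log_prices = np.log(day_data['price'].values)
    log_returns = np.diff(log_prices)
    
    # Time stamps aligned with the returns (each return is stamped at the end of its interval)
    times = day_data['time'].values[1:]  # Skip the first time point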
    
    # Compute realized variance
    rv_estimator = RealizedVariance()
    rv = rv_estimator.compute(returns=log_returns, times=times)
    
    realized_vol.append(np.sqrt(rv))

# Create a DataFrame with the realized volatility estimates
rv_df = pd.DataFrame({
    'day': np.arange(n_days),
    'realized_volatility': realized_vol
})

# Display the results
rv_df

In [None]:
# Plot the realized volatility estimates
plt.figure(figsize=(12, 6))
plt.bar(rv_df['day'], rv_df['realized_volatility'], width=0.6)
plt.axhline(y=daily_vol, color='red', linestyle='--', label='True Daily Volatility')
plt.title('Realized Volatility Estimates')
plt.xlabel('Day')
plt.ylabel('Realized Volatility')
plt.legend()
plt.show()
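
At its simplest, realized variance is the sum of squared intraday log returns, and realized volatility is its square root. The following sketch computes it directly with NumPy for one simulated day, as a cross-check on the estimator's output. `realized_variance` is a hypothetical helper and ignores refinements such as subsampling or noise correction.

```python
import numpy as np

def realized_variance(prices):
    """Sum of squared log returns computed from a vector of intraday prices."""
    log_returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    return np.sum(log_returns ** 2)

rng = np.random.default_rng(42)
n_obs, true_daily_vol = 100, 0.01
# n_obs returns require n_obs + 1 prices, so the path starts from a price of 1.0
shocks = rng.normal(0.0, true_daily_vol / np.sqrt(n_obs), n_obs)
one_day_prices = np.exp(np.concatenate(([0.0], np.cumsum(shocks))))

rv = realized_variance(one_day_prices)
print(f"Realized volatility: {np.sqrt(rv):.5f} (true value {true_daily_vol})")
```

With only 100 observations per day the estimate is noisy, which is why the daily bars scatter around the true value instead of matching it.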

## Example 6: Statistical Distributions

The MFE Toolbox provides several statistical distributions commonly used in financial econometrics. Let's explore the Student's t-distribution:

In [None]:
# Create standardized Student's t-distributions with different degrees of freedom
x = np.linspace(-5, 5, 1000)
t_dist_5 = StudentT(nu=5)  # 5 degrees of freedom
t_dist_10 = StudentT(nu=10)  # 10 degrees of freedom
t_dist_30 = StudentT(nu=30)  # 30 degrees of freedom

# Calculate PDF values
pdf_5 = t_dist_5.pdf(x)
pdf_10 = t_dist_10.pdf(x)
pdf_30 = t_dist_30.pdf(x)

# Plot the PDFs
plt.figure(figsize=(12, 6))
plt.plot(x, pdf_5, label='t(5)')
plt.plot(x, pdf_10, label='t(10)')
plt.plot(x, pdf_30, label='t(30)')
plt.plot(x, np.exp(-x**2/2) / np.sqrt(2*np.pi), 'k--', label='Normal')
plt.title("Student's t-Distribution PDF with Different Degrees of Freedom")
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.show()

In [None]:
# Generate random samples from the t-distribution
n_samples = 1000
samples_5 = t_dist_5.rvs(size=n_samples)
samples_10 = t_dist_10.rvs(size=n_samples)
samples_30 = t_dist_30.rvs(size=n_samples)

# Plot histograms of the samples
plt.figure(figsize=(12, 6))
plt.hist(samples_5, bins=50, alpha=0.5, label='t(5)')
plt.hist(samples_10, bins=50, alpha=0.5, label='t(10)')
plt.hist(samples_30, bins=50, alpha=0.5, label='t(30)')
plt.title("Random Samples from Student's t-Distribution")
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
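
Assuming "standardized" here means rescaled to unit variance (the usual convention for GARCH innovations, valid for more than 2 degrees of freedom), the density can be written out explicitly. The sketch below implements that formula with NumPy and checks numerically that it integrates to one and has variance one. It is an independent illustration, not the toolbox's `StudentT` code.

```python
import math
import numpy as np

def standardized_t_pdf(x, nu):
    """Density of a Student's t rescaled to unit variance (requires nu > 2)."""
    x = np.asarray(x, dtype=float)
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(math.pi * (nu - 2)))
    return np.exp(log_const - (nu + 1) / 2 * np.log1p(x ** 2 / (nu - 2)))

grid = np.linspace(-200.0, 200.0, 400001)
step = grid[1] - grid[0]
for nu in (5, 10, 30):
    density = standardized_t_pdf(grid, nu)
    total_mass = np.sum(density) * step            # simple Riemann sum
    variance = np.sum(grid ** 2 * density) * step
    print(f"nu={nu:2d}: mass={total_mass:.4f}, variance={variance:.4f}")
```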

## Example 7: Matrix Operations

The MFE Toolbox provides various matrix operations commonly used in financial econometrics. Let's explore some of these utilities:

In [None]:
# Create a symmetric matrix
A = np.array([[1.0, 0.5, 0.3],
              [0.5, 2.0, 0.7],
              [0.3, 0.7, 3.0]])

print("Original matrix A:")
print(A)
print()

# Vectorize the lower triangular part of the matrix (vech operation)
v = matrix_ops.vech(A)
print("vech(A):")
print(v)
print()

# Reconstruct the matrix from the vectorized form (ivech operation)
A_reconstructed = matrix_ops.ivech(v)
print("ivech(vech(A)):")
print(A_reconstructed)
print()

# Check if the reconstruction is correct
print("Reconstruction error:")
print(np.max(np.abs(A - A_reconstructed)))

In [None]:
# Compute the Cholesky decomposition
L = np.linalg.cholesky(A)
print("Cholesky factor L:")
print(L)
print()

# Verify: L @ L.T should equal A
print("L @ L.T:")
print(L @ L.T)
print()

# Convert covariance matrix to correlation matrix
corr = matrix_ops.cov2corr(A)
print("Correlation matrix:")
print(corr)
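
For readers who want to see what these helpers compute, here are plain NumPy equivalents. They are sketches with the same names for readability. The column-by-column ordering of `vech` follows the standard textbook definition and is assumed, not confirmed, to match `matrix_ops.vech`.

```python
import numpy as np

def vech(matrix):
    """Stack the lower triangle (including the diagonal) column by column."""
    matrix = np.asarray(matrix)
    rows, cols = np.triu_indices(matrix.shape[0])
    return matrix[cols, rows]  # swapped indices walk the lower triangle by column

def ivech(vector):
    """Rebuild the symmetric matrix whose vech is `vector`."""
    vector = np.asarray(vector, dtype=float)
    n = int(round((np.sqrt(8 * vector.size + 1) - 1) / 2))
    rows, cols = np.triu_indices(n)
    out = np.zeros((n, n))
    out[cols, rows] = vector  # fill the lower triangle
    out[rows, cols] = vector  # mirror into the upper triangle
    return out

def cov2corr(cov):
    """Scale a covariance matrix to unit diagonal."""
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)

A = np.array([[1.0, 0.5, 0.3],
              [0.5, 2.0, 0.7],
              [0.3, 0.7, 3.0]])
print(vech(A))
print(np.array_equal(ivech(vech(A)), A))
print(cov2corr(A))
```

A symmetric n×n matrix has n(n+1)/2 distinct elements, so `vech` of the 3×3 example has length 6. This is why multivariate volatility models often parameterize covariance matrices through their `vech`.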

## Example 8: Robust Covariance Estimation

Let's demonstrate the robust covariance estimation utilities:

In [None]:
# Generate some autocorrelated data
n = 500
x = np.zeros(n)
x[0] = np.random.standard_normal()
for t in range(1, n):
    x[t] = 0.7 * x[t-1] + np.random.standard_normal()

# Create a design matrix with a constant and the lagged variable
X = np.column_stack((np.ones(n-1), x[:-1]))
y = x[1:]

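# OLS estimation (solve the normal equations instead of forming an explicit inverse)
beta = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta

# Standard OLS covariance matrix
# Degrees of freedom: (n - 1) observations minus 2 estimated parameters
sigma2 = np.sum(residuals**2) / (n - 3)
cov_ols = sigma2 * np.linalg.inv(X.T @ X)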

# Newey-West robust covariance matrix
cov_nw = covariance.covnw(X, residuals, lags=5)

print("OLS Parameter Estimates:")
print(f"Constant: {beta[0]:.4f}")
print(f"AR(1) coefficient: {beta[1]:.4f}")
print()

print("Standard OLS Standard Errors:")
print(f"Constant: {np.sqrt(cov_ols[0,0]):.4f}")
print(f"AR(1) coefficient: {np.sqrt(cov_ols[1,1]):.4f}")
print()

print("Newey-West Robust Standard Errors:")
print(f"Constant: {np.sqrt(cov_nw[0,0]):.4f}")
print(f"AR(1) coefficient: {np.sqrt(cov_nw[1,1]):.4f}")
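
The Newey-West estimator replaces the usual residual variance with a weighted sum of autocovariances of the scores `x_t * u_t`, using Bartlett weights that decline linearly with the lag. The sketch below is a compact NumPy version for illustration. `newey_west_cov` is a hypothetical function, and the toolbox's `covnw` may differ in scaling or small-sample adjustments.

```python
import numpy as np

def newey_west_cov(X, residuals, lags):
    """HAC covariance of OLS coefficients using Bartlett kernel weights."""
    X = np.asarray(X, dtype=float)
    u = np.asarray(residuals, dtype=float)
    n_obs = X.shape[0]
    scores = X * u[:, None]                 # x_t * u_t for every observation
    long_run = scores.T @ scores / n_obs    # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)   # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag] / n_obs
        long_run += weight * (gamma + gamma.T)
    bread = np.linalg.inv(X.T @ X / n_obs)
    return bread @ long_run @ bread / n_obs

rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, x.size):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
X = np.column_stack((np.ones(x.size - 1), x[:-1]))
y = x[1:]
coef = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ coef
print(np.sqrt(np.diag(newey_west_cov(X, resid, lags=5))))
```

With `lags=0` the estimator reduces to White's heteroskedasticity-robust covariance, which makes a convenient check on any implementation.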

## Conclusion

This notebook has provided an introduction to the MFE Toolbox, demonstrating its core functionality for financial econometrics and time series analysis. We've covered:

1. Installation and basic setup
2. Working with data using NumPy arrays and Pandas DataFrames
3. Univariate volatility modeling with GARCH
4. Time series analysis with ARMA
5. Bootstrap methods for dependent data
6. Multivariate volatility modeling with DCC-GARCH
7. Realized volatility estimation
8. Statistical distributions
9. Matrix operations and robust covariance estimation

The MFE Toolbox provides a comprehensive suite of tools for financial econometrics, leveraging the power and flexibility of the Python ecosystem. For more detailed information on specific components, please refer to the documentation and other example notebooks.