# A Computational Verification of Factor Model Properties

This report computationally demonstrates and verifies the fundamental properties of conditional linear factor models as detailed in the paper **"Fundamental properties of linear factor models" by Damir Filipović and Paul Schneider (2025)**.

We use Monte Carlo simulations to create controlled environments where the paper's theoretical propositions can be tested numerically, bridging the gap between abstract financial theory and computational practice.

---

## Introduction

Linear factor models are the cornerstone of modern asset pricing. They posit that the returns of thousands of assets can be explained by their shared exposure to a small number of common risk factors. While many new factors have been proposed, this project focuses on the foundational mathematics that *any* such model must obey.

Our goal is to use a simulation-based approach to verify three of the paper's most critical results:
1. **The Impossibility Result**: When factors are tradable portfolios of the assets, the covariance matrix of the implied residuals must be singular.
2. **Sharpe Ratio Spanning**: The conditions under which factors can be mean-variance efficient.
3. **Generative Models and GLS Factors**: A link between a covariance factor structure and tradable GLS factors that are mean-variance efficient and uncorrelated with their residuals.

By verifying these propositions numerically, we aim to build a deeper, more intuitive understanding of how factor models work.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
from pprint import pprint

sys.path.append('../src')
from data_simulation import create_characteristics_matrix, simulate_asset_returns, create_spanning_mu_vector
from factor_construction import construct_ols_factors, construct_gls_factors
from verification_tests import verify_residual_covariance_singularity, verify_sharpe_ratio_spanning, verify_gls_spanning_and_correlation

sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

--- 
## Verification 1: The Impossibility Result

**The Theory** states that if you construct tradable factors from a set of asset returns, the covariance matrix of the *implied residuals* must be **singular**. It therefore cannot be inverted, which matters for any procedure that relies on an inverse residual covariance matrix, such as GLS weighting or Gaussian likelihood evaluation.

**The Test**: We will simulate a standard asset panel, construct tradable OLS factors, calculate the implied residuals, and then check if the rank of their covariance matrix is less than the number of assets.
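The steps of this test can be sketched with NumPy alone. This is a minimal, self-contained illustration and not the project's `verify_residual_covariance_singularity` implementation: it assumes the OLS factors are formed with the pseudo-inverse of the loadings, and every variable name and parameter value below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_factors, n_periods = 15, 4, 500

# Hypothetical ground truth: loadings, factor and idiosyncratic volatilities.
phi = rng.normal(size=(n_assets, n_factors))
factor_vol = rng.uniform(0.02, 0.05, n_factors)
idio_vol = rng.uniform(0.05, 0.15, n_assets)

# Simulate returns x_t = Phi f_t + eps_t, one row per period.
factors_true = rng.normal(size=(n_periods, n_factors)) * factor_vol
noise = rng.normal(size=(n_periods, n_assets)) * idio_vol
returns = factors_true @ phi.T + noise

# Tradable OLS factors: f_t = pinv(Phi) x_t (assumed construction).
ols_factors = returns @ np.linalg.pinv(phi).T

# Implied residuals and the rank of their sample covariance matrix.
residuals = returns - ols_factors @ phi.T
residual_cov = np.cov(residuals, rowvar=False)
# Explicit tolerance: eigenvalues below 1e-12 are treated as numerically zero.
rank = np.linalg.matrix_rank(residual_cov, tol=1e-12)

print(f"rank = {rank}, assets = {n_assets}, singular = {rank < n_assets}")
```

The explicit `tol` is a design choice for this sketch: the genuinely non-zero eigenvalues here are many orders of magnitude above `1e-12`, so the cutoff separates them cleanly from rounding noise.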

In [2]:
# Simulation Parameters
N_ASSETS_P2 = 15
N_FACTORS_P2 = 4
N_PERIODS_P2 = 500
RANDOM_STATE_P2 = 42

# Ground Truth Components
phi_t_p2 = create_characteristics_matrix(N_ASSETS_P2, N_FACTORS_P2, random_state=RANDOM_STATE_P2)
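# Draw both variance vectors from one generator; two generators created with the
# same seed would reuse the same uniform stream and tie the two sets together.
rng_p2 = np.random.default_rng(RANDOM_STATE_P2)
sigma_f_t_p2 = np.diag(rng_p2.uniform(0.02, 0.05, N_FACTORS_P2)**2)
sigma_epsilon_t_p2 = np.diag(rng_p2.uniform(0.05, 0.15, N_ASSETS_P2)**2)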

# Simulate and Construct
sim_results_p2 = simulate_asset_returns(phi_t_p2, sigma_f_t_p2, sigma_epsilon_t_p2, N_PERIODS_P2, random_state=RANDOM_STATE_P2)
df_assets_p2 = sim_results_p2['asset_returns']
df_ols_factors_p2 = construct_ols_factors(df_assets_p2, phi_t_p2)

# Verify
singularity_test_result = verify_residual_covariance_singularity(df_assets_p2, df_ols_factors_p2, phi_t_p2)
pprint(singularity_test_result)

{'details': {'Is Singular': np.True_,
             'Number of Assets': 15,
             'Rank of Residual Covariance': np.int64(11)},
 'result': np.True_,
 'test': 'Corollary 3.4 (Residual Covariance Singularity)'}


**Interpretation**: The test passes. The rank of the residual covariance matrix is 11, which is less than the number of assets (15) and equals the number of assets minus the number of factors (15 − 4). This is consistent with the theory: constructing the factors from the assets themselves imposes exact linear dependencies on the residuals, which forces their covariance matrix to be singular.
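The value 11 can also be obtained from a short projection argument. The derivation below covers the OLS special case only. It assumes that the factors are built with the pseudo-inverse weights $\Phi_t^{+}$ (an assumption about `construct_ols_factors`, whose code is not shown here), that $\Phi_t$ has full column rank $K$, and it writes $x_t$ for the vector of $N$ asset returns:

$$
\begin{aligned}
f_t &= \Phi_t^{+} x_t, \qquad \Phi_t^{+} = (\Phi_t^\top \Phi_t)^{-1} \Phi_t^\top,\\
\varepsilon_t &= x_t - \Phi_t f_t = (I_N - P_t)\, x_t, \qquad P_t = \Phi_t \Phi_t^{+},\\
\operatorname{Cov}(\varepsilon_t) &= (I_N - P_t)\, \Sigma_t\, (I_N - P_t),\\
\operatorname{rank} \operatorname{Cov}(\varepsilon_t) &\le \operatorname{rank}(I_N - P_t) = N - K = 15 - 4 = 11.
\end{aligned}
$$

Since $P_t$ is the orthogonal projection onto the column space of $\Phi_t$, the residuals satisfy the $K$ exact restrictions $\Phi_t^\top \varepsilon_t = 0$. The bound holds with equality whenever $\Sigma_t$ is positive definite.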

## Verification 2: Sharpe Ratio Spanning

**The Theory** states that tradable factors are mean-variance efficient (i.e., they attain the maximum Sharpe ratio available in the full asset universe) if asset risk premia (`µ_t`) can be fully explained by their covariance with the factor portfolios.
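One common way to write this condition is `µ_t = Σ_t W_t b` for some vector `b`, where the columns of `W_t` hold the factor portfolio weights. The sketch below checks the resulting identity at the population level, without simulation. All names are illustrative, and this parametrization of `µ_t` is an assumption rather than the documented behaviour of `create_spanning_mu_vector`.

```python
import numpy as np

def max_squared_sharpe(mu, sigma):
    """Maximum squared Sharpe ratio mu' Sigma^{-1} mu (Sigma assumed invertible)."""
    return float(mu @ np.linalg.solve(sigma, mu))

rng = np.random.default_rng(1)
n_assets, n_factors = 30, 4

phi = rng.normal(size=(n_assets, n_factors))
sigma_f = np.diag(rng.uniform(0.03, 0.07, n_factors) ** 2)
sigma_eps = np.diag(rng.uniform(0.05, 0.15, n_assets) ** 2)  # noise kept on purpose
sigma = phi @ sigma_f @ phi.T + sigma_eps

weights = np.linalg.pinv(phi).T              # OLS factor portfolios, shape (N, K)
sigma_factors = weights.T @ sigma @ weights  # factor covariance W' Sigma W

# Case 1: risk premia spanned by covariances with the factors, mu = Sigma W b.
b = rng.normal(size=n_factors)
mu_spanned = sigma @ weights @ b
sr2_assets = max_squared_sharpe(mu_spanned, sigma)
sr2_factors = max_squared_sharpe(weights.T @ mu_spanned, sigma_factors)

# Case 2: a generic mu that does not satisfy the condition.
mu_generic = rng.normal(scale=0.01, size=n_assets)
sr2_assets_generic = max_squared_sharpe(mu_generic, sigma)
sr2_factors_generic = max_squared_sharpe(weights.T @ mu_generic, sigma_factors)

print(f"spanned: assets={sr2_assets:.6f}, factors={sr2_factors:.6f}")
print(f"generic: assets={sr2_assets_generic:.6f}, factors={sr2_factors_generic:.6f}")
```

In the spanned case the two numbers should agree up to rounding even though `sigma_eps` is non-zero. In the generic case the factors capture only part of the attainable squared Sharpe ratio.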

**The Test**: We will engineer a simulation where this condition holds by construction. We create an idealized environment with zero idiosyncratic risk to isolate the core theoretical result. We then verify that the maximum squared Sharpe ratio attainable with the factors equals that of the full asset universe.

In [5]:
# Simulation Parameters
N_ASSETS_P3 = 30
N_FACTORS_P3 = 4
N_PERIODS_P3 = 10000 # Use a long series to minimize sampling error
RANDOM_STATE_P3 = 123

# Ground Truth Components
phi_t_p3 = create_characteristics_matrix(N_ASSETS_P3, N_FACTORS_P3, random_state=RANDOM_STATE_P3)
sigma_f_t_p3 = np.diag(np.random.default_rng(RANDOM_STATE_P3).uniform(0.03, 0.07, N_FACTORS_P3)**2)
# Idealized environment: No idiosyncratic risk
sigma_epsilon_t_p3 = np.zeros((N_ASSETS_P3, N_ASSETS_P3))

# Engineer a mu_t that satisfies the spanning condition
sigma_t_p3 = (phi_t_p3 @ sigma_f_t_p3 @ phi_t_p3.T) + sigma_epsilon_t_p3
w_t_p3 = np.linalg.pinv(phi_t_p3).T
factor_risk_premia_p3 = np.array([0.06, 0.04, 0.05, 0.03]) / 12
mu_t_spanning_p3 = create_spanning_mu_vector(sigma_t_p3, w_t_p3, factor_risk_premia_p3)

# Simulate and Construct
sim_results_p3 = simulate_asset_returns(phi_t_p3, sigma_f_t_p3, sigma_epsilon_t_p3, N_PERIODS_P3, mu_t=mu_t_spanning_p3, random_state=RANDOM_STATE_P3)
df_assets_p3 = sim_results_p3['asset_returns']
df_ols_factors_p3 = construct_ols_factors(df_assets_p3, phi_t_p3)

# Verify
spanning_test_result = verify_sharpe_ratio_spanning(df_assets_p3, df_ols_factors_p3)
pprint(spanning_test_result)

{'details': {'Absolute Difference': 6.505213034913027e-19,
             'Condition (SR2_assets ≈ SR2_factors)': True,
             'Max Sq Sharpe Ratio (Factors)': 0.000342692666762385,
             'Max Sq Sharpe Ratio (Full Universe)': 0.00034269266676238436},
 'result': True,
 'test': 'Proposition 5.1 (Sharpe Ratio Spanning)'}


**Interpretation**: The test passes. The absolute difference between the two squared Sharpe ratios is about 6.5e-19, which is at the level of floating-point rounding. When the spanning condition is met, four factor portfolios attain the same maximum Sharpe ratio as the full universe of 30 assets. Because this run has zero idiosyncratic risk, it checks the algebra of the result rather than its robustness to noise.
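With zero idiosyncratic risk the asset covariance matrix has rank `K` rather than `N`, so the maximum squared Sharpe ratio has to be computed with a pseudo-inverse. The sketch below is an illustrative reconstruction under that assumption, not the code behind `verify_sharpe_ratio_spanning`, and the helper name is hypothetical. It also shows why agreement at machine precision is expected in this setting: every simulated return is an exact linear combination of the factor returns.

```python
import numpy as np

def max_squared_sharpe_pinv(returns):
    """Sample max squared Sharpe ratio, using a pseudo-inverse for singular covariances."""
    mean = returns.mean(axis=0)
    cov = np.atleast_2d(np.cov(returns, rowvar=False))
    # Explicit cutoff so that numerically-zero eigenvalues are not inverted.
    return float(mean @ np.linalg.pinv(cov, rcond=1e-10) @ mean)

rng = np.random.default_rng(2)
n_assets, n_factors, n_periods = 30, 4, 10_000

phi = rng.normal(size=(n_assets, n_factors))
factor_vol = rng.uniform(0.03, 0.07, n_factors)
factor_premia = np.array([0.06, 0.04, 0.05, 0.03]) / 12

# No idiosyncratic term: x_t = Phi (premia + shock_t), so returns lie in span(Phi).
factor_draws = factor_premia + rng.normal(size=(n_periods, n_factors)) * factor_vol
returns = factor_draws @ phi.T

# OLS factors recover factor_draws up to rounding because pinv(Phi) @ Phi = I.
ols_factors = returns @ np.linalg.pinv(phi).T

sr2_assets = max_squared_sharpe_pinv(returns)
sr2_factors = max_squared_sharpe_pinv(ols_factors)
print(f"assets={sr2_assets:.12f}, factors={sr2_factors:.12f}")
```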

## Verification 3: Generative Models & GLS Factors

**The Theory**: This is the capstone result. It states that for any return panel whose covariance matrix has a factor structure (`Σ_t = Φ_t*C_t*Φ_t' + D_t`), we can construct specific **tradable GLS factors** that are both **mean-variance efficient** (spanning) and **uncorrelated with their residuals**.
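For reference, one standard way to write GLS factor weights when `D_t` is invertible is given below. This is a textbook form stated under that assumption. The paper's own definition may be more general (for example, it may allow a singular `D_t`), so the notation here, including $x_t$ for the return vector and $W_t$ for the weights, is illustrative.

$$
\begin{aligned}
W_t^\top &= (\Phi_t^\top D_t^{-1} \Phi_t)^{-1} \Phi_t^\top D_t^{-1}, \qquad f_t = W_t^\top x_t, \qquad \varepsilon_t = x_t - \Phi_t f_t,\\
W_t^\top \Sigma_t &= C_t \Phi_t^\top + (\Phi_t^\top D_t^{-1} \Phi_t)^{-1} \Phi_t^\top,\\
W_t^\top \Sigma_t W_t &= C_t + (\Phi_t^\top D_t^{-1} \Phi_t)^{-1},\\
\operatorname{Cov}(f_t, \varepsilon_t) &= W_t^\top \Sigma_t - W_t^\top \Sigma_t W_t\, \Phi_t^\top = 0 .
\end{aligned}
$$

The second and third lines follow from substituting $\Sigma_t = \Phi_t C_t \Phi_t^\top + D_t$ and using $W_t^\top \Phi_t = I_K$. Transposing the second line gives $\Sigma_t W_t = \Phi_t \left( C_t + (\Phi_t^\top D_t^{-1} \Phi_t)^{-1} \right)$, so any $\mu_t$ in the image of $\Phi_t$ can be written as $\Sigma_t W_t b$ for some $b$, which is the spanning condition.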

**The Test**: We will start with a decomposed covariance matrix as our ground truth. We'll simulate returns and construct the specific GLS factors defined in the theory. Finally, we will verify both the Sharpe ratio spanning property and the zero-correlation property.
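A population-level sketch of both checks follows. It deliberately uses a strictly positive diagonal `D_t` so that its inverse exists. The weights formula `(Φ' D⁻¹ Φ)⁻¹ Φ' D⁻¹` and all variable names are assumptions made for illustration and are not taken from `construct_gls_factors`.

```python
import numpy as np

rng = np.random.default_rng(3)
n_assets, n_factors = 25, 5

phi = rng.normal(size=(n_assets, n_factors))
c = np.diag(rng.uniform(0.03, 0.08, n_factors) ** 2)
d = np.diag(rng.uniform(0.05, 0.15, n_assets) ** 2)  # invertible by construction
sigma = phi @ c @ phi.T + d

# GLS weights W' = (Phi' D^{-1} Phi)^{-1} Phi' D^{-1}, shape (K, N).
d_inv_phi = np.linalg.solve(d, phi)
gls_weights_t = np.linalg.solve(phi.T @ d_inv_phi, d_inv_phi.T)

# Property 1: Cov(f, eps) = W' Sigma (I - Phi W')' should vanish.
residual_map = np.eye(n_assets) - phi @ gls_weights_t  # eps = residual_map @ x
cov_factor_residual = gls_weights_t @ sigma @ residual_map.T
max_abs_cov = np.abs(cov_factor_residual).max()

# Property 2: Sharpe ratio spanning when mu lies in the image of Phi.
factor_premia = np.array([0.05, 0.03, 0.04, 0.06, 0.02]) / 12
mu = phi @ factor_premia
sr2_assets = float(mu @ np.linalg.solve(sigma, mu))
mu_f = gls_weights_t @ mu
sigma_f = gls_weights_t @ sigma @ gls_weights_t.T
sr2_factors = float(mu_f @ np.linalg.solve(sigma_f, mu_f))

print(f"max |Cov(f, eps)| = {max_abs_cov:.2e}")
print(f"SR2 assets = {sr2_assets:.8f}, SR2 factors = {sr2_factors:.8f}")
```

Working with population moments keeps sampling error out of the picture, so any remaining discrepancy is attributable to floating-point rounding.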

In [4]:
# Simulation Parameters
N_ASSETS_P4 = 25
N_FACTORS_P4 = 5
N_PERIODS_P4 = 10000
RANDOM_STATE_P4 = 456

# Ground Truth Components (C_t and D_t)
phi_t_p4 = create_characteristics_matrix(N_ASSETS_P4, N_FACTORS_P4, random_state=RANDOM_STATE_P4)
rng_p4 = np.random.default_rng(RANDOM_STATE_P4)
# Here, C_t is analogous to sigma_f_t and D_t is analogous to sigma_epsilon_t
c_t = np.diag(rng_p4.uniform(0.03, 0.08, N_FACTORS_P4)**2)

# Idealized environment: remove idiosyncratic risk (D_t = 0) so that the tradable
# GLS factors are not contaminated by noise and the spanning property holds
# to floating-point precision in sample.
d_t = np.zeros((N_ASSETS_P4, N_ASSETS_P4))

# Engineer a mu_t that lies within the image of Phi_t, as required by Prop 6.3(iii)
factor_premia_p4 = np.array([0.05, 0.03, 0.04, 0.06, 0.02]) / 12
mu_t_p4 = phi_t_p4 @ factor_premia_p4

# Simulate (pass C_t as sigma_f_t and D_t as sigma_epsilon_t)
sim_results_p4 = simulate_asset_returns(phi_t_p4, c_t, d_t, N_PERIODS_P4, mu_t=mu_t_p4, random_state=RANDOM_STATE_P4)
df_assets_p4 = sim_results_p4['asset_returns']

# Construct the special GLS factors
df_gls_factors_p4 = construct_gls_factors(df_assets_p4, phi_t_p4, d_t)

# Verify both properties simultaneously
gls_test_result = verify_gls_spanning_and_correlation(df_assets_p4, df_gls_factors_p4, phi_t_p4)
pprint(gls_test_result)

{'details': {'Factor-Residual Correlation Test': {'Is Uncorrelated': True,
                                                  'Max Absolute Covariance': np.float64(4.7391161757821564e-17)},
             'Spanning Result': {'details': {'Absolute Difference': 1.3877787807814457e-17,
                                             'Condition (SR2_assets ≈ SR2_factors)': True,
                                             'Max Sq Sharpe Ratio (Factors)': 0.01922294311969992,
                                             'Max Sq Sharpe Ratio (Full Universe)': 0.019222943119699906},
                                 'result': True,
                                 'test': 'Proposition 5.1 (Sharpe Ratio '
                                         'Spanning)'}},
 'result': True,
 'test': 'Proposition 6.3 & Lemma 6.2 (GLS Properties)'}


**Interpretation**: Both checks pass. The GLS factors attain the same maximum squared Sharpe ratio as the full universe (absolute difference about 1.4e-17), and the largest absolute covariance between factors and residuals is about 4.7e-17, which is zero up to floating-point rounding. This gives a constructive link between a statistical factor structure and tradable, mean-variance efficient portfolios. As in Verification 2, the run sets idiosyncratic risk to zero (`D_t = 0`), so it confirms the algebra in an idealized setting rather than performance on noisy data.

--- 
## Conclusion & Limitations

This project verified three core propositions from Filipović & Schneider (2025) in a controlled Monte Carlo setting: the singularity of the implied residual covariance matrix, the spanning condition under which tradable factors attain the maximum Sharpe ratio, and the efficiency and residual-orthogonality of GLS factors when the covariance matrix has a factor structure.

**Limitations**:
These results come from simulations with known parameters, and Verifications 2 and 3 additionally assume zero idiosyncratic risk. Real data raise issues that are not addressed here:
1. **Estimation Error**: In practice, we do not know the true `Φ_t`, `µ_t`, or `Σ_t`. They must be estimated from noisy data, introducing significant error.
2. **Model Misspecification**: The real data-generating process is likely far more complex than a simple linear factor model.
3. **Time-Varying Parameters**: We assumed constant parameters, but in reality, factor loadings and covariances change over time.
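The first limitation can be made concrete with a small experiment. The sketch below is an illustrative addition with hypothetical parameter values: it reintroduces idiosyncratic risk and uses a short sample. Since factor portfolios are combinations of the assets, their in-sample maximum squared Sharpe ratio can never exceed that of the full universe, and with noisy data the gap is typically strictly positive even though spanning holds in population.

```python
import numpy as np

def sample_max_sr2(returns):
    """In-sample maximum squared Sharpe ratio from the sample mean and covariance."""
    mean = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    return float(mean @ np.linalg.solve(cov, mean))

rng = np.random.default_rng(4)
n_assets, n_factors, n_periods = 25, 5, 240  # e.g. 20 years of monthly data

phi = rng.normal(size=(n_assets, n_factors))
factor_vol = rng.uniform(0.03, 0.08, n_factors)
idio_vol = rng.uniform(0.05, 0.15, n_assets)
factor_premia = np.array([0.05, 0.03, 0.04, 0.06, 0.02]) / 12

# Returns with mean Phi @ premia (spanned in population) plus idiosyncratic noise.
factor_draws = factor_premia + rng.normal(size=(n_periods, n_factors)) * factor_vol
returns = factor_draws @ phi.T + rng.normal(size=(n_periods, n_assets)) * idio_vol

# GLS factor portfolios built from the true Phi and D, which are unknown in practice.
d_inv_phi = phi / (idio_vol ** 2)[:, None]
gls_weights_t = np.linalg.solve(phi.T @ d_inv_phi, d_inv_phi.T)
gls_factors = returns @ gls_weights_t.T

sr2_assets = sample_max_sr2(returns)
sr2_factors = sample_max_sr2(gls_factors)
print(f"in-sample SR2: assets={sr2_assets:.4f}, factors={sr2_factors:.4f}, "
      f"gap={sr2_assets - sr2_factors:.4f}")
```

The gap reflects sampling error in the estimated mean and covariance of 25 assets over 240 periods, not a failure of the spanning result itself.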

Despite these limitations, this project provides a solid, intuitive foundation for understanding the fundamental mechanics of linear factor models, which is an essential prerequisite for tackling these more complex real-world challenges.