# **Analysis of Estimation Errors in Portfolio Optimization**

This notebook analyzes how estimation errors in means, variances, and covariances affect portfolio optimization results. We use historical data to build a base portfolio, then simulate several types of estimation error to measure their effect on portfolio performance.


In [1]:
import sys
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from numpy.random import default_rng

project_root = Path.cwd().parent
sys.path.append(str(project_root))

# Import custom modules
from src.data_management import DataManager
from src.portfolio_optimizer import PortfolioParameters, create_base_portfolio
from src.error_analysis import ErrorAnalysisConfig, run_error_analysis
from src.visualization import PortfolioVisualizer

In [2]:
sns.set_style("whitegrid")  # seaborn is already imported as sns in the first cell
%matplotlib inline

## 1. Data Collection and Processing
First, we'll define our universe of stocks (DJIA constituents) and randomly select 10 of them.


In [3]:
djia_constituents = [
    'AAPL',  # Apple
    'AMGN',  # Amgen
    'AXP',   # American Express
    'BA',    # Boeing
    'CAT',   # Caterpillar
    'CRM',   # Salesforce
    'CSCO',  # Cisco
    'CVX',   # Chevron
    'DIS',   # Disney
    'DOW',   # Dow Inc.
    'GS',    # Goldman Sachs
    'HD',    # Home Depot
    'HON',   # Honeywell
    'IBM',   # IBM
    'INTC',  # Intel
    'JNJ',   # Johnson & Johnson
    'JPM',   # JPMorgan Chase
    'KO',    # Coca-Cola
    'MCD',   # McDonald's
    'MMM',   # 3M
    'MRK',   # Merck
    'MSFT',  # Microsoft
    'NKE',   # Nike
    'PG',    # Procter & Gamble
    'TRV',   # Travelers
    'UNH',   # UnitedHealth
    'V',     # Visa
    'VZ',    # Verizon
    'WBA',   # Walgreens Boots Alliance
    'WMT',   # Walmart
]

# Set random seed for reproducibility
rng = default_rng(42)

# Randomly select 10 stocks
selected_symbols = sorted(rng.choice(djia_constituents, size=10, replace=False))

print("Selected Stocks:")
for symbol in selected_symbols:
    print(f"- {symbol}")


Selected Stocks:
- AMGN
- AXP
- CRM
- GS
- JNJ
- KO
- MMM
- NKE
- TRV
- WMT


In [4]:
# Initialize DataManager
data_manager = DataManager(
    symbols=selected_symbols,
    start_date='2014-01-01',  # Using 10 years of data
    end_date='2023-12-31',
    data_dir='../data'
)

# Process data
statistics = data_manager.process_all()

# Print basic statistics
print("\nPortfolio Components Statistics:")
for i, symbol in enumerate(selected_symbols):
    print(f"\n{symbol}:")
    print(f"  Expected Monthly Return: {statistics['expected_returns'][i]:.4%}")
    print(f"  Monthly Volatility: {np.sqrt(statistics['covariance_matrix'][i,i]):.4%}")

# Print data range
print("\nData Range:")
print(f"First data point: {data_manager.prices.index[0].strftime('%Y-%m-%d')}")
print(f"Last data point: {data_manager.prices.index[-1].strftime('%Y-%m-%d')}")

2024-11-29 14:31:09,134 - src.data_management - INFO - Initialized DataManager with 10 symbols
2024-11-29 14:31:09,229 - src.data_management - INFO - Data loaded successfully for 10 symbols



Portfolio Components Statistics:

AMGN:
  Expected Monthly Return: 2.2277%
  Monthly Volatility: 7.9929%

AXP:
  Expected Monthly Return: 2.5788%
  Monthly Volatility: 8.9089%

CRM:
  Expected Monthly Return: 1.1137%
  Monthly Volatility: 5.0717%

GS:
  Expected Monthly Return: 1.8014%
  Monthly Volatility: 6.8561%

JNJ:
  Expected Monthly Return: 0.6672%
  Monthly Volatility: 4.5168%

KO:
  Expected Monthly Return: 1.6570%
  Monthly Volatility: 7.1277%

MMM:
  Expected Monthly Return: 2.1688%
  Monthly Volatility: 9.7777%

NKE:
  Expected Monthly Return: 2.1990%
  Monthly Volatility: 6.3406%

TRV:
  Expected Monthly Return: 5.7350%
  Monthly Volatility: 13.5089%

WMT:
  Expected Monthly Return: 1.8031%
  Monthly Volatility: 5.7076%

Data Range:
First data point: 2014-12-01
Last data point: 2024-11-27
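The internals of `DataManager.process_all` are not shown in this notebook. As a rough sketch of how monthly expected returns and a covariance matrix could be derived from a price table, the example below assumes daily prices in a DataFrame with a `DatetimeIndex`, month-end sampling, and simple (not log) returns. The function name `monthly_statistics` and the synthetic prices are hypothetical.

```python
import numpy as np
import pandas as pd

def monthly_statistics(prices):
    """Estimate monthly mean returns and their covariance from daily prices."""
    # Keep the last available price of each calendar month
    monthly = prices.groupby(prices.index.to_period("M")).last()
    # Simple month-over-month returns; the first month has no prior value
    returns = (monthly / monthly.shift(1) - 1.0).dropna()
    return {
        "expected_returns": returns.mean().to_numpy(),
        "covariance_matrix": returns.cov().to_numpy(),
    }

# Synthetic prices for three made-up tickers, for illustration only
dates = pd.date_range("2023-01-02", periods=250, freq="B")
rng = np.random.default_rng(0)
log_steps = rng.normal(0.0, 0.01, size=(len(dates), 3))
prices = pd.DataFrame(
    100.0 * np.exp(np.cumsum(log_steps, axis=0)),
    index=dates,
    columns=["A", "B", "C"],
)

example_stats = monthly_statistics(prices)
print(example_stats["expected_returns"].shape)   # one mean per asset
print(example_stats["covariance_matrix"].shape)  # assets x assets
```

The dictionary keys mirror the ones used in this notebook (`expected_returns`, `covariance_matrix`), so the result has the same shape as the inputs passed to the optimizer later on.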


## 2. Base Portfolio Optimization

Next, we build the base optimal portfolio, treating the historical estimates as the true parameters.
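The implementation of `create_base_portfolio` lives in `src.portfolio_optimizer` and is not reproduced here. To show what a risk-tolerance-based mean-variance problem can look like, the sketch below assumes the utility $w^\top \mu - \frac{1}{t} w^\top \Sigma w$ with risk tolerance $t$ and only a budget constraint (weights sum to one). It ignores any long-only or position-limit constraints the real optimizer may impose, so its weights can be negative or leveraged. The function name and the three-asset inputs are hypothetical.

```python
import numpy as np

def mean_variance_weights(mu, cov, risk_tolerance):
    """Closed-form maximizer of w'mu - (1/t) w'Σw subject to sum(w) == 1."""
    ones = np.ones(len(mu))
    inv_mu = np.linalg.solve(cov, mu)      # Σ^{-1} μ
    inv_ones = np.linalg.solve(cov, ones)  # Σ^{-1} 1
    # Lagrange multiplier chosen so that the weights sum to one
    lam = (ones @ inv_mu - 2.0 / risk_tolerance) / (ones @ inv_ones)
    return 0.5 * risk_tolerance * (inv_mu - lam * inv_ones)

# Hypothetical monthly inputs for three assets, for illustration only
mu = np.array([0.010, 0.015, 0.020])
cov = np.array([
    [0.0025, 0.0005, 0.0003],
    [0.0005, 0.0040, 0.0006],
    [0.0003, 0.0006, 0.0090],
])

weights = mean_variance_weights(mu, cov, risk_tolerance=50)
print(weights, weights.sum())
```

A constrained version (for example, weights between 0 and some cap) has no closed form and would need a numerical solver.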


In [5]:
# Create base optimal portfolio
risk_tolerance = 50  # Moderate risk tolerance
optimal_weights, base_optimizer = create_base_portfolio(
    expected_returns=statistics['expected_returns'],
    covariance_matrix=statistics['covariance_matrix'],
    risk_tolerance=risk_tolerance
)

# Print base portfolio characteristics
print("\nBase Portfolio Allocation:")
selected_weights = [(symbol, weight) for symbol, weight in zip(selected_symbols, optimal_weights) 
                   if weight > 0.01]  # Only show positions > 1%
selected_weights.sort(key=lambda x: x[1], reverse=True)

for symbol, weight in selected_weights:
    print(f"{symbol}: {weight:.2%}")

expected_return = np.dot(optimal_weights, statistics['expected_returns'])
portfolio_risk = np.sqrt(optimal_weights @ statistics['covariance_matrix'] @ optimal_weights)

print("\nPortfolio Characteristics:")
print(f"Expected Monthly Return: {expected_return:.2%}")
print(f"Monthly Portfolio Risk: {portfolio_risk:.2%}")
print(f"Active Positions (>1%): {len(selected_weights)}")
print(f"Annualized Sharpe Ratio (Rf=0): {(expected_return * 12) / (portfolio_risk * np.sqrt(12)):.2f}")



Base Portfolio Allocation:
TRV: 40.00%
AXP: 28.07%
WMT: 24.78%
NKE: 7.15%

Portfolio Characteristics:
Expected Monthly Return: 3.62%
Monthly Portfolio Risk: 7.70%
Active Positions (>1%): 4
Annualized Sharpe Ratio (Rf=0): 1.63
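The annualized Sharpe ratio above scales the monthly return by 12 and the monthly risk by $\sqrt{12}$, which assumes monthly returns are independent and identically distributed. A small worked check, using the rounded figures printed above (3.62% return, 7.70% risk), reproduces the reported value. The helper name `annualized_sharpe` is hypothetical.

```python
import math

def annualized_sharpe(monthly_return, monthly_risk, monthly_risk_free=0.0):
    """Annualize a monthly Sharpe ratio under the i.i.d. returns assumption."""
    annual_excess = (monthly_return - monthly_risk_free) * 12
    annual_risk = monthly_risk * math.sqrt(12)
    return annual_excess / annual_risk

sharpe = annualized_sharpe(0.0362, 0.0770)
print(f"{sharpe:.2f}")  # 1.63
```

Equivalently, the annualized ratio is the monthly ratio multiplied by $\sqrt{12}$.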


## 3. Error Analysis Configuration

We'll configure the error analysis parameters to study how different types and magnitudes of estimation errors affect portfolio performance.
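How `run_error_analysis` perturbs its inputs is not shown in this notebook. One common scheme, assumed here for illustration, multiplies each true parameter by $(1 + k z)$, where $k$ is the error magnitude and $z$ is a standard normal draw. The sketch below applies that scheme separately to means, variances (the diagonal of the covariance matrix), and covariances (the off-diagonal entries). All function names and inputs are hypothetical.

```python
import numpy as np

def perturb_means(mu, magnitude, rng):
    """Scale each expected return by (1 + magnitude * z)."""
    return mu * (1.0 + magnitude * rng.standard_normal(mu.shape))

def perturb_variances(cov, magnitude, rng):
    """Perturb only the diagonal (variances); covariances stay fixed."""
    noisy = cov.copy()
    diag = np.diag_indices_from(noisy)
    noisy[diag] = cov[diag] * (1.0 + magnitude * rng.standard_normal(cov.shape[0]))
    return noisy

def perturb_covariances(cov, magnitude, rng):
    """Perturb only the off-diagonal entries, keeping the matrix symmetric."""
    upper = np.triu(rng.standard_normal(cov.shape), k=1)
    z = upper + upper.T  # symmetric noise with a zero diagonal
    return cov * (1.0 + magnitude * z)

# Hypothetical inputs, for illustration only
rng = np.random.default_rng(42)
mu = np.array([0.010, 0.015, 0.020])
cov = np.array([
    [0.0025, 0.0005, 0.0003],
    [0.0005, 0.0040, 0.0006],
    [0.0003, 0.0006, 0.0090],
])

noisy_mu = perturb_means(mu, 0.10, rng)
noisy_var = perturb_variances(cov, 0.10, rng)
noisy_cov = perturb_covariances(cov, 0.10, rng)
```

Perturbing entries independently does not guarantee that the result is still a valid (positive semidefinite) covariance matrix, so a real implementation may need an extra check or repair step.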


In [None]:
# Configure error analysis
config = ErrorAnalysisConfig(
    n_iterations=5,  # Drastically reduced for testing
    error_magnitudes=np.array([0.10]),  # Only one magnitude
    risk_tolerances=np.array([50]),  # Only one risk level
    n_jobs=1,  # A single process
    random_seed=42
)

print("Starting error analysis...")
# The factor 3 is the number of error types: means, variances, covariances
print(f"Total combinations to process: {len(config.error_magnitudes) * len(config.risk_tolerances) * 3}")
print(f"Iterations per combination: {config.n_iterations}")
print(f"Using {config.n_jobs} CPU cores\n")

# Run error analysis with progress tracking
results = run_error_analysis(
    expected_returns=statistics['expected_returns'],
    covariance_matrix=statistics['covariance_matrix'],
    config=config
)

# Print completion message
print("\nError analysis completed!")

2024-11-29 14:31:22,838 - src.error_analysis - INFO - Starting analysis with 100 iterations per combination
2024-11-29 14:31:22,840 - src.error_analysis - INFO - Total parameter combinations: 36
2024-11-29 14:31:22,842 - src.error_analysis - INFO - Total simulations to run: 3600
2024-11-29 14:31:22,845 - src.error_analysis - INFO - Processing combination 1/36: means, k=0.05, rt=25
2024-11-29 14:31:22,846 - src.error_analysis - INFO - Processing combination 2/36: means, k=0.05, rt=50
2024-11-29 14:31:22,848 - src.error_analysis - INFO - Processing combination 3/36: means, k=0.05, rt=75
2024-11-29 14:31:22,850 - src.error_analysis - INFO - Processing combination 4/36: means, k=0.1, rt=25
2024-11-29 14:31:22,853 - src.error_analysis - INFO - Processing combination 5/36: means, k=0.1, rt=50
2024-11-29 14:31:22,855 - src.error_analysis - INFO - Processing combination 6/36: means, k=0.1, rt=75
2024-11-29 14:31:22,857 - src.error_analysis - INFO - Processing combination 7/36: means, k=0.15, r

Starting error analysis...
Total combinations to process: 36
Iterations per combination: 100
Using 4 CPU cores
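The captured output above reports 36 combinations and 3600 simulations. Those totals are consistent with a grid of three error types, four error magnitudes, and three risk tolerances at 100 iterations each. The sketch below enumerates such a grid. The error-type names other than `means` and the fourth magnitude (0.20) do not appear in the log and are assumptions.

```python
from itertools import product

error_types = ["means", "variances", "covariances"]  # only "means" appears in the log
error_magnitudes = [0.05, 0.10, 0.15, 0.20]          # 0.20 is assumed
risk_tolerances = [25, 50, 75]
n_iterations = 100

# product() varies the last argument fastest, matching the order in the log
combinations = list(product(error_types, error_magnitudes, risk_tolerances))

print(len(combinations))                 # 36
print(len(combinations) * n_iterations)  # 3600
print(combinations[3])                   # ('means', 0.1, 25)
```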



In [None]:
# Print summary statistics
print("\nError Analysis Summary:")
print("\nMean Cash Equivalent Loss (CEL) by Error Type and Magnitude:")
# Average over risk tolerances, keeping error type and magnitude as group keys
cel_summary = results[('cel', 'mean')].groupby(level=['error_type', 'error_magnitude']).mean()
print(cel_summary.round(4))

print("\nMean Weight Difference by Error Type and Magnitude:")
weight_diff_summary = results[('mean_weight_diff', 'mean')].groupby(level=['error_type', 'error_magnitude']).mean()
print(weight_diff_summary.round(4))
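The exact CEL formula used by `src.error_analysis` is not shown here. A common definition, assumed in this sketch, evaluates both portfolios under the true parameters with the cash equivalent $CE(w) = w^\top \mu - \frac{1}{t} w^\top \Sigma w$ and reports the relative shortfall $(CE_{\text{opt}} - CE_{\text{sub}}) / CE_{\text{opt}}$. The function names, inputs, and weights below are hypothetical.

```python
import numpy as np

def cash_equivalent(weights, mu, cov, risk_tolerance):
    """Risk-adjusted return of a portfolio under the given parameters."""
    return weights @ mu - (weights @ cov @ weights) / risk_tolerance

def cash_equivalent_loss(w_optimal, w_suboptimal, mu, cov, risk_tolerance):
    """Relative loss in cash equivalent from holding the suboptimal weights."""
    ce_opt = cash_equivalent(w_optimal, mu, cov, risk_tolerance)
    ce_sub = cash_equivalent(w_suboptimal, mu, cov, risk_tolerance)
    return (ce_opt - ce_sub) / ce_opt

# Hypothetical two-asset example, for illustration only
mu = np.array([0.01, 0.02])
cov = np.diag([0.0025, 0.0100])
w_reference = np.array([0.0, 1.0])  # weights from the true parameters
w_perturbed = np.array([0.5, 0.5])  # weights from error-laden parameters

cel = cash_equivalent_loss(w_reference, w_perturbed, mu, cov, risk_tolerance=50)
print(f"{cel:.4f}")
```

Both cash equivalents use the true `mu` and `cov`; the estimation error enters only through the weights it produces.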

## 4. Visualization of Results
Let's create various visualizations to better understand the impact of estimation errors.


In [None]:
# Initialize visualizer
visualizer = PortfolioVisualizer(figsize=(12, 8))

# Create and save all visualizations
visualizer.create_analysis_dashboard(results, output_dir='../figures')

# Display key plots inline
visualizer.plot_cel_heatmap(results)
plt.show()

visualizer.plot_cel_confidence_bands(results)
plt.show()

visualizer.plot_risk_return_scatter(results)
plt.show()

## 5. Analysis of Results
Let's analyze the key findings from our error analysis:


In [None]:
# Calculate detailed summary statistics
summary_stats = pd.DataFrame({
    'Error Type': results.index.get_level_values('error_type'),
    'Error Magnitude': results.index.get_level_values('error_magnitude'),
    'Risk Tolerance': results.index.get_level_values('risk_tolerance'),
    'Mean CEL': results[('cel', 'mean')],
    'Max CEL': results[('cel', 'max')],
    'Mean Weight Diff': results[('mean_weight_diff', 'mean')],
    'Return Difference': results[('suboptimal_return', 'mean')] - results[('optimal_return', 'mean')],
    'Risk Difference': results[('suboptimal_risk', 'mean')] - results[('optimal_risk', 'mean')]
}).round(4)

# Group by error type and magnitude
grouped_stats = summary_stats.groupby(['Error Type', 'Error Magnitude']).mean()
print("\nDetailed Summary Statistics:")
print(grouped_stats)