# Portfolio Clustering - Exploratory Analysis

**Author:** Roberto Berardi  
**Student ID:** 25419094  
**Project:** Dynamic Portfolio Clustering and Risk Profiling with Machine Learning

This notebook contains exploratory data analysis and visualizations for the portfolio clustering project.

## Setup

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image, display

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✅ Libraries loaded")

## Load Results

Load the results from the main analysis (after running `main.py`).

In [None]:
# Load performance tables
clustering_results = pd.read_csv('../results/tables/clustering_performance.csv')
ml_results = pd.read_csv('../results/tables/ml_performance.csv')
ml_model_eval = pd.read_csv('../results/tables/ml_model_evaluation.csv')

print("📊 Clustering Results:")
display(clustering_results)

print("\n📊 ML Results:")
display(ml_results)

print("\n🤖 ML Model Evaluation:")
display(ml_model_eval)

## Key Visualizations

### 1. Performance Comparison

In [None]:
display(Image(filename='../results/figures/1_performance_comparison.png'))

**Key Insights:**
- Clustering strategies consistently outperform ML-driven approaches
- Aggressive clustering achieves 84% return vs 81% for ML
- All strategies beat S&P 500 benchmark (59.62%)
- Sharpe ratios indicate good risk-adjusted performance
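
The Sharpe ratios above are read from the precomputed tables, and this notebook does not show how `main.py` derives them. As a rough sketch only, assuming daily simple returns, 252 trading periods per year, and a zero risk-free rate by default (the function name and sample values are hypothetical), an annualized Sharpe ratio could be computed as follows:

```python
import numpy as np


def annualized_sharpe(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic simple returns.

    Assumptions (not taken from main.py): returns are per-period simple
    returns and risk_free_rate is an annual rate spread evenly over periods.
    """
    returns = np.asarray(returns, dtype=float)
    excess = returns - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)  # sample standard deviation
    if std == 0:
        return float("nan")  # ratio is undefined without variation
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


# Hypothetical daily returns, for illustration only
example_returns = [0.01, -0.005, 0.007, 0.002, -0.003]
sharpe_example = annualized_sharpe(example_returns)
print(f"Annualized Sharpe: {sharpe_example:.2f}")
```

A higher value means more excess return per unit of volatility, which is the basis for comparing strategies on a risk-adjusted footing.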

### 2. Risk-Return Profile

In [None]:
display(Image(filename='../results/figures/2_risk_return_scatter.png'))

**Key Insights:**
- Higher returns come with higher drawdowns (expected)
- Clustering strategies achieve better return/risk tradeoff
- Conservative portfolios minimize drawdown
- All strategies superior to benchmark
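
For reference, the drawdown on the risk axis is usually the largest peak-to-trough loss of the cumulative wealth curve. The sketch below is one common way to compute it, assuming a series of periodic simple returns; the function name and sample data are hypothetical and may differ from what `main.py` does.

```python
import numpy as np


def max_drawdown(returns):
    """Largest peak-to-trough decline of the wealth curve, as a negative fraction."""
    returns = np.asarray(returns, dtype=float)
    # Prepend the starting wealth of 1.0 so a loss in the first period counts
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    running_peak = np.maximum.accumulate(wealth)
    drawdowns = wealth / running_peak - 1.0
    return float(drawdowns.min())


# Hypothetical returns: +10%, then -20%, then +5%
mdd_example = max_drawdown([0.10, -0.20, 0.05])
print(f"Max drawdown: {mdd_example:.1%}")
```

Here the wealth curve peaks after the first period and falls 20% in the second, so the function reports a maximum drawdown of about -20%.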

### 3. ML Model Performance

In [None]:
display(Image(filename='../results/figures/3_ml_model_performance.png'))

**Key Insights:**
- Ridge regression performs best (59% directional accuracy)
- Enhanced models (with cluster features) slightly better than base
- All models better than random (50%)
- Negative R² expected in noisy stock markets
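
To illustrate how a model can have useful directional accuracy and still a negative R², here is a minimal sketch with made-up numbers. The function names and data are hypothetical, and the metric definitions used in `main.py` may differ.

```python
import numpy as np


def directional_accuracy(y_true, y_pred):
    """Share of observations where the predicted sign matches the realized sign."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical realized and predicted returns: the signs mostly agree,
# but the predicted magnitudes are far off.
y_true = [0.02, -0.01, 0.03, -0.02]
y_pred = [0.05, -0.04, 0.06, 0.01]

acc_example = directional_accuracy(y_true, y_pred)
r2_example = r_squared(y_true, y_pred)
print(f"Directional accuracy: {acc_example:.0%}, R²: {r2_example:.2f}")
```

In this toy case three of four signs are correct, yet the squared errors exceed the variance of the realized returns, so R² falls below zero. R² penalizes magnitude errors, while a long/short allocation decision depends mainly on getting the sign right.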

### 4. Clustering vs ML Heatmap

In [None]:
display(Image(filename='../results/figures/4_clustering_vs_ml_heatmap.png'))

**Key Insights:**
- Green cells show clustering advantage
- Conservative portfolio benefits most from clustering
- Clustering provides more consistent improvements
- Simple methods can outperform complex ML
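
The figure is pre-rendered, but a difference table of this kind can be sketched from the two performance tables. The example below assumes both tables carry `Portfolio`, `Total Return`, and `Sharpe` columns (the names used later in this notebook) and uses placeholder numbers, not the project's results.

```python
import pandas as pd

# Placeholder values for illustration only (not the project's results)
clustering = pd.DataFrame({
    "Portfolio": ["Conservative", "Aggressive"],
    "Total Return": [10.0, 20.0],
    "Sharpe": [1.0, 1.2],
})
ml = pd.DataFrame({
    "Portfolio": ["Aggressive", "Conservative"],  # deliberately in a different order
    "Total Return": [18.0, 7.0],
    "Sharpe": [1.1, 0.8],
})

metrics = ["Total Return", "Sharpe"]
# Indexing by portfolio name lets pandas align rows by label before subtracting
advantage = (clustering.set_index("Portfolio")[metrics]
             - ml.set_index("Portfolio")[metrics])
print(advantage)
```

Positive cells indicate a clustering advantage. The resulting frame could then be passed to `sns.heatmap(advantage, annot=True, center=0)` to draw a comparable chart.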

### 5. Summary Table

In [None]:
display(Image(filename='../results/figures/5_performance_table.png'))

## Statistical Analysis

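The cell below computes the average difference in total return and Sharpe ratio between the clustering and ML strategies, then each portfolio's excess return over the S&P 500.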

In [None]:
# Calculate average differences (clustering minus ML)
avg_return_improvement = (clustering_results['Total Return'].mean()
                          - ml_results['Total Return'].mean())

avg_sharpe_improvement = (clustering_results['Sharpe'].mean()
                          - ml_results['Sharpe'].mean())

# The '+' format flag prints the actual sign, so a negative
# difference is not shown as "+-x"
print("📊 Average Performance Improvements (Clustering vs ML):")
print(f"   Return: {avg_return_improvement:+.2f}%")
print(f"   Sharpe: {avg_sharpe_improvement:+.3f}")

print("\n📊 Benchmark Comparison:")
sp500_return = 59.62  # S&P 500 benchmark return (%)
# Rows are matched by position, so both tables must list
# the portfolios in the same order
for i, portfolio in enumerate(clustering_results['Portfolio']):
    clust_excess = clustering_results.iloc[i]['Total Return'] - sp500_return
    ml_excess = ml_results.iloc[i]['Total Return'] - sp500_return
    print(f"   {portfolio}:")
    print(f"      Clustering vs S&P: {clust_excess:+.2f}%")
    print(f"      ML vs S&P: {ml_excess:+.2f}%")

## Conclusion

This analysis demonstrates that:

1. **Risk-based clustering outperforms ML predictions** by 3-10% across all portfolio strategies
2. **Both approaches beat the market** - all portfolios exceeded S&P 500 returns
3. **Simpler methods can be more robust** in noisy financial markets
4. **Enhanced ML models** with cluster features show marginal improvements over base models
5. **Directional accuracy matters more than R²** for portfolio applications

For complete analysis, see `main.py` and the full report.

---

**Project:** Dynamic Portfolio Clustering and Risk Profiling with Machine Learning  
**Author:** Roberto Berardi (25419094)  
**Institution:** HEC Lausanne - UNIL, MSc Finance  
**Course:** Advanced Programming - Fall 2025