# SVR Stock Price Prediction Model

Author: Ashutosh Talekar
Course: ISE 464/364 - Coding Project
Task: Support Vector Regression for Stock Price Prediction

This implementation covers:
1. Problem Formulation & Motivation
2. Data Acquisition & Preparation
3. Data Exploratory Analysis
4. Model Selection & Training (SVR)
5. Evaluation & Analysis
6. Communication & Presentation

Advanced contributions include:
- Financial feature engineering (technical indicators)
- Thorough data cleaning with clear rationale
- Multiple SVR kernel comparisons
- Comprehensive model evaluation with confidence intervals
- Analysis of model assumptions and error patterns

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, TimeSeriesSplit
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Set style for better visualizations
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

## 1. Problem Formulation & Motivation

**Problem Statement:**
Predict weekly stock returns for Apple Inc. (AAPL) using Support Vector Regression.

**Inputs:**
- Historical OHLCV data (Open, High, Low, Close, Volume)
- Engineered technical indicators
- Temporal features

**Target:**
- Weekly forward return: the percentage change in closing price over the next 5 trading days

**Why SVR?**
1. Captures non-linear relationships
2. Robust to outliers
3. Kernel trick for complex patterns
4. Regularization prevents overfitting
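
The robustness claim above rests on SVR's epsilon-insensitive loss. The block below is a minimal sketch of that loss, not code from this notebook: the helper name `epsilon_insensitive_loss` and the example residuals are hypothetical, and `epsilon=0.01` simply mirrors the value used for the models later on.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.01):
    """Zero inside the epsilon tube, linear in the error outside it."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, error - epsilon)

# Hypothetical weekly-return errors: small, medium, and a large market shock
residual_examples = np.array([0.005, 0.02, 0.10])
eps_loss = epsilon_insensitive_loss(residual_examples, np.zeros(3))
sq_loss = residual_examples ** 2  # squared loss, shown for comparison

print("epsilon-insensitive:", eps_loss)
print("squared:            ", sq_loss)
```

The error inside the tube costs nothing, and the large error is penalized linearly rather than quadratically, which is why a single extreme move pulls the fit less than it would under squared loss.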

In [None]:
print("="*80)
print("SVR STOCK PRICE PREDICTION MODEL")
print("="*80)
print("\n1. PROBLEM FORMULATION & MOTIVATION")
print("-" * 80)
print("""
PROBLEM STATEMENT:
Predict weekly stock returns for Apple Inc. (AAPL) using Support Vector Regression.
This enables data-driven portfolio optimization and investment decision-making.

INPUTS:
- Historical OHLCV data (Open, High, Low, Close, Volume)
- Engineered technical indicators (moving averages, volatility, momentum)
- Temporal features (day of week, month)

TARGETS:
- Weekly forward returns (% change in closing price)

MOTIVATION:
Stock price prediction is crucial for portfolio management and risk assessment.
SVR is particularly well-suited for this task because:
1. Captures non-linear relationships in financial data
2. Robust to outliers through epsilon-insensitive loss
3. Kernel trick enables complex pattern recognition
4. Regularization prevents overfitting in noisy financial data
""")

## 2. Data Acquisition & Preparation

In [None]:
print("\n2. DATA ACQUISITION & PREPARATION")
print("-" * 80)

# Load raw data
print("Loading data from Yahoo Finance CSV...")
raw = pd.read_csv("AAPL_since_IPO_OHLCV.csv", header=None)

print(f"Raw data shape: {raw.shape}")
print(f"First few rows:\n{raw.head()}")

In [None]:
# Data Cleaning with Clear Rationale
print("\nData Cleaning Steps:")
print("1. Removing metadata rows (rows 0-2 contain headers/ticker info)")
raw = raw.drop(index=[0, 1, 2]).reset_index(drop=True)

print("2. Setting proper column names")
raw.columns = ["Date", "Close", "High", "Low", "Open", "Volume"]

print("3. Converting Date column to datetime format")
raw["Date"] = pd.to_datetime(raw["Date"], errors="coerce")

print("4. Converting numeric columns from string to float")
num_cols = ["Close", "High", "Low", "Open", "Volume"]
raw[num_cols] = raw[num_cols].apply(pd.to_numeric, errors="coerce")

print(f"5. Handling missing values - Found {raw.isnull().sum().sum()} missing values")
raw = raw.dropna().sort_values("Date").reset_index(drop=True)

print("6. Removing duplicate dates")
initial_count = len(raw)
raw = raw.drop_duplicates(subset=['Date'], keep='first')
print(f"   Removed {initial_count - len(raw)} duplicate entries")

print(f"\nCleaned data shape: {raw.shape}")
print(f"Date range: {raw['Date'].min()} to {raw['Date'].max()}")
print(f"Total trading days: {len(raw)}")
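
After cleaning, a few invariant checks can catch rows that parsed successfully but are still implausible. This is a sketch under assumptions: `ohlcv_sanity_report` is a hypothetical helper, and the toy frame below is made-up data that only reuses the column names assigned above.

```python
import pandas as pd

def ohlcv_sanity_report(frame):
    """Count rows that violate basic OHLCV invariants (hypothetical helper)."""
    prices = frame[["Open", "High", "Low", "Close"]]
    return {
        "high_below_low": int((frame["High"] < frame["Low"]).sum()),
        "close_outside_range": int(
            ((frame["Close"] > frame["High"]) | (frame["Close"] < frame["Low"])).sum()
        ),
        "non_positive_price": int((prices <= 0).any(axis=1).sum()),
        "negative_volume": int((frame["Volume"] < 0).sum()),
        "dates_sorted": bool(frame["Date"].is_monotonic_increasing),
    }

# Toy data; the last row is deliberately inconsistent (High below Low)
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "Open": [10.0, 10.5, 11.0],
    "High": [10.8, 10.9, 10.5],
    "Low": [9.9, 10.2, 10.7],
    "Close": [10.5, 10.8, 10.6],
    "Volume": [1000, 1200, 900],
})
report = ohlcv_sanity_report(toy)
print(report)
```

Calling the same helper on the cleaned `raw` frame would report how many rows, if any, need a closer look before feature engineering.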

## 3. Data Exploratory Analysis

In [None]:
print("\n3. DATA EXPLORATORY ANALYSIS")
print("-" * 80)

# Basic statistics
print("\nDescriptive Statistics:")
print(raw[num_cols].describe())

# Focus on recent data
print("\nFocusing on last 10 years for relevant market conditions...")
recent_cutoff = raw['Date'].max() - pd.DateOffset(years=10)  # calendar years, leap days included
df = raw[raw['Date'] >= recent_cutoff].copy()
df = df.set_index('Date')

print(f"Working dataset: {len(df)} observations from {df.index.min()} to {df.index.max()}")

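# Calculate returns
print("\nCalculating returns...")
df['Daily_Return'] = df['Close'].pct_change()
# Target variable: return over the NEXT 5 trading days. shift(-5) aligns the future
# return with today's features; without it the target would be the trailing return,
# which features such as Momentum_5 reproduce almost exactly (target leakage).
df['Weekly_Return'] = df['Close'].pct_change(periods=5).shift(-5)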
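
The difference between a trailing and a forward 5-day return is easy to get wrong, so here is a small sketch on a made-up price series. The prices, dates, and the names `trailing` and `forward` are illustrative assumptions, not values from the AAPL data.

```python
import pandas as pd

# Hypothetical closing prices on 7 consecutive business days
close = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 110.0],
    index=pd.bdate_range("2024-01-01", periods=7),
)
horizon = 5

# Trailing return: change over the PREVIOUS 5 days, known at time t
trailing = close.pct_change(periods=horizon)

# Forward return: change over the NEXT 5 days, unknown at time t (a valid target)
forward = close.pct_change(periods=horizon).shift(-horizon)

print(pd.DataFrame({"close": close, "trailing": trailing, "forward": forward}))
```

The forward column is defined for the first rows and missing for the last five, because those rows have no price five days ahead yet. The trailing column is the mirror image.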

## Feature Engineering

Creating comprehensive technical indicators:
- Moving Averages (SMA, EMA)
- Volatility metrics
- Momentum indicators
- RSI, MACD
- Bollinger Bands
- Volume indicators

In [None]:
# Feature Engineering - Technical Indicators
print("\nEngineering Technical Indicators:")
print("- Moving Averages (SMA 5, 20, 50)")
df['SMA_5'] = df['Close'].rolling(window=5).mean()
df['SMA_20'] = df['Close'].rolling(window=20).mean()
df['SMA_50'] = df['Close'].rolling(window=50).mean()

print("- Exponential Moving Averages (EMA 12, 26)")
df['EMA_12'] = df['Close'].ewm(span=12, adjust=False).mean()
df['EMA_26'] = df['Close'].ewm(span=26, adjust=False).mean()

print("- Volatility (20-day rolling standard deviation)")
df['Volatility'] = df['Daily_Return'].rolling(window=20).std()

print("- Price momentum indicators")
df['Momentum_5'] = df['Close'] - df['Close'].shift(5)
df['Momentum_20'] = df['Close'] - df['Close'].shift(20)

print("- Relative Strength Index (RSI)")
delta = df['Close'].diff()
gain = (delta.where(delta > 0, 0)).rolling(window=14).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=14).mean()
rs = gain / loss
df['RSI'] = 100 - (100 / (1 + rs))

print("- MACD (Moving Average Convergence Divergence)")
df['MACD'] = df['EMA_12'] - df['EMA_26']

print("- Bollinger Bands")
df['BB_middle'] = df['Close'].rolling(window=20).mean()
df['BB_std'] = df['Close'].rolling(window=20).std()
df['BB_upper'] = df['BB_middle'] + (df['BB_std'] * 2)
df['BB_lower'] = df['BB_middle'] - (df['BB_std'] * 2)
df['BB_width'] = df['BB_upper'] - df['BB_lower']

print("- Volume indicators")
df['Volume_MA_20'] = df['Volume'].rolling(window=20).mean()
df['Volume_Ratio'] = df['Volume'] / df['Volume_MA_20']

print("- Price range indicators")
df['High_Low_Range'] = df['High'] - df['Low']
df['Close_Open_Range'] = df['Close'] - df['Open']

# Lag features
print("- Creating lag features (previous returns)")
for lag in [1, 2, 3, 5, 10]:
    df[f'Return_Lag_{lag}'] = df['Daily_Return'].shift(lag)

# Temporal features
print("- Temporal features (day of week, month)")
df['DayOfWeek'] = df.index.dayofweek
df['Month'] = df.index.month

# Drop NaN values
print("\nDropping rows with NaN values from feature engineering...")
print(f"Before: {len(df)} rows")
df = df.dropna()
print(f"After: {len(df)} rows")

print(f"\nFinal dataset ready with {df.shape[1]} columns (features, raw OHLCV, and the target)")
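
The RSI above averages gains and losses with a simple 14-day rolling mean. A common variant smooths them exponentially with alpha = 1/window (Wilder-style smoothing). The sketch below shows that alternative for comparison only; `rsi_wilder` is a hypothetical helper, and the price series is a synthetic random walk rather than the AAPL data.

```python
import numpy as np
import pandas as pd

def rsi_wilder(close, window=14):
    """RSI with exponentially smoothed gains and losses (alpha = 1/window)."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # positive moves, zero otherwise
    loss = -delta.clip(upper=0)     # magnitude of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
close_demo = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200))))
rsi_demo = rsi_wilder(close_demo)
print(rsi_demo.dropna().describe())
```

Both variants are bounded between 0 and 100. The exponentially smoothed one reacts more gradually when an old large move drops out of the window.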

## 4. Model Selection & Training

**Why Support Vector Regression?**

1. **Non-linear relationships:** Financial data exhibits complex patterns
2. **Robustness:** Epsilon-insensitive loss handles outliers
3. **Kernel trick:** Can model complex non-linear regression functions
4. **Regularization:** Prevents overfitting in noisy data

In [None]:
print("\n4. MODEL SELECTION & TRAINING")
print("-" * 80)

print("""
WHY SUPPORT VECTOR REGRESSION (SVR)?

SVR is chosen for stock price prediction because:

1. NON-LINEAR RELATIONSHIPS: Financial data exhibits complex non-linear patterns
   that linear models cannot capture. SVR with RBF kernel can model these patterns.

2. ROBUSTNESS TO OUTLIERS: The epsilon-insensitive loss function means SVR
   is not heavily influenced by extreme price movements or market shocks.

3. KERNEL TRICK: Allows mapping to higher dimensions without explicit computation,
   enabling capture of complex patterns in the feature space.

4. REGULARIZATION: C parameter controls the trade-off between a flat (simple) regression
   function and penalizing errors larger than epsilon, limiting overfitting in noisy data.

5. EFFICIENCY: Sparse solution depends only on support vectors, not all data points.
""")

# Prepare features and target
feature_columns = [col for col in df.columns if col not in ['Weekly_Return', 'Close', 'High', 'Low', 'Open', 'Volume']]
X = df[feature_columns]
y = df['Weekly_Return']

print(f"\nFeatures: {len(feature_columns)} total")
print(f"Target: Weekly_Return")
print(f"Dataset size: {len(X)} observations")

In [None]:
# Train-test split with time series consideration
print("\nSplitting data (80% train, 20% test)...")
split_idx = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

print(f"Training set: {len(X_train)} observations ({X_train.index.min()} to {X_train.index.max()})")
print(f"Test set: {len(X_test)} observations ({X_test.index.min()} to {X_test.index.max()})")

# Standardization
print("\nStandardizing features...")
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print("✓ Features scaled to zero mean and unit variance")
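
Fitting the scaler once on the whole training set is fine for the final test evaluation, but inside cross-validation it lets each validation fold influence the scaling. One way around that is to wrap the scaler and the SVR in a scikit-learn pipeline so the scaler is refit per fold. The sketch below uses synthetic data (`X_demo`, `y_demo` are stand-ins, not the notebook's features), and `gap=5` is an assumption chosen to match the 5-day target horizon.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-ins for the feature matrix and the weekly-return target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = 0.02 * X_demo[:, 0] + rng.normal(scale=0.01, size=300)

# The scaler is part of the estimator, so it is refit on each training fold only
pipe = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=1.0, gamma="scale", epsilon=0.01),
)

# gap=5 leaves 5 observations between each training fold and its validation fold,
# so 5-day return windows do not straddle the boundary
tscv_gap = TimeSeriesSplit(n_splits=5, gap=5)
cv_rmse_demo = -cross_val_score(
    pipe, X_demo, y_demo, cv=tscv_gap, scoring="neg_root_mean_squared_error"
)
print("Fold RMSE:", np.round(cv_rmse_demo, 5))
```

The same pipeline object can be passed unscaled training data directly, which removes the need to keep scaled and unscaled copies in sync.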

## Training Multiple SVR Kernels

Comparing Linear, Polynomial, and RBF kernels

In [None]:
# Train multiple SVR models with different kernels
print("\nTraining SVR models with different kernels...")
models = {
    'Linear': SVR(kernel='linear', C=1.0, epsilon=0.01),
    'Polynomial': SVR(kernel='poly', C=1.0, degree=2, epsilon=0.01),
    'RBF': SVR(kernel='rbf', C=1.0, gamma='scale', epsilon=0.01)
}

results = {}

for name, model in models.items():
    print(f"\n{'='*60}")
    print(f"Training {name} Kernel SVR...")
    print(f"{'='*60}")
    
    # Train model
    model.fit(X_train_scaled, y_train)
    
    # Predictions
    y_pred_train = model.predict(X_train_scaled)
    y_pred_test = model.predict(X_test_scaled)
    
    # Metrics
    train_rmse = np.sqrt(mean_squared_error(y_train, y_pred_train))
    test_rmse = np.sqrt(mean_squared_error(y_test, y_pred_test))
    train_mae = mean_absolute_error(y_train, y_pred_train)
    test_mae = mean_absolute_error(y_test, y_pred_test)
    train_r2 = r2_score(y_train, y_pred_train)
    test_r2 = r2_score(y_test, y_pred_test)
    
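    # Cross-validation on expanding, time-ordered folds. Note: X_train_scaled was
    # standardized with statistics from the full training set, so each validation
    # fold has slightly influenced the scaling of the data it is validated against.
    tscv = TimeSeriesSplit(n_splits=5)
    cv_scores = -cross_val_score(model, X_train_scaled, y_train,
                                  cv=tscv, scoring='neg_root_mean_squared_error')
    cv_rmse = cv_scores.mean()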
    
    results[name] = {
        'model': model,
        'train_rmse': train_rmse,
        'test_rmse': test_rmse,
        'train_mae': train_mae,
        'test_mae': test_mae,
        'train_r2': train_r2,
        'test_r2': test_r2,
        'cv_rmse': cv_rmse,
        'cv_std': cv_scores.std(),
        'y_pred_train': y_pred_train,
        'y_pred_test': y_pred_test,
        'support_vectors': len(model.support_)
    }
    
    print(f"\n{name} Kernel Results:")
    print(f"  Training RMSE: {train_rmse:.6f}")
    print(f"  Test RMSE: {test_rmse:.6f}")
    print(f"  Training MAE: {train_mae:.6f}")
    print(f"  Test MAE: {test_mae:.6f}")
    print(f"  Training R²: {train_r2:.4f}")
    print(f"  Test R²: {test_r2:.4f}")
    print(f"  CV RMSE: {cv_rmse:.6f} (±{cv_scores.std():.6f})")
    print(f"  Support Vectors: {len(model.support_)}")

print("\n" + "="*60)
print("Model Training Complete!")
print("="*60)
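
RMSE values for weekly returns are hard to judge in isolation, so it helps to compare them with naive baselines such as "always predict zero" or "always predict the training mean". The sketch below is illustrative: the `rmse` helper is hypothetical and the return series are synthetic draws, not the notebook's `y_train` and `y_test`.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error for two equal-length sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic weekly returns standing in for the train and test targets
rng = np.random.default_rng(42)
y_train_demo = rng.normal(0.003, 0.03, size=400)
y_test_demo = rng.normal(0.003, 0.03, size=100)

# Baseline 1: predict a zero return every week
baseline_zero = rmse(y_test_demo, np.zeros_like(y_test_demo))

# Baseline 2: predict the mean return observed in training
baseline_mean = rmse(y_test_demo, np.full_like(y_test_demo, y_train_demo.mean()))

print(f"Zero-return baseline RMSE: {baseline_zero:.6f}")
print(f"Train-mean baseline RMSE:  {baseline_mean:.6f}")
```

An SVR whose test RMSE is not clearly below both baselines has not learned anything useful from the features.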

In [None]:
# Model Comparison
comparison_df = pd.DataFrame({
    'Kernel': list(results.keys()),
    'Test_RMSE': [results[k]['test_rmse'] for k in results.keys()],
    'Test_MAE': [results[k]['test_mae'] for k in results.keys()],
    'Test_R2': [results[k]['test_r2'] for k in results.keys()],
    'CV_RMSE': [results[k]['cv_rmse'] for k in results.keys()],
    'Support_Vectors': [results[k]['support_vectors'] for k in results.keys()]
})

print("\nModel Comparison Summary:")
print("="*80)
print(comparison_df.to_string(index=False))

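# Select best model
# Caveat: choosing the kernel by test RMSE reuses the test set for model selection,
# so the test metrics reported for the winning kernel are optimistically biased.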
best_model_name = comparison_df.loc[comparison_df['Test_RMSE'].idxmin(), 'Kernel']
best_model = results[best_model_name]['model']
print(f"\n✓ Best Model: {best_model_name} Kernel (Lowest Test RMSE)")
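
An alternative that keeps the test set untouched is to pick the kernel by cross-validated RMSE and only then read off its test score. The sketch below shows the idea on a small table with hypothetical numbers; `comparison_demo` mimics the columns of `comparison_df` but its values are made up for illustration.

```python
import pandas as pd

# Hypothetical scores, for illustration only (not results from this notebook)
comparison_demo = pd.DataFrame({
    "Kernel": ["Linear", "Polynomial", "RBF"],
    "CV_RMSE": [0.031, 0.035, 0.030],
    "Test_RMSE": [0.029, 0.033, 0.032],
})

# Choose using training-only information (cross-validation) ...
chosen = comparison_demo.loc[comparison_demo["CV_RMSE"].idxmin(), "Kernel"]

# ... then report the test score of that single, pre-committed choice
reported_test_rmse = comparison_demo.set_index("Kernel").loc[chosen, "Test_RMSE"]

print(f"Chosen by CV: {chosen}, test RMSE: {reported_test_rmse:.3f}")
```

In this made-up table the kernel with the lowest CV RMSE is not the one with the lowest test RMSE, which is exactly the situation where selecting on the test set would flatter the result.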

## 5. Model Evaluation & Analysis

In [None]:
print("\n5. MODEL EVALUATION & ANALYSIS")
print("-" * 80)

# Detailed evaluation of best model
y_pred_test = results[best_model_name]['y_pred_test']
residuals = y_test - y_pred_test

# Directional accuracy
directional_accuracy = np.mean((y_test > 0) == (y_pred_test > 0))
print(f"\nDirectional Accuracy: {directional_accuracy * 100:.2f}%")

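# Confidence intervals (bootstrap)
# Test observations are resampled independently. Overlapping 5-day returns are
# autocorrelated, so this interval is likely narrower than the true uncertainty.
n_bootstrap = 1000
rng = np.random.default_rng(42)  # fixed seed so the interval is reproducible
bootstrap_rmse = []
for _ in range(n_bootstrap):
    idx = rng.choice(len(y_test), size=len(y_test), replace=True)
    bootstrap_rmse.append(np.sqrt(mean_squared_error(y_test.iloc[idx], y_pred_test[idx])))

ci_lower, ci_upper = np.percentile(bootstrap_rmse, [2.5, 97.5])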
print(f"\n95% Confidence Interval for RMSE: [{ci_lower:.6f}, {ci_upper:.6f}]")

# Residual analysis
print("\nResidual Analysis:")
print(f"  Mean: {residuals.mean():.6f}")
print(f"  Std Dev: {residuals.std():.6f}")
print(f"  Skewness: {stats.skew(residuals):.4f}")
# scipy's kurtosis() returns excess kurtosis by default (0 for a normal distribution)
print(f"  Excess Kurtosis: {stats.kurtosis(residuals):.4f}")

# Normality test
statistic, p_value = stats.shapiro(residuals[:5000] if len(residuals) > 5000 else residuals)
print(f"\nShapiro-Wilk Test for Normality:")
print(f"  Statistic: {statistic:.4f}")
print(f"  p-value: {p_value:.4f}")
print(f"  {'Residuals are approximately normal' if p_value > 0.05 else 'Residuals deviate from normality'}")
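
Because daily-sampled 5-day returns overlap, neighbouring residuals are correlated, and a bootstrap that resamples single observations tends to understate uncertainty. A moving block bootstrap resamples contiguous runs instead. The sketch below is one possible implementation under assumptions: `block_bootstrap_rmse_ci` is a hypothetical helper, `block_len=5` is chosen to match the return horizon, and the data are synthetic.

```python
import numpy as np

def block_bootstrap_rmse_ci(y_true, y_pred, block_len=5, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for RMSE from a moving block bootstrap (hypothetical helper)."""
    sq_err = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    n = len(sq_err)
    n_blocks = int(np.ceil(n / block_len))
    rng = np.random.default_rng(seed)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Draw random block start positions, then expand each into a contiguous run
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
        boot[b] = np.sqrt(sq_err[idx].mean())
    lower, upper = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)

# Synthetic actual and predicted returns, for illustration only
rng_demo = np.random.default_rng(1)
y_true_demo = rng_demo.normal(0.0, 0.03, size=250)
y_pred_demo = 0.2 * y_true_demo + rng_demo.normal(0.0, 0.01, size=250)

ci_low_demo, ci_high_demo = block_bootstrap_rmse_ci(y_true_demo, y_pred_demo)
print(f"Block-bootstrap 95% CI for RMSE: [{ci_low_demo:.6f}, {ci_high_demo:.6f}]")
```

On autocorrelated residuals this interval is usually wider than the one from independent resampling, which is the more honest picture.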

## 6. Comprehensive Visualizations

In [None]:
print("\n6. VISUALIZATIONS")
print("-" * 80)

# Create comprehensive visualization dashboard
fig = plt.figure(figsize=(20, 24))
gs = fig.add_gridspec(6, 2, hspace=0.3, wspace=0.25)

# 1. Actual vs Predicted (Training)
ax1 = fig.add_subplot(gs[0, 0])
ax1.scatter(results[best_model_name]['y_pred_train'], y_train, alpha=0.3, s=20)
ax1.plot([y_train.min(), y_train.max()], [y_train.min(), y_train.max()], 'r--', lw=2)
ax1.set_title(f'{best_model_name} Kernel - Training Set', fontsize=12, fontweight='bold')
ax1.set_xlabel('Predicted Weekly Return')
ax1.set_ylabel('Actual Weekly Return')
ax1.grid(True, alpha=0.3)
ax1.text(0.05, 0.95, f'R² = {results[best_model_name]["train_r2"]:.4f}', 
         transform=ax1.transAxes, fontsize=11, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# 2. Actual vs Predicted (Test)
ax2 = fig.add_subplot(gs[0, 1])
ax2.scatter(y_pred_test, y_test, alpha=0.5, s=30, c='green')
ax2.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
ax2.set_title(f'{best_model_name} Kernel - Test Set', fontsize=12, fontweight='bold')
ax2.set_xlabel('Predicted Weekly Return')
ax2.set_ylabel('Actual Weekly Return')
ax2.grid(True, alpha=0.3)
ax2.text(0.05, 0.95, f'R² = {results[best_model_name]["test_r2"]:.4f}\nRMSE = {results[best_model_name]["test_rmse"]:.6f}', 
         transform=ax2.transAxes, fontsize=11, verticalalignment='top',
         bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))

# 3. Model Comparison
ax3 = fig.add_subplot(gs[1, 0])
x_pos = np.arange(len(comparison_df))
ax3.bar(x_pos, comparison_df['Test_RMSE'], color=['red' if k == best_model_name else 'gray' for k in comparison_df['Kernel']])
ax3.set_xticks(x_pos)
ax3.set_xticklabels(comparison_df['Kernel'])
ax3.set_ylabel('Test RMSE')
ax3.set_title('Model Comparison - Test RMSE', fontsize=12, fontweight='bold')
ax3.grid(True, alpha=0.3, axis='y')

# 4. R² Comparison
ax4 = fig.add_subplot(gs[1, 1])
ax4.bar(x_pos, comparison_df['Test_R2'], color=['green' if k == best_model_name else 'gray' for k in comparison_df['Kernel']])
ax4.set_xticks(x_pos)
ax4.set_xticklabels(comparison_df['Kernel'])
ax4.set_ylabel('Test R²')
ax4.set_title('Model Comparison - Test R²', fontsize=12, fontweight='bold')
ax4.grid(True, alpha=0.3, axis='y')

# 5. Time Series of Predictions
ax5 = fig.add_subplot(gs[2, :])
test_dates = y_test.index
ax5.plot(test_dates, y_test.values, label='Actual', linewidth=1.5, alpha=0.7)
ax5.plot(test_dates, y_pred_test, label='Predicted', linewidth=1.5, alpha=0.7)
ax5.fill_between(test_dates, y_test.values, y_pred_test, alpha=0.2)
ax5.set_title('Time Series: Actual vs Predicted Weekly Returns', fontsize=12, fontweight='bold')
ax5.set_xlabel('Date')
ax5.set_ylabel('Weekly Return')
ax5.legend(loc='upper right')
ax5.grid(True, alpha=0.3)

# 6. Error Distribution by Time
ax6 = fig.add_subplot(gs[3, :])
ax6.scatter(test_dates, residuals, alpha=0.5, s=20)
ax6.axhline(y=0, color='r', linestyle='--', linewidth=2)
ax6.fill_between(test_dates, -2*residuals.std(), 2*residuals.std(), alpha=0.2, color='gray')
ax6.set_title('Residuals Over Time', fontsize=12, fontweight='bold')
ax6.set_xlabel('Date')
ax6.set_ylabel('Residual (Actual - Predicted)')
ax6.grid(True, alpha=0.3)

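# 7. Cumulative Returns
ax7 = fig.add_subplot(gs[4, 0])
# Weekly returns are sampled daily, so consecutive values overlap; compound only
# every 5th observation to chain non-overlapping weeks.
actual_cumulative = (1 + y_test.iloc[::5]).cumprod()
predicted_cumulative = (1 + pd.Series(y_pred_test, index=y_test.index).iloc[::5]).cumprod()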
ax7.plot(actual_cumulative.index, actual_cumulative.values, label='Actual', linewidth=2)
ax7.plot(predicted_cumulative.index, predicted_cumulative.values, label='Predicted', linewidth=2, alpha=0.7)
ax7.set_title('Cumulative Returns Comparison', fontsize=12, fontweight='bold')
ax7.set_xlabel('Date')
ax7.set_ylabel('Cumulative Return')
ax7.legend()
ax7.grid(True, alpha=0.3)

# 8. Residual Plot
ax8 = fig.add_subplot(gs[4, 1])
ax8.scatter(y_pred_test, residuals, alpha=0.5, s=30)
ax8.axhline(y=0, color='r', linestyle='--', linewidth=2)
ax8.set_title('Residual Plot', fontsize=12, fontweight='bold')
ax8.set_xlabel('Predicted Weekly Return')
ax8.set_ylabel('Residuals')
ax8.grid(True, alpha=0.3)

# 9. Residual Distribution
ax9 = fig.add_subplot(gs[5, 0])
ax9.hist(residuals, bins=30, edgecolor='black', alpha=0.7, density=True)
mu, std = residuals.mean(), residuals.std()
x = np.linspace(residuals.min(), residuals.max(), 100)
ax9.plot(x, stats.norm.pdf(x, mu, std), 'r-', linewidth=2, label='Normal Distribution')
ax9.set_title('Residual Distribution', fontsize=12, fontweight='bold')
ax9.set_xlabel('Residual')
ax9.set_ylabel('Density')
ax9.legend()
ax9.grid(True, alpha=0.3)

# 10. Q-Q Plot
ax10 = fig.add_subplot(gs[5, 1])
stats.probplot(residuals, dist="norm", plot=ax10)
ax10.set_title('Q-Q Plot of Residuals', fontsize=12, fontweight='bold')
ax10.grid(True, alpha=0.3)

plt.suptitle('SVR Stock Price Prediction - Comprehensive Analysis', 
             fontsize=18, fontweight='bold', y=0.995)

plt.savefig('SVR_Analysis_Dashboard.png', dpi=300, bbox_inches='tight')
print("✓ Comprehensive dashboard saved: SVR_Analysis_Dashboard.png")
plt.show()

In [None]:
# Technical Indicators Visualization
fig2, axes = plt.subplots(3, 2, figsize=(16, 12))
fig2.suptitle('Technical Indicators Analysis', fontsize=16, fontweight='bold')

# RSI
axes[0, 0].plot(df.index, df['RSI'], linewidth=1)
axes[0, 0].axhline(y=70, color='r', linestyle='--', alpha=0.5, label='Overbought')
axes[0, 0].axhline(y=30, color='g', linestyle='--', alpha=0.5, label='Oversold')
axes[0, 0].set_title('Relative Strength Index (RSI)')
axes[0, 0].set_ylabel('RSI')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# MACD
axes[0, 1].plot(df.index, df['MACD'], label='MACD', linewidth=1)
axes[0, 1].axhline(y=0, color='black', linestyle='-', alpha=0.5)
axes[0, 1].set_title('MACD')
axes[0, 1].set_ylabel('MACD')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Bollinger Bands
axes[1, 0].plot(df.index, df['Close'], label='Close', linewidth=1)
axes[1, 0].plot(df.index, df['BB_upper'], label='Upper Band', alpha=0.7, linewidth=1)
axes[1, 0].plot(df.index, df['BB_lower'], label='Lower Band', alpha=0.7, linewidth=1)
axes[1, 0].fill_between(df.index, df['BB_lower'], df['BB_upper'], alpha=0.2)
axes[1, 0].set_title('Bollinger Bands')
axes[1, 0].set_ylabel('Price ($)')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Volume
axes[1, 1].bar(df.index, df['Volume'], alpha=0.5, width=1)
axes[1, 1].plot(df.index, df['Volume_MA_20'], color='red', label='20-Day MA', linewidth=2)
axes[1, 1].set_title('Trading Volume')
axes[1, 1].set_ylabel('Volume')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

# Momentum
axes[2, 0].plot(df.index, df['Momentum_5'], label='5-Day', linewidth=1)
axes[2, 0].plot(df.index, df['Momentum_20'], label='20-Day', linewidth=1, alpha=0.7)
axes[2, 0].axhline(y=0, color='black', linestyle='-', alpha=0.5)
axes[2, 0].set_title('Price Momentum')
axes[2, 0].set_ylabel('Momentum ($)')
axes[2, 0].legend()
axes[2, 0].grid(True, alpha=0.3)

# Feature Correlation (ranked by absolute value so strong negative correlations are kept)
correlations = X.corrwith(y)
top_10 = correlations.loc[correlations.abs().sort_values(ascending=False).index].head(10)
y_pos = np.arange(len(top_10))
axes[2, 1].barh(y_pos, top_10.values)
axes[2, 1].set_yticks(y_pos)
axes[2, 1].set_yticklabels(top_10.index, fontsize=8)
axes[2, 1].set_xlabel('Correlation with Weekly Return')
axes[2, 1].set_title('Top 10 Features by Absolute Correlation')
axes[2, 1].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.savefig('Technical_Indicators_Analysis.png', dpi=300, bbox_inches='tight')
print("✓ Technical indicators saved: Technical_Indicators_Analysis.png")
plt.show()

## 7. Summary & Key Insights

### Model Performance
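- The best-performing kernel is the one with the lowest test RMSE
- Each kernel is evaluated with RMSE, MAE, and R² on both the training and test sets
- 5-fold time-series cross-validation on the training set gives a second RMSE estimate and its spread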

### Key Findings
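1. **Model Selection:** Linear, polynomial, and RBF kernels are compared on the same chronological split
2. **Predictive Power:** Test R² and RMSE measure out-of-sample fit; an R² at or below zero means the model does no better than predicting the mean return
3. **Directional Accuracy:** The share of test observations where predicted and actual returns have the same sign, which is critical for trading strategies
4. **Feature Importance:** Correlation with the target gives a first, linear view of which technical indicators carry signal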

### Practical Applications
- Weekly portfolio rebalancing
- Risk assessment using the bootstrap confidence interval for RMSE and the residual spread
- Combined with fundamental analysis for decisions

### Limitations
- Past performance ≠ future results
- Market regime changes affect accuracy
- Black swan events not captured
- One input among many in investment decisions

In [None]:
print("\n" + "=" * 80)
print("SUMMARY & KEY INSIGHTS")
print("=" * 80)

best_r2 = results[best_model_name]['test_r2']
# An R² at or below zero means the model is no better than predicting the mean
r2_label = "strong" if best_r2 > 0.3 else "modest" if best_r2 > 0 else "no"

print(f"""
MODEL PERFORMANCE SUMMARY:
- Best Model: {best_model_name} Kernel SVR
- Test RMSE: {results[best_model_name]['test_rmse']:.6f}
- Test MAE: {results[best_model_name]['test_mae']:.6f}
- Test R²: {best_r2:.4f}
- Directional Accuracy: {directional_accuracy * 100:.2f}%
- Cross-Validation RMSE: {results[best_model_name]['cv_rmse']:.6f}

KEY INSIGHTS:
1. Model Selection: {best_model_name} kernel performed best
2. R² of {best_r2:.4f} indicates {r2_label} predictive power
3. Directional accuracy: {directional_accuracy * 100:.1f}%
4. Technical indicators are the model's only inputs; the metrics above show how much signal they carry
5. Any use in portfolio optimization requires proper risk management

FUTURE ENHANCEMENTS:
- Hyperparameter tuning (grid search)
- Ensemble methods
- Macroeconomic indicators
- Sentiment analysis
- Multi-output prediction
- Online learning
""")

print("\n" + "=" * 80)
print("ANALYSIS COMPLETE")
print("=" * 80)

In [None]:
# Save model and results
import pickle

with open('best_svr_model.pkl', 'wb') as f:
    pickle.dump({
        'model': best_model,
        'scaler': scaler,
        'feature_columns': feature_columns,
        'results': results,
        'comparison_df': comparison_df
    }, f)

print("✓ Model and results saved to: best_svr_model.pkl")
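
To reuse the saved bundle, the features of new data must be put in the saved column order and passed through the saved scaler before prediction. The sketch below demonstrates that round trip on a tiny stand-in bundle: the feature names `feat_a` and `feat_b`, the temporary file path, and the toy model are all hypothetical, and only the dictionary keys match the bundle saved above. Pickle files can execute code when loaded, so only load bundles from a trusted source.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Build a small stand-in bundle with the same keys as the saved one
rng = np.random.default_rng(0)
cols = ["feat_a", "feat_b"]  # hypothetical feature names
X_demo = pd.DataFrame(rng.normal(size=(50, 2)), columns=cols)
y_demo = 0.01 * X_demo["feat_a"].to_numpy()
scaler_demo = StandardScaler().fit(X_demo)
model_demo = SVR(kernel="linear", epsilon=0.001).fit(scaler_demo.transform(X_demo), y_demo)

path = os.path.join(tempfile.mkdtemp(), "demo_svr_bundle.pkl")
with open(path, "wb") as f:
    pickle.dump({"model": model_demo, "scaler": scaler_demo, "feature_columns": cols}, f)

# Reload and predict on "new" rows whose columns arrive in a different order
with open(path, "rb") as f:
    bundle = pickle.load(f)

new_rows = X_demo.tail(3)[["feat_b", "feat_a"]]
X_new = bundle["scaler"].transform(new_rows[bundle["feature_columns"]])  # restore column order
preds = bundle["model"].predict(X_new)
print(preds)
```

Indexing with `bundle["feature_columns"]` before scaling is the step that protects against silently mismatched columns.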