# SVR Model - Cloud Resource Forecasting

---

## Objectives

1. **Hyperparameter Tuning**: Grid search for optimal SVR parameters
2. **Training**: Fit SVR models with high correlation features
3. **Forecasting**: Multi-step ahead prediction (10 minutes = 20 steps)
4. **Evaluation**: Calculate MAE, RMSE, MAPE, R² metrics
5. **Comparison**: Save results for model comparison

---

**Dataset Info:**
- Time interval: 30 seconds
- Forecast horizon: 10 minutes (20 steps)
- Models: 3 (memory_usage_pct, cpu_total_usage, system_load)
- Method: SVR (Support Vector Regression) with RBF kernel
- Feature selection: High correlation features from ETL


## 1. Import Libraries


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
import warnings
from datetime import datetime
import time

# Machine Learning
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

# Import model utilities
from model_utils import (
    save_model,
    load_model,
    calculate_metrics,
    print_metrics,
    save_results,
    create_models_directory
)

warnings.filterwarnings('ignore')

# Display settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)

# Create models directory
create_models_directory()

print("✓ Libraries imported")
print("✓ Model utilities loaded")
print(f"Analysis started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")


✓ Models directory ready: models
✓ Libraries imported
✓ Model utilities loaded
Analysis started: 2025-11-11 16:21:23


## 2. Load Processed Data


In [2]:
# Load feature metadata
with open('processed_data/feature_metadata.json', 'r') as f:
    feature_metadata = json.load(f)

print("Feature Metadata:")
print("="*80)
for target, info in feature_metadata.items():
    print(f"\n{target}:")
    print(f"  Features: {info['n_features']}")
    print(f"  List: {info['features']}")

# Target variables
target_vars = ['memory_usage_pct', 'cpu_total_usage', 'system_load']

print("\n" + "="*80)
print("✓ Metadata loaded")


Feature Metadata:

memory_usage_pct:
  Features: 7
  List: ['load-15m', 'sys-context-switch-rate', 'cpu-system', 'cpu-user', 'load-5m', 'sys-mem-buffered', 'sys-mem-free']

cpu_total_usage:
  Features: 7
  List: ['sys-context-switch-rate', 'sys-fork-rate', 'sys-interrupt-rate', 'load-15m', 'load-5m', 'disk-io-write', 'sys-mem-available']

system_load:
  Features: 2
  List: ['load-5m', 'load-15m']

✓ Metadata loaded


In [3]:
# Load train/test datasets
datasets = {}

for target in target_vars:
    print(f"\nLoading {target}...")
    
    X_train = pd.read_csv(f'processed_data/{target}/X_train.csv')
    X_test = pd.read_csv(f'processed_data/{target}/X_test.csv')
    y_train = pd.read_csv(f'processed_data/{target}/y_train.csv').squeeze()
    y_test = pd.read_csv(f'processed_data/{target}/y_test.csv').squeeze()
    
    datasets[target] = {
        'X_train': X_train,
        'X_test': X_test,
        'y_train': y_train,
        'y_test': y_test,
        'features': feature_metadata[target]['features']
    }
    
    print(f"  X_train: {X_train.shape}")
    print(f"  X_test: {X_test.shape}")
    print(f"  y_train: {len(y_train):,} samples")
    print(f"  y_test: {len(y_test):,} samples")

print("\n" + "="*80)
print("✓ All datasets loaded")
print("="*80)



Loading memory_usage_pct...
  X_train: (68599, 7)
  X_test: (17150, 7)
  y_train: 68,599 samples
  y_test: 17,150 samples

Loading cpu_total_usage...
  X_train: (68599, 7)
  X_test: (17150, 7)
  y_train: 68,599 samples
  y_test: 17,150 samples

Loading system_load...
  X_train: (68599, 2)
  X_test: (17150, 2)
  y_train: 68,599 samples
  y_test: 17,150 samples

✓ All datasets loaded


## 3. SVR Configuration & Hyperparameter Tuning

SVR hyperparameters to tune:
- **C**: Regularization parameter
- **gamma**: Kernel coefficient  
- **epsilon**: Epsilon-tube tolerance
- **kernel**: RBF (Radial Basis Function)


In [4]:
# SVR hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 0.001, 0.01, 0.1],
    'epsilon': [0.01, 0.1, 0.2]
}

# Forecast horizon
FORECAST_HORIZON = 20  # 10 minutes

# Grid search configuration
GRID_SEARCH = True  # Set to False to use default parameters
N_JOBS = -1  # Use all CPU cores

print("SVR Configuration:")
print("="*80)
print(f"Kernel: RBF")
print(f"Parameter grid:")
for param, values in param_grid.items():
    print(f"  {param}: {values}")
print(f"\nGrid search enabled: {GRID_SEARCH}")
print(f"Forecast horizon: {FORECAST_HORIZON} steps (10 minutes)")
print("="*80)


SVR Configuration:
Kernel: RBF
Parameter grid:
  C: [0.1, 1, 10]
  gamma: ['scale', 0.001, 0.01, 0.1]
  epsilon: [0.01, 0.1, 0.2]

Grid search enabled: True
Forecast horizon: 20 steps (10 minutes)


## 4. Train SVR Models


In [5]:
# Train SVR models with optional grid search
svr_models = {}
training_results = {}

print("="*80)
print("TRAINING SVR MODELS")
print("="*80)

for target in target_vars:
    print(f"\n{'='*80}")
    print(f"Target: {target}")
    print(f"{'='*80}")
    
    X_train = datasets[target]['X_train']
    y_train = datasets[target]['y_train']
    
    print(f"Training samples: {len(y_train):,}")
    print(f"Features: {len(X_train.columns)}")
    
    start_time = time.time()
    
    try:
        if GRID_SEARCH:
            print("\nPerforming grid search...")
            
            # Grid search with cross-validation
            svr = SVR(kernel='rbf')
            grid_search = GridSearchCV(
                svr,
                param_grid,
                cv=3,
                scoring='neg_mean_squared_error',
                n_jobs=N_JOBS,
                verbose=1
            )
            
            grid_search.fit(X_train, y_train)
            
            best_model = grid_search.best_estimator_
            best_params = grid_search.best_params_
            best_score = -grid_search.best_score_  # Convert back to positive MSE
            
            print(f"\n✓ Grid search completed")
            print(f"  Best parameters: {best_params}")
            print(f"  Best CV MSE: {best_score:.6f}")
            
        else:
            print("\nTraining with default parameters...")
            best_model = SVR(kernel='rbf', C=1.0, gamma='scale', epsilon=0.1)
            best_model.fit(X_train, y_train)
            best_params = {'C': 1.0, 'gamma': 'scale', 'epsilon': 0.1}
            best_score = None
        
        training_time = time.time() - start_time
        
        # Store model
        svr_models[target] = best_model
        
        # Save model using utility
        print("\nSaving model...")
        model_path = save_model(
            best_model,
            model_name='svr',
            target=target,
            config={
                'kernel': 'rbf',
                'params': best_params,
                'n_features': len(X_train.columns),
                'features': list(X_train.columns)
            },
            models_dir='models'
        )
        
        training_results[target] = {
            'params': best_params,
            'n_features': len(X_train.columns),
            'n_samples': len(y_train),
            'training_time': training_time,
            'cv_mse': best_score,
            'model_path': model_path
        }
        
        print(f"✓ Training completed in {training_time:.2f}s")
        
    except Exception as e:
        print(f"✗ Training failed: {str(e)}")
        training_results[target] = {'error': str(e), 'success': False}

print("\n" + "="*80)
print("✓ Training completed and models saved")
print("="*80)


TRAINING SVR MODELS

Target: memory_usage_pct
Training samples: 68,599
Features: 7

Performing grid search...
Fitting 3 folds for each of 36 candidates, totalling 108 fits

✓ Grid search completed
  Best parameters: {'C': 1, 'epsilon': 0.2, 'gamma': 0.01}
  Best CV MSE: 0.428956

Saving model...
✓ Model saved: models\svr_memory_usage_pct_20251111_184735.pkl
✓ Training completed in 8771.26s

Target: cpu_total_usage
Training samples: 68,599
Features: 7

Performing grid search...
Fitting 3 folds for each of 36 candidates, totalling 108 fits

✓ Grid search completed
  Best parameters: {'C': 10, 'epsilon': 0.2, 'gamma': 'scale'}
  Best CV MSE: 0.333537

Saving model...
✓ Model saved: models\svr_cpu_total_usage_20251111_203645.pkl
✓ Training completed in 6549.95s

Target: system_load
Training samples: 68,599
Features: 2

Performing grid search...
Fitting 3 folds for each of 36 candidates, totalling 108 fits

✓ Grid search completed
  Best parameters: {'C': 10, 'epsilon': 0.1, 'gamma': 'scale

In [6]:
def svr_rolling_forecast(model, X_test, y_test, horizon=20):
    """
    Rolling forecast for SVR
    Predict 'horizon' steps ahead at each time point using actual features
    """
    n_test = len(y_test)
    predictions = []
    
    # Can only forecast where we have enough future data
    n_forecast_points = n_test - horizon + 1
    
    print(f"  Forecasting {n_forecast_points} points with horizon={horizon}")
    
    for i in range(n_forecast_points):
        # Use features at time t to predict value at time t+horizon
        X_current = X_test.iloc[i:i+1]
        pred = model.predict(X_current)[0]
        predictions.append(pred)
        
        if (i + 1) % 5000 == 0:
            print(f"    Progress: {i+1}/{n_forecast_points}")
    
    # Align actual values (horizon steps ahead)
    predictions = np.array(predictions)
    actual = y_test.iloc[horizon:horizon+n_forecast_points].values
    
    return predictions, actual

# Perform forecasting
print("="*80)
print(f"MULTI-STEP FORECASTING (Horizon: {FORECAST_HORIZON} steps = 10 minutes)")
print("="*80)

forecast_results = {}

for target in target_vars:
    if target not in svr_models:
        print(f"\n✗ Skipping {target} - model not trained")
        continue
    
    print(f"\n{'='*80}")
    print(f"Target: {target}")
    print(f"{'='*80}")
    
    model = svr_models[target]
    X_test = datasets[target]['X_test']
    y_test = datasets[target]['y_test']
    
    start_time = time.time()
    
    try:
        predictions, actual = svr_rolling_forecast(model, X_test, y_test, FORECAST_HORIZON)
        
        forecast_time = time.time() - start_time
        
        forecast_results[target] = {
            'predictions': predictions,
            'actual': actual,
            'n_predictions': len(predictions),
            'forecast_time': forecast_time,
            'horizon': FORECAST_HORIZON
        }
        
        print(f"✓ Completed in {forecast_time:.2f}s")
        print(f"  Predictions: {len(predictions):,}")
        print(f"  Avg time: {forecast_time/len(predictions)*1000:.2f}ms per forecast")
        
    except Exception as e:
        print(f"✗ Forecasting failed: {str(e)}")
        forecast_results[target] = {'error': str(e), 'success': False}

print("\n" + "="*80)
print("✓ Forecasting completed")
print("="*80)


MULTI-STEP FORECASTING (Horizon: 20 steps = 10 minutes)

Target: memory_usage_pct
  Forecasting 17131 points with horizon=20
    Progress: 5000/17131
    Progress: 10000/17131
    Progress: 15000/17131
✓ Completed in 160.02s
  Predictions: 17,131
  Avg time: 9.34ms per forecast

Target: cpu_total_usage
  Forecasting 17131 points with horizon=20
    Progress: 5000/17131
    Progress: 10000/17131
    Progress: 15000/17131
✓ Completed in 80.87s
  Predictions: 17,131
  Avg time: 4.72ms per forecast

Target: system_load
  Forecasting 17131 points with horizon=20
    Progress: 5000/17131
    Progress: 10000/17131
    Progress: 15000/17131
✓ Completed in 153.86s
  Predictions: 17,131
  Avg time: 8.98ms per forecast

✓ Forecasting completed


In [7]:
# Calculate metrics using utility function
print("="*80)
print("EVALUATION METRICS")
print("="*80)

evaluation_results = {}

for target in target_vars:
    if target not in forecast_results or 'predictions' not in forecast_results[target]:
        print(f"\n✗ Skipping {target} - no predictions")
        continue
    
    print(f"\n{'='*80}")
    print(f"Target: {target}")
    print(f"{'='*80}")
    
    y_true = forecast_results[target]['actual']
    y_pred = forecast_results[target]['predictions']
    
    # Calculate metrics
    metrics = calculate_metrics(y_true, y_pred)
    evaluation_results[target] = metrics
    
    # Print formatted metrics
    print_metrics(metrics, target)

print("\n" + "="*80)
print("✓ Evaluation completed")
print("="*80)


EVALUATION METRICS

Target: memory_usage_pct


ValueError: Found input variables with inconsistent numbers of samples: [17130, 17131]

## 7. Visualization


In [None]:
# Plot predictions vs actual
fig, axes = plt.subplots(3, 1, figsize=(15, 12))
fig.suptitle('SVR: Predictions vs Actual (10-minute horizon)', fontsize=16, fontweight='bold')

for idx, target in enumerate(target_vars):
    if target not in forecast_results or 'predictions' not in forecast_results[target]:
        continue
    
    y_true = forecast_results[target]['actual']
    y_pred = forecast_results[target]['predictions']
    
    # Plot first 500 points
    n_plot = min(500, len(y_true))
    
    axes[idx].plot(y_true[:n_plot], label='Actual', alpha=0.7, linewidth=1.5)
    axes[idx].plot(y_pred[:n_plot], label='Predicted', alpha=0.7, linewidth=1.5)
    axes[idx].set_title(f'{target} - MAE: {evaluation_results[target]["mae"]:.4f}, R²: {evaluation_results[target]["r2"]:.4f}')
    axes[idx].set_xlabel('Time Step')
    axes[idx].set_ylabel('Normalized Value')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()


In [None]:
# Scatter plots: Predicted vs Actual
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
fig.suptitle('SVR: Predicted vs Actual', fontsize=16, fontweight='bold')

for idx, target in enumerate(target_vars):
    if target not in forecast_results or 'predictions' not in forecast_results[target]:
        continue
    
    y_true = forecast_results[target]['actual']
    y_pred = forecast_results[target]['predictions']
    
    axes[idx].scatter(y_true, y_pred, alpha=0.3, s=10)
    
    # Perfect prediction line
    min_val = min(y_true.min(), y_pred.min())
    max_val = max(y_true.max(), y_pred.max())
    axes[idx].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2, label='Perfect')
    
    axes[idx].set_title(f'{target}\\nR² = {evaluation_results[target]["r2"]:.4f}')
    axes[idx].set_xlabel('Actual')
    axes[idx].set_ylabel('Predicted')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()


In [None]:
# Metrics comparison
metrics_df = pd.DataFrame(evaluation_results).T

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('SVR Performance Metrics', fontsize=16, fontweight='bold')

metrics_df['mae'].plot(kind='bar', ax=axes[0, 0], color='skyblue')
axes[0, 0].set_title('MAE')
axes[0, 0].set_ylabel('MAE')
axes[0, 0].grid(True, alpha=0.3)

metrics_df['rmse'].plot(kind='bar', ax=axes[0, 1], color='lightcoral')
axes[0, 1].set_title('RMSE')
axes[0, 1].set_ylabel('RMSE')
axes[0, 1].grid(True, alpha=0.3)

metrics_df['mape'].plot(kind='bar', ax=axes[1, 0], color='lightgreen')
axes[1, 0].set_title('MAPE')
axes[1, 0].set_ylabel('MAPE (%)')
axes[1, 0].grid(True, alpha=0.3)

metrics_df['r2'].plot(kind='bar', ax=axes[1, 1], color='plum')
axes[1, 1].set_title('R² Score')
axes[1, 1].set_ylabel('R²')
axes[1, 1].axhline(y=0, color='r', linestyle='--', linewidth=1)
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()


## 8. Save Results


In [None]:
# Compile results
final_results = {
    'model': 'SVR',
    'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'forecast_horizon': FORECAST_HORIZON,
    'forecast_horizon_minutes': FORECAST_HORIZON * 0.5,
    'targets': {}
}

for target in target_vars:
    if target not in evaluation_results:
        continue
    
    final_results['targets'][target] = {
        'model_config': {
            'kernel': 'rbf',
            'params': training_results[target]['params'],
            'n_features': training_results[target]['n_features']
        },
        'training': {
            'samples': training_results[target]['n_samples'],
            'time_seconds': training_results[target]['training_time'],
            'cv_mse': training_results[target].get('cv_mse'),
            'model_path': training_results[target]['model_path']
        },
        'forecasting': {
            'n_predictions': forecast_results[target]['n_predictions'],
            'time_seconds': forecast_results[target]['forecast_time']
        },
        'metrics': evaluation_results[target]
    }

# Save using utility function
output_file = save_results(final_results, 'results_svr.json')

print("="*80)
print("RESULTS SUMMARY")
print("="*80)
print(f"Model: SVR")
print(f"Forecast horizon: {FORECAST_HORIZON} steps ({FORECAST_HORIZON*0.5:.1f} min)")
print(f"Results saved to: {output_file}")
print(f"Models saved in: models/")

print(f"\nMetrics Overview:")
for target in target_vars:
    if target in final_results['targets']:
        metrics = final_results['targets'][target]['metrics']
        print(f"\n  {target}:")
        print(f"    MAE:  {metrics['mae']:.6f}")
        print(f"    RMSE: {metrics['rmse']:.6f}")
        print(f"    R²:   {metrics['r2']:.6f}")

print("="*80)


## 9. Test with Different Horizons


In [None]:
# Test with multiple horizons
HORIZONS_TO_TEST = [60]  # 5, 10, 20, 30 minutes

print("="*80)
print("TESTING MULTIPLE HORIZONS")
print("="*80)
print(f"Horizons: {HORIZONS_TO_TEST} steps")
print(f"Minutes: {[h*0.5 for h in HORIZONS_TO_TEST]}")
print()

horizon_comparison = {}

for target in target_vars:
    if target not in svr_models:
        continue
    
    print(f"\n{'='*80}")
    print(f"Target: {target}")
    print(f"{'='*80}")
    
    model = svr_models[target]
    X_test = datasets[target]['X_test']
    y_test = datasets[target]['y_test']
    
    horizon_comparison[target] = {}
    
    for horizon in HORIZONS_TO_TEST:
        print(f"  Testing horizon: {horizon} steps ({horizon*0.5:.1f} min)... ", end='')
        
        try:
            # Forecast
            predictions, actual = svr_rolling_forecast(model, X_test, y_test, horizon)
            
            # Metrics
            metrics = calculate_metrics(actual, predictions)
            
            horizon_comparison[target][f"h{horizon}"] = {
                'horizon': horizon,
                'horizon_minutes': horizon * 0.5,
                'n_predictions': len(predictions),
                'metrics': metrics
            }
            
            print(f"MAE={metrics['mae']:.4f}, R²={metrics['r2']:.4f}")
            
        except Exception as e:
            print(f"Failed: {str(e)}")

print("\n" + "="*80)
print("✓ Horizon testing completed")
print("="*80)


In [None]:
# Comparison table
print("\nHORIZON COMPARISON:")
print("="*80)

for target in target_vars:
    if target not in horizon_comparison:
        continue
    
    print(f"\n{target.upper()}:")
    print(f"{'Horizon':>10} {'Minutes':>10} {'MAE':>12} {'RMSE':>12} {'R²':>12}")
    print("-" * 80)
    
    for h_key in sorted(horizon_comparison[target].keys(),
                       key=lambda x: horizon_comparison[target][x]['horizon']):
        h = horizon_comparison[target][h_key]
        print(f"{h['horizon']:>10} {h['horizon_minutes']:>10.1f} "
              f"{h['metrics']['mae']:>12.6f} {h['metrics']['rmse']:>12.6f} "
              f"{h['metrics']['r2']:>12.6f}")


In [None]:
# Visualize performance vs horizon
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
fig.suptitle('SVR: Performance vs Forecast Horizon', fontsize=16, fontweight='bold')

metrics_to_plot = ['mae', 'rmse', 'mape', 'r2']
titles = ['MAE', 'RMSE', 'MAPE (%)', 'R²']
colors_list = ['skyblue', 'lightcoral', 'lightgreen']

for idx, (metric, title) in enumerate(zip(metrics_to_plot, titles)):
    ax = axes[idx // 2, idx % 2]
    
    for cidx, target in enumerate(target_vars):
        if target not in horizon_comparison:
            continue
        
        horizons = []
        values = []
        
        for h_key in sorted(horizon_comparison[target].keys(),
                           key=lambda x: horizon_comparison[target][x]['horizon']):
            h = horizon_comparison[target][h_key]
            horizons.append(h['horizon_minutes'])
            values.append(h['metrics'][metric])
        
        ax.plot(horizons, values, marker='o', linewidth=2,
               label=target, alpha=0.7, color=colors_list[cidx])
    
    ax.set_title(title)
    ax.set_xlabel('Horizon (minutes)')
    ax.set_ylabel(title)
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    if metric == 'r2':
        ax.axhline(y=0, color='r', linestyle='--', alpha=0.5)

plt.tight_layout()
plt.show()


In [None]:
# Save horizon comparison
horizon_results = {
    'model': 'SVR',
    'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'horizons_tested': HORIZONS_TO_TEST,
    'targets': horizon_comparison
}

horizon_file = save_results(horizon_results, 'results_svr_horizon_comparison.json')

print("\nBest horizon by MAE:")
for target in target_vars:
    if target in horizon_comparison:
        best = min(horizon_comparison[target].items(),
                  key=lambda x: x[1]['metrics']['mae'])
        print(f"  {target}: {best[1]['horizon']} steps ({best[1]['horizon_minutes']:.1f} min) "
              f"- MAE: {best[1]['metrics']['mae']:.6f}")

print(f"\n✓ Horizon comparison saved to: {horizon_file}")


## Summary

### Completed:

1. ✅ **Data Loading**: Loaded preprocessed train/test data
2. ✅ **Hyperparameter Tuning**: Grid search for optimal SVR parameters
3. ✅ **Model Training**: SVR with RBF kernel
4. ✅ **Model Saving**: Saved to `models/` directory
5. ✅ **Forecasting**: 10-minute ahead predictions (20 steps)
6. ✅ **Evaluation**: MAE, RMSE, MAPE, R² metrics
7. ✅ **Multi-Horizon Testing**: Tested with 5, 10, 20, 30 minute horizons
8. ✅ **Visualization**: Performance comparisons
9. ✅ **Results Saved**: JSON files for comparison

### Output Files:

- `models/svr_[target]_[timestamp].pkl`: Trained SVR models
- `results_svr.json`: Main results (20-step horizon)
- `results_svr_horizon_comparison.json`: Multi-horizon comparison

### How to Load and Test:

```python
from model_utils import load_model, calculate_metrics

# Load saved SVR model
model, metadata = load_model('models/svr_memory_usage_pct_xxx.pkl')

# Predict with custom horizon
predictions, actual = svr_rolling_forecast(model, X_test, y_test, horizon=120)

# Evaluate
metrics = calculate_metrics(actual, predictions)
print(f"R²: {metrics['r2']:.4f}")
```

### Model Comparison:

Now you can compare SVR vs ARIMAX:
```python
from model_utils import load_results, compare_results

comparison = compare_results(['results_arimax.json', 'results_svr.json'])
print(comparison)
```

### Next Steps:

- Compare SVR vs ARIMAX performance
- Try other kernels (linear, polynomial)
- Implement LSTM or Prophet models
- Ensemble methods

---

**All SVR models and results saved successfully!**
