# PredictiveModelAgent Demo

This notebook demonstrates the PredictiveModelAgent, which trains machine learning models that predict the next-day price direction of portfolio tickers.

## Overview

The PredictiveModelAgent:
- Reads engineered features from Unity Catalog
- Creates binary classification labels for price direction
- Trains GBT (Gradient Boosted Tree) classifiers
- Evaluates model performance
- Registers models in Unity Catalog
- Integrates with MLflow for experiment tracking

In [None]:
# Initialize the PredictiveModelAgent
from src.agents.predictive_model_agent import PredictiveModelAgent

# Create agent with Unity Catalog configuration
agent = PredictiveModelAgent(
    catalog="portfolio_catalog",
    schema="portfolio_schema"
)

print("PredictiveModelAgent initialized successfully!")
print(f"Catalog: {agent.catalog}")
print(f"Schema: {agent.schema}")
print(f"Feature columns: {agent.feature_cols}")
print(f"Label column: {agent.label_col}")

In [None]:
# Train a model on sample tickers
tickers = ['AAPL', 'GOOGL', 'MSFT']

print(f"Training predictive model for tickers: {', '.join(tickers)}")

# Train without hyperparameter tuning for faster execution
try:
    training_results = agent.train(
        tickers=tickers,
        hyperparameter_tuning=False
    )
    
    print("\nTraining completed successfully!")
    print("Model performance:")
    for metric, value in training_results['metrics'].items():
        print(f"  {metric}: {value:.4f}")
        
except Exception as e:
    print(f"Training failed: {str(e)}")
    print("This is expected if feature tables don't exist yet.")
    print("Run the FeatureEngineeringAgent first to create feature tables.")

In [None]:
# Register the trained model
try:
    model_name = "portfolio_price_predictor_v1"
    registration_info = agent.register_model(
        model_name=model_name,
        description="GBT classifier for predicting next-day price direction"
    )
    
    print("Model registered successfully!")
    print(f"Model name: {registration_info['model_name']}")
    print(f"Model version: {registration_info['model_version']}")
    print(f"Model URI: {registration_info['model_uri']}")
    
except Exception as e:
    print(f"Model registration failed: {str(e)}")
    print("This requires a trained model to be available.")

In [None]:
# Example: how the agent loads feature data for specific tickers.
# This cell only describes the process; it does not read any tables.
print("Feature data loading process:")
print(f"1. Check for tables: {agent.catalog}.{agent.schema}.features_{{ticker}}")
print(f"2. Validate required columns: {agent.feature_cols}")
print("3. Filter and combine data from multiple tickers")
print("4. Create labels for binary classification")

# Once the feature tables exist, the data can be loaded and labeled with:
# feature_data = agent.read_feature_data(['AAPL', 'GOOGL'])
# labeled_data = agent.create_labels(feature_data)

## Key Features

### 1. Feature Integration
- Reads from Unity Catalog tables created by FeatureEngineeringAgent
- Validates feature availability and data quality
- Combines feature data from multiple ticker symbols into one training dataset
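The table-naming and column-validation steps from the loading cell can be sketched in plain Python. This is a minimal sketch, not the agent's implementation: only the `{catalog}.{schema}.features_{ticker}` naming pattern comes from this notebook, while the helpers `feature_table_name` and `missing_columns` and the feature column names are hypothetical.

```python
# Hypothetical sketch: build per-ticker feature table names and check required columns.
# The naming pattern catalog.schema.features_<ticker> follows the loading cell;
# the helper functions and column names are illustrative assumptions.

def feature_table_name(catalog: str, schema: str, ticker: str) -> str:
    """Return the fully qualified Unity Catalog table name for one ticker."""
    return f"{catalog}.{schema}.features_{ticker}"


def missing_columns(available: list[str], required: list[str]) -> list[str]:
    """Return required columns absent from a table, in the order they were required."""
    present = set(available)
    return [col for col in required if col not in present]


# Hypothetical feature columns; the agent exposes its real list as agent.feature_cols.
REQUIRED = ["return_1d", "sma_20", "rsi_14"]

tables = [
    feature_table_name("portfolio_catalog", "portfolio_schema", t)
    for t in ["AAPL", "GOOGL", "MSFT"]
]
print(tables[0])  # portfolio_catalog.portfolio_schema.features_AAPL

gaps = missing_columns(["date", "close", "return_1d", "sma_20"], REQUIRED)
print(gaps)  # ['rsi_14']
```

Reporting the missing columns, rather than returning a single boolean, makes it clear which upstream feature the FeatureEngineeringAgent still needs to produce.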

### 2. Label Creation
- Creates binary classification labels (up/down price direction)
- Splits data by time (training on earlier dates, testing on later ones) to avoid look-ahead bias in evaluation
- Handles missing data and edge cases
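The labeling and time-based split can be illustrated with a minimal sketch that assumes each row carries `ticker`, `date` and `close` fields (hypothetical names). A row is labeled 1 when the same ticker's next close is strictly higher and 0 otherwise, so an unchanged price counts as "down". The last row of each ticker has no next-day close and is dropped, which is one way to handle that edge case. The agent may define labels and choose the split point differently.

```python
from collections import defaultdict
from datetime import date


def add_direction_labels(rows):
    """Attach a next-day direction label to each row.

    Rows are grouped by ticker and sorted by date. A row gets label 1 when the
    same ticker's next close is strictly higher, otherwise 0. The last row of
    each ticker has no next-day close, so it is dropped.
    """
    by_ticker = defaultdict(list)
    for row in rows:
        by_ticker[row["ticker"]].append(row)

    labeled = []
    for ticker_rows in by_ticker.values():
        ticker_rows.sort(key=lambda r: r["date"])
        for today, tomorrow in zip(ticker_rows, ticker_rows[1:]):
            labeled.append({**today, "label": int(tomorrow["close"] > today["close"])})
    return labeled


def time_split(rows, cutoff):
    """Chronological split: train on dates before cutoff, test on cutoff and later."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


# Toy closing prices, for illustration only.
prices = {"AAPL": [100.0, 102.0, 101.0, 103.0], "MSFT": [50.0, 49.0, 51.0]}
rows = [
    {"ticker": ticker, "date": date(2024, 1, day), "close": close}
    for ticker, closes in prices.items()
    for day, close in enumerate(closes, start=1)
]

labeled = add_direction_labels(rows)
train, test = time_split(labeled, cutoff=date(2024, 1, 3))
print([r["label"] for r in labeled])  # [1, 0, 1, 0, 1]
print(len(train), len(test))  # 4 1
```

Grouping by ticker before shifting keeps one ticker's next-day price from leaking into another ticker's label, and splitting on a date rather than shuffling keeps every test row later than every training row.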

### 3. Model Training
- Gradient Boosted Tree (GBT) classifier
- Feature scaling and vector assembly
- Optional hyperparameter tuning with cross-validation
- Reports evaluation metrics, returned in `training_results['metrics']`
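The training cell prints `training_results['metrics']` as a mapping from metric name to value. The notebook does not say which metrics the agent computes, so the sketch below assumes the common binary-classification set (accuracy, precision, recall, F1) and computes them from 0/1 labels and predictions. It shows what the numbers mean and does not reproduce the GBT model itself.

```python
def binary_metrics(labels, predictions):
    """Compute accuracy, precision, recall and F1 for 0/1 labels and predictions.

    The choice of metrics is an assumption; ratios with a zero denominator are
    reported as 0.0.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # predicted up, went up
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # predicted up, did not
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # missed an up move
    correct = sum(1 for y, p in pairs if y == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 1])
for metric, value in metrics.items():
    print(f"  {metric}: {value:.4f}")
```

Because up and down days are often close to balanced, accuracy alone can look acceptable for a model that barely beats always predicting "up"; precision and recall on the up class make that easier to spot.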

### 4. MLflow Integration
- Experiment tracking and logging
- Model artifact storage
- Parameter and metric logging
- Model versioning and registry

### 5. Unity Catalog Support
- Model registration in Unity Catalog
- Versioned model artifacts
- Governance and lineage tracking
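Unity Catalog registers models under three-level names (`catalog.schema.model`), and MLflow addresses one registered version with a `models:/<name>/<version>` URI. The sketch below builds both from the values used in this notebook. Whether `register_model` prefixes `model_name` with the agent's catalog and schema, and whether its returned `model_uri` has exactly this form, are assumptions not confirmed here; the helper names are hypothetical.

```python
def uc_model_name(catalog: str, schema: str, model_name: str) -> str:
    """Three-level Unity Catalog model name: <catalog>.<schema>.<model>."""
    return f"{catalog}.{schema}.{model_name}"


def model_version_uri(full_name: str, version: int) -> str:
    """MLflow registry URI that pins one specific model version."""
    if version < 1:
        raise ValueError("registered model versions start at 1")
    return f"models:/{full_name}/{version}"


full_name = uc_model_name(
    "portfolio_catalog", "portfolio_schema", "portfolio_price_predictor_v1"
)
print(model_version_uri(full_name, 1))
# models:/portfolio_catalog.portfolio_schema.portfolio_price_predictor_v1/1
```

With MLflow pointed at the Unity Catalog registry (`mlflow.set_registry_uri("databricks-uc")`), a URI of this form can be passed to `mlflow.pyfunc.load_model` to load that exact version, which keeps downstream jobs pinned to a reviewed model.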

## Next Steps

1. **Feature Engineering**: Run the FeatureEngineeringAgent to create feature tables
2. **Data Validation**: Ensure sufficient historical data for training
3. **Model Training**: Execute the training pipeline with your ticker symbols
4. **Performance Evaluation**: Review metrics and adjust parameters
5. **Model Deployment**: Register the best-performing models for production use
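For the data validation step, one simple check is to count rows per ticker before training and flag tickers with too little history. The sketch below assumes rows carry a `ticker` field; the helper name and the 252-row threshold (roughly one year of US trading days) are hypothetical choices, not values taken from the agent.

```python
from collections import Counter


def short_history(rows, tickers, min_rows):
    """Map each requested ticker with fewer than min_rows rows to its row count.

    Tickers with no rows at all are reported with a count of 0.
    """
    counts = Counter(row["ticker"] for row in rows)
    # Counter returns 0 for unseen keys, so absent tickers are flagged too.
    return {t: counts[t] for t in tickers if counts[t] < min_rows}


# Toy data: 300 AAPL rows, 120 GOOGL rows, no MSFT rows.
rows = [{"ticker": "AAPL"} for _ in range(300)] + [{"ticker": "GOOGL"} for _ in range(120)]

# Hypothetical threshold of about one year of trading days.
print(short_history(rows, ["AAPL", "GOOGL", "MSFT"], min_rows=252))
# {'GOOGL': 120, 'MSFT': 0}
```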

## Configuration

The agent can be configured for different environments:
- **Development**: `catalog="main"`, `schema="finance"`
- **Staging**: `catalog="portfolio_catalog"`, `schema="staging_schema"`
- **Production**: `catalog="portfolio_catalog"`, `schema="portfolio_schema"`
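These settings can be kept in one mapping and unpacked into the constructor, so switching environments changes a single argument. This is a minimal sketch: `ENVIRONMENTS` and `agent_config` are hypothetical names, and only the catalog and schema values come from the list above.

```python
# Unity Catalog locations per environment, as listed in the Configuration section.
ENVIRONMENTS = {
    "development": {"catalog": "main", "schema": "finance"},
    "staging": {"catalog": "portfolio_catalog", "schema": "staging_schema"},
    "production": {"catalog": "portfolio_catalog", "schema": "portfolio_schema"},
}


def agent_config(environment: str) -> dict:
    """Return a copy of the catalog/schema settings for one environment."""
    if environment not in ENVIRONMENTS:
        raise ValueError(
            f"Unknown environment {environment!r}; expected one of {sorted(ENVIRONMENTS)}"
        )
    return dict(ENVIRONMENTS[environment])


print(agent_config("staging"))
# {'catalog': 'portfolio_catalog', 'schema': 'staging_schema'}

# In the notebook, the settings would be unpacked into the constructor:
# agent = PredictiveModelAgent(**agent_config("staging"))
```

Returning a copy means a caller that tweaks its settings cannot silently change the shared production entry, and an unknown name fails loudly instead of falling back to a default catalog.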