# 🏆 Champion Model Training

## Overview
This section covers the **initial champion model training** phase of our Champion-Challenger MLOps framework. The champion model serves as the baseline production model that all future challenger models will compete against.

## Process
1. **Model Development**: Train and validate multiple model candidates using historical data
2. **Model Selection**: Compare performance metrics (AUC, precision, recall) to identify the best performer
3. **Champion Registration**: Register the selected model in the Snowflake Model Registry with the `CHAMPION` alias
4. **Production Deployment**: The champion model becomes the active production model serving predictions

*📝 Note: This is a one-time setup. Subsequent model updates will be handled automatically by the challenger training pipeline.*
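The model selection step (step 2) is not shown in this notebook, so here is a minimal sketch of how candidates could be compared on AUC, precision, and recall. The labels, candidate names, scores, and the 0.5 threshold are all hypothetical, and the metrics are written with the standard library only so the snippet runs anywhere. In the notebook itself, `sklearn.metrics` would be the natural choice.

```python
# Hypothetical hold-out labels and candidate scores, for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
candidate_scores = {
    "random_forest": [0.9, 0.2, 0.8, 0.6, 0.3, 0.4, 0.7, 0.1],
    "logistic_regression": [0.7, 0.4, 0.6, 0.3, 0.5, 0.2, 0.8, 0.6],
}


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall after thresholding the scores (threshold is an assumption)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Score every candidate on the same hold-out data
results = {}
for name, scores in candidate_scores.items():
    precision, recall = precision_recall(y_true, scores)
    results[name] = {"auc": auc(y_true, scores), "precision": precision, "recall": recall}

# Pick the candidate with the highest AUC as the champion
best_name = max(results, key=lambda n: results[n]["auc"])
print(best_name, results[best_name])
```

Ranking on AUC alone is one possible rule. A real selection might weigh precision and recall as well, depending on the cost of false approvals versus false rejections.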

In [None]:
# Import Python packages
import warnings

from sklearn import pipeline, preprocessing, ensemble, metrics
from snowflake.snowpark.context import get_active_session
from snowflake.ml.registry import Registry
from snowflake.ml.model import type_hints

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Get the active Snowpark session for this notebook
session = get_active_session()

In [None]:
# Set up the database, schema and model registry
database_name='DEV_AUTOMATION_DEMO'
schema_name='CHAMPION_CHALLENGER'
session.use_database(database_name)
session.use_schema(schema_name)  

# Initialize registry
registry = Registry(session=session, 
                           database_name=database_name, 
                           schema_name=schema_name)  
print(f"✅ Environment ready: {database_name}.{schema_name}")

# 📊 Load Dataset

## Data Source
The synthetic credit approval dataset has been pre-generated and saved to the **`full_data`** table for consistent use across experiments. This ensures reproducibility and eliminates the need to regenerate the dataset for each model training run.


In [None]:
full_dataset = session.table('full_data').to_pandas()

### 📋 **Detailed Breakdown Of Data Split For Champion:**

1. **TRAIN Period**: Weeks 0-9 (10 weeks) = 4,000 samples
2. **TEST Period**: Weeks 10-12 (3 weeks) = 1,200 samples  
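The sample counts above imply roughly 400 applications per week (4,000 / 10 and 1,200 / 3). As a quick sanity check, the sketch below rebuilds the split boundaries on synthetic week numbers. The uniform 400-per-week volume is an assumption made for illustration, and plain Python lists stand in for the `week_number` column.

```python
SAMPLES_PER_WEEK = 400  # assumption implied by 4,000 / 10 and 1,200 / 3
TRAIN_WEEKS, TEST_WEEKS = 10, 3

# One week_number entry per synthetic application, covering weeks 0-12
week_numbers = [
    week
    for week in range(TRAIN_WEEKS + TEST_WEEKS)
    for _ in range(SAMPLES_PER_WEEK)
]

# Train on weeks 0-9, test on weeks 10-12
train = [w for w in week_numbers if w < TRAIN_WEEKS]
test = [w for w in week_numbers if TRAIN_WEEKS <= w < TRAIN_WEEKS + TEST_WEEKS]

print(len(train), len(test))  # 4000 1200

# Every training week precedes every test week, so no future data leaks into training
assert max(train) < min(test)
```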

In [None]:
def create_time_splits(full_dataset, train_weeks=10, test_weeks=3):
        """
        Create proper time-based train/test splits
        This ensures no data leakage and realistic business scenario
        """
        print(f"✂️ Creating time-based splits: {train_weeks}w train, {test_weeks}w test")
        
        # Initial training data (first 10 weeks)
        train_mask = full_dataset['week_number'] < train_weeks
        train_data = full_dataset[train_mask].copy()
        
        # Test data (weeks 10-12)
        test_mask = (full_dataset['week_number'] >= train_weeks) & \
                   (full_dataset['week_number'] < train_weeks + test_weeks)
        test_data = full_dataset[test_mask].copy()
        
        # Feature columns (exclude date, week, and target)
        feature_cols = [col for col in full_dataset.columns 
                           if col not in ['application_date', 'week_number', 'approved']]
        
        print(f"   📊 Train: {len(train_data):,} samples (weeks 0-{train_weeks-1})")
        print(f"   📊 Test: {len(test_data):,} samples (weeks {train_weeks}-{train_weeks+test_weeks-1})")
        
        return train_data, test_data, feature_cols
    
# Get the data
train_data, test_data, feature_cols = create_time_splits(full_dataset)

In [None]:
def train_champion_model(train_data, test_data):
        """Train the initial Champion model"""
        print("🏆 Training Initial Champion Model...")
        
        X_train = train_data[feature_cols]
        y_train = train_data['approved']
        
        # Create champion pipeline
        champion_pipeline = pipeline.Pipeline([
            ('scaler', preprocessing.StandardScaler()),
            ('classifier', ensemble.RandomForestClassifier(
                n_estimators=100,
                random_state=42,
                max_depth=10,
                min_samples_split=5,
                class_weight='balanced'
            ))
        ])
        
        # Train champion
        champion_pipeline.fit(X_train, y_train)
        
        # Evaluate on the held-out test set
        X_test = test_data[feature_cols]
        y_test = test_data['approved']
        
        champion_pred_proba = champion_pipeline.predict_proba(X_test)[:, 1]
        champion_auc = round(metrics.roc_auc_score(y_test, champion_pred_proba) * 100.0, 2)
        
        print(f"   ✅ Champion trained successfully")
        print(f"   📈 Test data AUC: {champion_auc:.2f}")
        
        # Register champion in model registry
        sample_input = X_train.head(100)
        model_name="CREDIT_APPROVAL"

        #Log the model into Snowflake model registry
        champion_ref = registry.log_model(
            model=champion_pipeline,
            model_name=model_name,
            sample_input_data=sample_input,
            target_platforms=["WAREHOUSE", "SNOWPARK_CONTAINER_SERVICES"],
            comment=f"Champion model trained on weeks 0-9, AUC: {champion_auc:.2f}",
            metrics={
                "test_auc": champion_auc,
                "train_weeks": "0-9",
                "model_type": "champion",
                "training_samples": len(X_train)
            },
            task=type_hints.Task.TABULAR_BINARY_CLASSIFICATION
        )

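        # Move the CHAMPION alias: remove it from whichever version currently
        # holds it (if any), then assign it to the newly logged version
        try:
            registry.get_model(model_name).version("CHAMPION").unset_alias("CHAMPION")
        except Exception:
            # No existing CHAMPION alias (e.g. first run), so nothing to unset
            pass
        # Set as CHAMPION alias
        champion_ref.set_alias("CHAMPION")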
        model = registry.get_model(model_name)
        model.set_tag("LIVE_VERSION", champion_ref.version_name)
        
        print(f"   🏷️ Champion registered: {champion_ref.version_name}")
        print(f"   🏷️ Aliases: CHAMPION")
        
        return

In [None]:
train_champion_model(train_data, test_data)

# Inference

In [None]:
# Get the model and the version currently tagged as live
model = registry.get_model("CREDIT_APPROVAL")
live_version = model.get_tag("LIVE_VERSION")
print(f"Current live model version name in Prod is {live_version}")

# Run the prediction function on the live version
remote_prediction = model.version(live_version).run(test_data, function_name="predict")
remote_prediction = remote_prediction.rename(columns={'output_feature_0': 'predicted_values'})
remote_prediction.head()