# Champion-Challenger MLOps Framework for Snowflake

This repository implements a complete **Champion-Challenger model deployment strategy** using Snowflake ML Registry. It's designed for production-grade machine learning operations with automated model retraining, evaluation, and promotion.

## 🎯 What is Champion-Challenger Modeling?

Champion-Challenger is a production MLOps pattern where:
- **Champion**: The current production model serving predictions
- **Challenger**: A newly trained model that competes with the champion
- **Evaluation**: Both models are tested on the same holdout dataset
- **Promotion**: If the challenger performs significantly better, it becomes the new champion
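
The promotion rule in the last bullet can be sketched as a simple threshold comparison. This is a minimal illustration, not the repository's implementation: the function name `should_promote`, the `min_improvement` margin, and the assumption that a higher score is better (for example AUC) are all hypothetical.

```python
def should_promote(champion_score: float,
                   challenger_score: float,
                   min_improvement: float = 0.01) -> bool:
    """Return True when the challenger beats the champion by at least the margin.

    Assumes both scores come from the same holdout set and that higher is better.
    `min_improvement` is an illustrative threshold, not a recommended value.
    """
    return (challenger_score - champion_score) >= min_improvement


# Example with made-up scores
print(should_promote(0.80, 0.83))   # challenger clears the margin
print(should_promote(0.80, 0.805))  # improvement is too small to promote
```

Requiring a margin rather than any improvement helps avoid swapping models on noise in the holdout metric.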

In [None]:
# Import python packages
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from snowflake.snowpark.context import get_active_session
import warnings
warnings.filterwarnings('ignore')

# Reuse the active Snowpark session (get_active_session is imported above)
session = get_active_session()


In [None]:
database_name='DEV_AUTOMATION_DEMO'
schema_name='CHAMPION_CHALLENGER'
def setup_environment(database_name,schema_name):
    """Set up Snowflake database, schema and model registry"""
    print("🚀 Setting up Champion-Challenger Environment...")
    
    # Create database and schema
    session.sql(f"CREATE DATABASE IF NOT EXISTS {database_name}").collect()
    session.sql(f"CREATE SCHEMA IF NOT EXISTS {database_name}.{schema_name}").collect()
    session.use_database(database_name)
    session.use_schema(schema_name)

    # Create two tags for tracking the live and challenger version names of models.
    # session.sql() is lazy, so .collect() is needed to actually run each statement.
    session.sql("CREATE TAG IF NOT EXISTS live_version COMMENT = 'live version identification tag'").collect()
    session.sql("CREATE TAG IF NOT EXISTS Challenger_version COMMENT = 'Challenger version identification tag'").collect()

    print(f"✅ Environment ready: {database_name}.{schema_name}")
    
setup_environment(database_name, schema_name)

## 📊 Generate Synthetic Data Timeline (20 weeks total)
Create a synthetic credit approval dataset with 8,000 loan applications spanning 20 weeks (400 applications per week). Each application contains 9 financial features: applicant age, annual income, credit score, debt-to-income ratio, years employed, number of credit cards, mortgage status, education score, and location risk score. The target variable indicates loan approval (approved or denied); it is sampled from an approval probability computed as a weighted combination of creditworthiness factors.

```
Timeline: Week 0 ────────────────────────────────────────────► Week 19

Dataset: [====================FULL DATASET====================]
         Week 0                                           Week 19
         Jan 1                                           May 15
```
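
To see which calendar dates each week covers, the week-to-date arithmetic can be sketched as follows. It assumes the base date of January 1, 2024 that the generator function uses; `week_window` is a hypothetical helper name.

```python
from datetime import date, timedelta

BASE_DATE = date(2024, 1, 1)  # assumed to match base_date in the generator


def week_window(week: int) -> tuple[date, date]:
    """Return the first and last calendar day covered by a 0-indexed week."""
    start = BASE_DATE + timedelta(weeks=week)
    return start, start + timedelta(days=6)


print(week_window(0))   # first week of the timeline
print(week_window(19))  # last week of the timeline
```

Under these assumptions week 19 runs from May 13 to May 19, so the May 15 label in the diagram falls inside the final week.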

In [None]:
def create_temporal_dataset(n_weeks=20, samples_per_week=400):
        """
        Create a time-based dataset simulating real business data arrival
        This mimics credit applications arriving weekly with concept drift
        """
        print(f"📊 Creating temporal dataset: {n_weeks} weeks, {samples_per_week} samples/week")
        
        np.random.seed(42)
        
        all_data = []
        base_date = datetime(2024, 1, 1)
        
        for week in range(n_weeks):
            week_date = base_date + timedelta(weeks=week)
            
            # Simulate concept drift: applicant risk patterns change over time
            drift_factor = 1 + (week * 0.02)  # 2% change per week
            seasonal_factor = 1 + 0.3 * np.sin(2 * np.pi * week / 52)  # Annual seasonality
            
            # Generate weekly data
            week_data = {
                'application_date': [week_date + timedelta(days=np.random.randint(0, 7)) 
                                   for _ in range(samples_per_week)],
                'week_number': [week] * samples_per_week,
                'applicant_age': np.random.uniform(18, 80, samples_per_week),
                'annual_income': np.random.uniform(20000, 200000, samples_per_week) * seasonal_factor,
                'credit_score': np.random.uniform(300, 850, samples_per_week),
                'debt_to_income': np.random.uniform(0, 1, samples_per_week) * drift_factor,
                'years_employed': np.random.uniform(0, 40, samples_per_week),
                'num_credit_cards': np.random.randint(0, 8, samples_per_week),
                'has_mortgage': np.random.choice([0, 1], samples_per_week, p=[0.6, 0.4]),
                'education_score': np.random.uniform(1, 5, samples_per_week),
                'location_risk_score': np.random.uniform(0.1, 1.0, samples_per_week) * drift_factor,
            }
            
            # Create realistic target with temporal dependencies
            approval_prob = (
                (week_data['credit_score'] - 300) / 550 * 0.3 +
                (week_data['annual_income'] - 20000) / 180000 * 0.25 +
                (1 - week_data['debt_to_income']) * 0.2 +
                (week_data['years_employed'] / 40) * 0.15 +
                (1 - week_data['location_risk_score']) * 0.1 +
                np.random.normal(0, 0.05, samples_per_week)
            )
            
            approval_prob = np.clip(approval_prob, 0, 1)
            week_data['approved'] = np.random.binomial(1, approval_prob)
            
            week_df = pd.DataFrame(week_data)
            all_data.append(week_df)
        
        full_dataset = pd.concat(all_data, ignore_index=True)
        full_dataset = full_dataset.sort_values('application_date').reset_index(drop=True)
        
        print(f"✅ Dataset created: {len(full_dataset):,} total applications")
        print(f"   📅 Date range: {full_dataset['application_date'].min()} to {full_dataset['application_date'].max()}")
        print(f"   📈 Approval rate: {full_dataset['approved'].mean():.1%}")
        
        return full_dataset

In [None]:
full_data = create_temporal_dataset()
#Show a sample of data
full_data.head(5)
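
As a worked example of the weighted score inside `create_temporal_dataset`, the sketch below evaluates the same five terms for a single made-up applicant, leaving out the Gaussian noise term. The applicant values and the helper name `approval_score` are hypothetical; the weights and normalization ranges are copied from the function above.

```python
def approval_score(credit_score, annual_income, debt_to_income,
                   years_employed, location_risk_score):
    """Deterministic part of the approval probability (noise term omitted)."""
    score = (
        (credit_score - 300) / 550 * 0.3 +
        (annual_income - 20000) / 180000 * 0.25 +
        (1 - debt_to_income) * 0.2 +
        (years_employed / 40) * 0.15 +
        (1 - location_risk_score) * 0.1
    )
    # Clip to [0, 1], as the generator does with np.clip
    return min(max(score, 0.0), 1.0)


# A made-up applicant sitting at the midpoint of every range
p = approval_score(credit_score=575, annual_income=110000, debt_to_income=0.5,
                   years_employed=20, location_risk_score=0.5)
print(round(p, 3))
```

The five weights sum to 1.0, so a midpoint applicant scores about 0.5. In the generator, the drift and seasonal factors can push individual terms outside their nominal range, which is why the result is clipped before sampling.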

In [None]:
#Convert pandas DataFrame to Snowpark DataFrame and save as table
full_data_snowpark_df = session.create_dataframe(full_data)
full_data_snowpark_df.write.mode("overwrite").save_as_table('full_data')

## Evaluation Dataset
Reserve weeks 16-19 as a holdout test set. It is used later to decide whether a new model (the Challenger) performs better than the existing Champion model.

In [None]:
# Evaluation data (weeks 16-19) - this is our "future" evaluation set
evaluation_week = 16
evaluation_mask = full_data['week_number'] >= evaluation_week
evaluation_data = full_data[evaluation_mask].copy()

print(f"✅ Evaluation set created: {len(evaluation_data):,} applications")
print(f"   📅 Date range: {evaluation_data['application_date'].min()} to {evaluation_data['application_date'].max()}")
print(f"   📈 Approval rate: {evaluation_data['approved'].mean():.1%}")

#Convert pandas DataFrame to Snowpark DataFrame and save as table
eval_data_snowpark_df = session.create_dataframe(evaluation_data)
eval_data_snowpark_df.write.mode("overwrite").save_as_table('evaluation_data')
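
A quick way to check a week-based holdout like this is to confirm that the training and evaluation windows are disjoint and together cover every row. The sketch below does so on a small toy frame; the `training_data` name and the choice of weeks 0-15 as the training window are assumptions, not something this section defines.

```python
import pandas as pd

# Toy stand-in for full_data: 20 weeks with 3 rows each
toy = pd.DataFrame({"week_number": [w for w in range(20) for _ in range(3)]})

evaluation_week = 16
eval_mask = toy["week_number"] >= evaluation_week
evaluation_data = toy[eval_mask].copy()
training_data = toy[~eval_mask].copy()  # assumed training window: weeks 0-15

# No week appears on both sides, and no rows are lost
assert training_data["week_number"].max() < evaluation_data["week_number"].min()
assert len(training_data) + len(evaluation_data) == len(toy)
print(len(training_data), len(evaluation_data))
```

With the real dataset and the same split, the counts should be 6,400 training rows and 1,600 evaluation rows (16 and 4 weeks at 400 applications each).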