# Build Your First Machine Learning Project - Part 4 | `Experiment Tracking`

In this notebook, we'll demonstrate **experiment tracking** using Snowflake ML's experiment tracking capabilities with the **Bear Species Classification** dataset. We'll train multiple models and compare their performance using Snowflake's built-in experiment tracking features.

### What We'll Cover:

1. **Data Loading and Preparation** - Load and process the bear dataset using Snowpark (`snowflake-snowpark-python`)
2. **Experiment Setup** - Initialize Snowflake ML experiment tracking (`ExperimentTracking()` from `snowflake-ml-python`)
3. **Model Training** - Train multiple Random Forest models with `scikit-learn` as part of hyperparameter tuning, tracking the performance of each run.
4. **Performance Comparison** - Compare models using tracked metrics
5. **Model Selection** - Select the best performing model and register with Snowflake Model Registry (`log_model()` from `snowflake-ml-python`)

## 1. Setup and Data Loading

First, let's set up our Snowflake session and suppress noisy warnings. Next, we'll load the bear dataset.


In [None]:
! pip install snowflake-ml-python

In [None]:
from snowflake.snowpark.context import get_active_session
import streamlit as st

# Get active Snowflake session 
session = get_active_session()
st.write("✅ Connected using active Snowflake session!")

In [None]:
import warnings

# Filter out ResourceWarning
warnings.filterwarnings('ignore', category=ResourceWarning)

# Filter out DeprecationWarning
warnings.filterwarnings('ignore', category=DeprecationWarning)

# Filter out UserWarning
warnings.filterwarnings('ignore', category=UserWarning)
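
The filters above stay active for the rest of the session. Where suppression is only needed around a single call, a scoped alternative is `warnings.catch_warnings()` from the standard library. The following is a minimal sketch; `noisy()` is a hypothetical stand-in for any call that emits a warning.

```python
import warnings


def noisy():
    """Hypothetical helper that emits a warning and returns a value."""
    warnings.warn("example warning", UserWarning)
    return 42


# Inside this block UserWarning is ignored; the previous filters are restored on exit
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    value = noisy()

# Record warnings instead of printing them, to confirm the call really does warn
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy()
```

Because the filter change is undone when the `with` block ends, warnings from unrelated code remain visible.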


### Load Data

Next, we'll load the bear dataset from the `BEAR` table.

In [None]:
# Load bear dataset from Snowflake
import pandas as pd

bear_df = session.table('BEAR').to_pandas()

st.write("📊 Bear Dataset Loaded:")
st.write(f"Shape: {bear_df.shape}")

bear_df

## 2. Data Preparation

### Prepare features and target variables

In [None]:
# Prepare features and target

X = bear_df.drop(columns=['species', 'id'])
y = bear_df['species']

### Missing data

It's always good practice to check for missing data.

In [None]:
# Check for missing data
missing_features = X.isnull().sum().sum()
missing_target = y.isnull().sum()

st.subheader("🔍 Data Quality Check:")
st.write(f"Missing feature values: `{missing_features}`")
st.write(f"Missing target values: `{missing_target}`")
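
The check above reports a single total. When that total is non-zero, a per-column breakdown shows where the gaps are. A minimal sketch on a hypothetical toy frame (the column names mirror the bear dataset, the values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two deliberately missing entries
toy = pd.DataFrame({
    "body_mass_kg": [120.0, np.nan, 310.5],
    "fur_color": ["Black", "Brown", None],
    "claw_length_cm": [4.1, 5.0, 9.8],
})

# Count missing values per column, then keep only the columns that have any
missing_per_column = toy.isnull().sum()
columns_with_gaps = missing_per_column[missing_per_column > 0]
print(columns_with_gaps.to_dict())
```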

### Data splitting

Here, we'll split the data into training and test sets using an 80/20 split ratio.

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42, 
    stratify=y
)

st.write("✅ Data preparation completed!")

st.subheader("📊 Data Split Summary:")
st.write(f"Training set: `{X_train.shape[0]} samples ({X_train.shape[0]/len(X)*100:.1f}%)`")
st.write(f"Testing set: `{X_test.shape[0]} samples ({X_test.shape[0]/len(X)*100:.1f}%)`")
st.write(f"Number of features: {X_train.shape[1]}")

st.subheader("🎯 Class Distribution:")
st.write(f"Training set: `{y_train.value_counts().sort_index().to_dict()}`")
st.write(f"Testing set: `{y_test.value_counts().sort_index().to_dict()}`")
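
The `stratify=y` argument keeps the class proportions the same in both splits, which matters when classes are imbalanced. A small sketch of that effect, using hypothetical labels `"A"` and `"B"` rather than the bear data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class "A" and 20 of class "B"
labels = pd.Series(["A"] * 80 + ["B"] * 20)
features = pd.DataFrame({"x": range(100)})

_, _, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# With stratification, class "B" makes up the same share of both splits
train_share_b = (y_tr == "B").mean()
test_share_b = (y_te == "B").mean()
print(train_share_b, test_share_b)
```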

### Feature Scaling

To prepare our data for model training, we'll apply the following pre-processing:
- `StandardScaler` for numerical features. This transforms features to have mean=0 and std=1
- `OneHotEncoder` for categorical features (fur_color, facial_profile, paw_pad_texture). This converts categorical variables into binary columns.


In [None]:
# Feature scaling and preprocessing
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Identify numerical and categorical columns
numerical_features = X_train.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X_train.select_dtypes(include=['object']).columns

# Scale numerical features
scaler = StandardScaler()
X_train_scaled_num = scaler.fit_transform(X_train[numerical_features])
X_test_scaled_num = scaler.transform(X_test[numerical_features])

# Handle categorical features
onehot = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
X_train_scaled_cat = onehot.fit_transform(X_train[categorical_features])
X_test_scaled_cat = onehot.transform(X_test[categorical_features])

# Get feature names after one-hot encoding; lowercase them and replace spaces and hyphens with underscores
cat_feature_names = []
for feature, categories in zip(categorical_features, onehot.categories_):
    for category in categories:
        cat_feature_names.append(f"{feature}_{category}".replace(' ', '_').replace('-', '_').lower())

# Combine numerical and categorical features
X_train_scaled = np.hstack([X_train_scaled_num, X_train_scaled_cat])
X_test_scaled = np.hstack([X_test_scaled_num, X_test_scaled_cat])

# Convert to DataFrame with proper column names
all_feature_names = list(numerical_features) + cat_feature_names
X_train_scaled = pd.DataFrame(X_train_scaled, columns=all_feature_names, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=all_feature_names, index=X_test.index)

st.write("✅ Feature scaling completed!")

st.subheader("🔧 Feature Processing:")
st.write(f"Numerical features: `{numerical_features.tolist()}`")
st.write(f"Categorical features: `{categorical_features.tolist()}`")

st.subheader("📊 Scaled Data Dimensions:")
st.write(f"Training features: `{X_train_scaled.shape}`")
st.write(f"Testing features: `{X_test_scaled.shape}`")

# Display first few encoded feature names
st.write("\n🏷️ First few encoded feature names:")
st.write(all_feature_names[:10])


### Encode Target Variable

Let's also encode the target variable (Bear species) to numerical values using scikit-learn's `LabelEncoder`.
- Machine learning models require numerical inputs
- Each unique bear species will be assigned a unique integer value
- The encoding preserves the categorical nature of the species while making it suitable for model training

In [None]:
from sklearn.preprocessing import LabelEncoder

# Encode target variable
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_test_encoded = label_encoder.transform(y_test)


In [None]:
y_train_encoded

In [None]:
y_test_encoded
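
The integer codes are only useful if they can be mapped back to species names. `LabelEncoder` stores the original labels in `classes_` (in sorted order) and provides `inverse_transform` for decoding. A minimal sketch with hypothetical labels; only "American Black Bear" is a name taken from this notebook:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels for illustration
species = ["Grizzly Bear", "American Black Bear", "Polar Bear", "Grizzly Bear"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ holds the sorted original labels; a label's position is its integer code
code_to_species = dict(enumerate(encoder.classes_))

# inverse_transform maps integer codes (for example, model predictions) back to names
decoded = encoder.inverse_transform(codes)
print(code_to_species)
```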

### Write Test Data to Table

Let's now write the test set data (the features in `X_test_scaled` plus the encoded target) to a Snowflake table named `BEAR_TEST_DATA`.

In [None]:
# Create a copy of X_test_scaled
test_df = X_test_scaled.copy()

# Add the encoded target variable
test_df['ACTUAL_SPECIES'] = y_test_encoded

# Convert to Snowpark DataFrame and write to table
snowpark_df = session.create_dataframe(test_df)
snowpark_df.write.mode("overwrite").save_as_table("CHANINN_DEMO_DATA.PUBLIC.BEAR_TEST_DATA")

st.write("✅ Data successfully saved to BEAR_TEST_DATA table!")
st.write(f"Number of rows: {len(test_df)}")
st.write(f"Number of columns: {len(test_df.columns)}")


## 3. Experiment Tracking Setup

Now let's set up our experiment tracking using Snowflake ML's experiment tracking capabilities. This will allow us to systematically log and compare different models and their hyperparameters.

We'll start out by creating the experiment tracker with `ExperimentTracking` from the Snowflake ML package.

In [None]:
from snowflake.ml.experiment.experiment_tracking import ExperimentTracking

# Create ExperimentTracking
exp = ExperimentTracking(session=session)

# Set Experiment Name
experiment_name = "Bear_Classification_Experiment"
exp.set_experiment(experiment_name)

st.write(f"✅ Experiment Tracking Initialized: `{experiment_name}`")

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, matthews_corrcoef
from datetime import datetime

# Define hyperparameters
params = {
    "n_estimators": 100,
    "max_depth": 3,
    "min_samples_leaf": 5,
    "max_features": "sqrt",
    "random_state": 42
}

# Create unique run name with timestamp
run_name = f"bear_baseline_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

# Train, Evaluate and Log Model
with exp.start_run(run_name):
    # Log hyperparameters
    exp.log_params(params)

    # Train model
    model = RandomForestClassifier(**params)
    model.fit(X_train_scaled, y_train_encoded)

    # Predict
    y_pred = model.predict(X_test_scaled)

    # Calculate metrics
    acc = accuracy_score(y_test_encoded, y_pred)
    precision = precision_score(y_test_encoded, y_pred, average='macro')
    recall = recall_score(y_test_encoded, y_pred, average='macro')
    mcc = matthews_corrcoef(y_test_encoded, y_pred)

    # Log metrics
    exp.log_metric("accuracy", acc)
    exp.log_metric("precision", precision)
    exp.log_metric("recall", recall)
    exp.log_metric("mcc", mcc)

    # Display results
    st.write("📊 Model Performance:")
    st.write(f"- Accuracy: `{acc:.4f}`")
    st.write(f"- Precision: `{precision:.4f}`")
    st.write(f"- Recall: `{recall:.4f}`")
    st.write(f"- MCC: `{mcc:.4f}`")
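
The precision and recall above use `average='macro'`, which computes the metric separately for each class and then takes the unweighted mean, so every species counts equally regardless of how many samples it has. A small sketch with made-up labels that checks this by hand:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up three-class labels: one sample of class 1 is misclassified as class 2
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# average=None returns one score per class
per_class_precision = precision_score(y_true, y_pred, average=None)
per_class_recall = recall_score(y_true, y_pred, average=None)

# average="macro" is the plain mean of the per-class scores
macro_precision = precision_score(y_true, y_pred, average="macro")
macro_recall = recall_score(y_true, y_pred, average="macro")

print(per_class_precision, macro_precision)
print(per_class_recall, macro_recall)
```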


## 4. Hyperparameter Tuning

Now let's tune the Random Forest hyperparameters with a grid search. Each parameter combination is trained and evaluated in its own run, and its parameters and metrics are logged to the experiment so the runs can be compared afterwards.


In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, matthews_corrcoef
from datetime import datetime
import itertools
import pandas as pd

# Define hyperparameter grid
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ['sqrt', 'log2']
}

# Initialize list to store results
results = []

# Generate all combinations of parameters
param_combinations = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]

# Train models with different parameters
for i, params in enumerate(param_combinations, start=1):
    # Add random state to params
    params['random_state'] = 42
    
    # Create unique run name from a run index and timestamp
    # (the index keeps names distinct even if two runs start within the same second)
    run_name = f"RF_tune_{i:02d}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

    with exp.start_run(run_name):
        # Log hyperparameters
        exp.log_params(params)

        # Train model
        model = RandomForestClassifier(**params)
        model.fit(X_train_scaled, y_train_encoded)

        # Predict
        y_pred = model.predict(X_test_scaled)

        # Calculate metrics
        acc = accuracy_score(y_test_encoded, y_pred)
        precision = precision_score(y_test_encoded, y_pred, average='macro')
        recall = recall_score(y_test_encoded, y_pred, average='macro')
        mcc = matthews_corrcoef(y_test_encoded, y_pred)

        # Log metrics
        exp.log_metric("accuracy", acc)
        exp.log_metric("precision", precision)
        exp.log_metric("recall", recall)
        exp.log_metric("mcc", mcc)

        # Store results
        results.append({
            'run_name': run_name,
            'n_estimators': params['n_estimators'],
            'max_depth': params['max_depth'],
            'min_samples_leaf': params['min_samples_leaf'],
            'max_features': params['max_features'],
            'accuracy': acc,
            'precision': precision,
            'recall': recall,
            'mcc': mcc
        })

        st.write(f"Parameters: {params}")
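
It helps to know how many runs a grid will produce before launching it. The number of combinations is the product of the number of options per parameter. A standalone sketch of the expansion used in the loop above, with the same grid values:

```python
import itertools
import math

param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5],
    "min_samples_leaf": [1, 2, 5],
    "max_features": ["sqrt", "log2"],
}

# Expected number of runs: 2 * 2 * 3 * 2
expected_runs = math.prod(len(options) for options in param_grid.values())

# itertools.product yields every combination; the last parameter varies fastest
combos = [
    dict(zip(param_grid.keys(), values))
    for values in itertools.product(*param_grid.values())
]

print(expected_runs, len(combos))
print(combos[0])
```

Adding one more parameter with three options would triple the run count, so grids grow quickly.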

Now that we've run the hyperparameter tuning, let's analyze the results to find the best performing model.

In [None]:
# Create DataFrame from results
results_df = pd.DataFrame(results)

# Display summary statistics
st.write("📊 Model Performance Summary:")
st.dataframe(results_df.style.highlight_max(subset=['accuracy', 'precision', 'recall', 'mcc'], color="green"))

# Display best performing configuration
best_model = results_df.loc[results_df['accuracy'].idxmax()]

st.subheader("🏆 Best Model Configuration:")
st.write("Learning Algorithm: `Random Forest`")
st.write(f"Accuracy: `{best_model['accuracy']:.4f}`")
st.write(f"Precision: `{best_model['precision']:.4f}`")
st.write(f"Recall: `{best_model['recall']:.4f}`")
st.write(f"MCC: `{best_model['mcc']:.4f}`")
st.write("Learning Parameters:")
st.write(f"- n_estimators: `{best_model['n_estimators']}`")
st.write(f"- max_depth: `{best_model['max_depth']}`")
st.write(f"- min_samples_leaf: `{best_model['min_samples_leaf']}`")
st.write(f"- max_features: `{best_model['max_features']}`")
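
Note that `idxmax()` returns the first row with the highest accuracy, so when several runs tie, the winner depends on run order. One possible refinement, shown here as a sketch on made-up results, is to break ties with a second metric such as MCC:

```python
import pandas as pd

# Hypothetical results; the run names and metric values are made up for illustration
toy_results = pd.DataFrame({
    "run_name": ["run_a", "run_b", "run_c"],
    "accuracy": [0.95, 0.97, 0.97],
    "mcc": [0.92, 0.94, 0.96],
})

# idxmax picks the first of the tied rows
first_max = toy_results.loc[toy_results["accuracy"].idxmax()]

# Sorting by accuracy, then MCC, breaks the tie in favor of the higher MCC
ranked = toy_results.sort_values(["accuracy", "mcc"], ascending=False)
tie_broken = ranked.iloc[0]

print(first_max["run_name"], tie_broken["run_name"])
```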


## 5. Model Registry

Let's now register the best model to Snowflake's Model Registry for deployment and management.


### Train Final Model with Best Parameters

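Now that we've identified the best performing model configuration from our hyperparameter tuning experiments, let's train the final model with it. The code below reads the parameters from `best_model`, so they always match your own results; in the run recorded here, the best configuration was: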

- Number of estimators: 100
- Maximum depth: 5
- Minimum samples per leaf: 2
- Maximum features: sqrt

In [None]:
# Get best model configuration from previous results
best_params = {
    'n_estimators': int(best_model['n_estimators']),
    'max_depth': int(best_model['max_depth']),
    'min_samples_leaf': int(best_model['min_samples_leaf']),
    'max_features': best_model['max_features'],
    'random_state': 42
}

# Train the best model
final_model = RandomForestClassifier(**best_params)
final_model.fit(X_train_scaled, y_train_encoded)

In [None]:
best_params

### Register Best Model

Now we'll register our best-performing Random Forest model in Snowflake's Model Registry. 

In [None]:
# Create model registry instance
from snowflake.ml.registry import Registry
from datetime import datetime
import warnings

# Temporarily suppress warnings
warnings.filterwarnings('ignore', category=RuntimeWarning)

registry = Registry(session)

# Clean the column names by replacing spaces with underscores
X_train_clean = X_train_scaled.copy()
X_train_clean.columns = X_train_clean.columns.str.replace(' ', '_')

# Register model
model_name = "BEAR_SPECIES_CLASSIFIER"
model_version = f"v_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

model_ref = registry.log_model(
    model=final_model,
    model_name=model_name,
    version_name=model_version,
    sample_input_data=X_train_clean.head(5),
    metrics={
        'accuracy': float(best_model['accuracy']),
        'precision': float(best_model['precision']),
        'recall': float(best_model['recall']),
        'mcc': float(best_model['mcc'])
    },
    options={
        "case_sensitive": True,
        "min_positive_value": 1e-10  # Add small constant to prevent log(0)
    },
    comment="Best performing Random Forest model from hyperparameter tuning"
)

# Reset all warning filters (this also clears the filters set at the start of the notebook)
warnings.resetwarnings()

st.write("✅ Model successfully registered!")
st.write(f"Model Name: `{model_name}`")
st.write(f"Version: `{model_version}`")


### Show Models in Registry

In [None]:
SHOW MODELS

In [None]:
registry.show_models()

### Show Available Versions in a Model

In [None]:
SHOW VERSIONS IN MODEL bear_species_classifier;

### Show Available Functions in a Model

In [None]:
SHOW FUNCTIONS IN MODEL bear_species_classifier

## Deploy the Model as a Service

In [None]:
# First drop existing service using SQL through session
session.sql("DROP SERVICE IF EXISTS bear_rf_classifier").collect()

# Deploy to a CPU compute pool on Snowpark Container Services (SPCS)
model_ref.create_service(
    service_name="bear_rf_classifier",
    service_compute_pool="system_compute_pool_cpu",
    ingress_enabled=True,
    gpu_requests=None
)

st.write("✅ Model service created successfully!")

### Show Service Endpoints

Let's examine the endpoints exposed by our deployed model.

In [None]:
SHOW SERVICES;

In [None]:
SHOW ENDPOINTS IN SERVICE bear_rf_classifier;

### Perform Model Inference

Now that our model is deployed as a service, we can use it to make predictions on new data. 

Here's what we're doing:
- Query the `BEAR_TEST_DATA` table
- Pass the features through our deployed model
- Get predictions for bear species classification using the `PREDICT()` function

In [None]:
SELECT
  BEAR_RF_CLASSIFIER ! PREDICT(
    "body_mass_kg",
    "shoulder_hump_height_cm",
    "claw_length_cm",
    "snout_length_cm",
    "forearm_circumference_cm",
    "ear_length_cm",
    "fur_color_black",
    "fur_color_blackish_brown",
    "fur_color_blond",
    "fur_color_brown",
    "fur_color_cinnamon",
    "fur_color_dark_brown",
    "fur_color_grizzled",
    "fur_color_light_brown",
    "fur_color_medium_brown",
    "fur_color_reddish_brown",
    "facial_profile_dished",
    "facial_profile_straight",
    "paw_pad_texture_rough",
    "paw_pad_texture_smooth"
  ) AS predicted_species
FROM
  CHANINN_DEMO_DATA.PUBLIC.BEAR_TEST_DATA
LIMIT
  5;

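### Average Values of Features

To see what a typical American Black Bear looks like in the raw data, the query below averages each numerical feature and computes the proportion of each categorical value in the `BEAR` table.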

In [None]:
SELECT
  AVG("body_mass_kg") AS "body_mass_kg",
  AVG("shoulder_hump_height_cm") AS "shoulder_hump_height_cm",
  AVG("claw_length_cm") AS "claw_length_cm",
  AVG("snout_length_cm") AS "snout_length_cm",
  AVG("forearm_circumference_cm") AS "forearm_circumference_cm",
  AVG("ear_length_cm") AS "ear_length_cm",

  -- Fur Color Proportions
  AVG(CASE WHEN "fur_color" = 'Black' THEN 1 ELSE 0 END) AS "fur_color_black",
  AVG(CASE WHEN "fur_color" = 'Blackish-Brown' THEN 1 ELSE 0 END) AS "fur_color_blackish_brown",
  AVG(CASE WHEN "fur_color" = 'Blond' THEN 1 ELSE 0 END) AS "fur_color_blond",
  AVG(CASE WHEN "fur_color" = 'Brown' THEN 1 ELSE 0 END) AS "fur_color_brown",
  AVG(CASE WHEN "fur_color" = 'Cinnamon' THEN 1 ELSE 0 END) AS "fur_color_cinnamon",
  AVG(CASE WHEN "fur_color" = 'Dark Brown' THEN 1 ELSE 0 END) AS "fur_color_dark_brown",
  AVG(CASE WHEN "fur_color" = 'Grizzled' THEN 1 ELSE 0 END) AS "fur_color_grizzled",
  AVG(CASE WHEN "fur_color" = 'Light Brown' THEN 1 ELSE 0 END) AS "fur_color_light_brown",
  AVG(CASE WHEN "fur_color" = 'Medium Brown' THEN 1 ELSE 0 END) AS "fur_color_medium_brown",
  AVG(CASE WHEN "fur_color" = 'Reddish-Brown' THEN 1 ELSE 0 END) AS "fur_color_reddish_brown",

  -- Facial Profile Proportions
  AVG(CASE WHEN "facial_profile" = 'Dished' THEN 1 ELSE 0 END) AS "facial_profile_dished",
  AVG(CASE WHEN "facial_profile" = 'Straight' THEN 1 ELSE 0 END) AS "facial_profile_straight",

  -- Paw Pad Texture Proportions
  AVG(CASE WHEN "paw_pad_texture" = 'Rough' THEN 1 ELSE 0 END) AS "paw_pad_texture_rough",
  AVG(CASE WHEN "paw_pad_texture" = 'Smooth' THEN 1 ELSE 0 END) AS "paw_pad_texture_smooth"
FROM
  CHANINN_DEMO_DATA.PUBLIC.BEAR
WHERE
  "species" ILIKE '%American Black Bear%';

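The same kind of summary can be sketched in pandas, for example on the `bear_df` frame loaded earlier. The sketch below uses a hypothetical toy frame instead, so the values and the "Grizzly Bear" label are made up:

```python
import pandas as pd

# Hypothetical toy rows; column names follow the BEAR table used in the query above
toy = pd.DataFrame({
    "species": ["American Black Bear", "American Black Bear", "Grizzly Bear", "American Black Bear"],
    "body_mass_kg": [100.0, 140.0, 300.0, 120.0],
    "fur_color": ["Black", "Cinnamon", "Brown", "Black"],
})

# Case-insensitive substring match, similar in spirit to ILIKE '%American Black Bear%'
black_bears = toy[toy["species"].str.contains("American Black Bear", case=False)]

# Mean of a numerical feature and share of each categorical value
mean_mass = black_bears["body_mass_kg"].mean()
fur_share = black_bears["fur_color"].value_counts(normalize=True)

print(mean_mass)
print(fur_share.to_dict())
```
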
## Resources

Dive deeper into the topics mentioned in this notebook with these great articles:
- [Snowflake Model Registry](https://docs.snowflake.com/en/developer-guide/snowflake-ml/model-registry/overview)
- [Python APIs for Snowflake ML](https://docs.snowflake.com/en/developer-guide/snowflake-ml/snowpark-ml)
- [Snowflake ML: End-to-End Machine Learning](https://docs.snowflake.com/en/developer-guide/snowflake-ml/overview)