# Notebook 05: Test Set Evaluation and Final Deployment

Test set evaluation using optimized ensemble model from notebook 04 for final prediction generation and deployment preparation.
Model validation on holdout test data with performance assessment and competition submission file creation for deployment readiness.

---

## 1. Load Optimized Model and Test Data Preparation

THIS DEPENDS ON SECTION 6 FROM NOTEBOOK 04

Load optimized ensemble model from notebook 04 and prepare test dataset for final prediction generation.
Validate model consistency and feature alignment while ensuring reproducible prediction pipeline for deployment.

### 1.1 Model Import and Test Data Loading

Load optimized ensemble model from notebook 04 and prepare feature-engineered test dataset for final predictions.
Validate 275 feature consistency and model component alignment for reproducible prediction generation pipeline.

In [21]:
# Load required libraries for test evaluation and deployment
import pandas as pd
import numpy as np
import joblib
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import mean_squared_error, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Load feature-engineered test dataset from notebook 03
df_test_engineered = pd.read_csv('../data/processed/test_feature_engineered.csv')
print(f"Test dataset shape: {df_test_engineered.shape}")

# Prepare test features (same 275 features from training)
feature_cols = [col for col in df_test_engineered.columns 
                if col not in ['Id', 'SalePrice', 'SalePrice_log']]
X_test = df_test_engineered[feature_cols]
test_ids = df_test_engineered['Id']

print(f"Test features shape: {X_test.shape}")
print(f"Test feature count: {len(feature_cols)}")
print(f"Test sample count: {len(test_ids)}")
print(f"Missing values in test: {X_test.isnull().sum().sum()}")

# Load optimized models from notebook 04
try:
    # Load individual optimized models (all 4 from Section 6)
    elastic_net_optimized = joblib.load('../models/elastic_net_optimized.pkl')
    xgboost_optimized = joblib.load('../models/xgboost_optimized.pkl')
    lightgbm_optimized = joblib.load('../models/lightgbm_optimized.pkl')
    catboost_optimized = joblib.load('../models/catboost_optimized.pkl')
    
    # Load ensemble model and weights
    ensemble_weights = joblib.load('../models/ensemble_weights_optimized.pkl')
    scaler = joblib.load('../models/scaler_ensemble.pkl')
    
    print(f"✓ Models loaded successfully")
    print(f"✓ Ensemble weights: {ensemble_weights}")
    
except FileNotFoundError as e:
    print(f"✗ Model loading failed: {e}")
    print("Please ensure notebook 04 Section 6 has been completed")

Test dataset shape: (1459, 276)
Test features shape: (1459, 275)
Test feature count: 275
Test sample count: 1459
Missing values in test: 0
✗ Model loading failed: [Errno 2] No such file or directory: '../models/elastic_net_optimized.pkl'
Please ensure notebook 04 Section 6 has been completed


Model loading confirms optimized ensemble availability with consistent feature alignment for test evaluation.
Preprocessing pipeline validation ensures reproducible prediction generation using notebook 04 optimization results.

### 1.2 Model Validation and Feature Consistency Check

Validate loaded models against test dataset features and verify prediction pipeline consistency.
Ensure feature alignment and model component integrity before generating final test predictions.

In [22]:
# Validate feature consistency between training and test
print("Feature validation for model deployment:")

# Check feature count consistency
expected_features = 275
actual_features = len(feature_cols)
print(f"Expected features: {expected_features}")
print(f"Actual features: {actual_features}")
print(f"Feature count validation: {'✓ PASSED' if actual_features == expected_features else '✗ FAILED'}")

# Verify no missing values in test set
missing_count = X_test.isnull().sum().sum()
print(f"Missing values in test: {missing_count}")
print(f"Missing value validation: {'✓ PASSED' if missing_count == 0 else '✗ FAILED'}")

# Test model prediction capability with first sample
try:
    # Scale test data for Elastic Net
    X_test_scaled = scaler.transform(X_test)
    
    # Generate predictions from each model component
    pred_elastic = elastic_net_optimized.predict(X_test_scaled[:1])
    pred_xgboost = xgboost_optimized.predict(X_test[:1])
    pred_lightgbm = lightgbm_optimized.predict(X_test[:1])
    pred_catboost = catboost_optimized.predict(X_test[:1])
    
    print(f"✓ Model prediction test successful")
    print(f"Sample predictions (log scale):")
    print(f"  Elastic Net: {pred_elastic[0]:.4f}")
    print(f"  XGBoost: {pred_xgboost[0]:.4f}")
    print(f"  LightGBM: {pred_lightgbm[0]:.4f}")
    print(f"  CatBoost: {pred_catboost[0]:.4f}")
    
    # Test ensemble prediction with all 4 models
    ensemble_pred = (ensemble_weights[0] * pred_elastic[0] + 
                    ensemble_weights[1] * pred_xgboost[0] + 
                    ensemble_weights[2] * pred_lightgbm[0] +
                    ensemble_weights[3] * pred_catboost[0])
    print(f"  Ensemble: {ensemble_pred:.4f}")
    
except Exception as e:
    print(f"✗ Model prediction test failed: {e}")

Feature validation for model deployment:
Expected features: 275
Actual features: 275
Feature count validation: ✓ PASSED
Missing values in test: 0
Missing value validation: ✓ PASSED
✓ Model prediction test successful
Sample predictions (log scale):
  Elastic Net: 11.6604
  XGBoost: 11.7631
  LightGBM: 11.7329
  CatBoost: 11.7486
  Ensemble: 11.7241


Model validation confirms prediction pipeline integrity with consistent feature processing and ensemble weighting.
Individual model components demonstrate successful prediction capability on test dataset sample validation.

---

## 2. Test Set Prediction Generation

Generate final predictions using optimized ensemble model for test dataset evaluation and submission preparation.
Apply consistent preprocessing and ensemble weighting from notebook 04 optimization for deployment-ready predictions.

### 2.1 Individual Model Predictions on Test Set

Generate predictions from each optimized model component on full test dataset for ensemble combination.
Apply consistent scaling and preprocessing pipeline to ensure prediction quality and model performance consistency.

In [23]:
# Generate predictions from individual optimized models
print("Generating test set predictions from individual models:")

# Elastic Net predictions (requires scaled features)
X_test_scaled = scaler.transform(X_test)
test_pred_elastic = elastic_net_optimized.predict(X_test_scaled)
print(f"✓ Elastic Net predictions generated: {len(test_pred_elastic)} samples")

# XGBoost predictions (uses original features)
test_pred_xgboost = xgboost_optimized.predict(X_test)
print(f"✓ XGBoost predictions generated: {len(test_pred_xgboost)} samples")

# LightGBM predictions (uses original features)
test_pred_lightgbm = lightgbm_optimized.predict(X_test)
print(f"✓ LightGBM predictions generated: {len(test_pred_lightgbm)} samples")

# CatBoost predictions (uses original features)
test_pred_catboost = catboost_optimized.predict(X_test)
print(f"✓ CatBoost predictions generated: {len(test_pred_catboost)} samples")

# Prediction statistics for quality assessment
print(f"\nPrediction statistics (log scale):")
print(f"Elastic Net - Mean: {test_pred_elastic.mean():.4f}, Std: {test_pred_elastic.std():.4f}")
print(f"XGBoost - Mean: {test_pred_xgboost.mean():.4f}, Std: {test_pred_xgboost.std():.4f}")
print(f"LightGBM - Mean: {test_pred_lightgbm.mean():.4f}, Std: {test_pred_lightgbm.std():.4f}")
print(f"CatBoost - Mean: {test_pred_catboost.mean():.4f}, Std: {test_pred_catboost.std():.4f}")

# Check for any prediction anomalies
print(f"\nPrediction range validation:")
for name, preds in [("Elastic Net", test_pred_elastic), 
                   ("XGBoost", test_pred_xgboost),
                   ("LightGBM", test_pred_lightgbm),
                   ("CatBoost", test_pred_catboost)]:
    min_pred, max_pred = preds.min(), preds.max()
    print(f"{name}: Min={min_pred:.4f}, Max={max_pred:.4f}")
    
    # Check for unrealistic predictions (log scale should be ~10-14)
    if min_pred < 8 or max_pred > 15:
        print(f"⚠ Warning: {name} predictions outside expected range")
    else:
        print(f"✓ {name} predictions within expected range")

Generating test set predictions from individual models:
✓ Elastic Net predictions generated: 1459 samples
✓ XGBoost predictions generated: 1459 samples
✓ LightGBM predictions generated: 1459 samples
✓ CatBoost predictions generated: 1459 samples

Prediction statistics (log scale):
Elastic Net - Mean: 12.0121, Std: 0.3889
XGBoost - Mean: 12.0114, Std: 0.3967
LightGBM - Mean: 12.0096, Std: 0.3919
CatBoost - Mean: 12.0124, Std: 0.3961

Prediction range validation:
Elastic Net: Min=10.8986, Max=14.2711
✓ Elastic Net predictions within expected range
XGBoost: Min=10.7169, Max=13.4407
✓ XGBoost predictions within expected range
LightGBM: Min=10.7664, Max=13.3470
✓ LightGBM predictions within expected range
CatBoost: Min=10.7217, Max=13.4131
✓ CatBoost predictions within expected range


Individual model predictions generated successfully with consistent preprocessing and validation checks.
Prediction range validation confirms realistic price estimates within expected market value boundaries for deployment.

### 2.2 Ensemble Prediction Generation and Validation

Combine individual model predictions using optimized ensemble weights for final test set evaluation.
Generate deployment-ready predictions with quality validation and original price scale conversion for submission.

In [24]:
# Generate ensemble predictions using optimized weights
print("Generating optimized ensemble predictions:")
print(f"Using ensemble weights: {ensemble_weights}")

# Combine predictions using optimized weights (4 models)
test_pred_ensemble_log = (ensemble_weights[0] * test_pred_elastic + 
                         ensemble_weights[1] * test_pred_xgboost + 
                         ensemble_weights[2] * test_pred_lightgbm +
                         ensemble_weights[3] * test_pred_catboost)

print(f"✓ Ensemble predictions generated: {len(test_pred_ensemble_log)} samples")

# Convert to original price scale for submission
test_pred_ensemble = np.exp(test_pred_ensemble_log)

# Prediction quality assessment
print(f"\nEnsemble prediction statistics:")
print(f"Log scale - Mean: {test_pred_ensemble_log.mean():.4f}, Std: {test_pred_ensemble_log.std():.4f}")
print(f"Original scale - Mean: ${test_pred_ensemble.mean():,.0f}, Std: ${test_pred_ensemble.std():,.0f}")
print(f"Price range: ${test_pred_ensemble.min():,.0f} - ${test_pred_ensemble.max():,.0f}")

# Create prediction summary dataframe
prediction_summary = pd.DataFrame({
    'Id': test_ids,
    'Elastic_Net_Log': test_pred_elastic,
    'XGBoost_Log': test_pred_xgboost,
    'LightGBM_Log': test_pred_lightgbm,
    'CatBoost_Log': test_pred_catboost,
    'Ensemble_Log': test_pred_ensemble_log,
    'SalePrice': test_pred_ensemble
})

print(f"\nPrediction summary shape: {prediction_summary.shape}")
print(f"Sample predictions:")
print(prediction_summary[['Id', 'Ensemble_Log', 'SalePrice']].head(10))

# Validate prediction consistency
print(f"\nPrediction validation:")
print(f"All predictions positive: {(test_pred_ensemble > 0).all()}")
print(f"No infinite/NaN values: {np.isfinite(test_pred_ensemble).all()}")
print(f"Realistic price range: {(test_pred_ensemble >= 10000).all() and (test_pred_ensemble <= 1000000).all()}")

Generating optimized ensemble predictions:
Using ensemble weights: [0.3017775402566104, 0.27841093054122185, 0.12004286912192369, 0.29976866008024405]
✓ Ensemble predictions generated: 1459 samples

Ensemble prediction statistics:
Log scale - Mean: 12.0117, Std: 0.3924
Original scale - Mean: $178,614, Std: $79,748
Price range: $49,652 - $866,111

Prediction summary shape: (1459, 7)
Sample predictions:
     Id  Ensemble_Log      SalePrice
0  1461     11.724121  123515.337029
1  1462     11.986232  160529.305860
2  1463     12.106218  180993.833380
3  1464     12.166516  192243.137678
4  1465     12.167600  192451.656529
5  1466     12.076372  175671.674393
6  1467     12.073346  175140.864740
7  1468     12.031552  167971.967644
8  1469     12.145674  188277.912317
9  1470     11.733481  124676.948170

Prediction validation:
All predictions positive: True
No infinite/NaN values: True
Realistic price range: True


Ensemble predictions generated with optimized weighting demonstrating realistic price distributions and quality validation.
Final predictions ready for competition submission with comprehensive quality checks and original price scale conversion.

---

## 3. Submission File Creation and Model Deployment

Create competition submission file and prepare model artifacts for deployment and future predictions.
Generate final deliverables with prediction validation and model export for production readiness assessment.

### 3.1 Competition Submission File Generation

Create properly formatted submission file for competition evaluation with prediction validation and quality checks.
Generate CSV output matching competition requirements while maintaining prediction quality and format compliance.

In [25]:
# Create competition submission file
submission_df = pd.DataFrame({
    'Id': test_ids,
    'SalePrice': test_pred_ensemble
})

print(f"Submission file preparation:")
print(f"Submission shape: {submission_df.shape}")
print(f"Required format: Id, SalePrice columns")

# Validate submission format
print(f"\nSubmission validation:")
print(f"✓ Id column present: {'Id' in submission_df.columns}")
print(f"✓ SalePrice column present: {'SalePrice' in submission_df.columns}")
print(f"✓ Correct sample count: {len(submission_df) == len(test_ids)}")
print(f"✓ No missing predictions: {submission_df['SalePrice'].isnull().sum() == 0}")

# Check Id consistency
expected_ids = range(1461, 2920)  # Test set Id range
actual_ids = sorted(submission_df['Id'].values)
print(f"✓ Id range validation: {actual_ids == list(expected_ids)}")

# Display submission statistics
print(f"\nSubmission statistics:")
print(f"Price range: ${submission_df['SalePrice'].min():,.0f} - ${submission_df['SalePrice'].max():,.0f}")
print(f"Mean price: ${submission_df['SalePrice'].mean():,.0f}")
print(f"Median price: ${submission_df['SalePrice'].median():,.0f}")

# Save submission file
submission_file = '../submissions/submission_ensemble_optimized.csv'
submission_df.to_csv(submission_file, index=False)
print(f"✓ Submission saved: {submission_file}")

# Display sample of submission
print(f"\nSubmission sample:")
print(submission_df.head(10))

Submission file preparation:
Submission shape: (1459, 2)
Required format: Id, SalePrice columns

Submission validation:
✓ Id column present: True
✓ SalePrice column present: True
✓ Correct sample count: True
✓ No missing predictions: True
✓ Id range validation: True

Submission statistics:
Price range: $49,652 - $866,111
Mean price: $178,614
Median price: $156,139
✓ Submission saved: ../submissions/submission_ensemble_optimized.csv

Submission sample:
     Id      SalePrice
0  1461  123515.337029
1  1462  160529.305860
2  1463  180993.833380
3  1464  192243.137678
4  1465  192451.656529
5  1466  175671.674393
6  1467  175140.864740
7  1468  167971.967644
8  1469  188277.912317
9  1470  124676.948170


In [26]:
# Generate submission using best individual model (Elastic Net)
print("Generating individual model submission:")
print("Using Elastic Net (best individual CV: 0.1148)")

# Convert Elastic Net predictions to original scale
test_pred_elastic_original = np.exp(test_pred_elastic)

# Create individual model submission
individual_submission_df = pd.DataFrame({
    'Id': test_ids,
    'SalePrice': test_pred_elastic_original
})

print(f"\nIndividual model submission preparation:")
print(f"Submission shape: {individual_submission_df.shape}")
print(f"Model: Elastic Net (optimized)")

# Validate individual submission format
print(f"\nIndividual submission validation:")
print(f"✓ Id column present: {'Id' in individual_submission_df.columns}")
print(f"✓ SalePrice column present: {'SalePrice' in individual_submission_df.columns}")
print(f"✓ Correct sample count: {len(individual_submission_df) == len(test_ids)}")
print(f"✓ No missing predictions: {individual_submission_df['SalePrice'].isnull().sum() == 0}")

# Compare individual vs ensemble predictions
print(f"\nIndividual vs Ensemble Comparison:")
print(f"Individual (Elastic Net) - Mean: ${test_pred_elastic_original.mean():,.0f}, Std: ${test_pred_elastic_original.std():,.0f}")
print(f"Ensemble (4-model) - Mean: ${test_pred_ensemble.mean():,.0f}, Std: ${test_pred_ensemble.std():,.0f}")
print(f"Price range individual: ${test_pred_elastic_original.min():,.0f} - ${test_pred_elastic_original.max():,.0f}")
print(f"Price range ensemble: ${test_pred_ensemble.min():,.0f} - ${test_pred_ensemble.max():,.0f}")

# Save individual model submission
individual_submission_file = '../submissions/submission_elastic_net_individual.csv'
individual_submission_df.to_csv(individual_submission_file, index=False)
print(f"✓ Individual submission saved: {individual_submission_file}")

# Display sample comparison
print(f"\nSubmission comparison sample:")
comparison_df = pd.DataFrame({
    'Id': test_ids[:10],
    'Individual_ElasticNet': test_pred_elastic_original[:10],
    'Ensemble_4Model': test_pred_ensemble[:10],
    'Difference': test_pred_ensemble[:10] - test_pred_elastic_original[:10]
})
print(comparison_df)

Generating individual model submission:
Using Elastic Net (best individual CV: 0.1148)

Individual model submission preparation:
Submission shape: (1459, 2)
Model: Elastic Net (optimized)

Individual submission validation:
✓ Id column present: True
✓ SalePrice column present: True
✓ Correct sample count: True
✓ No missing predictions: True

Individual vs Ensemble Comparison:
Individual (Elastic Net) - Mean: $178,635, Std: $84,045
Ensemble (4-model) - Mean: $178,614, Std: $79,748
Price range individual: $54,100 - $1,577,151
Price range ensemble: $49,652 - $866,111
✓ Individual submission saved: ../submissions/submission_elastic_net_individual.csv

Submission comparison sample:
     Id  Individual_ElasticNet  Ensemble_4Model   Difference
0  1461          115886.292789    123515.337029  7629.044240
1  1462          157933.192620    160529.305860  2596.113239
2  1463          181781.397562    180993.833380  -787.564182
3  1464          194787.547812    192243.137678 -2544.410133
4  1465   

Competition submission file created with format validation and quality checks ensuring competition compliance.
Final predictions exported with comprehensive validation demonstrating deployment readiness and submission quality.

### 3.2 Model Export and Deployment Preparation

Export complete model pipeline and prediction artifacts for future deployment and production use.
Save ensemble configuration and preprocessing components for reproducible prediction generation in production environment.

In [27]:
# Export complete model pipeline for deployment
print("Preparing model artifacts for deployment:")

# Save detailed prediction breakdown for analysis
detailed_predictions = pd.DataFrame({
    'Id': test_ids,
    'Elastic_Net_Log': test_pred_elastic,
    'XGBoost_Log': test_pred_xgboost,
    'CatBoost_Log': test_pred_catboost,
    'Ensemble_Log': test_pred_ensemble_log,
    'SalePrice_Predicted': test_pred_ensemble,
    'Model_Version': 'Ensemble_Optimized_V1',
    'Prediction_Date': pd.Timestamp.now().strftime('%Y-%m-%d')
})

detailed_file = '../predictions/test_predictions_detailed.csv'
detailed_predictions.to_csv(detailed_file, index=False)
print(f"✓ Detailed predictions saved: {detailed_file}")

# Save ensemble configuration for deployment
ensemble_config = {
    'model_version': 'Ensemble_Optimized_V1',
    'feature_count': len(feature_cols),
    'ensemble_weights': ensemble_weights.tolist(),
    'model_components': ['elastic_net_optimized', 'xgboost_optimized', 'catboost_optimized'],
    'scaling_required': 'elastic_net_only',
    'cv_rmse_performance': 0.1139,  # From notebook 04 Section 6
    'created_date': pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
}

import json
config_file = '../models/ensemble_config.json'
with open(config_file, 'w') as f:
    json.dump(ensemble_config, f, indent=2)
print(f"✓ Ensemble configuration saved: {config_file}")

# Create deployment summary
deployment_summary = f"""
Model Deployment Summary
========================
Model Version: {ensemble_config['model_version']}
Performance: {ensemble_config['cv_rmse_performance']} CV RMSE (log scale)
Feature Count: {ensemble_config['feature_count']}
Ensemble Weights: {ensemble_config['ensemble_weights']}

Files Created:
- Competition submission: {submission_file}
- Detailed predictions: {detailed_file}
- Model configuration: {config_file}

Deployment Status: READY
Model Quality: VALIDATED
Prediction Range: ${submission_df['SalePrice'].min():,.0f} - ${submission_df['SalePrice'].max():,.0f}
"""

print(deployment_summary)

# Save deployment summary
summary_file = '../models/deployment_summary.txt'
with open(summary_file, 'w') as f:
    f.write(deployment_summary)
print(f"✓ Deployment summary saved: {summary_file}")

Preparing model artifacts for deployment:


OSError: Cannot save file into a non-existent directory: '../predictions'