# Final Predictions and Submission

This notebook generates test-set predictions with the best-performing model. It loads the test data, applies the same feature engineering steps used for the training data, loads the trained model, generates predictions, and saves them to a submission file.

## 1. Load Data and Configuration

In [1]:
import os
import sys
import joblib
import pandas as pd
import yaml
import matplotlib.pyplot as plt

sys.path.append('..')

from src.utils.helpers import create_features
from src.data_processing.load_data import save_csv_data

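# Load configuration
config_path = '../config.yaml'
# Raise explicitly instead of asserting: assertions are skipped under `python -O`
if not os.path.exists(config_path):
    raise FileNotFoundError(f"Config file not found at {config_path}")
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)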

# Define paths
test_path = config['data']['raw']['test']
features_path = config['data']['processed']['features_test']
model_path = os.path.join(config['models']['trained_models'], 'lightgbm_model.pkl')  # best-performing model
submission_path = os.path.join(config['data']['submissions'], 'submission_final.csv')

## 2. Feature Engineering on Test Data

Apply the same feature engineering steps to the test data as were applied to the training data.

In [2]:
fe_config = config['feature_engineering']
test_features = create_features(
    test_path,
    fe_config['timestamp_col'],
    fe_config['cols_to_lag'],
    fe_config['window_sizes'],
)
save_csv_data(test_features, features_path)
print("Test features created and saved.")
test_features.head()

FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/test.csv'
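
The traceback above reports the relative path `data/raw/test.csv`. One plausible cause (an assumption, since the project layout is not shown) is that the paths in `config.yaml` are written relative to the project root, while the notebook runs one directory below it, as `sys.path.append('..')` and `'../config.yaml'` suggest. A minimal sketch of resolving configured paths against the project root follows; `PROJECT_ROOT` and `resolve_from_root` are hypothetical names, not part of the project's `src` package.

```python
from pathlib import Path

# Assumption: the notebook lives one level below the project root.
PROJECT_ROOT = Path('..').resolve()

def resolve_from_root(path, root=PROJECT_ROOT):
    """Return `path` unchanged if it is absolute, else anchor it at `root`."""
    p = Path(path)
    return p if p.is_absolute() else Path(root) / p
```

With this helper, the path definitions in section 1 could become, for example, `test_path = resolve_from_root(config['data']['raw']['test'])`. If the file is missing even at the resolved location, the test data has not been downloaded or placed there yet.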

## 3. Load Best Model and Generate Predictions

Load the best-performing model (LightGBM) and use it to generate predictions on the test features.

In [None]:
if not os.path.exists(model_path):
    raise FileNotFoundError(f"Trained model not found at {model_path}")

model = joblib.load(model_path)
print("Model loaded successfully.")

numeric_features = test_features.select_dtypes(include=['number'])
predictions = model.predict(numeric_features)
print("Predictions generated successfully.")
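
Selecting every numeric column assumes the test frame holds exactly the columns the model was trained on, in the same order. A hedged sketch of an explicit check follows. `diff_columns` is a hypothetical helper, the column names are made up, and where the expected list comes from is an assumption (for instance a list saved at training time, or the `feature_name_` attribute if the pickled object is a scikit-learn style LightGBM estimator).

```python
def diff_columns(expected, actual):
    """Compare two column lists.

    Returns (missing, unexpected): columns the model expects but the
    frame lacks, and columns the frame has that the model never saw.
    Each result keeps the order of the list it was drawn from.
    """
    actual_set = set(actual)
    expected_set = set(expected)
    missing = [c for c in expected if c not in actual_set]
    unexpected = [c for c in actual if c not in expected_set]
    return missing, unexpected

# Example with made-up column names.
missing, unexpected = diff_columns(
    expected=['load_lag_1', 'load_roll_3', 'hour'],
    actual=['hour', 'load_lag_1', 'row_id'],
)
print(missing)      # ['load_roll_3']
print(unexpected)   # ['row_id']
```

If `missing` is empty, indexing the frame with the expected list (`numeric_features[expected]`) drops the unexpected columns and restores the training column order before `model.predict` is called.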

## 4. Visualize Predictions

Plot the predictions in row order to inspect their trend and spot obvious anomalies.

In [None]:
plt.figure(figsize=(15, 6))
plt.plot(predictions, label='Predictions')
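plt.title('Test Set Predictions Over Time')
# predictions are plotted against their position, not against timestamps
plt.xlabel('Test row index (chronological order assumed)')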
plt.ylabel('Cooling Load')
plt.legend()
plt.grid(True)
plt.show()

## 5. Save Submission File

Save the predictions to a CSV file in the required submission format.

In [None]:
submission = pd.DataFrame({'prediction': predictions})
save_csv_data(submission, submission_path)
print(f"Submission file saved to {submission_path}")
submission.head()

## 6. Ensemble Prediction (Optional)

If an ensemble model is available, generate predictions using the ensemble and save them to a separate submission file.

In [None]:
from src.models.ensemble import load_ensemble
ensemble_path = os.path.join(config['models']['trained_models'], 'ensemble_model.pkl')
if os.path.exists(ensemble_path):
    ensemble = load_ensemble(ensemble_path)
    ensemble_preds = ensemble.predict(test_features.select_dtypes(include=['number']))
    ensemble_df = pd.DataFrame({'prediction': ensemble_preds})
    save_csv_data(ensemble_df, os.path.join(config['data']['submissions'], 'submission_ensemble.csv'))
    print("Ensemble predictions generated and saved.")
    # A bare expression inside an if-block is not auto-displayed, so print it
    print(ensemble_df.head())
else:
    print('No ensemble model found.')
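
When both sets of predictions exist, a quick numeric comparison shows how far the ensemble departs from the single LightGBM model. A minimal sketch, with `mean_abs_diff` as a hypothetical helper that is not part of the project code:

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sequences."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError(f"Length mismatch: {len(a)} vs {len(b)}")
    if not a:
        raise ValueError("Empty input")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(mean_abs_diff([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # 0.5
```

In the notebook this would be called as `mean_abs_diff(predictions, ensemble_preds)` inside the branch where the ensemble was loaded. A value close to zero relative to the typical cooling load suggests the two submissions are nearly interchangeable.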