# Phase 3: Submitting to Kaggle

Kaggle does not publish the sale prices for the test set, so the only way to measure how well our model performs on it is to upload our test predictions to Kaggle and let the competition score them.

## Setting up Kaggle

If you haven't set up authentication with Kaggle yet (you can test this by running the cell below), follow these steps:

1. Go to the Account tab of your [Kaggle profile](https://www.kaggle.com/settings/account)
2. Select 'Create New Token' (which will download a file `kaggle.json`)
3. If you are on a UNIX-based OS, place this at `~/.kaggle/kaggle.json`
    - For Windows, place this at `C:\Users\<Windows-username>\.kaggle\kaggle.json`
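
As a quick sanity check before authenticating, the token location described in the steps above can be tested directly. This is a minimal sketch, not part of the Kaggle client: `find_kaggle_credentials` is a hypothetical helper name, and it assumes the default `.kaggle` folder in your home directory unless the `KAGGLE_CONFIG_DIR` environment variable points somewhere else.

```python
import os
from pathlib import Path


def find_kaggle_credentials(home=None):
    """Return the path to kaggle.json if it exists, otherwise None.

    Hypothetical helper for illustration; `home` lets you override the
    home directory (useful for testing).
    """
    # KAGGLE_CONFIG_DIR, when set, takes the place of ~/.kaggle
    config_dir = os.environ.get("KAGGLE_CONFIG_DIR")
    if config_dir:
        base = Path(config_dir)
    else:
        home_dir = Path(home) if home is not None else Path.home()
        base = home_dir / ".kaggle"
    token = base / "kaggle.json"
    return token if token.is_file() else None


token_path = find_kaggle_credentials()
if token_path is None:
    print("kaggle.json not found; follow the steps above.")
else:
    print(f"Found Kaggle token at {token_path}")
```

`Path.home()` resolves to `C:\Users\<Windows-username>` on Windows, so the same check should cover both locations listed above.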

In [1]:
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

competition = "house-prices-advanced-regression-techniques"



## Generate Predictions for Test Set

Finally, we can use the fitted pipeline to generate predictions for the test set, which we can then upload to Kaggle.

In [None]:
import joblib
import pandas as pd

# add the project root (..) to sys.path so that modules under ../models can be imported
import os, sys
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

# Load the pipeline
full_pipeline = joblib.load("../models/full_pipeline.joblib")

print("Loading data...")
test_data = pd.read_csv('../data/house-prices-advanced-regression-techniques/test.csv')
print("Test shape:", test_data.shape)

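# make predictions on the test data (these are predicted prices, not true labels)
y_pred = full_pipeline.predict(test_data)

# format for Kaggle: one row per house with its Id and predicted SalePrice
submission = pd.DataFrame({
    'Id': test_data['Id'],
    'SalePrice': y_pred
})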

submission

Loading data...
Test shape: (1459, 80)




        Id      SalePrice
0     1461  121052.757812
1     1462  160252.671875
2     1463  176320.250000
3     1464  195326.171875
4     1465  187779.000000
...    ...            ...
1454  2915   86371.492188
1455  2916   88848.492188
1456  2917  169327.453125
1457  2918  123622.695312


In [3]:
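from datetime import datetime

# timestamp for the file name: same result as "%D_%T" with '/' replaced by '-',
# but spelled out because %D and %T are not supported by strftime on every platform
# note: ':' is not allowed in Windows file names, so use '-' between the time fields there
now = datetime.now().strftime("%m-%d-%y_%H:%M:%S")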

# save submission file
os.makedirs('../submissions', exist_ok=True)
submission_filename = f"submission_{now}.csv"
submission_path = f"../submissions/{submission_filename}"
submission.to_csv(submission_path, index=False)
print(f"Submission file saved to {submission_path}")

print("\nFirst few predictions:")
print(submission.head())

Submission file saved to ../submissions/submission_11-03-25_10:45:52.csv

First few predictions:
     Id      SalePrice
0  1461  121052.757812
1  1462  160252.671875
2  1463  176320.250000
3  1464  195326.171875
4  1465  187779.000000
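
Before uploading, it can be worth checking that the saved file has the shape Kaggle expects: an `Id` column, a `SalePrice` column, one row per test house, and a positive number for every price. The sketch below is one way to do that using only the standard library; `check_submission_csv` is a hypothetical helper, and the specific checks are assumptions about what makes a submission sensible rather than Kaggle's own validation rules.

```python
import csv


def check_submission_csv(path, expected_rows):
    """Return a list of problems found in a submission CSV (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["Id", "SalePrice"]:
            problems.append(f"unexpected columns: {reader.fieldnames}")
            return problems
        rows = list(reader)

    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")

    ids = [row["Id"] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate Id values")

    for row in rows:
        try:
            price = float(row["SalePrice"])
        except (TypeError, ValueError):
            # empty cells (how pandas writes NaN) and missing fields end up here
            problems.append(f"Id {row['Id']}: SalePrice is not a number")
            continue
        if not price > 0:  # this comparison is also False for NaN
            problems.append(f"Id {row['Id']}: SalePrice must be positive")
    return problems
```

In this notebook it could be called as `check_submission_csv(submission_path, expected_rows=len(test_data))`; an empty list means none of these checks failed.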


In [4]:
message = f"submission {now}"
response = api.competition_submit(submission_path, message, competition)
response

100%|██████████| 21.1k/21.1k [00:00<00:00, 49.6kB/s]


{"message": "Successfully submitted to House Prices - Advanced Regression Techniques", "ref": 47864936}

In [5]:
# wait a few seconds so the new submission is visible (and scored) when we query Kaggle
from time import sleep
sleep(3)
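
A fixed pause works when scoring is fast, but it can be too short on a busy day. A more patient alternative is to poll until the score appears. The sketch below is an illustration only: `wait_for_score` is a hypothetical helper, it takes the fetching function as an argument so it does not depend on the Kaggle client directly, and it assumes an unscored submission reports `public_score` as `None` or an empty string.

```python
import time


def wait_for_score(fetch_submissions, ref, timeout=60.0, interval=3.0):
    """Poll until the submission with the given ref has a public score.

    fetch_submissions: zero-argument callable returning objects that have
    `ref` and `public_score` attributes.
    Raises TimeoutError if no score appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        for s in fetch_submissions():
            # assumption: pending submissions have no score yet (None or "")
            if s.ref == ref and s.public_score not in (None, ""):
                return s
        if time.monotonic() >= deadline:
            raise TimeoutError(f"submission {ref} not scored after {timeout}s")
        time.sleep(interval)
```

With the objects from this notebook, the call might look like `wait_for_score(lambda: api.competition_submissions(competition), response.ref)`.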

In [6]:
# list our own submissions to this competition (this is not the public leaderboard)
my_submissions = api.competition_submissions(competition)

# use a new name here so we don't overwrite the `submission` DataFrame from earlier
latest = next(s for s in my_submissions if s.ref == response.ref)
other_submissions = [s for s in my_submissions if s.ref != response.ref]
other_submissions.sort(key=lambda s: s.date, reverse=True)

score = float(latest.public_score)
print(f"submission returned score of {score}")

print("\nLast 5 submissions:")
for s in other_submissions[:5]:
    print(f"\tSCORE: {s.public_score}")
    print(f"\tref: {s.ref}")
    print(f"\tdate: {s.date}")
    print(f"\tfile name: {s.file_name}")
    print(f"\tsubmitted by {s.submitted_by}\n")

submission returned score of 0.12659

Last 5 submissions:
	SCORE: 0.12899
	ref: 47864318
	date: 2025-11-03 15:17:49.717000
	file name: xgboost_submission.csv
	submitted by nicbolton

	SCORE: 0.12722
	ref: 47789860
	date: 2025-10-31 02:37:38.533000
	file name: xgboost_submission.csv
	submitted by nicbolton

	SCORE: 0.12877
	ref: 47789477
	date: 2025-10-31 02:14:32.393000
	file name: xgboost_submission.csv
	submitted by nicbolton

	SCORE: 0.12934
	ref: 47788396
	date: 2025-10-31 01:13:59.853000
	file name: xgboost_submission.csv
	submitted by nicbolton

	SCORE: 0.12773
	ref: 47788257
	date: 2025-10-31 01:04:05.963000
	file name: xgboost_submission.csv
	submitted by nicbolton
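
To read these scores: this competition is evaluated on the root-mean-squared error between the logarithm of the predicted price and the logarithm of the actual price, so lower is better and errors are relative rather than absolute. The sketch below shows the calculation on two made-up houses; the function name `rmse_of_logs` and the numbers are illustrative only, not data from this project.

```python
import math


def rmse_of_logs(y_true, y_pred):
    """Root-mean-squared error between log(predicted) and log(actual)."""
    squared_errors = [
        (math.log(pred) - math.log(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# illustrative numbers only: one prediction is 10% too high, one is exact
actual = [200_000, 150_000]
predicted = [220_000, 150_000]
print(round(rmse_of_logs(actual, predicted), 4))
```

Because the error is measured on the log scale, a score near 0.127 suggests predictions that are typically off by very roughly 13% of the sale price.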

