# Phase 3: Submitting to Kaggle

Because the test-set labels (`SalePrice`) are not public, the only way to measure how well our model performs on the test set is to upload our predictions to Kaggle and let it score them.

## Setting up Kaggle

If you haven't set up authentication with Kaggle yet (you can test this by running the cell below), follow these steps:

1. Go to the Account tab of your [Kaggle profile](https://www.kaggle.com/settings/account)
2. Select 'Create New Token' (which will download a file `kaggle.json`)
3. If you are on a UNIX-based OS, place this at `~/.kaggle/kaggle.json`
    - For Windows, place this at `C:\Users\<Windows-username>\.kaggle\kaggle.json`
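A quick way to see which credential source is in place is sketched below. It is a simplified approximation, not the Kaggle client's own logic: we assume the client first reads `KAGGLE_USERNAME` and `KAGGLE_KEY` from the environment (the `load_dotenv()` call in the next cell would let a `.env` file supply them), and otherwise looks for `kaggle.json` in `KAGGLE_CONFIG_DIR` or `~/.kaggle`. `find_kaggle_credentials` is a hypothetical helper, and newer client versions may check additional locations.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_kaggle_credentials(home=None, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Report where Kaggle credentials would be picked up from, or None if nowhere.

    Simplified sketch (assumption, not the Kaggle client's code): environment
    variables first, then kaggle.json in KAGGLE_CONFIG_DIR or ~/.kaggle.
    """
    home = Path.home() if home is None else Path(home)
    env = os.environ if env is None else env

    # 1. KAGGLE_USERNAME + KAGGLE_KEY in the environment (e.g. loaded from a .env file)
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "environment"

    # 2. kaggle.json in KAGGLE_CONFIG_DIR if it is set, otherwise in ~/.kaggle
    config_dir = Path(env["KAGGLE_CONFIG_DIR"]) if env.get("KAGGLE_CONFIG_DIR") else home / ".kaggle"
    token_path = config_dir / "kaggle.json"
    return str(token_path) if token_path.is_file() else None


if __name__ == "__main__":
    found = find_kaggle_credentials()
    print(f"credentials found in: {found}" if found else "no Kaggle credentials found")
```

If this reports nothing, `api.authenticate()` in the next cell will most likely fail as well.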

In [1]:
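# load environment variables from a .env file, if one exists
# (the Kaggle client also accepts KAGGLE_USERNAME / KAGGLE_KEY as an alternative to kaggle.json)
from dotenv import load_dotenv
load_dotenv()

from kaggle.api.kaggle_api_extended import KaggleApi

# authenticate using kaggle.json or the environment variables above
api = KaggleApi()
api.authenticate()

# competition slug, as it appears in the competition URL
competition = "house-prices-advanced-regression-techniques"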

## Generate Predictions for Test Set

Next, we use the fitted pipeline to generate predictions for the test set and format them the way Kaggle expects: one row per `Id` with the predicted `SalePrice`.

In [3]:
import os
import sys

import joblib
import pandas as pd

# add the project root (the parent of the current working directory) to the
# import path, so that the `app` package below can be imported
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

import ames_notebooks
from app.config.settings import settings
from app.data_ingestion.read_data import DataReader

# load the fitted pipeline
full_pipeline = joblib.load("../models/full_pipeline.joblib")

print("Loading data...")
reader = DataReader(settings=settings)
_, test_data = reader.load_train_test()  # only the test split is needed here
print("Test shape:", test_data.shape)

# make predictions on the test data
test_preds = full_pipeline.predict(test_data)

# format for Kaggle: test_data is indexed by Id, so the index supplies the Id column
submission = pd.DataFrame({
    'Id': test_data.index,
    'SalePrice': test_preds
})

submission

Loading data...
Test shape: (1459, 79)

|      |   Id |     SalePrice |
|-----:|-----:|--------------:|
|    0 | 1461 | 118724.609375 |
|    1 | 1462 | 157654.156250 |
|    2 | 1463 | 178829.109375 |
|    3 | 1464 | 190633.359375 |
|    4 | 1465 | 186439.625000 |
|  ... |  ... |           ... |
| 1454 | 2915 |  86488.953125 |
| 1455 | 2916 |  84667.218750 |
| 1456 | 2917 | 177849.125000 |
| 1457 | 2918 | 121736.273438 |


In [4]:
from datetime import datetime

# timestamp for the file name, formatted month-day-year_hourminutesecond;
# colons are left out because Windows does not allow them in file names
now = datetime.now().strftime("%m-%d-%y_%H%M%S")

# save the submission file
os.makedirs('../submissions', exist_ok=True)
submission_filename = f"submission_{now}.csv"
submission_path = f"../submissions/{submission_filename}"
submission.to_csv(submission_path, index=False)  # index=False: write only the Id and SalePrice columns
print(f"Submission file saved to {submission_path}")

print("\nFirst few predictions:")
print(submission.head())

Submission file saved to ../submissions/submission_11-10-25_135328.csv

First few predictions:
     Id      SalePrice
0  1461  118724.609375
1  1462  157654.156250
2  1463  178829.109375
3  1464  190633.359375
4  1465  186439.625000
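Before uploading, a quick local sanity check can save a wasted submission. The sketch below is a hypothetical helper (`check_submission` is not part of this project) that checks the column names, row count, duplicate Ids, and missing or non-positive prices. The positivity check assumes the competition's log-based metric, which is undefined for prices at or below zero.

```python
import pandas as pd


def check_submission(submission: pd.DataFrame, expected_rows: int) -> list:
    """Return a list of problems found in a House Prices submission (empty if none)."""
    problems = []

    # Kaggle expects exactly these two columns, in this order
    if list(submission.columns) != ["Id", "SalePrice"]:
        problems.append(f"expected columns ['Id', 'SalePrice'], got {list(submission.columns)}")
        return problems

    # one prediction per test row
    if len(submission) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(submission)}")

    if submission["Id"].duplicated().any():
        problems.append("duplicate Ids")

    # missing values first; the positivity check is only meaningful without NaNs
    if submission["SalePrice"].isna().any():
        problems.append("missing SalePrice values")
    elif (submission["SalePrice"] <= 0).any():
        problems.append("non-positive SalePrice values")

    return problems
```

In this notebook it could run just before the upload cell, e.g. `assert not check_submission(submission, expected_rows=len(test_data))`.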


In [8]:
message = f"submission {now}"
response = api.competition_submit(submission_path, message, competition)

# give Kaggle a few seconds to register the submission before we query it
# (scoring itself may take longer than this)
from time import sleep
sleep(3)

response

100%|██████████| 21.2k/21.2k [00:00<00:00, 46.9kB/s]


{"message": "Successfully submitted to House Prices - Advanced Regression Techniques", "ref": 48057103}
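The fixed `sleep(3)` is only a guess. If scoring takes longer, `public_score` in the next cell is still empty and `float(...)` would fail. A more robust option is to poll until the score appears. The sketch below is hypothetical (`wait_for_score` is not part of the Kaggle client); it assumes that a submission which has not been scored yet reports `public_score` as `None` or an empty string, and it takes the fetch function and `sleep` as parameters so it can be tried offline.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Optional


def wait_for_score(
    fetch_submissions: Callable[[], Iterable[Any]],
    ref: Any,
    interval: float = 5.0,
    max_attempts: int = 24,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `fetch_submissions()` until the submission `ref` has a public score.

    Returns the score as a float, or None if it is still missing after
    `max_attempts` polls (roughly interval * (max_attempts - 1) seconds of waiting).
    Assumption: an unscored submission has public_score None or "".
    """
    for attempt in range(max_attempts):
        for s in fetch_submissions():
            if s.ref == ref and s.public_score not in (None, ""):
                return float(s.public_score)
        # wait before the next poll, but not after the final one
        if attempt < max_attempts - 1:
            sleep(interval)
    return None


if __name__ == "__main__":
    # offline demo with fake submission objects: the score "appears" on the third poll
    polls = iter(
        [[SimpleNamespace(ref=1, public_score="")]] * 2
        + [[SimpleNamespace(ref=1, public_score="0.125")]]
    )
    print(wait_for_score(lambda: next(polls), ref=1, sleep=lambda _: None))  # 0.125
```

Against the real API it might be called as `score = wait_for_score(lambda: api.competition_submissions(competition), response.ref)`.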

In [9]:
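# note: this returns our own submissions to the competition, not the public leaderboard
my_submissions = api.competition_submissions(competition)

# find the submission we just made; the rest are earlier submissions, newest first
this_submission = next(s for s in my_submissions if s.ref == response.ref)
other_submissions = [s for s in my_submissions if s.ref != response.ref]
other_submissions.sort(key=lambda s: s.date, reverse=True)

# public_score is empty until Kaggle finishes scoring; if float() fails, wait and re-run this cell
score = float(this_submission.public_score)
print(f"submission returned score of {score}")

print("\nLast 5 submissions:")
for s in other_submissions[:5]:
    print(f"\tSCORE: {s.public_score}")
    print(f"\tref: {s.ref}")
    print(f"\tdate: {s.date}")
    print(f"\tfile name: {s.file_name}")
    print(f"\tsubmitted by {s.submitted_by}\n")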

submission returned score of 0.12412

Last 5 submissions:
	SCORE: 0.12412
	ref: 48057094
	date: 2025-11-10 18:53:30.163000
	file name: submission_11-10-25_135328.csv
	submitted by nicbolton

	SCORE: 0.12623
	ref: 47994766
	date: 2025-11-08 20:01:21
	file name: submission_11-08-25_200120.csv
	submitted by nicbolton

	SCORE: 0.12977
	ref: 47991291
	date: 2025-11-08 17:02:32.513000
	file name: submission_11-08-25_170232.csv
	submitted by nicbolton

	SCORE: 0.12540
	ref: 47991268
	date: 2025-11-08 17:01:19
	file name: submission_11-08-25_170118.csv
	submitted by nicbolton

	SCORE: 0.12570
	ref: 47920761
	date: 2025-11-05 22:31:32
	file name: submission_11-05-25_223132.csv
	submitted by nicbolton
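To read these scores: the competition's evaluation page states that submissions are scored by the root-mean-squared error between the logarithm of the predicted and the observed sale price, so lower is better. The new submission's 0.12412 is the lowest score shown here, tied with the previous submission (ref 48057094). The sketch below reimplements that metric so the same kind of number can be computed on a held-out validation split before spending a submission; we assume the natural logarithm, and `rmse_log` is a hypothetical helper name.

```python
import numpy as np


def rmse_log(y_true, y_pred) -> float:
    """Root-mean-squared error between the natural logs of observed and predicted prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true <= 0) or np.any(y_pred <= 0):
        raise ValueError("prices must be positive to take logarithms")
    # assumption: natural log, matching the common reading of the competition metric
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


if __name__ == "__main__":
    # predictions that are 10% low and 10% high give a score of about 0.1
    print(rmse_log([200000, 150000], [180000, 165000]))
```

Because errors are measured on a log scale, a prediction that is 10% too high costs about as much on a cheap house as on an expensive one. With a validation split (here the hypothetical `X_valid`, `y_valid`, not used for training), this would be `rmse_log(y_valid, full_pipeline.predict(X_valid))`.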

