# Phase 3: Submitting to Kaggle

The only way to measure how well our model performs on the test set is to upload its predictions to Kaggle, since the test set's sale prices are not provided.

## Setting up Kaggle

If you haven't set up authentication with Kaggle yet (you can test this by running the cell below), follow these steps:

1. Go to the Account tab of your [Kaggle profile](https://www.kaggle.com/settings/account)
2. Select 'Create New Token' (which will download a file `kaggle.json`)
3. On a UNIX-based OS, move the file to `~/.kaggle/kaggle.json`
    - On Windows, move it to `C:\Users\<Windows-username>\.kaggle\kaggle.json`

In [1]:
from dotenv import load_dotenv
load_dotenv()

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

competition = "house-prices-advanced-regression-techniques"
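
If `api.authenticate()` raises an error, a quick look at where the credentials are expected can help narrow things down. The sketch below is a minimal check, not part of the project: `find_kaggle_credentials` is a hypothetical helper, and it assumes the Kaggle client accepts either the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables (which is presumably why `load_dotenv()` is called above) or a `kaggle.json` file in the folder named by `KAGGLE_CONFIG_DIR`, defaulting to `~/.kaggle`.

```python
import os
from pathlib import Path


def find_kaggle_credentials(env=None, home=None):
    """Return a short description of where Kaggle credentials were found, or None.

    Hypothetical helper: it only checks that credentials exist,
    not that Kaggle accepts them.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # Option 1: both environment variables are set (e.g. loaded from a .env file)
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "environment variables"

    # Option 2: a kaggle.json token file; KAGGLE_CONFIG_DIR (assumed) overrides the folder
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR") or home / ".kaggle")
    token = config_dir / "kaggle.json"
    if token.is_file():
        return str(token)

    return None


print(find_kaggle_credentials() or "No Kaggle credentials found")
```

A result of `None` suggests repeating the token steps above; any other result means the credentials exist but may still be expired or invalid.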

## Generate Predictions for Test Set

Finally, we can use the pipeline we built to generate predictions for the test set, which we can then upload to Kaggle.

In [2]:
import ames_notebooks
from app.data_ingestion.read_data import DataReader
from app.inference.predict import AmesPredictor
from app.pipelines.preprocessing import get_fitted_pipelines

TRACKING_URI = "sqlite:///../mlflow.db"  # alternative: "./mlruns"

print("Loading data...")
reader = DataReader()
train_data, test_data = reader.load_train_test()
print("Test shape:", test_data.shape)

# Fit the preprocessing pipelines on the training data only
feature_preprocessor, target_transformer = get_fitted_pipelines(train_data)

# Load the registered model from MLflow
predictor = AmesPredictor(
    feature_engineer=feature_preprocessor,
    tracking_uri=TRACKING_URI,
    model_name="xgboost-optimized",
)
predictor.model

Loading data...
Test shape: (1459, 79)


2025/11/20 09:20:53 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/11/20 09:20:53 INFO mlflow.store.db.utils: Updating database tables
2025-11-20 09:20:53 INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
2025-11-20 09:20:53 INFO  [alembic.runtime.migration] Will assume non-transactional DDL.


mlflow.pyfunc.loaded_model:
  artifact_path: /Users/nic/git/AmesHousingPredictor/notebooks/mlruns/1/models/m-a81e284cb6ae4d88aef2876b59d6b3a3/artifacts
  flavor: mlflow.xgboost
  run_id: efa3c97342fc4878b55faebee6bbf5cd

In [7]:
import pandas as pd

# Predict on the test set and map the predictions back to the original price scale
test_predictions = predictor.predict(test_data, target_transform=target_transformer.inverse_transform)

# format for Kaggle
submission = pd.DataFrame({
    'Id': test_data.index,
    'SalePrice': test_predictions
})

submission



        Id      SalePrice
0     1461  121696.335938
1     1462  162980.406250
2     1463  180276.218750
3     1464  200621.031250
4     1465  182689.218750
...    ...            ...
1454  2915   90899.476562
1455  2916   87498.789062
1456  2917  171287.015625
1457  2918  125208.710938


In [8]:
import os
from datetime import datetime

# Timestamp for the file name. Explicit directives produce the same text as the
# platform-specific "%D_%T" shorthands but work on every OS.
# Note: ':' is not allowed in Windows file names.
now = datetime.now().strftime("%m-%d-%y_%H:%M:%S")

# save submission file
os.makedirs('../submissions', exist_ok=True)
submission_filename = f"submission_{now}.csv"
submission_path = f"../submissions/{submission_filename}"
submission.to_csv(submission_path, index=False)
print(f"Submission file saved to {submission_path}")

print("\nFirst few predictions:")
print(submission.head())

Submission file saved to ../submissions/submission_11-20-25_09:21:30.csv

First few predictions:
     Id      SalePrice
0  1461  121696.335938
1  1462  162980.406250
2  1463  180276.218750
3  1464  200621.031250
4  1465  182689.218750
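
Before uploading, it can be worth confirming that the saved file has the shape Kaggle expects. The sketch below is one possible check using only the standard library. `validate_submission_csv` is a hypothetical helper, not part of the project, and the rules it applies (exactly the `Id` and `SalePrice` columns, one row per test example, unique ids, finite positive prices) are assumptions drawn from the submission built above.

```python
import csv
import math


def validate_submission_csv(path, expected_rows):
    """Return a list of problems found in a submission CSV (empty list = looks fine)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # The header must be exactly the two columns used in this notebook
        if reader.fieldnames != ["Id", "SalePrice"]:
            return [f"unexpected columns: {reader.fieldnames}"]
        rows = list(reader)

    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")

    ids = [row["Id"] for row in rows]
    if len(set(ids)) != len(ids):
        problems.append("duplicate Id values")

    for row in rows:
        try:
            price = float(row["SalePrice"])
        except (TypeError, ValueError):
            problems.append(f"Id {row['Id']}: SalePrice is not a number")
            continue
        # NaN, infinite, zero, and negative prices are all rejected
        if not math.isfinite(price) or price <= 0:
            problems.append(f"Id {row['Id']}: SalePrice must be finite and positive")

    return problems
```

In this notebook it could be called as `validate_submission_csv(submission_path, expected_rows=len(test_data))`, and the upload skipped if the returned list is not empty.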


In [8]:
from time import sleep

message = f"submission {now}"
response = api.competition_submit(submission_path, message, competition)

# Pause briefly so the new submission is visible when we query Kaggle in the next cell
sleep(3)

response

100%|██████████| 21.1k/21.1k [00:00<00:00, 43.2kB/s]


{"message": "Successfully submitted to House Prices - Advanced Regression Techniques", "ref": 48333960}
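
A fixed three-second pause may not always be long enough for the submission to be listed and scored. One alternative is to poll until the result is ready, as in the sketch below. `wait_for` is a hypothetical helper, and the timeout and interval values are arbitrary assumptions. The commented usage relies only on `api.competition_submissions`, `response.ref`, and `public_score` as they are used elsewhere in this notebook.

```python
import time


def wait_for(fetch, is_ready, timeout=60.0, interval=3.0, sleep=time.sleep):
    """Call fetch() until is_ready(result) is true or the timeout elapses.

    Returns the first ready result, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        # Stop if another wait would take us past the deadline
        if time.monotonic() + interval > deadline:
            return None
        sleep(interval)


# Hypothetical usage in this notebook:
# matches = wait_for(
#     fetch=lambda: [s for s in api.competition_submissions(competition)
#                    if s.ref == response.ref],
#     is_ready=lambda found: bool(found) and found[0].public_score not in (None, ""),
# )
```

If `wait_for` returns `None`, the submission was not scored within the timeout and the score lookup should be retried later.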

In [9]:
# competition_submissions returns our own submissions, not the public leaderboard
my_submissions = api.competition_submissions(competition)
latest = [s for s in my_submissions if s.ref == response.ref][0]
other_submissions = [s for s in my_submissions if s.ref != response.ref]
other_submissions.sort(key=lambda s: s.date, reverse=True)

score = float(latest.public_score)
print(f"submission returned score of {score}")

# The five most recent earlier submissions, for comparison
print("\nLast 5 submissions:")
for s in other_submissions[:5]:
    print(f"\tSCORE: {s.public_score}")
    print(f"\tref: {s.ref}")
    print(f"\tdate: {s.date}")
    print(f"\tfile name: {s.file_name}")
    print(f"\tsubmitted by {s.submitted_by}\n")

submission returned score of 0.12921

Last 5 submissions:
	SCORE: 0.12412
	ref: 48057103
	date: 2025-11-10 18:53:51
	file name: submission_11-10-25_135328.csv
	submitted by nicbolton

	SCORE: 0.12412
	ref: 48057094
	date: 2025-11-10 18:53:30.163000
	file name: submission_11-10-25_135328.csv
	submitted by nicbolton

	SCORE: 0.12623
	ref: 47994766
	date: 2025-11-08 20:01:21
	file name: submission_11-08-25_200120.csv
	submitted by nicbolton

	SCORE: 0.12977
	ref: 47991291
	date: 2025-11-08 17:02:32.513000
	file name: submission_11-08-25_170232.csv
	submitted by nicbolton

	SCORE: 0.12540
	ref: 47991268
	date: 2025-11-08 17:01:19
	file name: submission_11-08-25_170118.csv
	submitted by nicbolton
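
For context on the scores above: this competition is evaluated with the root-mean-squared error between the logarithm of the predicted price and the logarithm of the actual price, so lower is better and errors are relative rather than absolute. The sketch below illustrates that metric. `rmse_of_logs` is a hypothetical helper, and the prices are made up for illustration, not taken from the competition data.

```python
import math


def rmse_of_logs(actual, predicted):
    """RMSE between log(actual) and log(predicted); all values must be positive."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up prices for illustration only (not from the competition data)
actual = [200000.0, 150000.0, 100000.0]
predicted = [220000.0, 150000.0, 90000.0]
print(round(rmse_of_logs(actual, predicted), 4))
```

Because the error is measured on the log scale, a score near 0.13 corresponds roughly to predictions that are typically off by a factor of about 1.14 in either direction.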

