<a href="https://www.kaggle.com/code/danuherath/house-prices-regression-advanced?scriptVersionId=187654438" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<h1 align="center"> Iowa House Prices Prediction (Regression) </h1>

<img 
    src="https://storage.googleapis.com/kaggle-media/competitions/kaggle/5407/media/housesbanner.png"
    alt="" 
    height="300"
    width="500" 
    style="display: block; margin: 0 auto; border-radius:15px" 
/>

---

## Problem Definition

- Dataset

    - The [House Prices - Advanced Regression Techniques](https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/data) dataset from Kaggle contains 79 features describing "(almost) every aspect of residential homes in Ames, Iowa". The training set contains 1,460 samples, each representing one house.

<br>

- Objective

    - The goal of this project is to predict the sale price of each house from the features above.

<br>

- Algorithms

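    - The following regression algorithms are used to train models. The competition is scored with the Root-Mean-Squared-Error (RMSE) between the logarithms of the predicted and actual sale prices; in this notebook the models are compared on a held-out validation split using the mean absolute error (MAE).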
    
        - [TensorFlow Decision Forests (TFDF) - RandomForestModel](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/RandomForestModel)
        - [TensorFlow Decision Forests (TFDF) - GradientBoostedTreesModel](https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/GradientBoostedTreesModel)
    
<br>

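As a minimal sketch of the RMSE metric named above: RMSE is the square root of the mean squared difference between predictions and true values. The Kaggle competition applies it to the logarithms of the prices, so the second helper does the same. Both function names and the sample prices are illustrative, and plain Python is used so the snippet runs without the notebook's dependencies.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def log_rmse(y_true, y_pred):
    """RMSE on log-prices; inputs must be strictly positive."""
    return rmse([math.log(t) for t in y_true], [math.log(p) for p in y_pred])

actual = [200_000, 150_000, 320_000]      # made-up sale prices
predicted = [210_000, 140_000, 300_000]   # made-up predictions
print(f"RMSE:     {rmse(actual, predicted):,.0f}")
print(f"log-RMSE: {log_rmse(actual, predicted):.4f}")
```

Working on log-prices means a 10% miss on a cheap house costs about as much as a 10% miss on an expensive one.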
---


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split

import tensorflow as tf
import tensorflow_decision_forests as tfdf

# import warnings
# warnings.filterwarnings('ignore')

print("TensorFlow v" + tf.__version__)
print("TensorFlow Decision Forests v" + tfdf.__version__)


In [None]:
train_data = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/train.csv')
test_data = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/test.csv')


In [None]:
train_data.drop("Id", axis=1, inplace=True)
test_data.drop("Id", axis=1, inplace=True)


---

### TFDF models need no preprocessing: categorical features and missing values are handled natively. The data only has to be converted to a [TF dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset).

---

In [None]:
# Hold out 20% of the training rows for validation.
train_df, val_df = train_test_split(train_data, test_size=0.2, random_state=42)

# Convert both splits to TF datasets, marking SalePrice as the regression target.
label = 'SalePrice'
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label=label, task=tfdf.keras.Task.REGRESSION)
val_ds = tfdf.keras.pd_dataframe_to_tf_dataset(val_df, label=label, task=tfdf.keras.Task.REGRESSION)

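The split above holds out 20% of the 1,460 training rows for validation. The sketch below shows the idea with a seeded shuffle. `holdout_split` is a hypothetical helper, not scikit-learn's implementation, and it will not reproduce the exact rows that `train_test_split(..., random_state=42)` selects.

```python
import math
import random

def holdout_split(rows, test_size=0.2, seed=42):
    """Shuffle indices with a fixed seed and split them into train/validation."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = math.ceil(len(rows) * test_size)  # round the validation share up
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [rows[i] for i in train_idx], [rows[i] for i in val_idx]

rows = list(range(1460))  # stand-in for the 1,460 training rows
train_rows, val_rows = holdout_split(rows)
print(len(train_rows), len(val_rows))
```

Fixing the seed keeps the validation set identical across runs, so score differences between models come from the models rather than from the split.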

---

# Train Models

---

In [None]:
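rf_model = tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION, verbose=0)

# Report the mean absolute error (in dollars) when evaluating.
rf_model.compile(metrics=["mae"])

rf_model.fit(train_ds, verbose=0)

# This is a regression task, so the scores are the loss and MAE, not an accuracy.
rf_score = rf_model.evaluate(val_ds, verbose=0, return_dict=True)
print("Validation scores:", rf_score)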


In [None]:
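gbt_model = tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.REGRESSION, verbose=0)

# Report the mean absolute error (in dollars) when evaluating.
gbt_model.compile(metrics=["mae"])

gbt_model.fit(train_ds, verbose=0)

# This is a regression task, so the scores are the loss and MAE, not an accuracy.
gbt_score = gbt_model.evaluate(val_ds, verbose=0, return_dict=True)
print("Validation scores:", gbt_score)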


---

# Tune Hyperparameters

---

In [None]:
# Random Forest tuning is disabled; uncomment the lines below to run it.
# tuner_rf = tfdf.tuner.RandomSearch(num_trials=5, use_predefined_hps=True)

# rf_model_tuned = tfdf.keras.RandomForestModel(tuner=tuner_rf, task=tfdf.keras.Task.REGRESSION)
# rf_model_tuned.compile(metrics=["mae"])
# rf_model_tuned.fit(train_ds)

# rf_tuned_score = rf_model_tuned.evaluate(val_ds, return_dict=True)
# print("Validation scores:", rf_tuned_score)


In [None]:
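# Random search over TFDF's predefined hyperparameter space, 50 trials.
tuner_gbt = tfdf.tuner.RandomSearch(num_trials=50, use_predefined_hps=True)

gbt_model_tuned = tfdf.keras.GradientBoostedTreesModel(tuner=tuner_gbt, task=tfdf.keras.Task.REGRESSION)
# Without a compiled metric, evaluate() reports only the loss.
gbt_model_tuned.compile(metrics=["mae"])
gbt_model_tuned.fit(train_ds)

gbt_tuned_score = gbt_model_tuned.evaluate(val_ds, return_dict=True)
print("Validation scores:", gbt_tuned_score)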

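Random search, as used by the tuner above, draws hyperparameter combinations at random, scores each one, and keeps the best. The sketch below illustrates that loop in plain Python. The search space, the `toy_validation_error` objective, and all names are hypothetical stand-ins; they do not reflect TFDF's predefined search space or its internals.

```python
import random

# Hypothetical search space; TFDF's predefined space is different.
SEARCH_SPACE = {
    "max_depth": [3, 4, 6, 8],
    "shrinkage": [0.02, 0.05, 0.1],
    "subsample": [0.6, 0.8, 1.0],
}

def toy_validation_error(params):
    """Stand-in objective; a real run would train a model and score it on validation data."""
    return (abs(params["max_depth"] - 6)
            + abs(params["shrinkage"] - 0.05) * 10
            + abs(params["subsample"] - 0.8))

def random_search(space, objective, num_trials, seed=0):
    """Sample num_trials random configurations and return the lowest-error one."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(num_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        error = objective(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

best_params, best_error = random_search(SEARCH_SPACE, toy_validation_error, num_trials=50)
print(best_params, round(best_error, 3))
```

More trials can only keep or improve the best score found, at the cost of one extra model fit per trial, which is why `num_trials` is the main knob to trade runtime against quality.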

---

# Predict on Test Data

---

In [None]:
# The test set has no SalePrice column, so no label is passed.
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_data, task=tfdf.keras.Task.REGRESSION)

# test_predictions = rf_model_tuned.predict(test_ds)
test_predictions = gbt_model_tuned.predict(test_ds)


In [None]:
submission = pd.read_csv("/kaggle/input/house-prices-advanced-regression-techniques/sample_submission.csv")
# predict() returns an array of shape (n, 1); squeeze it to one value per row.
submission['SalePrice'] = test_predictions.squeeze()
submission.to_csv('submission.csv', index=False)

submission.head()

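Before uploading, it can help to sanity-check the file. The sketch below assumes the submission format used above (an `Id` column and a `SalePrice` column) and checks it with the standard library only. `check_submission` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
import math

def check_submission(csv_text):
    """Return a list of problems found in a submission CSV (empty list means it looks fine)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    if reader.fieldnames != ["Id", "SalePrice"]:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            price = float(row["SalePrice"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: SalePrice is not a number")
            continue
        if not math.isfinite(price) or price <= 0:
            problems.append(f"line {line_no}: SalePrice must be a positive, finite number")
    return problems

sample = "Id,SalePrice\n1461,125000.5\n1462,nan\n"  # made-up rows
print(check_submission(sample))
```

In the notebook, the same check could be run on the written file with `check_submission(open('submission.csv').read())`.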