Importing the necessary libraries:

- `tensorflow_decision_forests` for the Gradient Boosted Trees model
- `pandas` for data manipulation
- `numpy` for numerical operations

In [None]:
import tensorflow_decision_forests as tfdf
import pandas as pd
import numpy as np

First, we set the paths to the training and test files. The commented-out lines are local alternatives to the Kaggle paths.

In [None]:
train_file_path = "/kaggle/input/house-prices-advanced-regression-techniques/train.csv"
test_file_path = "/kaggle/input/house-prices-advanced-regression-techniques/test.csv"

# train_file_path = "data/train.csv"
# test_file_path = "data/test.csv"
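
The commented-out paths suggest the notebook is also run outside Kaggle. As a minimal sketch, assuming the local copies live in a `data/` directory as in the commented lines, the choice between the two locations could be made automatically. `resolve_data_dir` is a hypothetical helper and not part of the original notebook.

```python
from pathlib import Path

# Directory names taken from the paths used in this notebook.
KAGGLE_DIR = Path("/kaggle/input/house-prices-advanced-regression-techniques")
LOCAL_DIR = Path("data")


def resolve_data_dir(kaggle_dir=KAGGLE_DIR, local_dir=LOCAL_DIR):
    """Hypothetical helper: prefer the Kaggle input directory when it exists."""
    return kaggle_dir if kaggle_dir.is_dir() else local_dir


data_dir = resolve_data_dir()
train_file_path = str(data_dir / "train.csv")
test_file_path = str(data_dir / "test.csv")
```

This keeps a single code path for both environments, at the cost of silently falling back to `data/` when the Kaggle directory is missing.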

Now, we load the training and test datasets into pandas DataFrames and drop the `Id` column from the training data, since it is a row identifier rather than a predictive feature.

In [None]:
dataset_df = pd.read_csv(train_file_path)
test_data = pd.read_csv(test_file_path)

dataset_df = dataset_df.drop('Id', axis=1)

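Then, we split the dataset into training and validation sets using a `split_dataset` function.

The `test_ratio` parameter is set to `0.30`, so each row is assigned to the validation set with probability 0.30. Because the assignment is random, the validation set holds roughly, not exactly, 30% of the rows.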

In [None]:
def split_dataset(dataset, test_ratio=0.30):
  test_indices = np.random.rand(len(dataset)) < test_ratio
  return dataset[~test_indices], dataset[test_indices]

train_ds_pd, valid_ds_pd = split_dataset(dataset_df)
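
Because `np.random.rand` is not seeded here, each run produces a different split. Below is a minimal sketch of a reproducible variant, assuming a fixed seed is acceptable. `split_dataset_seeded`, its `seed` parameter, and the toy DataFrame are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def split_dataset_seeded(dataset, test_ratio=0.30, seed=42):
    """Illustrative variant of split_dataset with a fixed random seed."""
    rng = np.random.default_rng(seed)
    # Each row lands in the held-out set with probability test_ratio.
    test_indices = rng.random(len(dataset)) < test_ratio
    return dataset[~test_indices], dataset[test_indices]


# Toy data standing in for the real training DataFrame.
toy_df = pd.DataFrame({"LotArea": np.arange(1000), "SalePrice": np.arange(1000) * 100})
toy_train, toy_valid = split_dataset_seeded(toy_df)

# The sizes are close to 700 and 300 but not exact, since the split is random.
print(len(toy_train), len(toy_valid))
```

Calling the function again with the same seed returns the same rows, which makes validation results comparable across runs.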

The label column is set to `SalePrice` since it is the target column.

In [None]:
label = 'SalePrice'

Now, we convert the `pandas` dataframes to `tf.data.Dataset` objects.

The `tfdf.keras.pd_dataframe_to_tf_dataset` function is used for this purpose.

In [None]:
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label=label, task=tfdf.keras.Task.REGRESSION)
valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(valid_ds_pd, label=label, task=tfdf.keras.Task.REGRESSION)

test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_data, task=tfdf.keras.Task.REGRESSION)

Next, we create a Gradient Boosted Trees model and fit it to the training data.

In [None]:
model = tfdf.keras.GradientBoostedTreesModel(task=tfdf.keras.Task.REGRESSION)
model.compile(metrics=["mse"])

model.fit(x=train_ds)
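
The `mse` metric passed to `compile` is the mean squared error between predicted and true sale prices. As a minimal NumPy sketch with made-up values (not model output), it and its square root can be computed as follows:

```python
import numpy as np

# Made-up sale prices for illustration only.
y_true = np.array([200000.0, 150000.0, 300000.0])
y_pred = np.array([210000.0, 140000.0, 300000.0])

# Mean squared error: average of the squared differences.
mse = np.mean((y_pred - y_true) ** 2)
# Root mean squared error, expressed in the same units as the prices.
rmse = np.sqrt(mse)

print(mse, rmse)
```

The root mean squared error is often easier to read than the raw MSE because it is on the same scale as `SalePrice`.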

The model is then used to make predictions on the test set.

In [None]:
predictions = model.predict(test_ds)

Finally, the predictions are saved to a CSV file.

In [None]:
sample_submission_df = pd.read_csv('/kaggle/input/house-prices-advanced-regression-techniques/sample_submission.csv')
# sample_submission_df = pd.read_csv('data/sample_submission.csv')
# predict returns an array of shape (n, 1); flatten it before assigning
sample_submission_df['SalePrice'] = predictions.squeeze()
sample_submission_df.to_csv('/kaggle/working/submission.csv', index=False)
# sample_submission_df.to_csv('submission.csv', index=False)
sample_submission_df.head()
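
Keras-style `predict` calls typically return a two-dimensional array with one column per output, so for a single regression target the shape is `(n, 1)`. Below is a minimal sketch, using a made-up array in place of real predictions, of flattening it to one dimension before assigning it to a DataFrame column. The names `fake_predictions` and `toy_submission` are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for model output: three made-up predictions with shape (3, 1).
fake_predictions = np.array([[208500.0], [181500.0], [223500.0]])

# Toy submission frame; the Ids are illustrative.
toy_submission = pd.DataFrame({"Id": [1461, 1462, 1463]})

# squeeze(axis=1) drops the trailing dimension, giving shape (3,).
toy_submission["SalePrice"] = fake_predictions.squeeze(axis=1)

print(toy_submission)
```

Passing `axis=1` explicitly makes the call fail loudly if the array ever has more than one output column, rather than silently keeping a 2-D shape.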