# Regression

This tutorial uses Safe-DS to predict house prices from **house sales data**.


1. Load your data into a `Table`. The data is available under `docs/tutorials/data/house_sales.csv`:


In [None]:
from safeds.data.tabular.containers import Table

pricing = Table.from_csv_file("data/house_sales.csv")
# For visualisation purposes we only print out the first 15 rows.
pricing.slice_rows(start=0, length=15)

2. Split the house sales dataset into two tables: a training set containing 60% of the data, which we will use later to train a model that predicts the house price, and a testing set containing the rest of the data.
Delete the column `price` from the test set so that the model can predict it later:


In [None]:
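# The first table gets 60% of the rows, the second table gets the rest.
train_table, testing_table = pricing.split_rows(0.60)

# Remove the target column so the model has to predict it.
test_table = testing_table.remove_columns(["price"]).shuffle_rows()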
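
To make the idea behind this step concrete, here is a minimal sketch in plain Python, independent of Safe-DS. The helper `split_rows_sketch`, its `seed` parameter, and the toy rows are hypothetical and only illustrate the concept; they are not how the library implements `split_rows`.

```python
import random


def split_rows_sketch(rows, fraction_in_first, seed=0):
    """Shuffle a copy of `rows` and split it into two lists.

    Hypothetical helper for illustration, not part of Safe-DS.
    """
    if not 0 <= fraction_in_first <= 1:
        raise ValueError("fraction_in_first must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction_in_first)
    return shuffled[:cut], shuffled[cut:]


# Toy data standing in for the house sales table.
rows = [{"id": i, "price": 100_000 + 5_000 * i} for i in range(10)]

train_rows, test_rows = split_rows_sketch(rows, 0.60)

# Drop the target from the test rows, mirroring `remove_columns(["price"])`.
test_features = [
    {key: value for key, value in row.items() if key != "price"}
    for row in test_rows
]
```

Every row ends up in exactly one of the two parts, so the model is later evaluated on rows it has not seen during training.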

3. Mark the `price` `Column` as the target variable to be predicted. Include the `id` column only as an extra column, which is completely ignored by the model:

In [None]:
extra_names = ["id"]

train_tabular_dataset = train_table.to_tabular_dataset("price", extra_names=extra_names)


4. Use a `DecisionTreeRegressor` as the model for the regression. Pass the `train_tabular_dataset` to the `fit` method of the model:


In [None]:
from safeds.ml.classical.regression import DecisionTreeRegressor

model = DecisionTreeRegressor()
fitted_model = model.fit(train_tabular_dataset)
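
For intuition about what a decision tree regressor learns, the following plain-Python sketch fits a tree with a single split (a "stump") on one feature. It picks the threshold that minimizes the summed squared error and predicts the mean target value on each side. The function names, the `living_area` feature, and the numbers are hypothetical; this is a simplified illustration, not the implementation behind `DecisionTreeRegressor`.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        # Summed squared error if each side predicts its own mean.
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(stump, x):
    """Predict the mean target of the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Made-up toy data: living area and price (in thousands).
living_area = [50, 60, 70, 150, 160, 170]
prices = [100, 110, 120, 300, 310, 320]

stump = fit_stump(living_area, prices)
small_house_price = predict_stump(stump, 65)
large_house_price = predict_stump(stump, 200)
```

A full decision tree repeats this split search recursively on each side and over all feature columns.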

5. Use the fitted decision tree regression model that we trained on the training dataset to predict the house prices in the test dataset:


In [None]:
prediction = fitted_model.predict(
    test_table
)
# For visualisation purposes we only print out the first 15 rows.
prediction.to_table().slice_rows(start=0, length=15)

6. You can compute the mean absolute error of the model on the original `testing_table`, which still contains the `price` column, as follows:


In [None]:
test_tabular_dataset = testing_table.to_tabular_dataset("price", extra_names=extra_names)

fitted_model.mean_absolute_error(test_tabular_dataset)
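
The mean absolute error is the average of the absolute differences between the true and the predicted prices, so lower values are better. The sketch below computes it in plain Python on made-up numbers. The standalone function and the example prices are hypothetical and serve only to show the arithmetic behind the metric.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up prices for three houses.
actual_prices = [200_000, 350_000, 500_000]
predicted_prices = [210_000, 340_000, 470_000]

# Absolute errors are 10,000, 10,000 and 30,000; their average is the MAE.
mae = mean_absolute_error(actual_prices, predicted_prices)
```

Because the metric uses the same unit as the target, a value of about 16,667 here means the predictions are off by that amount of currency on average.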
