# Regression

This tutorial uses Safe-DS on **house sales data** to predict house prices.


## File and Imports

Start by creating a Python file with the suffix `.py`.


## Reading Data

Download the house sales data from [here](https://github.com/Safe-DS/Datasets/blob/main/src/safeds_datasets/tabular/_house_sales/data/house_sales.csv) and load it into a `Table`:

In [None]:
from safeds.data.tabular.containers import Table

pricing = Table.from_csv_file("data/house_sales.csv")
# For visualisation purposes, we only print out the first 15 rows.
pricing.slice_rows(length=15)

## Cleaning your Data

Before training a model, it is common to clean the data. The following example removes two columns, then drops rows with missing values and rows with outliers:

In [None]:
pricing_columns = (
    # Removes columns "latitude" and "longitude" from table
    pricing.remove_columns(["latitude", "longitude"])
    # Removes rows which contain missing values
    .remove_rows_with_missing_values()
    # Removes rows which contain outliers
    .remove_rows_with_outliers()
)
# For visualisation purposes, we only print out the first 5 rows.
pricing_columns.slice_rows(length=5)

See how to perform further data cleaning in the dedicated [Data Processing Tutorial](../data_processing).
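
To make the two row-removal steps more concrete, here is a minimal plain-Python sketch of the general idea. It does not use Safe-DS and is not how Safe-DS implements these operations. The helper names, the toy prices, and the z-score rule with a threshold of 2.5 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def drop_rows_with_missing_values(rows):
    """Keep only rows in which no value is None (hypothetical helper)."""
    return [row for row in rows if all(value is not None for value in row.values())]


def drop_rows_with_outliers(rows, column, z_threshold=2.5):
    """Keep rows whose value in `column` is at most `z_threshold`
    standard deviations away from the column mean (assumed rule)."""
    values = [row[column] for row in rows]
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # All values are identical, so nothing can be an outlier.
        return list(rows)
    return [row for row in rows if abs(row[column] - center) / spread <= z_threshold]


# Hypothetical toy data: prices in thousands, with one missing value
# and one extreme value.
toy_prices = [200, 210, 190, 205, 195, 202, 198, 207, 193, 5000, None]
toy_rows = [{"id": i, "price": price} for i, price in enumerate(toy_prices)]

complete_rows = drop_rows_with_missing_values(toy_rows)
cleaned_rows = drop_rows_with_outliers(complete_rows, "price")
print(len(toy_rows), len(complete_rows), len(cleaned_rows))
```

Missing values are removed first because the mean and standard deviation cannot be computed on a column that still contains `None`.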

## Create Training and Testing Set

Split the house sales dataset into two tables. The training set contains 60% of the data and is used later to fit the model that predicts house prices. The testing set contains the remaining 40% of the data and is used to evaluate that model.


In [None]:
train_table, testing_table = pricing_columns.split_rows(0.60)
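
As a rough illustration of what a 60/40 row split does, the following plain-Python sketch shuffles a list of rows and cuts it at the requested fraction. The function `split_rows_sketch`, its `seed` parameter, and the decision to shuffle are assumptions made for this example. How Safe-DS orders or shuffles rows internally is not covered here.

```python
import random


def split_rows_sketch(rows, fraction_in_first, seed=0):
    """Shuffle a copy of `rows`, then cut it at the given fraction
    (hypothetical helper, not the Safe-DS implementation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction_in_first)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical rows, represented by their row numbers.
toy_rows = list(range(10))
train_rows, test_rows = split_rows_sketch(toy_rows, 0.60)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two parts, so the model is never evaluated on a row it was trained on.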

Mark the `price` column as the target variable to be predicted. Include the `id` column only as an extra column, which the model ignores completely:

In [None]:
extra_names = ["id"]

train_tabular_dataset = train_table.to_tabular_dataset("price", extra_names=extra_names)
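
The three column roles can be pictured with a small plain-Python sketch. It only shows the bookkeeping: the target is what the model predicts, extras are carried along but not used for learning, and all remaining columns act as features. The function `assign_roles` and the column names other than `id` and `price` are hypothetical.

```python
def assign_roles(column_names, target_name, extra_names=()):
    """Partition column names into target, extras, and features
    (hypothetical helper for illustration only)."""
    if target_name not in column_names:
        raise ValueError(f"unknown target column: {target_name}")
    extras = [name for name in column_names if name in extra_names]
    features = [
        name
        for name in column_names
        if name != target_name and name not in extra_names
    ]
    return {"target": target_name, "extras": extras, "features": features}


# "bedrooms" and "sqft_living" are assumed example columns.
roles = assign_roles(["id", "bedrooms", "sqft_living", "price"], "price", ["id"])
print(roles)
```

An identifier such as `id` carries no information about the price, which is why it is kept out of the features.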

## Creating and Fitting a Regressor

Use a `DecisionTreeRegressor` as the regression model. Pass `train_tabular_dataset` to the model's `fit` method:


In [None]:
from safeds.ml.classical.regression import DecisionTreeRegressor

fitted_model = DecisionTreeRegressor().fit(train_tabular_dataset)
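
For intuition about what a decision tree regressor learns, the sketch below fits a single split (a "stump") on one feature: it tries each threshold between neighboring feature values, keeps the one with the lowest squared error, and predicts the mean target on each side. A full decision tree repeats this kind of split recursively. This is a simplified illustration of the general technique, not the code that Safe-DS runs, and the function names and toy numbers are assumptions.

```python
from statistics import mean


def sum_squared_error(values):
    """Squared error of predicting the mean for every value."""
    if not values:
        return 0.0
    center = mean(values)
    return sum((value - center) ** 2 for value in values)


def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    best = None
    candidates = sorted(set(xs))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sum_squared_error(left) + sum_squared_error(right)
        if best is None or error < best[0]:
            best = (error, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(stump, x):
    """Predict the mean of the side of the threshold that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical toy data: living area vs. price in thousands.
areas = [800, 900, 1000, 2000, 2100, 2200]
prices = [100, 110, 120, 300, 310, 320]
stump = fit_stump(areas, prices)
print(stump, predict_stump(stump, 950), predict_stump(stump, 2500))
```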

## Predicting with the Fitted Regressor

Use the fitted decision tree regressor, which was trained on the training set, to predict the house prices in the test set:


In [None]:
prediction = fitted_model.predict(testing_table)
# For visualisation purposes, we only print out the first 15 rows.
prediction.to_table().slice_rows(length=15)

## Evaluating the Fitted Regressor

Compute the mean absolute error of the fitted model on `testing_table` as follows:

In [None]:
fitted_model.mean_absolute_error(testing_table)
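
The mean absolute error is the average of the absolute differences between the actual and the predicted values, so it is expressed in the same unit as the target (here, the price). A minimal plain-Python sketch of the metric, using made-up prices and a hypothetical helper name, looks like this:

```python
def mean_absolute_error_sketch(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one value is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prices: the absolute errors are 10000, 20000, and 30000.
actual_prices = [200_000, 350_000, 500_000]
predicted_prices = [210_000, 330_000, 530_000]
mae = mean_absolute_error_sketch(actual_prices, predicted_prices)
print(mae)
```

A lower value means the predictions are, on average, closer to the actual prices.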

## Full Code

In [None]:
from safeds.data.tabular.containers import Table
from safeds.ml.classical.regression import DecisionTreeRegressor

pricing = Table.from_csv_file("data/house_sales.csv")

pricing_columns = (
    pricing.remove_columns(["latitude", "longitude"]).remove_rows_with_missing_values().remove_rows_with_outliers()
)

train_table, testing_table = pricing_columns.split_rows(0.60)

extra_names = ["id"]
train_tabular_dataset = train_table.to_tabular_dataset("price", extra_names=extra_names)

fitted_model = DecisionTreeRegressor().fit(train_tabular_dataset)
prediction = fitted_model.predict(testing_table)

fitted_model.mean_absolute_error(testing_table)