**NOTE: This notebook is written for the Google Colab platform, which provides free hardware acceleration. However, it can also be run (possibly with minor modifications) as a standard Jupyter notebook using a local GPU.**

In [None]:
#@title -- Installation of Packages -- { display-mode: "form" }
import sys
!{sys.executable} -m pip install skorch
!{sys.executable} -m pip install git+https://github.com/michalgregor/class_utils.git

In [None]:
#@title -- Import of Necessary Packages -- { display-mode: "form" }
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder, KBinsDiscretizer
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from skorch import NeuralNetRegressor
from class_utils import error_histogram
import torch.nn as nn
import torch

In [None]:
#@title -- Downloading Data -- { display-mode: "form" }
!mkdir -p output
!mkdir -p data/boston_housing
!wget -nc -O data/boston_housing.zip https://www.dropbox.com/s/3jnf3000vwaxtcg/boston_housing.zip?dl=1
!unzip -oq -d data/boston_housing data/boston_housing.zip

# A Real-Estate-Price Regression Model

In this notebook we will apply neural regression to the problem of real estate price prediction. We will make use of the [Boston housing dataset](https://www.kaggle.com/c/boston-housing).

## Loading and Splitting the Dataset

Let us start by displaying the description of the dataset:

In [None]:
with open("data/boston_housing/description.txt", "r") as file:
    print(file.read())

As the next step, we will load the dataset itself from a CSV file:

In [None]:
df = pd.read_csv("data/boston_housing/housing.csv")
df.head()

We will split the data into a training set and a test set, stratifying by a discretized version of the output column so that both sets cover the full range of prices:

In [None]:
kbins = KBinsDiscretizer(10, encode='ordinal')
y_stratify = kbins.fit_transform(df[["medv"]])
df_train, df_test = train_test_split(df, stratify=y_stratify,
                        test_size=0.25, random_state=4)
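
`train_test_split` can only stratify by discrete labels, which is why the continuous output is binned first. The following is a minimal sketch of the same idea on synthetic data; the array `target` and the index-based split are illustrative assumptions and not part of the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic continuous target (hypothetical values, not the housing data).
rng = np.random.default_rng(0)
target = rng.uniform(5.0, 50.0, size=(200, 1))

# Bin the target into 10 ordinal bins (quantile-based by default).
binner = KBinsDiscretizer(n_bins=10, encode="ordinal")
bins = binner.fit_transform(target).ravel()

# Split row indices, keeping the bin proportions similar in both parts.
idx_train, idx_test = train_test_split(
    np.arange(len(target)), stratify=bins, test_size=0.25, random_state=4
)

# Every bin should be represented in the test part.
test_bins = np.unique(bins[idx_test])
print(len(idx_train), len(idx_test), len(test_bins))
```

Without stratification, a small test set could by chance miss the cheapest or the most expensive properties, which would make the test error less representative.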

---
## Task 1: Data Preprocessing

**Apply our standard preprocessing procedure for neural nets to the data. The result should be the training set ``X_train``, ``Y_train`` and the test set ``X_test``, ``Y_test``, each in the required form and cast to the appropriate data type.**

---
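
As a reminder of what such a procedure can look like, here is a hedged sketch on a toy data frame. The column names (`rooms`, `near_river`, `price`) and the choice of imputation strategy are assumptions made for illustration only; they are not the columns of the housing dataset, and the sketch is not meant as the reference solution.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; all column names and values are hypothetical.
toy = pd.DataFrame({
    "rooms": [6.0, 5.5, np.nan, 7.1],
    "near_river": ["yes", "no", "no", "yes"],
    "price": [24.0, 21.6, 34.7, 33.4],
})
toy_train, toy_test = toy.iloc[:3], toy.iloc[3:]

# One-hot encode categorical inputs; impute and standardize numeric ones.
# sparse_threshold=0 forces a dense array as the result.
toy_input_preproc = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["near_river"]),
    (make_pipeline(SimpleImputer(), StandardScaler()), ["rooms"]),
    sparse_threshold=0,
)
toy_output_preproc = StandardScaler()

# Fit on the training part only, then reuse the fitted transformers on
# the test part. PyTorch expects float32 by default, hence the cast.
X_tr = toy_input_preproc.fit_transform(toy_train).astype(np.float32)
Y_tr = toy_output_preproc.fit_transform(toy_train[["price"]]).astype(np.float32)
X_te = toy_input_preproc.transform(toy_test).astype(np.float32)
Y_te = toy_output_preproc.transform(toy_test[["price"]]).astype(np.float32)

print(X_tr.shape, Y_tr.shape, X_te.shape, Y_te.shape)
```

Note that the output is selected with double brackets (`toy_train[["price"]]`) so that it stays two-dimensional, which is the shape a regressor with one output column expects.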

In [None]:
categorical_inputs = [          ] # -----

numeric_inputs = [              ] # -----

output = ["medv"]



# -----


output_preproc = StandardScaler()


# -----



---
## Task 2: Creation of Neural Net and Training

**Create a neural regressor and train it using the train set. The result should be a trained ``net`` object with a ``scikit-learn`` interface, the performance of which we will subsequently be able to test using the test set.**

---
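
For orientation, a fully connected regressor in PyTorch might be sketched as follows. The class name `SketchNet`, its constructor arguments, and the layer sizes are illustrative assumptions, not the intended solution; the only fixed ingredients are an `__init__` that creates the layers and a `forward` method that applies them.

```python
import torch
import torch.nn as nn

class SketchNet(nn.Module):
    """Small fully connected regressor; the sizes are arbitrary guesses."""

    def __init__(self, num_inputs, num_outputs, num_hidden=32):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(num_inputs, num_hidden),
            nn.ReLU(),
            nn.Linear(num_hidden, num_hidden),
            nn.ReLU(),
            # No activation on the output layer: regression targets
            # are unbounded real numbers.
            nn.Linear(num_hidden, num_outputs),
        )

    def forward(self, x):
        return self.layers(x)

# Quick shape check on a dummy batch of 4 samples with 5 features.
sketch = SketchNet(num_inputs=5, num_outputs=1)
out = sketch(torch.zeros(4, 5))
print(out.shape)
```

A module with constructor arguments like this one can be used with skorch by passing them with the `module__` prefix, for example `module__num_inputs=...`; a module whose constructor takes no arguments can be passed as is.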

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"

num_inputs = X_train.shape[1]
num_outputs = Y_train.shape[1]

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        
        # -----
        
        
        

net = NeuralNetRegressor(
    Net,
    max_epochs=200,
    batch_size=-1,
    optimizer=torch.optim.Adam,
    train_split=None,
    device=device
)

In [None]:
net.fit(X_train, Y_train)

## Testing

We verify generalization using the testing set:

In [None]:
#@title -- Testing -- { display-mode: "form" }
# y_test holds the predictions; Y_test holds the ground-truth outputs
y_test = net.predict(X_test)
min_output = np.min(Y_test)
max_output = np.max(Y_test)

# we compute and display the MSE and the MAE
mse = mean_squared_error(Y_test, y_test)
print("MSE = {}".format(mse))

mae = mean_absolute_error(Y_test, y_test)
print("MAE = {}".format(mae))

plt.figure(figsize=(8, 6))
error_histogram(Y_test, y_test, Y_fit_scaling=Y_train)
plt.savefig("output/error_output_histogram.pdf", bbox_inches='tight', pad_inches=0)
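
If the outputs were standardized with a `StandardScaler` during preprocessing, the MSE and MAE printed above are expressed in standardized units rather than in the units of the original output column. The sketch below, which uses synthetic prices and hypothetical variable names, shows one way to map predictions back with `inverse_transform` before computing the error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler

# Synthetic "prices" (hypothetical values, not the housing data).
rng = np.random.default_rng(0)
prices = rng.uniform(5.0, 50.0, size=(100, 1))

scaler = StandardScaler()
prices_scaled = scaler.fit_transform(prices)

# Pretend predictions that are off by 0.1 in standardized units.
pred_scaled = prices_scaled + 0.1
mae_scaled = mean_absolute_error(prices_scaled, pred_scaled)

# Map the predictions back to the original units before scoring.
pred_orig = scaler.inverse_transform(pred_scaled)
mae_orig = mean_absolute_error(prices, pred_orig)

print("MAE (standardized) = {}".format(mae_scaled))
print("MAE (original units) = {}".format(mae_orig))
```

For a standardized output, the MAE in original units equals the standardized MAE multiplied by the standard deviation stored in `scaler.scale_`.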