# Optional: House Prices using Regression

This notebook will focus on the application of Neural Networks for house price prediction. Instead of images as pixel-wise inputs, we will consider generic values that describe various features for houses.

![teaser](images/teaser.jpg)

In our previous exercises we omitted a detailed data preperation overview, as it was not the main focus of those exercises since it was disctracting your attention from the basics of neural networks. In this notebook, we will optionally look a little more in-depth in data analysis. In the end, the class will not focus tremendously on these tasks but they will be essential routines that you will encounter if you choose to work in the area of deep/machine learning.

The task itself is a *regression* problem and we will give you a few hinters how you can extend your build deep learning library of the first notebook to solve it, but you can also explore it as a *classification* task.

<div class="alert alert-danger">
    <h3>Warning</h3>
    <p>We will not explore neural network implementations in this exercise. If you want to train a network in the end, we suggest to work on "1_FullyConnectedNets.ipynb" first before starting the neural network part of this submission as we will be using the fully connected neural network class of the mentioned notebook.</p>
</div>

In [1]:
# As usual, a bit of setup

import time
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
%load_ext autoreload
%autoreload 2

# House Price Data
## Exploration

Make sure to run the *download_datasets.sh* script first before running the upcoming cell. Previously, we provided you with a data loading wrapper function to access the CIFAR10 data. This time around, our input is a csv file which we will load ourselves using [pandas](https://pandas.pydata.org) where we can easily access and alter entries in our data matrix. Let's have a small glimpse how the data looks like!

In [3]:
# Load the data
data = pd.read_csv("../datasets/house_prices_data.csv")
labels = pd.read_csv("../datasets/house_prices_labels.csv")

In [4]:
#You can easily get an overview of our features using .info(). Note that not all features are actually numbers!
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 79 columns):
MSSubClass       1000 non-null int64
MSZoning         1000 non-null object
LotFrontage      827 non-null float64
LotArea          1000 non-null int64
Street           1000 non-null object
Alley            64 non-null object
LotShape         1000 non-null object
LandContour      1000 non-null object
Utilities        1000 non-null object
LotConfig        1000 non-null object
LandSlope        1000 non-null object
Neighborhood     1000 non-null object
Condition1       1000 non-null object
Condition2       1000 non-null object
BldgType         1000 non-null object
HouseStyle       1000 non-null object
OverallQual      1000 non-null int64
OverallCond      1000 non-null int64
YearBuilt        1000 non-null int64
YearRemodAdd     1000 non-null int64
RoofStyle        1000 non-null object
RoofMatl         1000 non-null object
Exterior1st      1000 non-null object
Exterior2nd      1000 non-nu

In [5]:
# Using the describe function we can get an overview about numerical ranges
data.describe()

Unnamed: 0,MSSubClass,LotFrontage,LotArea,OverallQual,OverallCond,YearBuilt,YearRemodAdd,MasVnrArea,BsmtFinSF1,BsmtFinSF2,...,GarageArea,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,MiscVal,MoSold,YrSold
count,1000.0,827.0,1000.0,1000.0,1000.0,1000.0,1000.0,993.0,1000.0,1000.0,...,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0,1000.0
mean,56.96,70.088271,10582.854,6.135,5.566,1971.083,1984.754,109.5428,456.256,42.119,...,477.31,97.492,47.073,22.154,4.078,14.684,2.771,32.134,6.216,2007.829
std,42.395233,24.727977,10423.604539,1.388057,1.118434,30.498234,20.710925,182.186082,474.339455,150.778045,...,217.629834,126.267618,64.82921,62.559267,33.389771,55.1937,39.32069,323.319126,2.722638,1.336992
min,20.0,21.0,1300.0,1.0,2.0,1872.0,1950.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2006.0
25%,20.0,59.0,7530.0,5.0,5.0,1953.0,1967.0,0.0,0.0,0.0,...,334.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,2007.0
50%,50.0,70.0,9475.0,6.0,5.0,1973.0,1993.0,0.0,385.0,0.0,...,480.0,0.0,28.0,0.0,0.0,0.0,0.0,0.0,6.0,2008.0
75%,70.0,80.0,11640.5,7.0,6.0,2001.0,2004.0,176.0,738.25,0.0,...,588.0,171.0,68.0,0.0,0.0,0.0,0.0,0.0,8.0,2009.0
max,190.0,313.0,215245.0,10.0,9.0,2010.0,2010.0,1378.0,5644.0,1085.0,...,1418.0,736.0,547.0,552.0,508.0,480.0,648.0,8300.0,12.0,2010.0


In [6]:
# Our target variable is the SalesPrice
# The pandas dataframe string it is data_train
# We explore it here
labels['SalePrice'].describe()

count      1000.000000
mean     182814.417000
std       81736.545419
min       35311.000000
25%      130000.000000
50%      165000.000000
75%      215000.000000
max      755000.000000
Name: SalePrice, dtype: float64

We will apply operations to a small subset of our data first.

In [7]:
# Create smaller test data frames
data_small = data[:5]
labels_small = labels[:5]

## Transforms

In comparison to the previous notebooks, we always provided the data and label transforms for your convenience. However, if you want to apply your model to real life scenarios, you will have to provide a transform function that prepares raw data for your network. We already wrote a short transform class for you which will provide initial data augmenations for you under **exercise_code/transforms.py**. However, if you want to improve your network's performance, you will have to edit this class and change it to your liking!

In the end, you will not only submit your network model file but als your transform class! Using this class we can alter the test data as well as labels so that your neural network can correctly classify it.

In [8]:
# Import the transform class
from exercise_code.transforms import Transforms

## Missing Data & Non-Numerical Values

Real life data is usually not perfect. There might be unreasonable or even missing entries. Let us first check out missing entries in our dataset.

In [9]:
# Explore missing data
total = data.isnull().sum().sort_values(ascending=False)
percent = (data.isnull().sum()/data.isnull().count()).sort_values(ascending=False)
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
# Only show top 20 entries
missing_data[:20]

Unnamed: 0,Total,Percent
PoolQC,995,0.995
MiscFeature,969,0.969
Alley,936,0.936
Fence,796,0.796
FireplaceQu,447,0.447
LotFrontage,173,0.173
GarageCond,56,0.056
GarageType,56,0.056
GarageYrBlt,56,0.056
GarageFinish,56,0.056


As we have seen before, there are some columns that contains string information which can not parsed that easily. In order to use this information, one has to transform them into a (for the network) read-able format. 

For the initial setup, we will do the following two steps:
- we will omit non-numerical columns
- we will set all numerical missing values to 0

Those decisions are obviously not optimal. You are free to explore how one should use non-numerical entries to your advantage.

<div class="alert alert-info">
    <h3>Inline Question</h3>
    <p>What solution would you propose to handle missing numerical values? Do you think we can just transform non-numerical values to integers or should one do something more elaborate? In addition to that there might be new non-numerical entries that are not present in the training set. What would you propose how we should handle those?</p>
    <p>**Your answer:** </p>
</div>

In [10]:
# For this notebook, we are performing a regression here for numerical attributes only
data_small = Transforms.get_only_numeric_attributes(data_small, verbose=True)
labels_small = Transforms.get_only_numeric_attributes(labels_small, verbose=True)

<class 'pandas.core.frame.DataFrame'>
Shape of the processed data with numerical features: (5, 36)
<class 'pandas.core.frame.DataFrame'>
Shape of the processed data with numerical features: (5, 1)


As you can see, we lost quite a bit of our features and are only left with 36 features.

## Normalization

As we have seen in our batch normalization notebook, it is of utmost important to properly normalize the input values for our network. We use our provided function.

In [11]:
# Based on the min and max values that were learned on the training set,
# we are going to apply the same on the validation set
data_small = Transforms.min_max_scalar(data_small, invoker='data')
labels_small = Transforms.min_max_scalar(labels_small, invoker='label')

## Finished Transform Class

Our final transform class should have a single call to transform data as well as labels. This function will be called by our test as well and you should change it if you want to apply other transforms.

In [12]:
# Create  a new small test set first
data_small = data[:5]
labels_small = labels[:5]

In [13]:
# Check the full transform function for data as well as labels
prepared_data_small = Transforms.apply_data_transforms(data_small)
prepared_labels = Transforms.apply_labels_transforms(labels_small)

<class 'pandas.core.frame.DataFrame'>


In [14]:
# Lets have a short look to compare our prepared labels
print("Original labels:", labels_small)
print("Normalized labels:", prepared_labels)

Original labels:    SalePrice
0     176485
1     385000
2     395000
3     230000
4     157000
Normalized labels: [[0.08186975]
 [0.95798319]
 [1.        ]
 [0.30672269]
 [0.        ]]


## Data split

As for every training task we have to split up our provided data to validate our trained models.

In [15]:
# Perform the split on data to create train and validation set
X_train = data[:800]
X_val = data[800:]
y_train = labels[:800]
y_val = labels[800:]
print(X_train.shape, X_val.shape, y_train.shape, y_val.shape)

(800, 79) (200, 79) (800, 1) (200, 1)


Check out the **min_max_scalar** function of the **Transforms** class. This function scales all values between zero and one as one can do for images. The difference here is that -- while images have a fixed minimum and maximum in 0 and 255 -- we don't know the theoretical minima and maxima for your data and thus our function is evoked differently for the training and validation/test set. 

You have to take those things into account for all transforms you'd like to implement!

In [16]:
X_train = Transforms.apply_data_transforms(X_train, mode='train')
y_train = Transforms.apply_labels_transforms(y_train, mode='train')

<class 'pandas.core.frame.DataFrame'>


In [17]:
# Shapes of the training data
X_train.shape, y_train.shape

((800, 36), (800, 1))

In [18]:
# Convert the validation data using our now initialized transform class
X_val = Transforms.apply_data_transforms(X_val,mode='val')
y_val = Transforms.apply_labels_transforms(y_val, mode='val')

<class 'pandas.core.frame.DataFrame'>


In [18]:
X_val.shape, y_val.shape

((200, 36), (200, 1))

Phew, that was a lot of work, but we are now done with our data preperation for now and can move to the actual network part!

# Next _optional_ steps:

## Classification of Mean

We are now ready to train networks. As a first step, you can simplify the task to a simple classification task that we can solve using our network structure from the first notebook.

In order to to this, we have to use different labels. We will alter our labels such that we return 1 if the entry is bigger than the mean of our training set or 0 if it is smaller. This can be done easily with our transform function.

### Data Preparation

In [19]:
# Transform the labels into binary values since this is a classification task
def convert_to_binary_label(input_vector):
    """
    :param input_vector: a vector of real numbers
    :return: vector of 0,1 depending upon if the value is greater than mean value of the vector
    """
    mean_value = np.mean(input_vector)
    label_vector = np.array(input_vector > mean_value)
    return label_vector.astype(int)

In [20]:
y_train_binary = convert_to_binary_label(y_train)
y_val_binary = convert_to_binary_label(y_val)

In [21]:
# Convert values into a dictionary of the required format
# This dictionary can be fed into the NN
input_data = Transforms.prepare_dictionary(X_train, X_val, y_train_binary, y_val_binary)

### Network

Now it is your turn! Initialize a neural network as in the first notebook using the **FullyConnectedNet** class that you wrote prior. This task should not be that hard but as a sanity check you should first try to overfit on a smaller data first.

In [22]:
# Import the previously used solver and network classes
from exercise_code.solver import Solver
from exercise_code.networks.fc_net import *

best_acc = 0
best_params = []
#lr_range = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5]
#lr_decay_range = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5]
#weight_scale_range = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5]
lr_range = [1e-4]
#lr_decay_range = [1e-10, 1e-7, 1e-4, 1e-1, 1, 1e3, 1e5]
weight_scale_range = [1e-4]
#lr_range = [1e-3]
lr_decay_range = [0.9]
#weight_scale_range = [5e-2]
total_iterations = len(lr_range) * len(lr_decay_range) * len(weight_scale_range)
iteration = 0
for lr in lr_range:
    for lr_decay in lr_decay_range:
        for weight_scale in weight_scale_range:
            iteration += 1
            print(f'\n******************\nStarting iteration: {iteration} / {total_iterations}')
            print(f'\nLearning rate: {lr} - Decay: {lr_decay} - Weight scale: {weight_scale}')
            model = FullyConnectedNet([100, 100, 100, 100, 100], input_dim=36, num_classes=2, 
                                      weight_scale=weight_scale,use_batchnorm=True, dropout=0.75)
            solver = Solver(model,input_data,update_rule='adam',optim_config={'learning_rate':lr},
                            lr_decay=lr_decay,num_epochs=20,batch_size=200,print_every=100
                           )
            solver.train()
            val_acc = solver.check_accuracy(input_data['X_val'], input_data['y_val'])
            if val_acc>best_acc:
                best_acc = val_acc
                best_model = model
                best_params = [lr, lr_decay, weight_scale]

print("best_acc = ", best_acc)
print("best_params = ", best_params)


******************
Starting iteration: 1 / 1

Learning rate: 0.0001 - Decay: 0.9 - Weight scale: 0.0001
(Iteration 1 / 80) loss: 138.630371
(Epoch 0 / 20) train acc: 0.620000; val_acc: 0.605000
(Epoch 1 / 20) train acc: 0.616700; val_acc: 0.603950
(Epoch 2 / 20) train acc: 0.566300; val_acc: 0.561950
(Epoch 3 / 20) train acc: 0.545300; val_acc: 0.533600
(Epoch 4 / 20) train acc: 0.517100; val_acc: 0.512600
(Epoch 5 / 20) train acc: 0.517400; val_acc: 0.523100
(Epoch 6 / 20) train acc: 0.516800; val_acc: 0.521000
(Epoch 7 / 20) train acc: 0.509300; val_acc: 0.516800
(Epoch 8 / 20) train acc: 0.511400; val_acc: 0.512600
(Epoch 9 / 20) train acc: 0.515000; val_acc: 0.508400
(Epoch 10 / 20) train acc: 0.512900; val_acc: 0.500000
(Epoch 11 / 20) train acc: 0.510200; val_acc: 0.504200
(Epoch 12 / 20) train acc: 0.511700; val_acc: 0.505250
(Epoch 13 / 20) train acc: 0.512900; val_acc: 0.511550
(Epoch 14 / 20) train acc: 0.515300; val_acc: 0.513650
(Epoch 15 / 20) train acc: 0.514700; val_acc

## A first look at: Regression

Previously, we approached the problem similarly to our previous tasks, i.e., we considered it like a classification problem. We can approximate the actual value by using more fine grained buckets but we have a hard time predicting an actual value.

For this task, we will explore regression to directly predict the actual numerical value. We have to make some changes to our loss function to use the **l2 loss**. Please take a look at the `l2_loss` function in `layers.py` for the implementation.

There are also some updates for the solver. As we directly predict float values, there is no easy notion of "accuracy". Thus, we will only consider the loss value and look for the model with the smallest loss.  Please take a look at `update_accuracy` function in `solver.py`.

In [23]:
# Convert values into a dictionary of the required format
# This dictionary can be fed into the NN
input_data_for_NN = Transforms.prepare_dictionary(X_train, X_val, y_train, y_val)

In [29]:
from exercise_code.solver import Solver
from exercise_code.networks.fc_net import *

best_acc = 999999999
best_params = []
#lr_range = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5]
#lr_decay_range = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5]
#weight_scale_range = [1e-10, 1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4, 1e5]
lr_range = [1e-4]
#lr_decay_range = [1e-10, 1e-7, 1e-4, 1e-1, 1, 1e3, 1e5]
weight_scale_range = [1e-10]
#lr_range = [1e-3]
lr_decay_range = [0.9]
#weight_scale_range = [5e-2]
total_iterations = len(lr_range) * len(lr_decay_range) * len(weight_scale_range)
iteration = 0
for lr in lr_range:
    for lr_decay in lr_decay_range:
        for weight_scale in weight_scale_range:
            iteration += 1
            print(f'\n******************\nStarting iteration: {iteration} / {total_iterations}')
            print(f'\nLearning rate: {lr} - Decay: {lr_decay} - Weight scale: {weight_scale}')
            model = FullyConnectedNet([100, 100, 100, 100, 100], input_dim=36, num_classes=2, 
                                      weight_scale=weight_scale, use_batchnorm=True, dropout=0.75, 
                                      loss_function='l2')
            solver = Solver(model,input_data,update_rule='adam',optim_config={'learning_rate':lr},
                            lr_decay=lr_decay,num_epochs=2000,batch_size=200,print_every=100
                           )
            solver.train()
            val_acc = solver.check_accuracy(input_data['X_val'], input_data['y_val'])
            if val_acc<best_acc:
                best_acc = val_acc
                best_model = model
                best_params = [lr, lr_decay, weight_scale]

print("best_acc = ", best_acc)
print("best_params = ", best_params)


******************
Starting iteration: 1 / 1

Learning rate: 0.0001 - Decay: 0.9 - Weight scale: 1e-10
(Iteration 1 / 8000) loss: 12.328828
(Epoch 0 / 2000) train loss: 0.610267; val_loss: 0.622192
(Epoch 1 / 2000) train loss: 0.592614; val_loss: 0.604145
(Epoch 2 / 2000) train loss: 0.555668; val_loss: 0.567399
(Epoch 3 / 2000) train loss: 0.499394; val_loss: 0.510340
(Epoch 4 / 2000) train loss: 0.464390; val_loss: 0.475572
(Epoch 5 / 2000) train loss: 0.434494; val_loss: 0.445482
(Epoch 6 / 2000) train loss: 0.408738; val_loss: 0.419323
(Epoch 7 / 2000) train loss: 0.386203; val_loss: 0.396255
(Epoch 8 / 2000) train loss: 0.366945; val_loss: 0.376534
(Epoch 9 / 2000) train loss: 0.350706; val_loss: 0.359679
(Epoch 10 / 2000) train loss: 0.337453; val_loss: 0.346025
(Epoch 11 / 2000) train loss: 0.326436; val_loss: 0.334483
(Epoch 12 / 2000) train loss: 0.317588; val_loss: 0.324998
(Epoch 13 / 2000) train loss: 0.310581; val_loss: 0.317243
(Epoch 14 / 2000) train loss: 0.305075; val

(Epoch 133 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 134 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 135 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 136 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 137 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 138 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 139 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 140 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 141 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 142 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 143 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 144 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 145 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 146 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 147 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 148 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 149 / 2000) train loss: 0.290166;

(Epoch 267 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 268 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 269 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 270 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 271 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 272 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 273 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 274 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 275 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 1101 / 8000) loss: 8.516347
(Epoch 276 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 277 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 278 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 279 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 280 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 281 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 282 / 2000) train loss: 0.290166; val_loss: 0.289283
(

(Epoch 401 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 402 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 403 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 404 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 405 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 406 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 407 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 408 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 409 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 410 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 411 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 412 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 413 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 414 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 415 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 416 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 417 / 2000) train loss: 0.290166;

(Epoch 535 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 536 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 537 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 538 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 539 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 540 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 541 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 542 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 543 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 544 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 545 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 546 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 547 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 548 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 549 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 550 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 2201 / 8000) loss: 9.723546
(

(Epoch 669 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 670 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 671 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 672 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 673 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 674 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 675 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 2701 / 8000) loss: 9.968705
(Epoch 676 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 677 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 678 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 679 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 680 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 681 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 682 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 683 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 684 / 2000) train loss: 0.290166; val_loss: 0.289283
(

(Epoch 802 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 803 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 804 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 805 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 806 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 807 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 808 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 809 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 810 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 811 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 812 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 813 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 814 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 815 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 816 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 817 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 818 / 2000) train loss: 0.290166;

(Epoch 936 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 937 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 938 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 939 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 940 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 941 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 942 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 943 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 944 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 945 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 946 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 947 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 948 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 949 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 950 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 3801 / 8000) loss: 9.665250
(Epoch 951 / 2000) train loss: 0.290166; val_loss: 0.289283
(

(Epoch 1070 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1071 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1072 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1073 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1074 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1075 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 4301 / 8000) loss: 9.634555
(Epoch 1076 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1077 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1078 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1079 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1080 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1081 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1082 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1083 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1084 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1085 / 2000) train loss: 0.290166; val_

(Epoch 1201 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1202 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1203 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1204 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1205 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1206 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1207 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1208 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1209 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1210 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1211 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1212 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1213 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1214 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1215 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1216 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1217 / 2000) trai

(Epoch 1333 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1334 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1335 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1336 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1337 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1338 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1339 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1340 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1341 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1342 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1343 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1344 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1345 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1346 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1347 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1348 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1349 / 2000) trai

(Epoch 1465 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1466 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1467 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1468 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1469 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1470 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1471 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1472 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1473 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1474 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1475 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 5901 / 8000) loss: 10.036571
(Epoch 1476 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1477 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1478 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1479 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1480 / 2000) train loss: 0.290166; val

(Epoch 1597 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1598 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1599 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1600 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 6401 / 8000) loss: 9.643799
(Epoch 1601 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1602 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1603 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1604 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1605 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1606 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1607 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1608 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1609 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1610 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1611 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1612 / 2000) train loss: 0.290166; val_

(Epoch 1728 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1729 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1730 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1731 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1732 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1733 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1734 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1735 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1736 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1737 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1738 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1739 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1740 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1741 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1742 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1743 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1744 / 2000) trai

(Epoch 1860 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1861 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1862 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1863 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1864 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1865 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1866 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1867 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1868 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1869 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1870 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1871 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1872 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1873 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1874 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1875 / 2000) train loss: 0.290166; val_loss: 0.289283
(Iteration 7501 / 8000) 

(Epoch 1992 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1993 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1994 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1995 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1996 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1997 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1998 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 1999 / 2000) train loss: 0.290166; val_loss: 0.289283
(Epoch 2000 / 2000) train loss: 0.290166; val_loss: 0.289283
best_acc =  0.28863359874478833
best_params =  [0.0001, 0.9, 1e-10]


## Save the model

When you are satisfied with your training, you can save the model but there will be no submission for this notebook. The required number of submissions for this class have been reduced to 6 from 7.

<div class="alert alert-danger">
    <h3>Warning</h3>
    <p>You might get an error like this:</p>
    <p>PicklingError: Can't pickle <class 'exercise_code.classifiers.softmax.SoftmaxClassifier'>: it's not the same object as exercise_code.classifiers.softmax.SoftmaxClassifier</p>
    <p>The reason is that we are using autoreload and working on this class during the notebook session. If you get this error simply restart the kernel and rerun the whole script (Kernel -> Restart & Run All) or only the important cells for generating your model.</p>
</div>

In [28]:
from exercise_code.model_savers import save_fully_connected_net
save_fully_connected_net(best_model, modelname='house_prices')