``Difference with previous versions:``
- Using a different approach to encoding and imputing data, which means relying more on zeros, both for missing numerical values and for NaN values in the categorical features. Since I will be using all (or at least most) of the features in the dataset, it could be helpful to have zeros rather than values that are probably misleading. Columns with only a few missing values will simply be imputed using the KNN algorithm.
- Using regularizers more extensively, and controlling layer properties such as the weight and bias initializers more closely.
- Wrote some new utility functions to help with the EDA process.

In [1]:
# Imports:
import pandas as pd
import numpy as np
from utils import *
import seaborn as sns

In [2]:
# Retrieve Data
data = retrieve_data()
train = data['train'].copy()
test = data['test'].copy()

# The dependent feature
y_feat = 'SalePrice'

# Preprocessing:
The general strategy is to combine the training and testing data (both categorical and numerical features) and then process them at the same time. For categorical features, encoding dictionaries are built from the training data and then applied to the combined dataframe. Numerical features are imputed using KNN imputation.

1. Encoding categorical features: I'll be using some functions written in utils.py to come up with meaningful values for the unique keys in each categorical feature, then map them into the data given certain conditions.
2. Imputing numerical data: The numerical features are imputed together (train and test combined) using KNN imputation.
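The combine-then-split step can be sketched as follows. `combine_train_test` lives in utils.py and isn't shown, so `combine_train_test_sketch` below is a hypothetical stand-in; dropping `Id` and `SalePrice` is an assumption based on the combined dataframe shown further down, which starts at MSSubClass. Splitting back by position relies on the training rows coming first.

```python
import pandas as pd

def combine_train_test_sketch(train, test, target="SalePrice", id_col="Id"):
    """Hypothetical stand-in for utils.combine_train_test: stack the feature
    columns of train and test so they can be encoded and imputed together."""
    train_feats = train.drop(columns=[target, id_col])
    test_feats = test.drop(columns=[id_col])
    return pd.concat([train_feats, test_feats], ignore_index=True)

def split_back(combined, train_len):
    """Split the combined frame by position: the training rows come first."""
    train_part = combined.iloc[:train_len].reset_index(drop=True)
    test_part = combined.iloc[train_len:].reset_index(drop=True)
    return train_part, test_part

# Toy frames with made-up values (named toy_* to avoid clobbering the notebook's train/test)
toy_train = pd.DataFrame({"Id": [1, 2], "LotArea": [8000, 9000], "SalePrice": [200000, 180000]})
toy_test = pd.DataFrame({"Id": [3], "LotArea": [10000]})
combined = combine_train_test_sketch(toy_train, toy_test)
tr, te = split_back(combined, len(toy_train))
print(combined)
```

Keeping `train_len` around, as the notebook does, is what makes the positional split back possible.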

## In-depth analysis of categorical variables:
1. Compare the NaN counts of related features (e.g. the Bsmt* columns) and of unrelated ones.
2. If at least 90% of a feature's (column's) data is present, map its encoded numerical values into the dataframe. Otherwise, only map the non-NaN values and fill the remaining missing values using some other technique. Dropping columns with too many missing values is a common option, but since the data is meant for neural networks, it makes more sense to fill those missing values with zeros.
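A minimal sketch of the 90% rule, assuming the fraction of non-missing values is measured on a single frame (for example the training data). `columns_to_zero_fill` is a hypothetical helper, not a function from utils.py, and the toy columns are made up.

```python
import numpy as np
import pandas as pd

def columns_to_zero_fill(df, columns, threshold=0.9):
    """Hypothetical helper: return the columns whose fraction of non-missing
    values is below `threshold`; their NaNs would be filled with 0."""
    return [col for col in columns if df[col].notna().mean() < threshold]

toy = pd.DataFrame({
    "sparse": [np.nan] * 18 + ["x", "x"],   # 10% present -> flagged
    "mostly_present": [np.nan] + ["y"] * 19,  # 95% present -> kept
    "dense": ["z"] * 20,                      # 100% present -> kept
})
result = columns_to_zero_fill(toy, toy.columns)
print(result)  # ['sparse']
```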

## Encoding categorical variables:
To come up with a meaningful value for each unique value in a categorical feature column, we consider the average SalePrice for each of those unique values and weight these averages relative to each other. Note that when more than 90% of a column's data is present, the few missing values could simply be imputed with the average SalePrice for that column. When less than 90% of the data is present, however, these measures would no longer make sense, and since we are using neural networks, it makes more sense to impute those missing values with zeros.
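A minimal sketch of this encoding, assuming the "relative weight" is each category's mean SalePrice divided by the sum of the category means. That assumption is consistent with the `cat_dicts['Alley']` output further down, whose Grvl and Pave values (0.4211 and 0.5789) sum to 1, but the helpers below are hypothetical and not the utils.py implementation. The toy data is made up.

```python
import numpy as np
import pandas as pd

def relative_price_encoding(df, col, target="SalePrice"):
    """Hypothetical helper: map each category of `col` to its mean target value
    divided by the sum of all category means, so the encodings sum to 1."""
    means = df.groupby(col)[target].mean()  # groupby drops NaN keys by default
    return (means / means.sum()).round(4).to_dict()

def apply_encoding(series, mapping, zero_fill_nan):
    """Map categories to numbers; optionally turn missing (NaN) entries into 0."""
    encoded = series.map(mapping)
    return encoded.fillna(0.0) if zero_fill_nan else encoded

toy = pd.DataFrame({
    "Category": ["A", "A", "B", np.nan],
    "SalePrice": [100.0, 300.0, 600.0, 50.0],
})
mapping = relative_price_encoding(toy, "Category")  # A: 200 / 800, B: 600 / 800
encoded = apply_encoding(toy["Category"], mapping, zero_fill_nan=True).tolist()
print(mapping)  # {'A': 0.25, 'B': 0.75}
print(encoded)  # [0.25, 0.25, 0.75, 0.0]
```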


In [4]:
# Get the DataFrames
cat_info, num_info = missing_info(data)

In [5]:
cat_info

| Feature | Test | Train |
|---|---|---|
| Alley | 1352 | 1369 |
| MasVnrType | 16 | 8 |
| BsmtQual | 44 | 37 |
| BsmtCond | 45 | 37 |
| BsmtExposure | 44 | 38 |
| BsmtFinType1 | 42 | 37 |
| BsmtFinType2 | 42 | 38 |
| Electrical | 0 | 1 |
| FireplaceQu | 730 | 690 |
| GarageType | 76 | 81 |


### Important to note for categorical features:
1. Alley, PoolQC, Fence and MiscFeature are the features with an excessive number of missing values in both training and testing.
2. FireplaceQu is not as bad as the features above, but it is going to be treated the same way.
3. For some of these features, NA simply means that the house doesn't have that feature: Alley, MiscFeature, PoolQC.

Note: To conclude, there are 5 features whose np.nan values should not be imputed with their dictionary value but with a zero.


In [6]:
num_info

| Feature | Test | Train |
|---|---|---|
| LotFrontage | 259.0 | 227 |
| MasVnrArea | 8.0 | 15 |
| GarageYrBlt | 81.0 | 78 |
| BsmtFinSF1 | 0.0 | 1 |
| BsmtFinSF2 | 0.0 | 1 |
| BsmtUnfSF | 0.0 | 1 |
| TotalBsmtSF | 0.0 | 1 |
| BsmtFullBath | 0.0 | 2 |
| BsmtHalfBath | 0.0 | 2 |
| GarageCars | 0.0 | 1 |


## Numerical data:
Based on this dataframe, some features have missing values in the training data but not in the test data. There is no need to impute anything manually for the numerical features; I am just going to let KNN handle it.

In [3]:
# Keep the lengths of the training and test data to split the combined data back apart later
train_len = train.shape[0]
test_len = test.shape[0]

# Combine the train and test:
# Note: pass copies so the original dataframes won't change and can still be used
feat_cols = combine_train_test(train.copy(), test.copy())

In [8]:
feat_cols

Unnamed: 0,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,Inside,...,0,0,,,,0,2,2008,WD,Normal
1,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,FR2,...,0,0,,,,0,5,2007,WD,Normal
2,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,Inside,...,0,0,,,,0,9,2008,WD,Normal
3,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,Corner,...,0,0,,,,0,2,2006,WD,Abnorml
4,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,FR2,...,0,0,,,,0,12,2008,WD,Normal
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2914,160,RM,21.0,1936,Pave,,Reg,Lvl,AllPub,Inside,...,0,0,,,,0,6,2006,WD,Normal
2915,160,RM,21.0,1894,Pave,,Reg,Lvl,AllPub,Inside,...,0,0,,,,0,4,2006,WD,Abnorml
2916,20,RL,160.0,20000,Pave,,Reg,Lvl,AllPub,Inside,...,0,0,,,,0,9,2006,WD,Abnorml
2917,85,RL,62.0,10441,Pave,,Reg,Lvl,AllPub,Inside,...,0,0,,MnPrv,Shed,700,7,2006,WD,Normal


In [4]:
# Get the needed dictionaries to be used for encoding categorical features
cat_dicts = get_encoding_dicts(train, data['train_cat_list'])  

Ignoring: Alley
Ignoring: FireplaceQu
Ignoring: PoolQC
Ignoring: Fence
Ignoring: MiscFeature


In [5]:
# Checking one of the values.
cat_dicts['Alley']

{nan: 0, 'Grvl': 0.4211, 'Pave': 0.5789}

In [6]:
# Do the encoding
encoded_feat_cols = encode_categorical(feat_cols.copy(), cat_dicts)
encoded_feat_cols[data['train_cat_list']]

Unnamed: 0,MSZoning,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,...,GarageType,GarageFinish,GarageQual,GarageCond,PavedDrive,PoolQC,Fence,MiscFeature,SaleType,SaleCondition
0,0.2590,0.5818,0.0,0.1993,0.2376,0.5682,0.1826,0.3097,0.0430,0.1133,...,0.1817,0.2939,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1726
1,0.2590,0.5818,0.0,0.1993,0.2376,0.5682,0.1837,0.3097,0.0519,0.0875,...,0.1817,0.2939,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1726
2,0.2590,0.5818,0.0,0.2493,0.2376,0.5682,0.1826,0.3097,0.0430,0.1133,...,0.1817,0.2939,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1726
3,0.2590,0.5818,0.0,0.2493,0.2376,0.5682,0.1875,0.3097,0.0458,0.1133,...,0.1201,0.2067,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1443
4,0.2590,0.5818,0.0,0.2493,0.2376,0.5682,0.1837,0.3097,0.0729,0.1133,...,0.1817,0.2939,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1726
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2914,0.1713,0.5818,0.0,0.1993,0.2376,0.5682,0.1826,0.3097,0.0214,0.1133,...,0.0925,0.1503,0.1064,0.1263,0.4298,0.0,0.000,0.000,0.1035,0.1726
2915,0.1713,0.5818,0.0,0.1993,0.2376,0.5682,0.1826,0.3097,0.0214,0.1133,...,0.0985,0.2067,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1443
2916,0.2590,0.5818,0.0,0.1993,0.2376,0.5682,0.1826,0.3097,0.0340,0.1133,...,0.1201,0.2067,0.1930,0.2296,0.4298,0.0,0.000,0.000,0.1035,0.1443
2917,0.2590,0.5818,0.0,0.1993,0.2376,0.5682,0.1826,0.3097,0.0340,0.1133,...,0.0925,0.1503,0.1064,0.1263,0.4298,0.0,0.247,0.227,0.1035,0.1726


#### Note: Some features had no missing values in the training dataset but did in the test set. Hence there will be some missing values in the previously categorical features; from now on they are imputed the same way as the numerical features, in other words, they are treated as numerical.
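Why these NaNs appear: `Series.map` with a dictionary returns NaN for any value that has no key in the dictionary, including NaN itself. A small illustration; the two encoded values are borrowed from the MSZoning output above, and `"XX"` is a made-up category.

```python
import numpy as np
import pandas as pd

# Encoding dictionary built from the training data, where this column had no NaNs
mapping = {"RL": 0.2590, "RM": 0.1713}

# A test-set column with one missing value and one made-up unseen category "XX"
test_col = pd.Series(["RL", np.nan, "RM", "XX"])
encoded = test_col.map(mapping)
print(encoded.tolist())  # [0.259, nan, 0.1713, nan]
```

Both the missing entry and the unseen category come out as NaN, which is why these columns are handed to the KNN imputer afterwards.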

## Imputing Data with KNN:
- The features of both train and test are imputed together, at the same time, using the KNN algorithm.
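A tiny illustration of what `KNNImputer` does with a missing entry, using made-up numbers: it finds the `n_neighbors` nearest rows that have the feature (distances skip missing coordinates via scikit-learn's `nan_euclidean` metric) and averages their values. Two caveats worth keeping in mind for the cell below: the distances are scale-sensitive, so unscaled columns with large values dominate the neighbor search, and with `n_neighbors` set to a third of all rows, the imputed values move toward a distance-weighted average over a large part of the data.

```python
import numpy as np
from sklearn.impute import KNNImputer

X_toy = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [np.nan, 6.0],  # first feature missing
    [8.0, 8.0],
])
# Rows 1 and 3 are the two nearest donors to row 2 (second feature 4 and 8 vs 6)
imputer = KNNImputer(n_neighbors=2, weights="uniform")
X_filled = imputer.fit_transform(X_toy)
print(X_filled[2, 0])  # 5.5, the mean of 3.0 and 8.0
```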

In [7]:
# Impute the missing values with KNNImputer
from sklearn.impute import KNNImputer

# Get the list of columns with missing values
missing_features = encoded_feat_cols.columns[encoded_feat_cols.isna().any()].tolist()

# The number of neighbors the imputer looks at is 1/3 of the whole combined dataframe
num = (train_len + test_len) // 3

# Instantiate the imputer; with weights="distance", closer neighbors count more
imputer = KNNImputer(n_neighbors=num, weights="distance")
# Fit and transform the combined data, keeping the original column names
imputed_combined = pd.DataFrame(imputer.fit_transform(encoded_feat_cols),
                                columns=encoded_feat_cols.columns)

# Check the imputation: False means no NaNs are left
bool(imputed_combined.isna().any().any())

False

In [11]:
# Get the fitting and predicting datasets:

# features:
features = imputed_combined.columns.to_list()

# The training rows; .copy() avoids pandas' SettingWithCopyWarning when adding a column
train_part = imputed_combined.iloc[:train_len].copy()
# Add the y_feat column (for use further along the way)
train_part[y_feat] = train[y_feat]

# X holds the features used for training (X_test holds the same features for prediction)
X = train_part[features]
y = train[y_feat]  # y, the dependent column of the dataset

# The dataset used for prediction
X_test = imputed_combined.iloc[train_len:].reset_index(drop=True)

# Normalized versions of the datasets
norm_X = normalize(X.copy())
norm_y = normalize(y.copy())
norm_X_test = normalize(X_test.copy())
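`normalize` comes from utils.py and isn't shown here. If it computes its statistics from whichever frame it receives, `norm_X` and `norm_X_test` end up on slightly different scales. A minimal sketch of fitting the statistics on the training features only and reusing them for the test features, assuming min-max scaling (the actual `normalize` may use a different scheme, e.g. z-scores); `fit_minmax` and `apply_minmax` are hypothetical names and the values are made up.

```python
import pandas as pd

def fit_minmax(train_df):
    """Compute per-column minimum and range on the training features only."""
    col_min = train_df.min()
    col_range = (train_df.max() - col_min).replace(0, 1)  # constant columns: avoid dividing by 0
    return col_min, col_range

def apply_minmax(df, col_min, col_range):
    """Scale any frame with the statistics learned from the training data."""
    return (df - col_min) / col_range

X_toy = pd.DataFrame({"LotArea": [8000, 10000, 12000], "Street": [1.0, 1.0, 1.0]})
X_test_toy = pd.DataFrame({"LotArea": [9000, 14000], "Street": [1.0, 1.0]})

col_min, col_range = fit_minmax(X_toy)
norm_train_toy = apply_minmax(X_toy, col_min, col_range)
norm_test_toy = apply_minmax(X_test_toy, col_min, col_range)
print(norm_test_toy)  # test values can fall outside [0, 1]
```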

In [12]:
train_part.corr()[y_feat].nlargest(13)[1:]

OverallQual     0.790982
Neighborhood    0.738629
GrLivArea       0.708624
ExterQual       0.690933
BsmtQual        0.681904
KitchenQual     0.675721
GarageCars      0.640409
GarageArea      0.623431
TotalBsmtSF     0.613581
1stFlrSF        0.605852
FullBath        0.560664
GarageFinish    0.553058
Name: SalePrice, dtype: float64

#### Bias and variance detection using validation data:

##### `Note:` One way of checking the accuracy of a neural network is a validation set. Keras's `Model.fit` offers two options for validation:
- `validation_split`: holds out a fraction of the training data (the last samples, before shuffling) to be used as the validation set.
- `validation_data`: evaluates the predictions on an explicitly given dataset (e.g. a part of the dataset set aside beforehand).
##### It would also make sense to have a number of different randomly selected validation batches to run on the model after training, to see whether there is variance or bias in the predictions.
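To make the `validation_split` behavior concrete: the Keras documentation states that the validation data is taken from the last samples of `x` and `y`, before shuffling. The sketch below mimics that slicing with NumPy; `keras_style_validation_split` is a hypothetical helper, not a Keras function. If the training rows are stored in a fixed order, the held-out 20% is the tail of that order rather than a random sample, which is one more reason to also check randomly sampled batches.

```python
import numpy as np

def keras_style_validation_split(x, y, validation_split=0.2):
    """Hypothetical helper mimicking Model.fit(validation_split=...): the validation
    set is the LAST fraction of the rows, taken before any shuffling."""
    split_at = int(np.floor(len(x) * (1.0 - validation_split)))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

toy_x = np.arange(10).reshape(10, 1)
toy_y = np.arange(10)
(train_x, train_y), (val_x, val_y) = keras_style_validation_split(toy_x, toy_y, 0.2)
print(val_y)  # [8 9]
```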

In [13]:
# Chunks of data used to check for overfitting and underfitting
portion = 0.3
num = 5
devs = []
dev_batch_size = int(train_part.shape[0] * portion)

for i in range(num):
    dev_data = train_part.sample(n=dev_batch_size, random_state=i)
    dev_x = dev_data[features]
    dev_y = dev_data[y_feat]
    devs.append((dev_x, dev_y))
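One way to use `devs` after training is to score the model on each batch and look at the spread of the errors. A minimal sketch with a generic `predict_fn`, which is a hypothetical parameter; for the Keras model it could be `lambda x: model.predict(x, verbose=0).ravel()`. Two caveats, hedged since the rest of the pipeline isn't shown: `dev_x` holds the unnormalized features while the model is fit on `norm_X`, so the batches would need the same normalization first; and the batches are drawn from the training rows, so they mostly measure fit on seen data rather than generalization.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

def evaluate_on_devs(predict_fn, devs):
    """Score predict_fn on every (x, y) dev batch; return mean, std and per-batch MSLE."""
    scores = [mean_squared_log_error(dev_y, predict_fn(dev_x)) for dev_x, dev_y in devs]
    return float(np.mean(scores)), float(np.std(scores)), scores

# Toy usage: made-up batches and a dummy predictor that happens to be exact
toy_devs = [
    (np.array([[1.0], [2.0]]), np.array([100.0, 200.0])),
    (np.array([[3.0]]), np.array([300.0])),
]
mean_score, std_score, scores = evaluate_on_devs(lambda x: x[:, 0] * 100.0, toy_devs)
print(mean_score, std_score, scores)
```

A large standard deviation across batches would point to variance; a uniformly high mean would point to bias.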

## Fitting parts

- Some useful functions used for hyperparameter tuning
- Imports used throughout

In [14]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers, regularizers, losses, metrics
from tensorflow.keras.regularizers import l1, l2, l1_l2, L1L2
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, InputLayer,LeakyReLU
from tensorflow.keras.optimizers.schedules import ExponentialDecay, InverseTimeDecay
from tensorflow.keras.optimizers import Adam, RMSprop
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.losses import MeanSquaredLogarithmicError
from tensorflow.keras.initializers import TruncatedNormal

import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling

In [15]:
# Scheduler objects to control the optimizer learning rate:
from tensorflow.keras.optimizers.schedules import InverseTimeDecay, ExponentialDecay

def TimeDecayScheduler(learning_rate=0.001, decay_steps=200, decay_rate=1.2, name=""):
    """ Returns an InverseTimeDecay object with the given properties to be used in the optimizer. """
    return InverseTimeDecay(
        initial_learning_rate=learning_rate, 
        decay_steps=decay_steps,
        decay_rate=decay_rate,
        name=name
    )


def ExponentialScheduler(initial_learning_rate, decay_steps, decay_rate, name=""):
    """ Returns an ExponentialDecay object with the given properties to be used in the optimizer. """
    return ExponentialDecay(
        initial_learning_rate=initial_learning_rate, 
        decay_steps=decay_steps,
        decay_rate=decay_rate,
        name=name
    )


# Actual optimizers: Adam and RMSprop are the two main optimizers used in this project; both accept a learning-rate schedule in place of a fixed learning rate, and they happen to be effective here.

from tensorflow.keras.optimizers import Adam, RMSprop

def AdamOptimizer(learning_rate=0.001, scheduler=None):
    """
        # params:
            learning_rate: the initial learning rate to be used
            scheduler: if passed by the user, it is used in the optimizer
            instead of the learning rate

        # returns: an Adam optimizer
    """
    if scheduler is None:
        return Adam(learning_rate)
    return Adam(scheduler)
    

def RMSpropOptimizer(learning_rate=0.001, scheduler=None):
    """
        # params:
            learning_rate: the initial learning rate to be used
            scheduler: if passed by the user, it is used in the optimizer
            instead of the learning rate
        
        # returns: an RMSprop optimizer
    """
    if scheduler is None:
        return RMSprop(learning_rate)
    return RMSprop(scheduler)

# CallBacks:
from tensorflow.keras.callbacks import EarlyStopping

def EarlyStopCallBack(patience=100):
    """
        # params: patience, the number of epochs with no improvement before training stops
        # returns: an EarlyStopping callback object
    """
    return EarlyStopping(monitor='val_loss', patience=patience)


# Models: 
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization  # Layers 
from tensorflow.keras.regularizers import l2, l1, l1_l2, L1L2  # Regularizer
from tensorflow.keras.losses import MeanSquaredLogarithmicError # Error-metric
import tensorflow_docs as tfdocs # For logging purposes

def Model01(config):
    """
        # params: 
        config: uses the configuration dictionary to compile and fit the model accordingly
        
        # returns a history object when the fitting is done
    """
    pass
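For reference, the two schedule classes wrapped above follow these formulas when `staircase=False` (as documented for `tf.keras.optimizers.schedules`). The plain-Python sketch below evaluates them without TensorFlow; the function names are hypothetical. `step` is the optimizer step (one per batch), not the epoch, so with `fit`'s default `batch_size` of 32, one epoch over N rows advances `step` by ceil(N / 32). The 0.018 / 500 / 0.35 values match the scheduler used in `build_model01`.

```python
import math

def inverse_time_decay(step, initial_lr, decay_steps, decay_rate):
    # InverseTimeDecay, staircase=False: initial_lr / (1 + decay_rate * step / decay_steps)
    return initial_lr / (1 + decay_rate * step / decay_steps)

def exponential_decay(step, initial_lr, decay_steps, decay_rate):
    # ExponentialDecay, staircase=False: initial_lr * decay_rate ** (step / decay_steps)
    return initial_lr * decay_rate ** (step / decay_steps)

def steps_per_epoch(n_rows, batch_size=32):
    # Optimizer steps (batches) in one epoch
    return math.ceil(n_rows / batch_size)

for step in (0, 500, 1000, 2000):
    print(step,
          round(inverse_time_decay(step, 0.018, 500, 0.35), 6),
          round(exponential_decay(step, 0.018, 500, 0.35), 6))
```

With the same `decay_rate`, the exponential schedule falls much faster: after one `decay_steps` period it has already multiplied the rate by 0.35, while the inverse-time schedule has only divided it by 1.35.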

## Ideas to try out and improve the model
1. Weight-Initializers:
    - Use tf.keras.initializers.RandomNormal and tf.keras.initializers.RandomUniform
    - Tweak their properties and see how they would work.
2. Bias in Dense layers:
    - Set up an initializer and a regularizer for the bias of the layer
    - Use them for the weights too
    - Tweak the arguments of the early-stopping callback
3. Layers:
    - Use LeakyReLU/ThresholdedReLU/PReLU as a layer
    - Maybe try out tf.keras.layers.experimental.preprocessing.Normalization*
    - Tweak the BatchNormalization layer arguments
4. Overfitting:
    - Use tf.keras.layers.GaussianDropout and tf.keras.layers.GaussianNoise (which could be viewed as a data augmentation method).

In [None]:
# A small neural net to test how keras-tuner works
def test_model(hp):
    
    model = keras.Sequential([
        Dense(64, activation='relu'),
        Dense(128, activation='relu'),
        
        Dense(1)
    ])
    
    time_lr = InverseTimeDecay(
      0.15,
      decay_steps=20,
      decay_rate=0.4
    )
    
    optimizer = Adam(time_lr)
        
    model.compile(
        loss=MeanSquaredLogarithmicError(name='MSLE'), 
        optimizer=optimizer,
    )
  
    return model



In [63]:
# All the models should have the kernel_initializer and bias_initializer, bias_regularizer
def build_model01():
    """
        The architecture of this model consists of four batches of three Dense layers where:
            - The layer in between is linear and has no activation function, while the other
                two layers use the elu (in one case relu) activation. The middle layer has four
                times as many units as its surrounding layers (eight times in the last batch).
            - The regularization for the middle layer is stronger and uses l2 for its weights.
                More bias is introduced in the middle layer as well, and its weights are mostly
                initialized with a wider range.
            - BatchNormalization layers are added between the batches (testing phase).
            - The last three layers are highly biased, with high regularization terms.
    """
    model = keras.Sequential([
        InputLayer(input_shape=[len(X.keys())]),
        
        Dense(64, activation='elu', 
              kernel_regularizer=l1(0.001),
              bias_regularizer=l2(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=0.005),
              kernel_initializer=TruncatedNormal(mean=0, stddev=25)
        ),
        Dense(256, 
              kernel_regularizer=l2(0.01),
              bias_regularizer=l1(0.01),
              bias_initializer=TruncatedNormal(mean=0, stddev=0.5),
              kernel_initializer=TruncatedNormal(mean=0, stddev=5)
        ),
        BatchNormalization(),
        Dense(64, activation='elu',
              bias_regularizer=l1(0.001), 
              kernel_regularizer=l2(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=0.005),
              kernel_initializer=TruncatedNormal(mean=0, stddev=25)
        ),
        
        BatchNormalization(),
        Dense(128, activation = 'relu', 
              bias_regularizer=l1(0.001), 
              kernel_regularizer=l1(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=1.5),
              kernel_initializer=TruncatedNormal(mean=0, stddev=1)
        ),
        Dense(512, 
              kernel_regularizer=l2(0.01),
              bias_regularizer=l1(0.01), 
              bias_initializer=TruncatedNormal(mean=0, stddev=1.5),
              kernel_initializer=TruncatedNormal(mean=0, stddev=5)
        ),
        BatchNormalization(),
        Dense(128, activation = 'elu',
              kernel_regularizer=l2(0.01),
              bias_regularizer=l1(0.001), 
              bias_initializer=TruncatedNormal(mean=0, stddev=1),
              kernel_initializer=TruncatedNormal(mean=0, stddev=1)
        ),
        
        BatchNormalization(),
        Dense(8,  activation = 'elu',
              kernel_regularizer=l1(0.001),
              bias_regularizer=l1(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=1),
              kernel_initializer=TruncatedNormal(mean=0, stddev=1.75)
        ),
        Dense(32,
              kernel_regularizer=l2(0.01), 
              bias_regularizer=l1(0.01), 
              bias_initializer=TruncatedNormal(mean=0, stddev=1.5),
              kernel_initializer=TruncatedNormal(mean=0, stddev=4)  
        ),
    
        BatchNormalization(),
        Dense(8, activation = 'elu',
              kernel_regularizer=l1(0.001),
              bias_regularizer=l1(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=1),
              kernel_initializer=TruncatedNormal(mean=0, stddev=1.75)
        ),
        
        BatchNormalization(),
        Dense(128, 
              activation = 'elu',
              kernel_regularizer=l1(0.001),
              bias_regularizer=l1(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=1.8),
              kernel_initializer=TruncatedNormal(mean=0, stddev=2.5)
        ),
        Dense(1024, 
              kernel_regularizer=l2(0.01), 
              bias_regularizer=l1(0.01), 
              bias_initializer=TruncatedNormal(mean=0, stddev=0.5),
              kernel_initializer=TruncatedNormal(mean=0, stddev=6)
        ),
        Dropout(0.5),
        
        BatchNormalization(),
        Dense(128, 
              activation = 'elu', 
              bias_regularizer=l1(0.001),
              bias_initializer=TruncatedNormal(mean=0, stddev=1.8),
              kernel_initializer=TruncatedNormal(mean=0, stddev=2.5),
              kernel_regularizer=l2(0.001)
        ),
        
        Dense(4, 
              kernel_regularizer=L1L2(0.04, 0.04), 
              bias_regularizer=l2(0.01), 
              bias_initializer=TruncatedNormal(mean=0, stddev=5), 
              kernel_initializer=TruncatedNormal(mean=0, stddev=2.5)
        ),
        Dense(4, 
              kernel_regularizer=L1L2(0.05, 0.05), 
              bias_regularizer=l2(0.1), 
              bias_initializer=TruncatedNormal(mean=0, stddev=0.05), 
              kernel_initializer=TruncatedNormal(mean=0, stddev=4)
        ),
        Dense(4, 
              kernel_regularizer=L1L2(0.6, 0.6), 
              bias_regularizer=l2(0.2), 
              bias_initializer=TruncatedNormal(mean=0, stddev=5), 
              kernel_initializer=TruncatedNormal(mean=0, stddev=2.5)
        ),
        
        Dense(1)
      ])
    
    time_lr = TimeDecayScheduler(learning_rate=0.018, decay_steps=500, decay_rate=0.35, name="")
    
    optimizer = Adam(time_lr)
        
    model.compile(
        loss=MeanSquaredLogarithmicError(name='MSLE'), 
        optimizer=optimizer, 
    )
  
    return model

model = build_model01()

In [48]:
EPOCHS = 2000

# patience is the number of epochs with no improvement in val_loss after which training stops
early_stop = EarlyStopping(monitor='val_loss', patience=55, mode='min', restore_best_weights=True)
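A simplified re-implementation of the stopping rule, to show what `patience=55` means in practice: a counter resets whenever `val_loss` reaches a new minimum, and training stops once it has gone `patience` epochs without one. With `restore_best_weights=True`, Keras then restores the weights from the best epoch when it stops early. This sketches the counting logic only (assuming `min_delta=0` and `mode='min'`), not the Keras source; `simulate_early_stopping` is a hypothetical helper.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (stopped_epoch, best_epoch) for a min-mode monitor with min_delta=0."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        wait += 1
        if loss < best:            # new minimum: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        if wait >= patience and epoch > 0:
            return epoch, best_epoch  # stop here; best_epoch's weights would be restored
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping

print(simulate_early_stopping([5, 4, 3, 3.5, 3.2, 3.1, 3.3], patience=3))  # (5, 2)
```

In the runs below, a single `val_loss` spike (such as the one at epoch 400 of the first run) does not stop training as long as a new minimum appears within the next 55 epochs.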

In [59]:
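# Print how the learning rate would decay step by step.
# Note: this loop divides the *previous* rate at every step, so the decay compounds;
# InverseTimeDecay computes initial_learning_rate / (1 + decay_rate * step / decay_steps) instead.
decay_rate = 0.8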
lr = 0.18
step = 500
for i in range(2000):
    print(i , lr)
    lr = lr / (1 + decay_rate * (i / step))

0 0.18
1 0.18
2 0.17971246006389774
3 0.17913921457725052
4 0.17828345399805984
5 0.17714969594401814
6 0.17574374597620848
7 0.17407264855012725
8 0.172144628708591
9 0.16996902518620757
10 0.16755621568040968
11 0.16491753511851345
12 0.16206518781300455
13 0.1590121544476104
14 0.1557720948742265
15 0.15235924772518242
16 0.14878832785662346
17 0.14507442263711334
18 0.1412328880813019
19 0.1372792458021986
20 0.13322908171797226
21 0.12909794740113592
22 0.12490126490047979
23 0.12065423580030893
24 0.1163717552086313
25 0.11206833128720271
26 0.10775801085307952
27 0.10345431149489201
28 0.09917016055875386
29 0.09491784126986395
30 0.09070894616768344
31 0.08655433794626281
32 0.08246411770794856
33 0.07844760055931181
34 0.07451329840360164
35 0.07066890971510019
36 0.0669213160180873
37 0.06327658473722324
38 0.05973997803740865
39 0.0563159672298347
40 0.05300825228711851
41 0.04981978598413393
42 0.046752802162287846
43 0.043808847603343186
44 0.040988816994145943
45 0.038292
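The loop above divides the previous rate at every step, so the decay compounds. `InverseTimeDecay` (with `staircase=False`) instead starts from the initial rate each time: `initial_learning_rate / (1 + decay_rate * step / decay_steps)`. A short comparison using the loop's own values (0.18, 0.8, 500); note these differ from the 0.018 / 0.35 used in `build_model01`. The helper names are hypothetical.

```python
def compounded_rates(initial_lr, decay_rate, decay_steps, n):
    """Reproduces the loop above: each step divides the *previous* rate."""
    lr, rates = initial_lr, []
    for i in range(n):
        rates.append(lr)
        lr = lr / (1 + decay_rate * (i / decay_steps))
    return rates

def inverse_time_rates(initial_lr, decay_rate, decay_steps, n):
    """InverseTimeDecay with staircase=False: every step starts from initial_lr."""
    return [initial_lr / (1 + decay_rate * i / decay_steps) for i in range(n)]

compounded = compounded_rates(0.18, 0.8, 500, 46)
closed_form = inverse_time_rates(0.18, 0.8, 500, 46)
for i in (0, 10, 45):
    print(i, round(compounded[i], 5), round(closed_form[i], 5))
```

After 45 steps the compounded loop has fallen to roughly a fifth of the initial rate, while the actual schedule is still above 0.16.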

### Testing different hyperparameters
- Run a number of different fits and compare their results.
- Normalized X works much better! Training is much faster, and loss and val_loss decrease close to each other.
- Using BatchNormalization before each layer seems to be very helpful.

Note: A lower loss value does not necessarily mean that the model's predictions will improve.
##### Important Note: when we increase the amount of regularization, the loss naturally increases, since the regularization penalties are added to it. Hence, this should be taken into account when comparing the losses of different models.
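A worked example of why regularization raises the reported loss: Keras's `l1(a)` penalty adds `a * sum(|w|)` and `l2(b)` adds `b * sum(w**2)` for each regularized weight tensor (`L1L2(a, b)` adds both), and these penalties are added to the data loss (MSLE here). The weights and the 0.30 data loss below are made up.

```python
import numpy as np

def l1_penalty(weights, a):
    # What keras.regularizers.l1(a) adds for one weight tensor
    return a * np.sum(np.abs(weights))

def l2_penalty(weights, b):
    # What keras.regularizers.l2(b) adds for one weight tensor
    return b * np.sum(np.square(weights))

w = np.array([1.0, -2.0])  # made-up weights
data_loss = 0.30           # made-up MSLE value
total = data_loss + l1_penalty(w, 0.1) + l2_penalty(w, 0.1)  # 0.30 + 0.3 + 0.5
print(total)  # reported loss = data loss + penalties
```

Two models with the same predictions but different regularization strengths would therefore report different losses.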

In [64]:
""" 
    Model01
    prev-err: 11.125
    
    more regularization in the last layers
"""
history = model.fit(norm_X, y, epochs=EPOCHS,
          verbose=0, validation_split=0.20,
          callbacks=[early_stop, tfdocs.modeling.EpochDots()])
print()
validate(quantize(pd.DataFrame(model.predict(norm_X_test, verbose=0))[0]))


Epoch: 0, loss:56154.5586,  val_loss:50657.1211,  
....................................................................................................
Epoch: 100, loss:471.4744,  val_loss:466.2067,  
....................................................................................................
Epoch: 200, loss:52.5536,  val_loss:51.9563,  
....................................................................................................
Epoch: 300, loss:5.1245,  val_loss:5.1230,  
....................................................................................................
Epoch: 400, loss:1.6325,  val_loss:58.6813,  
....................................................................................................
Epoch: 500, loss:1.0239,  val_loss:1.0482,  
....................................................................................................
Epoch: 600, loss:0.8372,  val_loss:0.8500,  
.................................................................

In [52]:
""" 
    Model01
    prev-err: 11.125
    
    0.018 # fixed
    decay_steps=500
    decay_rate=0.35
    
    error:10.872, 0.13119
"""
history = model.fit(norm_X, y, epochs=EPOCHS,
          verbose=0, validation_split=0.20,
          callbacks=[early_stop, tfdocs.modeling.EpochDots()])
print()
validate(quantize(pd.DataFrame(model.predict(norm_X_test, verbose=0))[0]))


Epoch: 0, loss:48199.0391,  val_loss:42734.3555,  
....................................................................................................
Epoch: 100, loss:8.4621,  val_loss:8.3970,  
....................................................................................................
Epoch: 200, loss:1.1681,  val_loss:1.1689,  
....................................................................................................
Epoch: 300, loss:0.3892,  val_loss:0.3936,  
....................................................................................................
Epoch: 400, loss:0.3198,  val_loss:0.3128,  
....................................................................................................
Epoch: 500, loss:0.2920,  val_loss:0.2993,  
....................................................................................................
Epoch: 600, loss:0.2756,  val_loss:0.2716,  
........................................................................

In [20]:
""" 
    Model01
    prev-err: 10.676
    added initializers and regularizers to all of the layers
    changed decay_rate from 0.6 to 0.5
    using batchNormalization
"""
history = model.fit(norm_X, y, epochs=EPOCHS,
          verbose=0, validation_split=0.20,
          callbacks=[early_stop, tfdocs.modeling.EpochDots()])
print()
validate(quantize(pd.DataFrame(model.predict(norm_X_test, verbose=0))[0]))


Epoch: 0, loss:23061.1875,  val_loss:7358.2192,  
....................................................................................................
Epoch: 100, loss:0.7990,  val_loss:0.8313,  
....................................................................................................
Epoch: 200, loss:0.4338,  val_loss:0.4407,  
....................................................................................................
Epoch: 300, loss:0.3621,  val_loss:0.3572,  
....................................................................................................
Epoch: 400, loss:0.3286,  val_loss:0.3325,  
....................................................................................................
Epoch: 500, loss:0.3178,  val_loss:0.3448,  
....................................................................................................
Epoch: 600, loss:0.2939,  val_loss:0.3007,  
.........................................................................

In [26]:
b011 = load_bench_data(file_name='011978.csv', root='./submissions/')['SalePrice']
pred_y = quantize(pd.DataFrame(model.predict(norm_X_test, verbose=0))[0])

from sklearn.metrics import mean_squared_log_error as MSLE,mean_absolute_error as MAE
print(MSLE(b011, pred_y))
print(MAE(b011, pred_y))

0.0067319995507884095
11134.41652298235
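For reading these numbers: MSLE is the mean of the squared differences of `log(1 + y)`, so its square root is roughly a typical relative error. For example, sqrt(0.00673) ≈ 0.082, i.e. the predictions differ from the 011978 benchmark by roughly 8% in a root-mean-square sense on the log scale. A minimal check of the formula against scikit-learn, with made-up prices; `msle` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

def msle(y_true, y_pred):
    """Mean squared logarithmic error: mean of (log(1 + y) - log(1 + y_hat)) ** 2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))

y_true = np.array([200000.0, 150000.0])  # made-up prices
y_pred = np.array([210000.0, 140000.0])
manual = msle(y_true, y_pred)
print(manual, mean_squared_log_error(y_true, y_pred), np.sqrt(manual))
```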


In [50]:
def validate(y_pred):
    """ Prints out the data validation with respect to the highest submissions. """
    from sklearn.metrics import mean_absolute_error as MAE
    from sklearn.metrics import mean_squared_log_error as MSLE
    # Import the base_validation submititions
    b012 = load_bench_data(file_name='012008.csv', root='./submissions/')['SalePrice']
    b011 = load_bench_data(file_name='011978.csv', root='./submissions/')['SalePrice']
    
    # Print out the differences
    print('MAEs:')
    print('b011:', int(MAE(b011, y_pred)) / 1000)
    print('b012:', int(MAE(b012, y_pred)) / 1000)
    print('-----------------------------------')
    print('base-differences:', int(MAE(b011, b012)) / 1000)
    print('###############################################')
    print('Lograithmic Error:')
    print('MSLE:')
    print('b011:', MSLE(b011, y_pred))
    print('b012:', MSLE(b012, y_pred))
    print('-----------------------------------')
    print('base-differences:', MSLE(b011, b012))

In [23]:
train_stop = EarlyStopping(monitor='loss', patience=5, mode='min', restore_best_weights=True)
# Fit on the whole training data (no validation split), then predict
history = model.fit(norm_X, y, epochs=4000,
          verbose=0,
          callbacks=[train_stop, tfdocs.modeling.EpochDots()])
print()
predictions = quantize(pd.DataFrame(model.predict(norm_X_test, verbose=0))[0])
# See how accurate it is
validate(predictions)


Epoch: 0, loss:0.2575,  
.......
b011: 11.134
b012: 12.177
-----------------------------------
base-differences: 5.76


In [90]:
output = pd.DataFrame({'Id': test.Id,
                      'SalePrice': quantize(pd.DataFrame(model.predict(norm_X_test, verbose=0))[0])})
output.to_csv('submissions/submission.csv', index=False)