Kaggle: House Price Prediction
- Dropped 'Id'.
- One-hot encoded all non-numerical features.
- Replaced all non-numerical features' NaN with 'None' and one-hot encoded them.
- Filled all numerical NaN with the mean of the column.
- Rebased year data on the minimum of that column (the full pass is sketched below).
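A minimal sketch of this first cleaning pass, assuming the standard Kaggle train.csv; the year column names are illustrative, not taken from the notes.

```python
import pandas as pd

train = pd.read_csv('train.csv')           # assumed Kaggle training file
train = train.drop(columns=['Id'])         # drop 'Id'

# replace categorical NaN with 'None', then one-hot encode
cat_cols = train.select_dtypes(exclude='number').columns
train[cat_cols] = train[cat_cols].fillna('None')
train = pd.get_dummies(train, columns=list(cat_cols))

# fill numerical NaN with the column mean
num_cols = train.select_dtypes(include='number').columns
train[num_cols] = train[num_cols].fillna(train[num_cols].mean())

# rebase year columns on the column minimum (illustrative column names)
for col in ['YearBuilt', 'YearRemodAdd', 'GarageYrBlt', 'YrSold']:
    if col in train.columns:
        train[col] = train[col] - train[col].min()
```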
- Data contains outliers.
- Some numerical features are categorical.
- Filling numerical data with means is not a good approach because:
  - Numerical features that contain NaN usually do so because the house does not have that feature.
  - Outliers affect the means greatly.
- Target column 'SalePrice' is not normally distributed.
- Features that are highly correlated have a repeated impact on the model.
- One-hot encoded all categorical features.
- Normalized the SalePrice distribution toward a normal curve by taking:
train['LogSalePrice'] = np.log(train['SalePrice'])
Use:
train['SalePrice'] = np.exp(train['LogSalePrice'])
to return to the original distribution.
- Removed one feature from each pair of features with a correlation above 0.8, based on the distribution graph. The feature with the higher correlation with 'SalePrice' out of the two is removed (see the sketch after this list).
- Filled all numerical feature NaN with 0.
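A minimal sketch of the correlation-based drop described above, assuming a pandas DataFrame `train`; it mirrors the rule stated in these notes (the member of the pair more correlated with 'SalePrice' is dropped) and the 0.8 threshold.

```python
import numpy as np

def drop_correlated(df, target='SalePrice', threshold=0.8):
    # absolute pairwise correlations between numeric columns
    corr = df.select_dtypes(include=np.number).corr().abs()
    cols = [c for c in corr.columns if c != target]
    to_drop = set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in to_drop or b in to_drop:
                continue
            if corr.loc[a, b] > threshold:
                # per the notes above: drop the feature with the *higher*
                # correlation to the target out of the two
                to_drop.add(a if corr.loc[a, target] > corr.loc[b, target] else b)
    return df.drop(columns=list(to_drop))

# train = drop_correlated(train)
```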
All credit for these methods goes to @Golden and her notebook.
- Filled all numerical NaN with 0.
- Filled all categorical NaN with 'None'.
- Removed the outliers recommended by the author:
train = train[train['GrLivArea'] < 4000]
- Normalized SalePrice.
- One-hot encoded all categorical features (the full pass is sketched below).
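A sketch of @Golden's cleaning steps as listed above, under the same train.csv assumption; the log transform reuses 'LogSalePrice' from the earlier section.

```python
import numpy as np
import pandas as pd

train = pd.read_csv('train.csv')                        # assumed Kaggle training file

num_cols = train.select_dtypes(include='number').columns
cat_cols = train.select_dtypes(exclude='number').columns
train[num_cols] = train[num_cols].fillna(0)             # numerical NaN -> 0
train[cat_cols] = train[cat_cols].fillna('None')        # categorical NaN -> 'None'

train = train[train['GrLivArea'] < 4000]                # outliers recommended by the author
train['LogSalePrice'] = np.log(train['SalePrice'])      # normalize SalePrice

train = pd.get_dummies(train, columns=list(cat_cols))   # one-hot encode categoricals
```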
- Used hyperparameter tuning to tune a scikit-learn linear regression model (a tuning sketch follows the scores below).
- Used polynomial features to expand the feature space.
- Used Root Mean Squared Error (RMSE) as the loss function since it is the metric the competition is evaluated on.
- First-degree polynomial features showed the best result.
- Optimal alpha is less than 1000.
- Scores:
  - Data from cleaning approach 1: 0.24922.
  - Data from cleaning approach 3: 0.31011.
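A sketch of the tuning setup for this step; a Ridge regressor is assumed here only because an alpha is tuned (the notes just say "linear regression model"), and the degree/alpha grids are illustrative.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

pipe = Pipeline([
    ('poly', PolynomialFeatures(include_bias=False)),
    ('model', Ridge()),
])
param_grid = {
    'poly__degree': [1, 2, 3],             # first degree gave the best result
    'model__alpha': [1, 10, 100, 1000],    # optimal alpha was below 1000
}
# RMSE (negated, since scikit-learn maximizes scores) to match the competition metric
search = GridSearchCV(pipe, param_grid, scoring='neg_root_mean_squared_error', cv=5)
# X, y are assumed to be the cleaned features and the (log) SalePrice target
# search.fit(X, y)
```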
- Implemented RMSE for both the default 'SalePrice' and 'LogSalePrice':

from keras import backend as K

def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

def exp_root_mean_squared_error(y_true, y_pred):
    # exponentiate first so the error is measured on the original SalePrice scale
    return K.sqrt(K.mean(K.square(K.exp(y_pred) - K.exp(y_true))))
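Either function can be passed directly to Keras as a custom loss, e.g. model.compile(optimizer='adam', loss=root_mean_squared_error).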
- Established a baseline model:
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 512) 207872
_________________________________________________________________
re_lu_1 (ReLU) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 512) 262656
_________________________________________________________________
re_lu_2 (ReLU) (None, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 512) 262656
_________________________________________________________________
re_lu_3 (ReLU) (None, 512) 0
_________________________________________________________________
dense_4 (Dense) (None, 1) 513
=================================================================
Total params: 733,697
Trainable params: 733,697
Non-trainable params: 0
_________________________________________________________________
- 3 layers of 512 Dense ReLU neurons and one output neuron.
- Trained until 'val_loss' stopped decreasing for 50 epochs (early stopping).
- Default 'adam' optimizer.
- RMSE (root_mean_squared_error) as the loss (see the sketch below).
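A minimal Keras sketch of this baseline; the input width of 405 is inferred from the 207,872 parameters of dense_1 (512 × (405 + 1)), and the validation split / epoch cap are illustrative.

```python
from keras import backend as K
from keras.callbacks import EarlyStopping
from keras.layers import Dense, ReLU
from keras.models import Sequential

def root_mean_squared_error(y_true, y_pred):          # same loss as defined above
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

model = Sequential([
    Dense(512, input_dim=405), ReLU(),                # 405 features inferred from the summary
    Dense(512), ReLU(),
    Dense(512), ReLU(),
    Dense(1),
])
model.compile(optimizer='adam', loss=root_mean_squared_error)

# stop once 'val_loss' has not improved for 50 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=50)
# model.fit(X_train, y_train, validation_split=0.2, epochs=10000, callbacks=[early_stop])
```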
- Structures:
  - Increase / decrease the number of neurons in each layer.
  - Increase / decrease the depth of the network.
- Activation functions:
  - Sigmoid.
  - Default LeakyReLU (alpha = 0.1).
  - LeakyReLU with alpha = 0.5.
- Optimizers:
  - Adam with increased / decreased learning rates.
  - Default SGD (these variants are sketched below).
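A sketch of how the variations above can be swapped in, reusing the baseline shape; the LeakyReLU alphas and learning rates are the ones listed in these notes, the optimizer argument names follow recent Keras versions, and the sigmoid variant would swap the activation layer in the same way.

```python
from keras import backend as K
from keras.layers import Dense, LeakyReLU
from keras.models import Sequential
from keras.optimizers import SGD, Adam

def root_mean_squared_error(y_true, y_pred):          # loss defined earlier in these notes
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

def build_variant(n_features=405, units=512, depth=3, alpha=0.1, optimizer='adam'):
    """Baseline shape with configurable width, depth, LeakyReLU alpha and optimizer."""
    model = Sequential()
    model.add(Dense(units, input_dim=n_features))
    model.add(LeakyReLU(alpha=alpha))
    for _ in range(depth - 1):
        model.add(Dense(units))
        model.add(LeakyReLU(alpha=alpha))
    model.add(Dense(1))
    model.compile(optimizer=optimizer, loss=root_mean_squared_error)
    return model

# variants tried above:
# build_variant(alpha=0.5)                             # LeakyReLU alpha = 0.5
# build_variant(optimizer=Adam(learning_rate=0.0001))  # smaller Adam learning rate
# build_variant(optimizer=SGD())                       # default SGD
```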
- Baseline score: 0.21801.
- Structures:
  - Increasing the model size and number of neurons resulted in the exact same score.
  - Decreasing them resulted in a significantly worse score.
- Activation functions:
  - Sigmoid did not converge within 10000 epochs.
  - Default LeakyReLU resulted in a slightly better score: 0.21259.
  - LeakyReLU with alpha = 0.5 performed worse than the default, scoring 0.21337.
- Optimizer:
  - The most optimal Adam, learning_rate = 0.0001, scored 0.21106.
  - SGD did not converge within 10000 epochs.
- Combined model:
  - Parameters: default LeakyReLU, Adam with learning_rate = 0.0001, 3 layers of 512 neurons.
  - Score: 0.21406; somehow the combination of these increased (worsened) the score.
This approach is built upon @Golden's notebook.
- Used hyperparameter tuning to tune a Lasso regression model (see the sketch after this list).
- Golden's parameters scored 0.11888.
- Hyper-tuned best parameters:
Lasso(alpha = 0.0005, fit_intercept = True, normalize = False)
- Score: 0.11744.
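A sketch of the Lasso tuning for this approach, assuming a grid search over alpha with RMSE scoring; the grid itself is illustrative, and `normalize` (used in the notebook) has since been removed from scikit-learn.

```python
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

param_grid = {
    'alpha': [0.0001, 0.0005, 0.001, 0.005, 0.01],   # illustrative grid around the best value
    'fit_intercept': [True, False],
}
search = GridSearchCV(Lasso(), param_grid, scoring='neg_root_mean_squared_error', cv=5)
# X, y are assumed to be the Golden-style cleaned features and 'LogSalePrice'
# search.fit(X, y)
# search.best_estimator_  # the notes report Lasso(alpha=0.0005, fit_intercept=True, normalize=False)
```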
- A neural network is not an all-powerful solution.
- Better data cleaning and feature engineering with a simple model can result in a much better model than a neural network.
- The complexity of this data is manageable by humans, so careful data cleaning and feature engineering should be done.
- Traditional approaches should be considered before deep learning for this type of data.