# Example 3-1: Predicting house prices



## The Boston Housing Price dataset


- Predict the median price of homes in a given Boston suburb in the mid-1970s
- 404 training samples and 102 test samples
- Each feature in the input data has a different scale

In [1]:
from keras.datasets import boston_housing

(train_data, train_targets), (test_data, test_targets) = boston_housing.load_data()

Using TensorFlow backend.


In [2]:
train_data.shape

(404, 13)

In [3]:
test_data.shape

(102, 13)


- 13 features of the input data

    1. Per capita crime rate.
    2. Proportion of residential land zoned for lots over 25,000 square feet.
    3. Proportion of non-retail business acres per town.
    4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
    5. Nitric oxides concentration (parts per 10 million).
    6. Average number of rooms per dwelling.
    7. Proportion of owner-occupied units built prior to 1940.
    8. Weighted distances to five Boston employment centres.
    9. Index of accessibility to radial highways.
    10. Full-value property-tax rate per $10,000.
    11. Pupil-teacher ratio by town.
    12. 1000 * (Bk - 0.63) ** 2 where Bk is the proportion of Black people by town.
    13. % lower status of the population.


- target: the median values of owner-occupied homes, in thousands of dollars

In [4]:
train_targets

array([15.2, 42.3, 50. , 21.1, 17.7, 18.5, 11.3, 15.6, 15.6, 14.4, 12.1,
       17.9, 23.1, 19.9, 15.7,  8.8, 50. , 22.5, 24.1, 27.5, 10.9, 30.8,
       32.9, 24. , 18.5, 13.3, 22.9, 34.7, 16.6, 17.5, 22.3, 16.1, 14.9,
       23.1, 34.9, 25. , 13.9, 13.1, 20.4, 20. , 15.2, 24.7, 22.2, 16.7,
       12.7, 15.6, 18.4, 21. , 30.1, 15.1, 18.7,  9.6, 31.5, 24.8, 19.1,
       22. , 14.5, 11. , 32. , 29.4, 20.3, 24.4, 14.6, 19.5, 14.1, 14.3,
       15.6, 10.5,  6.3, 19.3, 19.3, 13.4, 36.4, 17.8, 13.5, 16.5,  8.3,
       14.3, 16. , 13.4, 28.6, 43.5, 20.2, 22. , 23. , 20.7, 12.5, 48.5,
       14.6, 13.4, 23.7, 50. , 21.7, 39.8, 38.7, 22.2, 34.9, 22.5, 31.1,
       28.7, 46. , 41.7, 21. , 26.6, 15. , 24.4, 13.3, 21.2, 11.7, 21.7,
       19.4, 50. , 22.8, 19.7, 24.7, 36.2, 14.2, 18.9, 18.3, 20.6, 24.6,
       18.2,  8.7, 44. , 10.4, 13.2, 21.2, 37. , 30.7, 22.9, 20. , 19.3,
       31.7, 32. , 23.1, 18.8, 10.9, 50. , 19.6,  5. , 14.4, 19.8, 13.8,
       19.6, 23.9, 24.5, 25. , 19.9, 17.2, 24.6, 13

## Preparing the data

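- Feature-wise normalization
    - For each feature, subtract its mean and divide by its standard deviation, so that every feature ends up with mean 0 and standard deviation 1
    - Standardize both the train and test sets using only the mean and standard deviation computed on the train set, so that no information from the test set leaks into preprocessing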
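For reference, the transformation applied in the next cell can be written per feature $j$, with $\mu_j$ and $\sigma_j$ computed over the $N_\text{train} = 404$ training samples only (NumPy's `std` uses the $1/N$ form by default) and then applied unchanged to every training and test value:

$$
x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{N_\text{train}} \sum_{i \in \text{train}} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{N_\text{train}} \sum_{i \in \text{train}} \left(x_{ij} - \mu_j\right)^2}
$$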


In [5]:
mean = train_data.mean(axis=0)
train_data -= mean
std = train_data.std(axis=0)
train_data /= std

test_data -= mean
test_data /= std
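As a sanity check on the cell above, here is a small sketch on synthetic data (the arrays and the `standardize` helper are hypothetical stand-ins, not the Boston data). It shows that the training columns come out with mean 0 and standard deviation 1, while the test columns are only approximately centered because they are scaled with the training statistics rather than their own.

```python
import numpy as np

def standardize(train, test):
    """Scale each column using statistics computed on `train` only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # ddof=0, as in the notebook cell
    return (train - mean) / std, (test - mean) / std

rng = np.random.default_rng(0)
# Synthetic stand-ins whose two columns have very different scales
train = rng.normal(loc=[0.5, 300.0], scale=[0.1, 50.0], size=(404, 2))
test = rng.normal(loc=[0.5, 300.0], scale=[0.1, 50.0], size=(102, 2))

train_s, test_s = standardize(train, test)
print("train mean:", train_s.mean(axis=0).round(6))
print("train std: ", train_s.std(axis=0).round(6))
print("test mean: ", test_s.mean(axis=0).round(2))  # close to, but not exactly, 0
```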

## Building our network
<font color=blue>
TO DO: 
 
- Write a function named `build_model` that creates and compiles a model with two (hidden) layers.
- Inputs of `build_model`: the number of nodes and the type of optimizer
- Output of `build_model`: the compiled model



In [6]:
from keras import models
from keras import layers

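def build_model(optimizer='rmsprop', nodes=32):
    model = models.Sequential()
    # Two hidden layers of `nodes` ReLU units; input size = number of features (13)
    model.add(layers.Dense(nodes, activation='relu',
                           input_shape=(train_data.shape[1],)))
    model.add(layers.Dense(nodes, activation='relu'))
    # Single linear output unit for scalar regression
    model.add(layers.Dense(1))
    # MSE loss for regression; also track mean absolute error
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae'])
    return model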
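To get a feel for how `nodes` affects model size, the trainable parameter count of this architecture can be worked out by hand: each `Dense` layer has `inputs * units` weights plus `units` biases. The sketch below (with hypothetical helpers `dense_params` and `count_params`) assumes 13 input features; its totals should agree with what `model.summary()` reports for the same settings.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected (Dense) layer."""
    return n_in * n_out + n_out

def count_params(n_features=13, nodes=32):
    """Trainable parameters of the two-hidden-layer regressor built above."""
    return (dense_params(n_features, nodes)   # input -> hidden 1
            + dense_params(nodes, nodes)      # hidden 1 -> hidden 2
            + dense_params(nodes, 1))         # hidden 2 -> scalar output

print(count_params(nodes=32))  # 1537
print(count_params(nodes=64))  # 5121
```

Doubling `nodes` from 32 to 64 roughly triples the parameter count here, because the hidden-to-hidden layer grows quadratically in `nodes`.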

- Split the training data into training and validation sets

In [7]:
val_data = train_data[:80]
partial_train_data = train_data[80:]
val_targets = train_targets[:80]
partial_train_targets = train_targets[80:]
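The cell above holds out the first 80 training samples for validation and keeps the remaining 324 for training, without shuffling, which implicitly assumes the row order carries no structure. A minimal sketch of the same slicing on dummy arrays (`holdout_split` is a hypothetical helper) confirms the resulting shapes and that no sample lands in both sets.

```python
import numpy as np

def holdout_split(x, y, n_val):
    """Hold out the first `n_val` rows for validation (no shuffling)."""
    return (x[n_val:], y[n_val:]), (x[:n_val], y[:n_val])

x = np.arange(404 * 13, dtype=float).reshape(404, 13)  # stand-in for train_data
y = np.arange(404, dtype=float)                        # stand-in for train_targets

(x_tr, y_tr), (x_val, y_val) = holdout_split(x, y, n_val=80)
print(x_tr.shape, x_val.shape)  # (324, 13) (80, 13)
print(set(y_tr).isdisjoint(y_val))  # True: the two sets do not overlap
```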

## Compare batch sizes

<font color=blue>
    
TO DO: Using the `adam` optimizer with the number of nodes fixed at 64, vary the batch size between 1 and 50 and observe how the learning curves (training and validation loss) change.
- Use `partial_train_data` and `partial_train_targets` as the training set
- Use `val_data` and `val_targets` as the validation set
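One mechanical effect of the batch size is worth keeping in mind when reading the learning curves: Keras performs one weight update per mini-batch, so a smaller batch means more (and noisier) updates per epoch. The sketch below assumes the 324-sample `partial_train_data` and the 500 epochs used in the experiment cell; `updates_per_epoch` is a hypothetical helper.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Mini-batch updates in one pass over the data (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

n_train = 324  # 404 training samples minus the 80 held out for validation
epochs = 500   # as in the experiment cell

for batch_size in (1, 10, 50):
    steps = updates_per_epoch(n_train, batch_size)
    print(f"batch_size={batch_size:>2}: {steps:>3} updates/epoch, "
          f"{steps * epochs} updates over {epochs} epochs")
```

With `batch_size=1` the model receives 324 updates per epoch versus 7 with `batch_size=50`, so curves plotted against epochs are not directly comparable in terms of optimization steps.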


<font color=blue>
TO DO: Determine the best model in terms of validation loss, then train it on the full `train_data` and `train_targets` and print the loss and MAE on the test set.

In [None]:

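import os
import time
from keras.callbacks import ModelCheckpoint, TensorBoard
import keras.backend as K

batch_size = 1  # also try 10 and 50
K.clear_session()
# Filesystem-safe timestamp ("%c" contains spaces and colons)
now = time.strftime("%Y%m%d-%H%M%S")
os.makedirs('models', exist_ok=True)  # create the checkpoint directory up front
callbacks_list = [
#   EarlyStopping(monitor='val_loss', patience=1),
    # Save only the model with the lowest validation loss seen so far
    ModelCheckpoint(filepath='models/house_batchsize_' + str(batch_size) + '_' + now + '.h5',
                    monitor='val_loss', save_best_only=True),
    # Log learning curves (and weight histograms every epoch) for TensorBoard
    TensorBoard(log_dir='./logs/house/batchsize_' + str(batch_size) + '_' + now, histogram_freq=1)
]

model = build_model(optimizer='adam', nodes=64)
history = model.fit(partial_train_data, partial_train_targets, epochs=500, batch_size=batch_size, verbose=2,
                    validation_data=(val_data, val_targets), callbacks=callbacks_list)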


Comparing batch_size = 1, 10, and 50, the validation loss is lowest with batch_size=10, reaching its minimum around epoch 150. We refit the model with this best combination on the full training set and then evaluate it on the test set.
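Instead of reading the minimum off the TensorBoard curves, it can also be located programmatically: `model.fit` returns a `History` object whose `.history` attribute is a dict mapping names such as `'loss'` and `'val_loss'` to per-epoch lists. The sketch below uses a hypothetical helper `best_epoch` and made-up numbers shaped like `History.history`; with a real run one would keep `history = model.fit(...)` and call `best_epoch(history.history)`.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (1-based epoch, value) at which `key` is smallest."""
    values = history_dict[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Illustrative numbers only, not results from this notebook
example = {"loss": [30.1, 18.4, 12.9, 10.2],
           "val_loss": [28.7, 17.9, 15.2, 16.0]}
print(best_epoch(example))  # (3, 15.2)
```

Note that training loss keeps falling in the example while validation loss turns upward after epoch 3, which is the overfitting pattern that makes the validation minimum the natural stopping point.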

In [None]:
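# Refit the chosen configuration (adam, 64 nodes, batch_size=10, 150 epochs) on all training data
model = build_model(optimizer='adam', nodes=64)
model.fit(train_data, train_targets, epochs=150, batch_size=10, verbose=2)
test_mse, test_mae = model.evaluate(test_data, test_targets)
print('test loss (MSE): %.2f, test MAE: %.2f' % (test_mse, test_mae))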
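`model.evaluate` returns the loss followed by the compiled metrics, here MSE and MAE. Since the targets are in thousands of dollars, MSE is in squared thousands of dollars and hard to read directly; its square root (RMSE) and the MAE are in the targets' own units. A small sketch with a hypothetical helper `describe_errors` and illustrative inputs (not results from this notebook):

```python
import math

def describe_errors(test_mse, test_mae):
    """Express MSE/MAE on targets measured in $1,000s as dollar amounts."""
    rmse = math.sqrt(test_mse)  # back to the target's units ($1,000s)
    return {"rmse_dollars": rmse * 1000, "mae_dollars": test_mae * 1000}

# Illustrative inputs only
print(describe_errors(test_mse=16.0, test_mae=2.5))
# {'rmse_dollars': 4000.0, 'mae_dollars': 2500.0}
```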

## K-fold CV using scikit-learn

In [None]:
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.model_selection import cross_val_score 
from keras.wrappers.scikit_learn import KerasRegressor
from keras.callbacks import EarlyStopping, TensorBoard

In [None]:
model = KerasRegressor(build_fn=build_model, epochs=100, batch_size=50, verbose=0)  
kfold = KFold(n_splits=5, shuffle=True, random_state=7) 
results = cross_val_score(model, train_data, train_targets, cv=kfold, scoring='neg_mean_squared_error')


In [None]:
# cross_val_score returns *negated* MSE (greater is better), so flip the sign
print("MSE: %.2f (std %.2f)" % (-results.mean(), results.std()))
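Conceptually, `KFold(n_splits=5, shuffle=True)` shuffles the sample indices once, cuts them into 5 near-equal folds, and `cross_val_score` then trains a fresh model on 4 folds and scores it on the remaining one, 5 times. The plain-NumPy sketch below (hypothetical `kfold_indices`, not scikit-learn's implementation; its seeded shuffle will not reproduce `random_state=7` in `KFold`) shows the fold sizes for the 404 training samples and that every sample is validated exactly once.

```python
import numpy as np

def kfold_indices(n_samples, n_splits=5, seed=7):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)       # shuffle once up front
    folds = np.array_split(perm, n_splits)  # near-equal fold sizes
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
        yield train_idx, val_idx

splits = list(kfold_indices(404))
print([len(v) for _, v in splits])  # [81, 81, 81, 81, 80]
print([len(t) for t, _ in splits])  # [323, 323, 323, 323, 324]
```

Each model therefore trains on about 323 samples, slightly fewer than the 404 used for the final fit, which is one reason cross-validated scores tend to be a somewhat pessimistic estimate.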
