# Neural network regression on the Boston housing dataset SOLUTION
We will need a few libraries to build this neural network:
* Numpy and Pandas for data handling
* Matplotlib for visualisation
* Keras for the DL functionality:
    * models for the sequential (standard) DL frame
    * layers for the standard dense layers
    * datasets for the Boston housing dataset

In [145]:
import pandas as pd
import numpy as np
from keras import models, layers, datasets
import matplotlib.pyplot as plt
%matplotlib notebook

## Data Prep
Let's pull down our data and get it ready. The data doesn't come with its column headers, but they are:
* CRIM: This is the per capita crime rate by town
* ZN: This is the proportion of residential land zoned for lots larger than 25,000 sq.ft.
* INDUS: This is the proportion of non-retail business acres per town.
* CHAS: This is the Charles River dummy variable (this is equal to 1 if tract bounds river; 0 otherwise)
* NOX: This is the nitric oxides concentration (parts per 10 million)
* RM: This is the average number of rooms per dwelling
* AGE: This is the proportion of owner-occupied units built prior to 1940
* DIS: This is the weighted distances to five Boston employment centers
* RAD: This is the index of accessibility to radial highways
* TAX: This is the full-value property-tax rate per \$10,000
* PTRATIO: This is the pupil-teacher ratio by town
* B: This is calculated as $1000(B_k - 0.63)^2$, where $B_k$ is the proportion of people of African American descent by town
* LSTAT: This is the percentage of the population classed as lower status
* MEDV: This is the median value of owner-occupied homes in \$1000s

MEDV is the y value for this analysis.

As the magnitudes and ranges of the columns differ considerably, we need to rescale them. We can do this with either a standardisation or a normalisation.

A standardisation ($z$) rescales every data series to a mean of 0 and a standard deviation of 1, using the series' mean $\bar{x}$ and standard deviation $\sigma$.

A normalisation ($x$) rescales every data series into the range 0 to 1, using its minimum and maximum.

The equations are as follows:

$$ z = \frac{x_i-\bar{x}}{\sigma}$$

$$ x = \frac{x_i - x_{min}}{x_{max}-x_{min}} $$
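To make the two formulas concrete, here is a minimal sketch that applies both to a small made-up column. The values in `x` are hypothetical and are not taken from the dataset; `x_norm` is just an illustrative name for the normalised result.

```python
import numpy as np

# Hypothetical toy column (not real data from the housing set)
x = np.array([200.0, 300.0, 400.0, 700.0])

# Standardisation: subtract the mean, divide by the population standard deviation
z = (x - x.mean()) / x.std()

# Normalisation: subtract the minimum, divide by the range
x_norm = (x - x.min()) / (x.max() - x.min())

print(z)       # centred on 0, with standard deviation 1
print(x_norm)  # smallest value maps to 0, largest to 1
```

Standardised values are unbounded, so an outlier stays visible as a large $z$, whereas normalisation squeezes everything into a fixed interval set by the extremes.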

In [52]:
(X_train, y_train), (X_test, y_test) = datasets.boston_housing.load_data()

columns = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]

train_df = pd.DataFrame(X_train, columns=columns)

test_df = pd.DataFrame(X_test, columns=columns)

In [53]:
def standardise_a_series(series):
    """Rescale a pandas Series to zero mean and unit standard deviation."""
    # ddof=0 gives the population standard deviation, matching np.std's default
    mean = series.mean()
    sigma = series.std(ddof=0)

    # Vectorised z-score: (x - mean) / sigma for every element at once
    return (series - mean) / sigma

In [54]:
train_df = train_df.apply(standardise_a_series)
test_df = test_df.apply(standardise_a_series)
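
Note that the cell above standardises each split with its own mean and $\sigma$. A common alternative, sketched below, is to compute the statistics on the training split only and reuse them for the test split, so that both pass through the identical transformation. This is an illustration of that alternative rather than what this notebook does; the `*_demo` arrays are randomly generated stand-ins, not the real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for X_train and X_test (3 features each)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(404, 3))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(102, 3))

# Statistics come from the training split only
train_mean = X_train_demo.mean(axis=0)
train_sigma = X_train_demo.std(axis=0)

# The same shift and scale are applied to both splits
X_train_std = (X_train_demo - train_mean) / train_sigma
X_test_std = (X_test_demo - train_mean) / train_sigma
```

With this approach the test columns will not have exactly zero mean and unit variance, which is expected: the test set is treated as unseen data.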

In [55]:
train_df

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.272246 | -0.483615 | -0.435762 | -0.256833 | -0.165227 | -0.176443 | 0.813062 | 0.116698 | -0.626249 | -0.595170 | 1.148500 | 0.448077 | 0.825220 |
| 1 | -0.403427 | 2.991784 | -1.333912 | -0.256833 | -1.215182 | 1.894346 | -1.910361 | 1.247585 | -0.856463 | -0.348433 | -1.718189 | 0.431906 | -1.329202 |
| 2 | 0.124940 | -0.483615 | 1.028326 | -0.256833 | 0.628642 | -1.829688 | 1.110488 | -1.187439 | 1.675886 | 1.565287 | 0.784476 | 0.220617 | -1.308500 |
| 3 | -0.401494 | -0.483615 | -0.869402 | -0.256833 | -0.361560 | -0.324558 | -1.236672 | 1.107180 | -0.511142 | -1.094663 | 0.784476 | 0.448077 | -0.652926 |
| 4 | -0.005634 | -0.483615 | 1.028326 | -0.256833 | 1.328612 | 0.153642 | 0.694808 | -0.578572 | 1.675886 | 1.565287 | 0.784476 | 0.389882 | 0.263497 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 399 | -0.381973 | -0.483615 | -0.616568 | -0.256833 | -0.933487 | -0.938177 | -0.251223 | 1.157680 | -0.741356 | -1.040501 | -0.262093 | 0.448077 | 0.477421 |
| 400 | -0.388221 | 0.358906 | -0.609218 | -0.256833 | -0.796907 | -0.038202 | -1.888860 | 0.339660 | -0.741356 | -1.100681 | 0.056428 | 0.448077 | -0.848908 |
| 401 | -0.402030 | 0.990797 | -0.741515 | -0.256833 | -1.019702 | -0.333021 | -1.638018 | 1.430403 | -0.971569 | -0.613224 | -0.717123 | 0.079439 | -0.677769 |
| 402 | -0.172920 | -0.483615 | 1.245881 | -0.256833 | 2.677335 | -0.787241 | 1.056737 | -1.044075 | -0.511142 | -0.017443 | -1.718189 | -0.987644 | 0.420835 |


## Model

To build this model we will create a function that holds the model components. Wrapping the construction in a function means every call returns a fresh model, so rerunning a cell cannot accidentally stack duplicate layers onto an existing one.

We're using the "adam" optimiser - more on this later - and Mean Squared Error (MSE) for the loss. MSE is an L2 statistical metric: it squares each error, so large misses are penalised more heavily than under the Mean Absolute Error (MAE), an L1 metric that we track as an additional metric.

We will capture the training run in the "history" variable, which records the loss and metric values for each epoch so we can see how the model's performance changes over time.
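
The difference between the two error measures is easiest to see on a few numbers. The sketch below uses made-up targets and predictions (hypothetical values, in \$1000s) where one prediction is badly off.

```python
import numpy as np

# Hypothetical targets and predictions; the last prediction misses by 10
y_true = np.array([20.0, 22.0, 24.0, 50.0])
y_pred = np.array([21.0, 21.0, 25.0, 40.0])

errors = y_pred - y_true

mse = np.mean(errors ** 2)      # L2: squares each error before averaging
mae = np.mean(np.abs(errors))   # L1: averages the absolute errors

print("MSE:", mse)
print("MAE:", mae)
```

The single large miss contributes 100 of the 103 squared-error units but only 10 of the 13 absolute-error units, which is why training on MSE pushes the model hardest on its worst predictions.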

In [140]:
def make_model():

    model = models.Sequential()

    model.add(layers.Dense(13, input_dim=13, activation="relu"))
    model.add(layers.Dense(6, activation="relu"))
    model.add(layers.Dense(1))

    model.compile(loss="mse", optimizer="adam", metrics=["mae"])
    
    return model
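
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-output pair plus one bias per unit. The sketch below does this arithmetic for the layer sizes used in `make_model()`; calling `first_model.summary()` on a built model should report the same figures.

```python
# Layer sizes from make_model(): 13 inputs -> 13 units -> 6 units -> 1 output
layer_sizes = [13, 13, 6, 1]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    params = n_in * n_out + n_out   # weights plus one bias per unit
    print(f"Dense({n_out}): {params} parameters")
    total += params

print("Total:", total)
```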

In [141]:
first_model = make_model()

In [142]:
train_df.values.shape

(404, 13)

In [143]:
y_train.shape

(404,)

In [144]:
history = first_model.fit(train_df.values, y_train, validation_data=(test_df.values, y_test), epochs=100, batch_size=10)

Train on 404 samples, validate on 102 samples
Epoch 1/100
...
Epoch 100/100

In [151]:

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

[Figure: Model loss - training and test loss per epoch]

In [150]:
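# Plot training & validation MAE values
# Older Keras records the metric as 'mean_absolute_error'; newer versions use 'mae'
mae_key = 'mae' if 'mae' in history.history else 'mean_absolute_error'
plt.plot(history.history[mae_key])
plt.plot(history.history['val_' + mae_key])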
plt.title('Model MAE')
plt.ylabel('MAE')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

[Figure: Model MAE - training and test MAE per epoch]

In [136]:
outputs = first_model.predict(test_df.values)

In [138]:
outputs.mean()

22.361946

In [139]:
y_train.mean()

22.395049504950492
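
Matching means only shows that the predictions are not biased on average, and the cells above compare the test predictions against the mean of `y_train` rather than `y_test`. A per-sample comparison against the test targets is more informative. The sketch below shows one way to do that; the arrays are hypothetical stand-ins for `y_test` and `outputs`, not real results.

```python
import numpy as np

# Hypothetical targets and predictions standing in for y_test and outputs
y_true = np.array([7.2, 18.8, 19.0, 27.0])
y_pred = np.array([[8.0], [18.0], [21.0], [26.0]])  # predict() returns shape (n, 1)

# Flatten first, otherwise (n, 1) - (n,) broadcasts to an (n, n) matrix
errors = y_pred.ravel() - y_true

mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

print("MAE:", mae)
print("RMSE:", rmse)
```

Both figures are in the units of the target (\$1000s), so they can be read directly as a typical prediction error.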