## One-dimensional examples: neural networks and manifolds

In [1]:
import numpy as np
from numpy.random import uniform

import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import Input
from keras.layers.core import Dense, Activation
from keras.optimizers import SGD, RMSprop

from bokeh.layouts import column
from bokeh.models import CustomJS, ColumnDataSource, Slider
from bokeh.plotting import figure, output_notebook, show

Using Theano backend.


In [2]:
#embed figures in the notebook
output_notebook()

### Separable 1D example

This is a simple one-dimensional data set that is **linearly** separable and hence does not require a hidden layer. A single sigmoid unit is enough, which is equivalent to a 1D perceptron.

In [12]:
x1 = uniform(-1.0, -0.001, 30)
x2 = uniform(0.001, 1.0, 30)

labels1 = np.zeros(x1.shape)
labels2 = np.ones(x2.shape)

x = np.concatenate([x1, x2])
labels = np.concatenate([labels1, labels2])

In [13]:
p = figure(plot_width=400, plot_height=100)
p.circle(x1, np.zeros(x1.shape), size=20, color="red", alpha=0.5)
p.circle(x2, np.zeros(x2.shape), size=20, color="blue", alpha=0.5)
show(p)

In [4]:
classifier_1d = Sequential([Dense(1, input_shape=(1,), activation='sigmoid')])
classifier_1d.compile(optimizer=RMSprop(lr=1.0), loss='binary_crossentropy', metrics=['accuracy'])

In [5]:
classifier_1d.fit(x, labels, batch_size=1, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1135ec990>

In [6]:
classifier_1d.get_weights()

[array([[ 72.12062073]], dtype=float32), array([ 0.08075394], dtype=float32)]
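One way to read these weights: a single sigmoid unit computes `sigmoid(w*x + b)`, so its output crosses 0.5 where `w*x + b = 0`, i.e. at `x = -b/w`. The following is a minimal NumPy sketch, not part of the original notebook (the names `sigmoid` and `boundary` are illustrative), that plugs in the values printed above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, the activation used by the single Dense unit."""
    return 1.0 / (1.0 + np.exp(-z))

#weights copied from the get_weights() output above (kernel, bias)
w = 72.12062073
b = 0.08075394

#the unit outputs 0.5 where w*x + b = 0, so the boundary sits at -b/w
boundary = -b / w
print("decision boundary at x =", boundary)

#points on either side of the boundary are pushed towards 0 or 1
for x in (-0.5, -0.01, 0.01, 0.5):
    print(x, sigmoid(w * x + b))
```

The boundary lands very close to zero, which is where the two classes meet, and the large weight makes the sigmoid behave almost like a step function.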

Next we generate some test data to verify that the neural network classifies it correctly.

In [7]:
x1_test = uniform(-1.0, -0.001, 5)
x2_test = uniform(0.001, 1.0, 5)

labels1_test = classifier_1d.predict(x1_test)
labels2_test = classifier_1d.predict(x2_test)

print("prediction x1_test")
print(labels1_test) #this should give zero
print("prediction x2_test")
print(labels2_test) #this should give one

prediction x1_test
[[  1.77056131e-06]
 [  4.31603850e-29]
 [  2.47026917e-23]
 [  3.02348882e-01]
 [  1.31669779e-31]]
prediction x2_test
[[ 1.        ]
 [ 1.        ]
 [ 0.99777007]
 [ 1.        ]
 [ 1.        ]]
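The network outputs probabilities rather than hard labels. A small sketch of turning them into class labels follows, assuming a cut-off of 0.5 (a common convention, not something the notebook specifies); the arrays are simply the values printed above.

```python
import numpy as np

#predictions copied from the printed output above
pred_x1 = np.array([1.77056131e-06, 4.31603850e-29, 2.47026917e-23,
                    3.02348882e-01, 1.31669779e-31])
pred_x2 = np.array([1.0, 1.0, 0.99777007, 1.0, 1.0])

threshold = 0.5  #assumed cut-off between class 0 and class 1

classes_x1 = (pred_x1 > threshold).astype(int)
classes_x2 = (pred_x2 > threshold).astype(int)

print(classes_x1)
print(classes_x2)
```

All ten points end up in the expected class. The fourth `x1_test` prediction (about 0.30) is the least confident one, presumably because that sample lies close to the decision boundary.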


### Non-separable 1D example

This is a one-dimensional data set that is **not** linearly separable (cf. Christopher Olah's blog). Hence it requires a layer with at least **two** units to transform the data onto a manifold in **two-dimensional** space, where it becomes separable.

First we will try to classify it with the same one-unit network architecture as in the previous example. Training will not converge to a correct classifier.
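The reason a single unit cannot succeed is that `sigmoid(w*x + b)` is monotonic in `x`, so thresholding it amounts to one cut on the line. The sketch below is an illustration in plain NumPy, not the notebook's Keras model. It uses data drawn in the same way as in this section (with a fixed seed chosen only for reproducibility) and brute-forces every single-threshold classifier.

```python
import numpy as np

rng = np.random.default_rng(0)  #fixed seed, only for reproducibility

inner_bound, outer_bound = 1.0 / 3.0, 2.0 / 3.0
x_inner = rng.uniform(-inner_bound, inner_bound, 30)            #class 0
x_outer = np.concatenate([rng.uniform(-1.0, -outer_bound, 15),
                          rng.uniform(outer_bound, 1.0, 15)])   #class 1

x = np.concatenate([x_inner, x_outer])
labels = np.concatenate([np.zeros(30), np.ones(30)])

best_accuracy = 0.0
for t in np.linspace(-1.1, 1.1, 221):
    #a monotonic unit can only say "class 1 above t" or "class 1 below t"
    for predict_one_above in (True, False):
        pred = (x > t) if predict_one_above else (x < t)
        best_accuracy = max(best_accuracy, np.mean(pred == labels))

print("best accuracy of any single threshold:", best_accuracy)
```

The best any single cut can do here is 0.75: it isolates one of the two outer groups and necessarily misclassifies the other.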

In [14]:
inner_bound = 1.0/3.0
outer_bound = 2.0/3.0

x1 = uniform(-inner_bound, inner_bound, 30)
x2 = np.concatenate([uniform(-1.0, -outer_bound, 15), uniform(outer_bound, 1.0, 15)])

In [15]:
p = figure(plot_width=400, plot_height=100)
p.circle(x1, np.zeros(x1.shape), size=20, color="red", alpha=0.5)
p.circle(x2, np.zeros(x2.shape), size=20, color="blue", alpha=0.5)
show(p)

In [16]:
label1 = np.zeros(x1.shape)
label2 = np.ones(x2.shape)

x = np.concatenate([x1, x2])
labels = np.concatenate([label1, label2])

In [17]:
classifier_1d_fail = Sequential([Dense(1, input_shape=(1,), activation='sigmoid')])
classifier_1d_fail.compile(optimizer=RMSprop(lr=0.01), loss='binary_crossentropy', metrics=['accuracy'])

In [18]:
classifier_1d_fail.fit(x, labels, batch_size=1, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x113a60110>

In order to classify successfully, we need a hidden layer with at least two units (adding more units and/or more layers will most likely improve performance). We will plot the evolution of the manifold transformation later on, so we define a callback that extracts and stores the weights of both layers after every batch during learning.

In [19]:
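class WeightsHistory(keras.callbacks.Callback):
    """Store the weights of the first two layers after every training batch."""
    def __init__(self, model):
        super(WeightsHistory, self).__init__()
        self.model = model

    def on_train_begin(self, logs=None):
        self.weights_layer0 = []
        self.weights_layer1 = []

    def on_batch_end(self, batch, logs=None):
        #get_weights() returns [kernel, bias] for a Dense layer
        self.weights_layer0.append(self.model.layers[0].get_weights())
        self.weights_layer1.append(self.model.layers[1].get_weights())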

In [30]:
classifier_1d_success = Sequential()
classifier_1d_success.add(Dense(2, input_shape=(1,), activation='tanh'))
classifier_1d_success.add(Dense(1, activation='sigmoid'))
classifier_1d_success.compile(optimizer=RMSprop(lr=0.1), loss='binary_crossentropy', metrics=['accuracy'])
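Before training, it may help to see that this architecture can in principle solve the problem. The sketch below uses hand-picked weights (an assumption for illustration, not what RMSprop will actually find) for two tanh units and one sigmoid output, written in plain NumPy. The function names `hidden` and `output` are hypothetical.

```python
import numpy as np

def hidden(x, a=10.0):
    """Two tanh units with hand-picked (not learned) weights and biases."""
    h1 = np.tanh(a * (x - 0.5))  #changes sign at x = +0.5
    h2 = np.tanh(a * (x + 0.5))  #changes sign at x = -0.5
    return np.stack([h1, h2], axis=1)

def output(h):
    """Sigmoid output unit whose boundary is the line h1 - h2 + 1 = 0."""
    z = 1.0 * h[:, 0] - 1.0 * h[:, 1] + 1.0
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)  #fixed seed, only for reproducibility
x_inner = rng.uniform(-1/3, 1/3, 30)                        #class 0
x_outer = np.concatenate([rng.uniform(-1.0, -2/3, 15),
                          rng.uniform(2/3, 1.0, 15)])       #class 1

pred_inner = output(hidden(x_inner)) > 0.5
pred_outer = output(hidden(x_outer)) > 0.5
accuracy = (np.sum(~pred_inner) + np.sum(pred_outer)) / 60.0
print("accuracy with hand-picked weights:", accuracy)
```

In the `(h1, h2)` plane the inner points sit near `(-1, 1)` and the outer points near `(-1, -1)` and `(1, 1)`, so one straight line separates them. The probabilities are not very confident with these unit-sized output weights; scaling them up would sharpen the output.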

In [31]:
weights_history = WeightsHistory(classifier_1d_success)
classifier_1d_success.fit(x, labels, batch_size=1, epochs=5, callbacks=[weights_history])

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1151cd350>

During learning, the training data are made linearly separable on a manifold with higher dimensionality (in this case 2D). To visualize this evolution, we can plot the output of the first (hidden) layer together with the separation boundary, which is given by the weights of the output layer. To get the output of the first layer at a given training step, we set the stored weights in the model and evaluate a Keras backend function.

Some things you can try in order to explore the impact of different parameters on the evolution of the manifold during learning:
* Turn off the bias in the first layer.
* Randomize the input (the default sampling is already random, but this may improve performance).
* Try Stochastic Gradient Descent (SGD) instead of RMSprop.
* Add more layers with a variable number of units. Just make sure that the last layer before the output layer has two units; otherwise you cannot visualize it in 2D.
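As a hint for the first item in the list: without a bias in the first Dense layer, each hidden unit computes `tanh(w*x)`, which is an odd function of `x`. The output pre-activation is then an odd function plus a constant, and such a function cannot be negative at both `x = ±0.1` and positive at both `x = ±0.8`. The sketch below checks this numerically with random weights instead of training; it uses NumPy only, and the probe points and weight scales are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  #fixed seed, only for reproducibility

#four probe points, symmetric about zero: two inner (class 0), two outer (class 1)
x = np.array([-0.8, -0.1, 0.1, 0.8])
labels = np.array([1, 0, 0, 1])

best_correct = 0
for _ in range(10000):
    w = rng.normal(scale=5.0, size=2)    #first-layer weights, no bias
    v = rng.normal(scale=5.0, size=2)    #output-layer weights
    c = rng.normal(scale=5.0)            #output-layer bias
    z = np.tanh(np.outer(x, w)) @ v + c  #pre-activation of the sigmoid output
    pred = (z > 0).astype(int)           #same as sigmoid(z) > 0.5
    best_correct = max(best_correct, int(np.sum(pred == labels)))

print("best number of correct probes out of 4:", best_correct)
```

No random draw classifies all four probes correctly, which suggests that the first-layer bias is what lets the two tanh units switch at different positions along the line.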

To visualize the evolution, we will make an interactive plot in Bokeh, using a slider to step through the manifold transformation during learning.
For this we have to store all the relevant data in sources that can be accessed in the slider callback.
Note that there is probably a more efficient way of implementing this using a Bokeh server, but we will do it here with a simple JavaScript callback.

So first we compute all the transformations and store them in dictionaries.

In [22]:
grid = np.array([np.linspace(-1.0, 1.0, num=50)]).transpose()

x_limits = (-1.0, 1.0)
y_limits = (-1.0, 1.0)

data_train = {'x1': [], 'y1': [],
              'x2': [], 'y2': []} 

data_grid = {'x': [], 'y': []}    

data_sep = {'x': [], 'y': []}

data_all = {'x_grid': [], 'y_grid': [],
            'x1': [], 'y1': [],
            'x2': [], 'y2': [],
            'x_sep': [], 'y_sep': []}        

get_first_layer_output = K.function([classifier_1d_success.layers[0].input],
                                    [classifier_1d_success.layers[0].output])

for w_L0, w_L1 in zip(weights_history.weights_layer0, weights_history.weights_layer1):
    classifier_1d_success.layers[0].set_weights(w_L0)
    
    layer_output1 = get_first_layer_output([np.array([x1]).transpose()])[0]
    layer_output2 = get_first_layer_output([np.array([x2]).transpose()])[0]
    layer_output = get_first_layer_output([grid])[0]
    
    #lambda to create end points of the separation boundary in the plot.
    #boundary is given by: w[0]*x + w[1]*y + bias = 0.
    #lambda computes y for point on the boundary given x.
    make_lin_sep = lambda x : -(w_L1[1][0] + w_L1[0][0,0] * x) / w_L1[0][1,0] 
    
    data_all['x1'].append(layer_output1[:,0])
    data_all['y1'].append(layer_output1[:,1])
    data_all['x2'].append(layer_output2[:,0])
    data_all['y2'].append(layer_output2[:,1])
    data_all['x_grid'].append(layer_output[:,0])
    data_all['y_grid'].append(layer_output[:,1])
    data_all['x_sep'].append(x_limits)
    data_all['y_sep'].append(np.array([make_lin_sep(x_limits[0]), make_lin_sep(x_limits[1])]))
    
#initialize with first data    
data_train['x1'] = data_all['x1'][0]
data_train['y1'] = data_all['y1'][0]
data_train['x2'] = data_all['x2'][0]
data_train['y2'] = data_all['y2'][0]
data_grid['x'] = data_all['x_grid'][0]
data_grid['y'] = data_all['y_grid'][0]
data_sep['x'] = data_all['x_sep'][0]
data_sep['y'] = data_all['y_sep'][0]
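The separation boundary used above follows from setting the output unit's pre-activation to zero and solving for `y`. The following is a standalone sketch of that step as a named function (`boundary_y` is a hypothetical helper, and the weights are made up), using the `[kernel, bias]` layout that `get_weights()` returns for a Dense layer. It also guards against a nearly vertical boundary, which the lambda above does not handle.

```python
import numpy as np

def boundary_y(x, kernel, bias, eps=1e-12):
    """Return y on the line kernel[0,0]*x + kernel[1,0]*y + bias[0] = 0."""
    w_x, w_y = kernel[0, 0], kernel[1, 0]
    if abs(w_y) < eps:
        raise ValueError("boundary is (nearly) vertical; cannot solve for y")
    return -(bias[0] + w_x * x) / w_y

#made-up weights in the Dense(1) layout: kernel of shape (2, 1), bias of shape (1,)
kernel = np.array([[2.0], [-4.0]])
bias = np.array([1.0])

x_ends = np.array([-1.0, 1.0])
y_ends = boundary_y(x_ends, kernel, bias)
print(y_ends)

#both end points should satisfy the line equation
residual = kernel[0, 0] * x_ends + kernel[1, 0] * y_ends + bias[0]
print(residual)
```

Early in training the second output weight can be close to zero, in which case the plotted boundary end points fall far outside the visible range.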

Now we create the figure in Bokeh. Move the slider to see the evolution of the data transformation during learning.

In [24]:
source_train = ColumnDataSource(data=data_train)
source_grid = ColumnDataSource(data=data_grid)
source_sep = ColumnDataSource(data=data_sep)
source_all = ColumnDataSource(data=data_all)

p = figure(plot_width=400, plot_height=400, x_range=(-1.2, 1.2), y_range=(-1.2, 1.2))

p.circle('x1', 'y1', source=source_train, size=5, color='red', alpha=0.5)
p.circle('x2', 'y2', source=source_train, size=5, color='blue', alpha=0.5)
p.line('x', 'y', source=source_grid)
p.line('x', 'y', source=source_sep)

#note: "source.trigger('change')" should be replaced with "source.change.emit()" when using bokeh 0.12.6+
callback = CustomJS(args=dict(src_train=source_train, 
                              src_grid=source_grid, 
                              src_sep=source_sep,
                              src_all=source_all), code="""
                        var index = cb_obj.value
                        
                        var data_train = src_train.data
                        var data_grid = src_grid.data
                        var data_sep = src_sep.data
                        var data_all = src_all.data
                        
                        data_train['x1'] = data_all['x1'][index]
                        data_train['y1'] = data_all['y1'][index]
                        data_train['x2'] = data_all['x2'][index]
                        data_train['y2'] = data_all['y2'][index]
                        data_grid['x'] = data_all['x_grid'][index]
                        data_grid['y'] = data_all['y_grid'][index]
                        data_sep['x'] = data_all['x_sep'][index]
                        data_sep['y'] = data_all['y_sep'][index]
                        
                        src_train.trigger('change')
                        src_grid.trigger('change')
                        src_sep.trigger('change')
                    """)

max_val = len(data_all['x_grid'])-1                
slider = Slider(start=0, end=max_val, value=0, step=1)
slider.js_on_change('value', callback)

show(column(slider,p))

Finally, we generate some test data to verify that the model classifies it successfully.

In [25]:
x1_test = uniform(-1.0, -outer_bound, 5)
x2_test = uniform(-inner_bound, inner_bound, 5)
x3_test = uniform(outer_bound, 1.0, 5)

labels1_test = classifier_1d_success.predict(x1_test)
labels2_test = classifier_1d_success.predict(x2_test)
labels3_test = classifier_1d_success.predict(x3_test)

print("prediction x1_test")
print(labels1_test) #should give one
print("prediction x2_test")
print(labels2_test) #should give zero
print("prediction x3_test")
print(labels3_test) #should give one

prediction x1_test
[[ 0.88146764]
 [ 0.80083561]
 [ 0.71486747]
 [ 0.93030071]
 [ 0.74744898]]
prediction x2_test
[[ 0.01056369]
 [ 0.00789942]
 [ 0.00746443]
 [ 0.02158814]
 [ 0.00641478]]
prediction x3_test
[[ 0.92977649]
 [ 0.93910033]
 [ 0.9429183 ]
 [ 0.94571906]
 [ 0.94741213]]
