# Training a commute prediction network, and visualizing learning!  
<ul> latest version available from: https://github.com/miroenev/teach_DL , prerequisites:
* Matplotlib, Numpy, Keras, and <a href="https://github.com/K3D-tools/K3D-jupyter">K3D</a> for realtime training 3D surface visualization
* TensorFlow as the Keras backend for NN graph (queries model weights)

A video walkthrough of this notebook is <a href='https://youtu.be/HgbGJn9yz30'> available on YouTube</a>.

In [1]:
%matplotlib notebook

In [2]:
import numpy as np
import matplotlib.pylab as plt
from mpl_toolkits.mplot3d import Axes3D

# Define the problem

Lets try to predict commute duration from two observable independent variables: the time of day and the weather conditions.

<img src='figures/commute.png' width='400'/>
<img src='figures/target_distribution.png' width='1000'/>
In this toy example we'll first take on the role of the 'traffic gods' and decree that commute duration is defined through a linear mixture of the two independent variables. Later we'll sample from the distribution defined by these variables and generate a training dataset. This sampling procedure will be analogous to keeping a journal of all of our commutes for some [ long ] period of time, where each log entry consists of a set of  
* <b>X</b>: [ time-of-departure, weather-condition ], and the associated  
* <b>Y</b>: [ commute-duration ].

<img src='figures/x_y_mapping.png' width='900'/>

Given such a journal [dataset], we'll split it into training (75%) and testing (25%) subsets which we'll use to train and evaulate our model respectively. Specifically, we'll build a neural network model whose weights are initially randomly initialized, but are trained/updated as we stream the training data through (via the backpropagation learning algorithm). Each update will get us closer to having a model that has learned the relationship between X and Y or ([ time-of-departure, weather-condition ] to [ commute-duration] ).

<img src='figures/process.png' width='800'/>

During the training process we'll try to visualize the network's behavior by asking it to predict all the entries in our logbook using its current parameters/weights. As the training process unfolds, you should be able to see how the network adapts itself to the target surface/function that we determined for the commute duration.

<img src='figures/training_progress.png' width='700'/>

# Determine underlying relationship
We'll start by esablishing (as traffic gods) the relationships between:  
* 1) the time a commute starts (time-of-departure variable) and commute-duration
* 2) the weather when a commute is started (weather-condtion variable) and commute-duration

Note that as data scientists we never get to see this function, but we try to learn it from data.

In [3]:
# define data coordinates
xRange = [0,10]; 
yRange = [0,10]; numSteps = 100

x, y = np.meshgrid( np.linspace(xRange[0], xRange[1], numSteps),
                    np.linspace(yRange[0], yRange[1], numSteps), indexing='ij' )

def normalize_domain (x):
    x = x + np.abs(np.min(x))
    x = x / (np.max(x) + .001)
    return x
    
# define 1D relationships to target
xComponent = np.sin( x ) * 4
yComponent = np.exp( y / 4 )

# define 2D joint distribution
z = xComponent + yComponent
z = normalize_domain(z)

# plot independent variables
plt.figure( figsize = ( 7, 7) )
plt.subplots_adjust( left = 0.1, right = 0.9, top = 0.9, bottom = 0.1, wspace = 0.2 )
plt.subplot(2,1,1); plt.plot(normalize_domain(xComponent[:,0])); plt.xlabel('time-of-day'), plt.ylabel('commute duration')
plt.xticks([]), plt.yticks([])
plt.subplot(2,1,2); plt.plot(normalize_domain(yComponent[0,:])); plt.xlabel('weather [ severity ]'), plt.ylabel('commute duration')
plt.xticks([]), plt.yticks([])

# plot target [dependent] variable
plt.figure( figsize = (9, 9) )
plt.subplots_adjust( left = 0.1, right = 0.9, top = 0.9, bottom = 0.1 )
ax = plt.subplot(1,1,1, projection='3d');
ax.plot_surface ( x[0::1], y[0::1], z[0::1], color = 'blue', alpha = 1, antialiased = False )
ax.set_xlabel('time of day')
ax.set_ylabel('severity of weather')
ax.set_zlabel('commute length')
plt.show()


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

# Generate  dataset
Lets generate a dataset by randomly sampling from the target distribution [ with some noise ].

In [4]:
NSamples = 5000
noiseScaling = 1/8.

shuffledDataIndsX = np.random.randint(x.shape[0], size=(NSamples,1))
shuffledDataIndsY = np.random.randint(y.shape[0], size=(NSamples,1))

trainData = np.zeros( ( NSamples, 2 ) )
targetValues = np.zeros( (NSamples, 1 ))
noiseAmount = noiseScaling * ( np.random.rand(NSamples) - .5 )

for iSample in range (NSamples):
    trainData[iSample, 0] = x[ shuffledDataIndsX[iSample], 0 ]
    trainData[iSample, 1] = y[ 0, shuffledDataIndsY[iSample] ]    
    targetValues[iSample] = z[ shuffledDataIndsX[iSample], shuffledDataIndsY[iSample] ] + noiseAmount[iSample]


# Plot dataset samples (red dots) overlayed onto target distribution (blue)

In [5]:
from k3d import K3D

plot = K3D()
zScaling = 5

offset = np.hstack( ( np.ones((trainData.shape[0], 1)) * -5, 
                      np.ones((trainData.shape[0], 1)) + 4, 
                      np.zeros((trainData.shape[0], 1)) ) ) * np.abs(xRange[1]-xRange[0])

plot += K3D.points ( np.hstack( ( trainData, targetValues*zScaling) ) + offset, color=0xFF0000, point_size = .2, shader = 'flat' )
plot += K3D.surface ( z*zScaling, color=0x0055FF, xmin=np.min(trainData[:,0]+offset[::,0]), xmax=np.max(trainData[:,0]+offset[::,0]), ymin=np.min(trainData[:,1]+offset[::,1]), ymax=np.max(trainData[:,1]+offset[::,1]))

plot.display()

# Define model structure

In [10]:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras import metrics
import keras

model = Sequential()
model.add( Dense( 5, input_dim = 2 ))
model.add( Dense( 27, activation='sigmoid' ))
model.add( Dense( 20, activation='sigmoid' ))
model.add( Dense( 40, activation='sigmoid' ))
model.add( Dense( 1, activation='linear' ))

model.compile(optimizer='adam', loss='mse')

# Visualize network structure

In [11]:
plt.figure()
import nnViz
nnViz.visualize_model(model)

<IPython.core.display.Javascript object>

Model structure [loosely] inspired by NVIDIA's new HQ ;]  
http://c.ymcdn.com/sites/aiascv.org/resource/resmgr/meeting_images/2017/March/Nv2.jpeg

# Train Network (and dynamically visualize)

In [12]:
NEpochs = 350

displayUpdateInterval = 10

currentNN = {}

xOffset = np.zeros( (trainData.shape[0], 1))
yOffset = np.zeros( (trainData.shape[0], 1))
zOffset = np.zeros( (trainData.shape[0], 1))

xModifier = 1 * np.abs(xRange[1]-xRange[0])*1.2; yModifier = 0; zModifier = 0; 
plotCount = 0

evalLoss = np.empty((NEpochs))
evalLoss[:] = np.NaN

for iEpoch in range(NEpochs):
    
    model.fit ( trainData, targetValues, 
               epochs = 1, 
               batch_size = 256, 
               verbose = 0, 
               shuffle = True )
     
    evalLoss[iEpoch] = model.evaluate( trainData, targetValues )

    if iEpoch % displayUpdateInterval == 0:
        comboOffset = np.hstack( (xOffset + xModifier, yOffset + yModifier, zOffset + zModifier) )
        
        currentNN[plotCount] = model.predict( trainData )
        
        print('\n current error: ' + str( evalLoss[plotCount]) )
        plot += K3D.points ( np.hstack( ( trainData, currentNN[plotCount] * zScaling) ) + comboOffset + offset, color=0xA9A9FF, point_size = .2, shader = 'flat' )        
        plot += K3D.text2d ( str( round(evalLoss[plotCount], 4 )), comboOffset + offset + (0, 0, 3), color=0xff00ff, size=.5, reference_point='rb')
        
        plotCount += 1
        if plotCount % 8 == 0:
            xModifier = 1 * np.abs(xRange[1]-xRange[0])*1.2
            yModifier -= 1 * np.abs(yRange[1]-yRange[0])*1.2
        else:
            xModifier += 1 * np.abs(xRange[1]-xRange[0])*1.2

 current error: 0.733682709312
 current error: 0.104272731221
 current error: 0.0475512020946
 current error: 0.0459488663763
 current error: 0.042507170561
 current error: 0.0406228621274
 current error: 0.0387433059573
 current error: 0.0369785617173
 current error: 0.0353498178124
 current error: 0.0339596960008
 current error: 0.0328091323793
 current error: 0.0318602052659
 current error: 0.0311142929673
 current error: 0.0304981160253
 current error: 0.0299908007354
 current error: 0.0295477644444
 current error: 0.0291379277945
 current error: 0.0287509903669
 current error: 0.0283834910154
 current error: 0.0280215913683
 current error: 0.0276708921552
 current error: 0.0273066036224
 current error: 0.0269445549637
 current error: 0.0265955869466

 current error: 0.0262283552229
 current error: 0.0258682247549
 current error: 0.0255047362983
 current error: 0.025186060077
 current error: 0.0248171356678
 current error: 0.0244557149082
 current error: 0.0241287386939
 current er

# Plot learning curve

In [13]:
plt.figure()
plt.plot(evalLoss[:], 'b')
plt.plot(evalLoss[:], 'or')
plt.xlabel('epochs')
plt.ylabel('error')

<IPython.core.display.Javascript object>

<matplotlib.text.Text at 0x1feb9d2d7f0>

# Visualize network predictions before any training

In [14]:
plot = K3D()
plot += K3D.points ( np.hstack( ( trainData, targetValues*zScaling) ), color=0xFF0000, point_size = .1, shader = 'flat' )
plot += K3D.points ( np.hstack( ( trainData, currentNN[0]*zScaling) ), color=0xFF00FF, point_size = .3, shader = 'flat' )        
plot += K3D.surface ( z*zScaling, color=0x0000FF, xmin=np.min(xRange), xmax=np.max(xRange), ymin=np.min(yRange), ymax=np.max(yRange))
plot.display()

# Visualize network predictions midway through training

In [15]:
plot = K3D()
plot += K3D.points ( np.hstack( ( trainData, targetValues*zScaling) ), color=0xFF0000, point_size = .1, shader = 'flat' )
plot += K3D.points ( np.hstack( ( trainData, currentNN[int(plotCount/2)]*zScaling) ), color=0xFF00FF, point_size = .3, shader = 'flat' )        
plot += K3D.surface ( z*zScaling, color=0x0000FF, xmin=np.min(xRange), xmax=np.max(xRange), ymin=np.min(yRange), ymax=np.max(yRange))
plot.display()

# Visualize network predictions after training is complete

In [16]:
plot = K3D()
plot += K3D.points ( np.hstack( ( trainData, targetValues*zScaling) ), color=0xFF0000, point_size = .1, shader = 'flat' )
plot += K3D.points ( np.hstack( ( trainData, currentNN[plotCount-1]*zScaling) ), color=0xFF00FF, point_size = .3, shader = 'flat' )        
plot += K3D.surface ( z*zScaling, color=0x0000FF, xmin=np.min(xRange), xmax=np.max(xRange), ymin=np.min(yRange), ymax=np.max(yRange))
plot.display()

# Plot network graph (trained weights)

In [17]:
plt.figure()
import nnViz
nnViz.visualize_model(model)

<IPython.core.display.Javascript object>

In [None]:
print('thanks!')

In [None]:
'''
Miro Enev -- menev@nvidia.com
Ph.D, Sr. Solution Architect @ NVIDIA
'''