In this tutorial we will cover these topics:

1- A brief overview of available libraries for deep learning.

2- Basic Theano concepts and functionality, including computation graphs, shared variables, Theano functions, scan, etc.

3- Simple examples for Logistic Regression, Neural Networks and Recurrent Neural Networks.

Overview of different Packages:
<img src="Selection_020.png">

The computation graph is the core idea behind Theano's architecture. Every function that the user wants Theano to evaluate must first be declared as a computation graph. The main reason is that this allows Theano to derive analytical derivatives for these graphs, which are later used for optimization. A computation graph also gives Theano a convenient opportunity to optimize the code before the actual computation starts, similar to compiled languages.
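To make the idea concrete, here is a minimal pure-Python sketch of a symbolic graph that supports only addition and multiplication. It is an illustration under simplifying assumptions, not how Theano is implemented; the names `Node`, `var`, `wrap`, `evaluate` and `grad` are hypothetical. Because the expression exists as a graph before any numbers are supplied, a derivative graph can be built from it by applying the sum and product rules.

```python
class Node:
    """A node in a tiny symbolic expression graph (illustrative only)."""
    def __init__(self, op, parents=(), name=None, value=None):
        self.op, self.parents, self.name, self.value = op, parents, name, value

    def __add__(self, other):
        return Node('add', (self, wrap(other)))

    def __mul__(self, other):
        return Node('mul', (self, wrap(other)))

def wrap(v):
    """Turn a plain number into a constant node; leave nodes untouched."""
    return v if isinstance(v, Node) else Node('const', value=v)

def var(name):
    """Declare a symbolic input."""
    return Node('var', name=name)

def evaluate(node, env):
    """Compute the numeric value of a graph given values for its inputs."""
    if node.op == 'var':
        return env[node.name]
    if node.op == 'const':
        return node.value
    left, right = (evaluate(p, env) for p in node.parents)
    return left + right if node.op == 'add' else left * right

def grad(node, wrt):
    """Build a new graph that represents d(node)/d(wrt)."""
    if node.op == 'var':
        return wrap(1 if node is wrt else 0)
    if node.op == 'const':
        return wrap(0)
    u, v = node.parents
    if node.op == 'add':
        return grad(u, wrt) + grad(v, wrt)          # sum rule
    return grad(u, wrt) * v + u * grad(v, wrt)      # product rule

# Declare the graph first, supply numbers later
y, z = var('y'), var('z')
x = y + z
a = x * x
da_dy = grad(a, y)

values = {'y': 10, 'z': 20}
print(evaluate(a, values))      # 900
print(evaluate(da_dy, values))  # 60
```

The derivative graph `da_dy` is itself an ordinary graph, so it can be evaluated, differentiated again, or simplified before any computation runs.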

In [1]:
import numpy as np
import theano.tensor as T
import theano

# Symbolic input variables
y = T.iscalar('y')
z = T.iscalar('z')

# Construct the computation graph
# (x and a are symbolic expressions built from y and z, so they need no declaration)
x = y + z
a = x**2

# Compile the function
f= theano.function([y,z], a)

# Create real values and evaluate the function
y_real= 10
z_real= 20

a_real= f(y_real, z_real)

print('Output is ', a_real)

('Output is ', array(900, dtype=int32))


Computation Graph Exercise:
Create a function that computes the logistic function

In [2]:

# Write your code here
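
To check the output of your Theano function, you can compare it against a plain NumPy reference. This is only a sketch for verification, not the Theano solution; the name `logistic` is a hypothetical helper.

```python
import numpy as np

def logistic(x):
    """NumPy reference for the logistic function 1 / (1 + exp(-x))."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-x))

print(logistic(0.0))                 # 0.5
print(logistic([-2.0, 0.0, 2.0]))
```

A useful sanity check is the symmetry `logistic(x) + logistic(-x) == 1`.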


Learned Concepts:
1- Shared Variables 
2- Updates

In [3]:
state= theano.shared(0.0)
inc= T.iscalar('inc')
accumulator= theano.function([inc], state, updates= [( state, state+inc)])
incc= 5
print('State = ', accumulator(incc))
print('State = ', accumulator(incc))
print('State = ', accumulator(incc))
print('State = ', accumulator(incc))

('State = ', array(0.0))
('State = ', array(5.0))
('State = ', array(10.0))
('State = ', array(15.0))
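
Note that the first call prints 0.0: the function returns the value of `state` from before the update is applied. The plain-Python sketch below mimics that behaviour. It is an analogy only; `SharedValue` and `make_accumulator` are hypothetical names, not Theano API.

```python
class SharedValue:
    """Hypothetical stand-in for a shared variable: state that persists across calls."""
    def __init__(self, value):
        self.value = value

def make_accumulator(state):
    """Return a function that outputs the current state, then applies the update."""
    def accumulator(inc):
        old = state.value          # the output is computed from the current state
        state.value = old + inc    # the update (state -> state + inc) happens afterwards
        return old
    return accumulator

state = SharedValue(0.0)
accumulator = make_accumulator(state)

outputs = [accumulator(5) for _ in range(4)]
print(outputs)       # [0.0, 5.0, 10.0, 15.0]
print(state.value)   # 20.0
```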


Logistic Regression Tutorial:

This example uses the shared variable, computation graph, and update concepts.

In [4]:
N = 400                                   # training sample size
feats = 784                               # number of input variables

# generate a dataset: D = (input_values, target_class)
D = (np.random.randn(N, feats), np.random.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.dmatrix("x")
y = T.dvector("y")

# initialize the weight vector w randomly
#
# this and the following bias variable b
# are shared so they keep their values
# between training iterations (updates)
w = theano.shared(np.random.randn(feats), name="w")

# initialize the bias term
b = theano.shared(0., name="b")

print("Initial model:")
#print(w.get_value())
print(b.get_value())

Initial model:
0.0


In [5]:
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
cent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = cent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize

gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # w.r.t weight vector w and
                                          # bias term b
                                          # (we shall return to this in a
                                          # following section of this tutorial)

# Compile
train = theano.function(
          inputs= [x,y],
          outputs= [prediction, cent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)
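
For reference, the gradients that `T.grad` derives symbolically for this cost can also be written by hand. The NumPy sketch below assumes the same cost as above (mean cross-entropy plus `0.01 * sum(w ** 2)`) and applies one gradient step with the same learning rate of 0.1. The function name `cost_and_grads` and the small problem size are assumptions made for the illustration.

```python
import numpy as np

def cost_and_grads(X, y, w, b, l2=0.01):
    """Hand-derived cost and gradients for L2-regularized logistic regression."""
    p_1 = 1.0 / (1.0 + np.exp(-(X @ w) - b))             # probability that target = 1
    cent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)  # per-sample cross-entropy
    cost = cent.mean() + l2 * np.sum(w ** 2)
    err = (p_1 - y) / len(y)       # d(mean cross-entropy) / d(pre-activation)
    gw = X.T @ err + 2 * l2 * w    # gradient w.r.t. the weight vector
    gb = err.sum()                 # gradient w.r.t. the bias
    return cost, gw, gb

# Small random problem (sizes chosen only for the demonstration)
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.integers(0, 2, size=20).astype(float)
w = 0.1 * rng.standard_normal(5)
b = 0.0

# One gradient descent step, mirroring the updates in the train function
cost_before, gw, gb = cost_and_grads(X, y, w, b)
w_new, b_new = w - 0.1 * gw, b - 0.1 * gb
cost_after, _, _ = cost_and_grads(X, y, w_new, b_new)
print(cost_before, cost_after)
```

Deriving these expressions by hand is manageable for logistic regression but quickly becomes error-prone for deeper models, which is where symbolic differentiation pays off.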

In [6]:
# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print("Final model:")
#print(w.get_value())
print(b.get_value())
print("target values for D:")
print(D[1])
print("prediction on D:")
print(predict(D[0]))

Final model:
-0.178874207758
target values for D:
[1 1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 1 0 0 1 1 0 1
 0 1 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 1 0 1 1 1 0 0 1 0 1
 1 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0
 0 0 1 1 0 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 0 0 1 1 1 0 0 1 0 0 0 1 0 1 1 1
 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 0
 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 1 1 0 1
 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 0 1 1 0 1 0 0
 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0
 0 1 1 1 0 1 1 0 1 1 0 1 0 1 0 0 1 1 0 1 0 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 1
 1 0 1 1 0 0 0 1 0 1 1 1 1 1 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1
 0 1 0 0 0 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1]
prediction on D:
[1 1 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 1 0 0 1 1 0 1
 0 1 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0

Exercise: Write your own simple fully connected neural network.

It has two layers followed by a sigmoid output:

Layer1: y1= tanh(w1*x+b1)

Layer2: y2= w2*y1+b2

Output: y3= sigmoid(y2)

The cost is cross-entropy.


In [7]:
# 1- Create shared Variables for weights and initialize them to random + symbolic variables
N, D, H, C= 10, 1000, 100, 1

x= T.matrix('x')
y= T.vector('y', dtype= 'int64')

w1= theano.shared(np.random.randn(D, H),name='w1')
w2= theano.shared(np.random.randn(H, C),name='w2')
b1= theano.shared(np.zeros((H,)), name= 'b1')
b2= theano.shared(np.zeros((C,)), name= 'b2')

# 2- Forward Pass Computation Graph
# Write Code Here

# 3- Backward Pass Compute Gradients
# Write Code Here

# 4- Define your train function with the proper updates using sgd
# Write Code Here

# 5- Evaluate your train function
xx= np.random.randn(N, D)
yy= np.random.randint(size= N, low= 0, high= C+1)

# Write Code Here


Solution to the Simple Neural Network Example

In [8]:
import numpy as np
import theano.tensor as T
import theano
# 1- Create shared Variables for weights and initialize them to random + symbolic variables
N, D, H, C= 10, 1000, 100, 1

x= T.matrix('x')
y= T.vector('y', dtype= 'int64')

w1= theano.shared(np.random.randn(D, H),name='w1')
w2= theano.shared(np.random.randn(H, C),name='w2')
b1= theano.shared(np.zeros((H,)), name= 'b1')
b2= theano.shared(np.zeros((C,)), name= 'b2')

# 2- Forward Pass Computation Graph
y1= T.tanh(T.dot(x,w1)+b1)
y2= T.dot(y1, w2)+b2
p_1 = 1 / (1 + T.exp(-y2))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
cent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = cent.mean()

# 3- Backward Pass Compute Gradients
dw1, dw2, db1, db2= T.grad(cost, [w1, w2, b1, b2])

# 4- Define your train function with the proper updates using sgd
lr= 0.001
train= theano.function(inputs=[x,y],
                       outputs=[cost, prediction],
                       updates= [(w1, w1-lr*dw1), (w2, w2-lr*dw2), (b1, b1-lr*db1), (b2, b2-lr*db2)])

# 5- Evaluate your train function
xx= np.array(np.random.randn(N, D), dtype="float32")
yy= np.random.randint(size= N, low= 0, high= C+1)

iters=100
for i in range(iters):
    for j in range(N):
        cost, preds = train(xx[j, :].reshape((1,D)),yy[j].reshape((1,)) )
        if i == iters-1:
            print('Target Labels are', yy[j])
            print('Current predictions are ', preds)

('Target Labels are', 0)
('Current predictions are ', array([[0]], dtype=int8))
('Target Labels are', 0)
('Current predictions are ', array([[0]], dtype=int8))
('Target Labels are', 1)
('Current predictions are ', array([[0]], dtype=int8))
('Target Labels are', 0)
('Current predictions are ', array([[0]], dtype=int8))
('Target Labels are', 1)
('Current predictions are ', array([[1]], dtype=int8))
('Target Labels are', 1)
('Current predictions are ', array([[1]], dtype=int8))
('Target Labels are', 1)
('Current predictions are ', array([[1]], dtype=int8))
('Target Labels are', 1)
('Current predictions are ', array([[1]], dtype=int8))
('Target Labels are', 0)
('Current predictions are ', array([[0]], dtype=int8))
('Target Labels are', 0)
('Current predictions are ', array([[0]], dtype=int8))
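
The solution trains on one sample at a time, so `p_1` has shape (1, 1) and `y` has shape (1,). For larger batches the shapes (N, 1) and (N,) no longer line up elementwise. The NumPy sketch below shows the mismatch and one possible way around it (taking the single output column). It illustrates NumPy's broadcasting only and makes no claim about how Theano reports the same mismatch; the array values are made up for the demonstration.

```python
import numpy as np

N, C = 4, 1
p = np.full((N, C), 0.5)       # shape of p_1 for a batch of N rows
y = np.array([0, 1, 1, 0])     # labels, shape (N,)

# Combining (N,) with (N, 1) broadcasts to an N x N matrix: not a per-sample loss
bad = -y * np.log(p) - (1 - y) * np.log(1 - p)

# Selecting the single output column gives one loss value per sample
p_flat = p[:, 0]
good = -y * np.log(p_flat) - (1 - y) * np.log(1 - p_flat)

print(bad.shape)    # (4, 4)
print(good.shape)   # (4,)
```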


Scan in Theano:

The scan function works as a loop for computation graphs. Like a regular loop, it has an iterable object to iterate through, a stopping criterion, and an output. It can also carry state that is updated at each iteration, together with initial values for that state. The difference from a regular loop is that in scan everything is part of the computation graph and is therefore differentiable.

In [9]:
k = T.iscalar("k")
A = T.vector("A")

def inner_fct(prior_result, B):
    return prior_result * B

# Symbolic description of the result
result, updates = theano.scan(fn=inner_fct,
                            outputs_info=T.ones_like(A),
                            non_sequences=A, n_steps=k)

# Scan has provided us with A ** 1 through A ** k.  Keep only the last
# value. Scan notices this and does not waste memory saving them.
final_result = result[-1]

power = theano.function(inputs=[A, k], outputs=final_result,
                      updates=updates)

print(power(range(10), 2))

[  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]
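
The scan call above corresponds roughly to the following plain NumPy loop. This is a sketch for intuition only: unlike scan, it is not part of a computation graph and cannot be differentiated symbolically. The name `power_loop` is hypothetical, and the sketch assumes `k >= 1`.

```python
import numpy as np

def inner_fct(prior_result, B):
    """Same step function as in the scan example."""
    return prior_result * B

def power_loop(A, k):
    """Elementwise A ** k computed by repeated multiplication (assumes k >= 1)."""
    A = np.asarray(A, dtype=float)
    result = np.ones_like(A)             # plays the role of outputs_info
    history = []
    for _ in range(k):                   # plays the role of n_steps
        result = inner_fct(result, A)    # A is passed unchanged, like non_sequences
        history.append(result)
    return history[-1]                   # keep only the last value

print(power_loop(range(10), 2))
```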


A simple example of Recurrent Neural Networks (RNN):

Theano is one of the most advantageous libraries when it comes to RNNs, mostly thanks to the scan function, which facilitates recursion. A very simple example of an RNN (arguably the simplest) is presented here. Even though it is a very straightforward network, implementing it in other frameworks (Caffe, for instance) is quite troublesome.
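
Before the Theano implementation, the recurrence it computes can be sketched in plain NumPy. The sketch assumes the same scalar parameters W, U, b and Wo as the code that follows and a hidden state that starts at 0; the function name `rnn_forward` is hypothetical, and no training is involved here.

```python
import numpy as np

def rnn_forward(x, W, U, b, Wo, h0=0.0):
    """Forward pass of a scalar RNN: returns the output after the last input element."""
    h = h0
    y = None
    for x_t in x:
        h = np.tanh(W * h + U * x_t + b)   # update the hidden state
        y = Wo * h                         # output at this time step
    return y

x = np.array([0.2, 0.7])
print(rnn_forward(x, W=0.5, U=1.0, b=0.0, Wo=1.0))
```

In the Theano version, this `for` loop is expressed with scan so that gradients with respect to W, U, b and Wo can be derived automatically.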

In [13]:
"""
Implementing an RNN network with a single scalar hidden state.
It takes a vector of floats as input and outputs a single float.
Data (input vector and output label) is generated on the fly.
"""
import numpy as np
import theano.tensor as T
import theano

class network():
    """
    Encapsulates the network's architecture (layers), function compilation and the train function.
    """
    def __init__ (self):
        """
        Initializes the network with a fixed two-layer structure (one RNN layer and one Euclidean loss layer).
        Generates Theano functions for the forward pass and the update.
        Uses simple stochastic gradient descent for optimization.
        """
        #Network's learnable parameters
        self.params = {}
        
        #Learning rate for SGD
        #Play with it for fast and smooth convergence (specifically try 0.01, 0.005 and 0.001).
        #Try to make it adaptive.
        lr = np.array(0.005, dtype="float32")
        
        #Input vector and label
        self.x = T.vector('x')
        self.y = T.scalar('y')
        
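        #Creating the network
        #(RNN is called only once: a second call would create a fresh set of weights,
        # leaving RNN_out tied to parameters that are never trained)
        self.RNN_out = self.RNN(self.x)
        self.loss = self.euc_loss(self.RNN_out, self.y)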
        
        #Computing gradients w.r.t network params.
        self.grads = T.grad(self.loss, wrt = list(self.params.values()) )
        
        #SGD optimization
        gshared = [theano.shared(p.get_value() * np.array(0.0, dtype="float32"), name='%s_grad' % k) for k, p in self.params.items()]
        gsup = [(gs, g) for gs, g in zip(gshared, self.grads)]
        pup = [(param, param - lr*g) for param, g in zip(self.params.values(), gshared)]
        
        #Forward pass function for testing
        self.f_forward = theano.function([self.x], outputs=[self.RNN_out])
        
        #Train function
        self.train = theano.function([self.x, self.y], outputs=[self.RNN_out, self.loss], updates=gsup + pup)
    
        
    def RNN(self, X):
        """
        A simple one node RNN. There are many good references on RNN. This is an interesting one : 
        http://karpathy.github.io/2015/05/21/rnn-effectiveness/
        """
        
        #defining shared variables to be used as weights
        self.params['W'] = theano.shared(value=np.array(np.random.rand(), dtype="float32"), name= 'W')
        self.params['U'] = theano.shared(value=np.array(np.random.rand(), dtype="float32"), name= 'U')
        self.params['b'] = theano.shared(value=np.array(np.random.rand(), dtype="float32"), name= 'b')
        self.params['Wo'] = theano.shared(value=np.array(np.random.rand(), dtype="float32"), name= 'Wo')
        
        #stopping criteria
        n_steps = X.shape[0]
        
        #RNN model
        def step(x, _h): 
            h = T.tanh(self.params['W']*_h + self.params['U']*x + self.params['b'])
            y = self.params['Wo'] * h
            return y, h
            
        #Scan function that carries out the recursion
        results, update = theano.scan(step, sequences = X, outputs_info =[None,  np.array(0.0, dtype="float32")], n_steps = n_steps)
        return results[0][-1]
            

    def euc_loss(self, inp, label):
        """
        Euclidean loss function (squared error)
        """
        return (inp-label)**2

def dummy(x):
    """
    Dummy function for generating labels given x vector.
    Change it to other fun stuff.
    """
    return x.sum()

#Initializing the network object
net = network()

#Maximum number of iterations. Its optimum value is closely related to the learning rate.
max_iter = 100000
disp_freq = 2000

loss = np.zeros(disp_freq)
for i in range(1,max_iter):
    x = np.array(np.random.rand(2), dtype="float32")
    y = dummy(x)
    loss[i%disp_freq] = net.train(x,y)[1]
    if i%disp_freq == 0:
        print ("loss at iter%s = %s"%(i,loss.mean()))

loss at iter2000 = 0.0809584768036
loss at iter4000 = 0.00856017187026
loss at iter6000 = 0.00607692270947
loss at iter8000 = 0.00503809614983
loss at iter10000 = 0.00441473602942
loss at iter12000 = 0.00352438902958
loss at iter14000 = 0.00302730530358
loss at iter16000 = 0.00296299066379
loss at iter18000 = 0.00264900848079
loss at iter20000 = 0.00261838275108
loss at iter22000 = 0.0023242129243
loss at iter24000 = 0.00226879716425
loss at iter26000 = 0.0020320633858
loss at iter28000 = 0.00194237323452
loss at iter30000 = 0.00180640419356
loss at iter32000 = 0.00177867998803
loss at iter34000 = 0.00168008935829
loss at iter36000 = 0.00173192730423
loss at iter38000 = 0.0015834778767
loss at iter40000 = 0.0016342571617
loss at iter42000 = 0.00155431930359
loss at iter44000 = 0.00156270330159
loss at iter46000 = 0.0013798623641
loss at iter48000 = 0.00135397259618
loss at iter50000 = 0.00130974625783
loss at iter52000 = 0.001281944656
loss at iter54000 = 0.001388680535
loss at iter560