# Symbolic Computation

In this lesson, we explore how symbolic computation packages such as Theano help with optimization and machine learning by automatically computing the gradients of a given function. We use an artificial data set of 50 points $(x,y)$, where $x = (x_3, x_2, x_1, x_0) \in \mathbb{R}^4$ with $x_0 = 1$ and $y \in \mathbb{R}$. Our goal is to fit a linear regression model to this data set by gradient descent. First, we load the data set.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
csv = 'https://www.dropbox.com/s/oqoyy9p849ewzt2/linear.csv?dl=1'
data = np.genfromtxt(csv,delimiter=',')
X = data[:,1:]
Y = data[:,0].reshape(-1,1)
print(X.shape)
print(Y.shape)

(50, 4)
(50, 1)


We now load the Theano package and set some global constants. Note that a tensor is a multi-dimensional array, generalizing vectors and matrices. The `theano.tensor` module will be used for symbolic computations involving tensors.

In [2]:
import theano
import theano.tensor as T
d = X.shape[1] # dimension of feature vectors
n = X.shape[0] # number of training samples
learn_rate = 0.5 # learning rate for gradient descent

Next, we create some symbolic variables. These act as algebraic objects in Theano, so that Theano knows how to differentiate functions involving them. For example, if we create a symbolic variable called $t$ and ask Theano to differentiate the symbolic function $f=t^2$ with respect to $t$, it returns the symbolic function $2t$. The output of the `pp` function is a little difficult to parse, because it is the unsimplified result of the chain rule; it reduces to $2t$. The function `fill(a,b)` creates a tensor that has the same shape as `a` and whose entries are all `b`, so for a scalar $t$ the `fill` term is simply $1$.

In [3]:
from theano import pp      # pp for printing symbolic objects
t = T.scalar(name='t')     # symbolic variable
f = t**2                   # symbolic function
fgrad = T.grad(f, wrt=t)
print(type(t))
print(type(f))
print(type(fgrad))
print(pp(fgrad))

<class 'theano.tensor.var.TensorVariable'>
<class 'theano.tensor.var.TensorVariable'>
<class 'theano.tensor.var.TensorVariable'>
((fill((t ** TensorConstant{2}), TensorConstant{1.0}) * TensorConstant{2}) * (t ** (TensorConstant{2} - TensorConstant{1})))
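
To see that the printed expression really is $2t$, it can be evaluated term by term. The following is a minimal sketch in plain Python, without Theano. The helper names `unsimplified_grad` and `central_difference` are hypothetical, and `fill` is modeled for scalars only, where it just returns the fill value.

```python
def square(t):
    """The function f(t) = t**2 from the cell above."""
    return t ** 2

def unsimplified_grad(t):
    """Evaluate the printed expression term by term for a scalar t."""
    fill = 1.0                          # fill(t**2, 1.0): for a scalar, just 1.0
    return (fill * 2) * (t ** (2 - 1))

def central_difference(f, t, h=1e-6):
    """Numerical derivative of f at t, used as an independent check."""
    return (f(t + h) - f(t - h)) / (2 * h)

for t in (-1.5, 0.0, 3.0):
    exact = 2 * t
    assert unsimplified_grad(t) == exact
    assert abs(central_difference(square, t) - exact) < 1e-6
```

The finite-difference check is only a sanity test; `T.grad` works symbolically and does not approximate the derivative numerically.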


We can turn a symbolic function into a compiled function. The first argument `[t]` specifies a list of symbolic variables which become the inputs of the compiled function. It is difficult to display what the compiled function is doing, because the function is represented as a computational graph, which Theano optimizes during compilation and can parallelize later if necessary. Applying `pp` to `g.maker.fgraph.outputs[0]` gives a simplified glimpse of this graph; here the gradient has been reduced to `2.0 * t`.

In [4]:
g = theano.function([t], fgrad)
print(g(3))
print(type(g))
print(pp(g.maker.fgraph.outputs[0]))

6.0
<class 'theano.compile.function_module.Function'>
(TensorConstant{2.0} * t)


There are many types of symbolic variables: scalars, vectors, matrices, and so on. For our linear regression problem, we create a matrix `x` for the features and, because the responses `Y` are stored as an $n \times 1$ column, a matrix `y` as well. We do not need to specify their dimensions. There is also a special kind of symbolic variable, called a shared variable, which stores a value in addition to acting as a symbol. Theano figures out what kind of tensor to create for the shared variable when you give it a numpy object such as a matrix of zeros.

In [5]:
x = T.matrix(name='x')                       # feature matrix
y = T.matrix(name='y')                       # responses, as an n-by-1 matrix
w = theano.shared(np.zeros((d,1)),name='w')  # model parameters
print(w.get_value())

[[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]]


We now write down the empirical risk as a symbolic function. Note that we have to use the functions `T.sum` and `T.dot` instead of `np.sum` and `np.dot` to construct this symbolic function. We let Theano compute the gradient of the risk.

In [6]:
risk = T.sum((T.dot(x,w) - y)**2)/2/n      # empirical risk
grad_risk = T.grad(risk, wrt=w)            # gradient of the risk
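
For this risk, the gradient has the closed form $\nabla_w R(w) = \frac{1}{n} X^\top (Xw - y)$, which is what `T.grad` derives symbolically. The following is a minimal NumPy sketch, independent of Theano, that checks this formula against finite differences. The function names and the random stand-in data are assumptions made for illustration; they are not the contents of `linear.csv`.

```python
import numpy as np

def risk_np(X, Y, w):
    """Empirical risk: sum of squared residuals divided by 2n."""
    n = X.shape[0]
    return np.sum((X @ w - Y) ** 2) / 2 / n

def grad_risk_np(X, Y, w):
    """Closed-form gradient of risk_np with respect to w."""
    n = X.shape[0]
    return X.T @ (X @ w - Y) / n

def numerical_grad(X, Y, w, h=1e-6):
    """Central-difference approximation, one coordinate of w at a time."""
    g = np.zeros_like(w)
    for i in range(w.shape[0]):
        e = np.zeros_like(w)
        e[i, 0] = h
        g[i, 0] = (risk_np(X, Y, w + e) - risk_np(X, Y, w - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)                 # synthetic stand-in data
X_demo = np.hstack([rng.normal(size=(50, 3)), np.ones((50, 1))])  # constant column for x_0 = 1
Y_demo = rng.normal(size=(50, 1))
w_demo = rng.normal(size=(4, 1))

max_err = np.max(np.abs(grad_risk_np(X_demo, Y_demo, w_demo)
                        - numerical_grad(X_demo, Y_demo, w_demo)))
print(max_err)   # should be tiny; the risk is quadratic in w
```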

Next, we construct a compiled function that performs one step of gradient descent. It takes no inputs and outputs the value of the symbolic function `risk`. Since `risk` depends on the symbolic variables `x, y, w`, we need to specify their values. The values of `x, y` are specified by the `givens` argument. The value of `w` is read from the shared variable. The compiled function also performs an additional step each time it is called: it updates the value of the shared variable `w` to `w-learn_rate*grad_risk`. The returned risk is computed from the value of `w` before this update.

In [7]:
train_model = theano.function(inputs=[],
                              outputs=risk,
                              updates=[(w, w-learn_rate*grad_risk)],
                              givens={x:X, y:Y})
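
The behavior of `train_model` can be mimicked without Theano. Theano computes the outputs from the current value of the shared variable and only then applies the updates, so each call returns the risk before the step is taken. Below is a rough NumPy sketch of that behavior; `make_train_step`, the `state` dictionary, and the synthetic data are hypothetical stand-ins, not part of Theano or of the original data set.

```python
import numpy as np

def make_train_step(X, Y, learn_rate):
    """Return (step, state); step() reports the current risk, then updates state['w']."""
    n, d = X.shape
    state = {"w": np.zeros((d, 1))}            # plays the role of the shared variable

    def step():
        residual = X @ state["w"] - Y
        risk = float(np.sum(residual ** 2) / 2 / n)    # output uses the old w
        grad = X.T @ residual / n                      # closed-form gradient of the risk
        state["w"] = state["w"] - learn_rate * grad    # update applied afterwards
        return risk

    return step, state

rng = np.random.default_rng(0)                 # synthetic stand-in data
X_demo = np.hstack([rng.uniform(-1, 1, size=(50, 3)), np.ones((50, 1))])
true_w = np.array([[1.0], [-2.0], [0.5], [0.3]])
Y_demo = X_demo @ true_w + 0.01 * rng.normal(size=(50, 1))

step, state = make_train_step(X_demo, Y_demo, learn_rate=0.5)
first = step()     # risk at w = 0
second = step()    # risk after one gradient step
print(first, second)
```

Here the gradient is written out by hand; in the Theano version it comes from `T.grad`, so changing the risk does not require re-deriving it.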

Finally, we perform the gradient descent algorithm. The compiled function `train_model` is called repeatedly until the decrease in training risk (loss) between successive calls is no larger than the specified tolerance. We also limit the number of iterations in case gradient descent takes too long.
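
The stopping rule does not depend on Theano and can be written as a small reusable helper. The sketch below is one possible formulation, tested on a toy one-dimensional problem; `run_until_converged` and `make_toy_step` are hypothetical names, and the toy step merely stands in for `train_model`.

```python
def run_until_converged(step, tol=1e-12, max_iter=50):
    """Call step() until the loss decreases by at most tol, or max_iter calls are used.

    Returns the list of losses, one per call.
    """
    losses = [step()]
    while len(losses) < max_iter:
        losses.append(step())
        if losses[-2] - losses[-1] <= tol:     # also stops if the loss went up
            break
    return losses

def make_toy_step(start=0.0, learn_rate=0.25):
    """Stand-in for train_model: gradient descent on (v - 3)**2."""
    state = {"v": start}

    def step():
        loss = (state["v"] - 3.0) ** 2                        # loss before the update
        state["v"] -= learn_rate * 2.0 * (state["v"] - 3.0)   # gradient step
        return loss

    return step

losses = run_until_converged(make_toy_step())
print(len(losses), losses[-1])
```

Note that the difference is signed: if the learning rate is too large and the loss increases, the difference is negative and the loop stops as well.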

In [8]:
max_iter = 50
num_iter = 1
tol = 10**(-12)
diff = tol + 1
prev_loss = train_model()
while num_iter < max_iter and diff > tol:
    loss = train_model()
    diff = prev_loss-loss
    print('{0:2d}  loss: {1:.15f}  diff: {2:.15f}'.format(num_iter, loss.item(), diff.item()))
    prev_loss = loss
    num_iter += 1

 1  loss: 0.758755942272566  diff: 1.860566066312890
 2  loss: 0.235428122249730  diff: 0.523327820022836
 3  loss: 0.079395760473308  diff: 0.156032361776422
 4  loss: 0.030096896364577  diff: 0.049298864108731
 5  loss: 0.013695201975103  diff: 0.016401694389473
 6  loss: 0.008006092308871  diff: 0.005689109666232
 7  loss: 0.005970446508700  diff: 0.002035645800171
 8  loss: 0.005225932117259  diff: 0.000744514391441
 9  loss: 0.004949550127904  diff: 0.000276381989355
10  loss: 0.004845924776166  diff: 0.000103625351738
11  loss: 0.004806813029967  diff: 0.000039111746199
12  loss: 0.004791984242872  diff: 0.000014828787095
13  loss: 0.004786344307211  diff: 0.000005639935661
14  loss: 0.004784194268801  diff: 0.000002150038410
15  loss: 0.004783373170821  diff: 0.000000821097979
16  loss: 0.004783059133879  diff: 0.000000314036943
17  loss: 0.004782938874884  diff: 0.000000120258995
18  loss: 0.004782892769416  diff: 0.000000046105468
19  loss: 0.004782875074349  diff: 0.000000017