# Baby Steps - Algebra
## Adding two Scalars
To get started with Theano and get a feel for what we’re working with, let’s make a simple function that adds two numbers together. Here is how you do it:



In [20]:
import numpy
import theano.tensor as T
from theano import function
import theano

In [2]:
x=T.dscalar('x')
y=T.dscalar('y')
z=x+y
f=function([x,y],z)

In [3]:
f(2,4)

array(6.0)

In [4]:
numpy.allclose(f(16.3,12.1),28.4)

True

## Adding two Matrices


In [5]:
x=T.dmatrix('x')
y=T.dmatrix('y')
z=x+y
f=function([x,y],z)

In [6]:
f([[1,2],[3,4]],[[1,2],[3,4]])

array([[ 2.,  4.],
       [ 6.,  8.]])

The following types are available:

byte: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4

16-bit integers: wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4

32-bit integers: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4

64-bit integers: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4

float: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4

double: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4

complex: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4

In [7]:
#Exercise problem

#Elementwise squaring

a=T.vector()#declaring variable
b=T.vector()
out=a**2+b**2+2*a*b
f=theano.function([a,b],out)
print(f([1,2,1],[1,2,3]))

[  4.  16.  16.]

## Logistic Function

In [None]:
x=T.dmatrix('x')
s=1/(1+T.exp(-x))
logistic=function([x],s)

In [None]:
logistic([[0,1],[2,3]])

In [None]:
#Alternative form of the logistic function, using tanh
s2=(1+T.tanh(x/2))/2
logistic2=function([x],s2)

In [None]:
logistic2([[0,1],[2,3]])
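
Both expressions compute the same function because tanh(x/2) = (1 - e^(-x)) / (1 + e^(-x)). As a quick sanity check outside Theano, here is a minimal plain-Python sketch comparing the two forms on the sample inputs used above. The helper names `logistic_exp` and `logistic_tanh` are illustrative and are not part of Theano.

```python
import math

def logistic_exp(x):
    # Direct form: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def logistic_tanh(x):
    # Alternative form: (1 + tanh(x / 2)) / 2
    return (1.0 + math.tanh(x / 2.0)) / 2.0

samples = [0.0, 1.0, 2.0, 3.0]
# Largest absolute disagreement between the two forms over the samples
max_gap = max(abs(logistic_exp(v) - logistic_tanh(v)) for v in samples)
print(max_gap < 1e-12)
```

The two forms agree up to floating-point rounding, so either one can be used to define the function.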

## Computing More than one Thing at the Same Time
Theano supports functions with multiple outputs. For example, we can compute the elementwise difference, absolute difference, and squared difference between two matrices a and b at the same time:

In [None]:
a,b=T.dmatrices('a','b')
diff=a-b
abs_diff=abs(diff)
diff_squared=diff**2
f=function([a,b],[diff,abs_diff,diff_squared])

In [None]:
f([[1,2],[3,4]],[[5,6],[7,8]])

## Setting a Default Value for an Argument
Let’s say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. You can do it like this:

In [8]:
from theano import In

x,y=T.dscalars('x','y')
z=x+y
f=function([x,In(y,value=1)],z)


In [9]:
f(33)


array(34.0)

In [10]:
f(33,2)

array(35.0)

This makes use of the In class, which allows you to specify properties of your function’s parameters in greater detail. Here we give y a default value of 1 by creating an In instance with its value field set to 1.

## Using Shared Variables
It is also possible to make a function with an internal state. For example, let’s say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function’s argument.

In [11]:
from theano import shared
state=shared(0)
inc=T.iscalar('inc')
accumulator=function([inc],state,updates=[(state,state+inc)])

In [12]:
print(state.get_value())

0


In [13]:
accumulator(1)

array(0)

In [14]:
accumulator(1)

array(1)

In [15]:
print(state.get_value())

2


In [16]:
#resetting value
state.set_value(0)
print(state.get_value())

0


In [17]:
decrementor = function([inc], state, updates=[(state, state-inc)])

In [18]:
decrementor(1)

array(0)
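
Note that each call returns the value of state before the update is applied, which is why the first accumulator(1) returns 0 while the state itself becomes 1. The behaviour can be mimicked in plain Python. This is only a rough analogy under that assumption, and `make_accumulator` is a hypothetical helper, not a Theano API.

```python
def make_accumulator(initial=0):
    # The dict plays the role of the shared variable
    state = {"value": initial}

    def accumulator(inc):
        old = state["value"]          # the output is computed from the old state
        state["value"] = old + inc    # the update is applied afterwards
        return old

    return accumulator, state

acc, st = make_accumulator()
first = acc(1)
second = acc(1)
print(first, second, st["value"])
```

As in the Theano cells above, the two calls return 0 and 1, and the stored state ends at 2.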

## Copying Functions
Theano functions can be copied, which can be useful for creating similar functions but with different shared variables or updates. This is done using the copy() method of function objects. The optimized graph of the original function is copied, so compilation only needs to be performed once.

In [21]:
state=theano.shared(0)
inc=T.iscalar('inc')
accumulator=function([inc],state,updates=[(state,state+inc)])

In [23]:
accumulator(10)
print(state.get_value())

10


We can use copy() to create a similar accumulator but with its own internal state using the swap parameter, which is a dictionary of shared variables to exchange:



In [28]:
new_state=theano.shared(0)
new_accumulator=accumulator.copy(swap={state:new_state})
new_accumulator(100)
new_state.get_value()

array(100)

In [33]:
#delete_updates is False by default, so this copy keeps the update and shares state with accumulator.
#Passing delete_updates=True should give a copy with the updates removed, but it raised an error here.
null_accumulator=accumulator.copy(delete_updates=False)
null_accumulator(9123)

[array(9143)]

## Using Random Numbers
The way to think about putting randomness into Theano’s computations is to put random variables in your graph. Theano will allocate a NumPy RandomState object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well. Theano’s random objects are defined and implemented in RandomStreams and, at a lower level, in RandomStreamsBase.

In [34]:
from theano.tensor.shared_randomstreams import RandomStreams
srng=RandomStreams(seed=234)
rv_u=srng.uniform((2,2))
rv_n=srng.normal((2,2))
f=function([],rv_u)
g=function([],rv_n,no_default_updates=True)
nearly_zeroes=function([],rv_u+rv_u-2*rv_u)

In [35]:
fval0=f()
fval1=f()

In [36]:
print(fval0)

[[ 0.12672381  0.97091597]
 [ 0.13989098  0.88754825]]


In [37]:
# When the extra argument no_default_updates=True is passed to function (as in g),
# calling the returned function does not affect the random number generator state.
# So, for example, calling g multiple times returns the same numbers.

gval0=g()
gval1=g()
print(gval0)
print(gval1)


[[ 0.37328447 -0.65746672]
 [-0.36302373 -0.97484625]]
[[ 0.37328447 -0.65746672]
 [-0.36302373 -0.97484625]]


## Seeding Streams
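Random variables can be seeded individually or collectively. To seed just one random variable, seed or reassign its .rng attribute, as in the cell below. To seed all random variables allocated by a RandomStreams object at once, call that object’s seed method.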



In [41]:
rng_val=rv_u.rng.get_value(borrow=True)
rng_val.seed(123)
rv_u.rng.set_value(rng_val,borrow=True)

## Copying Random State Between Theano Graphs
The example below shows how “random states” can be transferred from one Theano function to another.



In [42]:
from __future__ import print_function  # must be the first statement in the cell
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.tensor.shared_randomstreams import RandomStreams

In [54]:
class Graph():
    def __init__(self,seed):
        self.rng=RandomStreams(seed)
        self.y=self.rng.uniform(size=(1,))

In [61]:
g1=Graph(seed=123)
f1=function([],g1.y)
g2=Graph(seed=987)
f2=function([],g2.y)

In [62]:
def copy_random_state(g1,g2):
    if isinstance(g1.rng,MRG_RandomStreams):
        g2.rng.rstate=g1.rng.rstate
    for (su1,su2) in zip(g1.rng.state_updates,g2.rng.state_updates):
        su2[0].set_value(su1[0].get_value())

In [69]:
copy_random_state(g1,g2)


In [70]:
f1()

array([ 0.23715077])

In [71]:
f2()

array([ 0.23715077])
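
The same idea, copying generator state so that two independent generators produce identical draws, can be illustrated with Python's standard `random` module. This is only an analogy and does not use Theano's RandomStreams. The variable names are illustrative.

```python
import random

# Two generators seeded differently, like g1 and g2 above
rng1 = random.Random(123)
rng2 = random.Random(987)

# Copy the full internal state of rng1 into rng2
rng2.setstate(rng1.getstate())

draw1 = rng1.random()
draw2 = rng2.random()
print(draw1 == draw2)
```

After the state is copied, both generators continue from the same point, so their next draws match, just as f1() and f2() return the same value above.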

# Derivatives in Theano
## Computing Gradients
Now let’s use Theano for a slightly more sophisticated task: create a function which computes the derivative of some expression y with respect to its parameter x. To do this we will use the macro T.grad.

In [72]:
from theano import pp
x=T.dscalar('x')
y=x**2
gy=T.grad(y,x)
f=theano.function([x],gy)


In [75]:
pp(gy) # print out the gradient prior to optimization


'((fill((x ** TensorConstant{2}), TensorConstant{1.0}) * TensorConstant{2}) * (x ** (TensorConstant{2} - TensorConstant{1})))'

In [74]:
f(4)

array(8.0)

In [77]:
#more complex gradient: derivative of the (summed) logistic function
x=T.dmatrix('x')
s=T.sum(1/(1+T.exp(-x)))
gs=T.grad(s,x)
dlogistic=function([x],gs)
dlogistic([[0, 1], [-1, -2]])

array([[ 0.25      ,  0.19661193],
       [ 0.19661193,  0.10499359]])
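
For the logistic function s(x) = 1/(1 + e^(-x)), the derivative is s(x)(1 - s(x)), so the gradient of the summed logistic should be 0.25 at x = 0. The parentheses around 1 + exp(-x) matter: without them, 1/1 + exp(-x) parses as 1 + exp(-x), whose derivative is -exp(-x). The following plain-Python sketch cross-checks the analytic derivative against a central finite difference. It does not use Theano, and the helper names are illustrative.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    # Central finite difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

results = []
for v in [0.0, 1.0, -1.0, -2.0]:
    analytic = logistic(v) * (1.0 - logistic(v))   # s(x) * (1 - s(x))
    numeric = numeric_derivative(logistic, v)
    results.append((v, analytic, numeric))
    print(v, round(analytic, 8), abs(analytic - numeric) < 1e-8)
```

Agreement between the two columns is a cheap way to catch a misplaced parenthesis in the symbolic expression.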

## Computing Jacobian
In Theano’s parlance, the term Jacobian designates the tensor comprising the first partial derivatives of the output of a function with respect to its inputs. (This is a generalization of the so-called Jacobian matrix in mathematics.) Theano implements the theano.gradient.jacobian() macro that does all that is needed to compute the Jacobian. The following cell shows how to do it manually with theano.scan.

In [78]:
x=T.dvector('x')
y=x**2
J,updates=theano.scan(lambda i, y,x : T.grad(y[i],x),sequences=T.arange(y.shape[0]),non_sequences=[y,x])
f=theano.function([x],J,updates=updates)


In [79]:
f([4,4])

array([[ 8.,  0.],
       [ 0.,  8.]])
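
Because y = x**2 is applied elementwise, output i depends only on input i, so the Jacobian is diagonal with entries 2*x[i]. A minimal plain-Python sketch using central finite differences can confirm the result for the input [4, 4]. It is independent of Theano, and `numeric_jacobian` is a hypothetical helper written for this illustration.

```python
def square(x):
    # Elementwise square, mirroring y = x**2
    return [v * v for v in x]

def numeric_jacobian(f, x, h=1e-6):
    # jac[i][j] approximates d f(x)[i] / d x[j] by central differences
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        plus = list(x)
        plus[j] += h
        minus = list(x)
        minus[j] -= h
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac

J = numeric_jacobian(square, [4.0, 4.0])
print([[round(v, 4) for v in row] for row in J])
```

The diagonal entries come out at approximately 8 and the off-diagonal entries at 0, matching the Theano output above.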

## Computing the Hessian
In Theano, the term Hessian has the usual mathematical meaning: it is the matrix comprising the second-order partial derivatives of a function with scalar output and vector input. Theano implements the theano.gradient.hessian() macro that does all that is needed to compute the Hessian. The following cell shows how to do it manually.

In [81]:
x=T.dvector('x')
y=x**2
cost=y.sum()
gy=T.grad(cost,x)
H,updates=theano.scan(lambda i,gy,x: T.grad(gy[i],x) , sequences=T.arange(gy.shape[0]),non_sequences=[gy,x])
f=function([x],H,updates=updates)


In [82]:
f([4,4])

array([[ 2.,  0.],
       [ 0.,  2.]])
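
This result can be checked by hand. For the cost used above, every second partial derivative is either 2 or 0, so the Hessian is twice the identity matrix for any input, including [4, 4]:

$$
\mathrm{cost}(x)=\sum_k x_k^2,\qquad
\frac{\partial\,\mathrm{cost}}{\partial x_i}=2x_i,\qquad
H_{ij}=\frac{\partial^2\,\mathrm{cost}}{\partial x_i\,\partial x_j}=2\,\delta_{ij}
$$

Here $\delta_{ij}$ is 1 when $i=j$ and 0 otherwise.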

In [None]:
# Not tested yet: the L-operator and R-operator