In [3]:
import numpy  # imported explicitly: the NumPy refresher cells below use it
from theano import *

In [4]:
import theano.tensor as T

# NumPy refresher

## Matrix conventions for machine learning

Rows: horizontal  
Columns: vertical  

inputs[10,5]  
a matrix of 10 examples, each of dimension 5 (one example per row)
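
The row-per-example convention above can be sketched in NumPy as follows. The name `inputs` comes from the text; the zero fill and the indices chosen here are arbitrary, for illustration only.

```python
import numpy

# 10 examples (rows), each with 5 features (columns); zeros are placeholder data.
inputs = numpy.zeros((10, 5))

n_examples, n_features = inputs.shape
first_example = inputs[0]       # one row: a single example
first_feature = inputs[:, 0]    # one column: one feature across all examples

print(inputs.shape)             # (10, 5)
print(first_example.shape)      # (5,)
print(first_feature.shape)      # (10,)
```

Indexing with a single integer picks a row, so iterating over `inputs` visits one example at a time.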

In [5]:
numpy.asarray([[1., 2], [3, 4], [5, 6]])

array([[ 1.,  2.],
       [ 3.,  4.],
       [ 5.,  6.]])

In [7]:
numpy.asarray([[1., 2], [3, 4], [5, 6]]).shape

(3, 2)

This is a 3x2 matrix: 3 rows and 2 columns.

In [9]:
numpy.asarray([[1., 2], [3, 4], [5, 6]])[2, 0]

5.0

The index [2, 0] selects the 3rd row and the 1st column, because indices start at 0.

## Broadcasting

In [15]:
a = numpy.asarray([1.0, 2.0, 3.0])
b = numpy.asarray([2.0, 2.0, 2.0])
a*b

array([ 2.,  4.,  6.])

NumPy broadcasts arrays of different shapes during arithmetic operations. In general, the smaller array (or scalar) is broadcast across the larger array so that the two have compatible shapes. The example below shows an instance of broadcasting:

In [12]:
a = numpy.asarray([1.0, 2.0, 3.0])

In [13]:
b = 2.0

In [14]:
a * b

array([ 2.,  4.,  6.])

The smaller array b (actually a scalar here, which works like a 0-d array) is broadcast to the same size as a during the multiplication. This trick is often useful for simplifying how expressions are written. More detail about broadcasting can be found in the NumPy user guide:

http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
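
Broadcasting is not limited to scalars. The sketch below, with made-up values, shows a vector stretched across the rows of a matrix, a column stretched across its columns, and a shape mismatch that NumPy rejects. The variable names are illustrative only.

```python
import numpy

m = numpy.asarray([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])      # shape (2, 3)
row = numpy.asarray([10.0, 20.0, 30.0])   # shape (3,)
col = numpy.asarray([[100.0], [200.0]])   # shape (2, 1)

# Shapes are compared starting from the trailing dimension; a dimension of
# size 1 (or a missing leading dimension) is stretched to match the other.
plus_row = m + row   # row is added to every row of m
plus_col = m + col   # col is added to every column of m

print(plus_row)
print(plus_col)

# Shapes (2, 3) and (2,) do not agree on the trailing dimension (3 vs 2).
mismatch_rejected = False
try:
    m + numpy.asarray([1.0, 2.0])
except ValueError as err:
    mismatch_rejected = True
    print("incompatible shapes: %s" % err)
```

Reshaping the length-2 vector to shape (2, 1) would make the last addition valid, because the trailing dimension of size 1 can then be stretched.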

# Baby Steps - Algebra

## Adding two Scalars

To get us started with Theano and get a feel for what we’re working with, let’s make a simple function that adds two numbers together. Here is how you do it:

In [16]:
import theano.tensor as T

In [17]:
from theano import function

In [18]:
x = T.dscalar('x')

In [19]:
y = T.dscalar('y')

In [20]:
z = x + y

In [22]:
f = function([x, y], z)

In [23]:
f(2, 3)

array(5.0)

In [24]:
f(16.3, 12.1)

array(28.4)

Let’s break this down into several steps. The first step is to define two symbols (Variables) representing the quantities that you want to add. Note that from now on, we will use the term Variable to mean “symbol” (in other words, x, y, z are all Variable objects). The output of the function f is a numpy.ndarray with zero dimensions.

If you are following along and typing into an interpreter, you may have noticed that there was a slight delay in executing the function instruction. Behind the scenes, f was being compiled into C code.
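
To see what a zero-dimensional numpy.ndarray looks like, here is a NumPy-only sketch. It does not call Theano; `result` is a stand-in built by hand for the kind of value f(2, 3) returns.

```python
import numpy

result = numpy.asarray(5.0)   # hand-built stand-in for the output of f(2, 3)

print(result.ndim)            # 0
print(result.shape)           # ()
print(float(result))          # 5.0, converted to a plain Python float
```

A 0-d array behaves like a number in arithmetic, and float() converts it when an ordinary Python scalar is needed.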

## Step 1

In [25]:
x = T.dscalar('x')

In [26]:
y = T.dscalar('y')

In Theano, all symbols must be typed. In particular, T.dscalar is the type we assign to “0-dimensional arrays (scalar) of doubles (d)”. It is a Theano Type.

dscalar is not a class. Therefore, neither x nor y is actually an instance of dscalar. They are instances of TensorVariable. x and y are, however, assigned the Theano Type dscalar in their type field, as you can see here:

In [27]:
type(x)

theano.tensor.var.TensorVariable

In [28]:
x.type

TensorType(float64, scalar)

In [29]:
T.dscalar

TensorType(float64, scalar)

In [30]:
x.type is T.dscalar

True

By calling T.dscalar with a string argument, you create a Variable representing a floating-point scalar quantity with the given name. If you provide no argument, the symbol will be unnamed. Names are not required, but they can help debugging.

More will be said in a moment regarding Theano’s inner structure. You could also learn more by looking into Graph Structures.

## Step 2

The second step is to combine x and y into their sum z:

In [31]:
z = x + y

z is yet another Variable, which represents the addition of x and y. You can use the pp function to pretty-print the computation associated with z.

In [32]:
from theano import pp

In [33]:
print(pp(z))

(x + y)


## Step 3

The last step is to create a function taking x and y as inputs and giving z as output:

In [35]:
f = function([x, y], z)

The first argument to function is a list of Variables that will be provided as inputs to the function. The second argument is a single Variable or a list of Variables. For either case, the second argument is what we want to see as output when we apply the function. f may then be used like a normal Python function.
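
The three steps can be mimicked with a toy model in plain Python. This is only an illustration of the idea that x + y builds a description of a computation rather than a number. It is not how Theano is implemented, and `Var`, `Add`, `toy_function`, and the `toy_` variables are hypothetical names invented for this sketch.

```python
class Var(object):
    """Toy symbolic variable: holds a name, not a value."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Add(self, other)   # record the operation instead of computing it

    def pretty(self):
        return self.name

    def evaluate(self, env):
        return env[self]          # look up the value bound to this symbol


class Add(Var):
    """Toy node representing the sum of two symbolic expressions."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def pretty(self):
        return "(%s + %s)" % (self.left.pretty(), self.right.pretty())

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


def toy_function(inputs, output):
    """Return a callable that binds positional arguments to the input symbols."""
    def call(*values):
        return output.evaluate(dict(zip(inputs, values)))
    return call


toy_x, toy_y = Var('x'), Var('y')          # step 1: declare symbols
toy_z = toy_x + toy_y                      # step 2: build the expression
toy_f = toy_function([toy_x, toy_y], toy_z)  # step 3: make it callable

print(toy_z.pretty())   # (x + y)
print(toy_f(2, 3))      # 5
```

The toy version interprets the expression on every call, whereas Theano compiles it once, which is where the initial delay comes from.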

### Note

As a shortcut, you can skip step 3 and just use a variable’s eval() method. The eval() method is not as flexible as function(), but it can do everything we’ve covered in the tutorial so far. It has the added benefit of not requiring you to import function(). Here is how eval() works:

In [36]:
import theano.tensor as T

In [37]:
x = T.dscalar('x')

In [38]:
y = T.dscalar('y')

In [39]:
z = x + y

In [40]:
z.eval({x : 16.3, y : 12.1})

array(28.4)

We passed eval() a dictionary mapping symbolic Theano variables to the values to substitute for them, and it returned the numerical value of the expression.

eval() will be slow the first time you call it on a variable – it needs to call function() to compile the expression behind the scenes. Subsequent calls to eval() on that same variable will be fast, because the variable caches the compiled function.

## Adding two Matrices

You might already have guessed how to do this. Indeed, the only change from the previous example is that you need to instantiate x and y using the matrix Types:

In [41]:
x = T.dmatrix('x')

In [42]:
y = T.dmatrix('y')

In [43]:
z = x + y

In [44]:
f = function([x, y], z)

dmatrix is the Type for matrices of doubles. Then we can use our new function on 2D arrays:

In [45]:
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])

array([[ 11.,  22.],
       [ 33.,  44.]])

The result is a NumPy array. We can also use NumPy arrays directly as inputs:

In [46]:
import numpy

In [47]:
f(numpy.array([[1, 2], [3, 4]]), numpy.array([[10, 20], [30, 40]]))

array([[ 11.,  22.],
       [ 33.,  44.]])

It is possible to add scalars to matrices, vectors to matrices, scalars to vectors, etc. The behavior of these operations is defined by broadcasting.

The following types are available:

* byte: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
* 16-bit integers: wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4
* 32-bit integers: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
* 64-bit integers: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
* float: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
* double: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* complex: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4

You, the user—not the system architecture—have to choose whether your program will use 32- or 64-bit integers (i prefix vs. the l prefix) and floats (f prefix vs. the d prefix).
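
To make the choice concrete, the NumPy-only sketch below lists the dtype that each prefix corresponds to (b: int8, w: int16, i: int32, l: int64, f: float32, d: float64, c: complex64) with its size in bytes, then compares the same sum in single and double precision. The list name `prefix_dtypes` and the sample values are illustrative.

```python
import numpy

# Type-name prefix -> NumPy dtype name for the corresponding Theano types.
prefix_dtypes = [
    ('b', 'int8'), ('w', 'int16'), ('i', 'int32'), ('l', 'int64'),
    ('f', 'float32'), ('d', 'float64'), ('c', 'complex64'),
]

for prefix, name in prefix_dtypes:
    print("%s -> %s (%d bytes)" % (prefix, name, numpy.dtype(name).itemsize))

# float32 keeps roughly 7 significant decimal digits, float64 roughly 16,
# so the same sum lands further from 28.4 in single precision.
single = numpy.float32(16.3) + numpy.float32(12.1)
double = numpy.float64(16.3) + numpy.float64(12.1)

single_error = abs(float(single) - 28.4)
double_error = abs(float(double) - 28.4)
print(single_error > double_error)   # True
```

The 32-bit types halve memory use relative to their 64-bit counterparts, at the cost of range or precision.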

http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-tensor-creation

## Exercise

In [48]:
import theano

In [49]:
a = theano.tensor.vector() # declare variable
out = a + a ** 10               # build symbolic expression
f = theano.function([a], out)   # compile function

In [50]:
print(f([0, 1, 2]))

[    0.     2.  1026.]


Modify and execute this code to compute this expression: a ** 2 + b ** 2 + 2 * a * b.

In [None]:
#!/usr/bin/env python
# Theano tutorial
# Solution to Exercise in section 'Baby Steps - Algebra'

from __future__ import print_function
import theano
a = theano.tensor.vector()  # declare variable
b = theano.tensor.vector()  # declare variable
out = a ** 2 + b ** 2 + 2 * a * b  # build symbolic expression
f = theano.function([a, b], out)   # compile function
print(f([1, 2], [4, 5]))  # prints [ 25.  49.]
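
As a sanity check on the expected output, the same expression can be evaluated directly in NumPy, without Theano. It equals (a + b) ** 2, so the inputs [1, 2] and [4, 5] should give 25 and 49. The names `a_val`, `b_val`, `expanded`, and `factored` are illustrative.

```python
import numpy

a_val = numpy.asarray([1.0, 2.0])
b_val = numpy.asarray([4.0, 5.0])

expanded = a_val ** 2 + b_val ** 2 + 2 * a_val * b_val   # expression from the exercise
factored = (a_val + b_val) ** 2                          # equivalent factored form

print(expanded)                             # values 25 and 49
print(numpy.allclose(expanded, factored))   # True
```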