#Blocks tutorial

##Bricks

###Introduction

Blocks provides tools for extending Theano. The core entity of Blocks is the `Brick`, a parametrized Theano operation.

Bricks can be applied to Theano variables and output Theano variables.

In [1]:
from __future__ import print_function
import theano
from theano import tensor
from blocks.bricks import Linear
x = tensor.matrix('features') # dim: (batch, features)
linear = Linear(input_dim=784, output_dim=10)
y_hat = linear.apply(x)
y_hat = abs(2 * y_hat)
isinstance(y_hat, theano.Variable)

True

Now we can compile a Theano function:

In [2]:
import numpy
from theano import function
f = function([x], y_hat)
f(numpy.zeros((10, 784)))

array([[ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan],
       [ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]])

The function works as expected, except that the output consists entirely of NaNs. This is because all shared variables are filled with NaN when they are allocated. So if your output is NaN, check that you have initialized all the bricks properly.

In [3]:
from blocks.initialization import Constant
linear.weights_init = Constant(1.)
linear.biases_init = Constant(0.)
linear.initialize()
f(numpy.zeros((10, 784)))

array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
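
For intuition, the two results above can be mimicked in plain NumPy. This is only a sketch, assuming the `Linear` brick computes `x.dot(W) + b`; the helper `linear_abs` and the array names are illustrative and not part of Blocks.

```python
import numpy

def linear_abs(x, W, b):
    """Mimic abs(2 * (x.dot(W) + b)), the expression built in the first cell."""
    return numpy.abs(2 * (x.dot(W) + b))

x_val = numpy.zeros((10, 784))

# Uninitialized parameters: every entry is NaN, and NaN propagates
# through the arithmetic even when the input is all zeros.
W_nan = numpy.full((784, 10), numpy.nan)
b_nan = numpy.full(10, numpy.nan)
out_uninitialized = linear_abs(x_val, W_nan, b_nan)

# Parameters filled the way Constant(1.) and Constant(0.) would fill them.
W_init = numpy.ones((784, 10))
b_init = numpy.zeros(10)
out_initialized = linear_abs(x_val, W_init, b_init)

print(numpy.isnan(out_uninitialized).all())
print((out_initialized == 0).all())
```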

Every brick has a list of parameters and a list of children:

In [4]:
print(linear.parameters)
print(linear.children)

[W, b]
[]


The `Linear` brick has no children. An `MLP`, by contrast, is a sequence of linear transformations and activations:

In [6]:
from blocks.bricks import MLP, Tanh, Softmax
from blocks.initialization import IsotropicGaussian
mlp = MLP([Tanh(), Softmax()], [784, 100, 10],
          weights_init=IsotropicGaussian(0.01),
          biases_init=Constant(0))
mlp.initialize()
probs = mlp.apply(tensor.flatten(x, outdim=2))
mlp.children

[<blocks.bricks.Linear object at 0x7f37f4d08b50: name=linear_0>, <blocks.bricks.Tanh object at 0x7f37f4d08990: name=tanh>, <blocks.bricks.Linear object at 0x7f37f4d08bd0: name=linear_1>, <blocks.bricks.Softmax object at 0x7f37f4d089d0: name=softmax>]
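
The list of children suggests what the `MLP` computes: linear transformations alternating with activations. A minimal NumPy sketch of that forward pass follows, assuming each linear layer computes `h.dot(W) + b` and that `IsotropicGaussian(0.01)` means a standard deviation of 0.01. The names `softmax`, `layers` and `mlp_forward` are illustrative only.

```python
import numpy

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = numpy.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = numpy.random.RandomState(0)
dims = [784, 100, 10]  # same layer sizes as the MLP above

# One (W, b) pair per linear layer: Gaussian weights, zero biases.
layers = [(rng.normal(0, 0.01, size=(n_in, n_out)), numpy.zeros(n_out))
          for n_in, n_out in zip(dims[:-1], dims[1:])]
activations = [numpy.tanh, softmax]

def mlp_forward(x):
    """Alternate linear transformations and activations."""
    h = x
    for (W, b), activation in zip(layers, activations):
        h = activation(h.dot(W) + b)
    return h

probs_val = mlp_forward(rng.uniform(size=(5, 784)))
print(probs_val.shape)
print(probs_val.sum(axis=1))
```

Because the last activation is a softmax, each output row is a probability distribution over the 10 classes.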

Note that activations and costs are also bricks:

In [7]:
from blocks.bricks.cost import CategoricalCrossEntropy
y = tensor.lmatrix('targets')
cost = CategoricalCrossEntropy().apply(y.flatten(), probs)
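
As a rough guide to what this cost measures, here is a NumPy sketch of categorical cross-entropy with integer class targets, assuming the cost is averaged over the batch. The function `categorical_cross_entropy` and the sample values are made up for illustration; this is not the Blocks implementation.

```python
import numpy

def categorical_cross_entropy(targets, probs):
    """Mean negative log-probability assigned to the correct class."""
    rows = numpy.arange(len(targets))
    return -numpy.log(probs[rows, targets]).mean()

# Two examples, three classes; each row is a probability distribution.
probs_val = numpy.array([[0.7, 0.2, 0.1],
                         [0.1, 0.8, 0.1]])
targets_val = numpy.array([0, 1])

loss = categorical_cross_entropy(targets_val, probs_val)
print(loss)
```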

###Brick lifecycle
The lifecycle of a brick is as follows:

1. **Configuration:** set (part of) the *attributes* of the brick. Can take
   place when the brick object is created, by setting the arguments of the
   constructor, or later, by setting the attributes of the brick object. No
   Theano variable is created in this phase.

2. **Allocation:** (optional) allocate the Theano shared variables for the
   *parameters* of the Brick. When `Brick.allocate` is called, the
   required Theano variables are allocated and initialized by default to `NaN`.

3. **Application:** instantiate a part of the Theano computational graph,
   linking the inputs and the outputs of the brick through its *parameters*
   and according to the *attributes*. Cannot be performed (i.e., results in an
   error) if the Brick object is not fully configured.

4. **Initialization:** set the **numerical values** of the Theano variables
   that store the *parameters* of the Brick. The user-provided value will
   replace the default initialization value.

####Note
   If the Theano variables of the brick object have not been allocated when 
   `Application.apply` is called, Blocks will quietly call 
   `Brick.allocate`.

For details, see the [bricks life-cycle section](http://blocks.readthedocs.org/en/latest/bricks_overview.html#bricks-life-cycle) of the Blocks documentation.
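
The four phases can be illustrated with a toy class. This is a sketch under stated assumptions: `ToyLinear` is a hypothetical stand-in, not the Blocks `Brick` API, and it computes with NumPy eagerly. In Blocks, application builds a symbolic graph, so initialization may follow application; here the second call to `apply` plays the role of re-running a compiled function.

```python
import numpy

class ToyLinear(object):
    """Hypothetical stand-in for a brick; not the Blocks implementation."""

    def __init__(self, input_dim=None, output_dim=None):
        # Configuration: only attributes are stored, nothing is allocated.
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.W = None

    def allocate(self):
        # Allocation: create parameter storage, filled with NaN by default.
        if self.input_dim is None or self.output_dim is None:
            raise ValueError("brick is not fully configured")
        self.W = numpy.full((self.input_dim, self.output_dim), numpy.nan)

    def apply(self, x):
        # Application: allocate quietly if that has not happened yet.
        if self.W is None:
            self.allocate()
        return x.dot(self.W)

    def initialize(self, value):
        # Initialization: overwrite the NaN placeholders with real values.
        if self.W is None:
            self.allocate()
        self.W[...] = value

brick = ToyLinear(input_dim=3)   # partially configured at construction
brick.output_dim = 2             # configuration completed later
before = brick.apply(numpy.ones((1, 3)))   # allocates; parameters are still NaN
brick.initialize(1.0)
after = brick.apply(numpy.ones((1, 3)))    # parameters now hold real values
print(before, after)
```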

In [8]:
mlp.children[0].weights_init

<blocks.initialization.IsotropicGaussian at 0x7f37f4d08a50>

##Graph filtering and modifications
Using brick annotations, one can easily extract variables from the computation graph:

In [9]:
from blocks.graph import ComputationGraph
from blocks.filter import VariableFilter
from blocks.roles import WEIGHT

cg = ComputationGraph([cost])
W1, W2 = VariableFilter(roles=[WEIGHT])(cg.variables)
print("W1 brick:", W1.tag.annotations[0].name)
print("W2 brick:", W2.tag.annotations[0].name)

W1 brick: linear_0
W2 brick: linear_1
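
Conceptually, the filter keeps the variables whose role matches and leaves their annotations intact. The sketch below imitates that idea with plain Python objects. `ToyVariable` and `filter_by_role` are hypothetical names invented for this illustration, and the role strings merely stand in for role objects such as `WEIGHT`.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotated variable:
# a name, a role, and the name of the brick that owns it.
ToyVariable = namedtuple('ToyVariable', ['name', 'role', 'brick'])

variables = [
    ToyVariable('W', 'WEIGHT', 'linear_0'),
    ToyVariable('b', 'BIAS', 'linear_0'),
    ToyVariable('W', 'WEIGHT', 'linear_1'),
    ToyVariable('b', 'BIAS', 'linear_1'),
]

def filter_by_role(variables, role):
    """Keep only the variables tagged with the given role, preserving order."""
    return [v for v in variables if v.role == role]

W1_toy, W2_toy = filter_by_role(variables, 'WEIGHT')
print("W1 brick:", W1_toy.brick)
print("W2 brick:", W2_toy.brick)
```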


Now we can apply L2 regularization:

In [10]:
cost = cost + .00005 * (W1 ** 2).sum() + .00005 * (W2 ** 2).sum()
cost.name = 'final_cost'
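
The penalty adds a small multiple of the sum of squared weights to the cost. A NumPy sketch of the same arithmetic on concrete arrays, where `l2_regularized` and the sample values are illustrative only:

```python
import numpy

def l2_regularized(cost, weights, coefficient=0.00005):
    """Add coefficient * (sum of squared entries) for each weight matrix."""
    return cost + sum(coefficient * (W ** 2).sum() for W in weights)

# Weight matrices with the same shapes as in the MLP above, filled with ones
# so that each sum of squares equals the number of entries.
W1_val = numpy.ones((784, 100))
W2_val = numpy.ones((100, 10))

total = l2_regularized(1.0, [W1_val, W2_val])
print(total)
```

With these values the penalty is 0.00005 * (78400 + 1000) = 3.97, so the regularized cost is 4.97 up to floating-point rounding.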

##Main loop

In [11]:
from fuel.datasets.mnist import MNIST
mnist_train = MNIST(("train",))
mnist_test = MNIST(("test",))

IOError: mnist.hdf5 not found in Fuel's data path

In [68]:
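from blocks.extensions import FinishAfter
from blocks.main_loop import MainLoop
from blocks.model import Model
from fuel.schemes import SequentialScheme
from fuel.streams import DataStream
from fuel.transformers import Flatten

# `algorithm` is the GradientDescent instance built in the Algorithms section.
main_loop = MainLoop(
        algorithm,
        Flatten(
            DataStream.default_stream(
                mnist_train,
                iteration_scheme=SequentialScheme(
                    mnist_train.num_examples, 50)),
            which_sources=('features',)),
        model=Model(cost),
        extensions=[FinishAfter(after_n_batches=5)])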

main_loop.run()

NameError: name 'algorithm' is not defined
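
Independently of the error above, the intended control flow can be sketched in plain Python: examples are visited in order, 50 at a time, and training stops after five batches. The functions `sequential_batches` and `run` are hypothetical helpers written for this illustration, and the dataset size of 60000 is an assumption. They are not the Fuel or Blocks implementations.

```python
def sequential_batches(num_examples, batch_size):
    """Yield lists of consecutive example indices, batch_size at a time."""
    for start in range(0, num_examples, batch_size):
        yield list(range(start, min(start + batch_size, num_examples)))

def run(batches, train_step, after_n_batches):
    """Call train_step on each batch and stop after a fixed number of batches."""
    done = 0
    for batch in batches:
        if done >= after_n_batches:
            break
        train_step(batch)
        done += 1
    return done

seen = []  # records the index batches that were "trained" on
n_done = run(sequential_batches(60000, 50), seen.append, after_n_batches=5)
print(n_done, seen[0][:3], seen[-1][-1])
```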

###Algorithms

In [71]:
from blocks.algorithms import GradientDescent, Scale
algorithm = GradientDescent(
    cost=cost, parameters=cg.parameters,
    step_rule=Scale(learning_rate=0.1))

TypeError: copy() got an unexpected keyword argument 'name'
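
Setting the error aside, the update rule requested here is plain gradient descent with a fixed learning rate: each parameter moves against its gradient, scaled by 0.1. A NumPy sketch on a toy quadratic cost follows. `scale_step` is a hypothetical helper, and the sketch assumes the step is simply `learning_rate * gradient`.

```python
import numpy

def scale_step(gradient, learning_rate):
    """Scale the gradient by a fixed learning rate."""
    return learning_rate * gradient

w = numpy.array([1.0, -2.0])
first_step = w - scale_step(2 * w, learning_rate=0.1)

for _ in range(100):
    gradient = 2 * w  # gradient of cost(w) = (w ** 2).sum()
    w = w - scale_step(gradient, learning_rate=0.1)

print(first_step, w)
```

Each iteration multiplies `w` by 0.8, so it shrinks towards the minimum at zero.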

###Logging

###Monitoring