## [Chainer Deep Learning Framework](http://chainer.org/)

In [1]:
import numpy as np
from chainer import Variable, FunctionSet
from chainer import functions, optimizers

In [50]:
import cPickle
(train_X, train_y), (valid_X, valid_y), (test_X, test_y) = cPickle.load(open("../data/mnist.pkl", "rb"))
print train_X.shape, train_y.shape, valid_X.shape, valid_y.shape, test_X.shape, test_y.shape

(50000, 784) (50000,) (10000, 784) (10000,) (10000, 784) (10000,)


### Fundamentals 
- Much like in theano, the "minions" in chainer are `Variable`s, which are wrappers around `numpy.ndarray` (so far only float32 is supported, due to a CUDA limitation).
- forward/backward computation of `Variable`s
    - forward computation results can be retrieved from the `data` member of a `Variable`
    - a variable records both its data and its "computation network"
    - backward computation happens by calling `backward()` on a variable, and the result is stored in the `grad` member
- parameterized functions
    - ***Most functions in chainer accept mini-batch input, i.e., matrices of shape (N, d), where N is the batch size and d is the dimension of the input vectors***
    - most of them are defined in the `functions` module, and new ones can be added by inheriting from the `Function` class in chainer
    - they provide a way to calculate the gradient w.r.t. parameters (instead of just inputs)
    - the parameters in those functions have fixed names, e.g., `f.W` or `f.b`, and their gradients are `f.gW` and `f.gb`
    - steps of calculating parameter gradients: see code below for details
- `FunctionSet` as neural networks - it is essentially a set of functions, which wraps up all parameters and their gradients in an interface that can be used with an optimizer. As a *benefit*, all parameters of the model can be updated automatically within one call.
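
To make the idea of a variable that records both its data and its "computation network" concrete, here is a minimal sketch in plain numpy. It is not chainer's implementation: `TinyVariable`, `parents` and `_push` are hypothetical names invented for illustration, and only `+` and `*` are supported.

```python
import numpy as np

class TinyVariable(object):
    """Hypothetical stand-in for a Variable: data, a grad slot, and links to inputs."""

    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float32)
        self.grad = None
        # each entry is (input variable, function mapping output grad -> input grad)
        self.parents = parents

    def __add__(self, other):
        return TinyVariable(self.data + other.data,
                            parents=((self, lambda g: g), (other, lambda g: g)))

    def __mul__(self, other):
        return TinyVariable(self.data * other.data,
                            parents=((self, lambda g: g * other.data),
                                     (other, lambda g: g * self.data)))

    def backward(self):
        # seed the output gradient with ones if the caller did not set it
        if self.grad is None:
            self.grad = np.ones_like(self.data)
        self._push(self.grad)

    def _push(self, g):
        # propagate one gradient contribution to every input, path by path
        for var, grad_fn in self.parents:
            contribution = grad_fn(g)
            var.grad = contribution if var.grad is None else var.grad + contribution
            var._push(contribution)

x = TinyVariable([[1, 2, 3], [4, 5, 6]])
one = TinyVariable(np.ones((2, 3)))
y = x * x + x + x + one   # same function as x**2 + 2*x + 1
y.backward()
print(y.data)
print(x.grad)
```

Propagating each contribution separately is simple but can revisit nodes many times; a real framework would process the graph in a more careful order.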

In [12]:
## forward and backward calculation of variables (including vectors)


x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype = np.float32))
y = x**2 + 2*x + 1
print "forward computation of y"
print y.data
## it is necessary to initialize the output gradient for vector (non-scalar) data
y.grad = np.ones((2, 3), dtype = np.float32)
y.backward()
print "gradient of y w.r.t. x"
print x.grad

forward computation of y
[[  4.   9.  16.]
 [ 25.  36.  49.]]
gradient of y w.r.t. x
[[  4.   6.   8.]
 [ 10.  12.  14.]]
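
A quick way to sanity-check a backward result like the one above is to compare it with a finite-difference estimate. The sketch below uses only numpy; `f`, `eps` and the variable names are chosen for illustration. Because the function is applied elementwise, a per-element central difference is enough here.

```python
import numpy as np

def f(x):
    # the same elementwise function as in the cell above
    return x**2 + 2*x + 1

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
eps = 1e-6

# central difference approximation of df/dx, element by element
numeric_grad = (f(x + eps) - f(x - eps)) / (2 * eps)
# derivative worked out by hand: d/dx (x^2 + 2x + 1) = 2x + 2
analytic_grad = 2 * x + 2

print(numeric_grad)
print(analytic_grad)
```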


In [20]:
## parameterized functions - forward and backward


f = functions.Linear(3, 2) ## inputsize = 3, outputsize = 2
## parameters W, b are initialized in a specific way
print "initialized parameters"
print f.W
print f.b
## forward
y = f(x)
print y.data
## backward, w.r.t parameters
y.grad = np.ones(y.data.shape, dtype = np.float32)
f.gW.fill(0)
f.gb.fill(0)
y.backward()
print f.gW
print f.gb

initialized parameters
[[-0.35998002  0.8607012   0.24826239]
 [ 0.10942104  0.25323802  0.14954998]]
[ 0.  0.]
[[ 2.10620952  1.06454706]
 [ 4.35316038  2.60117412]]
[[ 5.  7.  9.]
 [ 5.  7.  9.]]
[ 2.  2.]
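
The numbers above can be reproduced by hand. The sketch below assumes the linear function computes `y = x W^T + b` with `W` of shape (output size, input size), which matches the shapes printed above; the variable names are chosen for illustration and the weights are copied from the printed output.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)        # mini-batch, shape (N=2, d=3)
W = np.array([[-0.35998002, 0.8607012, 0.24826239],
              [0.10942104, 0.25323802, 0.14954998]],
             dtype=np.float32)                                 # shape (out=2, in=3)
b = np.zeros(2, dtype=np.float32)

# forward: one output row per input row
y = x.dot(W.T) + b

# backward with an output gradient of all ones
gy = np.ones_like(y)
gW = gy.T.dot(x)       # gradient w.r.t. W, same shape as W
gb = gy.sum(axis=0)    # gradient w.r.t. b, summed over the mini-batch
gx = gy.dot(W)         # gradient w.r.t. the input, same shape as x

print(y)
print(gW)
print(gb)
```

With an all-ones output gradient, every row of `gW` is simply the column sum of `x`, and `gb` is the batch size.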


In [30]:
## set of functions - wrapping parameters in a unified interface with optimizers

model = FunctionSet(
    l1 = functions.Linear(4, 3),
    l2 = functions.Linear(3, 2)
)
## layers are named l1, l2, ...; more can be attached later as attributes
model.l3 = functions.Linear(2, 2)
## design matrix representing minibatch data
x = Variable(np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype = np.float32))
## forward calculation, layer by layer
h1 = model.l1(x)
h2 = model.l2(h1)
y = model.l3(h2)
print y.data

[[ 0.22214875  1.1364491 ]
 [-0.16544521  3.53551483]]


In [44]:
## model working with optimizers

## connect with parameters
optimizer = optimizers.SGD()
optimizer.setup(model.collect_parameters())
## zero all gradients via the optimizer
optimizer.zero_grads()
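
Conceptually, an optimizer holds references to the parameter arrays and their gradient arrays and modifies them in place. The sketch below illustrates that idea for plain SGD using numpy only; `zero_grads`, `sgd_update` and the learning rate `lr` are hypothetical stand-ins, not chainer's code.

```python
import numpy as np

# parameters and matching gradient buffers (illustrative values)
params = [np.array([1.0, -2.0], dtype=np.float32),
          np.array([[0.5]], dtype=np.float32)]
grads = [np.zeros_like(p) for p in params]

def zero_grads(grads):
    # reset every gradient buffer in place
    for g in grads:
        g.fill(0)

def sgd_update(params, grads, lr):
    # plain SGD step: move each parameter against its gradient, in place
    for p, g in zip(params, grads):
        p -= lr * g

# pretend a backward pass filled the gradient buffers
grads[0][:] = [10.0, -10.0]
grads[1][:] = 2.0
sgd_update(params, grads, lr=0.1)
updated = [p.copy() for p in params]

zero_grads(grads)   # ready for the next mini-batch
print(updated)
```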

### MLP 

An MLP with three linear layers (two hidden layers of 100 units with leaky ReLU activations, plus a 10-way output layer), trained on MNIST classification.

- same logic as with theano - wrapper objects (minions) around numpy/cuda arrays, which support backpropagation via a dependency network, plus a set of functions that can be applied to those objects
- richer support for built-in functions
- a model is a chain of parameterized functions, and everything, including inputs, outputs and parameters, is a chainer variable. Optimizers decide how the gradients are used.
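
The non-parameterized pieces of such a model (activation, loss, accuracy) are easy to write down in plain numpy, which may help when reading the training code. This is a sketch under assumptions: the `slope` value is picked for illustration, the loss is averaged over the mini-batch, and the function bodies are not taken from chainer.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    # pass positive values through, scale negative values by `slope`
    return np.where(z > 0, z, slope * z)

def softmax_cross_entropy(scores, labels):
    # subtract the row maximum for numerical stability before exponentiating
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # mean negative log-likelihood of the correct class
    return -log_probs[np.arange(len(labels)), labels].mean()

def accuracy(scores, labels):
    # fraction of rows whose highest score is at the correct class
    return (scores.argmax(axis=1) == labels).mean()

scores = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 3.0],
                   [1.0, 0.0, 0.0]])
labels = np.array([0, 2, 1])

activated = leaky_relu(np.array([-1.0, 2.0]))
loss = softmax_cross_entropy(scores, labels)
acc = accuracy(scores, labels)
print(activated, loss, acc)
```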

In [58]:
from sklearn import utils

In [69]:
## 1. define the architecture of the model
model = FunctionSet(
    l1 = functions.Linear(784, 100) # 784 input, 100 hidden
    , l2 = functions.Linear(100, 100) # another layer of 100 hidden
    , l3 = functions.Linear(100, 10) # 10 output
)


## 2. the forward calculation has to be written manually, which is the price of flexibility
## Note that activations are not part of the model in chainer, as they don't have any params
def forward(model, x_data, y_data):
    """
    x_data, y_data: numpy array (or cuda array), design matrix format
    """
    x = Variable(x_data)
    t = Variable(y_data)
    h1 = functions.leaky_relu(model.l1(x)) # no way of iterating all layers??
    h2 = functions.leaky_relu(model.l2(h1))
    y = model.l3(h2)
    cost = functions.softmax_cross_entropy(y, t)
    return cost, functions.accuracy(y, t)

## 3. set an optimizer
optimizer = optimizers.SGD()
optimizer.setup(model.collect_parameters())

## 4. learning loop with (1) forward cal, (2) backward cal, and (3) optimizer's update
batch_size = 100
for epoch in xrange(40):
    index = utils.shuffle(np.arange(train_X.shape[0]))
    for b in xrange(0, train_X.shape[0], batch_size):
        ## pick this batch's rows in the shuffled order
        batch_index = index[b:b+batch_size]
        batchx, batchy = train_X[batch_index], train_y[batch_index]
        ## forward calculation
        cost, acc = forward(model, batchx, batchy)
        ## backward calculation
        optimizer.zero_grads() ## reset gradients so they do not accumulate across batches
        cost.backward() 
        ## parameter updates
        optimizer.update()
    if (epoch % 5 == 0):
        print 'epoch', epoch, 
        _, train_acc = forward(model, train_X, train_y)
        _, valid_acc = forward(model, valid_X, valid_y)
        print "train accuracy %g, validation accuracy %g" % (train_acc.data, valid_acc.data)
    
## prediction and test on new data
_, test_acc = forward(model, test_X, test_y)
print "accuracy on test data", test_acc.data

epoch 0 train accuracy 0.83508, validation accuracy 0.8555
epoch 5 train accuracy 0.9097, validation accuracy 0.9152
epoch 10 train accuracy 0.92524, validation accuracy 0.9269
epoch 15 train accuracy 0.93502, validation accuracy 0.936
epoch 20 train accuracy 0.94252, validation accuracy 0.9445
epoch 25 train accuracy 0.94846, validation accuracy 0.9498
epoch 30 train accuracy 0.95416, validation accuracy 0.9552
epoch 35 train accuracy 0.959, validation accuracy 0.9573
accuracy on test data 0.9589
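
The mini-batch handling in a learning loop like this one can be factored into a small generator. The sketch below is one possible way to do it with numpy only; `iterate_minibatches` is a hypothetical helper and the toy data stands in for the MNIST arrays.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    # visit every example exactly once per call, in a fresh random order
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.RandomState(0)
X = np.arange(20, dtype=np.float32).reshape(10, 2)   # toy design matrix, 10 rows
y = np.arange(10)                                    # toy labels, one per row

batches = list(iterate_minibatches(X, y, 4, rng))
for batch_X, batch_y in batches:
    print(batch_X.shape, batch_y)
```

Indexing `X` and `y` with the same index array keeps each row paired with its label, and the last batch is simply smaller when the data size is not a multiple of the batch size.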


### Recurrent NN