# Lecture 8 Homework

## Assignment: Train a convolutional neural network (CNN) on CIFAR-10 using Theano

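### Notes

- Complete the homework function and submit it
    - The training data is given as train_X and train_y, and the test data as test_X
    - Train your model by, for example, splitting train_X and train_y into train_X, train_y and valid_X, valid_y
    - Build the predicted labels pred_y for test_X and return them from the homework function
- Submissions are scored by how well pred_y matches test_y, measured by F-score
- Keep the total running time on iLect under 60 minutes
- Do not write anything outside the homework function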
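
The hold-out split and the scoring described in the notes can be tried out without the course environment. The sketch below is an illustration, not the grader's code: `holdout_split` and `macro_f1` are hypothetical helper names, the toy arrays only stand in for train_X and train_y, and averaging the per-class F1 over all 10 classes with equal weight (macro averaging) is an assumption about how the F-score is computed.

```python
import numpy as np

def holdout_split(X, y, valid_ratio=0.1, seed=0):
    """Shuffle the indices once and split (X, y) into training and validation parts."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_valid = int(len(X) * valid_ratio)
    valid_idx, train_idx = idx[:n_valid], idx[n_valid:]
    return X[train_idx], X[valid_idx], y[train_idx], y[valid_idx]

def macro_f1(true_y, pred_y, n_classes=10):
    """Unweighted mean of the per-class F1 scores; a class with no hits scores 0."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((pred_y == c) & (true_y == c))
        fp = np.sum((pred_y == c) & (true_y != c))
        fn = np.sum((pred_y != c) & (true_y == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy data standing in for train_X / train_y (assumed, not CIFAR-10)
X = np.arange(40, dtype='float32').reshape(20, 2)
y = np.arange(20) % 10
tr_X, va_X, tr_y, va_y = holdout_split(X, y, valid_ratio=0.2)
```

Scoring a held-out part this way gives a rough estimate of the test F-score before submitting, since test_y is not available inside homework.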

Code like the following is executed **beforehand**:

```python
from collections import OrderedDict
from sklearn.utils import shuffle
from sklearn.metrics import f1_score
from sklearn.cross_validation import train_test_split
from theano.tensor.nnet import conv2d
from theano.tensor.signal import pool
from theano.tensor.shared_randomstreams import RandomStreams

import pickle
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(1234)


def unpickle(file):
    with open(file, 'rb') as f:
        data = pickle.load(f, encoding='latin-1')
    return data

trn = [unpickle('/home/ubuntu/cifar_10/data_batch_%d' % i) for i in range(1, 6)]
cifar_X_1 = np.concatenate([d['data'] for d in trn]).astype('float32')
cifar_y_1 = np.concatenate([d['labels'] for d in trn]).astype('int32')

tst = unpickle('/home/ubuntu/cifar_10/test_batch')
cifar_X_2 = tst['data'].astype('float32')
cifar_y_2 = np.array(tst['labels'], dtype='int32')

cifar_X = np.r_[cifar_X_1, cifar_X_2]
cifar_y = np.r_[cifar_y_1, cifar_y_2]

cifar_X = cifar_X / 255.

train_X, test_X, train_y, test_y = train_test_split(cifar_X, cifar_y,
                                                    test_size=0.2,
                                                    random_state=??)
```
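
Each row of cifar_X produced by the code above is one flattened image of 3072 values (3 channels of 32x32 pixels, stored channel by channel). A minimal sketch of turning such rows into the (N, channels, height, width) layout that conv2d works on, and of doubling the data with mirrored copies, might look like this. The random array and the two labels are stand-in data, not CIFAR-10.

```python
import numpy as np

# Two fake "images" in the flattened 3072-value layout (random stand-in data)
rng = np.random.RandomState(0)
flat = rng.rand(2, 3 * 32 * 32).astype('float32')
labels = np.array([3, 7], dtype='int32')

# Rows -> (N, channels, height, width)
images = flat.reshape((flat.shape[0], 3, 32, 32))

# Reversing the last axis (width) mirrors each image left to right
flipped = images[:, :, :, ::-1]

# Stack originals and mirrored copies; the labels are simply repeated
aug_X = np.r_[images, flipped]
aug_y = np.r_[labels, labels]
```

Horizontal flipping is a cheap augmentation for natural images because a mirrored object keeps its class; it doubles both memory use and time per epoch.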

Complete the homework function in the next cell and submit it

- Write out **everything you need beyond the code above**

In [2]:
def homework(train_X, train_y, test_X):
    import time
    start_time = time.time()
    import gc
    train_y = np.eye(10)[train_y].astype('int32')
    train_X = train_X.reshape((train_X.shape[0], 3, 32, 32))
    test_X = test_X.reshape((test_X.shape[0], 3, 32, 32))
    train_X = np.r_[train_X,train_X[:, :, :, ::-1]]
    train_y = np.r_[train_y, train_y]
    train_X, valid_X, train_y, valid_y = train_test_split(train_X, train_y,
                                                          test_size=0.04,
                                                          random_state=42)
     
    def gcn(a):
        average = np.mean(a, axis=(1, 2, 3), keepdims=True)
        std = np.std(a, axis=(1, 2, 3), keepdims=True)
        return (a - average)/std
    
    class ZCAWhitening:
        def __init__(self, epsilon=1e-4):
            self.epsilon = epsilon
            self.mean = None
            self.ZCA_matrix = None

        def fit(self, x):
            x = x.reshape(x.shape[0], -1)
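            # Per-feature mean subtraction is left disabled, so x.T x / n below
            # is the second-moment matrix rather than a strict covariance
            #self.mean = np.mean(x, axis=0)
            #x -= self.mean
            cov_matrix = np.dot(x.T, x) / x.shape[0]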
            A, d, _ = np.linalg.svd(cov_matrix)
            self.ZCA_matrix = np.dot(np.dot(A,
                                            np.diag(1. / np.sqrt(d + self.epsilon))
                                            ), A.T)

        def transform(self, x):
            shape = x.shape
            x = x.reshape(x.shape[0], -1)
            #x -= self.mean
            x = np.dot(x, self.ZCA_matrix.T)
            return x.reshape(shape)
    
    class BatchNorm:
    # Constructor
        def __init__(self, shape, epsilon=np.float32(1e-5)):
            self.shape = shape
            self.epsilon = epsilon

            self.gamma = theano.shared(np.ones(self.shape, dtype="float32"),
                                       name="gamma")
            self.beta = theano.shared(np.zeros(self.shape, dtype="float32"),
                                      name="beta")
            self.params = [self.gamma, self.beta]

    # Forward Propagation
        def f_prop(self, x):
            if x.ndim == 2:
                mean = T.mean(x, axis=0, keepdims=True)
                std = T.sqrt(T.var(x, axis=0, keepdims=True) + self.epsilon)
            elif x.ndim == 4:
                mean = T.mean(x, axis=(0, 2, 3), keepdims=True)
                std = T.sqrt(T.var(x, axis=(0, 2, 3), keepdims=True) +
                             self.epsilon)

            normalized_x = (x - mean) / std
            self.z = self.gamma * normalized_x + self.beta
            return self.z
        
    class Conv:
    # Constructor
        def __init__(self, filter_shape, function=lambda x: x, border_mode="valid",
                     subsample=(1, 1)):
            self.function = function
            self.border_mode = border_mode
            self.subsample = subsample

            fan_in = np.prod(filter_shape[1:])
            fan_out = (filter_shape[0] * np.prod(filter_shape[2:]))
            self.W = theano.shared(np.random.uniform(
                        low=-np.sqrt(6. / (fan_in + fan_out)),
                        high=np.sqrt(6. / (fan_in + fan_out)),
                        size=filter_shape
                    ).astype("float32"), name="W")
            self.b = theano.shared(np.zeros((filter_shape[0],), dtype="float32"),
                                   name="b")
            self.params = [self.W, self.b]

    # Forward Propagation
        def f_prop(self, x):
            conv_out = conv2d(x, self.W, border_mode=self.border_mode,
                              subsample=self.subsample)
            self.z = self.function(conv_out +
                                   self.b[np.newaxis, :, np.newaxis, np.newaxis])
            return self.z
    
    class Pooling:
    # Constructor
        def __init__(self, pool_size=(2, 2), padding=(0, 0), mode='max'):
            self.pool_size = pool_size
            self.mode = mode
            self.padding = padding
            self.params = []

    # Forward Propagation
        def f_prop(self, x):
            return pool.pool_2d(input=x, ds=self.pool_size, padding=self.padding,
                                mode=self.mode, ignore_border=True)
        
    class Flatten:
    # Constructor
        def __init__(self, outdim=2):
            self.outdim = outdim
            self.params = []

    # Forward Propagation
        def f_prop(self, x):
            return T.flatten(x, self.outdim)
        
    class Layer:
    # Constructor
        def __init__(self, in_dim, out_dim, function, keep_prob):
            self.in_dim = in_dim
            self.out_dim = out_dim
            self.function = function
            # Probability that get_mask() keeps each output unit
            self.keep_prob = keep_prob
            self.W = theano.shared(np.random.uniform(
                        low=-np.sqrt(6. / (in_dim + out_dim)),
                        high=np.sqrt(6. / (in_dim + out_dim)),
                        size=(in_dim, out_dim)
                    ).astype("float32"), name="W")

            self.b = theano.shared(np.zeros(out_dim).astype("float32"), name="b")
            self.params = [self.W, self.b]

    # Forward Propagation
        def f_prop(self, x):
            self.z = self.function(T.dot(x, self.W) + self.b)
            return self.z
        
        def get_mask(self):
            # Binary mask over the output units: 1.0 = keep, 0.0 = drop
            a = np.random.rand(self.out_dim) < self.keep_prob
            return a*np.float32(1.0)
        
    class Activation:
    # Constructor
        def __init__(self, function):
            self.function = function
            self.params = []

    # Forward Propagation
        def f_prop(self, x):
            self.z = self.function(x)
            return self.z
    
    activation = T.nnet.relu
    
    def build_shared_zeros(shape, name):

        return theano.shared(value=np.zeros(shape, dtype=theano.config.floatX), 
                             name=name, borrow=True)
    
    class Adam:
        def __init__(self, params, alpha=0.0005, beta1=0.9, beta2=0.999, eps=1e-8, gamma=1-1e-8):
            # Keep a reference to the parameters instead of relying on the
            # enclosing scope's variable of the same name
            self.params = params
            self.alpha = alpha
            self.b1 = beta1
            self.b2 = beta2
            self.gamma = gamma
            self.t = theano.shared(np.float32(1))
            self.eps = eps

            self.ms = [build_shared_zeros(t.shape.eval(), 'm') for t in params]
            self.vs = [build_shared_zeros(t.shape.eval(), 'v') for t in params]
        
        def updates(self, g_params, cost):
            # beta1 is decayed by gamma at every time step
            b1_t = self.b1 * self.gamma ** (self.t - 1)
            # Use a local dict so this method is not overwritten by its own result
            updates = OrderedDict()
            for m, v, param, g_param in zip(self.ms, self.vs, self.params, g_params):
                _m = b1_t * m + (1 - b1_t) * g_param
                _v = self.b2 * v + (1 - self.b2) * g_param ** 2

                # Bias-corrected first and second moment estimates
                m_hat = _m / (1 - self.b1 ** self.t)
                v_hat = _v / (1 - self.b2 ** self.t)

                updates[param] = param - self.alpha*m_hat / (T.sqrt(v_hat) + self.eps)
                updates[m] = _m
                updates[v] = _v
            updates[self.t] = self.t + 1.0

            return updates
        
    layers = [                                                    # (channels)x(height)x(width)
        Conv((32, 3, 3, 3), border_mode = (1,1)),                 # 3x32x32  ->  32x32x32 No.0
        BatchNorm((32, 32, 32)),
        Activation(activation),
        Conv((64, 32, 3, 3), border_mode = (1,1)),                # 32x32x32 ->  64x32x32 No.3
        BatchNorm((64, 32, 32)),
        Activation(activation),
        Pooling((2, 2)),                                          # 64x32x32 ->  64x16x16 No.6
        BatchNorm((64, 16, 16)),
        Conv((128, 64, 3, 3), border_mode = (1,1)),               # 64x16x16   -> 128x16x16 No.8
        BatchNorm((128,16,16)),
        Activation(activation),
        Conv((128,128, 3, 3), border_mode = (1,1)),               # 128x16x16 -> 128x16x16 No.11
        BatchNorm((128,16,16)),
        Activation(activation),
        Conv((128,128, 3, 3), border_mode = (1,1)),               # 128x16x16 -> 128x16x16 No.14
        BatchNorm((128,16,16)),
        Activation(activation),
        Pooling((2, 2)),                                          # 128x16x16 -> 128x8x8 No.17
        BatchNorm((128, 8, 8)),
        Activation(activation),
        Conv((128, 128, 3, 3), border_mode = (1,1)),              # 128x8x8  ->  128x8x8 No.20
        BatchNorm((128, 8, 8)),
        Activation(activation),
        Conv((128, 128, 3, 3), border_mode = (1,1)),              # 128x8x8  ->  128x8x8 No.23
        BatchNorm((128, 8, 8)),
        Activation(activation),
        Conv((128, 128, 3, 3), border_mode = (1,1)),              # 128x8x8 ->  128x8x8 No.26
        BatchNorm((128, 8, 8)),
        Activation(activation),
        Pooling((2, 2)),                                          # 128x8x8  -> 128x4x4 No.29
        BatchNorm((128, 4, 4)),
        Activation(activation),
        Conv((128, 128, 3, 3), border_mode = (1,1)),              # 128x4x4  ->  128x4x4 No.32
        BatchNorm((128, 4, 4)),
        Activation(activation),
        Conv((128, 128, 3, 3), border_mode = (1,1)),              # 128x4x4 ->  128x4x4 No.35
        BatchNorm((128, 4, 4)),
        Activation(activation),
        Pooling((2, 2)),                                          # 128x4x4  -> 128x2x2 No.38
        Flatten(2),
        Layer(128*2*2, 256, activation, 0.75),
        Layer(256, 10, T.nnet.softmax, 1.00)
    ]
    
    x = T.ftensor4('x')
    t = T.imatrix('t')

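    params = []
    layer_out = x
    for layer in layers:
        params += layer.params
        if isinstance(layer, Layer):
            # NOTE: get_mask() samples the mask once with NumPy while the graph
            # is built, so the same units stay zeroed for every batch and at
            # test time. This is a fixed mask, not per-batch dropout.
            layer.mask = layer.get_mask()
            layer_out = layer.f_prop(layer_out)*layer.mask
        else:
            layer_out = layer.f_prop(layer_out)
    # The output layer's mask is all ones (keep probability 1.00),
    # so its activation z is the network output
    y = layers[-1].z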

    cost = T.mean(T.nnet.categorical_crossentropy(y, t))

    g_params = T.grad(cost, params)
    updates = Adam(params).updates(g_params, cost)
    
    train = theano.function(inputs=[x, t], outputs=cost, updates=updates,
                            allow_input_downcast=True, name='train')
    valid = theano.function(inputs=[x, t], outputs=[cost, T.argmax(y, axis=1)],
                            allow_input_downcast=True, name='valid')
    test = theano.function(inputs=[x], outputs=T.argmax(y, axis=1), allow_input_downcast=True,
                           name='test')
    
    zca = ZCAWhitening()
    zca.fit(gcn(train_X))
    zca_train_X = zca.transform(gcn(train_X))
    zca_train_y = train_y[:]
    zca_valid_X = zca.transform(gcn(valid_X))
    zca_valid_y = valid_y[:]
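    # NOTE: the whitening matrix is refit on the test inputs here, so test
    # images pass through a different transform than the training images did
    zca.fit(gcn(test_X))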
    zca_test_X = zca.transform(gcn(test_X))
    
    del train_X, train_y
    
    batch_size = 100
    n_batches = zca_train_X.shape[0]//batch_size
    epoch = 1
    while time.time() - start_time < 45*60 and epoch < 20:
        zca_train_X, zca_train_y = shuffle(zca_train_X, zca_train_y)
        for i in range(n_batches):
            start = i*batch_size
            end = start + batch_size
            cost = train(zca_train_X[start:end], zca_train_y[start:end])
        print('Training cost: %.3f' % cost)
        valid_cost, pred_y = valid(zca_valid_X, zca_valid_y)
        print('EPOCH:: %i, Validation cost: %.3f, Validation F1: %.3f' %
                (epoch , valid_cost,
                f1_score(np.argmax(zca_valid_y, axis=1).astype('int32'),
                        pred_y, average='macro')))
        
        print('Running time is %i minutes' % int((time.time()- start_time)/60))
        epoch += 1
        
    del zca, zca_valid_X, zca_train_X, zca_valid_y, zca_train_y
    gc.collect()
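    batch_size = 100
    # Round up so a final partial batch is not dropped
    n_batches = -(-zca_test_X.shape[0]//batch_size)
    pred_y = []
    for i in range(n_batches):
        start = i*batch_size
        end = start + batch_size
        pred_y.append(test(zca_test_X[start:end]))

    # Concatenate once so the labels stay integers
    return np.concatenate(pred_y).astype('int32')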
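
The whitening step used in the function above can be checked in isolation on small data. The following is a minimal NumPy sketch under stated assumptions: `zca_matrix` is a hypothetical helper that mirrors the `fit` method (no mean subtraction, SVD of the second-moment matrix), the 4-dimensional correlated toy data replaces the flattened images, and GCN is left out so that the data has full rank.

```python
import numpy as np

def zca_matrix(x, epsilon=1e-4):
    """Return W such that x @ W.T has approximately identity second moments."""
    second_moment = np.dot(x.T, x) / x.shape[0]
    # For a symmetric positive semi-definite matrix, the SVD gives the
    # eigenvectors (columns of U) and the eigenvalues (d)
    U, d, _ = np.linalg.svd(second_moment)
    return np.dot(np.dot(U, np.diag(1.0 / np.sqrt(d + epsilon))), U.T)

rng = np.random.RandomState(0)
# Mixing matrix that makes the toy features correlated (assumed data)
mix = np.eye(4) + 0.5 * np.triu(np.ones((4, 4)), k=1)
x = np.dot(rng.randn(2000, 4), mix)

W = zca_matrix(x)
white = np.dot(x, W.T)

# After whitening, the features are decorrelated with unit scale
second_moment_after = np.dot(white.T, white) / white.shape[0]
```

The `epsilon` term keeps the division stable when an eigenvalue is close to zero; directions with eigenvalues far below `epsilon` are shrunk rather than scaled up to unit variance.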

In [2]:
from collections import OrderedDict
from sklearn.utils import shuffle
from sklearn.metrics import f1_score
from sklearn.cross_validation import train_test_split
from theano.tensor.nnet import conv2d
from theano.tensor.signal import pool
from theano.tensor.shared_randomstreams import RandomStreams

import pickle
import numpy as np
import theano
import theano.tensor as T

rng = np.random.RandomState(1234)


def unpickle(file):
    with open(file, 'rb') as f:
        data = pickle.load(f, encoding='latin-1')
    return data


def load_cifar():
    trn = [unpickle('/home/ubuntu/cifar_10/data_batch_%d' % i) for i in range(1, 6)]
    cifar_X_1 = np.concatenate([d['data'] for d in trn]).astype('float32')
    cifar_y_1 = np.concatenate([d['labels'] for d in trn]).astype('int32')

    tst = unpickle('/home/ubuntu/cifar_10/test_batch')
    cifar_X_2 = tst['data'].astype('float32')
    cifar_y_2 = np.array(tst['labels'], dtype='int32')

    cifar_X = np.r_[cifar_X_1, cifar_X_2]
    cifar_y = np.r_[cifar_y_1, cifar_y_2]

    cifar_X = cifar_X / 255.

    train_X, test_X, train_y, test_y = train_test_split(cifar_X, cifar_y,
                                                        test_size=0.2,
                                                        random_state=42)

    return (train_X, test_X, train_y, test_y)


def check_homework():
    train_X, test_X, train_y, test_y = load_cifar()

    # validate for small dataset
    train_X_mini = train_X[:1000]
    train_y_mini = train_y[:1000]
    test_X_mini = test_X[:1000]
    test_y_mini = test_y[:1000]

    pred_y = homework(train_X_mini, train_y_mini, test_X_mini)
    return f1_score(test_y_mini, pred_y, average='macro')

if 'homework' in globals():
    result = check_homework()

    print("No Error Occurred!")

ERROR (theano.sandbox.cuda): ERROR: Not using GPU. Initialisation of device gpu failed:
initCnmem: cnmemInit call failed! Reason=CNMEM_STATUS_OUT_OF_MEMORY. numdev=1



RuntimeError: Cuda error: kernel_reduce_ccontig_node_meb404c8cd39208f6884dd773b584b7d7_0: out of memory. (grid: 1 x 1; block: 256 x 1 x 1)

Apply node that caused the error: GpuCAReduce{add}{1}(<CudaNdarrayType(float32, vector)>)
Toposort index: 0
Inputs types: [CudaNdarrayType(float32, vector)]
Inputs shapes: [(10000,)]
Inputs strides: [(1,)]
Inputs values: ['not shown']
Outputs clients: [[HostFromGpu(GpuCAReduce{add}{1}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

In [1]:
from collections import OrderedDict
from sklearn.utils import shuffle
from sklearn.metrics import f1_score
from sklearn.cross_validation import train_test_split
from theano.tensor.nnet import conv2d
from theano.tensor.signal import pool
#from theano.tensor.shared_randomstreams import RandomStreams

import pickle
import numpy as np
import theano
import theano.tensor as T

#rng = np.random.RandomState(1234)


def unpickle(file):
    with open(file, 'rb') as f:
        data = pickle.load(f, encoding='latin-1')
    return data

trn = [unpickle('/home/ubuntu/cifar_10/data_batch_%d' % i) for i in range(1, 6)]
cifar_X_1 = np.concatenate([d['data'] for d in trn]).astype('float32')
cifar_y_1 = np.concatenate([d['labels'] for d in trn]).astype('int32')

tst = unpickle('/home/ubuntu/cifar_10/test_batch')
cifar_X_2 = tst['data'].astype('float32')
cifar_y_2 = np.array(tst['labels'], dtype='int32')

cifar_X = np.r_[cifar_X_1, cifar_X_2]
cifar_y = np.r_[cifar_y_1, cifar_y_2]

cifar_X = cifar_X / 255.

train_X, test_X, train_y, test_y = train_test_split(cifar_X, cifar_y,
                                                    test_size=0.2,
                                                    random_state=123)

Using gpu device 0: GRID K520 (CNMeM is enabled with initial size: 95.0% of memory, cuDNN 4007)


In [6]:
pred_y = homework(train_X, train_y, test_X)
print(sum(pred_y==test_y)/len(test_y),pred_y,test_y)

Hello,world!


INFO (theano.gof.compilelock): Refreshing lock /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-3.5.2-64/lock_dir/lock


Training cost: 0.745
EPOCH:: 1, Validation cost: 0.667, Validation F1: 0.768
Running time is 5 minutes
Training cost: 0.590
EPOCH:: 2, Validation cost: 0.522, Validation F1: 0.815
Running time is 9 minutes
Training cost: 0.520
EPOCH:: 3, Validation cost: 0.464, Validation F1: 0.841
Running time is 13 minutes
Training cost: 0.540
EPOCH:: 4, Validation cost: 0.407, Validation F1: 0.861
Running time is 17 minutes
Training cost: 0.167
EPOCH:: 5, Validation cost: 0.395, Validation F1: 0.872
Running time is 21 minutes
Training cost: 0.200
EPOCH:: 6, Validation cost: 0.404, Validation F1: 0.876
Running time is 24 minutes
Training cost: 0.148
EPOCH:: 7, Validation cost: 0.450, Validation F1: 0.873
Running time is 28 minutes
Training cost: 0.180
EPOCH:: 8, Validation cost: 0.449, Validation F1: 0.879
Running time is 32 minutes
Training cost: 0.117
EPOCH:: 9, Validation cost: 0.448, Validation F1: 0.885
Running time is 36 minutes
Training cost: 0.100
EPOCH:: 10, Validation cost: 0.471, Validatio

In [23]:
print(sum(pred_y==test_y)/len(test_y))

0.736666666667


In [None]:
pred_y = homework(train_X, train_y, test_X)
print(sum(test_y==pred_y)/len(test_y))

INFO (theano.gof.compilelock): Refreshing lock /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-3.5.2-64/lock_dir/lock


Training cost: 0.801
EPOCH:: 1, Validation cost: 0.658, Validation F1: 0.765
Running time is 4 minutes
Training cost: 0.617
EPOCH:: 2, Validation cost: 0.513, Validation F1: 0.822
Running time is 7 minutes


In [32]:
pred_y = homework(train_X, train_y, test_X)
print(sum(pred_y==test_y)/len(test_y),pred_y,test_y)

Training cost: 0.642
EPOCH:: 1, Validation cost: 0.652, Validation F1: 0.766
Running time is 4 minutes
Training cost: 0.485
EPOCH:: 2, Validation cost: 0.521, Validation F1: 0.819
Running time is 6 minutes
Training cost: 0.572
EPOCH:: 3, Validation cost: 0.486, Validation F1: 0.833
Running time is 9 minutes
Training cost: 0.364
EPOCH:: 4, Validation cost: 0.461, Validation F1: 0.845
Running time is 12 minutes
Training cost: 0.229
EPOCH:: 5, Validation cost: 0.470, Validation F1: 0.852
Running time is 14 minutes
Training cost: 0.162
EPOCH:: 6, Validation cost: 0.487, Validation F1: 0.853
Running time is 17 minutes
Training cost: 0.161
EPOCH:: 7, Validation cost: 0.566, Validation F1: 0.846
Running time is 20 minutes
Training cost: 0.199
EPOCH:: 8, Validation cost: 0.545, Validation F1: 0.856
Running time is 23 minutes
Training cost: 0.035
EPOCH:: 9, Validation cost: 0.538, Validation F1: 0.861
Running time is 25 minutes
Training cost: 0.121
EPOCH:: 10, Validation cost: 0.576, Validation

In [33]:
print(sum(pred_y==test_y)/len(test_y),pred_y,test_y)

0.848833333333 [ 3.  9.  6. ...,  0.  7.  9.] [5 9 6 ..., 0 7 9]


In [5]:
pred_y = homework(train_X, train_y, test_X)
print(sum(test_y==pred_y)/len(test_y))

Hello,world!


INFO (theano.gof.compilelock): Refreshing lock /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-3.5.2-64/lock_dir/lock


Training cost: 0.813
EPOCH:: 1, Validation cost: 0.641, Validation F1: 0.780
Running time is 8 minutes
Training cost: 0.458
EPOCH:: 2, Validation cost: 0.502, Validation F1: 0.829
Running time is 11 minutes
Training cost: 0.331
EPOCH:: 3, Validation cost: 0.441, Validation F1: 0.848
Running time is 15 minutes
Training cost: 0.204
EPOCH:: 4, Validation cost: 0.429, Validation F1: 0.860
Running time is 19 minutes
Training cost: 0.285
EPOCH:: 5, Validation cost: 0.403, Validation F1: 0.872
Running time is 23 minutes
Training cost: 0.150
EPOCH:: 6, Validation cost: 0.411, Validation F1: 0.878
Running time is 27 minutes
Training cost: 0.149
EPOCH:: 7, Validation cost: 0.471, Validation F1: 0.860
Running time is 31 minutes
Training cost: 0.129
EPOCH:: 8, Validation cost: 0.435, Validation F1: 0.877
Running time is 34 minutes
Training cost: 0.096
EPOCH:: 9, Validation cost: 0.466, Validation F1: 0.876
Running time is 38 minutes
Training cost: 0.093
EPOCH:: 10, Validation cost: 0.501, Validati

In [4]:
pred_y = homework(train_X, train_y, test_X)
print(sum(pred_y == test_y)/len(test_y))

Hello,world!


INFO (theano.gof.compilelock): Refreshing lock /home/ubuntu/.theano/compiledir_Linux-3.13--generic-x86_64-with-debian-jessie-sid-x86_64-3.5.2-64/lock_dir/lock


Training cost: 0.651
EPOCH:: 1, Validation cost: 0.632, Validation F1: 0.787
Running time is 10 minutes
Training cost: 0.485
EPOCH:: 2, Validation cost: 0.500, Validation F1: 0.823
Running time is 15 minutes
Training cost: 0.346
EPOCH:: 3, Validation cost: 0.415, Validation F1: 0.861
Running time is 20 minutes
Training cost: 0.409
EPOCH:: 4, Validation cost: 0.406, Validation F1: 0.865
Running time is 26 minutes
Training cost: 0.153
EPOCH:: 5, Validation cost: 0.384, Validation F1: 0.875
Running time is 31 minutes
Training cost: 0.152
EPOCH:: 6, Validation cost: 0.391, Validation F1: 0.877
Running time is 36 minutes
Training cost: 0.122
EPOCH:: 7, Validation cost: 0.413, Validation F1: 0.880
Running time is 41 minutes
Training cost: 0.061
EPOCH:: 8, Validation cost: 0.457, Validation F1: 0.876
Running time is 46 minutes
0.85525


In [19]:
train_y[0:100].shape

(100,)
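
The labels are a flat integer vector, as the shape above shows, while the training function takes a matrix of targets. A small sketch of the `np.eye` indexing trick that converts between the two forms follows; the four labels are made-up values for illustration.

```python
import numpy as np

# Integer class labels with shape (4,), the same form as train_y[0:100]
labels = np.array([0, 3, 9, 3], dtype='int32')

# Row k of the 10x10 identity matrix is the one-hot vector for class k,
# so indexing the identity with the labels builds the whole target matrix
one_hot = np.eye(10)[labels].astype('int32')

# argmax along the class axis recovers the original integer labels
recovered = np.argmax(one_hot, axis=1)
```

The same `argmax` step is what turns one-hot validation targets back into class indices before they are passed to f1_score.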