### CapsNet

Paper implemented: [Dynamic Routing Between Capsules][1]<br></br>
Reference source code: [here][2]<br></br>

This notebook walks through the code needed to explain CapsNet.

[1]:https://arxiv.org/abs/1710.09829
[2]:https://github.com/InnerPeace-Wu/CapsNet-tensorflow

In [19]:
# Load the data
import tensorflow as tf
import numpy as np
import time
from tensorflow.contrib import slim
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./data/mnist/", one_hot=True)

Extracting ./data/mnist/train-images-idx3-ubyte.gz
Extracting ./data/mnist/train-labels-idx1-ubyte.gz
Extracting ./data/mnist/t10k-images-idx3-ubyte.gz
Extracting ./data/mnist/t10k-labels-idx1-ubyte.gz


![image](./image/model.png)

Here we implement the first layer. As in an ordinary CNN, it is a convolution followed by ReLU that produces 256 feature maps.<br></br>
The paper uses filter size = 9 and stride = 1. Padding must be VALID to obtain the shapes shown in the architecture figure above.
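
The shape arithmetic behind that choice can be checked by hand. The sketch below is a minimal illustration in plain Python; `valid_conv_output_size` is a hypothetical helper introduced here, not part of TensorFlow. It applies the usual VALID-padding formula to the 9x9, stride-1 first layer and to the 9x9, stride-2 PrimaryCaps convolution that this notebook builds afterwards.

```python
def valid_conv_output_size(input_size, filter_size, stride):
    """Spatial output size of a convolution with VALID padding (no padding added)."""
    return (input_size - filter_size) // stride + 1


# First layer: 28x28 MNIST image, 9x9 filter, stride 1
conv1_size = valid_conv_output_size(28, filter_size=9, stride=1)

# PrimaryCaps: 9x9 filter, stride 2, applied to the first layer's output
primary_size = valid_conv_output_size(conv1_size, filter_size=9, stride=2)

# 32 capsule channels, each with primary_size x primary_size capsules
n_primary_capsules = 32 * primary_size * primary_size

print(conv1_size)          # 20
print(primary_size)        # 6
print(n_primary_capsules)  # 1152
```

With SAME padding the first layer would keep the 28x28 size, so the later reshape to 32*6*6 capsules would no longer fit.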

In [3]:
# Set parameters
batch_size = 48
x = tf.placeholder(tf.float32,[batch_size,784])
x_img = tf.reshape(x, [batch_size, 28,28,1])
y = tf.placeholder(tf.float32,[batch_size,10])
w_initializer = tf.truncated_normal_initializer(stddev=0.1)
b_initializer = tf.zeros_initializer()

In [4]:
# Build the convolution layer
with tf.name_scope('first_layer'):
    conv_weight = tf.get_variable('cnn_weight',shape=[9,9,1,256])
    conv_layer = tf.nn.conv2d(x_img,conv_weight,strides=[1,1,1,1],padding="VALID")
    conv_layer = tf.nn.relu(conv_layer)
print('The shape of the first layer is', np.shape(conv_layer))

The shape of the first layer is (48, 20, 20, 256)



#### Before moving on to the next layer, we define Squash, the activation function of the capsule layer, as a function.
![squashing](./image/squashing.png)

In [5]:
# Squashing function: the output length approaches 1 when the norm of s_j is large and 0 when it is small.
def squash(cap_input):
    with tf.name_scope('squash'):
        input_norm_square = tf.reduce_sum(tf.square(cap_input), axis=-1, keep_dims=True)
        # The small constant avoids a division by zero (NaN) when a capsule is all zeros.
        scale = input_norm_square / (1. + input_norm_square) / tf.sqrt(input_norm_square + 1e-9)

    return cap_input * scale
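
To see what the squashing function does to a single vector, here is a small stand-alone sketch in plain Python. It is an illustration only, not the TensorFlow code above; the `eps` argument is an assumption added to keep the division safe for an all-zero vector.

```python
import math


def squash(vector, eps=1e-9):
    """Shrink a vector so its length is |s|^2 / (1 + |s|^2), keeping its direction."""
    norm_sq = sum(x * x for x in vector)
    norm = math.sqrt(norm_sq + eps)  # eps is an assumed guard against division by zero
    scale = norm_sq / (1.0 + norm_sq) / norm
    return [x * scale for x in vector]


def length(vector):
    return math.sqrt(sum(x * x for x in vector))


long_vector = [3.0, 4.0]     # length 5
short_vector = [0.03, 0.04]  # length 0.05

print(round(length(squash(long_vector)), 4))   # 0.9615
print(round(length(squash(short_vector)), 4))  # 0.0025
```

A long input vector ends up with a length close to 1 and a short one with a length close to 0, which is what lets the capsule length be read as a probability.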

In [6]:
n_capsule_i = 32        # number of primary capsule channels
previous_channel = 256  # feature maps produced by the first layer
length_of_ui = 8        # dimension of each primary capsule vector u_i
filter_size = 9
strides = 2

caps = []
for i in range(n_capsule_i):
    with tf.variable_scope('capsule_' + str(i)):
        primary_weights = tf.get_variable('primary_weights', shape=[filter_size, filter_size, previous_channel, length_of_ui], dtype=tf.float32)
        primary_bias = tf.get_variable('primary_bias', shape=[length_of_ui, ], dtype=tf.float32,initializer=b_initializer)
        capsule_i = tf.nn.conv2d(conv_layer, primary_weights, [1, strides, strides, 1], padding='VALID', name='capsule_conv')

        capsule_i = capsule_i + primary_bias
        capsule_i = squash(capsule_i)
        # add a capsule-channel axis: [batch_size, 1, 6, 6, 8]
        capsule_i = tf.expand_dims(capsule_i, axis=1)

    caps.append(capsule_i)

# concatenate once, after all 32 capsule channels have been built
primary_capsule = tf.concat(caps, axis=1)

In [7]:
print("The shape of PrimaryCaps is %s" % (np.shape(primary_capsule)))
with tf.name_scope('primary_cap_reshape'):
    # reshape and expand dims for broadcasting and the matrix operations in dynamic routing
    primary_capsule = tf.reshape(primary_capsule, shape=[batch_size, 32*6*6, 1, 8, 1])

print('The shape of the reshaped PrimaryCaps is %s' % (np.shape(primary_capsule)))

The shape of PrimaryCaps is (48, 32, 6, 6, 8)
The shape of the reshaped PrimaryCaps is (48, 1152, 1, 8, 1)


### Routing Algorithm

![routing](./image/routing.png)

In [8]:
# b_IJ starts at zero.
n_capsule_j = 10
len_capsule_j = 16

n_previous_cap = 32 * 6 * 6
len_previous_cap = 8

routing_iteration = 3

with tf.variable_scope('routing'):
    # b_IJ: [1, num_caps_l, num_caps_l_plus_1, 1, 1], initialized to 0.
    b_IJ = tf.constant(np.zeros([1, n_previous_cap, n_capsule_j, 1, 1], dtype=np.float32))
    
    W = tf.get_variable('DigitCap_weight', shape=(1, n_previous_cap, n_capsule_j, 
                                                  len_previous_cap, len_capsule_j), dtype=tf.float32, initializer= w_initializer)
    W = tf.tile(W, [batch_size, 1, 1, 1, 1])
    
    primary_capsule = tf.tile(primary_capsule, [1, 1, n_capsule_j, 1, 1])
    
    # Compute u_hat (prediction vectors): the transformation matrix W_ij times the previous layer's output u_i.
    # => [batch_size, 1152, 10, 16, 1]
    u_hat = tf.matmul(W, primary_capsule, transpose_a=True)
    
    # Line 3 of the routing procedure: iterate r times (r = 3 in the paper).
    for r_iter in range(routing_iteration):
        with tf.variable_scope('iter_' + str(r_iter)):
            # Line 4: softmax of the routing logits. c is the coupling coefficient, i.e. how strongly
            # capsule i is routed to each capsule j of the next layer.
            # => [1, 1152, 10, 1, 1]
            c_IJ = tf.nn.softmax(b_IJ, dim=2)
            c_IJ = tf.tile(c_IJ, [batch_size, 1, 1, 1, 1])

            # Line 5: s_j, the input of the j-th capsule of the current layer. s_j is a vector.
            # => [batch_size, 1152, 10, 16, 1]
            s_J = tf.multiply(c_IJ, u_hat)
            # sum over the previous-layer capsules
            # => [batch_size, 1, 10, 16, 1]
            s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)

            # Line 6: squashing (Eq. 1), used at the end of a capsule layer in place of ReLU. v_J is a vector.
            # squash() normalizes over the last axis, so drop the trailing axis of size 1 first;
            # otherwise each of the 16 components would be squashed on its own.
            v_J = tf.expand_dims(squash(tf.squeeze(s_J, axis=-1)), axis=-1)

            # Line 7: add the agreement to b_IJ. The higher the similarity (the larger the dot product),
            # the larger b_IJ becomes, so capsule i is more likely to be routed to capsule j.
            v_J_tiled = tf.tile(v_J, [1, n_previous_cap, 1, 1, 1])
            u_produce_v = tf.matmul(u_hat, v_J_tiled, transpose_a=True)
            # Note: the agreement is summed over the batch axis, so the routing logits are shared
            # by all examples in the batch.
            b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
# squeeze removes the axis of size 1
digitcaps = tf.squeeze(v_J, axis=1)
print('The shape of DigitCaps is %s' % (np.shape(digitcaps)))

The shape of DigitCaps is (48, 10, 16, 1)
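
The routing loop is easier to follow on a toy problem. The sketch below is a plain-Python illustration for a single example, not the batched TensorFlow graph above: the names `route`, `softmax` and the tiny `u_hat` values are assumptions chosen for the demonstration. Two input capsules agree on output capsule 0 and contradict each other on output capsule 1, so routing should shift the coupling coefficients towards capsule 0.

```python
import math


def squash(vector, eps=1e-9):
    norm_sq = sum(x * x for x in vector)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [x * scale for x in vector]


def softmax(logits):
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, n_iterations=3):
    """u_hat[i][j] is the prediction vector of input capsule i for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]       # routing logits start at zero
    for _ in range(n_iterations):
        c = [softmax(row) for row in b]            # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            # weighted sum of the predictions for output capsule j, then squash
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in)) for k in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):
            for j in range(n_out):
                # agreement: dot product between the prediction and the output
                b[i][j] += sum(u_hat[i][j][k] * v[j][k] for k in range(dim))
    return v, c


# Hypothetical predictions: both inputs agree on output 0, and cancel out on output 1.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
v, c = route(u_hat)
print(c)  # both input capsules now prefer output capsule 0
print(v)  # output capsule 0 is long, output capsule 1 stays at zero
```

Unlike the cell above, this sketch keeps separate routing logits for each example, which is how the paper describes the procedure.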


### L2 norm 
Next, we take the L2 norm of each DigitCaps vector to obtain the prediction.

In [9]:
digit_caps_norm = tf.norm(digitcaps, ord=2, axis=2, name='digit_caps_norm')
digit_caps_norm = tf.reshape(digit_caps_norm,[batch_size,n_capsule_j])
print('The shape of the DigitCaps L2 norm is %s' % (np.shape(digit_caps_norm)))

The shape of the DigitCaps L2 norm is (48, 10)
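
As a rough illustration of how the norms become a prediction, the following plain-Python sketch computes the length of each capsule and picks the longest one. The three 2-D capsules are made-up values for the example (the real DigitCaps has ten 16-D capsules), and `capsule_lengths` and `predict` are hypothetical helper names.

```python
import math


def capsule_lengths(digit_caps):
    """L2 norm of every capsule vector."""
    return [math.sqrt(sum(x * x for x in capsule)) for capsule in digit_caps]


def predict(digit_caps):
    """Index of the longest capsule, i.e. the predicted class."""
    lengths = capsule_lengths(digit_caps)
    return max(range(len(lengths)), key=lengths.__getitem__)


# Made-up output for a 3-class problem with 2-D capsules
toy_digit_caps = [[0.1, 0.0], [0.6, 0.8], [0.0, 0.2]]

print(capsule_lengths(toy_digit_caps))
print(predict(toy_digit_caps))  # 1
```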


### Reconstruction
![reconstruction](./image/reconstruction.png)

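The target capsule is the DigitCaps row that belongs to the label of the input image: if the input image is a 3, we take the row for digit 3 and pass it through the fully connected layers. <br></br>
Reconstruction acts as a decoder, so its output has the same size as the input image. The difference between the input image and this output is added to the final loss, and training minimizes it.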
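
Selecting the target capsule amounts to masking with the one-hot label. The sketch below shows the idea in plain Python on made-up 2-D capsules; `mask_target_capsule` is a hypothetical name used only for this illustration.

```python
def mask_target_capsule(digit_caps, one_hot_label):
    """Multiply each capsule by its label entry and sum, leaving only the target capsule."""
    dim = len(digit_caps[0])
    return [
        sum(t * capsule[k] for t, capsule in zip(one_hot_label, digit_caps))
        for k in range(dim)
    ]


# Made-up capsules for a 3-class problem; the label says the true class is 1
toy_digit_caps = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_label = [0.0, 1.0, 0.0]

print(mask_target_capsule(toy_digit_caps, toy_label))  # [0.3, 0.4]
```

Multiplying by the label and summing avoids an explicit index lookup, so the same expression works on a whole batch of one-hot labels.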

In [14]:
def reconstruct(target_capsule, w_initializer):
    # Decoder: 16-D target capsule -> 512 -> 1024 -> 784 (= 28 * 28) pixels
    with tf.name_scope('reconstruct'):
        fc = tf.contrib.layers.fully_connected(target_capsule, 512,
                                  weights_initializer=w_initializer, activation_fn = tf.nn.relu)
        fc = tf.contrib.layers.fully_connected(fc, 1024,
                                  weights_initializer=w_initializer, activation_fn = tf.nn.relu)
        fc = tf.contrib.layers.fully_connected(fc, 784,
                                  weights_initializer=w_initializer,
                                  activation_fn=None)

        # the sigmoid keeps the reconstructed pixel values between 0 and 1
        reconstructed = tf.sigmoid(fc)
    return reconstructed

### Margin loss
![loss](./image/loss.png)

The paper adds a reconstruction loss on top of this margin loss.
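
For one example, the margin loss can be worked through by hand. The sketch below is a plain-Python illustration using the same constants as this notebook (m+ = 0.9, m- = 0.1, lambda = 0.5); the capsule lengths are made-up values and `margin_loss` is a hypothetical helper, not the TensorFlow code that follows.

```python
def margin_loss(lengths, one_hot_label, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Sum over classes of T_k * max(0, m+ - |v_k|)^2 + lam * (1 - T_k) * max(0, |v_k| - m-)^2."""
    loss = 0.0
    for length, t in zip(lengths, one_hot_label):
        positive_term = max(0.0, m_pos - length) ** 2  # target capsule should be longer than m+
        negative_term = max(0.0, length - m_neg) ** 2  # other capsules should be shorter than m-
        loss += t * positive_term + lam * (1.0 - t) * negative_term
    return loss


# Confident and correct: target is long, the others are short, so the loss is zero
print(margin_loss([0.95, 0.05, 0.05], [1.0, 0.0, 0.0]))

# Target too short (0.5 < 0.9) and a wrong class too long (0.3 > 0.1):
# 0.4^2 + 0.5 * 0.2^2, roughly 0.18
print(margin_loss([0.5, 0.3], [1.0, 0.0]))
```

The factor lambda down-weights the loss for absent classes, so the early phase of training does not shrink the lengths of all capsules.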

In [15]:
RECONSTRUCT_W = 0.0005
M_POS = 0.9
M_NEG = 0.1
LAMBDA = 0.5
with tf.name_scope('loss'):

    # loss of positive classes
    # max(0, m+ - ||v_c||) ^ 2
    with tf.name_scope('positive_loss'):
        pos_loss = tf.maximum(0., M_POS - tf.reduce_sum(digit_caps_norm * y, axis=1), name='pos_max')
        pos_loss = tf.square(pos_loss, name='pos_square')
        pos_loss = tf.reduce_mean(pos_loss)

    # get index of negative classes
    y_negs = 1. - y
    # max(0, ||v_c|| - m-) ^ 2
    with tf.name_scope('negative_loss'):
        neg_loss = tf.maximum(0., digit_caps_norm * y_negs - M_NEG)
        neg_loss = tf.reduce_sum(tf.square(neg_loss), axis=-1) * LAMBDA
        neg_loss = tf.reduce_mean(neg_loss)
    # neg_loss is now a scalar: the batch mean of the per-example sums

    y_ = tf.expand_dims(y, axis=2)
    # y_ shape: [None, 10, 1]
    digitcaps = tf.reshape(digitcaps,[batch_size, n_capsule_j, len_capsule_j])
    # the one-hot label zeroes every capsule except the target
    target_cap = y_ * digitcaps
    
    # target_cap shape: [None, 10, 16]
    target_cap = tf.reduce_sum(target_cap, axis=1)
    # target_cap: [None, 16]

    # use a new name so that the reconstruct() function is not overwritten
    reconstructed = reconstruct(target_cap,w_initializer)
    
    # loss of reconstruction
    with tf.name_scope('reconstruct_loss'):
        reconstruct_loss = tf.reduce_sum(tf.square(x - reconstructed), axis=-1)
        reconstruct_loss = tf.reduce_mean(reconstruct_loss)


    total_loss = pos_loss + neg_loss + RECONSTRUCT_W * reconstruct_loss

### Training

In [16]:
initial_learningrate = 0.001
STEP_SIZE = 1000
DECAY_RATIO = 0.7

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(initial_learningrate, global_step,
                                           STEP_SIZE, DECAY_RATIO,
                                           staircase=True)

optimizer = tf.train.AdamOptimizer(learning_rate)
gradients = optimizer.compute_gradients(total_loss)
train_op = optimizer.apply_gradients(gradients,global_step=global_step)
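
With `staircase=True`, the learning rate drops in discrete steps rather than continuously. The sketch below reproduces that schedule in plain Python for the constants used above; `staircase_exponential_decay` is a hypothetical helper written for illustration, under the assumption that the decayed rate is the initial rate times the decay ratio raised to the integer quotient of the step and the step size.

```python
def staircase_exponential_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate that is multiplied by decay_rate once every decay_steps steps."""
    return initial_lr * decay_rate ** (step // decay_steps)


for step in (0, 999, 1000, 2500):
    lr = staircase_exponential_decay(0.001, step, decay_steps=1000, decay_rate=0.7)
    print(step, lr)
```

The rate stays at 0.001 for the first 1000 steps, then becomes 0.0007, then 0.00049, and so on.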

In [17]:
with tf.name_scope('accuracy'):
    predict = tf.argmax(digit_caps_norm, 1)
    correct_prediction = tf.equal(tf.argmax(y, 1),
                                  predict)
    correct_prediction = tf.cast(correct_prediction, tf.float32)
    accuracy = tf.reduce_mean(correct_prediction)        

In [20]:
max_iteration = 10000
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())    

    start = time.time()
    for iters in range(max_iteration):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {x: batch_xs, y: batch_ys}
        # run the update and fetch the loss in a single pass instead of two
        _, _loss = sess.run([train_op, total_loss], feed_dict)
        if iters % 100 == 0 and iters > 0:
            # draw a test batch only when it is actually evaluated
            test_xs, test_ys = mnist.test.next_batch(batch_size)
            train_acc = sess.run(accuracy, feed_dict)
            test_acc = sess.run(accuracy, feed_dict={x:test_xs,y:test_ys})
            print("loss : %.4f " % (_loss),'train accuracy: %.4f' % train_acc)
            print('test accuracy: %.4f' % test_acc)
            finish = time.time()
            # time elapsed since training started
            print('elapsed time: %.2f secs' % (finish - start))
print('training finished')

loss : 0.8376  train accuracy: 0.1458
test accuracy: 0.2500
elapsed time: 47.68 secs


KeyboardInterrupt: 