# Chap.5 Back Propagation

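## Computational Graphs

The step of carrying a computation from left to right is called forward propagation, or **forward propagation** for short. The reverse, from right to left, is called **backward propagation**.

Why is backward propagation useful? Because the derivatives in a computational graph can be computed efficiently by combining forward and backward propagation (or so the explanation goes).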

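## The Chain Rule

### What Is the Chain Rule?

$$ z = t^2 $$
$$ t = x + y \quad \cdots (1) $$

When a function can be written as a composite function, its derivative can be written as the product of the derivatives of the functions that compose it.

An analogy with ordinary fractions rather than derivatives:
$$ \frac{a}{c} = \frac{a}{b} \cdot \frac{b}{c} $$

The chain rule says that derivatives of composite functions behave in the same way.  
That is, for (1),
$$ \frac{\partial z}{\partial x} = \frac{\partial z}{\partial t} \cdot \frac{\partial t}{\partial x} $$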
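
As a quick sanity check of the chain rule for $z = (x + y)^2$, the sketch below compares the analytic derivative $2(x + y)$ with a central-difference approximation. The function names (`z_of`, `dz_dx_chain`, `dz_dx_numeric`) and the step size `h` are illustrative assumptions, not part of the original notes.

```python
def z_of(x, y):
    """Composite function z = t**2 with t = x + y."""
    t = x + y
    return t ** 2


def dz_dx_chain(x, y):
    """Chain rule: dz/dx = (dz/dt) * (dt/dx) = 2t * 1."""
    t = x + y
    dz_dt = 2 * t
    dt_dx = 1.0
    return dz_dt * dt_dx


def dz_dx_numeric(x, y, h=1e-4):
    """Central-difference approximation of dz/dx."""
    return (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)


x, y = 3.0, 2.0
analytic = dz_dx_chain(x, y)
numeric = dz_dx_numeric(x, y)
print(analytic, numeric)  # the two values should agree closely
```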

### The Chain Rule and Computational Graphs

<img src="./fig/2.jpeg" width=400 align="center">

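## Backward Propagation

In backward propagation, each node multiplies the derivative received from upstream by its own local derivative and passes the result downstream. This assumes a large computational graph whose final output is a value $L$.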
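
To make "multiply the upstream derivative by the local derivative" concrete, here is a minimal sketch of a single node computing $y = x^2$. The class name `SquareNode` is hypothetical and only used for illustration.

```python
class SquareNode:
    """Hypothetical node computing y = x**2."""

    def __init__(self):
        self.x = None

    def forward(self, x):
        self.x = x  # remember the input for the backward pass
        return x ** 2

    def backward(self, dout):
        # upstream derivative (dout) times the local derivative dy/dx = 2x
        return dout * 2 * self.x


node = SquareNode()
y = node.forward(3.0)
dx = node.backward(1.0)  # dout = 1 means "derivative of y with respect to itself"
print(y, dx)
```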

Backward propagation through an addition node passes the upstream derivative on to the next node unchanged.

<img src="./fig/3.jpeg"  width=400 align="center">

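Backward propagation through a multiplication node multiplies the upstream derivative by the forward-pass input signals, swapped, and passes the result downstream.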

<img src="./fig/4.jpeg"  width=400 align="center">

## Implementing Simple Layers

### Multiplication Layer

In [12]:
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None
        
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        
        return out
    
    def backward(self, dout): # backward propagation
        dx = dout * self.y # multiplication layer, so x and y are swapped
        dy = dout * self.x 
        
        return dx, dy

In [13]:
apple = 100
apple_num = 2
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

In [14]:
print(price)

220.00000000000003


In [15]:
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

In [16]:
print(dapple, dapple_num, dtax)

2.2 110.00000000000001 200
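
The backward-pass values above can be cross-checked with central differences. The sketch below is self-contained; the helper names `total_price` and `central_diff` are assumptions introduced only for this check.

```python
def total_price(apple, apple_num, tax):
    """Same computation as the two chained multiplication layers."""
    return apple * apple_num * tax


def central_diff(f, args, i, h=1e-4):
    """Approximate the partial derivative of f with respect to args[i]."""
    plus = list(args)
    minus = list(args)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)


args = (100.0, 2.0, 1.1)
num_dapple = central_diff(total_price, args, 0)
num_dapple_num = central_diff(total_price, args, 1)
num_dtax = central_diff(total_price, args, 2)

# These should be close to the dapple, dapple_num, dtax obtained by backward().
print(num_dapple, num_dapple_num, num_dtax)
```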


### Addition Layer

In [17]:
class AddLayer:
    def forward(self, x, y):
        out = x + y
        return out
    
    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy

In [26]:
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)
orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)
price = mul_tax_layer.forward(all_price, tax)  # (4)

# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)  # (4)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)  # (3)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)  # (2)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # (1)

print("price:", int(price))
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dOrange:", round(dorange,1))
print("dOrange_num:", int(dorange_num))
print("dTax:", dtax)

price: 715
dApple: 2.2
dApple_num: 110
dOrange: 3.3
dOrange_num: 165
dTax: 650


## Activation Function Layers

In [30]:
import numpy as np

### ReLU Layer

In [31]:
class Relu:
    def __init__(self):
        self.mask = None
        
    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()  # copy() must be called; a bare x.copy is only the method object
        out[self.mask] = 0
        
        return out
    
    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
        
        return dx

In [32]:
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
print(x)

[[ 1.  -0.5]
 [-2.   3. ]]


In [33]:
mask = (x <= 0)

In [34]:
print(mask)

[[False  True]
 [ True False]]
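
The following sketch runs a ReLU layer forward and backward on the same `x` to show how the mask is used in both directions. It redefines `Relu` so that it is self-contained, and as a small variation it copies `dout` instead of modifying it in place; that choice is an assumption of this sketch, not something the notes prescribe.

```python
import numpy as np


class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)   # True where the input is not positive
        out = x.copy()
        out[self.mask] = 0     # those elements are clipped to 0
        return out

    def backward(self, dout):
        dx = dout.copy()       # avoid mutating the caller's array
        dx[self.mask] = 0      # no gradient flows where the input was <= 0
        return dx


x = np.array([[1.0, -0.5], [-2.0, 3.0]])
relu = Relu()
out = relu.forward(x)
dx = relu.backward(np.ones_like(x))
print(out)
print(dx)
```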


### Sigmoid Layer

In [36]:
class Sigmoid:
    def __init__(self):
        self.out = None
        
    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        
        return out
    
    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out
        
        return dx
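
The backward pass relies on the identity $\frac{\partial y}{\partial x} = y(1 - y)$ for $y = \frac{1}{1 + e^{-x}}$. A minimal numerical check of that identity, with an arbitrarily chosen input and step size `h`, could look like this:

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


x = np.array([-2.0, 0.0, 1.5])
y = sigmoid(x)

analytic = y * (1.0 - y)                                # derivative used in backward()
h = 1e-4
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference

print(np.max(np.abs(analytic - numeric)))               # should be very small
```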

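## Implementing the Affine and Softmax Layers

The matrix product computed in the forward pass of a neural network is called an "affine transformation" in geometry.  
Pay attention to how the transposed matrices are multiplied in [1] and [2]!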

<img src="./fig/5.jpeg" width=400 align="center">

### Batch Version of the Affine Layer

In [38]:
class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None
        
    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        
        return out
    
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        
        return dx
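
The transposes in `backward` are easiest to verify by tracking shapes. The sketch below uses arbitrary sizes (`N`, `D`, `H` are assumptions chosen for illustration) and an all-ones upstream gradient to show that each gradient has the same shape as the quantity it belongs to, and that `db` sums over the batch axis.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 4, 3, 2                 # batch size, input size, output size (arbitrary)
x = rng.standard_normal((N, D))
W = rng.standard_normal((D, H))
b = np.zeros(H)

out = np.dot(x, W) + b            # b is broadcast across the N rows
dout = np.ones_like(out)          # stand-in for the upstream gradient

dx = np.dot(dout, W.T)            # (N, H) @ (H, D) -> (N, D), same shape as x
dW = np.dot(x.T, dout)            # (D, N) @ (N, H) -> (D, H), same shape as W
db = np.sum(dout, axis=0)         # sum over the batch -> (H,), same shape as b

print(dx.shape, dW.shape, db.shape)
```

Because the bias is added to every row in the forward pass, its gradient collects the contribution of every row in the backward pass, which is why `axis=0` is used.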

### Softmax-with-Loss Layer

In [39]:
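from common.functions import softmax, cross_entropy_error  # assumed location of these helpers


class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None # loss
        self.y = None # output of softmax
        self.t = None # teacher labels (one-hot vectors)
        
    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)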
        
        return self.loss
    
    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        
        return dx
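
The gradient `(y - t) / batch_size` can be checked against a numerical gradient. The `softmax` and `cross_entropy_error` below are stand-in implementations written for this sketch; they are assumptions and may differ in detail from the helpers the notebook relies on.

```python
import numpy as np


def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)    # subtract the row max for stability
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)


def cross_entropy_error(y, t):
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size   # t is one-hot


x = np.array([[0.3, 2.9, 4.0], [1.0, 0.5, -1.0]])
t = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)

# Analytic gradient, as in SoftmaxWithLoss.backward
y = softmax(x)
dx = (y - t) / x.shape[0]

# Numerical gradient by central differences, one element at a time
h = 1e-4
num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    orig = x[idx]
    x[idx] = orig + h
    loss_plus = cross_entropy_error(softmax(x), t)
    x[idx] = orig - h
    loss_minus = cross_entropy_error(softmax(x), t)
    x[idx] = orig                                # restore the original value
    num[idx] = (loss_plus - loss_minus) / (2 * h)

print(np.max(np.abs(dx - num)))                  # should be close to zero
```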

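## Implementing Backpropagation

### Learning Algorithm
1. Randomly select a subset of the training data (a mini-batch)
2. Compute the gradient of the loss function with respect to each weight parameter
3. Update the weight parameters by a small amount in the direction opposite to the gradient, so that the loss decreases
4. Repeat steps 1 to 3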
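
The four steps above can be sketched on a toy problem that is much smaller than MNIST. Everything in this example (the synthetic data $t = 3x + 1$, the mean-squared-error loss, and the hyperparameters) is an assumption chosen for illustration; only the loop structure mirrors the algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): targets follow t = 3x + 1 exactly
x_train = rng.uniform(-1, 1, size=(200, 1))
t_train = 3.0 * x_train + 1.0

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 20

for i in range(1000):                                   # step 4: repeat
    # step 1: pick a random mini-batch
    batch_mask = rng.choice(x_train.shape[0], batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # step 2: gradient of the mean squared error with respect to w and b
    err = (w * x_batch + b) - t_batch
    grad_w = 2 * np.mean(err * x_batch)
    grad_b = 2 * np.mean(err)

    # step 3: small update against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should approach 3 and 1
```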

In [40]:
import numpy as np
from common.layers import *
from common.gradient import numerical_gradient
from collections import OrderedDict

In [42]:
class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):
        # initialize the weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) 
        self.params['b2'] = np.zeros(output_size)

        # create the layers (added for the backpropagation version)
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])

        self.lastLayer = SoftmaxWithLoss()
        
    def predict(self, x):
        # (added for the backpropagation version)
        for layer in self.layers.values():
            x = layer.forward(x)
        
        return x
        
    # x: input data, t: teacher labels
    def loss(self, x, t):
        y = self.predict(x)
        return self.lastLayer.forward(y, t)
    
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1 : t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
        
    # x: input data, t: teacher labels
    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)
        
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        
        return grads
        
    def gradient(self, x, t):
        # (added for the backpropagation version)
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.lastLayer.backward(dout)
        
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # collect the gradients stored in each Affine layer
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads

In [44]:
import numpy as np
from dataset.mnist import load_mnist
from src045.two_layer_net import TwoLayerNet

In [45]:
# load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

x_batch = x_train[:3]
t_batch = t_train[:3]

grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

for key in grad_numerical.keys():
    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )
    print(key + ":" + str(diff))

b1:7.42192950097e-13
W2:7.85084103413e-13
b2:1.2034816893e-10
W1:2.0160063624e-13
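
The `numerical_gradient` helper imported from `common.gradient` is not shown in these notes. A minimal central-difference version, written here as an assumption about what such a helper does rather than a copy of that module's code, might look like the following. The quadratic `loss` is a hypothetical stand-in whose exact gradient is known.

```python
import numpy as np


def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient of f at x (x is perturbed and then restored)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + h
        f_plus = f(x)
        x[idx] = orig - h
        f_minus = f(x)
        x[idx] = orig                      # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


W = np.array([[1.0, -2.0], [0.5, 3.0]])
loss = lambda W: np.sum(W ** 2)            # hypothetical loss with known gradient 2W

grad_numerical = numerical_gradient(loss, W)
grad_analytic = 2 * W

# Same comparison as in the gradient check above: mean absolute difference
diff = np.average(np.abs(grad_analytic - grad_numerical))
print(diff)
```

Each element requires two evaluations of the loss, which is why numerical differentiation is used here only as a check and backpropagation is used for training.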


In [47]:
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # gradient (backpropagation)
    #grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)
    
    # update
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("acc:" + str(train_acc) , "val_acc:" + str(test_acc))


acc:0.1182 val_acc:0.1085
acc:0.9021 val_acc:0.9053
acc:0.921283333333 val_acc:0.9235
acc:0.93565 val_acc:0.937
acc:0.9432 val_acc:0.9425
acc:0.9496 val_acc:0.9467
acc:0.95455 val_acc:0.9541
acc:0.958916666667 val_acc:0.9548
acc:0.962466666667 val_acc:0.9585
acc:0.966 val_acc:0.9619
acc:0.967466666667 val_acc:0.9645
acc:0.970633333333 val_acc:0.9648
acc:0.9728 val_acc:0.9664
acc:0.974983333333 val_acc:0.9682
acc:0.974766666667 val_acc:0.9683
acc:0.9767 val_acc:0.9685
acc:0.977016666667 val_acc:0.9698


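### Summary

+ By implementing the processing of a neural network in units called layers, each with a forward (forward propagation) and a backward (backward propagation) method, the gradients of the weight parameters can be computed efficiently.
+ Modularizing the layers also makes it possible to combine them freely, so a network of your own design is easy to build.
+ In a computational graph, forward propagation performs the ordinary computation, and backward propagation yields the derivative at each node.
+ Comparing the results of numerical differentiation and backpropagation confirms whether the backpropagation implementation is free of errors (gradient check).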