# Table of Contents
 <p><div class="lev1 toc-item"><a href="#逆伝搬(backward-propagation)" data-toc-modified-id="逆伝搬(backward-propagation)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Backpropagation</a></div><div class="lev2 toc-item"><a href="#計算グラフ" data-toc-modified-id="計算グラフ-11"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Computational graph</a></div><div class="lev2 toc-item"><a href="#数値微分の結果" data-toc-modified-id="数値微分の結果-12"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Numerical differentiation results</a></div><div class="lev2 toc-item"><a href="#逆伝搬による偏微分値" data-toc-modified-id="逆伝搬による偏微分値-13"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Partial derivatives via backpropagation</a></div><div class="lev2 toc-item"><a href="#りんごとオレンジの合わせ技" data-toc-modified-id="りんごとオレンジの合わせ技-14"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Combining apples and oranges</a></div><div class="lev1 toc-item"><a href="#活性化関数レイヤ" data-toc-modified-id="活性化関数レイヤ-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Activation function layers</a></div><div class="lev1 toc-item"><a href="#affine/softmaxレイヤ" data-toc-modified-id="affine/softmaxレイヤ-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Affine/Softmax layers</a></div><div class="lev1 toc-item"><a href="#誤差逆伝搬法による学習" data-toc-modified-id="誤差逆伝搬法による学習-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Training with backpropagation</a></div>

# Backpropagation

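Numerical differentiation is slow, so we replace it with error backpropagation. As I understand it, the idea is this: run the forward pass once and record the value at every node; then start from the final output and pass its derivative backward through the graph, multiplying at each node by that node's local derivative (the chain rule), until every parameter has its partial derivative. One forward pass plus one backward pass yields all the partial derivatives at once, whereas numerical differentiation needs an extra forward pass for every parameter.

That's right.

It's worth experimenting with what happens when you differentiate the apple-and-orange example!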

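To make the chain rule concrete for the apple-and-tax example below, write $a$ for the price of one apple, $n$ for the number of apples, $t$ for the tax multiplier, $p = a\,n$ for the pre-tax price and $f = p\,t$ for the total (symbols introduced here only for illustration). Each partial derivative is a product of the local derivatives along the path from $f$ back to that input, which is exactly what the backward pass multiplies together:

$$
\begin{aligned}
\frac{\partial f}{\partial t} &= p = a\,n = 100 \times 2 = 200 \\
\frac{\partial f}{\partial n} &= \frac{\partial f}{\partial p}\,\frac{\partial p}{\partial n} = t\,a = 1.1 \times 100 = 110 \\
\frac{\partial f}{\partial a} &= \frac{\partial f}{\partial p}\,\frac{\partial p}{\partial a} = t\,n = 1.1 \times 2 = 2.2
\end{aligned}
$$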

## Computational graph

Consider the simple apples-plus-consumption-tax model shown in the figure.

![calculation graph](DeepLearning/DeepLearning.004.jpeg)

$$
(100 \times 2) \times 1.1 = 220 
$$
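Implementing it with `MulLayer`, a multiplication node that remembers its inputs during the forward pass so the backward pass can reuse them, gives: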

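```python
class MulLayer:
    """Multiplication node: out = x * y."""

    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # Remember the inputs; backward() needs them.
        self.x = x
        self.y = y
        out = x * y

        return out

    def backward(self, dout):
        # d(xy)/dx = y and d(xy)/dy = x, each times the upstream gradient dout
        dx = dout * self.y
        dy = dout * self.x

        return dx, dy
```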



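```python
apple = 100      # price of one apple
apple_num = 2    # number of apples
tax = 1.1        # consumption-tax multiplier (10%)

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

print(price)
```

```
220.00000000000003
```

The trailing `...03` is floating-point rounding error: 1.1 has no exact binary representation.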


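## Numerical differentiation results

Let's look at numerical differentiation first: change one input at a time, rerun the forward pass, and divide the change in the total by the change in that input.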

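| | Apple count | Apple price | Consumption tax | Total | Partial derivative |
|:----|----:|----:|----:|----:|----:|
| Baseline | 2 | 100 | 1.1 | 220 | |
| Apple count changed | 2 + 1 | | | 330 | 110 |
| Apple price changed | | 100 + 10 | | 242 | 2.2 |
| Consumption tax changed | | | 1.1 + 0.1 | 240 | 200 |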

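Each value in the last column is a forward difference: the change in the total divided by the change in the single input that was varied. Writing $f(n, a, t) = n\,a\,t$ for the total as a function of the apple count, apple price and tax multiplier (notation introduced here for illustration):

$$
\begin{aligned}
\frac{\partial f}{\partial n} &\approx \frac{f(2+1,\,100,\,1.1) - f(2,\,100,\,1.1)}{1} = \frac{330 - 220}{1} = 110 \\
\frac{\partial f}{\partial a} &\approx \frac{f(2,\,100+10,\,1.1) - f(2,\,100,\,1.1)}{10} = \frac{242 - 220}{10} = 2.2 \\
\frac{\partial f}{\partial t} &\approx \frac{f(2,\,100,\,1.1+0.1) - f(2,\,100,\,1.1)}{0.1} = \frac{240 - 220}{0.1} = 200
\end{aligned}
$$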

```python
data = [2, 100, 1.1]            # [apple_num, apple, tax]
apple_num, apple, tax = data
# forward pass at the baseline point
apple_price = mul_apple_layer.forward(apple, apple_num)
price0 = mul_tax_layer.forward(apple_price, tax)

# perturb one input at a time by the step x and take a forward difference
for i, x in enumerate([1, 10, 0.1]):
    d_data = [2, 100, 1.1]
    print("{0} = {1}".format(i, x))   # index of the perturbed input, step size
    d_data[i] = d_data[i] + x
    apple_num, apple, tax = d_data
    # forward
    apple_price = mul_apple_layer.forward(apple, apple_num)
    price = mul_tax_layer.forward(apple_price, tax)

    print(price)
    print((price0 - price) / (data[i] - d_data[i]))

# note: after the loop the layers still hold the last perturbed inputs (tax = 1.2)
```

```
0 = 1
330.0
109.99999999999997
1 = 10
242.00000000000003
2.2
2 = 0.1
240.00000000000003
199.99999999999983
```


## Partial derivatives via backpropagation

Now solve the same problem with `backward`.

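```python
# The layers still hold the inputs of the last perturbed forward pass
# (tax = 1.2), so redo the forward pass at the baseline point first.
apple_num, apple, tax = 2, 100, 1.1
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

# backward
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)

print(dapple_price, dtax)

dapple, dapple_num = mul_apple_layer.backward(dapple_price)
print(dapple, dapple_num, dtax)
```

```
1.1 200
2.2 110.00000000000001 200
```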


Written as equations, with $f$ the total price:
$$
\begin{aligned}
\frac{\partial f}{\partial \mathit{AppleNum}} &= 110 \\
\frac{\partial f}{\partial \mathit{Apple}} &= 2.2 \\
\frac{\partial f}{\partial \mathit{Tax}} &= 200
\end{aligned}
$$

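When `dprice` is not 1, remember to divide the results by it. The backward pass only multiplies by local derivatives, so every gradient is proportional to the upstream gradient `dout`: starting from `dprice = 2` doubles every value.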

```python
# redo the baseline forward pass so the layers hold (2, 100, 1.1)
apple_num, apple, tax = 2, 100, 1.1
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

# backward with an upstream gradient of 2
dprice = 2
dapple_price, dtax = mul_tax_layer.backward(dprice)

print(dapple_price, dtax)
print(dapple_price/dprice, dtax/dprice)

dapple, dapple_num = mul_apple_layer.backward(dapple_price)
print(dapple, dapple_num, dtax)
print(dapple/dprice, dapple_num/dprice, dtax/dprice)
```

```
2.2 400
1.1 200.0
4.4 220.00000000000003 400
2.2 110.00000000000001 200.0
```


```python
data = [2, 100, 1.1]            # [apple_num, apple, tax]
apple_num, apple, tax = data
# forward pass at the baseline point
apple_price = mul_apple_layer.forward(apple, apple_num)
price0 = mul_tax_layer.forward(apple_price, tax)

# this time double one input at a time instead of adding a small step
for i in range(len(data)):
    d_data = [2, 100, 1.1]
    d_data[i] = d_data[i] * 2
    print("{0} = {1}".format(i, d_data[i]))   # index of the input, its new value
    apple_num, apple, tax = d_data
    # forward
    apple_price = mul_apple_layer.forward(apple, apple_num)
    price = mul_tax_layer.forward(apple_price, tax)

    print(price)
    print((price0 - price) / (data[i] - d_data[i]))
```

```
0 = 4
440.00000000000006
110.00000000000001
1 = 200
440.00000000000006
2.2
2 = 2.2
440.00000000000006
200.0
```


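Now try an even stranger numerical differentiation: instead of a small step, double one input at a time.

Even with such large changes, the partial derivatives come out exactly the same.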

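| | Apple count | Apple price | Consumption tax | Total | Partial derivative |
|:----|----:|----:|----:|----:|----:|
| Baseline | 2 | 100 | 1.1 | 220 | |
| Apple count changed | 2 × 2 | | | 440 | 110 |
| Apple price changed | | 100 × 2 | | 440 | 2.2 |
| Consumption tax changed | | | 1.1 × 2 | 440 | 200 |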

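In other words, when the function depends only linearly on the input being varied, the partial derivative can be taken with any step size, not just an infinitesimal one. The total (apple count × apple price × tax) is not linear overall, but it is linear in each input while the other two are held fixed, so the difference quotient along one input is exact. That's pretty neat. For a nonlinear dependence such as $x^2$, the quotient does depend on the step.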

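One way to check this claim is to compare forward differences at several step sizes for the apple-and-tax total, which is linear in each input separately, and for a function with a nonlinear dependence, $g(x) = x^2$. The helper names `total`, `square` and `forward_diff` are hypothetical, introduced only for this sketch.

```python
def total(apple_num, apple, tax):
    """Total price: linear in each argument while the others are held fixed."""
    return apple_num * apple * tax


def square(x):
    """A nonlinear function for comparison; its exact derivative at x is 2x."""
    return x * x


def forward_diff(f, args, i, h):
    """Forward difference (f(args + h*e_i) - f(args)) / h along argument i."""
    shifted = list(args)
    shifted[i] += h
    return (f(*shifted) - f(*args)) / h


base = (2, 100, 1.1)
for h in (1e-4, 1.0, 10.0, 100.0):
    # d(total)/d(apple_num) = 100 * 1.1 = 110 for every step size,
    # up to floating-point rounding
    print("total  h=%g: %.6f" % (h, forward_diff(total, base, 0, h)))

for h in (1e-4, 1.0, 10.0):
    # d(x^2)/dx at x = 3 is 6, but the forward difference equals 6 + h
    print("square h=%g: %.6f" % (h, forward_diff(square, (3.0,), 0, h)))
```

For `total` every line prints the same value, while for `square` the error grows with the step, which is why numerical differentiation normally needs a small step.
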
## Combining apples and oranges


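```python
class AddLayer:
    """Addition node: out = x + y."""

    def __init__(self):
        pass  # nothing needs to be stored for the backward pass

    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        # d(x+y)/dx = d(x+y)/dy = 1: the upstream gradient passes through unchanged
        dx = dout * 1
        dy = dout * 1
        return dx, dy
```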

```python
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layers
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)
price = mul_tax_layer.forward(all_price, tax)
```

```python
print(apple_price)
print(orange_price)
print(all_price)
print(price)
```

```
200
450
650
715.0000000000001
```


```python
# backward through the tax node and the addition node
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)
```

```python
print(dall_price)
print(dtax)
print(dapple_price)
print(dorange_price)
```

```
1.1
650
1.1
1.1
```


```python
# backward through the two multiplication nodes
dorange, dorange_num = mul_orange_layer.backward(dorange_price)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

print(dorange)
print(dorange_num)
print(dapple)
print(dapple_num)
```

```
3.3000000000000003
165.0
2.2
110.00000000000001
```


```python
data = [100, 2, 150, 3, 1.1]
apple, apple_num, orange, orange_num, tax = data

# layers
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward at the baseline point
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)
price0 = mul_tax_layer.forward(all_price, tax)
print(price0)

data_name = ['apple', 'apple_num', 'orange', 'orange_num', 'tax']

# raise one input at a time by 10% and take the difference quotient
for i in range(len(data)):
    d_data = [100, 2, 150, 3, 1.1]
    d_data[i] = data[i] * 1.1
    apple, apple_num, orange, orange_num, tax = d_data
    # forward
    apple_price = mul_apple_layer.forward(apple, apple_num)
    orange_price = mul_orange_layer.forward(orange, orange_num)
    all_price = add_apple_orange_layer.forward(apple_price, orange_price)
    price = mul_tax_layer.forward(all_price, tax)
    print('%10s: %4.2f %4.2f' % (data_name[i], price, (price0 - price) / (data[i] - d_data[i])))
```

```
715.0000000000001
     apple: 737.00 2.20
 apple_num: 737.00 110.00
    orange: 764.50 3.30
orange_num: 764.50 165.00
       tax: 786.50 650.00
```

Each line shows the input, the new total and the difference quotient; the quotients match the backpropagated values above (2.2, 110, 3.3, 165, 650).

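A common way to confirm a backward implementation is a gradient check: compare it against numerical differentiation with a small step. The sketch below uses central differences, $(f(x+h) - f(x-h)) / 2h$; the function names `price_and_grads` and `numerical_grads` are made up for this example, and the two layer classes are repeated in compact form so the block runs on its own.

```python
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        return dout * self.y, dout * self.x


class AddLayer:
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        return dout, dout


def price_and_grads(apple, apple_num, orange, orange_num, tax):
    """One forward and one backward pass through the apple/orange graph."""
    mul_apple, mul_orange = MulLayer(), MulLayer()
    add_apple_orange, mul_tax = AddLayer(), MulLayer()

    apple_price = mul_apple.forward(apple, apple_num)
    orange_price = mul_orange.forward(orange, orange_num)
    all_price = add_apple_orange.forward(apple_price, orange_price)
    price = mul_tax.forward(all_price, tax)

    dall_price, dtax = mul_tax.backward(1)
    dapple_price, dorange_price = add_apple_orange.backward(dall_price)
    dapple, dapple_num = mul_apple.backward(dapple_price)
    dorange, dorange_num = mul_orange.backward(dorange_price)
    return price, [dapple, dapple_num, dorange, dorange_num, dtax]


def numerical_grads(data, h=1e-4):
    """Central differences, perturbing one input at a time."""
    grads = []
    for i in range(len(data)):
        plus, minus = list(data), list(data)
        plus[i] += h
        minus[i] -= h
        f_plus, _ = price_and_grads(*plus)
        f_minus, _ = price_and_grads(*minus)
        grads.append((f_plus - f_minus) / (2 * h))
    return grads


data = [100, 2, 150, 3, 1.1]
names = ['apple', 'apple_num', 'orange', 'orange_num', 'tax']
_, backprop = price_and_grads(*data)
numeric = numerical_grads(data)
for name, b, n in zip(names, backprop, numeric):
    print('%10s: backprop %8.4f  numerical %8.4f' % (name, b, n))
```

Building fresh layers inside `price_and_grads` also avoids the stale-state problem: no forward pass can overwrite the values another call's backward pass depends on.
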

# Activation function layers

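Activation functions can be written as layers with the same `forward`/`backward` interface as `MulLayer`. ReLU passes the gradient through where its input was positive and blocks it elsewhere; the sigmoid's local derivative can be written with its own output $y$ as $y(1-y)$, so the layer only has to store that output. A minimal sketch, assuming NumPy array inputs; the class names `Relu` and `Sigmoid` are chosen for this illustration.

```python
import numpy as np


class Relu:
    """ReLU node: out = max(0, x), elementwise."""

    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)      # remember where the input was non-positive
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dx = dout.copy()
        dx[self.mask] = 0         # no gradient flows where the input was <= 0
        return dx


class Sigmoid:
    """Sigmoid node: out = 1 / (1 + exp(-x)), elementwise."""

    def __init__(self):
        self.out = None

    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, dout):
        # dy/dx = y * (1 - y), using the stored forward output y
        return dout * self.out * (1.0 - self.out)


x = np.array([[1.0, -0.5], [-2.0, 3.0]])
relu = Relu()
print(relu.forward(x))
print(relu.backward(np.ones_like(x)))
```

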
# Affine/Softmax layers

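An affine layer computes $Y = XW + b$ for a mini-batch $X$; its backward pass is $\partial L/\partial X = (\partial L/\partial Y)\,W^\top$, $\partial L/\partial W = X^\top (\partial L/\partial Y)$, and $\partial L/\partial b$ is $\partial L/\partial Y$ summed over the batch. Softmax followed by cross-entropy error has the compact gradient $(y - t)/N$ for one-hot targets $t$ and batch size $N$. A minimal sketch, assuming NumPy, 2-D inputs and one-hot labels; the names are illustrative, and the small `1e-7` inside the logarithm (a guard against `log(0)`) means the analytic gradient matches the computed loss only approximately.

```python
import numpy as np


class Affine:
    """Fully connected node: out = x @ W + b for a batch of row vectors x."""

    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        return x @ self.W + self.b

    def backward(self, dout):
        dx = dout @ self.W.T           # gradient w.r.t. the input
        self.dW = self.x.T @ dout      # gradient w.r.t. the weights
        self.db = dout.sum(axis=0)     # the bias is broadcast over the batch
        return dx


def softmax(x):
    x = x - x.max(axis=1, keepdims=True)   # subtract the row max for stability
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy_error(y, t):
    # mean over the batch; 1e-7 keeps log() away from zero
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]


class SoftmaxWithLoss:
    """Softmax followed by cross-entropy error against one-hot targets t."""

    def __init__(self):
        self.y = None
        self.t = None

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        return cross_entropy_error(self.y, t)

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
t = np.eye(2)[[0, 1, 1, 0]]          # one-hot labels for a batch of 4
affine = Affine(rng.standard_normal((3, 2)), np.zeros(2))
loss_layer = SoftmaxWithLoss()
loss = loss_layer.forward(affine.forward(x), t)
dx = affine.backward(loss_layer.backward())
print(loss, affine.dW.shape, affine.db.shape, dx.shape)
```

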
# Training with backpropagation

```python
# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # make the parent directory (with dataset/ and ch05/) importable

import numpy as np
from dataset.mnist import load_mnist
from ch05.two_layer_net import TwoLayerNet

# load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    # sample a random mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # gradient: backpropagation instead of the much slower numerical version
    #grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)

    # update (plain SGD)
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    # once per epoch, record accuracy on the full training and test sets
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)
```

Each line below is the training accuracy followed by the test accuracy, printed once per epoch (every 600 iterations, since 60000 / 100 = 600).

```
0.152083333333 0.1523
0.9046 0.9079
0.923883333333 0.9269
0.936916666667 0.9378
0.943966666667 0.9445
0.95175 0.9495
0.955 0.9516
0.961766666667 0.9571
0.964366666667 0.9578
0.967233333333 0.9615
0.970033333333 0.9642
0.971033333333 0.9648
0.973766666667 0.967
0.975166666667 0.968
0.977183333333 0.9688
0.978466666667 0.9686
0.979883333333 0.9707
```