# Training Deep Neural Networks


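## The Vanishing/Exploding Gradients Problem

Backpropagation works by propagating the error gradient from the output layer back to the input layer. Once the algorithm has computed the gradient of the loss function with respect to every parameter in the network, it uses these gradients to update each parameter with a gradient descent step.

Unfortunately, the gradients often get smaller and smaller as the algorithm progresses down to the lower layers. As a result, the gradient descent update leaves the lower-layer connection weights virtually unchanged, and training never converges to a good solution. This is called the __vanishing gradients problem__.

In some cases the opposite can happen: the gradients grow larger and larger, many layers receive very large weight updates, and the algorithm diverges. This is the __exploding gradients problem__.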
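To make the effect concrete, here is a minimal sketch in plain Python (it is not part of the original notebook). It assumes a toy chain of scalar sigmoid layers that all share one weight and all sit at pre-activation z = 0, where the sigmoid derivative reaches its maximum of 0.25. The function name `backprop_factor` is made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def backprop_factor(n_layers, weight, z=0.0):
    """Factor by which a gradient is scaled after flowing back through
    n_layers identical scalar sigmoid layers (toy assumption)."""
    s = sigmoid(z)
    local_grad = weight * s * (1.0 - s)  # d(output)/d(input) of one layer
    return local_grad ** n_layers


# With weight 1 each layer multiplies the gradient by 0.25: it vanishes.
vanishing = backprop_factor(10, weight=1.0)
# With weight 8 each layer multiplies the gradient by 2: it explodes.
exploding = backprop_factor(10, weight=8.0)

print(vanishing, exploding)
```

Real networks have weight matrices and varying activations, but the same repeated multiplication is what drives gradients toward zero or toward very large values.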


In [3]:
import tensorflow as tf
import numpy as np

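### Mitigation 1: Xavier and He initialization

The proposal behind these schemes is that the variance of each layer's outputs should equal the variance of its inputs, and that the gradients should also have equal variance before and after flowing backward through a layer.

When the number of input connections is roughly equal to the number of output connections, the equations become simpler (this is the variance used in Chapter 10):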

</br>
<div align=center><img width="400" height="300" src="./static/1.jpg"/></div>

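The compromise expressed by this formula is known as Xavier initialization.

The corresponding initialization strategy for the ReLU activation function is sometimes called He initialization.

#### He initialization
He initialization considers only the fan-in, rather than the average of the fan-in and fan-out as Xavier initialization does.

This is also the default for the `variance_scaling_initializer()` function, but you can change it by setting the argument `mode="FAN_AVG"`.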
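The difference between the two schemes can be sketched with a few lines of plain Python. This is an illustration under stated assumptions: weights are drawn from a normal distribution, Xavier initialization uses a standard deviation of sqrt(2 / (fan_in + fan_out)), and the fan-in-only He variant uses sqrt(2 / fan_in). The helper names are hypothetical, and TensorFlow's own initializer may differ in details such as truncation of the distribution.

```python
import math


def xavier_stddev(fan_in, fan_out):
    """Standard deviation for Xavier initialization (normal distribution),
    based on the average of fan-in and fan-out."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def he_stddev(fan_in):
    """Standard deviation for He initialization (normal distribution),
    based on the fan-in only."""
    return math.sqrt(2.0 / fan_in)


# Layer sizes borrowed from the MNIST examples in this notebook.
fan_in, fan_out = 28 * 28, 300
print("Xavier stddev:", xavier_stddev(fan_in, fan_out))
print("He stddev:    ", he_stddev(fan_in))
```

For any layer, the He value is the larger of the two, which compensates for ReLU zeroing out part of its inputs.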


In [4]:
# he_init = tf.contrib.layers.variance_scaling_initializer()
# hidden1 = tf.layers.dense(X,n_hidden,activation=tf.nn.relu,kernel_initializer=he_init,name='hidden1')

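### Mitigation 2: a different activation function

As a rule of thumb: ELU > leaky ReLU (and its variants) > ReLU > tanh > sigmoid

#### The ELU activation function

TensorFlow provides an `elu` function for building neural networks: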

In [5]:
# hidden1 = tf.layers.dense(x,n_hidden,activation=tf.nn.elu,name='hidden1')

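#### The leaky ReLU activation function

TensorFlow had no predefined leaky ReLU function when these notes were written (later 1.x releases provide `tf.nn.leaky_relu`), but it is easy to define your own: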

In [6]:
# def leak_relu(z,name=None):
#     return tf.maximum(0.01 * z,z,name = name)
# hidden1 = tf.layers.dense(x,n_hidden1,activation=leak_relu,name='hidden1')
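For reference, the activation functions compared above can be written as scalar functions in plain Python. This is a sketch for intuition only; the default `alpha` values (0.01 for leaky ReLU, 1.0 for ELU) are common choices assumed here, not values prescribed by the notebook.

```python
import math


def relu(z):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: a small slope alpha (0 < alpha < 1) for negative inputs."""
    return max(alpha * z, z)


def elu(z, alpha=1.0):
    """ELU: smooth exponential curve for negative inputs, identity otherwise."""
    return z if z >= 0 else alpha * (math.exp(z) - 1.0)


for z in (-2.0, -1.0, 0.0, 3.0):
    print(z, relu(z), leaky_relu(z), elu(z))
```

Unlike ReLU, both leaky ReLU and ELU return a nonzero value for negative inputs, so their gradient does not go to zero there.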

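### Mitigation 3: batch normalization

Although He initialization combined with ELU (or any ReLU variant) can significantly reduce vanishing/exploding gradients at the start of training, it does not guarantee that they won't come back during training.


__The technique adds an operation to the model just before each layer's activation function: it zero-centers and normalizes the inputs, then scales and shifts the result using two new parameters per layer (one for scaling, one for shifting). In other words, the operation lets the model learn the optimal scale and mean of each layer's inputs.__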
</br>
<div align=center><img width="400" height="300" src="./static/2.jpg"/></div>
<div align=center><img width="660" height="600" src="./static/3.jpg"/></div>

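At test time there is no mini-batch from which to compute the empirical mean and standard deviation, so you simply use the mean and standard deviation of the whole training set instead. These are typically computed efficiently during training using a moving average. So, in total, four parameters are learned for each batch-normalized layer: `γ` (scale), `β` (offset), `μ` (mean), and `σ` (standard deviation).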
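The computation described above can be sketched for a single feature in plain Python. This is a simplified illustration, not TensorFlow's implementation: the smoothing term `eps`, the function names, and the toy batch values are assumptions made for the example.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    m = len(batch)
    mu = sum(batch) / m                          # mini-batch mean
    var = sum((x - mu) ** 2 for x in batch) / m  # mini-batch variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]
    out = [gamma * xh + beta for xh in x_hat]    # scale (gamma) and shift (beta)
    return out, mu, var


def update_moving(moving_value, batch_value, momentum=0.9):
    """Exponential moving average used to estimate test-time statistics."""
    return momentum * moving_value + (1.0 - momentum) * batch_value


normalized, batch_mean, batch_var = batch_norm_train(
    [1.0, 2.0, 3.0, 4.0], gamma=1.0, beta=0.0)
moving_mean = update_moving(0.0, batch_mean)

print(normalized, batch_mean, batch_var, moving_mean)
```

During training the layer normalizes with the mini-batch statistics and keeps nudging the moving averages; at test time it would normalize with the moving averages instead.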

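#### Pros and cons of BN

In deep neural networks that use saturating activation functions, batch normalization has achieved very good results, and it also reduces the need for other regularization techniques.

However, BN does add complexity to the model and slows the network down.

#### Implementing batch normalization with TensorFlow

The `functools` module provides the `partial()` utility, which "freezes" some of a function's arguments and returns a new function with those arguments preset.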

In [7]:
from functools import partial
from tensorflow.examples.tutorials.mnist import input_data
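A small standalone illustration of `partial()` may help before it is applied to TensorFlow layers. The function `describe_layer` is a hypothetical stand-in that only reports its arguments; it is not a TensorFlow API.

```python
from functools import partial


def describe_layer(inputs, units, activation="relu", name=None):
    """Toy stand-in for a layer constructor: returns the arguments it received."""
    return (inputs, units, activation, name)


# Freeze activation="elu"; the remaining arguments are supplied at call time.
my_layer = partial(describe_layer, activation="elu")

hidden = my_layer("X", 300, name="hidden1")
# A frozen keyword argument can still be overridden for a single call.
outputs = my_layer("hidden1", 10, activation=None, name="outputs")

print(hidden)
print(outputs)
```

The same pattern lets a notebook define the shared layer settings once and override them only where a layer differs, such as an output layer with no activation.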

#### tf.GraphKeys.UPDATE_OPS  

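Batch normalization keeps two important statistics, `moving_mean` and `moving_var`. The two operations inside `batch_normalization` that update `mean` and `variance` must be guaranteed to finish before `train_op` runs.

TensorFlow's implementation adds these two operations to the `tf.GraphKeys.UPDATE_OPS` collection automatically. In `tf.contrib.layers.batch_norm`, the `updates_collections` argument defaults to `tf.GraphKeys.UPDATE_OPS`, while `tf.layers.batch_normalization` places the two update operations in that collection directly.

#### Summary:

1. If the update operations are not fetched with `tf.get_collection` and run, `moving_mean` and `moving_var` are never updated and stay at their initial values.

2. For the statistics to be updated, the `training` argument of `tf.layers.batch_normalization` must also be `True`, for example: `tf.placeholder_with_default(True, (), name='training')`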

https://ithelp.ithome.com.tw/articles/10220410

https://github.com/jason9075/ithome_tensorflow_series/blob/day17/17/update_op.py


In [23]:
tf.reset_default_graph()

n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

mnist = input_data.read_data_sets('./mldata/')
batch_norm_momentum = 0.9
learning_rate = 0.01

X = tf.placeholder(tf.float32,shape=(None,n_inputs),name = 'X')
y = tf.placeholder(tf.int64,shape=None,name='y')

training = tf.placeholder_with_default(False,shape=(),name = 'training')  # placeholder that tells batch norm whether we are training

with tf.name_scope('dnn'):
    he_init = tf.contrib.layers.variance_scaling_initializer()
    
    my_batch_norm_layer = partial(
        tf.layers.batch_normalization,
        training = training,
        momentum = batch_norm_momentum
    )

    my_dense_layer = partial(
        tf.layers.dense,
        kernel_initializer = he_init
    )
    
    hidden1 = my_dense_layer(X,n_hidden1,name='hidden1')
    bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))
    
    hidden2 = my_dense_layer(bn1,n_hidden2,name='hidden2')
    bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))
    
    logits_before_bn = my_dense_layer(bn2,n_outputs,name='outputs')
    logits = my_batch_norm_layer(logits_before_bn)
    
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y,logits=logits)
    loss = tf.reduce_mean(xentropy,name='loss')

with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    training_op = optimizer.minimize(loss)
    
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits,y,1)
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 200

extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 

with tf.Session() as sess:
    init.run()

    for epoch in range(n_epochs):
        for iteraction in range(mnist.train.num_examples // batch_size):
            X_batch,y_batch = mnist.train.next_batch(batch_size)
            sess.run([training_op,extra_update_ops],feed_dict={training:True,X:X_batch,y:y_batch})
        
        accuracy_val = accuracy.eval(feed_dict={X:mnist.test.images,y:mnist.test.labels})
        print(epoch, 'Test accuracy:', accuracy_val)



Extracting ./mldata/train-images-idx3-ubyte.gz
Extracting ./mldata/train-labels-idx1-ubyte.gz
Extracting ./mldata/t10k-images-idx3-ubyte.gz
Extracting ./mldata/t10k-labels-idx1-ubyte.gz
0 Test accuracy: 0.8723
1 Test accuracy: 0.8956
2 Test accuracy: 0.9121
3 Test accuracy: 0.9206
4 Test accuracy: 0.9289
5 Test accuracy: 0.9351
6 Test accuracy: 0.9395
7 Test accuracy: 0.9428
8 Test accuracy: 0.9473
9 Test accuracy: 0.949
10 Test accuracy: 0.9521
11 Test accuracy: 0.9518
12 Test accuracy: 0.9559
13 Test accuracy: 0.9584
14 Test accuracy: 0.9585
15 Test accuracy: 0.9609
16 Test accuracy: 0.963
17 Test accuracy: 0.9641
18 Test accuracy: 0.9647
19 Test accuracy: 0.9664


#### Alternatively, rewrite the `train` block above as:

In [25]:
# with tf.name_scope('train'):
#     optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
#     extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
#     with tf.control_dependencies(extra_update_ops):
#         training_op = optimizer.minimize(loss)


# with tf.Session() as sess:
#     init.run()

#     for epoch in range(n_epochs):
#         for iteraction in range(mnist.train.num_examples // batch_size):
#             X_batch,y_batch = mnist.train.next_batch(batch_size)
#             sess.run(training_op,feed_dict={training:True,X:X_batch,y:y_batch})
        
#         accuracy_val = accuracy.eval(feed_dict={X:mnist.test.images,y:mnist.test.labels})
#         print(epoch, 'Test accuracy:', accuracy_val)

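#### This way you only need to evaluate `training_op` during training, and TensorFlow runs the update operations automatically

For comparison, the next cell (commented out) builds the same network but never runs the update operations. Its recorded test accuracy after 20 epochs is 0.9064, against 0.9664 for the run above.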

In [24]:
# tf.reset_default_graph()

# n_inputs = 28 * 28
# n_hidden1 = 300
# n_hidden2 = 100
# n_outputs = 10

# mnist = input_data.read_data_sets('./mldata/')
# batch_norm_momentum = 0.9
# learning_rate = 0.01

# X = tf.placeholder(tf.float32,shape=(None,n_inputs),name = 'X')
# y = tf.placeholder(tf.int64,shape=None,name='y')

# training = tf.placeholder_with_default(False,shape=(),name = 'training')  # placeholder that tells batch norm whether we are training

# with tf.name_scope('dnn'):
#     he_init = tf.contrib.layers.variance_scaling_initializer()
    
#     my_batch_norm_layer = partial(
#         tf.layers.batch_normalization,
#         training = training,
#         momentum = batch_norm_momentum
#     )

#     my_dense_layer = partial(
#         tf.layers.dense,
#         kernel_initializer = he_init
#     )
    
#     hidden1 = my_dense_layer(X,n_hidden1,name='hidden1')
#     bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))
    
#     hidden2 = my_dense_layer(bn1,n_hidden2,name='hidden2')
#     bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))
    
#     logits_before_bn = my_dense_layer(bn2,n_outputs,name='outputs')
#     logits = my_batch_norm_layer(logits_before_bn)
    
# with tf.name_scope('loss'):
#     xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels = y,logits=logits)
#     loss = tf.reduce_mean(xentropy,name='loss')

# with tf.name_scope('train'):
#     optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
#     training_op = optimizer.minimize(loss)
    
# with tf.name_scope('eval'):
#     correct = tf.nn.in_top_k(logits,y,1)
#     accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))

# init = tf.global_variables_initializer()
# saver = tf.train.Saver()

# n_epochs = 20
# batch_size = 200

# with tf.Session() as sess:
#     init.run()

#     for epoch in range(n_epochs):
#         for iteraction in range(mnist.train.num_examples // batch_size):
#             X_batch,y_batch = mnist.train.next_batch(batch_size)
#             sess.run([training_op],feed_dict={training:True,X:X_batch,y:y_batch})
        
#         accuracy_val = accuracy.eval(feed_dict={X:mnist.test.images,y:mnist.test.labels})
#         print(epoch, 'Test accuracy:', accuracy_val)


Extracting ./mldata/train-images-idx3-ubyte.gz
Extracting ./mldata/train-labels-idx1-ubyte.gz
Extracting ./mldata/t10k-images-idx3-ubyte.gz
Extracting ./mldata/t10k-labels-idx1-ubyte.gz
0 Test accuracy: 0.7275
1 Test accuracy: 0.762
2 Test accuracy: 0.7851
3 Test accuracy: 0.8003
4 Test accuracy: 0.8213
5 Test accuracy: 0.8446
6 Test accuracy: 0.8438
7 Test accuracy: 0.8652
8 Test accuracy: 0.8791
9 Test accuracy: 0.8897
10 Test accuracy: 0.893
11 Test accuracy: 0.8897
12 Test accuracy: 0.8905
13 Test accuracy: 0.8996
14 Test accuracy: 0.8976
15 Test accuracy: 0.9018
16 Test accuracy: 0.9019
17 Test accuracy: 0.8985
18 Test accuracy: 0.905
19 Test accuracy: 0.9064


In [29]:
import tensorflow as tf
a_1 = tf.Variable(1)
b_1 = tf.Variable(2)
update_op = tf.assign(a_1, 10)
add = tf.add(a_1, b_1)

a_2 = tf.Variable(1)
b_2 = tf.Variable(2)
update_op = tf.assign(a_2, 10)
with tf.control_dependencies([update_op]):
    add_with_dependencies = tf.add(a_2, b_2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    ans_1, ans_2 = sess.run([add, add_with_dependencies])
    print("Add: ", ans_1)
    print("Add_with_dependency: ", ans_2)

Add:  3
Add_with_dependency:  12


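### Mitigation 4: gradient clipping

A popular technique for lessening the exploding gradients problem is simply to clip the gradients during backpropagation so that they never exceed some __threshold__.

In general, people prefer __batch normalization__, but it is still useful to know about __gradient clipping__ and how to implement it.

In TensorFlow, the optimizer's `minimize()` function takes care of both computing the gradients and applying them, so you must instead call the optimizer's `compute_gradients()` method first, then create an operation that clips the gradients using the `clip_by_value()` function, and finally create an operation that applies the clipped gradients using the optimizer's `apply_gradients()` method: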


In [None]:
# threshold = 1.0

# optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
# grads_and_vars = optimizer.compute_gradients(loss)
# capped_gvs = [(tf.clip_by_value(grad, -threshold, threshold), var) for grad, var in grads_and_vars]
# training_op = optimizer.apply_gradients(capped_gvs)

As usual, you run this `training_op` at every training step. It computes the gradients, clips them between -1.0 and 1.0, and applies them. `threshold` is a hyperparameter you can tune.
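What element-wise clipping does to a list of gradient values can be shown without TensorFlow. The sketch below is a plain-Python stand-in (the helper `clip_by_value` defined here only mimics the behaviour on Python lists, and the gradient values are made up).

```python
def clip_by_value(values, low, high):
    """Clamp every element of values into the interval [low, high]."""
    return [min(max(v, low), high) for v in values]


threshold = 1.0
gradients = [-3.0, 0.5, 2.0]  # made-up gradient components
clipped = clip_by_value(gradients, -threshold, threshold)

print(clipped)
```

Values already inside the interval pass through unchanged, while larger ones are clamped to the threshold. Because each component is clipped independently, the direction of the gradient vector can change.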


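### Reusing pretrained layers


It is generally not a good idea to train a very large DNN from scratch. Instead, __you should always try to find an existing neural network that accomplishes a task similar to the one you are tackling, then reuse that network's lower layers: this is called transfer learning__. It will not only speed up training considerably, but will also require much less training data.


#### For example, suppose we have the following TensorFlow model: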

In [65]:
tf.reset_default_graph()

n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 50
n_hidden3 = 50
n_hidden4 = 50
n_hidden5 = 50

n_outputs = 10


X = tf.placeholder(tf.float32,shape=(None,n_inputs),name='X')
y = tf.placeholder(tf.int32,shape=None,name='y')

with tf.name_scope('dnn'):
    hidden1 = tf.layers.dense(X,n_hidden1,activation=tf.nn.relu,name='hidden1')
    hidden2 = tf.layers.dense(hidden1,n_hidden2,activation=tf.nn.relu,name='hidden2')
    hidden3 = tf.layers.dense(hidden2,n_hidden3,activation=tf.nn.relu,name='hidden3')
    hidden4 = tf.layers.dense(hidden3,n_hidden4,activation=tf.nn.relu,name='hidden4')
    hidden5 = tf.layers.dense(hidden4,n_hidden5,activation=tf.nn.relu,name='hidden5')

    logits = tf.layers.dense(hidden5,n_outputs,name='outputs')

with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)
    loss = tf.reduce_mean(xentropy,name='loss')
    
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits,y,1)
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32),name='accuracy')

learning_rate = 0.01
threshold = 1.0

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
capped_gvs =  [(tf.clip_by_value(grad, -threshold, threshold), var)for grad, var in grads_and_vars] 
training_op = optimizer.apply_gradients(capped_gvs)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    
    init.run()
    
    for epoch in range(n_epochs):
        
        for iteraction in range(mnist.train.num_examples // batch_size):
            x_batch,y_batch = mnist.train.next_batch(batch_size)
            sess.run([training_op],feed_dict={X:x_batch,y:y_batch})
        accuracy_val = accuracy.eval(feed_dict={X:mnist.test.images,y:mnist.test.labels})
        print('accuracy_val:',accuracy_val)
    save_path = saver.save(sess,'./my_model_final.ckep')

accuracy_val: 0.8876
accuracy_val: 0.9209
accuracy_val: 0.9393
accuracy_val: 0.9506
accuracy_val: 0.9533
accuracy_val: 0.9597
accuracy_val: 0.9644
accuracy_val: 0.9642
accuracy_val: 0.9662
accuracy_val: 0.9692
accuracy_val: 0.9698
accuracy_val: 0.9625
accuracy_val: 0.9697
accuracy_val: 0.972
accuracy_val: 0.9724
accuracy_val: 0.9737
accuracy_val: 0.9712
accuracy_val: 0.972
accuracy_val: 0.9723
accuracy_val: 0.9733


Next, we reuse the first few layers of the previous model.

In [68]:
import os

In [69]:
ckpt = tf.train.get_checkpoint_state(os.path.dirname(r"./"))
ckpt

model_checkpoint_path: ".\\my_model_final.ckep"
all_model_checkpoint_paths: ".\\my_model_final.ckep"

In [73]:
tf.reset_default_graph()

n_inputs = 28 * 28
n_hidden1 = 300 # reused
n_hidden2 = 50  # reused
n_hidden3 = 50  # reused

n_hidden4 = 20  # new!
n_outputs = 10  # new!

X = tf.placeholder(tf.float32,shape=(None,n_inputs),name='X')
y = tf.placeholder(tf.int32,shape=None,name='y')

with tf.name_scope('dnn'):
    hidden1 = tf.layers.dense(X,n_hidden1,activation=tf.nn.relu,name='hidden1')
    hidden2 = tf.layers.dense(hidden1,n_hidden2,activation=tf.nn.relu,name='hidden2')
    hidden3 = tf.layers.dense(hidden2,n_hidden3,activation=tf.nn.relu,name='hidden3')
    hidden4 = tf.layers.dense(hidden3,n_hidden4,activation=tf.nn.relu,name='hidden4')

    logits = tf.layers.dense(hidden4,n_outputs,name='outputs')

with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)
    loss = tf.reduce_mean(xentropy,name='loss')

    
with tf.name_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits,y,1)
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32),name='accuracy')

[...] # build new model with the same definition as before for hidden layers 1-3

reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,scope='hidden[123]')
reuse_vars_dict = dict([(var.op.name,var) for var in reuse_vars]) 
restore_saver = tf.train.Saver(reuse_vars_dict) # to restore layers 1-3

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    
    init.run()
    restore_saver.restore(sess,'.\\my_model_final.ckep')
    
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            x_batch,y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op,feed_dict={X:x_batch,y:y_batch})
        accuracy_val = accuracy.eval(feed_dict={X:mnist.test.images,y:mnist.test.labels})
        print(epoch,'accuracy_val: ',accuracy_val)
        
    saver_path = saver.save(sess,'./my_new_model_final.ckep')

INFO:tensorflow:Restoring parameters from .\my_model_final.ckep
0 accuracy_val:  0.9543
1 accuracy_val:  0.9648
2 accuracy_val:  0.9696
3 accuracy_val:  0.9706
4 accuracy_val:  0.9725
5 accuracy_val:  0.9718
6 accuracy_val:  0.9739
7 accuracy_val:  0.9742
8 accuracy_val:  0.9733
9 accuracy_val:  0.9743
10 accuracy_val:  0.9743
11 accuracy_val:  0.9748
12 accuracy_val:  0.9738
13 accuracy_val:  0.9759
14 accuracy_val:  0.9744
15 accuracy_val:  0.9745
16 accuracy_val:  0.9754
17 accuracy_val:  0.975
18 accuracy_val:  0.9742
19 accuracy_val:  0.9747


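First we build the new model, making sure to copy the original model's hidden layers 1 to 3. We also create a node to initialize all variables. Then we get the list of variables whose scope matches the regular expression `hidden[123]`, that is, the variables of hidden layers 1 to 3 (the code above queries the `tf.GraphKeys.GLOBAL_VARIABLES` collection; these layer variables are created with `trainable=True`, which is the default).

Next, we create a dictionary that maps the name of each variable in the original model to the corresponding variable in the new model (you generally want to keep exactly the same names). Then we create a `Saver` that restores only these variables, and another `Saver` to save the entire new model, not just layers 1 to 3. We then start a session, initialize all variables in the model, and restore the variable values of layers 1 to 3 from the original model. Finally, we train the model on the new task and save it.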
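The variable selection step boils down to a regular-expression match on variable names. The sketch below reproduces that filtering in plain Python; the list of names is an assumption about how the layers in this notebook would typically be named (`<layer name>/kernel` and `<layer name>/bias`), not output captured from TensorFlow.

```python
import re

# Assumed variable names for the new model (illustrative only).
variable_names = [
    "hidden1/kernel", "hidden1/bias",
    "hidden2/kernel", "hidden2/bias",
    "hidden3/kernel", "hidden3/bias",
    "hidden4/kernel", "hidden4/bias",
    "outputs/kernel", "outputs/bias",
]

pattern = "hidden[123]"

# re.match anchors the pattern at the start of each name.
reused = [name for name in variable_names if re.match(pattern, name)]
new_only = [name for name in variable_names if not re.match(pattern, name)]

print("restored from checkpoint:", reused)
print("trained from scratch:    ", new_only)
```

Only the first group would be handed to the restoring `Saver`; the remaining variables keep the values given to them by the initializer.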




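## Faster optimizers

Training a very large deep neural network can be painfully slow.

So far we have seen four ways to speed up training (and reach a better solution):

* __1. Applying a good initialization strategy to the connection weights__


* __2. Using a good activation function__


* __3. Using batch normalization__


* __4. Reusing a pretrained network__

Another huge speed boost comes from using a faster optimizer than the regular gradient descent optimizer. This section covers the most popular ones: __momentum optimization__, __Nesterov accelerated gradient__, __AdaGrad__, __RMSProp__, and finally __Adam optimization__.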
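As a taste of how these optimizers differ from plain gradient descent, here is a minimal sketch of a single momentum update on one scalar parameter. It uses one common formulation (accumulate `beta * m + lr * grad`, then subtract the accumulator); the function names and numbers are illustrative, and TensorFlow's `MomentumOptimizer` may arrange the same idea slightly differently.

```python
def gd_step(theta, grad, lr):
    """Plain gradient descent: step against the current gradient."""
    return theta - lr * grad


def momentum_step(theta, m, grad, lr, beta=0.9):
    """Momentum optimization: the step follows an accumulated
    'velocity' m instead of the raw gradient."""
    m = beta * m + lr * grad
    return theta - m, m


# Two consecutive steps with the same gradient of 2.0.
theta_gd = gd_step(gd_step(1.0, 2.0, 0.1), 2.0, 0.1)

theta_m, m = momentum_step(1.0, 0.0, 2.0, 0.1)
theta_m, m = momentum_step(theta_m, m, 2.0, 0.1)

print(theta_gd, theta_m)
```

With a constant gradient, the momentum version moves further on the second step because the accumulator has built up, which is where the speed-up comes from.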

In [None]:
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)

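#### The conclusion of this section:

You should almost always use __Adam optimization__, so if you don't care how it works, simply replace your __GradientDescentOptimizer__ with an __AdamOptimizer__.

### Learning rate scheduling

Finding a good learning rate can be tricky. If you set it too high, training may actually diverge. If you set it too low, training will eventually converge to the optimum, but it will take a very long time.

One option is __exponential scheduling__: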

<div align=center><img width="600" height="400" src="./static/4.jpg"/></div>

In [None]:
# initial_learning_rate = 0.1
# decay_steps = 10000
# decay_rate =1/10

# gloable_step = tf.Variable(0,trainable=False,name='gloable_step')
# learning_rate = tf.train.exponential_decay(initial_learning_rate,gloable_step,decay_steps,decay_rate)

# optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,momentum=0.9)
# training_op = optimizer.minimize(loss,global_step=gloable_step)

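The code above uses TensorFlow's __exponential_decay()__ function to define an exponentially decaying learning rate (with η0 = 0.1 and r = 10,000).

Next, we create an optimizer (a __MomentumOptimizer__ in this example) using this decaying learning rate. Finally, we create the training operation by calling the optimizer's __minimize()__ method; since we pass it the __global_step__ variable, it takes care of incrementing it.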
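The schedule itself is a one-line formula: the learning rate is multiplied by `decay_rate` every `decay_steps` steps, interpolated smoothly in between. The plain-Python sketch below mirrors the documented non-staircase behaviour of `exponential_decay()` with the settings used in this notebook; the function here is a stand-in written for illustration.

```python
def exponential_decay(initial_learning_rate, global_step, decay_steps, decay_rate):
    """Learning rate after global_step training steps."""
    return initial_learning_rate * decay_rate ** (global_step / decay_steps)


initial_learning_rate = 0.1
decay_steps = 10000
decay_rate = 1 / 10

for step in (0, 5000, 10000, 20000):
    lr = exponential_decay(initial_learning_rate, step, decay_steps, decay_rate)
    print(step, lr)
```

With these settings the learning rate starts at 0.1 and shrinks by a factor of 10 every 10,000 steps.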

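#### Since AdaGrad, RMSProp, and Adam optimization automatically reduce the learning rate during training, there is no need to add an extra learning schedule. For other optimization algorithms, exponential decay or performance scheduling can considerably speed up convergence.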


In [74]:
tf.reset_default_graph()

n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 50
n_hidden3 = 50
n_hidden4 = 50
n_hidden5 = 50

n_outputs = 10


X = tf.placeholder(tf.float32,shape=(None,n_inputs),name='X')
y = tf.placeholder(tf.int32,shape=None,name='y')

with tf.name_scope('dnn'):
    hidden1 = tf.layers.dense(X,n_hidden1,activation=tf.nn.relu,name='hidden1')
    hidden2 = tf.layers.dense(hidden1,n_hidden2,activation=tf.nn.relu,name='hidden2')
    hidden3 = tf.layers.dense(hidden2,n_hidden3,activation=tf.nn.relu,name='hidden3')
    hidden4 = tf.layers.dense(hidden3,n_hidden4,activation=tf.nn.relu,name='hidden4')
    hidden5 = tf.layers.dense(hidden4,n_hidden5,activation=tf.nn.relu,name='hidden5')

    logits = tf.layers.dense(hidden5,n_outputs,name='outputs')

with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)
    loss = tf.reduce_mean(xentropy,name='loss')
    
with tf.name_scope('eval'):
    correct = tf.nn.in_top_k(logits,y,1)
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32),name='accuracy')

learning_rate = 0.01
threshold = 1.0
with tf.name_scope('train'):
    initial_learning_rate = 0.1
    decay_steps = 10000
    decay_rate =1/10

    gloable_step = tf.Variable(0,trainable=False,name='gloable_step')
    learning_rate = tf.train.exponential_decay(initial_learning_rate,gloable_step,decay_steps,decay_rate)

    optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,momentum=0.9)
    training_op = optimizer.minimize(loss,global_step=gloable_step)


init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    
    init.run()
    
    for epoch in range(n_epochs):
        
        for iteraction in range(mnist.train.num_examples // batch_size):
            x_batch,y_batch = mnist.train.next_batch(batch_size)
            sess.run([training_op],feed_dict={X:x_batch,y:y_batch})
        accuracy_val = accuracy.eval(feed_dict={X:mnist.test.images,y:mnist.test.labels})
        print('accuracy_val:',accuracy_val)
    save_path = saver.save(sess,'./my_model_final.ckep')

accuracy_val: 0.9507
accuracy_val: 0.9654
accuracy_val: 0.971
accuracy_val: 0.9729
accuracy_val: 0.9766
accuracy_val: 0.9779
accuracy_val: 0.9787
accuracy_val: 0.9792
accuracy_val: 0.9804
accuracy_val: 0.9805
accuracy_val: 0.9799
accuracy_val: 0.9804
accuracy_val: 0.9812
accuracy_val: 0.9808
accuracy_val: 0.9807
accuracy_val: 0.9807
accuracy_val: 0.9808
accuracy_val: 0.9807
accuracy_val: 0.9807
accuracy_val: 0.9807


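## Avoiding overfitting through regularization

With millions of parameters you can fit the whole zoo. This section presents some of the most popular regularization techniques for neural networks, and how to implement them with __TensorFlow__:

* __Early stopping__

* __l1 and l2 regularization__

* __Dropout__

* __Max-norm regularization__

* __Data augmentation__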

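### Early stopping

One way to implement this with TensorFlow is to evaluate the model on a validation set at regular intervals (for example, every 50 steps), and save a "winner" snapshot whenever it outperforms the previous "winner" snapshot. Count the number of steps since the last "winner" snapshot was saved, and interrupt training when this number reaches some limit (for example, 2,000 steps). Then restore the last "winner" snapshot.

Although early stopping works well in practice, you can usually get much higher performance out of your network by combining it with other regularization techniques.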
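The bookkeeping behind early stopping can be sketched in plain Python. The example below works on a made-up list of validation losses, one per evaluation; the function name and the `patience` parameter (how many evaluations without improvement are tolerated) are assumptions for illustration, and saving or restoring snapshots is left out.

```python
def early_stopping(val_losses, patience):
    """Return (index of best evaluation, best loss, index where training stops)."""
    best_loss = float("inf")
    best_check = None
    checks_since_best = 0
    stopped_at = None
    for check, loss in enumerate(val_losses):
        stopped_at = check
        if loss < best_loss:
            # New "winner": this is where a snapshot would be saved.
            best_loss = loss
            best_check = check
            checks_since_best = 0
        else:
            checks_since_best += 1
            if checks_since_best >= patience:
                break  # no improvement for too long: interrupt training
    return best_check, best_loss, stopped_at


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]  # made-up validation losses
best_check, best_loss, stopped_at = early_stopping(losses, patience=3)
print(best_check, best_loss, stopped_at)
```

In this run training stops before the last evaluation is ever reached, which shows the trade-off: a small patience saves time but can miss a later improvement.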


### L1 and L2 regularization

You can use l1 and l2 regularization to constrain a neural network's connection weights (but typically not its biases).

In [83]:
tf.reset_default_graph()

In [84]:
n_inputs = 28 * 28  #  MNIST
n_hidden1 = 300
n_hidden2 = 50
n_outputs = 10
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

In [85]:
scale = 0.01

In [86]:
my_dense_layer = partial(
    tf.layers.dense,activation=tf.nn.relu,kernel_regularizer=tf.contrib.layers.l1_regularizer(scale)
)

with tf.name_scope('dnn'):
    hidden1 = my_dense_layer(X,n_hidden1,name='hidden1')
    hidden2 = my_dense_layer(hidden1,n_hidden2,name='hidden2')    
    logits = my_dense_layer(hidden2,n_outputs,name='outputs',activation=None)    

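This code creates a neural network with two hidden layers and one output layer, and it also creates nodes in the graph to compute the l1 regularization loss corresponding to each layer's weights.

TensorFlow automatically adds these nodes to a special collection containing all the regularization losses.

You just need to add these regularization losses to your overall loss: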

In [87]:
with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,logits=logits)
    base_loss = tf.reduce_mean(xentropy,name = 'avg_xentropy')
    reg_loss = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    loss = tf.add_n([base_loss] + reg_loss,name='loss')
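The arithmetic behind this loss is simple enough to check by hand. The plain-Python sketch below assumes the l1 penalty of a layer is `scale` times the sum of the absolute values of its weights; the helper names and the toy weights are made up. The second calculation reuses the first `base_loss` and `reg_loss` values printed in the training log of this notebook.

```python
def l1_penalty(weights, scale):
    """l1 regularization loss for one layer: scale * sum(|w|)."""
    return scale * sum(abs(w) for w in weights)


def total_loss(base_loss, reg_losses):
    """Overall loss: data loss plus every per-layer regularization loss."""
    return base_loss + sum(reg_losses)


# Toy layer with three weights (made-up values).
toy_penalty = l1_penalty([0.5, -1.5, 2.0], scale=0.01)

# First logged iteration: base_loss and the three per-layer reg_loss values.
first_logged = total_loss(2.0395257, [87.28086, 9.8397045, 0.76513356])

print(toy_penalty, first_logged)
```

The second result agrees with the logged `loss: 99.92523` up to rounding, and it shows that early in training the penalty terms dominate the cross-entropy term.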

In [88]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name="accuracy")
learning_rate = 0.01
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)
    
init = tf.global_variables_initializer()
saver = tf.train.Saver()

#### Inspecting base_loss, reg_loss, and loss

In [117]:
# n_epochs = 2
# batch_size = 2
# with tf.Session() as sess:
#     init.run()
#     for epoch in range(n_epochs):
#         for iteration in range(mnist.train.num_examples // batch_size):
#             X_batch, y_batch = mnist.train.next_batch(batch_size)
#             sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            
#             print('base_loss',sess.run(base_loss, feed_dict={X: X_batch, y: y_batch}))
#             print('reg_loss',sess.run(reg_loss))
#             print('loss:',sess.run(loss, feed_dict={X: X_batch, y: y_batch}))


#         accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,
#                                                 y: mnist.test.labels})
#         print(epoch, "Test accuracy:", accuracy_val)


base_loss 2.0395257
reg_loss [87.28086, 9.8397045, 0.76513356]
loss: 99.92523
base_loss 1.9679055
reg_loss [87.04675, 9.824848, 0.7644928]
loss: 99.604004
base_loss 1.5994551
reg_loss [86.813736, 9.810329, 0.7641037]
loss: 98.98763
base_loss 1.7554479
reg_loss [86.58022, 9.795431, 0.76366484]
loss: 98.89477
base_loss 1.5769806
reg_loss [86.34713, 9.78064, 0.76318014]
loss: 98.46793
base_loss 1.890858
reg_loss [86.11391, 9.765747, 0.7626589]
loss: 98.53317
base_loss 1.5533714
reg_loss [85.88197, 9.751421, 0.7623978]
loss: 97.949165
base_loss 2.174447
reg_loss [85.64851, 9.736156, 0.7617344]
loss: 98.32085
base_loss 2.086952
reg_loss [85.41607, 9.721332, 0.7612041]
loss: 97.98556
base_loss 1.9674599
reg_loss [85.18447, 9.706643, 0.7607554]
loss: 97.61933
base_loss 1.8041854
reg_loss [84.95316, 9.692038, 0.7603098]
loss: 97.209694
base_loss 2.1641507
reg_loss [84.72172, 9.676592, 0.7596122]
loss: 97.322075
base_loss 1.9816387
reg_loss [84.490944, 9.661867, 0.7591539]
loss: 96.8936
base_lo

base_loss 1.8778477
reg_loss [63.234467, 8.264755, 0.71789557]
loss: 74.09496
base_loss 1.5407398
reg_loss [63.033348, 8.251349, 0.71772176]
loss: 73.54316
base_loss 1.78876
reg_loss [62.837227, 8.238059, 0.71733105]
loss: 73.581375
base_loss 1.5401347
reg_loss [62.640957, 8.224937, 0.71706086]
loss: 73.12309
base_loss 1.7979263
reg_loss [62.442364, 8.211532, 0.71661276]
loss: 73.168434
base_loss 1.4467611
reg_loss [62.245735, 8.198847, 0.71641266]
loss: 72.60776
base_loss 1.6996384
reg_loss [62.04963, 8.185421, 0.7159836]
loss: 72.65067
base_loss 1.6148676
reg_loss [61.854607, 8.172226, 0.7156759]
loss: 72.357376
base_loss 1.5372995
reg_loss [61.656727, 8.159006, 0.7154152]
loss: 72.06845
base_loss 2.026428
reg_loss [61.45916, 8.144889, 0.715031]
loss: 72.34551
base_loss 1.7039036
reg_loss [61.26361, 8.131602, 0.7147338]
loss: 71.81385
base_loss 1.8413584
reg_loss [61.06515, 8.118069, 0.7144553]
loss: 71.73904
base_loss 1.5661857
reg_loss [60.873962, 8.104946, 0.7140274]
loss: 71.2591

base_loss 1.9265676
reg_loss [40.877586, 6.695016, 0.6847508]
loss: 50.183918
base_loss 1.4810266
reg_loss [40.718178, 6.6835217, 0.68453026]
loss: 49.567253
base_loss 1.533439
reg_loss [40.559494, 6.6719184, 0.68428564]
loss: 49.449135
base_loss 1.3547475
reg_loss [40.402264, 6.660348, 0.68409336]
loss: 49.10145
base_loss 0.99241495
reg_loss [40.247246, 6.6490245, 0.68397033]
loss: 48.57266
base_loss 1.2234796
reg_loss [40.083923, 6.6370907, 0.6838158]
loss: 48.62831
base_loss 0.5786583
reg_loss [39.92995, 6.6261225, 0.68379897]
loss: 47.81853
base_loss 1.301538
reg_loss [39.77377, 6.6150427, 0.6836208]
loss: 48.37397
base_loss 1.529404
reg_loss [39.61509, 6.6031027, 0.68342614]
loss: 48.431023
base_loss 1.1555587
reg_loss [39.45929, 6.5913305, 0.6832816]
loss: 47.88946
base_loss 1.5562452
reg_loss [39.307068, 6.579856, 0.683287]
loss: 48.126457
base_loss 1.0889202
reg_loss [39.14976, 6.567872, 0.682953]
loss: 47.489506
base_loss 0.9611114
reg_loss [38.99538, 6.556666, 0.6826821]
loss

base_loss 1.3803706
reg_loss [18.985016, 4.9248796, 0.6538212]
loss: 25.944088
base_loss 1.1639845
reg_loss [18.885237, 4.9157186, 0.65387714]
loss: 25.618816
base_loss 1.5805942
reg_loss [18.786331, 4.907072, 0.65362555]
loss: 25.927624
base_loss 1.5145345
reg_loss [18.684607, 4.89793, 0.6535637]
loss: 25.750635
base_loss 1.7860909
reg_loss [18.584171, 4.8869023, 0.6531621]
loss: 25.910326
base_loss 1.2528584
reg_loss [18.467644, 4.876505, 0.6530301]
loss: 25.250038
base_loss 1.6540859
reg_loss [18.35514, 4.8661246, 0.6529147]
loss: 25.528265
base_loss 1.4359901
reg_loss [18.246748, 4.8565226, 0.652815]
loss: 25.192076
base_loss 1.4779046
reg_loss [18.13815, 4.846445, 0.6527037]
loss: 25.115204
base_loss 1.3662028
reg_loss [18.035997, 4.8368735, 0.6525932]
loss: 24.891668
base_loss 1.498264
reg_loss [17.928102, 4.826585, 0.65241224]
loss: 24.905365
base_loss 1.126251
reg_loss [17.827549, 4.8170247, 0.6523669]
loss: 24.423191
base_loss 1.190973
reg_loss [17.724438, 4.80797, 0.6524318]


base_loss 1.7955225
reg_loss [8.418856, 3.8251708, 0.6350934]
loss: 14.674643
base_loss 1.3575146
reg_loss [8.347444, 3.8163095, 0.6350903]
loss: 14.156357
base_loss 1.6069613
reg_loss [8.271088, 3.8075294, 0.63496727]
loss: 14.320545
base_loss 1.4952552
reg_loss [8.196586, 3.798268, 0.63460416]
loss: 14.124714
base_loss 0.89845777
reg_loss [8.13836, 3.7906947, 0.6344931]
loss: 13.462006
base_loss 0.8526445
reg_loss [8.068884, 3.782728, 0.6342827]
loss: 13.33854
base_loss 1.5826192
reg_loss [8.005368, 3.7741416, 0.63410485]
loss: 13.996234
base_loss 1.464217
reg_loss [7.952378, 3.7669525, 0.63394195]
loss: 13.817489
base_loss 1.8113596
reg_loss [7.8747363, 3.7570512, 0.63366944]
loss: 14.0768175
base_loss 1.5367124
reg_loss [7.8027735, 3.7478387, 0.63337636]
loss: 13.720701
base_loss 0.71357846
reg_loss [7.761671, 3.7425337, 0.6333013]
loss: 12.851085
base_loss 1.6236383
reg_loss [7.695455, 3.734332, 0.6331821]
loss: 13.686608
base_loss 1.4033254
reg_loss [7.6336117, 3.7258215, 0.63316

reg_loss [2.4973006, 2.8835247, 0.61307704]
loss: 7.723775
base_loss 1.648468
reg_loss [2.459637, 2.8762562, 0.61289513]
loss: 7.5972557
base_loss 1.416191
reg_loss [2.4349582, 2.8706806, 0.61285895]
loss: 7.334688
base_loss 1.5021077
reg_loss [2.4113202, 2.8629782, 0.6126]
loss: 7.389006
base_loss 1.5116636
reg_loss [2.3753493, 2.8557665, 0.61252415]
loss: 7.355304
base_loss 1.7364038
reg_loss [2.3453267, 2.848792, 0.6122492]
loss: 7.542772
base_loss 1.8577988
reg_loss [2.3172388, 2.8420384, 0.61192214]
loss: 7.628998
base_loss 1.6043193
reg_loss [2.291589, 2.834925, 0.61179507]
loss: 7.342628
base_loss 1.59998
reg_loss [2.2677698, 2.8287952, 0.6117376]
loss: 7.3082824
base_loss 1.7514179
reg_loss [2.2307482, 2.8218548, 0.6114977]
loss: 7.4155188
base_loss 1.8149651
reg_loss [2.199878, 2.8136263, 0.61119306]
loss: 7.4396625
base_loss 1.7077014
reg_loss [2.163847, 2.8055797, 0.6108653]
loss: 7.287993
base_loss 1.5842727
reg_loss [2.1355417, 2.7986364, 0.61053455]
loss: 7.1289854
base_l

base_loss 1.7814515
reg_loss [0.70611113, 2.0443027, 0.58260494]
loss: 5.11447
base_loss 1.6620114
reg_loss [0.7061163, 2.0377884, 0.58235687]
loss: 4.988273
base_loss 1.80807
reg_loss [0.68895185, 2.0321002, 0.5821458]
loss: 5.1112676
base_loss 1.9392786
reg_loss [0.679209, 2.0259933, 0.58194214]
loss: 5.226423
base_loss 1.9734068
reg_loss [0.67037266, 2.0199883, 0.58159786]
loss: 5.2453656
base_loss 2.029548
reg_loss [0.67973113, 2.0135808, 0.581201]
loss: 5.304061
base_loss 1.6600838
reg_loss [0.66675854, 2.007777, 0.5809506]
loss: 4.9155703
base_loss 1.3656677
reg_loss [0.666843, 2.003279, 0.5807158]
loss: 4.616505
base_loss 1.6967638
reg_loss [0.6754108, 1.997326, 0.5804032]
loss: 4.949904
base_loss 2.0295918
reg_loss [0.6682767, 1.9908345, 0.5801109]
loss: 5.268814
base_loss 2.13211
reg_loss [0.6556313, 1.9843109, 0.57981044]
loss: 5.351863
base_loss 1.9895463
reg_loss [0.663643, 1.9784107, 0.57946056]
loss: 5.2110605
base_loss 2.037384
reg_loss [0.64844, 1.9717671, 0.5791589]
lo

base_loss 1.8617942
reg_loss [0.46835485, 1.4112872, 0.55035603]
loss: 4.2917924
base_loss 1.8390567
reg_loss [0.46959046, 1.4065677, 0.5501184]
loss: 4.265333
base_loss 2.1733832
reg_loss [0.45291683, 1.4011761, 0.5497622]
loss: 4.5772386
base_loss 1.7583363
reg_loss [0.47048774, 1.3972329, 0.5495537]
loss: 4.175611
base_loss 2.005714
reg_loss [0.46317905, 1.3920366, 0.5492306]
loss: 4.41016
base_loss 1.796937
reg_loss [0.46278498, 1.3876004, 0.5489631]
loss: 4.1962857
base_loss 2.0097563
reg_loss [0.45414972, 1.382428, 0.5487088]
loss: 4.395043
base_loss 2.0228143
reg_loss [0.45470655, 1.3782294, 0.54849046]
loss: 4.4042406
base_loss 2.001845
reg_loss [0.44944903, 1.3732806, 0.5482421]
loss: 4.3728166
base_loss 2.0245628
reg_loss [0.45153823, 1.3690612, 0.5480142]
loss: 4.3931766
base_loss 1.8533187
reg_loss [0.4461482, 1.3647678, 0.5477746]
loss: 4.2120094
base_loss 1.9510176
reg_loss [0.44952786, 1.3602859, 0.5474066]
loss: 4.308238
base_loss 1.9348881
reg_loss [0.44669494, 1.35573

base_loss 1.9686639
reg_loss [0.3573776, 0.8900983, 0.51665384]
loss: 3.7327936
base_loss 1.9098103
reg_loss [0.3480648, 0.8858684, 0.51629406]
loss: 3.6600375
base_loss 1.762641
reg_loss [0.37307528, 0.88264203, 0.51605093]
loss: 3.534409
base_loss 2.0565205
reg_loss [0.36065355, 0.8786635, 0.515738]
loss: 3.8115757
base_loss 1.984791
reg_loss [0.36235237, 0.8758531, 0.51547444]
loss: 3.738471
base_loss 2.0146143
reg_loss [0.36097035, 0.87213224, 0.5153179]
loss: 3.7630348
base_loss 2.1139634
reg_loss [0.35014144, 0.8682214, 0.51499826]
loss: 3.8473246
base_loss 1.9591115
reg_loss [0.34832576, 0.8646901, 0.51466566]
loss: 3.6867929
base_loss 1.8681984
reg_loss [0.34933376, 0.8611768, 0.5144122]
loss: 3.593121
base_loss 1.9836628
reg_loss [0.35725442, 0.85780346, 0.51410306]
loss: 3.7128239
base_loss 2.0176668
reg_loss [0.35953388, 0.8548434, 0.513862]
loss: 3.7459059
base_loss 1.8977892
reg_loss [0.34913316, 0.85128236, 0.51361334]
loss: 3.6118183
base_loss 2.1402256
reg_loss [0.35335

base_loss 1.7599101
reg_loss [0.33065158, 0.5493595, 0.4867298]
loss: 3.126651
base_loss 2.2773633
reg_loss [0.33349204, 0.5470913, 0.48642775]
loss: 3.6443744
base_loss 2.3193207
reg_loss [0.3291806, 0.5441349, 0.486158]
loss: 3.6787944
base_loss 1.8973215
reg_loss [0.33291277, 0.542152, 0.48597842]
loss: 3.2583644
base_loss 2.0413723
reg_loss [0.32881796, 0.5392737, 0.4856249]
loss: 3.3950887
base_loss 2.0705392
reg_loss [0.3456171, 0.537499, 0.48530498]
loss: 3.4389603
base_loss 1.7426848
reg_loss [0.33983326, 0.5350218, 0.48508117]
loss: 3.102621
base_loss 2.0701466
reg_loss [0.33925852, 0.53274816, 0.48474875]
loss: 3.426902
base_loss 2.207604
reg_loss [0.33034357, 0.5300091, 0.4845163]
loss: 3.5524728
base_loss 1.8923128
reg_loss [0.33325514, 0.5280846, 0.48425362]
loss: 3.237906
base_loss 1.89166
reg_loss [0.33682433, 0.52612305, 0.48409975]
loss: 3.238707
base_loss 2.0707304
reg_loss [0.32852966, 0.5236075, 0.4838733]
loss: 3.406741
base_loss 2.1288044
reg_loss [0.3244254, 0.52

loss: 3.0400436
base_loss 1.6544299
reg_loss [0.31778935, 0.33666137, 0.45895034]
loss: 2.7678308
base_loss 1.7090814
reg_loss [0.32212707, 0.33590347, 0.45886278]
loss: 2.8259747
base_loss 1.905583
reg_loss [0.32775787, 0.33471385, 0.45856613]
loss: 3.026621
base_loss 2.153987
reg_loss [0.32424492, 0.333628, 0.45839283]
loss: 3.2702527
base_loss 1.9076028
reg_loss [0.32552445, 0.3324398, 0.45811808]
loss: 3.023685
base_loss 2.0871906
reg_loss [0.32034302, 0.3311077, 0.4579223]
loss: 3.1965635
base_loss 1.6598158
reg_loss [0.3209545, 0.32995352, 0.4576685]
loss: 2.7683923
base_loss 2.21213
reg_loss [0.32065582, 0.3288967, 0.45746306]
loss: 3.3191457
base_loss 2.1517923
reg_loss [0.31563592, 0.32731384, 0.45715666]
loss: 3.2518988
base_loss 2.1958601
reg_loss [0.31641325, 0.32626647, 0.45685843]
loss: 3.2953982
base_loss 1.704207
reg_loss [0.3142688, 0.32518622, 0.4567408]
loss: 2.8004029
base_loss 2.2624369
reg_loss [0.3122383, 0.3241763, 0.45650294]
loss: 3.3553543
base_loss 2.0782304

loss: 3.1434536
base_loss 0.9579513
reg_loss [0.32683304, 0.24846978, 0.43385613]
loss: 1.9671103
base_loss 1.729397
reg_loss [0.3297641, 0.24849124, 0.43374497]
loss: 2.7413974
base_loss 1.4210933
reg_loss [0.33206242, 0.2480907, 0.43367794]
loss: 2.4349244
base_loss 1.9682055
reg_loss [0.32939056, 0.24775396, 0.43352938]
loss: 2.9788792
base_loss 2.2008247
reg_loss [0.32661042, 0.24690144, 0.43330714]
loss: 3.2076437
base_loss 2.0638804
reg_loss [0.32474372, 0.24645251, 0.4331002]
loss: 3.068177
base_loss 1.6954825
reg_loss [0.32554358, 0.24614757, 0.43293563]
loss: 2.7001095
base_loss 2.2985485
reg_loss [0.3238002, 0.24528772, 0.4325587]
loss: 3.3001952
base_loss 1.5663786
reg_loss [0.32327005, 0.24511866, 0.4324507]
loss: 2.567218
base_loss 1.950624
reg_loss [0.32546306, 0.24458514, 0.43213716]
loss: 2.9528093
base_loss 1.394648
reg_loss [0.32520524, 0.2446183, 0.4321774]
loss: 2.396649
base_loss 1.4469049
reg_loss [0.32545683, 0.24460317, 0.43198332]
loss: 2.4489481
base_loss 2.09

loss: 2.3552828
base_loss 1.8263577
reg_loss [0.33524662, 0.20746383, 0.4112601]
loss: 2.7803283
base_loss 1.6712022
reg_loss [0.33791885, 0.20712662, 0.41100705]
loss: 2.6272545
base_loss 1.4114969
reg_loss [0.34110725, 0.20725204, 0.41089004]
loss: 2.3707461
base_loss 2.0602913
reg_loss [0.34008983, 0.20710295, 0.4107831]
loss: 3.0182672
base_loss 2.3367105
reg_loss [0.33846572, 0.20687275, 0.4103284]
loss: 3.2923772
base_loss 1.3870734
reg_loss [0.339999, 0.2064233, 0.4101405]
loss: 2.343636
base_loss 1.416247
reg_loss [0.34101447, 0.20638908, 0.41004565]
loss: 2.3736963
base_loss 1.3758559
reg_loss [0.34059864, 0.20641606, 0.40993774]
loss: 2.3328083
base_loss 2.1776242
reg_loss [0.33990452, 0.2060234, 0.40966165]
loss: 3.133214
base_loss 1.15214
reg_loss [0.34036806, 0.2060831, 0.4095788]
loss: 2.10817
base_loss 1.761204
reg_loss [0.3432889, 0.20681034, 0.4095635]
loss: 2.7208667
base_loss 2.2593303
reg_loss [0.34225845, 0.20681109, 0.40937683]
loss: 3.2177768
base_loss 2.162788

reg_loss [0.35329902, 0.19400232, 0.39501318]
loss: 2.970405
base_loss 1.7593333
reg_loss [0.3517391, 0.1939, 0.3947194]
loss: 2.6996918
base_loss 0.9889354
reg_loss [0.3548541, 0.19366932, 0.39457333]
loss: 1.9320322
base_loss 2.283887
reg_loss [0.35056308, 0.1934384, 0.3943847]
loss: 3.2222729
base_loss 1.5369592
reg_loss [0.3505948, 0.19355707, 0.39422703]
loss: 2.475338
base_loss 2.0211608
reg_loss [0.35024396, 0.19366683, 0.39402664]
loss: 2.9590983
base_loss 1.2618403
reg_loss [0.3502069, 0.19349544, 0.39384863]
loss: 2.1993911
base_loss 1.7803459
reg_loss [0.34899458, 0.193339, 0.3937343]
loss: 2.7164137
base_loss 1.4135878
reg_loss [0.3534302, 0.19313768, 0.39350674]
loss: 2.3536625
base_loss 2.1944637
reg_loss [0.35252392, 0.19397388, 0.393354]
loss: 3.1343153
base_loss 1.791561
reg_loss [0.35085943, 0.19360848, 0.39339378]
loss: 2.7294228
base_loss 1.2674503
reg_loss [0.35897407, 0.19341138, 0.39319]
loss: 2.2130258
base_loss 1.3409443
reg_loss [0.3565235, 0.193423, 0.3930589

reg_loss [0.3645513, 0.1861456, 0.3790245]
loss: 2.4537442
base_loss 1.4959193
reg_loss [0.36781025, 0.18667965, 0.3790437]
loss: 2.429453
base_loss 1.3689234
reg_loss [0.36614463, 0.18633679, 0.3789304]
loss: 2.3003352
base_loss 1.9459069
reg_loss [0.36491928, 0.18590473, 0.37890056]
loss: 2.8756313
base_loss 1.6341376
reg_loss [0.36548054, 0.1860502, 0.3787948]
loss: 2.5644631
base_loss 1.6423886
reg_loss [0.3673555, 0.18635425, 0.37875292]
loss: 2.5748513
base_loss 1.6582131
reg_loss [0.36585432, 0.18631946, 0.37865904]
loss: 2.5890458
base_loss 1.2723086
reg_loss [0.36551705, 0.18586072, 0.3784758]
loss: 2.2021623
base_loss 1.596724
reg_loss [0.36628103, 0.18607046, 0.3786057]
loss: 2.527681
base_loss 1.1047395
reg_loss [0.36839688, 0.18641782, 0.3785936]
loss: 2.038148
base_loss 1.332487
reg_loss [0.37030664, 0.18653876, 0.37853602]
loss: 2.2678683
base_loss 1.5132163
reg_loss [0.36826736, 0.18609616, 0.37829146]
loss: 2.445871
base_loss 1.1075919
reg_loss [0.37081355, 0.18642026,

base_loss 2.0398765
reg_loss [0.3749974, 0.18375738, 0.3655464]
loss: 2.9641776
base_loss 1.4575849
reg_loss [0.37478665, 0.18371752, 0.36535963]
loss: 2.3814485
base_loss 1.3489977
reg_loss [0.3767385, 0.1840931, 0.36557522]
loss: 2.2754045
base_loss 1.6910917
reg_loss [0.3767959, 0.18414932, 0.36576]
loss: 2.617797
base_loss 1.7296374
reg_loss [0.37686482, 0.18388402, 0.36548102]
loss: 2.655867
base_loss 1.523117
reg_loss [0.37549788, 0.18432347, 0.36592707]
loss: 2.4488654
base_loss 0.41178098
reg_loss [0.37806654, 0.18413772, 0.3655918]
loss: 1.339577
base_loss 1.6747043
reg_loss [0.37927645, 0.184299, 0.3656646]
loss: 2.6039443
base_loss 1.8126072
reg_loss [0.37962207, 0.18422528, 0.36574358]
loss: 2.7421982
base_loss 1.2798444
reg_loss [0.37722138, 0.18397436, 0.3652663]
loss: 2.2063065
base_loss 1.3525628
reg_loss [0.38389668, 0.18405218, 0.36479273]
loss: 2.2853045
base_loss 1.0392581
reg_loss [0.38508785, 0.18406269, 0.36458865]
loss: 1.9729973
base_loss 1.8532599
reg_loss [0.

base_loss 0.98092926
reg_loss [0.38104928, 0.18181832, 0.35027686]
loss: 1.8940737
base_loss 1.6590008
reg_loss [0.3816262, 0.18209264, 0.35021457]
loss: 2.5729342
base_loss 1.9667052
reg_loss [0.3788932, 0.18175554, 0.3502051]
loss: 2.8775592
base_loss 1.441305
reg_loss [0.3790169, 0.18197022, 0.3503794]
loss: 2.3526716
base_loss 0.8680544
reg_loss [0.37921873, 0.18192016, 0.3500672]
loss: 1.7792604
base_loss 1.3061304
reg_loss [0.37993667, 0.18159378, 0.3498418]
loss: 2.2175026
base_loss 1.4621326
reg_loss [0.3826974, 0.18151645, 0.34930274]
loss: 2.3756492
base_loss 1.3186347
reg_loss [0.3809008, 0.18133335, 0.34916538]
loss: 2.2300344
base_loss 1.3352926
reg_loss [0.38403025, 0.18198463, 0.3493774]
loss: 2.2506847
base_loss 0.8343032
reg_loss [0.38478142, 0.18169777, 0.34926483]
loss: 1.7500472
base_loss 0.70168173
reg_loss [0.38409737, 0.1816168, 0.34919658]
loss: 1.6165924
base_loss 2.115343
reg_loss [0.38137493, 0.1814471, 0.34925136]
loss: 3.0274162
base_loss 0.57591933

base_loss 1.803077
reg_loss [0.38015106, 0.18074428, 0.33483958]
loss: 2.6988118
base_loss 1.2263787
reg_loss [0.3797174, 0.18071008, 0.3345634]
loss: 2.1213696
base_loss 0.8779677
reg_loss [0.3855503, 0.18049397, 0.3346351]
loss: 1.7786471
base_loss 1.3874805
reg_loss [0.38076633, 0.18090224, 0.33489028]
loss: 2.2840395
base_loss 1.7312119
reg_loss [0.3809835, 0.18077844, 0.3350889]
loss: 2.628063
base_loss 2.3667748
reg_loss [0.37993068, 0.18078336, 0.33467257]
loss: 3.2621613
base_loss 1.5160639
reg_loss [0.3826234, 0.18046686, 0.33424827]
loss: 2.4134026
base_loss 1.5677274
reg_loss [0.37948188, 0.1799029, 0.33395854]
loss: 2.4610708
base_loss 1.4700108
reg_loss [0.37817785, 0.17962818, 0.3332286]
loss: 2.3610454
base_loss 0.72626114
reg_loss [0.38116106, 0.17950693, 0.33334243]
loss: 1.6202716
base_loss 1.2563376
reg_loss [0.38478965, 0.17953783, 0.33328673]
loss: 2.153952
base_loss 1.7953141
reg_loss [0.38197726, 0.17948554, 0.3332149]
loss: 2.689992
base_loss 1.1881349

loss: 2.882581
base_loss 1.1982613
reg_loss [0.38388515, 0.18103859, 0.3244151]
loss: 2.0876002
base_loss 1.3768959
reg_loss [0.38397613, 0.18106028, 0.32459784]
loss: 2.26653
base_loss 1.5986573
reg_loss [0.38378674, 0.18137334, 0.32466567]
loss: 2.488483
base_loss 0.30212468
reg_loss [0.38575432, 0.18143561, 0.32432586]
loss: 1.1936405
base_loss 1.2816466
reg_loss [0.3856978, 0.18101957, 0.32380146]
loss: 2.1721654
base_loss 1.406212
reg_loss [0.38405418, 0.18096082, 0.32410026]
loss: 2.2953272
base_loss 1.2454326
reg_loss [0.38256398, 0.18102443, 0.32417595]
loss: 2.1331968
base_loss 1.3363091
reg_loss [0.38189703, 0.18081306, 0.32419482]
loss: 2.2232141
base_loss 0.88742065
reg_loss [0.3815841, 0.1808642, 0.3240684]
loss: 1.7739375
base_loss 2.3448088
reg_loss [0.38374496, 0.18089445, 0.32374117]
loss: 3.2331893
base_loss 1.4182469
reg_loss [0.3842305, 0.18084237, 0.3233806]
loss: 2.3067002
base_loss 1.3496336
reg_loss [0.38357487, 0.1807499, 0.32363048]
loss: 2.237589
base_loss 1.

base_loss 0.9449494
reg_loss [0.39417258, 0.18051635, 0.31480333]
loss: 1.8344417
base_loss 1.2569556
reg_loss [0.39345583, 0.1805508, 0.31509385]
loss: 2.1460562
base_loss 0.879997
reg_loss [0.39481577, 0.18092532, 0.3150567]
loss: 1.7707949
base_loss 0.98804635
reg_loss [0.39332867, 0.18071522, 0.31526184]
loss: 1.8773521
base_loss 1.8440512
reg_loss [0.39135417, 0.18034182, 0.3151944]
loss: 2.7309415
base_loss 1.3900168
reg_loss [0.38980114, 0.1807243, 0.31501687]
loss: 2.275559
base_loss 1.1711941
reg_loss [0.3890892, 0.18060496, 0.3152497]
loss: 2.056138
base_loss 1.17126
reg_loss [0.38882914, 0.18060239, 0.31494617]
loss: 2.0556378
base_loss 1.214833
reg_loss [0.38853186, 0.18058641, 0.3144891]
loss: 2.0984404
base_loss 1.792995
reg_loss [0.38813272, 0.1805204, 0.31457934]
loss: 2.6762273
base_loss 0.59884465
reg_loss [0.38941345, 0.1807395, 0.31447238]
loss: 1.48347
base_loss 0.5997014
reg_loss [0.38762993, 0.1807805, 0.31448036]
loss: 1.4825921
base_loss 0.72571975
reg_loss [0.

reg_loss [0.3930711, 0.1802503, 0.30282858]
loss: 3.2624173
base_loss 1.6310796
reg_loss [0.3976843, 0.18025266, 0.3025874]
loss: 2.5116038
base_loss 1.7122004
reg_loss [0.3960881, 0.18003172, 0.30238554]
loss: 2.5907059
base_loss 0.6962788
reg_loss [0.39401573, 0.18018307, 0.30212668]
loss: 1.5726043
base_loss 1.2462947
reg_loss [0.3959971, 0.18048574, 0.30199274]
loss: 2.1247702
base_loss 1.3393598
reg_loss [0.3944568, 0.18061678, 0.3021522]
loss: 2.2165856
base_loss 0.9865973
reg_loss [0.3926655, 0.18063286, 0.30211702]
loss: 1.8620126
base_loss 1.0289125
reg_loss [0.38992384, 0.18033254, 0.30207053]
loss: 1.9012394
base_loss 1.5711694
reg_loss [0.3894647, 0.18002526, 0.30180594]
loss: 2.4424653
base_loss 1.5129032
reg_loss [0.38732833, 0.18003029, 0.30251712]
loss: 2.3827791
base_loss 1.235535
reg_loss [0.38873202, 0.1804981, 0.30331576]
loss: 2.1080809
base_loss 0.9722911
reg_loss [0.3879169, 0.18031068, 0.3029445]
loss: 1.8434633
base_loss 1.2573968
reg_loss [0.39024875, 0.180297

loss: 2.0333855
base_loss 0.8469788
reg_loss [0.38622576, 0.17968808, 0.29541472]
loss: 1.7083074
base_loss 1.5048709
reg_loss [0.39155123, 0.17983702, 0.29526183]
loss: 2.371521
base_loss 1.0739083
reg_loss [0.39335886, 0.1797533, 0.29492256]
loss: 1.941943
base_loss 0.8505856
reg_loss [0.3926787, 0.17990263, 0.29544514]
loss: 1.7186122
base_loss 1.48287
reg_loss [0.3879786, 0.17942886, 0.29506466]
loss: 2.3453422
base_loss 1.1681565
reg_loss [0.3875354, 0.17935121, 0.2950039]
loss: 2.030047
base_loss 0.6728985
reg_loss [0.38679913, 0.17947635, 0.29480758]
loss: 1.5339816
base_loss 2.0213356
reg_loss [0.38647237, 0.17918952, 0.29505414]
loss: 2.8820517
base_loss 1.8841853
reg_loss [0.3862357, 0.17897803, 0.2944121]
loss: 2.7438111
base_loss 0.78968406
reg_loss [0.39602518, 0.1792307, 0.29392093]
loss: 1.6588609
base_loss 0.78872615
reg_loss [0.39633128, 0.17944781, 0.29362875]
loss: 1.658134
base_loss 0.65916014
reg_loss [0.39584076, 0.179478, 0.29335138]
loss: 1.5278304
base_loss 1.4

loss: 1.0910431
base_loss 2.0094194
reg_loss [0.3994789, 0.17785394, 0.28441727]
loss: 2.8711696
base_loss 1.7288768
reg_loss [0.39670584, 0.17733364, 0.28441575]
loss: 2.587332
base_loss 0.88352644
reg_loss [0.3936976, 0.17733616, 0.28462237]
loss: 1.7391827
base_loss 0.9748641
reg_loss [0.39546913, 0.17730424, 0.28451085]
loss: 1.8321483
base_loss 1.3664998
reg_loss [0.3929632, 0.17724116, 0.28484008]
loss: 2.2215443
base_loss 1.3876325
reg_loss [0.39043102, 0.17703043, 0.2853329]
loss: 2.2404268
base_loss 0.93940425
reg_loss [0.38845858, 0.17720723, 0.28518504]
loss: 1.7902551
base_loss 1.2182422
reg_loss [0.38869163, 0.1771339, 0.28467724]
loss: 2.068745
base_loss 1.7626185
reg_loss [0.3844087, 0.17676786, 0.28522152]
loss: 2.6090167
base_loss 0.87312716
reg_loss [0.38500273, 0.17664416, 0.2851539]
loss: 1.7199279
base_loss 1.4383872
reg_loss [0.3848079, 0.1766427, 0.2849174]
loss: 2.2847552
base_loss 1.5217748
reg_loss [0.38729817, 0.17660218, 0.28488457]
loss: 2.3705597

base_loss 1.2087822
reg_loss [0.38061666, 0.1759603, 0.27655116]
loss: 2.0419104
base_loss 0.7400985
reg_loss [0.38185588, 0.17610672, 0.27638406]
loss: 1.574445
base_loss 1.579906
reg_loss [0.38489562, 0.1760605, 0.27639034]
loss: 2.4172523
base_loss 1.1469702
reg_loss [0.3827782, 0.17618889, 0.27665707]
loss: 1.9825943
base_loss 1.3022299
reg_loss [0.3837215, 0.17610939, 0.27704546]
loss: 2.1391063
base_loss 0.71376437
reg_loss [0.38184008, 0.1762899, 0.27719632]
loss: 1.5490906
base_loss 1.1741246
reg_loss [0.37948766, 0.17595832, 0.27651194]
loss: 2.0060825
base_loss 0.60555667
reg_loss [0.37968913, 0.176173, 0.27631485]
loss: 1.4377337
base_loss 1.5210929
reg_loss [0.37754574, 0.17584434, 0.27644733]
loss: 2.3509302
base_loss 0.5652635
reg_loss [0.3767731, 0.17596778, 0.27641302]
loss: 1.3944175
base_loss 1.3662798
reg_loss [0.37539488, 0.17567185, 0.27616894]
loss: 2.1935153
base_loss 1.7812636
reg_loss [0.37404162, 0.1755574, 0.27581683]
loss: 2.6066794
base_loss 1.4499013

base_loss 1.9573245
reg_loss [0.39572403, 0.1752832, 0.27101877]
loss: 2.7993505
base_loss 0.8707071
reg_loss [0.4039312, 0.17558734, 0.27093294]
loss: 1.7211585
base_loss 1.3279897
reg_loss [0.40038884, 0.17547995, 0.2704242]
loss: 2.1742826
base_loss 1.361506
reg_loss [0.3994197, 0.17514378, 0.2708678]
loss: 2.2069373
base_loss 1.2123328
reg_loss [0.39735937, 0.1750246, 0.27015764]
loss: 2.0548744
base_loss 1.5413308
reg_loss [0.39533728, 0.17457555, 0.2696745]
loss: 2.3809183
base_loss 1.6122816
reg_loss [0.39445707, 0.17484264, 0.2698271]
loss: 2.4514084
base_loss 0.6278411
reg_loss [0.39313057, 0.17499405, 0.2698185]
loss: 1.4657843
base_loss 0.8111607
reg_loss [0.3909406, 0.17501225, 0.2697089]
loss: 1.6468223
base_loss 1.2120576
reg_loss [0.38960594, 0.1746727, 0.26969263]
loss: 2.0460289
base_loss 1.1851559
reg_loss [0.38884765, 0.17495939, 0.26999453]
loss: 2.0189574
base_loss 0.638403
reg_loss [0.39836514, 0.17555231, 0.26970917]
loss: 1.4820297
base_loss 0.81446004

base_loss 1.0878859
reg_loss [0.39525378, 0.17304665, 0.26382956]
loss: 1.9200159
base_loss 1.3115469
reg_loss [0.3916911, 0.17279014, 0.26423344]
loss: 2.1402617
base_loss 1.2108462
reg_loss [0.38876426, 0.17246482, 0.26437384]
loss: 2.0364492
base_loss 1.2047124
reg_loss [0.39470112, 0.17268531, 0.26465908]
loss: 2.036758
base_loss 0.4496532
reg_loss [0.39736518, 0.17321607, 0.2646122]
loss: 1.2848467
base_loss 1.2664897
reg_loss [0.39573726, 0.17316459, 0.2645968]
loss: 2.0999885
base_loss 1.3485986
reg_loss [0.39430687, 0.17305046, 0.26479357]
loss: 2.1807497
base_loss 1.1647726
reg_loss [0.39373037, 0.1731086, 0.26499915]
loss: 1.9966108
base_loss 1.5603925
reg_loss [0.3960939, 0.17287159, 0.2644415]
loss: 2.3937995
base_loss 1.0674726
reg_loss [0.39670226, 0.17279138, 0.26487035]
loss: 1.9018366
base_loss 1.2177634
reg_loss [0.39527163, 0.17276892, 0.26500788]
loss: 2.050812
base_loss 0.5361835
reg_loss [0.39481482, 0.17285374, 0.26510784]
loss: 1.3689599
base_loss 0.98232675

base_loss 0.059566252
reg_loss [0.39848852, 0.17152092, 0.25712144]
loss: 0.8866972
base_loss 0.060391244
reg_loss [0.39762348, 0.17143655, 0.2568748]
loss: 0.8863261
base_loss 0.6143157
reg_loss [0.39562827, 0.17153141, 0.25702065]
loss: 1.4384961
base_loss 0.8858967
reg_loss [0.3981444, 0.17182203, 0.25745472]
loss: 1.7133179
base_loss 1.5508697
reg_loss [0.39609146, 0.1719856, 0.25688702]
loss: 2.3758337
base_loss 1.0785826
reg_loss [0.39535508, 0.17165047, 0.2572018]
loss: 1.90279
base_loss 1.1850433
reg_loss [0.39694542, 0.1715643, 0.25698802]
loss: 2.0105412
base_loss 1.1385574
reg_loss [0.39672026, 0.17141019, 0.25701153]
loss: 1.9636995
base_loss 1.2202835
reg_loss [0.3946889, 0.17132558, 0.25701308]
loss: 2.043311
base_loss 1.410763
reg_loss [0.394135, 0.17108203, 0.25752228]
loss: 2.2335024
base_loss 1.3077543
reg_loss [0.39194122, 0.17122756, 0.2574822]
loss: 2.1284053
base_loss 1.1152622
reg_loss [0.3914606, 0.17106338, 0.2570613]
loss: 1.9348474
base_loss 1.2558842

loss: 2.0051491
base_loss 1.6924794
reg_loss [0.39574748, 0.16852728, 0.2514855]
loss: 2.5082397
base_loss 0.54346
reg_loss [0.39432415, 0.16867727, 0.2518174]
loss: 1.358279
base_loss 0.89697856
reg_loss [0.39737722, 0.16862509, 0.25257072]
loss: 1.7155516
base_loss 1.6933783
reg_loss [0.39566982, 0.16870145, 0.25236043]
loss: 2.51011
base_loss 0.74656624
reg_loss [0.39522347, 0.16863468, 0.25217462]
loss: 1.562599
base_loss 1.5569807
reg_loss [0.39446384, 0.16864736, 0.2523593]
loss: 2.3724513
base_loss 1.8142773
reg_loss [0.39694718, 0.16846715, 0.2519802]
loss: 2.631672
base_loss 1.5766258
reg_loss [0.3961008, 0.16870496, 0.25249338]
loss: 2.393925
base_loss 1.0882182
reg_loss [0.39606407, 0.16875508, 0.25289366]
loss: 1.905931
base_loss 1.0248001
reg_loss [0.3962942, 0.16892776, 0.2529722]
loss: 1.8429942
base_loss 1.4072208
reg_loss [0.3942634, 0.16872664, 0.25326383]
loss: 2.2234747
base_loss 1.0817976
reg_loss [0.39178795, 0.16857406, 0.25247046]
loss: 1.8946302
base_loss 0.784

loss: 1.2746369
base_loss 1.6643996
reg_loss [0.39260712, 0.16645172, 0.24682966]
loss: 2.4702883
base_loss 1.3384844
reg_loss [0.39156812, 0.166537, 0.24675366]
loss: 2.1433432
base_loss 1.7072115
reg_loss [0.39308533, 0.16620281, 0.24635863]
loss: 2.5128582
base_loss 1.179014
reg_loss [0.39118537, 0.16639729, 0.24645643]
loss: 1.9830531
base_loss 1.4366475
reg_loss [0.39176744, 0.16608696, 0.24611542]
loss: 2.2406173
base_loss 0.33597106
reg_loss [0.3900521, 0.16628143, 0.24588248]
loss: 1.1381872
base_loss 1.1158738
reg_loss [0.38997146, 0.1661115, 0.24584942]
loss: 1.9178061
base_loss 1.0347471
reg_loss [0.39187104, 0.16653676, 0.24555837]
loss: 1.8387133
base_loss 0.9705992
reg_loss [0.3925171, 0.16627201, 0.24613227]
loss: 1.7755206
base_loss 1.3658321
reg_loss [0.39336485, 0.16649696, 0.24619669]
loss: 2.1718907
base_loss 0.93514574
reg_loss [0.39327738, 0.1663457, 0.24620287]
loss: 1.7409717
base_loss 0.67775774
reg_loss [0.39319003, 0.16650005, 0.2463644]
loss: 1.4838122

base_loss 1.3189561
reg_loss [0.3978336, 0.1656047, 0.24388875]
loss: 2.1262832
base_loss 1.4505205
reg_loss [0.39686152, 0.1653241, 0.24313663]
loss: 2.255843
base_loss 1.5011933
reg_loss [0.39821196, 0.1657999, 0.2430845]
loss: 2.3082895
base_loss 2.032457
reg_loss [0.39705047, 0.16536626, 0.2424535]
loss: 2.8373272
base_loss 0.99497414
reg_loss [0.39550272, 0.16520938, 0.24270126]
loss: 1.7983875
base_loss 1.2642326
reg_loss [0.39543754, 0.16494243, 0.24262777]
loss: 2.0672402
base_loss 0.4739392
reg_loss [0.3934623, 0.16501729, 0.24272452]
loss: 1.2751433
base_loss 1.4401939
reg_loss [0.39440885, 0.16497727, 0.24252881]
loss: 2.2421088
base_loss 1.4297669
reg_loss [0.3924501, 0.16482465, 0.2427197]
loss: 2.2297614
base_loss 0.825127
reg_loss [0.3914009, 0.1647002, 0.2423932]
loss: 1.6236212
base_loss 0.73353004
reg_loss [0.39154065, 0.1647928, 0.24280195]
loss: 1.5326654
base_loss 0.9175064
reg_loss [0.39171326, 0.1647691, 0.24255852]
loss: 1.7165471
base_loss 0.56020164

base_loss 1.8267949
reg_loss [0.39872718, 0.16155021, 0.23713283]
loss: 2.624205
base_loss 1.228734
reg_loss [0.39973927, 0.16156484, 0.23646143]
loss: 2.0264995
base_loss 1.1401372
reg_loss [0.40122813, 0.16132069, 0.23686996]
loss: 1.939556
base_loss 1.760247
reg_loss [0.39957932, 0.16117027, 0.23689513]
loss: 2.5578916
base_loss 1.2989717
reg_loss [0.3981905, 0.16092774, 0.23691694]
loss: 2.095007
base_loss 0.63241637
reg_loss [0.39901564, 0.16118632, 0.23638293]
loss: 1.4290013
base_loss 0.99696815
reg_loss [0.39772785, 0.16117392, 0.23648681]
loss: 1.7923567
base_loss 0.40575626
reg_loss [0.3981074, 0.16134277, 0.2364159]
loss: 1.2016222
base_loss 0.22052088
reg_loss [0.39679152, 0.16148984, 0.23633537]
loss: 1.0151377
base_loss 0.98313904
reg_loss [0.396273, 0.1614545, 0.23625281]
loss: 1.7771194
base_loss 0.94231945
reg_loss [0.3952704, 0.16159141, 0.2364396]
loss: 1.7356209
base_loss 0.35360223
reg_loss [0.39755675, 0.16191737, 0.23629059]
loss: 1.149367
base_loss 1.2535537

base_loss 2.0110562
reg_loss [0.39590454, 0.15957043, 0.23344621]
loss: 2.7999773
base_loss 0.7286178
reg_loss [0.3977014, 0.15980813, 0.23340134]
loss: 1.5195286
base_loss 0.6415173
reg_loss [0.39643273, 0.1595649, 0.23322736]
loss: 1.4307423
base_loss 0.74579287
reg_loss [0.39509398, 0.15960893, 0.23317926]
loss: 1.533675
base_loss 0.9741774
reg_loss [0.39419803, 0.1596009, 0.23342113]
loss: 1.7613974
base_loss 1.010844
reg_loss [0.3922159, 0.15976875, 0.23431937]
loss: 1.797148
base_loss 1.5685742
reg_loss [0.39280027, 0.15971927, 0.23395197]
loss: 2.3550458
base_loss 0.4292956
reg_loss [0.39115432, 0.15972255, 0.23369895]
loss: 1.2138715
base_loss 0.967162
reg_loss [0.39078677, 0.15954225, 0.23399587]
loss: 1.7514869
base_loss 0.9481684
reg_loss [0.3905979, 0.15970097, 0.23466247]
loss: 1.7331297
base_loss 0.74952334
reg_loss [0.39087856, 0.15953588, 0.23469594]
loss: 1.5346336
base_loss 0.83631635
reg_loss [0.38985363, 0.15955731, 0.23439193]
loss: 1.6201192
base_loss 0.67274934

reg_loss [0.3947345, 0.15997885, 0.23417297]
loss: 0.95642793
base_loss 1.2068768
reg_loss [0.39441967, 0.1599073, 0.2343738]
loss: 1.9955776
base_loss 0.9340422
reg_loss [0.40299907, 0.16045304, 0.23487012]
loss: 1.7323644
base_loss 0.821383
reg_loss [0.4018988, 0.16025913, 0.2345706]
loss: 1.6181116
base_loss 0.4989407
reg_loss [0.400745, 0.16050813, 0.23426206]
loss: 1.294456
base_loss 1.3343024
reg_loss [0.3983427, 0.16012895, 0.2338071]
loss: 2.1265812
base_loss 1.272231
reg_loss [0.39639077, 0.16015363, 0.23359397]
loss: 2.0623693
base_loss 0.5903372
reg_loss [0.39671797, 0.15996912, 0.23360017]
loss: 1.3806244
base_loss 0.4428122
reg_loss [0.39555535, 0.16013011, 0.23402122]
loss: 1.2325189
base_loss 0.91930485
reg_loss [0.39527637, 0.16003193, 0.23448972]
loss: 1.7091027
base_loss 1.1180893
reg_loss [0.39564276, 0.15995325, 0.23431571]
loss: 1.9080011
base_loss 1.1876673
reg_loss [0.39528337, 0.15974823, 0.23426017]
loss: 1.976959
base_loss 0.8996014
reg_loss [0.39907157, 0.160

base_loss 0.6093436
reg_loss [0.39889035, 0.15800558, 0.22912145]
loss: 1.395361
base_loss 1.3490924
reg_loss [0.39599174, 0.15780203, 0.22911851]
loss: 2.1320045
base_loss 0.800052
reg_loss [0.3953768, 0.15778509, 0.22949247]
loss: 1.5827063
base_loss 1.4778371
reg_loss [0.39432278, 0.1578489, 0.2296004]
loss: 2.2596092
base_loss 1.5732603
reg_loss [0.3929687, 0.15755863, 0.22987825]
loss: 2.3536658
base_loss 0.45576018
reg_loss [0.39223862, 0.15774727, 0.22962932]
loss: 1.2353754
base_loss 1.5767769
reg_loss [0.39075825, 0.1574699, 0.2286756]
loss: 2.3536806
base_loss 0.95765483
reg_loss [0.3896287, 0.15728809, 0.22912616]
loss: 1.7336979
base_loss 0.97621334
reg_loss [0.3892214, 0.15705833, 0.22888416]
loss: 1.7513773
base_loss 1.0429313
reg_loss [0.38776115, 0.15715258, 0.22926114]
loss: 1.8171061
base_loss 1.247912
reg_loss [0.3878902, 0.15716979, 0.22938205]
loss: 2.0223541
base_loss 1.5559304
reg_loss [0.38780624, 0.1573058, 0.22956182]
loss: 2.330604
base_loss 0.9184154

reg_loss [0.39388198, 0.15931983, 0.22871383]
loss: 1.915564
base_loss 1.2898781
reg_loss [0.39273593, 0.15944809, 0.22876666]
loss: 2.070829
base_loss 0.8205595
reg_loss [0.39228168, 0.15937541, 0.22929649]
loss: 1.601513
base_loss 0.94095194
reg_loss [0.39139053, 0.15946095, 0.22898224]
loss: 1.7207856
base_loss 0.76428586
reg_loss [0.39108703, 0.15920839, 0.22912471]
loss: 1.5437059
base_loss 0.9717745
reg_loss [0.39316544, 0.15954867, 0.22941259]
loss: 1.7539011
base_loss 0.45615324
reg_loss [0.39302492, 0.1592629, 0.22990364]
loss: 1.2383447
base_loss 0.6687401
reg_loss [0.39390522, 0.15925373, 0.23042554]
loss: 1.4523246
base_loss 0.8900775
reg_loss [0.39498153, 0.15920393, 0.23128626]
loss: 1.6755491
base_loss 1.1628705
reg_loss [0.3955754, 0.15915263, 0.23114985]
loss: 1.9487484
base_loss 0.3706847
reg_loss [0.39450067, 0.15896323, 0.23115522]
loss: 1.1553037
base_loss 0.47549602
reg_loss [0.3931414, 0.15908363, 0.23134439]
loss: 1.2590654
base_loss 1.4327753
reg_loss [0.392585

base_loss 2.176395
reg_loss [0.39176807, 0.15944727, 0.22885452]
loss: 2.9564645
base_loss 0.5377799
reg_loss [0.39161688, 0.15927385, 0.22929403]
loss: 1.3179647
base_loss 0.5480116
reg_loss [0.39105836, 0.15936674, 0.22935848]
loss: 1.3277951
base_loss 1.0990798
reg_loss [0.39203656, 0.15949774, 0.22865468]
loss: 1.8792689
base_loss 1.3422439
reg_loss [0.39077038, 0.15950392, 0.22880802]
loss: 2.1213262
base_loss 1.1602935
reg_loss [0.38962546, 0.15941136, 0.22842278]
loss: 1.937753
base_loss 1.3807395
reg_loss [0.38961697, 0.15922685, 0.22778976]
loss: 2.157373
base_loss 1.2908812
reg_loss [0.39020014, 0.15934162, 0.22802915]
loss: 2.0684521
base_loss 1.0351764
reg_loss [0.39096588, 0.1590894, 0.22804083]
loss: 1.8132725
base_loss 0.51720715
reg_loss [0.3893827, 0.1591047, 0.22810993]
loss: 1.2938045
base_loss 0.25255743
reg_loss [0.3884821, 0.1591746, 0.22781862]
loss: 1.0280328
base_loss 0.7979355
reg_loss [0.3884639, 0.15941587, 0.22773848]
loss: 1.5735538
base_loss 0.98032546

reg_loss [0.40491226, 0.15783003, 0.22639076]
loss: 1.7150984
base_loss 0.817507
reg_loss [0.40704185, 0.15775166, 0.22655909]
loss: 1.6088595
base_loss 1.2696643
reg_loss [0.4064476, 0.15781914, 0.2268235]
loss: 2.0607545
base_loss 0.88164055
reg_loss [0.40693545, 0.15752478, 0.22748424]
loss: 1.673585
base_loss 0.93631107
reg_loss [0.40636668, 0.15763876, 0.22816104]
loss: 1.7284775
base_loss 0.8382362
reg_loss [0.4072315, 0.15748829, 0.22807442]
loss: 1.6310304
base_loss 1.356138
reg_loss [0.408079, 0.15766692, 0.22799964]
loss: 2.1498835
base_loss 1.9882662
reg_loss [0.41505924, 0.15773484, 0.2271296]
loss: 2.7881901
base_loss 2.9209752
reg_loss [0.41628703, 0.15776356, 0.22653358]
loss: 3.7215593
base_loss 1.5768788
reg_loss [0.4169785, 0.15726192, 0.22507901]
loss: 2.3761983
base_loss 0.5560112
reg_loss [0.41487736, 0.15699281, 0.22492546]
loss: 1.3528068
base_loss 0.5743209
reg_loss [0.41568148, 0.15684308, 0.22514592]
loss: 1.3719914
base_loss 2.9997172
reg_loss [0.4158824, 0.1

base_loss 0.82801497
reg_loss [0.40574706, 0.15746742, 0.2235386]
loss: 1.614768
base_loss 1.6030474
reg_loss [0.40877333, 0.15733516, 0.2231019]
loss: 2.392258
base_loss 0.52991956
reg_loss [0.4067996, 0.15744595, 0.2227453]
loss: 1.3169104
base_loss 0.2761292
reg_loss [0.40752673, 0.15733594, 0.22293657]
loss: 1.0639284
base_loss 0.58770835
reg_loss [0.41009414, 0.1578698, 0.22313707]
loss: 1.3788093
base_loss 0.32627
reg_loss [0.4088703, 0.15746447, 0.22353631]
loss: 1.1161411
base_loss 0.8379885
reg_loss [0.40735558, 0.15780532, 0.22367765]
loss: 1.626827
base_loss 1.516647
reg_loss [0.40541893, 0.15738133, 0.22344379]
loss: 2.302891
base_loss 0.96196866
reg_loss [0.40548795, 0.15772516, 0.22503228]
loss: 1.7502141
base_loss 0.7303082
reg_loss [0.4049095, 0.15751879, 0.22499251]
loss: 1.5177289
base_loss 0.70590246
reg_loss [0.40411812, 0.15776177, 0.22505462]
loss: 1.4928371
base_loss 0.9396437
reg_loss [0.41105574, 0.15755403, 0.22475858]
loss: 1.7330121
base_loss 0.20195091

base_loss 1.2762358
reg_loss [0.4157168, 0.15890993, 0.22397737]
loss: 2.0748398
base_loss 0.81832457
reg_loss [0.41504118, 0.15939188, 0.22366828]
loss: 1.616426
base_loss 0.94808346
reg_loss [0.41369385, 0.15882862, 0.22427621]
loss: 1.7448821
base_loss 0.8473843
reg_loss [0.412817, 0.15895124, 0.22487845]
loss: 1.6440309
base_loss 0.7757219
reg_loss [0.41259795, 0.15888284, 0.22531223]
loss: 1.572515
base_loss 0.73418
reg_loss [0.4129546, 0.15913275, 0.22622894]
loss: 1.5324962
base_loss 0.80769193
reg_loss [0.41326353, 0.15938161, 0.22561498]
loss: 1.6059521
base_loss 0.62238955
reg_loss [0.412163, 0.1597479, 0.22602212]
loss: 1.4203225
base_loss 0.2924963
reg_loss [0.41384855, 0.15973642, 0.22585714]
loss: 1.0919384
base_loss 1.0178213
reg_loss [0.41333458, 0.1597933, 0.22661987]
loss: 1.817569
base_loss 1.2989068
reg_loss [0.4166515, 0.15946493, 0.22610071]
loss: 2.101124
base_loss 0.8180624
reg_loss [0.41559142, 0.15947782, 0.22602993]
loss: 1.6191616
base_loss 0.16338398

reg_loss [0.42252752, 0.16104838, 0.22713172]
loss: 1.2220006
base_loss 0.37062356
reg_loss [0.4224853, 0.16093826, 0.22706047]
loss: 1.1811075
base_loss 0.057877705
reg_loss [0.42155716, 0.16104037, 0.22687922]
loss: 0.86735445
base_loss 0.8451789
reg_loss [0.42055184, 0.16088367, 0.22648594]
loss: 1.6531004
base_loss 1.8143249
reg_loss [0.4191046, 0.161119, 0.22651109]
loss: 2.6210594
base_loss 0.8317994
reg_loss [0.4189779, 0.1606361, 0.22694793]
loss: 1.6383612
base_loss 2.819131
reg_loss [0.4175076, 0.16034156, 0.2258175]
loss: 3.6227975
base_loss 0.90857995
reg_loss [0.41837135, 0.16017132, 0.22627369]
loss: 1.7133962
base_loss 0.99153733
reg_loss [0.41791853, 0.16045359, 0.22663811]
loss: 1.7965475
base_loss 1.1538415
reg_loss [0.41972923, 0.16042091, 0.22719969]
loss: 1.9611913
base_loss 0.9699349
reg_loss [0.4191423, 0.160526, 0.22777504]
loss: 1.7773783
base_loss 0.9012222
reg_loss [0.4203681, 0.16045822, 0.22861893]
loss: 1.7106675
base_loss 0.51002795
reg_loss [0.42031252, 

base_loss 0.7989374
reg_loss [0.4245233, 0.16312714, 0.22981198]
loss: 1.6163999
base_loss 0.7306885
reg_loss [0.4235869, 0.16290107, 0.22894853]
loss: 1.5461249
base_loss 0.6042139
reg_loss [0.42275363, 0.16304642, 0.22869726]
loss: 1.4187112
base_loss 1.0414401
reg_loss [0.4224313, 0.16289058, 0.22880836]
loss: 1.8555704
base_loss 1.248244
reg_loss [0.42152023, 0.1631051, 0.22906318]
loss: 2.0619326
base_loss 0.78438556
reg_loss [0.4213512, 0.16275516, 0.22896]
loss: 1.5974519
base_loss 0.8408622
reg_loss [0.4192766, 0.16277163, 0.22814201]
loss: 1.6510524
base_loss 0.39157367
reg_loss [0.41944847, 0.16272031, 0.22753786]
loss: 1.2012804
base_loss 0.90893155
reg_loss [0.42559394, 0.1628574, 0.22666542]
loss: 1.7240483
base_loss 0.7086691
reg_loss [0.4246912, 0.1626028, 0.22678417]
loss: 1.5227473
base_loss 1.1021292
reg_loss [0.4330137, 0.1632584, 0.22763228]
loss: 1.9260336
base_loss 0.96017253
reg_loss [0.43328413, 0.16333908, 0.22806859]
loss: 1.7848644
base_loss 1.060595
reg_loss

base_loss 0.41645747
reg_loss [0.4274136, 0.16314445, 0.22756998]
loss: 1.2345855
base_loss 0.3200438
reg_loss [0.42675644, 0.16315317, 0.22776122]
loss: 1.1377146
base_loss 1.3663349
reg_loss [0.42531586, 0.1630971, 0.22748974]
loss: 2.1822376
base_loss 1.3292422
reg_loss [0.4269167, 0.16301626, 0.22680944]
loss: 2.1459846
base_loss 0.2744855
reg_loss [0.42888755, 0.16324554, 0.2269253]
loss: 1.0935439
base_loss 0.5057124
reg_loss [0.4288283, 0.1631782, 0.22706817]
loss: 1.3247871
base_loss 0.52549833
reg_loss [0.43347058, 0.16400708, 0.22700696]
loss: 1.3499829
base_loss 0.42456025
reg_loss [0.43412378, 0.16364951, 0.22687708]
loss: 1.2492107
base_loss 0.40140063
reg_loss [0.4346765, 0.1638639, 0.2268096]
loss: 1.2267506
base_loss 0.936362
reg_loss [0.43394622, 0.16345806, 0.22685039]
loss: 1.7606168
base_loss 0.7483431
reg_loss [0.43352354, 0.16375068, 0.22661792]
loss: 1.5722352
base_loss 1.0637338
reg_loss [0.43256706, 0.1633424, 0.22717859]
loss: 1.8868219
base_loss 0.51908404
re

loss: 1.7500935
base_loss 0.62615687
reg_loss [0.44385082, 0.16612574, 0.23018496]
loss: 1.4663184
base_loss 0.23056535
reg_loss [0.44843265, 0.16651303, 0.2307692]
loss: 1.0762802
base_loss 0.47581035
reg_loss [0.44868475, 0.16620494, 0.23058717]
loss: 1.3212872
base_loss 0.7347828
reg_loss [0.44603744, 0.16633949, 0.23074378]
loss: 1.5779035
base_loss 0.5647429
reg_loss [0.44531894, 0.16632763, 0.2306997]
loss: 1.4070891
base_loss 2.3081784
reg_loss [0.4432566, 0.1666663, 0.22959688]
loss: 3.1476982
base_loss 0.47188243
reg_loss [0.44574103, 0.16625996, 0.22932081]
loss: 1.3132042
base_loss 1.0635422
reg_loss [0.44537988, 0.16587369, 0.23050001]
loss: 1.9052957
base_loss 1.3166482
reg_loss [0.4462667, 0.16561252, 0.23123729]
loss: 2.1597648
base_loss 1.4119875
reg_loss [0.4468651, 0.16575582, 0.23149437]
loss: 2.2561028
base_loss 0.36121434
reg_loss [0.4473086, 0.16563946, 0.23138967]
loss: 1.2055521
base_loss 1.0453656
reg_loss [0.45027807, 0.16627342, 0.23065849]
loss: 1.8925755
ba

base_loss 0.64037263
reg_loss [0.44427904, 0.16525449, 0.22936614]
loss: 1.4792724
base_loss 1.9385872
reg_loss [0.4448728, 0.16496266, 0.22894292]
loss: 2.7773657
base_loss 0.7387793
reg_loss [0.4444808, 0.16503792, 0.22919458]
loss: 1.5774925
base_loss 0.51103765
reg_loss [0.4431491, 0.16478929, 0.22976795]
loss: 1.3487439
base_loss 0.7097215
reg_loss [0.44137776, 0.16504325, 0.2295709]
loss: 1.5457133
base_loss 0.35403395
reg_loss [0.4407602, 0.16484925, 0.22931713]
loss: 1.1889606
base_loss 0.5326803
reg_loss [0.43967354, 0.16518432, 0.22908396]
loss: 1.3666222
base_loss 0.8985262
reg_loss [0.44054702, 0.16493008, 0.22985676]
loss: 1.73386
base_loss 0.8245491
reg_loss [0.44277945, 0.16497236, 0.22943504]
loss: 1.6617359
base_loss 0.8296313
reg_loss [0.44291392, 0.16450858, 0.22969617]
loss: 1.66675
base_loss 0.4740959
reg_loss [0.44169918, 0.16444321, 0.22986223]
loss: 1.3101006
base_loss 0.55725163
reg_loss [0.44216377, 0.16449936, 0.22955158]
loss: 1.3934664
base_loss 0.4537937
r

base_loss 1.520646
reg_loss [0.43928036, 0.16636881, 0.23197283]
loss: 2.358268
base_loss 1.9856877
reg_loss [0.43769687, 0.16594875, 0.23229179]
loss: 2.8216252
base_loss 0.66852176
reg_loss [0.4371101, 0.16610315, 0.23243216]
loss: 1.5041671
base_loss 0.42671293
reg_loss [0.43631622, 0.16590828, 0.23260486]
loss: 1.2615422
base_loss 0.41137597
reg_loss [0.43448585, 0.16593532, 0.23275124]
loss: 1.2445483
base_loss 1.0755184
reg_loss [0.43951026, 0.16624668, 0.23203361]
loss: 1.9133089
base_loss 0.88620716
reg_loss [0.43862003, 0.16583946, 0.23144405]
loss: 1.7221106
base_loss 0.82607776
reg_loss [0.43756568, 0.1654739, 0.2315514]
loss: 1.6606688
base_loss 0.5525341
reg_loss [0.43688944, 0.16555065, 0.23186134]
loss: 1.3868356
base_loss 0.10235326
reg_loss [0.4365819, 0.16554107, 0.23171686]
loss: 0.9361931
base_loss 0.73556566
reg_loss [0.43492427, 0.16594398, 0.23221405]
loss: 1.568648
base_loss 0.6354529
reg_loss [0.43616652, 0.16569854, 0.23295134]
loss: 1.4702694
base_loss 1.0364

loss: 1.5779388
base_loss 0.41624677
reg_loss [0.43994698, 0.1664185, 0.23549688]
loss: 1.2581092
base_loss 0.7545531
reg_loss [0.4397346, 0.16661748, 0.23460765]
loss: 1.5955129
base_loss 0.52563167
reg_loss [0.45198923, 0.16767307, 0.23492937]
loss: 1.3802233
base_loss 1.1453254
reg_loss [0.4496467, 0.16731872, 0.23503879]
loss: 1.9973296
base_loss 0.98916227
reg_loss [0.449365, 0.16718274, 0.23485427]
loss: 1.8405643
base_loss 0.96924615
reg_loss [0.45052612, 0.16696498, 0.23457557]
loss: 1.8213129
base_loss 0.7546495
reg_loss [0.45076674, 0.16651382, 0.23328401]
loss: 1.605214
base_loss 0.8489113
reg_loss [0.45095283, 0.16637039, 0.23342316]
loss: 1.6996576
base_loss 0.710374
reg_loss [0.45102322, 0.16604103, 0.23329017]
loss: 1.5607284
base_loss 0.9161344
reg_loss [0.44874483, 0.16604078, 0.23347393]
loss: 1.7643939
base_loss 0.25209147
reg_loss [0.44822165, 0.16599327, 0.233273]
loss: 1.0995793
base_loss 1.1005673
reg_loss [0.44853252, 0.16628431, 0.23338659]
loss: 1.9487709
base

reg_loss [0.45519903, 0.16843155, 0.23415574]
loss: 1.5836296
base_loss 0.9301554
reg_loss [0.45848337, 0.1683394, 0.23474361]
loss: 1.7917217
base_loss 0.42541763
reg_loss [0.45758018, 0.16835383, 0.23457216]
loss: 1.2859238
base_loss 0.41327322
reg_loss [0.45916393, 0.1683712, 0.23471786]
loss: 1.2755262
base_loss 0.5970397
reg_loss [0.45736808, 0.16846448, 0.23422419]
loss: 1.4570965
base_loss 1.1944551
reg_loss [0.45670202, 0.16834657, 0.23447323]
loss: 2.053977
base_loss 0.5544363
reg_loss [0.45484903, 0.16864574, 0.23564088]
loss: 1.413572
base_loss 0.2751415
reg_loss [0.45441985, 0.16862817, 0.2352372]
loss: 1.1334267
base_loss 2.1751204
reg_loss [0.45945197, 0.1693074, 0.2348216]
loss: 3.0387013
base_loss 1.0004942
reg_loss [0.4596824, 0.16914162, 0.23464696]
loss: 1.8639653
base_loss 0.47897083
reg_loss [0.45915657, 0.16899738, 0.23487836]
loss: 1.3420031
base_loss 0.4157829
reg_loss [0.4589385, 0.16881329, 0.23490804]
loss: 1.2784429
base_loss 1.1075125
reg_loss [0.4620563, 0

base_loss 0.40547276
reg_loss [0.45493227, 0.16843945, 0.23275577]
loss: 1.2616003
base_loss 1.1960094
reg_loss [0.45406714, 0.1680419, 0.23224847]
loss: 2.0503669
base_loss 0.9401838
reg_loss [0.45639113, 0.16800855, 0.23250614]
loss: 1.7970897
base_loss 0.65367377
reg_loss [0.4568486, 0.16795982, 0.23301248]
loss: 1.5114946
base_loss 0.8529881
reg_loss [0.45544773, 0.16790739, 0.23234256]
loss: 1.7086859
base_loss 0.68565845
reg_loss [0.4561817, 0.16791731, 0.23176166]
loss: 1.5415192
base_loss 0.56514865
reg_loss [0.45592484, 0.16803308, 0.23227037]
loss: 1.421377
base_loss 0.89004505
reg_loss [0.4601599, 0.16826206, 0.23253165]
loss: 1.7509986
base_loss 0.35355866
reg_loss [0.45987505, 0.16810977, 0.23189811]
loss: 1.2134416
base_loss 0.45335293
reg_loss [0.4604979, 0.16817729, 0.23193443]
loss: 1.3139625
base_loss 0.39900106
reg_loss [0.45803306, 0.16825162, 0.23183294]
loss: 1.2571187
base_loss 0.47543332
reg_loss [0.45691246, 0.16832878, 0.23203702]
loss: 1.3327116
base_loss 0.6

base_loss 0.444185
reg_loss [0.4593242, 0.17029877, 0.23086177]
loss: 1.3046697
base_loss 1.2754967
reg_loss [0.46053767, 0.17080058, 0.23044968]
loss: 2.1372848
base_loss 0.69759727
reg_loss [0.45995852, 0.17055, 0.23081279]
loss: 1.5589186
base_loss 0.9624479
reg_loss [0.45984817, 0.17048897, 0.23115583]
loss: 1.8239409
base_loss 0.96069956
reg_loss [0.45766357, 0.17021509, 0.2320966]
loss: 1.8206748
base_loss 0.94891286
reg_loss [0.45655918, 0.17037639, 0.23213236]
loss: 1.8079808
base_loss 0.36510608
reg_loss [0.45844263, 0.17021665, 0.23260798]
loss: 1.2263733
base_loss 1.7717125
reg_loss [0.4580792, 0.17029895, 0.23269069]
loss: 2.6327815
base_loss 0.5766084
reg_loss [0.45879832, 0.16998558, 0.23258275]
loss: 1.4379749
base_loss 0.78474385
reg_loss [0.45887542, 0.1700548, 0.23292035]
loss: 1.6465943
base_loss 0.9064008
reg_loss [0.46107742, 0.16977808, 0.23430607]
loss: 1.7715625
base_loss 1.697065
reg_loss [0.4632396, 0.1697416, 0.2335527]
loss: 2.5635989
base_loss 0.15341792
re

base_loss 0.65381044
reg_loss [0.47900623, 0.16904819, 0.23356704]
loss: 1.5354319
base_loss 1.0716462
reg_loss [0.48023883, 0.16925623, 0.23352109]
loss: 1.9546623
base_loss 0.20379353
reg_loss [0.47880152, 0.16921452, 0.23353048]
loss: 1.08534
base_loss 0.9657631
reg_loss [0.4809797, 0.16941549, 0.23338906]
loss: 1.8495473
base_loss 0.44917363
reg_loss [0.4795821, 0.16947822, 0.23343204]
loss: 1.331666
base_loss 0.48875993
reg_loss [0.48084575, 0.16963245, 0.23383482]
loss: 1.373073
base_loss 0.17464252
reg_loss [0.4780303, 0.17000343, 0.23337598]
loss: 1.0560522
base_loss 0.4433673
reg_loss [0.4776481, 0.16959709, 0.2335535]
loss: 1.3241659
base_loss 0.22297269
reg_loss [0.47618082, 0.16969964, 0.23333685]
loss: 1.10219
base_loss 0.6122811
reg_loss [0.47970065, 0.17002937, 0.23314404]
loss: 1.4951552
base_loss 0.6651726
reg_loss [0.4764666, 0.16948055, 0.2330669]
loss: 1.5441867
base_loss 0.92642117
reg_loss [0.477501, 0.16951066, 0.233831]
loss: 1.807264
base_loss 0.17417304
reg_lo

base_loss 0.2655673
reg_loss [0.4736976, 0.17171493, 0.23646371]
loss: 1.1474435
base_loss 0.6584692
reg_loss [0.47280055, 0.17183712, 0.23667642]
loss: 1.5397832
base_loss 1.1451769
reg_loss [0.4873923, 0.17230503, 0.23738305]
loss: 2.0422573
base_loss 0.21395373
reg_loss [0.48634914, 0.1716768, 0.23763865]
loss: 1.1096183
base_loss 0.5406709
reg_loss [0.48432365, 0.17169306, 0.23705329]
loss: 1.4337409
base_loss 0.3720319
reg_loss [0.484034, 0.17122379, 0.23781256]
loss: 1.2651021
base_loss 0.7963099
reg_loss [0.482083, 0.17135978, 0.23703994]
loss: 1.6867926
base_loss 0.8858304
reg_loss [0.48014906, 0.17114899, 0.23761776]
loss: 1.7747462
base_loss 1.9098201
reg_loss [0.47813696, 0.17108822, 0.23740089]
loss: 2.7964463
base_loss 0.5266107
reg_loss [0.47668356, 0.17072758, 0.2383569]
loss: 1.4123788
base_loss 0.30727872
reg_loss [0.47534603, 0.17075942, 0.23805667]
loss: 1.1914408
base_loss 1.1406639
reg_loss [0.4750632, 0.170541, 0.23850381]
loss: 2.024772
base_loss 0.9016169
reg_lo

reg_loss [0.47607887, 0.1704638, 0.23735048]
loss: 1.3768847
base_loss 0.86482847
reg_loss [0.47515887, 0.17077686, 0.23789787]
loss: 1.748662
base_loss 0.9956225
reg_loss [0.47653314, 0.17068973, 0.23806377]
loss: 1.8809092
base_loss 2.8265216
reg_loss [0.47539833, 0.17078386, 0.23699622]
loss: 3.7096999
base_loss 1.9050288
reg_loss [0.47594696, 0.17068474, 0.23634389]
loss: 2.7880044
base_loss 0.5384952
reg_loss [0.47783825, 0.17115928, 0.23733976]
loss: 1.4248325
base_loss 1.0555884
reg_loss [0.47863457, 0.17076239, 0.23779613]
loss: 1.9427814
base_loss 0.48078722
reg_loss [0.47766837, 0.17099455, 0.23778248]
loss: 1.3672326
base_loss 0.67855006
reg_loss [0.47782904, 0.17080095, 0.23743314]
loss: 1.5646131
base_loss 0.72030663
reg_loss [0.48028097, 0.17065677, 0.23713218]
loss: 1.6083766
base_loss 0.4189631
reg_loss [0.47971362, 0.17008731, 0.23669486]
loss: 1.3054589
base_loss 0.47892362
reg_loss [0.4782326, 0.17057654, 0.23666352]
loss: 1.3643963
base_loss 1.3369749
reg_loss [0.47

base_loss 0.45737314
reg_loss [0.47055387, 0.1704833, 0.23701753]
loss: 1.3354279
base_loss 0.18878971
reg_loss [0.47842792, 0.17194661, 0.23762478]
loss: 1.076789
base_loss 1.1438088
reg_loss [0.47817978, 0.17200199, 0.23748896]
loss: 2.0314796
base_loss 0.94760346
reg_loss [0.47797915, 0.17108497, 0.23739257]
loss: 1.8340602
base_loss 0.37763193
reg_loss [0.47709486, 0.17076147, 0.2372703]
loss: 1.2627585
base_loss 0.62621874
reg_loss [0.47608688, 0.17113206, 0.23717338]
loss: 1.510611
base_loss 0.9366688
reg_loss [0.4750434, 0.17207088, 0.23642796]
loss: 1.8202109
base_loss 0.70492667
reg_loss [0.47494742, 0.1721682, 0.23666942]
loss: 1.5887116
base_loss 0.672549
reg_loss [0.4770478, 0.17189579, 0.23670349]
loss: 1.5581961
base_loss 0.5282288
reg_loss [0.47779936, 0.17183791, 0.23687696]
loss: 1.4147431
base_loss 0.75689006
reg_loss [0.4782244, 0.17139137, 0.23783132]
loss: 1.6443372
base_loss 0.15918177
reg_loss [0.47833908, 0.1717596, 0.23792194]
loss: 1.0472023
base_loss 0.827707

base_loss 0.49514744
reg_loss [0.47418064, 0.17202917, 0.2366316]
loss: 1.3779888
base_loss 0.45350114
reg_loss [0.47289547, 0.17226058, 0.23652355]
loss: 1.3351806
base_loss 1.0329258
reg_loss [0.4746884, 0.17240092, 0.23640193]
loss: 1.9164171
base_loss 0.6577821
reg_loss [0.47377333, 0.17231497, 0.2362319]
loss: 1.5401024
base_loss 0.55944765
reg_loss [0.47308594, 0.1723917, 0.23638584]
loss: 1.4413111
base_loss 0.3866287
reg_loss [0.47221625, 0.17256153, 0.23617901]
loss: 1.2675854
base_loss 0.69619334
reg_loss [0.4730038, 0.17245004, 0.2364601]
loss: 1.5781072
base_loss 0.8054316
reg_loss [0.47452873, 0.1727639, 0.2368701]
loss: 1.6895944
base_loss 0.3570814
reg_loss [0.47365835, 0.17252153, 0.23679714]
loss: 1.2400584
base_loss 0.25320867
reg_loss [0.47292173, 0.17262073, 0.23597966]
loss: 1.1347307
base_loss 0.30148062
reg_loss [0.47735766, 0.1727673, 0.23627223]
loss: 1.1878778
base_loss 0.74973416
reg_loss [0.4774437, 0.17266582, 0.23628184]
loss: 1.6361256
base_loss 0.5363146

base_loss 0.7339464
reg_loss [0.48998034, 0.17379506, 0.23727548]
loss: 1.6349974
base_loss 0.56446505
reg_loss [0.48972452, 0.1739887, 0.23699574]
loss: 1.465174
base_loss 0.3088671
reg_loss [0.48824713, 0.17379096, 0.23661089]
loss: 1.2075161
base_loss 0.6415365
reg_loss [0.48690546, 0.17402507, 0.23685722]
loss: 1.5393242
base_loss 0.51657367
reg_loss [0.48736662, 0.17399259, 0.23658365]
loss: 1.4145167
base_loss 0.22068891
reg_loss [0.48625335, 0.17403828, 0.23638351]
loss: 1.117364
base_loss 0.169876
reg_loss [0.49807358, 0.17472486, 0.23698045]
loss: 1.0796549
base_loss 0.6625793
reg_loss [0.49824747, 0.1746054, 0.23668373]
loss: 1.5721159
base_loss 0.7617667
reg_loss [0.4965152, 0.17401642, 0.23763986]
loss: 1.6699383
base_loss 0.77444935
reg_loss [0.4953415, 0.17399536, 0.23757985]
loss: 1.6813661
base_loss 0.24007688
reg_loss [0.49393812, 0.17379165, 0.23799628]
loss: 1.145803
base_loss 0.3188558
reg_loss [0.49325126, 0.17399257, 0.23798656]
loss: 1.2240863
base_loss 0.7927369

base_loss 0.39569303
reg_loss [0.48473036, 0.17353389, 0.24250634]
loss: 1.2964637
base_loss 0.1496839
reg_loss [0.48353598, 0.17388947, 0.24240382]
loss: 1.0495131
base_loss 2.4571095
reg_loss [0.48678467, 0.17429817, 0.24292313]
loss: 3.3611152
base_loss 0.315062
reg_loss [0.4857745, 0.17392577, 0.24360879]
loss: 1.218371
base_loss 0.35166466
reg_loss [0.48566458, 0.173968, 0.24353302]
loss: 1.2548302
base_loss 0.22287828
reg_loss [0.48364216, 0.17409503, 0.24325426]
loss: 1.1238698
base_loss 0.6718694
reg_loss [0.4838055, 0.17406681, 0.2426328]
loss: 1.5723746
base_loss 0.5125648
reg_loss [0.48147696, 0.17410405, 0.24285933]
loss: 1.4110051
base_loss 0.4975714
reg_loss [0.48319176, 0.17407636, 0.24286927]
loss: 1.3977088
base_loss 0.61580026
reg_loss [0.48250243, 0.17400274, 0.24210948]
loss: 1.514415
base_loss 0.44636
reg_loss [0.48209015, 0.17385845, 0.24250112]
loss: 1.3448097
base_loss 0.5426146
reg_loss [0.48150817, 0.17413044, 0.24213992]
loss: 1.4403931
base_loss 0.5311397
re

base_loss 0.21811637
reg_loss [0.4830523, 0.17402515, 0.24105394]
loss: 1.1162478
base_loss 0.29606515
reg_loss [0.4828702, 0.17384258, 0.24082819]
loss: 1.193606
base_loss 0.40191
reg_loss [0.4820579, 0.17395721, 0.24099596]
loss: 1.2989211
base_loss 0.25242227
reg_loss [0.48172924, 0.1737593, 0.24118015]
loss: 1.1490909
base_loss 0.8120518
reg_loss [0.48324552, 0.17394622, 0.24121033]
loss: 1.7104539
base_loss 0.95297676
reg_loss [0.4870837, 0.17380355, 0.240929]
loss: 1.8547931
base_loss 1.2647823
reg_loss [0.48980376, 0.17394373, 0.23982076]
loss: 2.1683507
base_loss 0.5191851
reg_loss [0.4912993, 0.17362827, 0.23977512]
loss: 1.423888
base_loss 0.17888296
reg_loss [0.4893731, 0.17406283, 0.2396607]
loss: 1.0819796
base_loss 1.0262674
reg_loss [0.4888314, 0.17387874, 0.23978813]
loss: 1.9287658
base_loss 0.6602267
reg_loss [0.48698977, 0.17413454, 0.23988932]
loss: 1.5612402
base_loss 0.6242015
reg_loss [0.4876568, 0.17429325, 0.24066554]
loss: 1.5268171
base_loss 0.971838
reg_loss

reg_loss [0.49453756, 0.17402251, 0.23929891]
loss: 1.7579411
base_loss 1.8343937
reg_loss [0.49405965, 0.17361766, 0.23903686]
loss: 2.7411077
base_loss 0.7766999
reg_loss [0.49726874, 0.17391784, 0.23944347]
loss: 1.68733
base_loss 0.46922004
reg_loss [0.49817938, 0.17377193, 0.23998155]
loss: 1.3811529
base_loss 0.055185404
reg_loss [0.49721378, 0.17379089, 0.23986517]
loss: 0.9660552
base_loss 1.1478801
reg_loss [0.4973153, 0.1736449, 0.24051046]
loss: 2.0593507
base_loss 0.7808126
reg_loss [0.49522647, 0.17352687, 0.23901552]
loss: 1.6885815
base_loss 0.2634021
reg_loss [0.4955958, 0.173423, 0.23935665]
loss: 1.1717776
base_loss 0.22788768
reg_loss [0.49620718, 0.17372948, 0.23932235]
loss: 1.1371467
base_loss 0.7501948
reg_loss [0.4957005, 0.1734419, 0.2388413]
loss: 1.6581784
base_loss 0.20352945
reg_loss [0.49444416, 0.17342949, 0.23866576]
loss: 1.1100688
base_loss 0.2855634
reg_loss [0.49379554, 0.17335029, 0.23883316]
loss: 1.1915424
base_loss 0.13296044
reg_loss [0.49225864

base_loss 0.20915386
reg_loss [0.48869693, 0.17336185, 0.23516539]
loss: 1.1063781
base_loss 0.6840622
reg_loss [0.49337265, 0.17325863, 0.23552279]
loss: 1.5862162
base_loss 0.7587205
reg_loss [0.49351925, 0.17281188, 0.2357579]
loss: 1.6608095
base_loss 0.777146
reg_loss [0.4914493, 0.17285779, 0.23587587]
loss: 1.6773288
base_loss 0.43726963
reg_loss [0.4922964, 0.17296967, 0.23577358]
loss: 1.3383093
base_loss 0.55151004
reg_loss [0.49185607, 0.17313257, 0.23638773]
loss: 1.4528863
base_loss 0.40667903
reg_loss [0.4921785, 0.17291313, 0.23637727]
loss: 1.3081479
base_loss 0.32570726
reg_loss [0.49119583, 0.17306949, 0.23625721]
loss: 1.2262298
base_loss 0.5258169
reg_loss [0.49291646, 0.17331262, 0.23657401]
loss: 1.4286201
base_loss 0.27874812
reg_loss [0.49205557, 0.17305891, 0.23632444]
loss: 1.1801871
base_loss 0.17601873
reg_loss [0.4920305, 0.17299014, 0.23600085]
loss: 1.0770402
base_loss 0.87664557
reg_loss [0.49052554, 0.172922, 0.23663257]
loss: 1.7767256
base_loss 0.2269

reg_loss [0.48751292, 0.17296222, 0.23697698]
loss: 1.0861591
base_loss 0.5352869
reg_loss [0.48907012, 0.17289756, 0.23699768]
loss: 1.4342524
base_loss 0.56908584
reg_loss [0.48884913, 0.1731261, 0.23690866]
loss: 1.4679698
base_loss 1.735343
reg_loss [0.4901848, 0.17310709, 0.23743664]
loss: 2.6360714
base_loss 0.46734202
reg_loss [0.49039024, 0.17305602, 0.23747115]
loss: 1.3682594
base_loss 0.5377672
reg_loss [0.49124458, 0.17349117, 0.23887216]
loss: 1.441375
base_loss 0.31259778
reg_loss [0.48958716, 0.17317948, 0.23858187]
loss: 1.2139463
base_loss 0.33021817
reg_loss [0.48876682, 0.17303829, 0.2385176]
loss: 1.2305409
base_loss 2.0997622
reg_loss [0.49058837, 0.17294478, 0.23766868]
loss: 3.0009642
base_loss 0.14113559
reg_loss [0.49421787, 0.17365295, 0.23803087]
loss: 1.0470372
base_loss 0.3842365
reg_loss [0.49194902, 0.17371364, 0.23769817]
loss: 1.2875974
base_loss 0.5636182
reg_loss [0.49105787, 0.17363241, 0.23728]
loss: 1.4655885
base_loss 0.3682497
reg_loss [0.4900547

loss: 3.6490564
base_loss 0.46376923
reg_loss [0.5019615, 0.17251083, 0.23982956]
loss: 1.3780712
base_loss 0.36229008
reg_loss [0.5012974, 0.17231795, 0.24010307]
loss: 1.2760086
base_loss 1.3061339
reg_loss [0.50014, 0.1722526, 0.23971044]
loss: 2.218237
base_loss 0.22656497
reg_loss [0.5007966, 0.17224018, 0.2403465]
loss: 1.1399482
base_loss 0.21855578
reg_loss [0.50006896, 0.17230327, 0.24002916]
loss: 1.1309571
base_loss 0.13783643
reg_loss [0.499213, 0.172234, 0.23980485]
loss: 1.0490882
base_loss 0.2318865
reg_loss [0.49809608, 0.17244315, 0.2395637]
loss: 1.1419895
base_loss 0.55886257
reg_loss [0.4985714, 0.1722718, 0.24017143]
loss: 1.4698772
base_loss 0.21131417
reg_loss [0.49910808, 0.17254199, 0.23979934]
loss: 1.1227636
base_loss 0.8560474
reg_loss [0.49843353, 0.17239186, 0.23995478]
loss: 1.7668277
base_loss 0.52578807
reg_loss [0.49908808, 0.1726924, 0.2400568]
loss: 1.4376253
base_loss 0.42101747
reg_loss [0.49828944, 0.17255907, 0.24023347]
loss: 1.3320993
base_loss

base_loss 1.0183597
reg_loss [0.4885434, 0.17218947, 0.24091122]
loss: 1.9200038
base_loss 0.34559533
reg_loss [0.49101353, 0.1724473, 0.24097794]
loss: 1.2500341
base_loss 0.31923017
reg_loss [0.4913216, 0.17234509, 0.24084878]
loss: 1.2237456
base_loss 0.59341896
reg_loss [0.49064276, 0.17253977, 0.24151479]
loss: 1.4981164
base_loss 0.758686
reg_loss [0.49145707, 0.17239831, 0.24043222]
loss: 1.6629736
base_loss 0.28743428
reg_loss [0.4892083, 0.1723657, 0.23983093]
loss: 1.1888392
base_loss 0.38727313
reg_loss [0.4898682, 0.17265005, 0.2394235]
loss: 1.289215
base_loss 0.34501237
reg_loss [0.48896047, 0.1726879, 0.23995492]
loss: 1.2466156
base_loss 0.65153193
reg_loss [0.49450824, 0.1733213, 0.23979303]
loss: 1.5591545
base_loss 0.5483096
reg_loss [0.494581, 0.17271648, 0.2400932]
loss: 1.4557004
base_loss 0.94996244
reg_loss [0.49542108, 0.17212635, 0.23962492]
loss: 1.8571348
base_loss 0.13594258
reg_loss [0.49394587, 0.17232005, 0.23937991]
loss: 1.0415884
base_loss 0.4434015
r

base_loss 0.35161862
reg_loss [0.49923238, 0.17140174, 0.23658761]
loss: 1.2588404
base_loss 0.038037993
reg_loss [0.49741608, 0.1713086, 0.23643857]
loss: 0.94320124
base_loss 0.3339556
reg_loss [0.49735308, 0.17148285, 0.23627973]
loss: 1.2390712
base_loss 0.2753003
reg_loss [0.49534145, 0.17146382, 0.23688371]
loss: 1.1789893
base_loss 0.14323123
reg_loss [0.49508402, 0.17125712, 0.23668075]
loss: 1.0462532
base_loss 0.6120243
reg_loss [0.49412462, 0.1717911, 0.2374782]
loss: 1.5154183
base_loss 0.56329286
reg_loss [0.49872875, 0.17169744, 0.2371612]
loss: 1.4708803
base_loss 1.4824795
reg_loss [0.49955225, 0.17170635, 0.23817044]
loss: 2.3919084
base_loss 0.6323656
reg_loss [0.5007348, 0.17144272, 0.23743175]
loss: 1.5419749
base_loss 0.5714136
reg_loss [0.4987615, 0.17152318, 0.23772243]
loss: 1.4794207
base_loss 0.23383892
reg_loss [0.49956614, 0.1715483, 0.23734218]
loss: 1.1422956
base_loss 0.16711321
reg_loss [0.49824676, 0.17159083, 0.2370914]
loss: 1.0740422
base_loss 0.1906

reg_loss [0.50441366, 0.17163175, 0.23630989]
loss: 1.1328657
base_loss 1.8531809
reg_loss [0.50323147, 0.17130145, 0.23534346]
loss: 2.7630572
base_loss 0.38639978
reg_loss [0.5033135, 0.1709125, 0.23550245]
loss: 1.2961283
base_loss 0.74018085
reg_loss [0.5038599, 0.17105399, 0.2359054]
loss: 1.6510001
base_loss 0.07032819
reg_loss [0.5028227, 0.17081909, 0.23575155]
loss: 0.97972155
base_loss 0.40520442
reg_loss [0.504029, 0.1710415, 0.2347911]
loss: 1.3150661
base_loss 0.45587993
reg_loss [0.50328404, 0.1708601, 0.23492919]
loss: 1.3649533
base_loss 1.3378096
reg_loss [0.5016655, 0.17117718, 0.23449296]
loss: 2.2451453
base_loss 0.36370468
reg_loss [0.5001539, 0.17096226, 0.2352447]
loss: 1.2700655
base_loss 0.26514655
reg_loss [0.49923378, 0.1709559, 0.2353355]
loss: 1.1706717
base_loss 0.36186078
reg_loss [0.4995789, 0.17093444, 0.23576646]
loss: 1.2681406
base_loss 2.3080063
reg_loss [0.4981593, 0.17094646, 0.23517738]
loss: 3.2122893
base_loss 0.5590827
reg_loss [0.49770817, 0.

base_loss 0.93601537
reg_loss [0.48832855, 0.16969119, 0.23516683]
loss: 1.8292019
base_loss 0.43547022
reg_loss [0.48726743, 0.16978945, 0.23521775]
loss: 1.327745
base_loss 0.18749261
reg_loss [0.48701048, 0.16972157, 0.23528354]
loss: 1.0795082
base_loss 0.5804987
reg_loss [0.48571506, 0.16979171, 0.23526467]
loss: 1.4712701
base_loss 0.31388384
reg_loss [0.48567286, 0.1695765, 0.23506057]
loss: 1.2041938
base_loss 0.9381802
reg_loss [0.48737997, 0.1699027, 0.23588549]
loss: 1.8313484
base_loss 0.25327972
reg_loss [0.48648575, 0.16973053, 0.23576789]
loss: 1.1452639
base_loss 0.5688
reg_loss [0.4864699, 0.16983294, 0.23676573]
loss: 1.4618685
base_loss 0.51245606
reg_loss [0.48603582, 0.16981117, 0.23720075]
loss: 1.4055037
base_loss 1.212532
reg_loss [0.49378103, 0.16998921, 0.23665231]
loss: 2.1129546
base_loss 0.16827065
reg_loss [0.49365228, 0.17003722, 0.23675299]
loss: 1.0687132
base_loss 0.41092834
reg_loss [0.49236473, 0.17020164, 0.23643611]
loss: 1.3099308
base_loss 0.3054

reg_loss [0.49813065, 0.17243439, 0.23653999]
loss: 1.4293042
base_loss 0.5672658
reg_loss [0.4968273, 0.1721175, 0.23631123]
loss: 1.4725218
base_loss 0.4797114
reg_loss [0.49723056, 0.17186485, 0.23669897]
loss: 1.3855058
base_loss 0.40381864
reg_loss [0.49870518, 0.17229559, 0.23584424]
loss: 1.3106637
base_loss 0.39349395
reg_loss [0.4978714, 0.17200762, 0.23578458]
loss: 1.2991575
base_loss 0.6306027
reg_loss [0.49819323, 0.17215812, 0.23582737]
loss: 1.5367814
base_loss 0.5616108
reg_loss [0.4983387, 0.17179225, 0.23561159]
loss: 1.4673533
base_loss 0.6977595
reg_loss [0.49561942, 0.17212567, 0.23607582]
loss: 1.6015804
base_loss 0.5158922
reg_loss [0.4957344, 0.17202243, 0.23612583]
loss: 1.4197749
base_loss 1.4081801
reg_loss [0.4950467, 0.17221157, 0.2358395]
loss: 2.311278
base_loss 0.8113256
reg_loss [0.49526113, 0.1713034, 0.23610397]
loss: 1.7139941
base_loss 0.42638344
reg_loss [0.49598616, 0.17145473, 0.2368177]
loss: 1.3306421
base_loss 0.21994594
reg_loss [0.4958687, 0

reg_loss [0.51138884, 0.17259035, 0.23835953]
loss: 1.2252734
base_loss 0.11920613
reg_loss [0.51020646, 0.17256784, 0.2383134]
loss: 1.0402938
base_loss 0.45634323
reg_loss [0.50986886, 0.17240213, 0.23793378]
loss: 1.3765479
base_loss 0.2586427
reg_loss [0.5090141, 0.17275253, 0.23774648]
loss: 1.1781558
base_loss 0.15904257
reg_loss [0.5081099, 0.1725992, 0.23773545]
loss: 1.0774872
base_loss 0.6180022
reg_loss [0.51292175, 0.17277424, 0.23789135]
loss: 1.5415895
base_loss 0.35544577
reg_loss [0.51332134, 0.17250912, 0.23879607]
loss: 1.2800723
base_loss 0.6562309
reg_loss [0.51243216, 0.1725479, 0.23874763]
loss: 1.5799586
base_loss 0.08746408
reg_loss [0.5113573, 0.17229447, 0.23871423]
loss: 1.0098301
base_loss 0.117044486
reg_loss [0.50994396, 0.17243841, 0.23869304]
loss: 1.0381199
base_loss 0.4209465
reg_loss [0.51123714, 0.17238156, 0.23887515]
loss: 1.3434403
base_loss 0.1795651
reg_loss [0.51022124, 0.17235585, 0.23851506]
loss: 1.1006572
base_loss 0.35682946
reg_loss [0.51

base_loss 0.32045954
reg_loss [0.51071495, 0.1719094, 0.23732528]
loss: 1.2404093
base_loss 0.31714943
reg_loss [0.50983316, 0.1721577, 0.23769559]
loss: 1.236836
base_loss 1.090209
reg_loss [0.51246345, 0.17206904, 0.23774067]
loss: 2.0124822
base_loss 0.5405631
reg_loss [0.5135903, 0.17230496, 0.23790804]
loss: 1.4643664
base_loss 0.31123105
reg_loss [0.5130218, 0.17202894, 0.23790707]
loss: 1.2341889
base_loss 0.20143445
reg_loss [0.512993, 0.17227398, 0.23811199]
loss: 1.1248134
base_loss 0.7426852
reg_loss [0.5131516, 0.17201488, 0.23867443]
loss: 1.666526
base_loss 0.91578895
reg_loss [0.5118152, 0.17203566, 0.23783652]
loss: 1.8374764
base_loss 0.6100069
reg_loss [0.5137908, 0.17212196, 0.23808578]
loss: 1.5340054
base_loss 0.77996874
reg_loss [0.512018, 0.17210078, 0.23765667]
loss: 1.7017442
base_loss 0.2610228
reg_loss [0.511758, 0.17178571, 0.23773745]
loss: 1.182304
base_loss 0.66342545
reg_loss [0.5129702, 0.17201738, 0.23758991]
loss: 1.586003
base_loss 0.1536202
reg_loss

base_loss 0.5581135
reg_loss [0.5020493, 0.17161171, 0.23518682]
loss: 1.4669613
base_loss 0.27625567
reg_loss [0.50149953, 0.17129213, 0.23514223]
loss: 1.1841896
base_loss 0.39803508
reg_loss [0.5004569, 0.17136836, 0.2356233]
loss: 1.3054836
base_loss 0.70866287
reg_loss [0.49987677, 0.1712129, 0.23545872]
loss: 1.6152112
base_loss 0.4534655
reg_loss [0.49961874, 0.17131118, 0.23528829]
loss: 1.3596836
base_loss 0.31941527
reg_loss [0.5011385, 0.17155881, 0.23601417]
loss: 1.2281268
base_loss 0.1734421
reg_loss [0.49979705, 0.17154415, 0.23642236]
loss: 1.0812056
base_loss 0.13708435
reg_loss [0.49950367, 0.17159489, 0.23634113]
loss: 1.0445241
base_loss 0.48787028
reg_loss [0.49774858, 0.17176498, 0.23647642]
loss: 1.3938602
base_loss 0.27458188
reg_loss [0.49941573, 0.17171116, 0.23622268]
loss: 1.1819315
base_loss 0.08686039
reg_loss [0.497771, 0.17184982, 0.23611662]
loss: 0.9925978
base_loss 0.47596696
reg_loss [0.49974245, 0.17179677, 0.23639973]
loss: 1.383906
base_loss 0.087

base_loss 0.82217586
reg_loss [0.5023893, 0.16997105, 0.23429102]
loss: 1.7288272
base_loss 0.2296412
reg_loss [0.50095683, 0.16991816, 0.23459785]
loss: 1.1351141
base_loss 0.16727832
reg_loss [0.5015597, 0.17066273, 0.2343811]
loss: 1.0738819
base_loss 0.27127042
reg_loss [0.500167, 0.17061798, 0.23440419]
loss: 1.1764596
base_loss 0.6967864
reg_loss [0.49992567, 0.17037888, 0.2344579]
loss: 1.6015488
base_loss 0.6066123
reg_loss [0.5049578, 0.17038581, 0.23382364]
loss: 1.5157796
base_loss 0.8818803
reg_loss [0.504168, 0.17046788, 0.23386496]
loss: 1.7903812
base_loss 0.06574939
reg_loss [0.50230116, 0.17019878, 0.23367527]
loss: 0.97192466
base_loss 0.17881532
reg_loss [0.50160533, 0.17002895, 0.2338945]
loss: 1.084344
base_loss 0.32777295
reg_loss [0.5155174, 0.17066082, 0.2337376]
loss: 1.2476888
base_loss 0.2708893
reg_loss [0.5133552, 0.16996403, 0.23361596]
loss: 1.1878245
base_loss 0.19633564
reg_loss [0.5124029, 0.17002167, 0.23344336]
loss: 1.1122036
base_loss 0.8595741
reg

reg_loss [0.51079273, 0.17040555, 0.23537666]
loss: 1.7059765
base_loss 0.33604014
reg_loss [0.5090713, 0.1706575, 0.2352032]
loss: 1.250972
base_loss 0.07719086
reg_loss [0.50802785, 0.170787, 0.23489122]
loss: 0.990897
base_loss 0.6418747
reg_loss [0.510286, 0.17105702, 0.23459856]
loss: 1.5578161
base_loss 0.6135654
reg_loss [0.5108104, 0.1706958, 0.23441589]
loss: 1.5294875
base_loss 0.70837647
reg_loss [0.5099424, 0.1707064, 0.23478727]
loss: 1.6238126
base_loss 0.9253002
reg_loss [0.50799125, 0.17040733, 0.2342564]
loss: 1.8379551
base_loss 0.6294074
reg_loss [0.5080033, 0.17038557, 0.23424196]
loss: 1.5420382
base_loss 0.24612305
reg_loss [0.51281494, 0.17040668, 0.23368481]
loss: 1.1630294
base_loss 0.4018684
reg_loss [0.5110311, 0.17031147, 0.23371087]
loss: 1.3169218
base_loss 0.81337154
reg_loss [0.50981224, 0.17027034, 0.23425256]
loss: 1.7277067
base_loss 1.065723
reg_loss [0.50943416, 0.17008069, 0.23359494]
loss: 1.9788327
base_loss 0.5306208
reg_loss [0.5097152, 0.17006

reg_loss [0.51418453, 0.17013068, 0.23447792]
loss: 1.8436772
base_loss 0.4728225
reg_loss [0.5145468, 0.16991822, 0.23479742]
loss: 1.3920848
base_loss 0.9811796
reg_loss [0.5138153, 0.17003258, 0.23474309]
loss: 1.8997706
base_loss 0.46837813
reg_loss [0.5140788, 0.16978477, 0.23494796]
loss: 1.3871896
base_loss 0.11125405
reg_loss [0.51445687, 0.16996281, 0.23470451]
loss: 1.0303782
base_loss 1.1091939
reg_loss [0.51351225, 0.16987263, 0.23466119]
loss: 2.02724
base_loss 0.49937212
reg_loss [0.5136543, 0.16990532, 0.23462184]
loss: 1.4175537
base_loss 0.46032554
reg_loss [0.51312673, 0.16973054, 0.234309]
loss: 1.3774917
base_loss 0.10534173
reg_loss [0.51185614, 0.16961032, 0.23415224]
loss: 1.0209605
base_loss 1.0395199
reg_loss [0.51581377, 0.16973315, 0.23419411]
loss: 1.959261
base_loss 0.35063055
reg_loss [0.5143759, 0.16965151, 0.23426574]
loss: 1.2689238
base_loss 0.2668711
reg_loss [0.5161108, 0.16963708, 0.23391841]
loss: 1.1865374
base_loss 0.6857605
reg_loss [0.5148337, 

reg_loss [0.51348704, 0.16834565, 0.23466979]
loss: 0.9941328
base_loss 1.0951351
reg_loss [0.51489276, 0.16816494, 0.2340872]
loss: 2.01228
base_loss 0.42275745
reg_loss [0.5141528, 0.16852598, 0.23420016]
loss: 1.3396363
base_loss 0.14066933
reg_loss [0.51299703, 0.16823226, 0.23397382]
loss: 1.0558724
base_loss 0.6559032
reg_loss [0.51219285, 0.16840963, 0.23412216]
loss: 1.5706278
base_loss 0.12516496
reg_loss [0.5144101, 0.16819341, 0.23378937]
loss: 1.0415578
base_loss 0.71203613
reg_loss [0.513287, 0.16841075, 0.23414119]
loss: 1.6278752
base_loss 0.47904763
reg_loss [0.51304495, 0.16858438, 0.23416795]
loss: 1.3948449
base_loss 0.21913922
reg_loss [0.51270354, 0.1685874, 0.23416877]
loss: 1.134599
base_loss 0.5554892
reg_loss [0.51248324, 0.16878057, 0.23400848]
loss: 1.4707614
base_loss 0.48612088
reg_loss [0.5113976, 0.16859055, 0.23411639]
loss: 1.4002255
base_loss 0.6097231
reg_loss [0.51230043, 0.16835614, 0.23464148]
loss: 1.5250212
base_loss 0.2080951
reg_loss [0.5128841

base_loss 0.50813544
reg_loss [0.5182717, 0.16770786, 0.23378795]
loss: 1.4279029
base_loss 0.42101505
reg_loss [0.5192844, 0.16787162, 0.23400113]
loss: 1.3421723
base_loss 0.65648854
reg_loss [0.5188291, 0.16778076, 0.23443654]
loss: 1.5775349
base_loss 0.71578896
reg_loss [0.51824456, 0.1679161, 0.2342684]
loss: 1.6362181
base_loss 1.9317979
reg_loss [0.5170143, 0.16775832, 0.23396064]
loss: 2.850531
base_loss 0.252844
reg_loss [0.51943916, 0.16763635, 0.23362468]
loss: 1.1735442
base_loss 0.110254556
reg_loss [0.51858026, 0.16729349, 0.23356673]
loss: 1.029695
base_loss 0.40955946
reg_loss [0.5178202, 0.16763519, 0.23399673]
loss: 1.3290116
base_loss 0.34217185
reg_loss [0.5178384, 0.16752046, 0.23344272]
loss: 1.2609735
base_loss 0.057753656
reg_loss [0.5161274, 0.16760299, 0.23331687]
loss: 0.97480094
base_loss 0.2469868
reg_loss [0.51618385, 0.16751157, 0.23362556]
loss: 1.1643078
base_loss 0.63054234
reg_loss [0.5151109, 0.16772385, 0.23360027]
loss: 1.5469774
base_loss 0.24896

reg_loss [0.51477486, 0.16694634, 0.23502082]
loss: 2.1090195
base_loss 0.39074373
reg_loss [0.5155661, 0.16709217, 0.2354265]
loss: 1.3088286
base_loss 0.4215581
reg_loss [0.51705134, 0.16758512, 0.23503931]
loss: 1.341234
base_loss 1.0929835
reg_loss [0.5182165, 0.16738722, 0.23511319]
loss: 2.0137005
base_loss 0.156153
reg_loss [0.5177807, 0.16688131, 0.23478536]
loss: 1.0756004
base_loss 0.5881361
reg_loss [0.5133299, 0.16665445, 0.23476121]
loss: 1.5028816
base_loss 0.2510268
reg_loss [0.51345575, 0.16670392, 0.23503536]
loss: 1.1662219
base_loss 0.3797362
reg_loss [0.51177514, 0.16675638, 0.23512565]
loss: 1.2933934
base_loss 0.46136415
reg_loss [0.51185507, 0.16668144, 0.23541065]
loss: 1.3753114
base_loss 1.0609695
reg_loss [0.51100665, 0.16700748, 0.2351288]
loss: 1.9741124
base_loss 0.6580682
reg_loss [0.51318586, 0.16667828, 0.23467351]
loss: 1.5726058
base_loss 0.4376668
reg_loss [0.51247245, 0.16678712, 0.23495044]
loss: 1.3518769
base_loss 0.53864825
reg_loss [0.5135507, 

base_loss 0.13822812
reg_loss [0.5009289, 0.16660921, 0.23418584]
loss: 1.039952
base_loss 0.20488386
reg_loss [0.5020807, 0.16654822, 0.23429468]
loss: 1.1078074
base_loss 0.54559964
reg_loss [0.5032601, 0.16711639, 0.23464067]
loss: 1.4506168
base_loss 0.48621428
reg_loss [0.50140476, 0.16672558, 0.23516297]
loss: 1.3895075
base_loss 0.32802245
reg_loss [0.5007965, 0.1669429, 0.2350514]
loss: 1.2308133
base_loss 0.24273553
reg_loss [0.501585, 0.16704777, 0.23510595]
loss: 1.1464742
base_loss 0.089001894
reg_loss [0.49991575, 0.1670726, 0.23502344]
loss: 0.99101365
base_loss 0.23412564
reg_loss [0.50051874, 0.16692086, 0.235524]
loss: 1.1370893
base_loss 0.2550875
reg_loss [0.4989389, 0.16700904, 0.23535188]
loss: 1.1563873
base_loss 0.45588502
reg_loss [0.50321835, 0.16727024, 0.23510955]
loss: 1.3614831
base_loss 0.5156016
reg_loss [0.502324, 0.16723183, 0.23525886]
loss: 1.4204161
base_loss 0.5717395
reg_loss [0.5016035, 0.16704643, 0.23479329]
loss: 1.4751828
base_loss 0.23758551


base_loss 0.050011665
reg_loss [0.50922513, 0.16774437, 0.23412167]
loss: 0.96110284
base_loss 0.57759225
reg_loss [0.50982416, 0.16755188, 0.23484074]
loss: 1.489809
base_loss 0.19414058
reg_loss [0.5074449, 0.16780153, 0.23468156]
loss: 1.1040686
base_loss 0.14896753
reg_loss [0.50676674, 0.1676448, 0.23462033]
loss: 1.0579994
base_loss 1.0851738
reg_loss [0.50533414, 0.16759683, 0.23491965]
loss: 1.9930245
base_loss 0.113181666
reg_loss [0.50665486, 0.16751693, 0.23458993]
loss: 1.0219433
base_loss 0.1801562
reg_loss [0.50543183, 0.16776928, 0.23467107]
loss: 1.0880284
base_loss 0.46105826
reg_loss [0.50801504, 0.16756462, 0.23523101]
loss: 1.371869
base_loss 1.0825015
reg_loss [0.5057318, 0.16780166, 0.23550391]
loss: 1.9915389
base_loss 0.2653829
reg_loss [0.50599474, 0.1673529, 0.23549406]
loss: 1.1742246
base_loss 0.39108092
reg_loss [0.50656015, 0.16759492, 0.23570739]
loss: 1.3009434
base_loss 1.2404711
reg_loss [0.5067138, 0.16724329, 0.23600566]
loss: 2.150434
base_loss 0.17

loss: 1.0699363
base_loss 0.4043027
reg_loss [0.5039637, 0.16791959, 0.23670477]
loss: 1.3128908
base_loss 0.25942343
reg_loss [0.50449955, 0.16719694, 0.236946]
loss: 1.1680659
base_loss 0.26866713
reg_loss [0.5043787, 0.16704273, 0.23671068]
loss: 1.1767992
base_loss 0.28448188
reg_loss [0.5038854, 0.16672052, 0.23646906]
loss: 1.1915568
base_loss 0.7754294
reg_loss [0.5033293, 0.1669849, 0.23615395]
loss: 1.6818976
base_loss 0.12541161
reg_loss [0.50271237, 0.16666943, 0.23623042]
loss: 1.0310239
base_loss 0.24342063
reg_loss [0.5005274, 0.16696759, 0.23618259]
loss: 1.1470982
base_loss 0.13835749
reg_loss [0.50012803, 0.16665165, 0.23627214]
loss: 1.0414094
base_loss 0.15931289
reg_loss [0.49854463, 0.16690405, 0.23587242]
loss: 1.060634
base_loss 0.6057719
reg_loss [0.5047383, 0.167224, 0.23631988]
loss: 1.5140541
base_loss 0.11036171
reg_loss [0.50291586, 0.16697899, 0.23628883]
loss: 1.0165453
base_loss 0.4468922
reg_loss [0.50223243, 0.16679072, 0.23637901]
loss: 1.3522943
base

KeyboardInterrupt: 

In [89]:
n_epochs = 20
batch_size = 200
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,
                                                y: mnist.test.labels})
        print(epoch, "Test accuracy:", accuracy_val)
    save_path = saver.save(sess, "./my_model_final.ckpt")

0 Test accuracy: 0.7471
1 Test accuracy: 0.736
2 Test accuracy: 0.5298
3 Test accuracy: 0.4616
4 Test accuracy: 0.4692
5 Test accuracy: 0.5141
6 Test accuracy: 0.5543
7 Test accuracy: 0.584
8 Test accuracy: 0.6152
9 Test accuracy: 0.639
10 Test accuracy: 0.6619
11 Test accuracy: 0.6817
12 Test accuracy: 0.6903
13 Test accuracy: 0.709
14 Test accuracy: 0.7162
15 Test accuracy: 0.7263
16 Test accuracy: 0.7347
17 Test accuracy: 0.7469
18 Test accuracy: 0.7527
19 Test accuracy: 0.7565


### Regularization: Dropout

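What does dropout mean?

Because the network has many parameters, we do not want every neuron to be active in every training iteration. Activation functions already do something similar (an activation below the threshold is not passed on to the next layer), and dropout gives us another way to do it: at each iteration it randomly drops neurons with a given probability. A dropped neuron sits out only that one iteration; at the next iteration dropout draws a fresh random set of neurons to drop.

As a result, not all neurons take part in any given training step.

Take our figure as an example: the input and output layers each have four neurons. If dropout drops the first and third input neurons, they no longer take part in that training step, which makes the model simpler for that step, with fewer active parameters and lower complexity. Because neurons are combined at random, dropout also reduces the co-dependencies that can form between neurons. A network trained with dropout can be viewed as a collection of the sub-models that remain after dropping. Once training is finished, all neurons take part in prediction, so the many sub-models trained along the way are in effect combined into one. This helps the model generalize better.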
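The mechanism described above can be sketched in plain NumPy. This is a minimal illustration rather than TensorFlow's implementation: the helper name `dropout_forward` and the toy activations are assumptions made for this example. It uses the "inverted dropout" convention, in which the surviving units are scaled by `1 / (1 - rate)` during training so that all neurons can be used unchanged at prediction time.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    """Hypothetical helper: inverted dropout on an array of activations."""
    if not training or rate == 0.0:
        # At prediction time every neuron takes part, unchanged.
        return activations
    keep_prob = 1.0 - rate
    # Draw a fresh random mask on every call, i.e. on every training iteration.
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected activation stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
hidden = np.ones((10000, 4))  # toy activations: 10000 instances, 4 neurons

train_out = dropout_forward(hidden, rate=0.5, training=True, rng=rng)
test_out = dropout_forward(hidden, rate=0.5, training=False, rng=rng)

print("fraction dropped:", np.mean(train_out == 0.0))
print("mean activation (training):", train_out.mean())
print("mean activation (prediction):", test_out.mean())
```

With `rate=0.5`, roughly half of the entries come out as zero and the rest are doubled, so the average activation stays close to what the full network produces at prediction time.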

In [121]:
tf.reset_default_graph()

In [122]:
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

In [124]:
# Defaults to False, so dropout is only active when training=True is fed.
training = tf.placeholder_with_default(False, shape=(), name='train')

dropout_rate = 0.5  # fraction of units to drop (rate = 1 - keep_prob)
# Apply dropout to the inputs as well as to each hidden layer.
x_drop = tf.layers.dropout(X, dropout_rate, training=training)

with tf.name_scope('dnn'):
    hidden1 = tf.layers.dense(x_drop, n_hidden1, activation=tf.nn.relu, name='hidden1')
    hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training=training)

    hidden2 = tf.layers.dense(hidden1_drop, n_hidden2, activation=tf.nn.relu, name='hidden2')
    hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training=training)

    # No dropout after the output layer.
    logits = tf.layers.dense(hidden2_drop, n_outputs, name='outputs')

Instructions for updating:
Use keras.layers.dropout instead.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


In [127]:
tf.layers.dropout??

In [125]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
    
with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
    training_op = optimizer.minimize(loss)    
    
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    
init = tf.global_variables_initializer()
saver = tf.train.Saver()

In [126]:
n_epochs = 20
batch_size = 50
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={training: True, X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_model_final.ckpt")

0 Test accuracy: 0.9205
1 Test accuracy: 0.9362
2 Test accuracy: 0.9415
3 Test accuracy: 0.9508
4 Test accuracy: 0.9572
5 Test accuracy: 0.9557
6 Test accuracy: 0.9602
7 Test accuracy: 0.9602
8 Test accuracy: 0.9609
9 Test accuracy: 0.964
10 Test accuracy: 0.9633
11 Test accuracy: 0.9651
12 Test accuracy: 0.9667
13 Test accuracy: 0.9658
14 Test accuracy: 0.9658
15 Test accuracy: 0.9671
16 Test accuracy: 0.9685
17 Test accuracy: 0.9686
18 Test accuracy: 0.9672
19 Test accuracy: 0.9685


### Max-Norm Regularization

<div align=center><img src="./static/6.jpg"/></div>

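#### Defining a max_norm_regularizer() function

The `clip_by_norm()` function computes the clipped weights. We then create an assignment operation that assigns the clipped values back to the weights variable: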

In [None]:
def max_norm_regularizer(threshold, axes=1, name="max_norm",
                         collection="max_norm"):
    def max_norm(weights):
        clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)
        clip_weights = tf.assign(weights, clipped, name=name)
        tf.add_to_collection(collection, clip_weights)
        return None #  there is no regularization loss term
    return max_norm
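To make the clipping step concrete, here is a small NumPy sketch of what `tf.clip_by_norm(weights, clip_norm=threshold, axes=1)` computes, as far as I understand it: the L2 norm is taken along axis 1 (one norm per row), and any row whose norm exceeds the threshold is rescaled so that its norm equals the threshold. The function `clip_by_norm_np` and the toy matrix are assumptions made for illustration; this is not the TensorFlow code.

```python
import numpy as np

def clip_by_norm_np(weights, clip_norm, axis=1):
    """Hypothetical NumPy stand-in for clipping L2 norms along `axis`."""
    # One L2 norm per row when axis=1.
    norms = np.sqrt(np.sum(weights ** 2, axis=axis, keepdims=True))
    # Rows already within the threshold get a scale factor of exactly 1.
    scale = clip_norm / np.maximum(norms, clip_norm)
    return weights * scale

weights = np.array([[3.0, 4.0],    # norm 5.0: above the threshold, gets rescaled
                    [0.3, 0.4]])   # norm 0.5: within the threshold, left as-is
clipped = clip_by_norm_np(weights, clip_norm=1.0)

print(clipped)
print("row norms after clipping:", np.linalg.norm(clipped, axis=1))
```

The first row is scaled down to norm 1.0 while the second is left untouched, which is the projection that the `clip_weights` assignment writes back into the layer's kernel.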

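You can then call `max_norm_regularizer()` to get a max-norm regularizer with the threshold you want. When you create a hidden layer, pass this regularizer to the `kernel_regularizer` argument: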

In [None]:
n_inputs = 28 * 28
n_hidden1 = 300
n_hidden2 = 50
n_outputs = 10
learning_rate = 0.01
momentum = 0.9
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

In [None]:
max_norm_reg = max_norm_regularizer(threshold=1.0)
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu,
                              kernel_regularizer=max_norm_reg, name="hidden1")
    hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu,
                              kernel_regularizer=max_norm_reg, name="hidden2")
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")

In [None]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)
    training_op = optimizer.minimize(loss)    
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
init = tf.global_variables_initializer()
saver = tf.train.Saver()

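Note that max-norm regularization does not add a regularization loss term to the overall loss function, which is why the `max_norm()` function returns `None`.

However, you still need to run the `clip_weights` operations after each training step, so you need a handle on them. That is why `max_norm()` adds the `clip_weights` node to a collection of max-norm clipping operations. You need to fetch these clipping operations and run them after each training step: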

In [None]:
n_epochs = 20
batch_size = 50

clip_all_weights = tf.get_collection("max_norm")
with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            sess.run(clip_all_weights)
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images,     #  not shown in the book
                                            y: mnist.test.labels})    #  not shown
        print(epoch, "Test accuracy:", acc_test)                      #  not shown
    save_path = saver.save(sess, "./my_model_final.ckpt")             #  not shown