Auto pruning #2603
Conversation
There exists a bug; I am trying to fix it.
paddle/api/PaddleAPI.h
Outdated
@@ -880,6 +880,7 @@ class ParameterUpdater {
 * @param param
 */
void update(Parameter* param);
void preprocess(Parameter* param, size_t currentPass, size_t currentBatch);
Please add comments for all newly added functions in the .h file; the same applies below.
Done
for (size_t i = 0; i < para->getSize(); i++){
  std::cout << data[i] << " ";
}
*/
Please delete the commented-out code at lines 138-143.
Done
  sum_non += 1;
}
std::cout << "sum_non: " << sum_non << " " << para->getSize() << std::endl;
*/
Please delete the commented-out code at lines 169-178.
Done
@@ -139,6 +220,8 @@ static IParameterUpdaterHook *createImpl(
  auto &type = config.type();
  if (type == "pruning") {
    return new StaticPruningHook(config);
  } else if (type == "dpruning") {
Should `dpruning` be written out in full as `dynamic_pruning`, so users can understand it more easily?
Done
…efore fetched in paddle.v2.parameters.get(...)
void generateMask(Parameter *para) {
virtual void generateMask(Parameter *para, size_t nonZeroNum) {
Suggest passing `real sparsityRatio` as the second parameter.
OK, sounds good.
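The suggestion above (passing the sparsity ratio rather than a precomputed non-zero count) can be sketched as follows. This is a hedged illustration in plain C++, not Paddle's actual `Parameter`/`Vector` API; here `generateMask` works on a `std::vector<float>` and keeps the largest weights by absolute value:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: derive the non-zero count from the sparsity ratio
// inside the hook, instead of making callers precompute it.
std::vector<float> generateMask(const std::vector<float>& param,
                                float sparsityRatio) {
  const size_t n = param.size();
  // Number of weights to keep: the (1 - sparsityRatio) fraction with the
  // largest absolute values.
  const size_t nonZeroNum = static_cast<size_t>(n * (1.0f - sparsityRatio));
  std::vector<size_t> idx(n);
  for (size_t i = 0; i < n; ++i) idx[i] = i;
  // Partially order indices so the first nonZeroNum entries point at the
  // largest-magnitude weights.
  std::nth_element(idx.begin(), idx.begin() + nonZeroNum, idx.end(),
                   [&](size_t a, size_t b) {
                     return std::fabs(param[a]) > std::fabs(param[b]);
                   });
  std::vector<float> mask(n, 0.0f);
  for (size_t i = 0; i < nonZeroNum; ++i) mask[idx[i]] = 1.0f;
  return mask;
}
```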
: ParameterPruningHook() {
  this->upperBound_ = hookConfig.upper_bound();
  this->interPass_ = hookConfig.inter_pass();
  this->endPass_ = hookConfig.end_pass();
The naming of these parameters is very unintuitive; one has to guess what they mean. Suggestions:
- `upper_bound` -> `sparsity_upper_bound`.
- The `inter` in `inter_pass` presumably stands for `interval`; abbreviating it to `inter` obscures the meaning. Could it be changed to `sparsity_increasing_interval`, or something else?
- I understand the purpose of `endPass` is to compute the increment of `sparsityRatio` at each step, but setting an `endPass` parameter here is not a good fit:
  - Users already set `num_passes` once when training; setting it again here is cumbersome.
  - I also understand the outer `num_passes` cannot be reused directly, because users habitually set `num_passes` to a very large value and kill the job once training converges.

So please reconsider how the increment is configured.
`upper_bound` has been changed to `sparsity_upper_bound`, and `inter_pass` to `interval_pass`; overly long attribute names are not great for the user experience. I think `end_pass` is necessary: `num_passes` is the number of passes of the whole training run, while `end_pass` is the number of passes during which `sparsity_ratio` changes.
size_t nonZeroNum = para->getSize() * (1 - sparsityRatio);
this->generateMask(para, nonZeroNum);
std::cout << para->getName()
If the output is meant to inform the user, call the glog functions; otherwise delete it, along with the corresponding `#include <iostream>`.
Done
vec->copyFrom(*this->weightTemp_);
}

void handleBeforeFetch(Parameter *para) override {
- This should be called before each `forward` computation, right? The function name is not intuitive.
- All the added interfaces go through the API layer, controlling the training flow from the Python side. Could this instead be called from the `update` function in `ParameterUpdaterBase.h`?

55 // between startBatch() and finishBatch(), update() will be called
56 // by the trainer multiple times, each time for updating one Parameter
57 // with its gradient in PARAMETER_GRADIENT
58 void update(Parameter* para) {
59   SetDevice setDevice(para->getDeviceId());
60   para->updateHook();
61   this->updateImpl(para);
     // Call handleBeforeFetch(para) here
62 }
Done
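The reviewer's suggestion can be sketched in plain C++ (a hedged illustration in which `Param`, `updateImpl`, and `update` are simplified stand-ins for Paddle's `Parameter` and updater classes): apply the pruning mask right after the gradient update inside `update`, so no separate pre-fetch hook is needed.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for Paddle's Parameter: a dense value buffer plus a
// 0/1 pruning mask of the same size.
struct Param {
  std::vector<float> value;
  std::vector<float> mask;
};

// Plain SGD step, standing in for the updater's updateImpl.
void updateImpl(Param& p, const std::vector<float>& grad, float lr) {
  for (size_t i = 0; i < p.value.size(); ++i) p.value[i] -= lr * grad[i];
}

// update() per the suggestion: gradient step first, then re-sparsify —
// the moral equivalent of paraVec->dotMul(*maskVec_).
void update(Param& p, const std::vector<float>& grad, float lr) {
  updateImpl(p, grad, lr);
  for (size_t i = 0; i < p.value.size(); ++i) p.value[i] *= p.mask[i];
}
```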
auto &paraVec = para->getBuf(PARAMETER_VALUE);
weightTemp_->copyFrom(*paraVec);
paraVec->dotMul(*this->maskVec_);
}
My understanding of the whole dynamic pruning flow is:
- Every `inter_pass` passes, increase `sparsity_ratio` and save the parameters into `weightTemp`. During the next `inter_pass`, neither `weightTemp` nor `maskVec` changes.
- On every `backward` call, before the `parameter` is updated, the data in `weightTemp` is copied into `para->getBuf(PARAMETER_VALUE)`, which effectively uses `weightTemp` to compute the gradient update. Does this change the contents of `para->getBuf(PARAMETER_VALUE)`? And do the changed contents not need to be saved back into `weightTemp`?
- On every `forward` call, i.e. before the `parameter` is used, `handleBeforeFetch()` is called first to sparsify the parameters via `vec->dotMul(*maskVec_);`; the parameters it actually uses are the ones obtained by updating `weightTemp`.
- `preprocess(...)` is called on every forward, so `weightTemp` changes every time, while `maskVec` only changes once every `inter_pass`.
- `weightTemp` stores the parameters from before the current forward, without sparsification. During the forward, the parameters are first multiplied by the mask to sparsify them, then forward/backward runs. Once the gradients are computed, what we update are the un-sparsified parameters from before this forward, because we have no way to change the sparsity of the momentum.
- Since the momentum's sparsity cannot be changed, after each parameter update the parameters do not have the sparsity the mask represents. So `handleBeforeFetch()` is called from the Python API, in `parameter.to_tar()`, to compute `parameter * mask` first and then save the result.
- On point 1, I now understand `weightTemp`. But `preprocess` and `handleBeforeFetch()` are both called before every `forward`, and both execute `parameter * mask` to make the `parameter` sparse; these two should be mergeable.
- On points 2 and 3, we only actually need the dense `weightTemp` during the gradient update, i.e. in `updateImpl()`. Once the gradient update finishes, we can immediately do `parameter * mask`, making sparse the `parameter`'s normal state. That way there is no extra `parameter * mask` to run before the `forward` computation or before saving the parameters with `to_tar`. So please consider the suggestion made at line 166.
Yes, doing `parameter * mask` after `updateImpl()` guarantees that `parameter` stays sparse. That works, perfect!
fix format
fix format
fix compile bug on mac
this->updateImpl(para);
if (para->useGpu()) {
Why can the hook only be used in the GPU case? It seems calling `updateHook` after `updateImpl` would work better?
`updateImpl(para)` here only updates on GPU: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/trainer/ThreadParameterUpdater.cpp#L85. On CPU the parameters are updated when `finishBatch()` is called. I am not sure why it was designed this way, but it took me quite a while to track this down...
if (para->useGpu()) {
maskVec_ = Vector::create(para->getSize(), para->useGpu());
maskVec_->copyFrom(*maskTemp);
this->maskVec_ = Vector::create(para->getSize(), para->useGpu());
The hook should exist for the whole training run, right? So shouldn't this use `resizeOrCreate`? That is, if `maskVec_` has already been `create`d, it should not be `create`d again. In static pruning, `generateMask` is called only once; in dynamic pruning, it is called multiple times.
OK, `resizeOrCreate` should be used here.
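The resize-or-create pattern agreed on above can be sketched like this (a hedged stand-in using `std::shared_ptr<std::vector<float>>` in place of Paddle's `VectorPtr`; the real `resizeOrCreate` differs in detail): the buffer is allocated on the first call and only resized on later calls, so dynamic pruning's repeated `generateMask` invocations do not re-create it.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical holder mimicking resizeOrCreate semantics for the mask:
// allocate once, reuse the same buffer on subsequent calls.
struct MaskHolder {
  std::shared_ptr<std::vector<float>> maskVec_;

  void resizeOrCreate(size_t n) {
    if (!maskVec_) {
      maskVec_ = std::make_shared<std::vector<float>>(n);  // first call only
    } else {
      maskVec_->resize(n);  // later calls reuse the existing buffer
    }
  }
};
```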
auto &paraVec = para->getBuf(PARAMETER_VALUE);
paraVec->dotMul(*maskVec_);
paraVec->dotMul(*this->maskVec_);
Just a question here: is `*this->maskVec_` equivalent to `*(this->maskVec_)`? Why add `this`?
Yes, `this` can be removed...
this->sparsityRatio_ = hookConfig.sparsity_ratio();
}

void init(Parameter *para) override {
  size_t initCount = this->initCount_.fetch_add(1);
  CHECK_EQ(initCount, 0UL) << "Currently the StaticPruningHook must invoke "
                              "in same ParamterUpdater";
invoke -> be invoked
same -> the same
updateThreadChecker_.check();
auto &vec = para->getBuf(PARAMETER_GRADIENT);
auto &vec = para->getBuf(PARAMETER_VALUE);
It looks like the original pruning never took effect at all, -____-
We do the gradient update first; because of momentum, params that were previously 0 may become nonzero again, so we then call `update()` to set those params back to 0.
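The momentum effect described above can be shown with a tiny numeric sketch (plain C++, not Paddle code; `State` and `sgdMomentumStep` are illustrative names): even with zero gradient, the momentum buffer pushes a pruned weight off zero, so the mask must be re-applied after the update.

```cpp
// Weight plus its momentum buffer.
struct State {
  float w;
  float v;
};

// One SGD-with-momentum step: v = mu * v + grad; w -= lr * v.
// The momentum term keeps moving w even when grad == 0.
void sgdMomentumStep(State& s, float grad, float lr, float mu) {
  s.v = mu * s.v + grad;
  s.w -= lr * s.v;
}
```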
@@ -207,6 +207,7 @@ void SgdThreadUpdater::finishBatch(real cost) {
  for (auto& para : parameters_) {
    int pid = para->getID();
    optimizers_[pid]->finishBatch();
    para->updateHook();
Why does `updateHook` need to be called here? My understanding is that it is enough to call `updateHook()` whenever `para->update()` is called?
The reason is that the CPU updates the weights (w = w - w_diff) in `finishBatch()`, while the GPU does it in `updateImpl()`; see https://github.com/NHZlX/Paddle/blob/7c9d5e5653155aa2a9106ba9621a1fa7f3b3bc0f/paddle/trainer/ThreadParameterUpdater.cpp#L202. So the updates have to be done separately.
// dynamic pruning's parameter, sparsity ratio will not change until 'pass %
// interval_pass == 0',
// the change in sparsity ratio is a log curve.
// More details can be found https://github.com/PaddlePaddle/Paddle/pull/2603
This could be changed to "see the implementation code for details"; better not to reference this PR.
ok
:param interval_pass: 'dynamic_pruning' hook parameters,
    sparsity ratio will not change until 'pass % interval_pass == 0', the change in sparsity ratio is a log curve.
    More details can be found https://github.com/PaddlePaddle/Paddle/pull/2603
Same here: do not reference the PR. You could list the formula for the computation here, in the overall introduction to dynamic pruning.
@Xreki ok
float), 'sparsity_upper_bound must be float type'
assert self.sparsity_upper_bound <= 1 and self.sparsity_upper_bound >= 0, 'sparsity_upper_bound must be a float between [0, 1] '

if self.interval_pass is not None:
For the dynamic pruning type, `interval_pass` and `end_pass` must be set, right? That should be checked here. Also, for the pruning type these parameters are not needed; print a warning that those settings have no effect and remind the user to use the dynamic pruning type.
`interval_pass` and `end_pass` can fall back to defaults, but a reminder can be added.
 *@param currentPass
 *@param currentBatch
 */
void preprocess(Parameter* param, size_t currentPass, size_t currentBatch);
For the parameter names, I think `passId` and `batchId` would be better.
Pruning
Principle:
A trained model has a large amount of parameter redundancy, and these redundant parameters' values are very small, so we can cut them off.
For every layer chosen to be pruned, we add a 0-1 mask of the same size as the layer's parameter, which decides which of the parameters participate in the forward process.
Assume the number of parameters in a layer is `M` and the current sparsity ratio is `current_spr`. We first order the parameters in that layer by absolute value, then pick the smallest `current_spr * M` of them and set the corresponding mask values to zero.
Paddle uses an automatic, gradual pruning approach. We use `interval_pass`, `sparsity_upper_bound`, and `end_pass` to control the process. The parameters are pruned every `interval_pass` passes (a pass represents an epoch) as the network is fine-tuned, gradually increasing the sparsity while allowing the network to recover from any pruning-induced loss in accuracy. The network finally reaches `sparsity_upper_bound` sparsity, and the whole process performs `end_pass / interval_pass` rounds of pruning.
We use a log function for the sparsity schedule. We prune the network more aggressively in the initial stage, when there are many redundant parameters, and gradually reduce the number of parameters being cut in the later stage, when fewer redundant parameters remain; this helps the network recover from the pruning-induced loss in accuracy.
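The log-shaped schedule described above can be sketched as follows. The text does not give the exact equation, so this is a hedged, illustrative formula (`sparsityAt` and the `log1p` normalization are assumptions): it rises steeply early, flattens later, and reaches `sparsity_upper_bound` at `end_pass`.

```cpp
#include <cmath>

// Illustrative log-curve schedule: aggressive pruning early, gentler later,
// reaching upperBound exactly at endPass. Not necessarily the PR's formula.
float sparsityAt(int pass, int endPass, float upperBound) {
  if (pass >= endPass) return upperBound;
  // log1p maps [0, endPass] onto [0, upperBound] with a concave curve.
  return upperBound * std::log1p(static_cast<float>(pass)) /
         std::log1p(static_cast<float>(endPass));
}
```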
Usage: