
Add pruning possibilities at inner_product_layer #4294

Open · wants to merge 1 commit into master

Conversation

Caenorst

This adds the Deep Compression pruning feature to the InnerProduct layer, following http://arxiv.org/abs/1506.02626; it can also be used as a form of regularization, as in https://arxiv.org/abs/1602.07360.

To use it, just add the following in the prototxt:

layer {
  ...
  type: "InnerProduct"
  ...
  pruning_param {
    coeff: _here goes the pruning rate, between 0 and 1, with 0 = no pruning_
  }
  ...
}

@@ -915,6 +916,11 @@ message PowerParameter {
optional float shift = 3 [default = 0.0];
}

message PruningParameter {
// Pruning coefficient for deep compression
@seanbell Jun 12, 2016

It would be good to document what this parameter does here. The current comment adds no information. It looks like it's the fraction of weights to keep, sorted by absolute value?

@seanbell commented Jun 12, 2016

Thanks for the PR -- it looks like this is most of what is needed to implement weight compression.

It looks like the current PR doesn't actually compress any data. The weights are dropped at load time, not save time. Also, it looks like the set of pruned weights is fixed -- the mask never changes? Does that mean the intent is to only add this to a fully trained model at deploy time?

I don't quite understand the point of this PR. It looks like it will always be slower (from the extra mask multiply) and have a lower accuracy, but with no improvement in final model size (actual size on disk). It's missing the part where you make the model file smaller because some weights are now 0. Was that going to be in a separate PR?

@seanbell commented Jun 12, 2016

I don't see any unit tests.

@Caenorst (Author)

Thank you for your review,

Sorry for the messy PR, it's my first PR ever (so I'm not really confident with the process, nor with Git)... I will take more care next time, and add unit tests as well.

You are right about the parameter, I'm going to change the comment.

In the publications they get better accuracy at around ~70% pruning (it acts as a form of regularization). Also, at the end of the method we can use the sparsity to do the compression at deployment (with sparse GEMM), which is coming; should I have implemented it in the same PR?

@Caenorst (Author)

And yes, this is supposed to be used on an already trained model, and the masks don't change after the pruning during training, as there is no rule about how the pruning parameter is supposed to change.

@seanbell

which is coming; should I have implemented it in the same PR?

No, it's good to split up PRs in small units of functionality; that makes them easier to review.

@ajtulloch (Contributor)

In practice to run a sufficiently pruned model, it might be worth just giving the input W as three blobs for the CSR representation, and calling mkl_csrmm on CPU/cusparseScsrmm on GPU. That's what the deep compression papers do when reporting their speedups for inference time (Appendix A in https://arxiv.org/pdf/1510.00149.pdf).

You can do the sparsification + conversion from dense to sparse in a few lines of PyCaffe by masking the original W, calling scipy.sparse.csr_matrix, and extracting the CSR ndarrays from the resulting csr_matrix and passing them to your new SparseInnerProduct layer.
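
A minimal PyCaffe sketch of that conversion (a sketch only: the paths, the layer name 'ip1', the keep fraction and the variable names are illustrative assumptions, and "SparseInnerProduct" here means a layer like the one discussed later in this thread, not anything in mainline Caffe):

import numpy as np
import scipy.sparse as sp

import caffe

# Load a trained model (placeholder paths).
net = caffe.Net('deploy.prototxt', 'trained.caffemodel', caffe.TEST)

layer = 'ip1'          # hypothetical InnerProduct layer to sparsify
keep_fraction = 0.3    # keep the 30% largest-magnitude weights (illustrative)

W = net.params[layer][0].data
# Mask: zero out the smallest weights by absolute value.
threshold = np.percentile(np.abs(W), 100.0 * (1.0 - keep_fraction))
W_masked = np.where(np.abs(W) >= threshold, W, 0).astype(W.dtype)

# Dense -> CSR: these three arrays are what a SparseInnerProduct-style layer
# (or mkl_csrmm / cusparseScsrmm) would consume.
W_csr = sp.csr_matrix(W_masked)
vals = W_csr.data        # csrValA
col_ind = W_csr.indices  # csrColIndA
row_ptr = W_csr.indptr   # csrRowPtrA

print('kept %d of %d weights' % (W_csr.nnz, W.size))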

@Caenorst (Author)

So you think it's better to do the retraining directly on the sparse representation? I was afraid it wouldn't give a lot of flexibility for playing with pruning.

Also, I'd like to implement DropConnect later, and I thought it would fit well with the mask that is already there.

While the slowdown from the mask is almost unnoticeable (but it could be important with a not-sparse-enough model and csrmm), I thought about doing the conversion only for deploy.

I could do it one way or the other, depending on what the community prefers; let me know.

@ajtulloch (Contributor)

Ah, I was just talking about replicating the speedups of the model.

For replicating that paper, the way @songhan did it IIRC was adding a mask_ SyncedMemory of the same size as data_ to Blob, exposing it to pycaffe so Python can get/set the mask, and overriding Blob::Update to conditionally mask out updates. Then everything else works, IIRC.

@Caenorst (Author)

Oh, OK, got it. Is it important to replicate it that way? I don't feel confident about changing the whole Blob::Update just to add this feature to InnerProduct (I don't intend to add it to convolution layers, as the result is far less interesting).

@jpiabrantes

@ajtulloch I'm trying to implement what you suggested. cusparseScsrmm takes a bunch of arguments:

cusparseScsrmm(cusparseHandle_t handle,
 cusparseOperation_t transA,
 int m, 
 int n,
 int k,
 int nnz, 
 const float *alpha,
 const cusparseMatDescr_t descrA,
 const float *csrValA, 
 const int *csrRowPtrA, 
 const int *csrColIndA, 
 const float *B,
 int ldb, 
 const float *beta, 
 float *C, 
 int ldc)

I don't know what to use for handle, transA, ldb and ldc.

In my pycaffe code I have the following:

from scipy.sparse import csr_matrix

# `conv` holds the name of the layer being sparsified; add_blob is my helper
# that appends an ndarray as a new blob.
sparse = csr_matrix(net.params[conv][0].data)
add_blob(sparse.data)     # csrValA
add_blob(sparse.indices)  # csrColIndA
add_blob(sparse.indptr)   # csrRowPtrA

Could you give me an example of what to call in my SparseInnerProduct layer's .cu file?

P.S. What function would I call in cudnn_conv_layer.cu?

Thank you very much.

@beniz commented Jun 16, 2016

This may not be what you need here, but I've fixed and merged an old PR with SparseInnerProduct in it, along with a larger set of changes for sparse computations on both CPU and GPU, see https://github.com/beniz/caffe/blob/master_dd_integ_sparse/src/caffe/layers/sparse_inner_product_layer.cu

I'm interested in any potential improvement to this sparse layer, though we already use it fine in many tasks.

@ajtulloch (Contributor)

@beniz that's perfect, that's exactly what I'm talking about. Have you considered PR'ing that so it's more visible and maybe gets merged? It's a very useful layer, and that implementation looks really nice.

@beniz commented Jun 16, 2016

@ajtulloch I've tested the waters for a PR here #2364 (comment) but until now, no real feedback. For a full list of functionalities, see https://github.com/beniz/deepdetect/pull/142. The implementation originates from #2364 and @alemagnani, I've rebased and added support for MemorySparseDataLayer and batches.

I'd be happy to PR it, because my fear is that it interferes with later Caffe changes and takes time to maintain on our own branch, though we're committed to it, as this is going into production. I'm a bit too busy at the moment, so if this is helpful now and someone wants to PR it quickly, please do.

@Caenorst (Author) commented Jun 16, 2016

Are you sure we are talking about the same thing? If I understood well what you've done, the bottom is sparse, but deep compression is supposed to have sparse weights, isn't it?

But even if I'm right, it can be a nice inspiration for doing the speedup version.

EDIT:
I still think that doing the pruning offline with pycaffe is not so convenient, because there are a lot of tests to run to apply deep compression, and calculating the mask offline forces us to keep the mask in the prototxt (which doubles the size of the data in the layer), so I think it's easier to just have one option in the prototxt.
And the slowdown is negligible (if you don't use the pruning, it just adds one boolean check).

For the sparse representation it's another story; let me know if you disagree.

@beniz commented Jun 16, 2016

If I understood well what you've done, the bottom is sparse, but deep compression is supposed to have sparse weights, isn't it?

Correct, good point.

@Harick1 commented Jul 20, 2016

@beniz I saw your implementation of sparse matrix computation, that's perfect. But I have some questions.
Did you test the speed of the function caffe_cpu_csr_gemm(), and which BLAS library did you use?
I used MKL to test the speed, but I found that the sparse version was slower than the normal version. That's strange.
In my case, the input/output data is dense and the weights are sparse. Could you give me some advice?

@beniz commented Jul 20, 2016

@yeahkun I haven't tested against other BLAS libraries; I use OpenBLAS everywhere. Have you tested the speed of caffe_cpu_csr_gemm() alone? On general performance, see #2364 (comment) and #2364 (comment).

For sparse weights, as stated by @Caenorst you certainly need to modify the code because at the moment it is casting the bottom layer blobs into SparseBlob and using dense weights.

@Harick1 commented Jul 21, 2016

@beniz Actually, I have trained a sparse model according to the deep compression method mentioned by @Caenorst. But at inference time, I used the sparse matrix-matrix computation function in MKL (mkl_scsrmm()) to replace the original dense computation function cblas_sgemm(), and I found it's slower with mkl_scsrmm() (~0.19s, compared to cblas_sgemm(), ~0.14s). I also tested the speed of caffe_cpu_csr_gemm() (~0.2s).

I used GoogLeNet for all the above experiments, and the compression rate is about 30% for the sparse model.

@DanikKamilov commented Nov 26, 2016

Hi, I have an error: Error parsing text-format caffe.NetParameter: 109:17: Message type "caffe.LayerParameter" has no field named "pruning_param".
What should I do?

@mistiansen

Hi, is this usable?

@Caenorst (Author) commented Jun 2, 2017

Hi, it should still be usable, but please note that it has never been merged (I actually forgot that I had to do the unit tests - -'), so I guess you need to compile my own version.
@saiguruju: just add the following in your model prototxt:

layer {
  ...
  type: "InnerProduct"
  ...
  pruning_param {
    coeff: _here goes the pruning rate, between 0 and 1, with 0 = no pruning_
  }
  ...
}

@xizi commented Jul 17, 2017

@beniz hi, is there a tutorial on how to use the sparse matrix computation method?

@xizi commented Jul 17, 2017

@Caenorst hi, is there a detailed document on how to use the model pruning?

@zyclonb commented Oct 20, 2017

@yeahkun Does "trained a sparse model" mean first pruning a pre-trained model and then doing fine-tuning to recover the accuracy?

@zyclonb commented Oct 20, 2017

@Caenorst Thanks for sharing! Here is my understanding of your work. Please correct me if anything is misunderstood.

  1. To prune a pre-trained caffemodel is simply to mask small weights according to the pruning parameter. And the reason to do it online, instead of offline, is the flexibility to play with different pruning parameters when checking inference accuracy/performance.

  2. No modification to the original graph topology is involved in pruning.

  3. Thus no re-training/fine-tuning is needed for the pruned model.

@Caenorst (Author)

Hi @zyclonb,
given the original publication, you do need to fine-tune the network after pruning it. My feature allows you to do so.

I used an online approach because there is actually no reason to do it offline (it adds another manipulation for the user, and you would also have to store the mask in memory). You don't need to save the mask; you can directly set the weights to 0. Then, when you want to re-prune, the weights at 0 will obviously be pruned first.
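
For illustration, a rough PyCaffe sketch of what "directly set the weights to 0" means (the paths, the layer name 'fc6' and the coeff value are placeholder assumptions, not code from this PR):

import numpy as np

import caffe

net = caffe.Net('train_val.prototxt', 'trained.caffemodel', caffe.TEST)

layer = 'fc6'   # hypothetical InnerProduct layer
coeff = 0.7     # pruning rate, as in pruning_param { coeff: ... }

W = net.params[layer][0].data
# Zero the `coeff` fraction of weights with the smallest absolute value.
# Weights that are already 0 sort first, so re-pruning prunes them again.
threshold = np.percentile(np.abs(W), 100.0 * coeff)
W[np.abs(W) < threshold] = 0

# Save the pruned weights; no separate mask is stored anywhere.
net.save('pruned.caffemodel')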

@xizi
I don't know how to explain how to use the pruning method better than what I wrote earlier in the thread.
Of course you have to know a little about what this pruning method is, otherwise it makes no sense to try to use it.

Also, I haven't applied the sparse GEMM approach, but I believe it can be done separately.

@@ -389,6 +389,7 @@ message LayerParameter {
optional PoolingParameter pooling_param = 121;
optional PowerParameter power_param = 122;
optional PReLUParameter prelu_param = 131;
optional PruningParameter pruning_param = 148;
Member

Shouldn't it be 147? Since in the comment above you say that the next available ID is 148.

Author

Indeed, it should be 147.

@mristin commented May 14, 2018

Hi,
Could someone please tell me what the status of this pull request is? Is any progress planned? Should we pick it up (and if so, could you maybe point me to what remains to be done)?

Is there maybe a parallel and similar pull request? I would be interested to test the compression proposed by Han et al. in "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding", ICLR 2016.

I suppose this pull request would handle the "pruning" part?
