Sparsity penalties for unsupervised learning #60

Closed
aravindhm opened this issue Jan 27, 2014 · 9 comments

@aravindhm

Is there an easy way to implement L1 regularization on the weight matrix of a fully connected network? Similarly, I want to penalize the L1 norm of the features in each layer. What is the best way to do that using Caffe?

@Yangqing
Member

Something like a regularizer that could be attached to a layer, similar to the one I wrote in decaf (see e.g. https://github.com/UCB-ICSI-Vision-Group/decaf-release/blob/master/decaf/base.py#L217). Caffe doesn't have a regularizer in place yet, mainly because I was simply using weight decay for the ImageNet training.

Yangqing
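For concreteness, both penalties asked about above reduce to adding a sign term to an existing gradient: lambda * sign(W) on the weight gradient for sparse weights, and lambda * sign(h) on a layer's top gradient for sparse features. A minimal numpy sketch of that idea (the function names and signatures here are illustrative only, not an existing Caffe or decaf API):

```python
import numpy as np

def l1_weight_penalty(weights, weight_grad, lam):
    """Add the L1 subgradient lam * sign(W) to an existing weight gradient.

    Same form of update as weight decay (L2), with sign(W) in place of W.
    """
    weight_grad += lam * np.sign(weights)
    return weight_grad

def l1_feature_penalty(activations, top_grad, lam):
    """Penalize the L1 norm of a layer's output (sparse features).

    The penalty lam * sum(|h|) contributes lam * sign(h) to the
    gradient flowing back into the layer that produced h.
    """
    top_grad += lam * np.sign(activations)
    return top_grad
```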

@kloudkl
Contributor

kloudkl commented Feb 16, 2014

@aravindhm, I found that you have already implemented an L1 norm layer in your own branch. Would you please take a look at my implementation (#113), which follows the advice of @Yangqing, and tell me whether we are solving the same problem? As far as I can see, your contribution is relatively independent of mine and is well worth merging back into the master branch here.

There are a large number of public and private forks of Caffe out there. I had a look at some of the branches that have been updated recently; their authors are actively working on a variety of problems. A diverse community will certainly accelerate the evolution of this project, which is a very healthy phenomenon.

At the same time, I hope there is as little duplicated effort as possible. I would like the project owners, contributors, and everyone who cares about this project to discuss the issue and work out a solution.

@aravindhm
Author

My branch has too many modifications. All of the changes were made to the boost-eigen branch (I couldn't buy MKL), and a few were also made to master. Some of them are still broken. They include:

  1. A method to share weights across blobs by averaging the gradients before the update. The momentum is not averaged.
  2. Making the Euclidean layer a loss layer (top->size() <= 1), so that at test time the network prints an error instead of having no output. I had added a GPU implementation, but it didn't give any performance gain because there are very few floating-point operations per byte loaded, so I removed it.
  3. A tanh layer. Sparse convolutional autoencoders use this instead of ReLU.
  4. Changing the ReLU layer to use Thrust, since otherwise it launches many threads that each do very little work. I found that to be a problem because older Tesla GPUs (like the ones on Amazon EC2 cg1.4xlarge instances) cannot launch as many threads as the newer ones, and I get a configuration error when the kernel is called.
  5. An L1Norm layer - I'm still working on this; the gradient check fails (see the sketch after this list). The regularizer layer implemented by @kloudkl is much better anyway, and the L1Norm layer is not useful in any popular architecture except as a regularizer.
  6. A bunch of example tools to dump network parameters, or differences between network parameters, to stdout.
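On the gradient check in item 5: a likely cause (a general observation, not a diagnosis of that particular branch) is that the L1 norm is non-differentiable at zero, so a centered finite-difference estimate legitimately disagrees with sign(x) for elements near zero, and a naive checker reports a failure there. A small numpy sketch of a check that skips that region (helper names are hypothetical):

```python
import numpy as np

def l1_forward(x):
    return np.abs(x).sum()

def l1_backward(x):
    # Subgradient of sum(|x|); at exactly zero any value in [-1, 1] is valid.
    return np.sign(x)

def l1_gradient_check(x, eps=1e-4, tol=1e-2):
    """Centered finite-difference check of the L1 subgradient.

    Elements with |x_i| < eps straddle the kink at zero, where the
    numeric estimate (roughly x_i / eps) legitimately differs from
    sign(x_i), so a naive check reports a failure there; skip them.
    """
    analytic = l1_backward(x)
    for i in range(x.size):
        if abs(x.flat[i]) < eps:
            continue  # non-differentiable region, not a real error
        x.flat[i] += eps
        plus = l1_forward(x)
        x.flat[i] -= 2 * eps
        minus = l1_forward(x)
        x.flat[i] += eps  # restore the original value
        numeric = (plus - minus) / (2 * eps)
        assert abs(numeric - analytic.flat[i]) < tol

l1_gradient_check(np.random.randn(3, 4))
```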

Since the commits for these are interleaved, merging is very tough. Can merging be done on a file-by-file basis?

@aravindhm
Author

If a dev branch is made, I can copy at least the tanh layer into it and have that merged without disturbing the other branches?

@shelhamer
Member

Merging can be done commit-by-commit through cherry-picking, and with interactive rebasing (see the GitHub help topic and the Git book chapter) anything is possible.

To start, branch from whatever branch has all your intermingled work, and then you can sift out the desired changes from there. For instance, you could create a tanh branch, weight-sharing branch, etc.

Rebasing is how I have been integrating boost-eigen changes while still tracking master and selecting from merges like #97. Cherry-picking is sometimes helpful, but relying on it all the time usually suggests deeper workflow issues to sort out.

Hope these tips help.

@kloudkl
Contributor

kloudkl commented Feb 17, 2014

It seems that @aravindhm has solved the problem in #116. We don't have to write a step-by-step guide ourselves; a how-to-contribute doc with links to the most helpful external guides and tutorials is enough.

@aravindhm
Author

I didn't cherry-pick this time. I created a new local copy of master, made a branch from it (tanh), copied the files in manually (a very small effort in this case), and sent a pull request.

@kloudkl
Contributor

kloudkl commented Feb 18, 2014

@aravindhm, your branch has a lot more good features, and I hope they will be picked out and merged back too if you would like. If they are mixed together in the commits, copying each of them out separately is perhaps the only way to go. Any method that works is fine; we don't have to be bound by the tools.

@shelhamer
Member

Sparsity penalties are addressed by #113.
