Sparsity penalties for unsupervised learning #60

Closed
aravindhm opened this issue Jan 27, 2014 · 9 comments

@aravindhm

Is there an easy way to implement L1 regularization on the weight matrix of a fully connected network? Similarly, I want to penalize the L1 norm of the features in each layer. What is the best way to do that using Caffe?

@Yangqing
Member

Something like a regularizer that could be attached to a layer, similar to the one I wrote in decaf (see e.g. https://github.com/UCB-ICSI-Vision-Group/decaf-release/blob/master/decaf/base.py#L217). Caffe doesn't have a regularizer in place yet, mainly because I was simply using weight decay for the ImageNet training.

Yangqing
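For concreteness, both penalties asked about above reduce to adding a sign term to an existing gradient: lambda * sign(W) on the weight gradient for sparse weights, and lambda * sign(h) on a layer's top gradient for sparse features. A minimal numpy sketch of that idea (the function names and signatures here are illustrative only, not an existing Caffe or decaf API):

```python
import numpy as np

def l1_weight_penalty(weights, weight_grad, lam):
    """Add the L1 subgradient lam * sign(W) to an existing weight gradient.

    Same form of update as weight decay (L2), with sign(W) in place of W.
    """
    weight_grad += lam * np.sign(weights)
    return weight_grad

def l1_feature_penalty(activations, top_grad, lam):
    """Penalize the L1 norm of a layer's output (sparse features).

    The penalty lam * sum(|h|) contributes lam * sign(h) to the
    gradient flowing back into the layer that produced h.
    """
    top_grad += lam * np.sign(activations)
    return top_grad
```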

@kloudkl
Contributor

kloudkl commented Feb 16, 2014

@aravindhm, I found that you have already implemented an L1 norm layer in your own branch. Would you please take a look at my implementation (#113), which follows the advice of @Yangqing, and tell me whether we are solving the same problem? As far as I can see, your contribution is relatively independent of mine and is well worth merging back into the master branch here.

There are a large number of public and private forks of Caffe out there. I had a look at some of the branches that have been updated recently; their authors are actively working on a variety of problems. A diverse community will certainly accelerate the evolution of this project, which is a very healthy phenomenon.

At the same time, I hope there is as little duplicated effort as possible. I would like the project owners, contributors, and everyone who cares about this project to discuss the issue and work out a solution.

@aravindhm
Author

My branch has too many modifications. All of the changes were made to the boost-eigen branch (I couldn't buy MKL), and a few were also made to master. Some of them are still broken. They include:

  1. A method to share weights across blobs by averaging the gradients before the update. The momentum is not averaged.
  2. Making the Euclidean layer a loss layer (top->size() <= 1), so that at test time the network prints an error instead of having no output. I had added a GPU implementation, but it didn't give any performance gain because there are very few floating-point operations per byte loaded, so I removed it.
  3. A tanh layer. Sparse convolutional autoencoders use this instead of ReLU.
  4. Changing the ReLU layer to use Thrust, since otherwise it launches many threads that each do very little work. I found that to be a problem because older Tesla GPUs (like the ones on Amazon EC2 cg1.4xlarge instances) cannot launch as many threads as the newer ones, and I get a configuration error when the kernel is called.
  5. An L1Norm layer - I'm still working on this; the gradient check fails (see the sketch after this list). The regularizer layer implemented by @kloudkl is much better anyway, and the L1Norm layer is not useful in any popular architecture except as a regularizer.
  6. A bunch of example tools to dump network parameters, or differences between network parameters, to stdout.
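On the gradient check in item 5: a likely cause (a general observation, not a diagnosis of that particular branch) is that the L1 norm is non-differentiable at zero, so a centered finite-difference estimate legitimately disagrees with sign(x) for elements near zero, and a naive checker reports a failure there. A small numpy sketch of a check that skips that region (helper names are hypothetical):

```python
import numpy as np

def l1_forward(x):
    return np.abs(x).sum()

def l1_backward(x):
    # Subgradient of sum(|x|); at exactly zero any value in [-1, 1] is valid.
    return np.sign(x)

def l1_gradient_check(x, eps=1e-4, tol=1e-2):
    """Centered finite-difference check of the L1 subgradient.

    Elements with |x_i| < eps straddle the kink at zero, where the
    numeric estimate (roughly x_i / eps) legitimately differs from
    sign(x_i), so a naive check reports a failure there; skip them.
    """
    analytic = l1_backward(x)
    for i in range(x.size):
        if abs(x.flat[i]) < eps:
            continue  # non-differentiable region, not a real error
        x.flat[i] += eps
        plus = l1_forward(x)
        x.flat[i] -= 2 * eps
        minus = l1_forward(x)
        x.flat[i] += eps  # restore the original value
        numeric = (plus - minus) / (2 * eps)
        assert abs(numeric - analytic.flat[i]) < tol

l1_gradient_check(np.random.randn(3, 4))
```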

Since the commits for these are interleaved, merging is very tough. Can merging be done on a file-by-file basis?

@aravindhm
Author

If a dev branch is made, I can copy at least the tanh layer into it and have that merged without disturbing the other branches?

@shelhamer
Member

Merging can be done commit-by-commit through cherry-picking, and with interactive rebasing (see the GitHub help topic and the Git book chapter) anything is possible.

To start, branch from whatever branch has all your intermingled work, and then you can sift out the desired changes from there. For instance, you could create a tanh branch, weight-sharing branch, etc.

Rebasing is how I have been integrating boost-eigen changes while still tracking master and selecting from merges like #97. Cherry-picking is sometimes helpful, but relying on it all the time usually suggests deeper workflow issues to sort out.

Hope these tips help.

@kloudkl
Contributor

kloudkl commented Feb 17, 2014

It seems that @aravindhm has solved the problem in #116. We don't have to write a step-by-step guide ourselves; a how-to-contribute doc with links to the most helpful external guides and tutorials is enough.

@aravindhm
Author

I didn't cherry-pick this time. I created a new local copy of master, made a branch from it (tanh), copied the files in manually (a very small effort in this case), and sent a pull request.

@kloudkl
Contributor

kloudkl commented Feb 18, 2014

@aravindhm, your branch has a lot more good features, and I hope they will be picked out and merged back too if you would like. If they are mixed together in the commits, copying each of them out separately is perhaps the only way to go. Any method that works is fine; we don't have to be bound by the tools.

@shelhamer
Member

Sparsity penalties are addressed by #113.
