
DL4J (/SameDiff): Add L2/L1 regularization schedules #7076

Closed
AlexDBlack opened this issue Jan 26, 2019 · 3 comments
Labels
DL4J (General DeepLearning4j issues) · SameDiff (Autodiff related issues)

Comments

AlexDBlack (Contributor) commented Jan 26, 2019

Currently, the L1/L2 regularization coefficients are fixed values.

From gitter, @stolsvik

What I see is that when using a schedule for lr, and the lr becomes very low and stays there, then it seems like the l2 "overwhelms" the parameters (another observation is that parameters:updates ratio goes up, and that the network "collapses"), and I've speculated that the result is basically zeroing out of all the weights.

Though fixed L1/L2 is most common in practice, that's a reasonable observation, and I can see how it could occur in some cases. Adding L1/L2 schedules would provide extra flexibility for situations like this.
(Side note: it would be nice for the UI to report the separate contributions of the loss function and the L2 penalty to the total score.)
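For concreteness, here is a minimal, hypothetical sketch (plain Java, not DL4J API) of what a scheduled L2 coefficient could look like: the coefficient is looked up per epoch, analogous to how learning-rate schedules already work, and applied in the textbook "coupled" SGD update. The schedule shape and constants are just placeholders.

```java
// Hypothetical sketch only (plain Java, not DL4J API): an L2 coefficient that
// follows a schedule, analogous to the existing learning-rate schedules.
import java.util.function.BiFunction;

public class ScheduledL2Sketch {
    public static void main(String[] args) {
        // Assumed schedule: L2 coefficient decays exponentially per epoch.
        BiFunction<Integer, Integer, Double> l2Schedule =
                (iteration, epoch) -> 1e-4 * Math.pow(0.9, epoch);

        double lr = 1e-3;    // learning rate (could itself follow a schedule)
        double w = 0.5;      // a single weight, for illustration
        double grad = 0.02;  // loss gradient w.r.t. w at this step

        for (int epoch = 0; epoch < 3; epoch++) {
            double l2 = l2Schedule.apply(0, epoch);
            // Textbook "coupled" form: the L2 penalty is scaled by lr along with the gradient.
            w -= lr * (grad + l2 * w);
            System.out.printf("epoch %d: l2=%.6f w=%.6f%n", epoch, l2, w);
        }
    }
}
```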

Aha! Link: https://skymindai.aha.io/features/ND4J-37

AlexDBlack added the DL4J and SameDiff labels on Jan 26, 2019
stolsvik commented Jan 26, 2019

Nice. But I also have a question about whether DL4J actually does this wrong. As I understand it, the more common setup is for L2 to be affected by the learning rate: the L2 decay term is added "within the parentheses" before being multiplied by lr, so it is scaled proportionally with lr. But since you apply the L2 correction as a separate step, the L2 effect is fixed regardless of lr. (It just hit me: if lr goes below the L2 coefficient, you'd actually negate the gradient?!)
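To illustrate that point with toy numbers (this is not DL4J's actual update code, just assumed values): when the L2 decay is applied as a separate, lr-independent step, a heavily decayed learning rate lets the fixed decay term dominate and can even flip the direction of the update relative to what the gradient alone would give.

```java
// Toy numbers only, to illustrate the coupling question above; not DL4J's update code.
public class L2CouplingSketch {
    public static void main(String[] args) {
        double w = 1.0;       // current weight
        double grad = -0.05;  // loss gradient (would push w upward)
        double l2 = 1e-3;     // fixed L2 coefficient
        double lr = 1e-5;     // learning rate after a schedule has decayed it

        // (a) L2 "inside the parentheses": the penalty shrinks along with lr.
        double coupled = lr * (grad + l2 * w);

        // (b) L2 applied as a separate, lr-independent step: the penalty stays fixed.
        double decoupled = lr * grad + l2 * w;

        // coupled is tiny and still gradient-driven (negative);
        // decoupled is dominated by the fixed decay term and has the opposite sign.
        System.out.printf("coupled update:   %+.8f%n", coupled);
        System.out.printf("decoupled update: %+.8f%n", decoupled);
    }
}
```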

Here's a comment from another issue that raises the same question:
#5843 (comment)
It might be worth reading a few of the comments upstream of that one as well.

AlexDBlack (Contributor, Author) commented

Right, as discussed on Gitter, the "main" issue is this one: #7079
It might still be nice to add this feature, however.

lock bot commented Mar 3, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked and limited conversation to collaborators on Mar 3, 2019