Support for custom weight init? #6813

Closed
DrChainsaw opened this issue Dec 6, 2018 · 6 comments

@DrChainsaw
commented Dec 6, 2018

I was looking for a way to use an identity-mapping weight init for convolution layers, but from what I can tell the only ways to do this are either to pass in the shape from the outside before init or the pseudo-hack of extending OrthogonalDistribution, as other distributions don't get to see the shape.

Would there be any issues with supporting an IWeightInit which just gets to see the correctly shaped paramView (or paramView and shape)? I could take a stab at a PR if there aren't any reasons you don't want this.

I guess the implementations should be serializable in the same way as other network configs, but other than that it looks pretty straightforward.
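To make the request concrete, here is a minimal sketch of what such an interface could look like, written in plain Java so it runs without ND4J. Everything in it is an assumption for illustration: the `init(double[], long[])` signature, the `ConvIdentityInit` class, and the `[nOut, nIn, kH, kW]` row-major weight layout are hypothetical and not the DL4J API.

```java
import java.util.Arrays;

/** Hypothetical interface: the initializer sees the flattened parameter view and its shape. */
interface IWeightInit {
    /**
     * @param paramView flattened weights, row-major over shape, written in place
     * @param shape     assumed to be {nOut, nIn, kH, kW} for a 2D convolution
     */
    void init(double[] paramView, long[] shape);
}

/**
 * Identity mapping for convolutions: output channel c copies input channel c.
 * An exact identity additionally assumes odd kernel sizes, stride 1 and "same" padding.
 */
final class ConvIdentityInit implements IWeightInit {
    @Override
    public void init(double[] paramView, long[] shape) {
        if (shape.length != 4) {
            throw new IllegalArgumentException("Expected shape {nOut, nIn, kH, kW}");
        }
        int nOut = (int) shape[0], nIn = (int) shape[1];
        int kH = (int) shape[2], kW = (int) shape[3];
        if (paramView.length != nOut * nIn * kH * kW) {
            throw new IllegalArgumentException("paramView length does not match shape");
        }
        Arrays.fill(paramView, 0.0);
        int centerH = kH / 2, centerW = kW / 2;
        // Put a single 1 at the kernel center for each matching (out, in) channel pair.
        for (int c = 0; c < Math.min(nOut, nIn); c++) {
            int idx = ((c * nIn + c) * kH + centerH) * kW + centerW;
            paramView[idx] = 1.0;
        }
    }
}

final class IdentityInitDemo {
    public static void main(String[] args) {
        long[] shape = {2, 2, 3, 3};
        double[] weights = new double[2 * 2 * 3 * 3];
        new ConvIdentityInit().init(weights, shape);
        System.out.println(Arrays.toString(weights));
    }
}
```

The point of the sketch is only that the initializer needs the shape to know where the kernel centers are, which a shape-blind distribution cannot provide.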

@DrChainsaw

Author

commented Dec 6, 2018

Ok, just realized that extending/implementing Distributions doesn't work either as they come from Distributions.createDistribution.

Old sins or is there some reason for the extra effort of not allowing custom distributions?

@AlexDBlack

Contributor

commented Dec 6, 2018

Yeah, the only reason we don't have fully custom initializations (via an interface) is we haven't needed it internally, and there hasn't been much user demand... and specifying a distribution has been fine for most people who want custom inits.

I'd be in favor of adding a properly extensible/custom weight initialization interface - no real issues there other than it must be JSON serializable and any changes must be backward compatible for old serialized nets (depending on the design you go with, that would be trivial or a little challenging, but I can point you in the right direction in a PR review or whatever).
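One way the backward-compatibility requirement could be met is to keep the old enum-style names loadable and map each of them onto the new interface. The sketch below is plain Java and purely illustrative: `WeightInitFn`, `ConstantInit`, `LegacyWeightInit` and `WeightInitLoader` are hypothetical names, and actual JSON handling is left out.

```java
import java.util.Arrays;

/** Hypothetical extensible initializer interface (name is illustrative). */
interface WeightInitFn {
    void init(double[] paramView, long[] shape);
}

/** Fills every weight with one constant; stands in for any built-in scheme. */
final class ConstantInit implements WeightInitFn {
    final double value;

    ConstantInit(double value) {
        this.value = value;
    }

    @Override
    public void init(double[] paramView, long[] shape) {
        Arrays.fill(paramView, value);
    }
}

/** Legacy enum, as it might appear by name in old serialized configs. */
enum LegacyWeightInit {
    ZERO, ONES;

    /** Bridge: every legacy constant maps to an implementation of the new interface. */
    WeightInitFn toFunction() {
        switch (this) {
            case ZERO: return new ConstantInit(0.0);
            case ONES: return new ConstantInit(1.0);
            default: throw new IllegalStateException("Unmapped legacy init: " + this);
        }
    }
}

final class WeightInitLoader {
    /** Old configs carry only an enum name; resolve it to the new interface on load. */
    static WeightInitFn fromLegacyName(String name) {
        return LegacyWeightInit.valueOf(name).toFunction();
    }
}
```

With a bridge like this, old serialized nets keep loading through the enum name, while new configs are free to reference any custom implementation of the interface.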

@DrChainsaw

Author

commented Dec 7, 2018

I'll give it a shot then!

Just so I don't waste a lot of time for nothing: Do I need to build libnd4j for this?

I was hoping I could avoid this, as I won't be touching anything but Java code (or?), but after spending some time with the cloned monorepo I'm not so sure anymore.

Assuming I manage to get going, I will look at how it was done for activation functions and try to copy that design (I assume they are also backwards compatible).

@AlexDBlack

Contributor

commented Dec 7, 2018

@DrChainsaw if you're just modifying/building DL4J (using Maven), ND4J snapshots should be pulled in automatically (including the native libraries, i.e., libnd4j) from Sonatype.

@AlexDBlack

Contributor

commented Jan 17, 2019

Implemented and merged some time ago here: #6820

@AlexDBlack closed this Jan 17, 2019

