This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Manipulating nn.Dense(...) layer parameters #11133

Closed

lu4 opened this issue Jun 4, 2018 · 4 comments

Comments


lu4 commented Jun 4, 2018

I'm trying to implement my own optimization algorithm for MXNet (Imperative / Gluon) that does not rely on gradients. My question is pretty simple: is there a simple way to create a new nn.Dense(...) layer initialized with parameters (i.e. biases and weights) represented by just two nd.array() instances?

Thank you in advance!

@anirudhacharya
Member

@lu4 check here for resources on creating custom gluon layers - https://gluon.mxnet.io/chapter03_deep-neural-networks/custom-layer.html#Craft-a-bespoke-fully-connected-gluon-layer
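For illustration, a minimal bespoke fully-connected layer in the spirit of that tutorial could look roughly like the sketch below; the class name MyDense and the toy shapes are placeholders, not anything taken from the tutorial itself:

import mxnet as mx
from mxnet import gluon, nd

class MyDense(gluon.nn.Block):
    """A bespoke fully-connected layer whose parameters can be overwritten."""
    def __init__(self, units, in_units, **kwargs):
        super(MyDense, self).__init__(**kwargs)
        with self.name_scope():
            self.weight = self.params.get('weight', shape=(in_units, units))
            self.bias = self.params.get('bias', shape=(units,))

    def forward(self, x):
        return nd.dot(x, self.weight.data()) + self.bias.data()

layer = MyDense(units=2, in_units=5)
layer.initialize()

# Overwrite the parameters with your own NDArrays
layer.weight.set_data(nd.ones((5, 2)))
layer.bias.set_data(nd.zeros((2,)))
print(layer(nd.ones((1, 5))))   # [[5. 5.]]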

@sandeep-krishnamurthy please label - "Gluon", "Question"


lu4 commented Jun 4, 2018

@anirudhacharya please let me rephrase my question, since there were two parts to it:

First part:
Problem: It is hard to train a large network. It's simpler to start with a smaller layer size and incrementally increase it during training. This also reduces the chance of overfitting.

Motivation: It's straightforward and safe to augment a hidden layer's weight matrix with a zero matrix, since it won't affect the state of training.

Please, consider the following example: https://i.imgur.com/RSPJgAo.png

Please note that padding with zeros (or weights very close to zero) won't affect the state of existing training, provided the activation function in question evaluates to 0 at 0.
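As a rough illustration of that claim (the shapes and the relu activation below are just an arbitrary toy example), zero-padding the hidden layer leaves the outputs unchanged:

import mxnet as mx
from mxnet import nd

x = nd.random.uniform(shape=(1, 4))

W1 = nd.random.uniform(shape=(3, 4))   # hidden layer: 3 units
b1 = nd.zeros((3,))
W2 = nd.random.uniform(shape=(2, 3))   # output layer: 2 units

h = nd.relu(nd.dot(x, W1.T) + b1)      # relu(0) == 0, as required
y = nd.dot(h, W2.T)

# Grow the hidden layer from 3 to 5 units by padding with zeros
W1_big = nd.concat(W1, nd.zeros((2, 4)), dim=0)
b1_big = nd.concat(b1, nd.zeros((2,)), dim=0)
W2_big = nd.concat(W2, nd.zeros((2, 2)), dim=1)

h_big = nd.relu(nd.dot(x, W1_big.T) + b1_big)
y_big = nd.dot(h_big, W2_big.T)

print(y, y_big)                        # identical outputs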

Second part:
Is there a way to avoid (re)implementing all the existing layer types and instead just pass the weights I want into a net?

@thomelane
Contributor

Hi @lu4,

You can create a clone of your network, and then make adjustments during the copy. If you're using a Sequential Block as a container for your network, you could create another Sequential Block and add all of the layers from one network to the other, which would save redefining the network. You would make changes to the necessary layers before adding them to the new Sequential Block.

As I understand the problem, you'll need to change the weights and biases of the layer you want to expand, and the weights of the next dense layer (since its weight shape depends on the number of units in the preceding layer, which has changed). After constructing the new weights and biases (i.e. padding with 0s), you can then use set_data on the parameters of interest before adding them to the new Sequential Block.

Unfortunately I don't think you can mutate the original network like this, because you're changing the shape of the parameters: you'll hit shape assertion errors. And you can't just swap out a single layer in the original Sequential Block, because Sequential Blocks don't support item assignment.
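To make that concrete, here is a rough sketch of the copy-and-expand idea on a toy 4 -> 3 -> 2 network (the layer sizes and relu activation are made up for illustration); the hidden layer is grown to 5 units and the old parameters are padded with zeros via set_data:

import mxnet as mx
from mxnet import gluon, nd

# Original (already trained) network: 4 -> 3 -> 2
old_fc1 = gluon.nn.Dense(3, in_units=4, activation='relu')
old_fc2 = gluon.nn.Dense(2, in_units=3)
old_net = gluon.nn.Sequential()
old_net.add(old_fc1, old_fc2)
old_net.initialize()

# New network with the hidden layer grown from 3 to 5 units
new_fc1 = gluon.nn.Dense(5, in_units=4, activation='relu')
new_fc2 = gluon.nn.Dense(2, in_units=5)
new_net = gluon.nn.Sequential()
new_net.add(new_fc1, new_fc2)
new_net.initialize()

# Pad the old parameters with zeros and copy them into the new layers
new_fc1.weight.set_data(nd.concat(old_fc1.weight.data(), nd.zeros((2, 4)), dim=0))
new_fc1.bias.set_data(nd.concat(old_fc1.bias.data(), nd.zeros((2,)), dim=0))
new_fc2.weight.set_data(nd.concat(old_fc2.weight.data(), nd.zeros((2, 2)), dim=1))
new_fc2.bias.set_data(old_fc2.bias.data())

x = nd.random.uniform(shape=(1, 4))
print(old_net(x), new_net(x))          # same outputs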


ThomasDelteil commented Jul 5, 2018

@lu4
To expand on @thomelane's answer, here is a practical example of how you can set the data:

import mxnet as mx
from mxnet import gluon

# Create a layer
net = gluon.nn.Dense(2, in_units=100, use_bias=True)
net.initialize()

# Update the weights and bias of the layer
net.weight.set_data(mx.nd.ones((2, 100)))
net.bias.set_data(mx.nd.ones((2,)))

net(mx.nd.ones((1, 100)))

which outputs:

[[101. 101.]]
<NDArray 1x2 @cpu(0)>

To expand on his warning: you'd need to initialize the network at the maximum size first, because you can't reshape the parameters afterwards, but you can indeed fill them with weights padded with zeros.

@indhub Could you please close the issue? Thanks!
@lu4 if that doesn't answer your question and you would like to follow up, please create a post on https://discuss.mxnet.io Thanks!

@indhub indhub closed this as completed Jul 6, 2018