Why does storing module layers in arrays break the learning process? #532
Comments
@Metritutus -- it's because "RegisterComponents" doesn't pick up on arrays, or lists, or any of the built-in collections as somewhere to look for modules. In the first case, the parameters are simply not being trained, so you get pretty much random behavior. Use ModuleList or ModuleDict, instead of standard .NET collections. That's precisely what they are meant to do. |
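To make the mechanism described above concrete, here is a minimal sketch in plain Python. It is a toy model, not TorchSharp code: `Param`, `Module`, `register_components`, `ArrayModel` and `ListModel` are hypothetical names, and the toy `Linear` and `ModuleList` only borrow the names of the real classes. It assumes the registration step scans the fields of a module and keeps only those whose value is itself a module, which is the behaviour the reply describes.

```python
# Toy model of field-scanning registration. NOT TorchSharp code: every class
# and method here is a hypothetical stand-in that mirrors the discussion.

class Param:
    """Stand-in for a trainable tensor."""
    def __init__(self, name):
        self.name = name


class Module:
    def __init__(self):
        self._children = {}

    def register_components(self):
        # Keep only fields whose value is itself a Module.
        # Plain lists (and other built-in collections) are skipped.
        for field, value in vars(self).items():
            if isinstance(value, Module):
                self._children[field] = value

    def own_parameters(self):
        return []

    def parameters(self):
        # Own parameters first, then those of every registered child.
        found = list(self.own_parameters())
        for child in self._children.values():
            found.extend(child.parameters())
        return found


class Linear(Module):
    def __init__(self, name):
        super().__init__()
        self._params = [Param(name + ".weight"), Param(name + ".bias")]

    def own_parameters(self):
        return self._params


class ModuleList(Module):
    """A container that is itself a Module, so the scan can see it."""
    def __init__(self, *modules):
        super().__init__()
        for index, module in enumerate(modules):
            self._children[str(index)] = module


class ArrayModel(Module):
    def __init__(self):
        super().__init__()
        self.layers = [Linear("fc1"), Linear("fc2")]  # plain list: never registered
        self.register_components()


class ListModel(Module):
    def __init__(self):
        super().__init__()
        self.layers = ModuleList(Linear("fc1"), Linear("fc2"))
        self.register_components()


array_params = [p.name for p in ArrayModel().parameters()]
list_params = [p.name for p in ListModel().parameters()]
print(array_params)  # []
print(list_params)   # ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

In the toy, the array-backed model reports no parameters, so an optimizer built from them would have nothing to update. Wrapping the same layers in the module-aware container makes them visible.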
@Metritutus -- can you try your original code, but call 'RegisterComponents()' and the move to CUDA after the creation of 'Layers'? Sequential doesn't look like it's getting registered, either, so when you call 'parameters()' on the module, it's likely an empty collection. |
Also, and this has nothing to do with the issue you're seeing -- if you're going to stick the layers in a 'Sequential', there's really no reason to declare them as fields unless you want them for debugging purposes. If that is the reason, though, Sequential often gets in the way, since you can't single-step the layers and see what is passed between them. |
@Metritutus -- your repro led me to another error -- double-reporting of parameters when the Sequential and its component modules are both registered. |
Ah, this explains it!
Ah, this was because I was merely trying to organise the layer types. I hadn't even reached the point of thinking about debugging through the individual layers as I don't have a sufficient understanding of how this all works yet. The use of
This did indeed have a positive impact on the issue (without updating the arrays to be
Is this implying that if I use |
No, it's implying that if you choose to include both the layers and the Sequential as fields in the module (and, as I said, there's no real reason to do so, but it's perfectly legal), then TorchSharp should not be listing parameters twice, as it does now. There's a bug there, which I have a fix for. Sequential does allow you not to "faff around" with forwarding data. The only reasons not to use it are that you want to debug the model and see what is passed around, or that your model is more complex and does not just pass data in a linear fashion from first to last. Anyway, if you do use Sequential, there's no reason to also list the components as fields for the purpose of implementing forward(). There may be other reasons, such as manipulating their weights or, in a coming release, forming optimizer parameter groups from subsets of the modules rather than all of them. |
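The double listing can be illustrated with a small stand-alone Python sketch. This is a toy, not TorchSharp code, and the de-duplication shown is only one common approach, not necessarily the fix mentioned above. `Param`, `Module`, `naive_parameters` and `unique_parameters` are hypothetical names. The sketch assumes that the layers and the Sequential holding them are all registered as children of the same model.

```python
# Toy illustration of parameters being reported twice when a layer is
# reachable both directly and through a Sequential-like container.
# NOT TorchSharp code: all names here are hypothetical.

class Param:
    def __init__(self, name):
        self.name = name


class Module:
    def __init__(self, params=(), children=()):
        self.params = list(params)
        self.children = list(children)


def naive_parameters(module):
    """Walk every registered child; a shared child is visited once per path."""
    found = list(module.params)
    for child in module.children:
        found.extend(naive_parameters(child))
    return found


def unique_parameters(module):
    """Same walk, but keep each parameter object only the first time it appears."""
    seen = set()
    unique = []
    for param in naive_parameters(module):
        if id(param) not in seen:
            seen.add(id(param))
            unique.append(param)
    return unique


fc1 = Module(params=[Param("fc1.weight"), Param("fc1.bias")])
fc2 = Module(params=[Param("fc2.weight"), Param("fc2.bias")])
seq = Module(children=[fc1, fc2])         # stands in for the Sequential
model = Module(children=[fc1, fc2, seq])  # layers AND the Sequential are fields

naive_count = len(naive_parameters(model))
unique_count = len(unique_parameters(model))
print(naive_count, unique_count)  # 8 4
```

De-duplicating by object identity rather than by name keeps the walk correct even if two distinct parameters happen to share a name.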
That's because the Sequential module gets registered. |
Surely this would be disingenuous? It is clearly quoting a reply from another
I confess I couldn't find a good example of it being used with a model class. I was attempting to follow the MNIST example, which did not use it. I first found mention of it in the Memory article, but it held no example. The 'faff' I referred to is how each output from
Indeed, you inferred this earlier! Anyway, my issue has been resolved as a result of the suggestions above, so I'll close this now. Cheers! |
@Metritutus -- I agree, and it wasn't I who suggested it. |
Ack, my sincere apologies! |
This is all very new to me, so I apologise for using any incorrect terminology, and also if this is something fundamental that I've missed which is documented somewhere.
Using the MNIST example as a reference, I was attempting to create my own version of it, but found that it didn't seem to learn, and that accuracy was absolutely terrible (i.e. less than 11%, with an average loss greater than 2).
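For context on those numbers, a quick arithmetic check in Python (not part of the original report) shows what a model that never learns would be expected to score. It assumes 10 roughly balanced classes and a loss that is the mean negative log-likelihood (cross-entropy) of the predicted class probabilities. Both are assumptions about the setup, not details stated in the report.

```python
import math

num_classes = 10  # MNIST digits 0-9

# A model that assigns equal probability to every class.
uniform_prob = 1.0 / num_classes

# Expected fraction of correct guesses when classes are balanced.
chance_accuracy = uniform_prob

# Cross-entropy of the uniform prediction: -ln(1/10) = ln(10).
chance_loss = -math.log(uniform_prob)

print(f"{chance_accuracy:.0%}")  # 10%
print(f"{chance_loss:.3f}")      # 2.303
```

Accuracy near 10% with a loss a little above 2 is therefore what guessing looks like, which fits a model whose parameters are never updated.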
I discovered, by experimentation, that it seemed to be failing because I had stored my layers in arrays for the module properties. For some reason this breaks it, and I am uncertain why.
This is how I had it before:
And this is what works:
This second model class results in an accuracy of greater than 98%, with an average loss of roughly 0.04.
I'm assuming there must be something fundamental that I'm missing, but I'm not sure what. When I looked at the documentation on modules, it mentions "Note that the field names in the module class correspond to the names that were passed in to the Sequential constructor in the earlier section." I'm not sure what earlier section it refers to (if it's another Help article, the Memory one only mentions Sequential, but does not demonstrate it). And even with that said, even if I specify the names manually (in the commented-out section of Model1), it still doesn't seem to work correctly.