
Need ability to call input shape from GraphBuilder #5019

Closed
crockpotveggies opened this issue Apr 30, 2018 · 6 comments
Labels
Enhancement New features and other enhancements

Comments

@crockpotveggies

Issue Description

When building a network such as NASNet, calling the shape of the input tensor is necessary. For example: https://github.com/keras-team/keras/blob/master/keras/applications/nasnet.py#L498

As per conversation with @AlexDBlack

hm... so I've been staring at this for quite a while (and the paper - figure 4 - https://arxiv.org/pdf/1707.07012.pdf)
seems like yes, they are dynamically looking at the current output size after adding each block
for doing this in DL4J: I see that we have 3 options
  1. One-off manual calculation in this model's builder code: i.e., input/output sizes. Shouldn't be too hard, just tedious (we expose the calcs via ConvolutionUtils)
  2. We augment GraphBuilder with "give me the output size of all layers as a map, for this input size" functionality. Doable, a little messy though (I'd want to keep GraphBuilder simple)... that sort of thing is better in ComputationGraphConfiguration.
  3. We augment ComputationGraphConfiguration with "add layer" type functionality, and expose the existing InputType functionality for each layer (to get input/output shapes)
Option 3 seems like the most reasonable to me, and would allow you to follow the same sort of "inspect and add to an existing config" type pattern there. I guess give me an issue for that and I can try to squeeze it in this week
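For context on option 1, the manual bookkeeping is just the standard convolution output-size arithmetic. A minimal pure-Java sketch of the underlying formulas (illustrative only; this is not DL4J's ConvolutionUtils API):

```java
// Standard convolution output-size arithmetic.
public class ConvOutputSize {

    // 'valid'-style convolution: floor((in + 2*padding - kernel) / stride) + 1
    static int valid(int in, int kernel, int stride, int padding) {
        return (in + 2 * padding - kernel) / stride + 1;
    }

    // 'same'-style padding: ceil(in / stride)
    static int same(int in, int stride) {
        return (in + stride - 1) / stride;
    }

    public static void main(String[] args) {
        // A 224-wide input through a 3x3, stride-2 'same' conv -> 112
        System.out.println(same(224, 2));        // 112
        // The same input with 'valid' padding (no padding) -> 111
        System.out.println(valid(224, 3, 2, 0)); // 111
    }
}
```

Repeating this per layer by hand is exactly the "tedious but not hard" work option 1 implies.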

Version Information

v1.0.0-snapshot

@AlexDBlack
Contributor

OK, so adding methods like addLayer etc to ComputationGraphConfiguration is straightforward - no issues there. (And is underway here: #5031)

One issue however is the idea of the global configuration overrides. Note that in the graph builder, we specify the global config defaults before the .graphBuilder() call. However, this default information isn't available in the instantiated ComputationGraphConfiguration (only in the GraphBuilder - or more accurately, the NeuralNetConfiguration.Builder in the GraphBuilder).

Now, I see basically 3 options here:

  1. Don't allow global config overrides when using ComputationGraphConfiguration.addLayer (downside: error prone - it's easy for users to not understand this unless they read the javadoc (and most won't for such a simple method))
  2. Store the global config in ComputationGraphConfiguration in case we need it (downside: not present in all pre-1.0.0-beta nets, hence also error prone for users)
  3. Allow users to (optionally) pass in a global configuration object

Not sure what the best option is, but for option 3, that would make the API basically:
.addLayer(String name, Layer layer, SomeGlobalConfigObj globalConfig, String... inputNames)

I'm somewhat hesitant to have a "no global config" method in addition for option 3, like .addLayer(String name, Layer layer, String... inputNames), due to the rather different behaviour with the GraphBuilder method of the same signature.

Alternatively: we look at option 2, but with something like:
.addLayer(String name, Layer layer, String... inputNames)
.addLayer(String name, Layer layer, boolean applyGlobalConfig, String... inputNames)
with the former calling the latter with a default value of true... then, if the global config isn't present (i.e., a pre-1.0.0-beta net, or maybe an imported Keras net) we simply throw an exception saying it's absent, and how they can set it.

The global config object would basically be like FineTuneConfiguration for transfer learning.
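The two-overload idea above can be sketched in plain Java. This is a toy illustration of the delegation pattern only (names are hypothetical, not the actual DL4J API): the short form delegates with applyGlobalConfig = true, and the long form fails fast when the stored global config is absent (e.g. a pre-1.0.0-beta or imported net).

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the proposed addLayer overloads.
class GraphConfigSketch {
    private final Object globalConfig;      // null for older/imported nets
    private final List<String> layers = new ArrayList<>();

    GraphConfigSketch(Object globalConfig) {
        this.globalConfig = globalConfig;
    }

    // Short form: defaults to applying the global config.
    void addLayer(String name, String... inputs) {
        addLayer(name, true, inputs);
    }

    // Long form: throws if the global config is requested but absent.
    void addLayer(String name, boolean applyGlobalConfig, String... inputs) {
        if (applyGlobalConfig && globalConfig == null) {
            throw new IllegalStateException(
                "Global configuration not present in this net; set it first, "
              + "or pass applyGlobalConfig = false");
        }
        layers.add(name);
    }

    int numLayers() {
        return layers.size();
    }
}
```

The fail-fast exception is what turns the silent "global config not applied" footgun into an explicit, explainable error.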

@crockpotveggies
Author

I'm actually wondering whether, while it may seem like more work, the original first option would be the cleanest for users

One-off manual calculation in this model's builder code: i.e., input/output sizes. Shouldn't be too hard, just tedious (we expose the calcs via ConvolutionUtils)

I'm happy to assist on this if needed, since I think the payoff would be worthwhile. I think this would avoid any future confusion with global configs.

@AlexDBlack
Contributor

OK, so following up on our gitter conversation:

To avoid this (possibly somewhat error-prone for users) "global configuration for ComputationGraphConfiguration" issue, we've decided that:

  1. We won't have addLayer etc methods on ComputationGraphConfiguration
  2. We'll add methods on the builder to get output sizes; to avoid duplicating functionality, these will simply instantiate a temporary ComputationGraphConfiguration for the output size calculations
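That decision can be sketched in plain Java. This is a toy illustration of the design only (all names here are hypothetical, not the DL4J API): the builder owns no shape-inference logic of its own; when asked for output sizes it instantiates a temporary configuration and delegates the calculation to it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch: builder delegates size calculation to a temporary config.
class BuilderSketch {
    private final Map<String, Integer> layerKernel = new LinkedHashMap<>();

    BuilderSketch addConv(String name, int kernel) {
        layerKernel.put(name, kernel);
        return this;
    }

    // "give me the output size of all layers, for this input size"
    Map<String, Integer> layerOutputSizes(int inputSize) {
        return new ConfigSketch(layerKernel).outputSizes(inputSize);
    }

    // Stand-in for the instantiated ComputationGraphConfiguration,
    // which performs the actual per-layer size calculation.
    static class ConfigSketch {
        private final Map<String, Integer> layerKernel;

        ConfigSketch(Map<String, Integer> layerKernel) {
            this.layerKernel = layerKernel;
        }

        Map<String, Integer> outputSizes(int inputSize) {
            Map<String, Integer> out = new LinkedHashMap<>();
            int cur = inputSize;
            for (Map.Entry<String, Integer> e : layerKernel.entrySet()) {
                cur = cur - e.getValue() + 1;   // 'valid' conv, stride 1
                out.put(e.getKey(), cur);
            }
            return out;
        }
    }
}
```

Because the builder only forwards to the (temporary) configuration, there is a single source of truth for shape inference, which is the stated goal of avoiding duplicate functionality.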

@AlexDBlack
Contributor

#5031

@crockpotveggies
Author

Excellent work :)


lock bot commented Sep 22, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
