Autograd Layer constructor #674
Comments
I think it's not an issue. When we use the computational graph, initialization operations won't be buffered, since we only need to execute them once. At present, I just execute these operations immediately instead of buffering them into the graph. So before running the entire graph, all parameter tensors will be initialized.
The Module class just buffers the operations during calls to the __call__ function of the Layer and the forward and backward functions of the Operation.
I think the Module class is quite similar to Layer, so we can think about combining them. At present, users can't get params from the Module; they can only get them from the Layer. If the Layer class were a subclass of Module, this problem would be solved.
Yes, if the initialization is in the __init__ function, it will not be buffered automatically. If the initialization is in the __call__ function, we can still add a few lines to turn off the buffering. In both cases, the graph function won't be affected.
Yes, when we buffer the ops there won't be any actual run, but all the parameter sizes will be obtained after the call function.
After calling the constructor of the Layer class.
The constructor might not be enough, because in the case of RNN, CudnnRNNHandle.weight_size is inferred from the input size, so it can only be obtained in the forward function. FYI, this is Shicong's new code in his PR (I guess this issue is for solving that PR's problem): singa/python/singa/autograd.py Line 3406 in 19f2020
Do you mean making Module a subclass of Layer?
How to turn it off?
But you need to identify the operations for initialization. How do we tell whether an operation is for initialization or not?
To turn it off, we just need to add three lines around the initialization.
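Roughly like this (a sketch; it assumes the Device exposes the graph_enabled() and EnableGraph() toggles):

    from singa import device, tensor

    dev = device.create_cuda_gpu()     # graph buffering is used on CUDA devices
    W = tensor.Tensor(shape=(3, 4), device=dev)

    # the three lines wrapped around the initialization op:
    prev_state = dev.graph_enabled()   # remember whether buffering was on
    dev.EnableGraph(False)             # run the following ops eagerly
    W.gaussian(0.0, 0.1)               # parameter initialization, executed immediately
    dev.EnableGraph(prev_state)        # restore the previous buffering state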
If the initialization statement is in the forward function, then to mark it as an initialization operation that should not be buffered, we need to add the same three lines around it.
I think it would be better to define a new function to wrap these lines.
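For example, a small context-manager wrapper could look like this (the name no_graph is hypothetical; it assumes the same graph_enabled()/EnableGraph() toggles as above):

    from contextlib import contextmanager

    @contextmanager
    def no_graph(dev):
        # temporarily disable graph buffering so the wrapped ops run eagerly
        prev_state = dev.graph_enabled()
        dev.EnableGraph(False)
        try:
            yield
        finally:
            dev.EnableGraph(prev_state)

    # usage inside a layer, e.g. around tensor initialization:
    # with no_graph(x.device):
    #     self.W.gaussian(0.0, 0.1)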
@XJDKC Since we have used this structure at least twice before, this wrapper should be useful. Could you help us wrap these lines to turn off the buffering temporarily for tensor initialization? (We need to select the most suitable place/class to add the wrapper.)
We may need more discussion before the implementation, e.g., which option to go with?
My personal opinion is:
@XJDKC @dcslin @joddiy, may you please give your suggestions as well?
We can make the code compatible in this way:

    # in v3.1
    def Foo(**kwargs):
        # check if the args are for v3.0 or v3.1 and then call FooV300 or FooV310

In V4.0, we change the API completely. Yes, we may need to update Module later to support more functionalities like save and load.
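As a concrete sketch of that selector (the argument names and the dispatch rule below are only illustrative):

    def FooV300(in_features, out_features):
        # v3.0-style API: the input size must be given up front
        print("v3.0 path", in_features, out_features)

    def FooV310(out_features):
        # v3.1-style API: the input size is inferred later, at first forward
        print("v3.1 path", out_features)

    def Foo(*args, **kwargs):
        # dispatch on the arguments the caller supplied
        if "in_features" in kwargs or len(args) == 2:
            return FooV300(*args, **kwargs)
        return FooV310(*args, **kwargs)

    Foo(8, 4)               # routed to FooV300
    Foo(out_features=4)     # routed to FooV310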
Yes, this selector for the function version is very helpful (say, in v3.1), and it will make the code compatible. Then later, in V4.0, we can change the API completely.
Sorry, I haven't checked the newest code. Handles in the past code, like ConvHandle, were initialized in the constructor, so I thought CudnnRNNHandle was the same.
Yes, or making Layer a subclass of Module.
At present, if operations are for initialization, they are likely to be in the constructor function. So I only turn on the graph before calling the __call__ function of Layer and the forward and backward functions of Operation; all the initialization operations in the constructor will therefore be executed immediately. For initialization operations inside the functions mentioned above, I turn off the buffering before calling them, like this: turn off graph.
@XJDKC FYI, even ConvHandle is not initialized in the constructor of Layer but in the call function; the RNN differs from conv2d only because there is tensor initialization in the call function: https://github.com/apache/singa/blob/master/python/singa/autograd.py#L1758
@XJDKC Yes, that's why prof. suggested adding the wrapper for it.
@chrishkchris Thanks for the correction, it is my fault. But the initialization of the handle won't submit operations to the Device, so it is not included in the graph either.
Yes, we can use a decorator to achieve it.
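A sketch of such a decorator (self.device here is an assumption about where the Device handle lives):

    import functools

    def no_buffering(func):
        # decorator sketch: run the wrapped method with graph buffering off,
        # then restore the previous state
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            dev = self.device                   # assumed attribute holding the Device
            prev_state = dev.graph_enabled()
            dev.EnableGraph(False)
            try:
                return func(self, *args, **kwargs)
            finally:
                dev.EnableGraph(prev_state)
        return wrapper

    # usage (hypothetical): decorate the method that creates/initializes params
    # @no_buffering
    # def init_params(self, x): ...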
Yes, the constructor of the handle doesn't contain any operations to submit to the Device, whether in conv2d or in rnn. In Shicong's code, the only thing to modify is to turn off buffering by wrapping the line in the link below: singa/python/singa/autograd.py Line 3406 in 19f2020
Meanwhile, in the long term (say V4.0), it is possible that all the initialization will be moved to the __init__ function of Layer rather than the call function.
Got it, thanks.
Resolved in #697.
The Layer class in Autograd is for maintaining the model parameters. It passes the parameters into the operations, and thus the operations are stateless.
Typically the parameter size depends on the input and layer configuration.
Currently, we require the users to provide the input size in the layer constructor.
Then we can create the parameter tensor and initialize it in the constructor, e.g., in the Linear layer. One potential problem is that the initialization operation may not be buffered. @XJDKC, is that an issue?
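For illustration, the constructor-based style looks roughly like this (a simplified sketch, not the exact code in autograd.py):

    from singa import autograd, tensor

    class Linear(autograd.Layer):
        def __init__(self, in_features, out_features, bias=True):
            # the user supplies the input size, so the parameter tensors can be
            # created and initialized right here in the constructor
            self.W = tensor.Tensor(shape=(in_features, out_features),
                                   requires_grad=True, stores_grad=True)
            self.W.gaussian(0.0, 0.1)          # initialization op runs eagerly here
            self.b = None
            if bias:
                self.b = tensor.Tensor(shape=(out_features,),
                                       requires_grad=True, stores_grad=True)
                self.b.set_value(0.0)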
For some layers like RNN implemented using cudnn, although we can get the input size, the parameter size is unknown until the cudnn handle is created, which happens only when data is forwarded through the layer.
Another way is to delay the parameter tensor creation until the layer is called for forward propagation. At that time, we have the input tensor (and its device), so in the layer constructor we do not need the user to provide the input size. The drawback is that after the layer is created, the get_params() function would still fail to get the parameter tensors, as they are not created yet. @dcslin To switch to this approach, we need to change the constructors of the existing layer classes and the examples. We also need to pass an initializer function/class into the constructor for initializing the parameter tensors after they are created.
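A rough sketch of that delayed-creation variant (the initializer argument and the lazy-creation logic below are illustrative, not existing API):

    from singa import autograd, tensor

    class LazyLinear(autograd.Layer):
        def __init__(self, out_features, bias=True, initializer=None):
            # no input size required here; parameters are created lazily
            self.out_features = out_features
            self.use_bias = bias
            self.initializer = initializer     # callable that fills a tensor in place
            self.W = None
            self.b = None

        def __call__(self, x):
            if self.W is None:
                # the input tensor supplies both the input size and the device
                in_features = x.shape[1]
                self.W = tensor.Tensor(shape=(in_features, self.out_features),
                                       device=x.device, requires_grad=True,
                                       stores_grad=True)
                # these init ops run inside __call__, which is the buffering
                # concern discussed in the comments
                (self.initializer or (lambda t: t.gaussian(0.0, 0.1)))(self.W)
                if self.use_bias:
                    self.b = tensor.Tensor(shape=(self.out_features,),
                                           device=x.device, requires_grad=True,
                                           stores_grad=True)
                    self.b.set_value(0.0)
            y = autograd.matmul(x, self.W)
            if self.use_bias:
                y = autograd.add_bias(y, self.b, axis=0)
            return y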
Please add your comments.