Complex TransferLearning graph - no updates of first layer #4964
DL4J 1.0.0-alpha, Native CPU
I create an RNN model, train it, and then expand it by adding a second input with its own RNN layers. I use LastTimeStepVertex because the RNNs of the different branches use time series of different lengths and I need to merge their outputs.
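For context, a minimal sketch of such a two-branch graph (a configuration fragment, not the reporter's actual model; layer names, sizes, and the freeze point are assumptions for illustration):

```java
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.graph.MergeVertex;
import org.deeplearning4j.nn.conf.graph.rnn.LastTimeStepVertex;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.OutputLayer;

// Two RNN branches over time series of different lengths, reduced to their
// last time step and merged. Freezing one branch (e.g. via
// TransferLearning.GraphBuilder.setFeatureExtractor) triggers the bug below.
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .graphBuilder()
        .addInputs("in1", "in2")
        .addLayer("rnn1", new LSTM.Builder().nIn(5).nOut(10).build(), "in1")
        .addLayer("rnn2", new LSTM.Builder().nIn(3).nOut(10).build(), "in2")
        // LastTimeStepVertex takes the input name whose mask determines the last step
        .addVertex("last1", new LastTimeStepVertex("in1"), "rnn1")
        .addVertex("last2", new LastTimeStepVertex("in2"), "rnn2")
        .addVertex("merge", new MergeVertex(), "last1", "last2")
        .addLayer("out", new OutputLayer.Builder().nIn(20).nOut(2).build(), "merge")
        .setOutputs("out")
        .build();
```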
It looks like this is a legitimate bug. We currently abort calculating further gradients once the first frozen layer (according to the topological sort) is found (https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/ComputationGraph.java#L2220-L2235).
In this configuration, this results in the following iteration order (names as in the screenshot above):
Due to short circuiting rules, gradient propagation stops at the first frozen layer (1.0.0-alpha):
In this case that is M1_1, so no gradient is calculated for M2_0. With the changes on master, the short-circuit rules have changed a bit, and simply resetting the hitFrozen flag on each iteration through the vertices seems to solve the issue.
But I'm not quite sure whether that is really the proper solution, or whether the topological sort should produce a different ordering when parts of the graph are frozen.
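The short-circuit behaviour described above can be sketched with a toy simulation (plain Java, not the real `ComputationGraph` code; the vertex names and the `skipFrozen` variant are assumptions illustrating the effect of resetting the hitFrozen flag per vertex):

```java
import java.util.*;

public class FrozenShortCircuit {

    // Buggy rule (1.0.0-alpha): once a frozen vertex is hit in the reverse
    // topological order, gradient calculation for ALL remaining vertices stops,
    // including vertices belonging to a different, unfrozen branch.
    static List<String> shortCircuit(List<String> reverseOrder, Set<String> frozen) {
        List<String> updated = new ArrayList<>();
        for (String v : reverseOrder) {
            if (frozen.contains(v)) break;   // hitFrozen: abort remaining gradients
            updated.add(v);
        }
        return updated;
    }

    // Proposed behaviour: skip frozen vertices (flag reset per vertex) so other
    // branches still receive their gradients.
    static List<String> skipFrozen(List<String> reverseOrder, Set<String> frozen) {
        List<String> updated = new ArrayList<>();
        for (String v : reverseOrder) {
            if (frozen.contains(v)) continue; // frozen: no update, but keep going
            updated.add(v);
        }
        return updated;
    }

    public static void main(String[] args) {
        // Hypothetical reverse topological order with M1_1 frozen
        List<String> order = Arrays.asList("out", "merge", "M1_1", "M2_0");
        Set<String> frozen = Collections.singleton("M1_1");
        System.out.println(shortCircuit(order, frozen)); // M2_0 never gets a gradient
        System.out.println(skipFrozen(order, frozen));   // M2_0 is still updated
    }
}
```

With the buggy rule the iteration stops at M1_1 and M2_0 is never reached, which matches the missing updates to the first layer reported above.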