
Complex TransferLearning graph - no updates of first layer #4964

Closed
ospavel opened this issue Apr 23, 2018 · 3 comments · Fixed by #5009

Comments

@ospavel

commented Apr 23, 2018

DL4J 1.0.0-alpha, Native CPU

I create an RNN model, train it, and then expand it by adding a second input with RNN layers. I use LastTimeStepVertex because the RNNs of the different branches use different time series lengths and I need to merge them.
The problem is that during training, the first LSTM layer of the second (added) branch shows no changes in the Update and Ratio charts (seen in the UI).
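For context, the Update and Ratio charts were observed with the standard DL4J training UI; a minimal sketch of that wiring (assumed setup, it is not part of the code below) looks like this:

// Assumed monitoring setup (deeplearning4j-ui): attach a StatsListener so the
// training UI shows per-layer Update and Update:Parameter Ratio charts.
StatsStorage statsStorage = new InMemoryStatsStorage();
UIServer.getInstance().attach(statsStorage);
graph.setListeners(new StatsListener(statsStorage));
// During training, the chart for M2_0 stays flat while the other new layers change.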

The code

FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
        .seed(1234567)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Adam(0.1)).l2(1e-5).activation(Activation.TANH).weightInit(WeightInit.XAVIER).build();

ComputationGraphConfiguration graph1Conf = new NeuralNetConfiguration.Builder()
        .seed(12345)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .updater(new Adam(0.1)).l2(1e-5).activation(Activation.TANH).weightInit(WeightInit.XAVIER)
        .trainingWorkspaceMode(WorkspaceMode.SEPARATE).inferenceWorkspaceMode(WorkspaceMode.SINGLE)
        .graphBuilder()
        .addInputs("M1_In")
        .addLayer("M1_0", new GravesLSTM.Builder().nIn(43).nOut(8).build(), "M1_In")
        .addLayer("M1_1", new GravesLSTM.Builder().nIn(8).nOut(8).build(), "M1_0")
        .addVertex("M1_Last", new LastTimeStepVertex("M1_In"), "M1_1")
        .addLayer("M1_Out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT).activation(Activation.SOFTMAX).nIn(8).nOut(2).build(), "M1_Last")
        .setOutputs("M1_Out")
        .pretrain(false).backprop(true).build();

ComputationGraph graph1 = new ComputationGraph(graph1Conf);
graph1.init();

// ... here is training of model 1

ComputationGraph graph = new TransferLearning.GraphBuilder(graph1)
        .fineTuneConfiguration(fineTuneConf)
        // freeze "M1_Last" and every vertex feeding into it (M1_0, M1_1)
        .setFeatureExtractor("M1_Last")
        .removeVertexKeepConnections("M1_Out")
        .addInputs("M2_In")
        .addLayer("M2_0", new GravesLSTM.Builder().nIn(43).nOut(8).build(), "M2_In")
        .addLayer("M2_1", new GravesLSTM.Builder().nIn(8).nOut(8).build(), "M2_0")
        .addVertex("M2_Last", new LastTimeStepVertex("M2_In"), "M2_1")
        .addLayer("Merge", new DenseLayer.Builder().nIn(16).nOut(4).build(), "M1_Last", "M2_Last")
        .addLayer("Output", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT).activation(Activation.SOFTMAX).weightInit(WeightInit.XAVIER).nIn(4).nOut(2).build(), "Merge")
        .setOutputs("Output")
        .build();

// ... here is training of the expanded model
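A minimal sketch of how the expanded two-input graph could be fed during training (shapes and variable names are assumptions for illustration; the real data loading is omitted):

// Hypothetical shapes: each input is [miniBatch, nIn=43, timeSeriesLength],
// with a different length per branch; labels are [miniBatch, 2] for the OutputLayer.
int miniBatch = 32, m1Length = 20, m2Length = 50;
INDArray m1Features = Nd4j.rand(new int[]{miniBatch, 43, m1Length});
INDArray m2Features = Nd4j.rand(new int[]{miniBatch, 43, m2Length});
INDArray labels = Nd4j.zeros(miniBatch, 2);
labels.putColumn(0, Nd4j.ones(miniBatch, 1));   // placeholder one-hot labels

// The feature arrays must follow the graph's input order ("M1_In", "M2_In").
MultiDataSet data = new org.nd4j.linalg.dataset.MultiDataSet(
        new INDArray[]{m1Features, m2Features}, new INDArray[]{labels});
graph.fit(data);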

Model structure showing no updates for layer M2_0:
[screenshot: "structure" – computation graph layout and update charts in the training UI]

@ospavel

Author

commented Apr 23, 2018

@treo, I sent the test data to you via PM.

@raver119 added the Bug and Java labels on Apr 26, 2018

treo added a commit that referenced this issue Apr 28, 2018
Reset hitFrozen Flag
Allows training of multi-input, multi-branch computation graphs where one branch is frozen, fixes #4964
@treo


commented Apr 28, 2018

@AlexDBlack @agibsonccc

It looks like this is a legitimate bug. We currently stop calculating further gradients once the first frozen layer, according to the topological sort, is found (https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/ComputationGraph.java#L2220-L2235).

In this configuration that results in the following backward iteration order (names as in the screenshot above):
Output, Merge, Merge-merge, M2_Last, M1_Last, M2_1, M1_1, M2_0, M1_0, M2_In, M1_In

Due to the short-circuiting rules, gradient propagation stops at the first frozen layer (1.0.0-alpha):
https://github.com/deeplearning4j/deeplearning4j/blob/deeplearning4j-1.0.0-alpha/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/graph/ComputationGraph.java#L1985

In this case that is M1_1, so no gradient is calculated for M2_0. With the changes in master, the short-circuit rules have changed a bit, and just resetting the hitFrozen flag on each iteration through the vertices seems to solve the issue.

But I'm not quite sure whether that is really the proper solution, or whether the topological sort should produce a different ordering when parts of the graph are frozen.
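For illustration, here is a toy sketch (hypothetical names and structure, not the actual ComputationGraph internals) contrasting the global short circuit with a per-vertex skip of frozen layers:

import java.util.Arrays;
import java.util.List;

// Toy sketch: why aborting the backward pass at the first frozen layer drops
// the gradient for M2_0, while skipping frozen layers per vertex does not.
public class FrozenBranchSketch {

    // Reverse topological order from the comment above (merge/input vertices omitted).
    static final String[] REVERSE_TOPO =
            {"Output", "Merge", "M2_Last", "M1_Last", "M2_1", "M1_1", "M2_0", "M1_0"};

    // Layers frozen by setFeatureExtractor("M1_Last"); M1_Last itself is a vertex
    // without parameters, so only the original LSTM layers are listed here.
    static final List<String> FROZEN = Arrays.asList("M1_1", "M1_0");

    // 1.0.0-alpha behaviour: abort the whole backward pass at the first frozen layer.
    static void shortCircuit() {
        for (String v : REVERSE_TOPO) {
            if (FROZEN.contains(v)) break;      // stops at M1_1; M2_0 is never reached
            System.out.println("gradient for " + v);
        }
    }

    // Fixed behaviour: skip frozen layers but keep walking the rest of the graph.
    static void perVertexSkip() {
        for (String v : REVERSE_TOPO) {
            if (FROZEN.contains(v)) continue;   // M2_0 still gets its gradient
            System.out.println("gradient for " + v);
        }
    }

    public static void main(String[] args) {
        System.out.println("-- 1.0.0-alpha short circuit --");
        shortCircuit();
        System.out.println("-- per-vertex skip --");
        perVertexSkip();
    }
}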

@lock


commented Sep 22, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

The lock bot locked and limited conversation to collaborators on Sep 22, 2018
