
Schedules and Dropout #3985

Merged
50 commits, merged Sep 13, 2017

Conversation

@AlexDBlack (Contributor) commented Aug 31, 2017

WIP DO NOT MERGE

  • Adds ISchedule interface: a general-purpose schedule mechanism (to be used for LR schedules, dropout schedules, momentum schedules, etc.), based on iteration or epoch counts; see the sketch after this list
    • Adds the existing schedule types as classes (see UpdaterBlock for the existing set), with support for both iteration- and epoch-based variants
    • Pushes LR and LR schedule config into the updaters; .learningRate() and similar methods will be deprecated
  • Refactors dropout into classes, adds:
    • Alpha dropout
    • Gaussian dropout
    • Gaussian noise
    • Schedules for dropout parameters, based on iteration and/or epoch counts
    • Support for custom dropout types/classes (user defined)
  • Removes a bunch of deprecated code (including the old HistogramIterationListener, etc.)
  • Introduces IWeightNoise interface
    • DropConnect is now implemented using this (enabling dropout and DropConnect simultaneously, as well as DropConnect schedules)
    • Adds additive and multiplicative weight noise implementations too
    • Could in principle be used to support things like weight binarization, etc (not implemented)
  • Regarding backward compatibility (at least so far)
    • Updaters, dropout and dropconnect still work back to 0.5.0
    • Not properly supported: LR schedules
  • Additional smaller changes:
    • Adds a GraphBuilder .layer(...) method as an alias for addLayer(...), matching the ListBuilder's .layer(...) convention
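A minimal sketch of how these pieces fit together (imports and package locations omitted; class and method names follow the description above, but exact signatures may differ from what is finally merged):

    // LR schedule keyed by iteration; the values here are illustrative only
    Map<Integer, Double> lrSchedule = new HashMap<>();
    lrSchedule.put(0, 1e-2);     // iterations [0, 1000): lr = 0.01
    lrSchedule.put(1000, 1e-3);  // iteration 1000 onwards: lr = 0.001

    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            // LR and LR schedule now live on the updater, via ISchedule:
            .updater(new Adam(new MapSchedule(ScheduleType.ITERATION, lrSchedule)))
            // Dropout as a class; AlphaDropout/GaussianDropout/GaussianNoise are alternatives:
            .dropOut(new GaussianDropout(0.05))
            // DropConnect as an IWeightNoise implementation (usable alongside dropout):
            .weightNoise(new DropConnect(0.5))
            .list()
            .layer(0, new OutputLayer.Builder().nIn(4).nOut(3).build())
            .build();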

To do before this is ready for review/merging:

  • Ensure all tests pass

Breaking changes of note here (a migration sketch follows this list):

  • Removes all deprecated updater config options (.epsilon(double), etc.); set these directly on the IUpdater instance instead
  • Removes the .learningRate(double) and .biasLearningRate(double) methods; use .updater(new Sgd(lr)) and .biasUpdater(new Sgd(lr)) instead
  • Removes the old learning rate schedule methods; use .updater(new Adam(new MapSchedule(...))) etc.
  • Removes .useDropConnect(boolean) - use .weightNoise(new DropConnect(double))
  • Removes .activation(String) method (long deprecated)
  • Removes a bunch of deprecated Layer (layer implementation, not config) methods
  • Removes recently deprecated (and no longer used) .regularization(boolean) method
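For migration, a hedged before/after sketch of an equivalent configuration (method names taken from the list above; check the merged code for exact signatures):

    // Before (0.9.x style), now removed:
    NeuralNetConfiguration.Builder before = new NeuralNetConfiguration.Builder()
            .learningRate(0.01)
            .biasLearningRate(0.02)
            .updater(Updater.SGD)
            .useDropConnect(true);

    // After this PR:
    NeuralNetConfiguration.Builder after = new NeuralNetConfiguration.Builder()
            .updater(new Sgd(0.01))             // replaces .learningRate(double) + Updater.SGD
            .biasUpdater(new Sgd(0.02))         // replaces .biasLearningRate(double)
            .weightNoise(new DropConnect(0.5)); // replaces .useDropConnect(boolean)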

Related issues:
#3739
#1846
#3892
#3816
#3776
#3700

@maxpumperla left a comment

Just minor things, if any. I really like the new API; totally worth it. It's just a pity that GitHub starts to collapse the files to review past a certain point... very inconvenient.


public class TestUtils {

public static void testModelSerialization(MultiLayerNetwork net){

@maxpumperla commented Sep 11, 2017

+1 for this. There should be quite a few more cases where we could use this, no? Feels like I've seen this 100 times.

@AlexDBlack (Author, Contributor) replied Sep 12, 2017

Yeah, maybe I'll do a pass and consolidate... there's a bunch of equivalent implementations of this test code in many tests.

@AlexDBlack (Author, Contributor) replied Sep 12, 2017

Done - 67ed41c
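For reference, a minimal sketch of what such a consolidated helper could look like (a ModelSerializer round trip; the actual helper added in 67ed41c may differ):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.util.ModelSerializer;
    import static org.junit.Assert.assertEquals;

    public class TestUtils {

        // Serialize + deserialize the network, then check that configuration
        // and parameters survive the round trip unchanged.
        public static void testModelSerialization(MultiLayerNetwork net) {
            try {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                ModelSerializer.writeModel(net, baos, true);   // true: save updater state
                ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
                MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(bais, true);

                assertEquals(net.getLayerWiseConfigurations(), restored.getLayerWiseConfigurations());
                assertEquals(net.params(), restored.params());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }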

@@ -47,7 +48,7 @@
 public void testEarlyStoppingIris() {
     MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
             .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
-            .updater(Updater.SGD).weightInit(WeightInit.XAVIER).list()
+            .updater(new Sgd(0.001)).weightInit(WeightInit.XAVIER).list()

@maxpumperla commented Sep 11, 2017

Cool, I really like this!

@@ -54,8 +54,8 @@ public void testGradient2dSimple() {
     INDArray labels = ds.getLabels();

     MultiLayerConfiguration.Builder builder =
-            new NeuralNetConfiguration.Builder().learningRate(1.0).regularization(false)
-                    .updater(Updater.NONE).seed(12345L).weightInit(WeightInit.DISTRIBUTION)
+            new NeuralNetConfiguration.Builder().updater(new NoOp())

@maxpumperla commented Sep 11, 2017

NoOp is a little cryptic, I wouldn't get the purpose if this were to pop up in my IDE... maybe NoOptimizer or even None? What do you think?

@AlexDBlack (Author, Contributor) replied Sep 12, 2017

Previously it was possible to do .updater(Updater.NONE) which was reasonably clear (it mapped to NoOp behind the scenes).
NoOpUpdater? NoUpdater? NullUpdater? Not sure on the best name here...
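(For context: the no-op updater applies the raw gradients as the update, with no learning rate, momentum, or other transformation, which is exactly what numerical gradient checks require. A minimal usage sketch matching the diff above:)

    // Gradient-check config: the updater must not transform gradients
    NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
            .updater(new NoOp())   // gradients applied as-is (was .updater(Updater.NONE))
            .seed(12345L);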

@@ -231,43 +238,6 @@ public void testOutputOrderDoesntChangeWhenCloning() {
     assertEquals(json, jsonCloned);
 }

-@Test
-public void testBiasLr() {
-    //setup the network

@maxpumperla commented Sep 11, 2017

If this is gone, what is it replaced by?

@AlexDBlack (Author, Contributor) replied Sep 12, 2017

The test, or the configuration?
For config, it's .biasLearningRate(double) -> .biasUpdater(IUpdater).
There are other tests that check the bias updater config.
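A hypothetical one-line illustration of that mapping (the updater choice here is arbitrary):

    // Before: .learningRate(1e-2).biasLearningRate(0.5)
    // After:
    NeuralNetConfiguration.Builder b = new NeuralNetConfiguration.Builder()
            .updater(new Adam(1e-2))      // learning rate for all parameters...
            .biasUpdater(new Adam(0.5));  // ...overridden for bias parameters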

@@ -347,17 +349,17 @@ public void testBiasLr() {
     org.deeplearning4j.nn.conf.layers.BaseLayer l2 = (BaseLayer) conf.getConf(2).getLayer();
     org.deeplearning4j.nn.conf.layers.BaseLayer l3 = (BaseLayer) conf.getConf(3).getLayer();

-    assertEquals(0.5, l0.getBiasLearningRate(), 1e-6);
-    assertEquals(1e-2, l0.getLearningRate(), 1e-6);
+    assertEquals(0.5, ((Adam)l0.getIUpdaterByParam("b")).getLearningRate(), 1e-6);

@maxpumperla commented Sep 11, 2017

Even if what's returned is an IUpdater, I'd call this accessor getUpdaterByParam instead of getIUpdaterByParam.

@AlexDBlack (Author, Contributor) replied Sep 12, 2017
@Deprecated
protected double adamMeanDecay;
@Deprecated
protected double adamVarDecay;

@maxpumperla commented Sep 11, 2017

+1 for getting rid of all the clutter

@huitseeker left a comment

Did you test compiling on Spark 2? It seems the imports you removed in the dl4j-spark module could have been correctly shadowing invalid alternatives; without them, Vector apparently resolves to scala.collection.immutable.Vector, hence the "type Vector takes type parameters" errors below.

I get:

[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/scala/org/deeplearning4j/spark/ml/impl/AutoEncoderWrapper.scala:119: error: not found: type UserDefinedFunction
[ERROR]     protected def udfTransformer : UserDefinedFunction
[ERROR]                                    ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:58: error: type Vector takes type parameters
[ERROR]     extends SparkDl4jModelWrapper[Vector, SparkDl4jModel](uid, network, multiLayerConfiguration) {
[ERROR]                                   ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/AutoEncoder.scala:20: error: type Vector takes type parameters
[ERROR]     override def mapVectorFunc = row => org.apache.spark.mllib.linalg.Vectors.fromML(row.get(0).asInstanceOf[Vector])
[ERROR]                                                                                                              ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/AutoEncoder.scala:36: error: not found: value VectorType
[ERROR]         SchemaUtils.appendColumn(schema, $(outputCol), VectorType, false)
[ERROR]                                                        ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/AutoEncoder.scala:43: error: kinds of the type arguments (Vector,Vector) do not conform to the expected kinds of the type parameters (type RT,type A1).
[ERROR] Vector's type parameters do not match type RT's expected parameters:
[ERROR] type Vector has one type parameter, but type RT has none, Vector's type parameters do not match type A1's expected parameters:
[ERROR] type Vector has one type parameter, but type A1 has none
[ERROR]     override def udfTransformer = udf[Vector, Vector](vec => {
[ERROR]                                      ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/AutoEncoder.scala:51: error: not found: value Vectors
[ERROR]         Vectors.dense(values)
[ERROR]         ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/AutoEncoder.scala:78: error: not found: value VectorType
[ERROR]         SchemaUtils.appendColumn(schema, $(outputCol), VectorType, false)
[ERROR]                                                        ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:25: error: type Vector takes type parameters
[ERROR]     extends SparkDl4jNetworkWrapper[Vector, SparkDl4jNetwork, SparkDl4jModel](
[ERROR]                                     ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:38: error: kinds of the type arguments (Vector) do not conform to the expected kinds of the type parameters (type T).
[ERROR] Vector's type parameters do not match type T's expected parameters:
[ERROR] type Vector has one type parameter, but type T has none
[ERROR]     override val mapVectorFunc: Row => LabeledPoint = row => new LabeledPoint(row.getAs[Double]($(labelCol)), Vectors.fromML(row.getAs[Vector]($(featuresCol))))
[ERROR]                                                                                                                                       ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:69: error: type Vector takes type parameters
[ERROR]     override def predict(features: Vector) : Double = {
[ERROR]                                    ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:78: error: type Vector takes type parameters
[ERROR]     def output(vector: Vector): Vector = org.apache.spark.ml.linalg.Vectors.dense(super.output(Vectors.fromML(vector)).toArray)
[ERROR]                                 ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:78: error: type Vector takes type parameters
[ERROR]     def output(vector: Vector): Vector = org.apache.spark.ml.linalg.Vectors.dense(super.output(Vectors.fromML(vector)).toArray)
[ERROR]                        ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:80: error: type Vector takes type parameters
[ERROR]     def outputFlattenedTensor(vector: Vector) : Vector = org.apache.spark.ml.linalg.Vectors.dense(super.outputFlattenedTensor(Vectors.fromML(vector)).toArray)
[ERROR]                                                 ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:80: error: type Vector takes type parameters
[ERROR]     def outputFlattenedTensor(vector: Vector) : Vector = org.apache.spark.ml.linalg.Vectors.dense(super.outputFlattenedTensor(Vectors.fromML(vector)).toArray)
[ERROR]                                       ^
[ERROR] /home/huitseeker/DL4J/deeplearning4j/deeplearning4j-scaleout/spark/dl4j-spark-ml/src/main/spark-2/scala/org/deeplearning4j/spark/ml/impl/SparkDl4jNetwork.scala:82: error: type Vector takes type parameters
[ERROR]     def outputTensor(vector: Vector) : INDArray = super.outputTensor(Vectors.fromML(vector))
[ERROR]                              ^
[ERROR] 15 errors found

Full log: https://gist.github.com/36f9a9dec93ee3d73465ce90c9a16196

Diff showing the switch to Spark 2:
https://gist.github.com/753f581113309df827ec639116b76276

@AlexDBlack force-pushed the ab_dropout branch from 1e2816e to c99b0f6 on Sep 13, 2017

@AlexDBlack (Contributor, Author) commented Sep 13, 2017

@huitseeker Thanks, we can blame IntelliJ's "optimize imports" for that :/
Confirmed fixed now.
https://gist.github.com/AlexDBlack/40c72fd524a81b00d5e6193e534b5f9c
