feat: add interface function for updating learning_rate per each iteration in LightGBMDelegate #849

ocworld · 2020-04-01T11:56:29Z

An interface function is added to LightGBMDelegate for updating learning_rate on each iteration.

It is needed if a researcher wants to decrease or increase the learning rate on each boosting round.
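As an illustration of the use case, here is a minimal sketch of a decaying schedule. The trait below is a hypothetical stand-in, not the real LightGBMDelegate: the hook in this PR takes more arguments (partitionId, iters, log, trainParams, learningRate). The names LearningRateSchedule, ExponentialDecay, and ScheduleDemo are invented for this example.

```scala
// Hypothetical, simplified stand-in for the delegate hook added in this PR.
trait LearningRateSchedule extends Serializable {
  // Returns the learning rate to use for the given boosting iteration.
  def getLearningRate(iteration: Int, previousLearningRate: Double): Double
}

// Keeps the initial rate on iteration 0, then multiplies by `factor` each round.
final class ExponentialDecay(factor: Double) extends LearningRateSchedule {
  override def getLearningRate(iteration: Int, previousLearningRate: Double): Double =
    if (iteration == 0) previousLearningRate else previousLearningRate * factor
}

object ScheduleDemo {
  // Collects the rate that would be used on each of `rounds` iterations.
  def rates(schedule: LearningRateSchedule, initial: Double, rounds: Int): Seq[Double] =
    (0 until rounds)
      .scanLeft(initial)((lr, i) => schedule.getLearningRate(i, lr))
      .tail
}
```

With a factor below 1 the rate shrinks every round; a factor above 1 would grow it instead.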

ocworld · 2020-04-01T11:56:43Z

@AhnLab-OSS

imatiach-msft · 2020-04-01T14:24:55Z

/azp run

azure-pipelines · 2020-04-01T14:25:06Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2020-04-01T14:31:17Z

Codecov Report

Merging #849 into master will increase coverage by 0.00%.
The diff coverage is 77.77%.

@@           Coverage Diff           @@
##           master     #849   +/-   ##
=======================================
  Coverage   85.16%   85.17%           
=======================================
  Files         186      187    +1     
  Lines        8603     8612    +9     
  Branches      508      521   +13     
=======================================
+ Hits         7327     7335    +8     
- Misses       1276     1277    +1

Impacted Files	Coverage Δ
...microsoft/ml/spark/lightgbm/LightGBMDelegate.scala	`0.00% <0.00%> (ø)`
...a/com/microsoft/ml/spark/lightgbm/TrainUtils.scala	`87.60% <100.00%> (+1.18%)`	⬆️
.../execution/streaming/continuous/HTTPSourceV2.scala	`93.04% <0.00%> (-0.74%)`	⬇️
...m/microsoft/ml/spark/lightgbm/LightGBMParams.scala	`86.74% <0.00%> (+0.55%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update be366c5...5e7151d.

ocworld · 2020-04-01T15:16:56Z

@imatiach-msft Can I see the detailed failure logs for the unit tests?

imatiach-msft · 2020-04-01T15:19:29Z

/azp run

azure-pipelines · 2020-04-01T15:19:40Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-04-01T15:22:08Z

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: org.scalatest.Assertions$AssertionsHelper
Serialization stack:
- object not serializable (class: org.scalatest.Assertions$AssertionsHelper, value: org.scalatest.Assertions$AssertionsHelper@15d041ed)
- field (class: org.scalatest.FunSuite, name: assertionsHelper, type: class org.scalatest.Assertions$AssertionsHelper)
- object (class com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier, VerifyLightGBMClassifier)
- field (class: com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21, name: $outer, type: class com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier)
- object (class com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21, )
- field (class: com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21$TrainDelegate$1, name: $outer, type: class com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21)
- object (class com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21$TrainDelegate$1, com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21$TrainDelegate$1@7e609acb)
- field (class: scala.Some, name: x, type: class java.lang.Object)
- object (class scala.Some, Some(com.microsoft.ml.spark.lightgbm.split1.VerifyLightGBMClassifier$$anonfun$21$TrainDelegate$1@7e609acb))
- field (class: com.microsoft.ml.spark.lightgbm.LightGBMClassifier, name: delegate, type: class scala.Option)
- object (class com.microsoft.ml.spark.lightgbm.LightGBMClassifier, LightGBMClassifier_58b8194d6ebe)
- field (class: com.microsoft.ml.spark.lightgbm.LightGBMBase$$anonfun$6, name: $outer, type: interface com.microsoft.ml.spark.lightgbm.LightGBMBase)
- object (class com.microsoft.ml.spark.lightgbm.LightGBMBase$$anonfun$6, )
- field (class: org.apache.spark.sql.execution.MapPartitionsExec, name: func, type: interface scala.Function1)
- object (class org.apache.spark.sql.execution.MapPartitionsExec, MapPartitions , obj#29842: com.microsoft.ml.spark.lightgbm.LightGBMBooster
+- DeserializeToObject createexternalrow(labels#29720, newInstance(class org.apache.spark.ml.linalg.VectorUDT).deserialize, StructField(labels,DoubleType,false), StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true)), obj#29841: org.apache.spark.sql.Row
+- Coalesce 2
+- *(1) Project [labels#28546 AS labels#29720, features#29074]
+- *(1) Sample 0.0, 0.8, false, 42
+- *(1) Project [labels#28546, features#29074]
+- *(1) Sort [age#27948 ASC NULLS FIRST, balance#27953 ASC NULLS FIRST, day#27957 ASC NULLS FIRST, month#27958 ASC NULLS FIRST, duration#27959 ASC NULLS FIRST, campaign#27960 ASC NULLS FIRST, pdays#27961 ASC NULLS FIRST, previous#27962 ASC NULLS FIRST, poutcome#27963 ASC NULLS FIRST, c_job#28349 ASC NULLS FIRST, c_marital#28368 ASC NULLS FIRST, c_education#28388 ASC NULLS FIRST, c_default#28409 ASC NULLS FIRST, c_housing#28431 ASC NULLS FIRST, c_loan#28454 ASC NULLS FIRST, c_contact#28478 ASC NULLS FIRST, labels#28546 ASC NULLS FIRST, features#29074 ASC NULLS FIRST], false, 0
+- InMemoryTableScan [age#27948, balance#27953, day#27957, month#27958, duration#27959, campaign#27960, pdays#27961, previous#27962, poutcome#27963, c_job#28349, c_marital#28368, c_education#28388, c_default#28409, c_housing#28431, c_loan#28454, c_contact#28478, labels#28546, features#29074]
+- InMemoryRelation [age#27948, balance#27953, day#27957, month#27958, duration#27959, campaign#27960, pdays#27961, previous#27962, poutcome#27963, c_job#28349, c_marital#28368, c_education#28388, c_default#28409, c_housing#28431, c_loan#28454, c_contact#28478, labels#28546, features#29074], StorageLevel(disk, memory, deserialized, 1 replicas)
+- *(1) Project [age#27948, balance#2795

imatiach-msft · 2020-04-01T15:23:18Z

[info] - Verify LightGBM Classifier updating learning_rate on training by using LightGBMDelegate *** FAILED ***
[info] - object (class scala.collection.immutable.List$SerializationProxy, scala.collection.immutable.List$SerializationProxy@3d4ebb56)
[info] - writeReplace data (class: scala.collection.immutable.List$SerializationProxy)
[info] - object (class scala.collection.immutable.$colon$colon, List(org.apache.spark.OneToOneDependency@230f8966))
[info] - field (class: org.apache.spark.rdd.RDD, name: org$apache$spark$rdd$RDD$$dependencies_, type: interface scala.collection.Seq)
[info] - object (class org.apache.spark.rdd.MapPartitionsRDD, MapPartitionsRDD[4766] at reduce at LightGBMBase.scala:160)
[info] - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
[info] - object (class scala.Tuple2, (MapPartitionsRDD[4766] at reduce at LightGBMBase.scala:160,))
[info] at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
[info] at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
[info] at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
[info] at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1155)
[info] at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:1069)
[info] at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1013)
[info] at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2067)
[info] at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
[info] at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
[info] at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[info] ...

imatiach-msft · 2020-04-01T15:24:34Z

@ocworld it looks like there was a serialization failure in the test case, will take a look and try to run locally when I get a chance

imatiach-msft · 2020-04-01T15:26:01Z

src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala

    while (!isFinished && iters < trainParams.numIterations) {

      if (delegate.isDefined) {
        delegate.get.beforeTrainIteration(partitionId, iters, log, trainParams, boosterPtr, hasValid)
+        val newLearningRate = delegate.get.getLearningRate(partitionId, iters, log, trainParams, learningRate)
+        if (newLearningRate != learningRate) {
+          log.info(s"LightGBM worker calling LGBM_BoosterResetParameter to reset learningRate" +


nice logging!
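For readers following the diff, the control flow it adds can be sketched in isolation. This is a simplified model under stated assumptions, not the code in TrainUtils.scala: RateDelegate and TrainLoopSketch are hypothetical names, and the native booster call (the log message in the diff names LGBM_BoosterResetParameter) is replaced by recording the reset in a Vector.

```scala
// Hypothetical, simplified delegate; the real hook takes more arguments.
trait RateDelegate extends Serializable {
  def getLearningRate(iteration: Int, previousLearningRate: Double): Double
}

object TrainLoopSketch {
  // Runs `numIterations` rounds and returns every (iteration, newRate) reset applied.
  def run(delegate: Option[RateDelegate],
          initialRate: Double,
          numIterations: Int): Vector[(Int, Double)] = {
    var learningRate = initialRate
    var resets = Vector.empty[(Int, Double)]
    var iters = 0
    while (iters < numIterations) {
      delegate.foreach { d =>
        val newLearningRate = d.getLearningRate(iters, learningRate)
        // Only touch the booster when the delegate actually changed the rate.
        if (newLearningRate != learningRate) {
          // Assumption: the real code resets the booster parameter here.
          resets = resets :+ ((iters, newLearningRate))
          learningRate = newLearningRate
        }
      }
      iters += 1
    }
    resets
  }
}
```

Comparing against the current rate before resetting keeps iterations where the delegate returns the same value free of extra work and log noise.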

imatiach-msft · 2020-04-01T15:27:44Z

src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala

@@ -360,6 +362,47 @@ class VerifyLightGBMClassifier extends Benchmarks with EstimatorFuzzing[LightGBM
    assert(metric > 0.8)
  }

+  test("Verify LightGBM Classifier updating learning_rate on training by using LightGBMDelegate") {
+
+    class TrainDelegate extends LightGBMDelegate {


maybe the test is failing because this class is not serializable?

imatiach-msft · 2020-04-01T15:29:28Z

maybe add this to the class:

https://alvinalexander.com/scala/how-to-use-serialization-in-scala-serializable-trait/

give it a @SerialVersionUID(100L) and "extends Serializable"

imatiach-msft · 2020-04-01T15:38:35Z

/azp run

azure-pipelines · 2020-04-01T15:38:45Z

Azure Pipelines successfully started running 1 pipeline(s).

ocworld · 2020-04-01T15:47:14Z

@imatiach-msft Thanks for your comments and logs.
LightGBMDelegate already extends Serializable.

So, two things are fixed.

@SerialVersionUID(100L), which you mentioned, is now written in my code.
In the unit test, the TrainDelegate class was an inner class. I assume that the outer class, VerifyLightGBMClassifier, is not serializable. When I tested with my local sample program, this case did not cause a serialization error.
So, TrainDelegate is changed to an independent class.
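The inner-class explanation can be checked with plain Java serialization. The sketch below is an illustration with hypothetical names (DelegateLike, Suite, TopLevelDelegate, SerializationCheck), not the test code from this PR: an inner class that uses a member of its enclosing instance keeps a hidden reference to it, so serializing the inner object also tries to serialize the outer one.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a serializable delegate interface.
trait DelegateLike extends Serializable {
  def describe: String
}

// Stand-in for a test suite: deliberately NOT Serializable.
class Suite {
  val name = "suite"

  // Inner class: it reads `name`, so each instance holds a reference to its Suite.
  class InnerDelegate extends DelegateLike {
    def describe: String = s"delegate of $name"
  }

  def makeInner(): DelegateLike = new InnerDelegate
}

// Independent class: no reference to any enclosing instance.
@SerialVersionUID(100L)
class TopLevelDelegate extends DelegateLike {
  def describe: String = "standalone delegate"
}

object SerializationCheck {
  // True if Java serialization of `value` succeeds, false if some
  // reachable object is not serializable.
  def canSerialize(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(value); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }
}
```

When these definitions are compiled as ordinary top-level classes, `canSerialize(new Suite().makeInner())` is expected to return false and `canSerialize(new TopLevelDelegate)` true, which matches the fix of moving TrainDelegate out of the test class.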

imatiach-msft · 2020-04-01T18:25:00Z

/azp run

azure-pipelines · 2020-04-01T18:25:11Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-04-01T21:26:42Z

/azp run

azure-pipelines · 2020-04-01T21:26:53Z

Azure Pipelines successfully started running 1 pipeline(s).

ocworld · 2020-04-02T11:52:33Z

@imatiach-msft Can I see the error logs for Azure.mmlspark (UnitTests cognitive) and Azure.mmlspark (E2E)?

imatiach-msft · 2020-04-02T14:19:26Z

/azp run

azure-pipelines · 2020-04-02T14:19:37Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2020-04-02T14:20:31Z

@ocworld it looks like some new test infrastructure issue, let me see if @mhamilton723 might take a look since he is more familiar with the cognitive tests

imatiach-msft · 2020-04-07T01:59:52Z

/azp run

azure-pipelines · 2020-04-07T02:00:05Z

Azure Pipelines successfully started running 1 pipeline(s).

…ation in LightGBMDelegate (microsoft#849)

* feat: add update learning_rate by using LightGBMDelegate
* feat: add update learning_rate by using LightGBMDelegate
* feat: add update learning_rate by using LightGBMDelegate
* feat: add update learning_rate by using LightGBMDelegate
* fix minor
* fix serialization error
* fix serialization error
* change LightGBMDelegate to trait for scala style
* change LightGBMDelegate to trait for scala style
* change LightGBMDelegate to trait for scala style

Keunhyun Oh added 2 commits April 1, 2020 20:25

feat: add update learning_rate by using LightGBMDelegate

7829a95

feat: add update learning_rate by using LightGBMDelegate

bd6800f

ocworld requested a review from imatiach-msft as a code owner April 1, 2020 11:56

Keunhyun Oh added 2 commits April 1, 2020 21:02

feat: add update learning_rate by using LightGBMDelegate

e99e2bd

feat: add update learning_rate by using LightGBMDelegate

c937457

fix minor

c9194a8

imatiach-msft previously approved these changes Apr 1, 2020

imatiach-msft reviewed Apr 1, 2020

fix serialization error

affb069

ocworld dismissed imatiach-msft’s stale review via affb069 April 1, 2020 15:33

fix serialization error

fa62274

Keunhyun Oh added 3 commits April 2, 2020 21:13

change LightGBMDelegate to trait for scala style

280157c

change LightGBMDelegate to trait for scala style

31d40bd

change LightGBMDelegate to trait for scala style

5e7151d

imatiach-msft approved these changes Apr 2, 2020

imatiach-msft merged commit 4d99879 into microsoft:master Apr 7, 2020

ocworld deleted the feat-add-update-learningrate branch April 8, 2020 00:06

ocworld mentioned this pull request Apr 8, 2020

feat: add interface function for updating learning_rate per each iter… AhnLab-OSS/mmlspark#3

Merged
