fix lightgbm stuck in multiclass scenario and added stratified repartition transformer #618

imatiach-msft · 2019-07-14T04:00:21Z

fix for issues #609 and #569

imatiach-msft · 2019-07-14T04:18:54Z

/azp run

azure-pipelines · 2019-07-14T04:19:03Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov-io · 2019-07-14T04:47:24Z

Codecov Report

Merging #618 into master will increase coverage by 0.09%.
The diff coverage is 96.22%.

@@            Coverage Diff            @@
##           master    #618      +/-   ##
=========================================
+ Coverage    79.7%   79.8%   +0.09%     
=========================================
  Files         224     225       +1     
  Lines        8965    9016      +51     
  Branches      473     474       +1     
=========================================
+ Hits         7146    7195      +49     
- Misses       1819    1821       +2

| Impacted Files | Coverage Δ | |
|---|---|---|
| .../com/microsoft/ml/spark/lightgbm/TrainParams.scala | `100% <ø> (ø)` | ⬆️ |
| ...a/com/microsoft/ml/spark/lightgbm/TrainUtils.scala | `90.62% <100%> (+0.67%)` | ⬆️ |
| ...crosoft/ml/spark/lightgbm/LightGBMClassifier.scala | `88.09% <100%> (+0.75%)` | ⬆️ |
| ...icrosoft/ml/spark/lightgbm/LightGBMConstants.scala | `100% <100%> (ø)` | ⬆️ |
| ...rosoft/ml/spark/stages/StratifiedRepartition.scala | `93.1% <93.1%> (ø)` | |
| ...osoft/ml/spark/io/http/PartitionConsolidator.scala | `93.33% <0%> (-2.23%)` | ⬇️ |
| src/main/python/mmlspark/stages/__init__.py | `100% <0%> (ø)` | ⬆️ |
| ...om/microsoft/ml/spark/lightgbm/LightGBMUtils.scala | `96.47% <0%> (+1.17%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 258cafb...4ff5a12.

mhamilton723 · 2019-07-14T12:43:29Z

src/main/scala/com/microsoft/ml/spark/stages/StratifiedRepartition.scala

+        labelToCount.map(lc => (lc._1, 1.0)).toMap
+      }
+
+    val spdata = dataset.toDF().rdd.keyBy(row => row.getInt(row.schema.fieldIndex(getLabelCol)))


Does this have an equivalent in the DataFrame API?

hmm, it seems not
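
For readers following the thread, the idea behind keying rows by label can be sketched without Spark. This is a minimal illustration only, not the code from this PR: plain Scala collections stand in for partitions, and the names `StratifiedDealSketch` and `stratify` are hypothetical. It groups rows by an integer label and deals each group round-robin across the target partitions, so every partition receives each label as long as that label has at least as many rows as there are partitions.

```scala
// Hypothetical sketch: plain Scala collections stand in for Spark partitions.
// This is NOT the StratifiedRepartition implementation from the PR.
object StratifiedDealSketch {

  /** Deal rows into `numPartitions` buckets, spreading each label round-robin. */
  def stratify[A](rows: Seq[A], numPartitions: Int)(label: A => Int): Vector[Vector[A]] = {
    require(numPartitions > 0, "numPartitions must be positive")

    // Key every row by its label, analogous to the keyBy step in the diff above.
    val byLabel: Map[Int, Seq[A]] = rows.groupBy(label)

    // Within each label group, assign row i to partition (i % numPartitions).
    val assigned: Seq[(Int, A)] = byLabel.toSeq.sortBy(_._1).flatMap { case (_, group) =>
      group.zipWithIndex.map { case (row, i) => (i % numPartitions, row) }
    }

    // Collect the rows assigned to each partition index.
    Vector.tabulate(numPartitions) { p =>
      assigned.filter(_._1 == p).map(_._2).toVector
    }
  }
}
```

A label with fewer rows than partitions cannot reach every partition under this scheme; covering that case would need sampling with replacement or fewer partitions.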

mhamilton723 · 2019-07-14T12:47:30Z

/app run

mhamilton723 · 2019-07-14T12:48:17Z

/azp run

azure-pipelines · 2019-07-14T12:48:25Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2019-07-14T13:49:33Z

Needs a fuzzer and also fails tests on build machine

imatiach-msft · 2019-07-19T03:59:50Z

/azp run

azure-pipelines · 2019-07-19T03:59:58Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2019-07-19T13:33:47Z

/azp run

azure-pipelines · 2019-07-19T13:33:59Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-13T03:38:03Z

/azp run

azure-pipelines · 2019-08-13T03:38:14Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-13T03:39:44Z

/azp run

azure-pipelines · 2019-08-13T03:39:55Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-13T04:07:52Z

/azp run

azure-pipelines · 2019-08-13T04:08:04Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-15T03:41:48Z

/azp run

imatiach-msft · 2019-08-19T05:00:36Z

/azp run

azure-pipelines · 2019-08-19T05:00:51Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-19T05:54:01Z

/azp run

azure-pipelines · 2019-08-19T05:54:16Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-19T10:04:56Z

/azp run

azure-pipelines · 2019-08-19T10:05:13Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-19T22:09:44Z

/azp run

azure-pipelines · 2019-08-19T22:10:00Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723

We should talk about this in person. I don't think it makes sense to add dummy rows to people's datasets, as this will change the computation and hurt the classifier. Instead, consider throwing a helpful error message that points them in the direction of stratified sampling. Also, this makes the LightGBM code more complex and less maintainable.

imatiach-msft · 2019-08-20T02:11:09Z

@mhamilton723 As discussed, that was a mode added for debugging user issues; it is off by default. By default, we just fail if the user does not have all labels on all partitions for classification.
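
The default "fail when a partition is missing labels" behavior described above could look roughly like the following check. This is a sketch under stated assumptions, not the PR's code: partitions are modeled as plain sequences of integer labels, labels are assumed to be `0 until numClasses`, and `LabelCoverageCheck`, `missingLabels`, and `validate` are hypothetical names.

```scala
// Hypothetical sketch of a label-coverage check; not the code from this PR.
// Assumes integer labels in the range 0 until numClasses.
object LabelCoverageCheck {

  /** Returns, per partition index, the labels that partition does not contain. */
  def missingLabels(partitions: Seq[Seq[Int]], numClasses: Int): Map[Int, Set[Int]] = {
    val expected = (0 until numClasses).toSet
    partitions.zipWithIndex.flatMap { case (labels, idx) =>
      val missing = expected -- labels
      if (missing.isEmpty) None else Some(idx -> missing)
    }.toMap
  }

  /** Throws a descriptive error if any partition lacks a label. */
  def validate(partitions: Seq[Seq[Int]], numClasses: Int): Unit = {
    val missing = missingLabels(partitions, numClasses)
    if (missing.nonEmpty) {
      val detail = missing.toSeq.sortBy(_._1).map { case (idx, ls) =>
        s"partition $idx lacks labels ${ls.toSeq.sorted.mkString(",")}"
      }.mkString("; ")
      throw new IllegalArgumentException(
        s"Not every partition contains every label ($detail). " +
          "Consider a stratified repartition of the data before training.")
    }
  }
}
```

Failing fast with a message that names the affected partitions follows the reviewer's suggestion of a helpful error that points toward stratified sampling, instead of silently altering the data.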

imatiach-msft · 2019-08-20T02:11:13Z

/azp run

azure-pipelines · 2019-08-20T02:11:29Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-20T02:53:28Z

/azp run

azure-pipelines · 2019-08-20T02:53:44Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-20T03:14:45Z

/azp run

azure-pipelines · 2019-08-20T03:15:01Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-08-20T04:01:43Z

/azp run

azure-pipelines · 2019-08-20T04:01:59Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft requested review from drdarshan and mhamilton723 as code owners July 14, 2019 04:00

imatiach-msft mentioned this pull request Jul 14, 2019

java.net.ConnectException: Connection refused (Connection refused) with LightGBMClassifier in Databricks #609

Closed

mhamilton723 requested changes Jul 14, 2019

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch 2 times, most recently from 614dfd6 to 0cad2fd Compare July 19, 2019 03:59

imatiach-msft mentioned this pull request Jul 19, 2019

LightGBMClassifier, LightGBMRegressor hang indefinitely without error at fit() #623

Open

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch 2 times, most recently from 15a9979 to ea9f8a3 Compare August 13, 2019 03:35

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from ea9f8a3 to 97f7d78 Compare August 13, 2019 03:39

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from 97f7d78 to 4eaca6c Compare August 13, 2019 03:55

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from 4eaca6c to 24d47a0 Compare August 15, 2019 03:41

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from c452eba to f640c6c Compare August 19, 2019 04:53

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch 2 times, most recently from 185d1e9 to 77f1b0f Compare August 19, 2019 05:53

mhamilton723 reviewed Aug 20, 2019

src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala (outdated)

mhamilton723 reviewed Aug 20, 2019

src/test/scala/com/microsoft/ml/spark/stages/StratifiedRepartitionSuite.scala (outdated)

mhamilton723 reviewed Aug 20, 2019

src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala

mhamilton723 requested changes Aug 20, 2019

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from 77f1b0f to ab7b6cb Compare August 20, 2019 02:10

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from ab7b6cb to 4ff5a12 Compare August 20, 2019 03:14

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from 4ff5a12 to a12d825 Compare August 20, 2019 04:00

fix lightgbm stuck in multiclass scenario and added stratified repartition transformer (b818c4e)

imatiach-msft force-pushed the ilmat/lgbm-multiclass-stuck branch from 93fe693 to b818c4e Compare August 20, 2019 04:01

mhamilton723 merged commit d518b8a into microsoft:master Aug 20, 2019
