-
Notifications
You must be signed in to change notification settings - Fork 833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix lightgbm stuck in multiclass scenario and added stratified repartition transformer #618
fix lightgbm stuck in multiclass scenario and added stratified repartition transformer #618
Conversation
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov Report
@@ Coverage Diff @@
## master #618 +/- ##
=========================================
+ Coverage 79.7% 79.8% +0.09%
=========================================
Files 224 225 +1
Lines 8965 9016 +51
Branches 473 474 +1
=========================================
+ Hits 7146 7195 +49
- Misses 1819 1821 +2
Continue to review full report at Codecov.
|
src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/ml/spark/stages/StratifiedRepartition.scala
Outdated
Show resolved
Hide resolved
src/main/scala/com/microsoft/ml/spark/stages/StratifiedRepartition.scala
Outdated
Show resolved
Hide resolved
labelToCount.map(lc => (lc._1, 1.0)).toMap | ||
} | ||
|
||
val spdata = dataset.toDF().rdd.keyBy(row => row.getInt(row.schema.fieldIndex(getLabelCol))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have any equivalent in data frame API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, it seems not
src/main/scala/com/microsoft/ml/spark/stages/StratifiedRepartition.scala
Outdated
Show resolved
Hide resolved
/app run |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Needs a fuzzer and also fails tests on build machine |
614dfd6
to
0cad2fd
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
15a9979
to
ea9f8a3
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
ea9f8a3
to
97f7d78
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
97f7d78
to
4eaca6c
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
4eaca6c
to
24d47a0
Compare
/azp run |
c452eba
to
f640c6c
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
185d1e9
to
77f1b0f
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/ml/spark/stages/StratifiedRepartitionSuite.scala
Outdated
Show resolved
Hide resolved
src/test/scala/com/microsoft/ml/spark/lightgbm/split1/VerifyLightGBMClassifier.scala
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should talk about this in person. I don't think it makes sense to add dummy rows into peoples datasets as this will change the computation and hurt the classifier. Instead consider throwing a helpful error message that points them in the direction of stratified sampling. Also this makes LightGBM the code more complex and less maintainable.
77f1b0f
to
ab7b6cb
Compare
@mhamilton723 discussed, that was a mode added for debugging user issues, it is off by default. By default we just fail if the user does not have all labels on all partitions for classification. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
ab7b6cb
to
4ff5a12
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
4ff5a12
to
a12d825
Compare
…ition transformer
93fe693
to
b818c4e
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
fix for issues #609 and #569