Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add predict_disable_shape_check in LightGBM #1273

Merged

Conversation

nhymxu
Copy link
Contributor

@nhymxu nhymxu commented Nov 24, 2021

Resolve #957
Resolve #1159

@ghost
Copy link

ghost commented Nov 24, 2021

CLA assistant check
All CLA requirements met.

@nhymxu nhymxu marked this pull request as ready for review November 24, 2021 13:28
@nhymxu
Copy link
Contributor Author

nhymxu commented Nov 24, 2021

/azp run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1273 in repo microsoft/SynapseML

@mhamilton723
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2021

Codecov Report

Merging #1273 (307e1c0) into master (175fbc5) will increase coverage by 0.02%.
The diff coverage is 81.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1273      +/-   ##
==========================================
+ Coverage   83.13%   83.16%   +0.02%     
==========================================
  Files         300      300              
  Lines       13699    13709      +10     
  Branches      660      656       -4     
==========================================
+ Hits        11389    11401      +12     
+ Misses       2310     2308       -2     
Impacted Files Coverage Δ
...htgbm/src/main/python/synapse/ml/lightgbm/mixin.py 41.93% <20.00%> (-4.22%) ⬇️
...azure/synapse/ml/lightgbm/LightGBMClassifier.scala 91.11% <100.00%> (ø)
...oft/azure/synapse/ml/lightgbm/LightGBMRanker.scala 64.17% <100.00%> (ø)
.../azure/synapse/ml/lightgbm/LightGBMRegressor.scala 74.13% <100.00%> (ø)
.../synapse/ml/lightgbm/booster/LightGBMBooster.scala 89.70% <100.00%> (ø)
...re/synapse/ml/lightgbm/params/LightGBMParams.scala 84.46% <100.00%> (+0.29%) ⬆️
...crosoft/azure/synapse/ml/io/http/HTTPClients.scala 86.66% <0.00%> (+10.00%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 175fbc5...307e1c0. Read the comment docs.

@nhymxu
Copy link
Contributor Author

nhymxu commented Nov 26, 2021

@mhamilton723
I can't see detail on Failing CI.
Can you give me some screenshot? So I can fix it

@mhamilton723
Copy link
Collaborator

mhamilton723 commented Nov 30, 2021

com.microsoft.azure.synapse.ml.lightgbm.split1.VerifyLightGBMClassifier com.microsoft.azure.synapse.ml.lightgbm.split2.VerifyLightGBMRanker
[om.microsoft.azure.synapse.ml.lightgbm.split2.VerifyLightGBMRegressor
[

@nhymxu thanks for your patience during the holidays These test suites are failing. Not sure why though as there is no stacktrace. Perhaps run this test locally to debug.

This is the stack trace on the latter two
21/11/25 16:42:21 ERROR LightGBMRanker: {"uid":"LightGBMRanker_9c014592877b","className":"class com.microsoft.azure.synapse.ml.lightgbm.LightGBMRanker","method":"train","buildVersion":"0.9.4-12-de8d6f67-SNAPSHOT"}
java.lang.IllegalArgumentException: group column query must be of type Long, Int or String but is struct
at com.microsoft.azure.synapse.ml.lightgbm.dataset.DatasetUtils$.validateGroupColumn(DatasetUtils.scala:103)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.$anonfun$train$2(LightGBMBase.scala:45)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.$anonfun$train$2$adapted(LightGBMBase.scala:45)
at scala.Option.foreach(Option.scala:407)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.$anonfun$train$1(LightGBMBase.scala:45)
at com.microsoft.azure.synapse.ml.logging.BasicLogging.logVerb(BasicLogging.scala:63)

@imatiach-msft
Copy link
Contributor

@mhamilton723 @nhymxu I think that error above is a red herring, the real error is:

[info] No tests to run for Test / testOnly
[info] - Serialization Fuzzing *** FAILED ***
[info] java.lang.UnsupportedOperationException: The default jsonEncode only supports string, vector and matrix. org.apache.spark.ml.param.Param must override jsonEncode for java.lang.Boolean.
[info] at org.apache.spark.ml.param.Param.jsonEncode(params.scala:100)
[info] at org.apache.spark.ml.util.DefaultParamsWriter$.$anonfun$getMetadataToSave$3(ReadWrite.scala:435)
[info] at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
[info] at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[info] at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[info] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[info] at scala.collection.TraversableLike.map(TraversableLike.scala:238)
[info] at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
[info] at scala.collection.AbstractTraversable.map(Traversable.scala:108)
[info] at org.apache.spark.ml.util.DefaultParamsWriter$.getMetadataToSave(ReadWrite.scala:434)

it looks like jsonEncode doesn't work for the new boolean parameter.

@imatiach-msft
Copy link
Contributor

this error was just an assertion in a negative test case and not an actual error in the test suite:
java.lang.IllegalArgumentException: group column query must be of type Long, Int or String but is struct
the test was validating that this exception was indeed thrown. It's not actually the reason why the test suite failed.

@@ -257,6 +257,14 @@ trait LightGBMPredictionParams extends Wrappable {

def getFeaturesShapCol: String = $(featuresShapCol)
def setFeaturesShapCol(value: String): this.type = set(featuresShapCol, value)

val predictDisableShapeCheck = new Param[Boolean](this, "predictDisableShapeCheck",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of Param[Boolean] you need to use BooleanParam here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imatiach-msft
I changed to BooleanParam. Please help re-run test again.

Thank you

@imatiach-msft
Copy link
Contributor

@nhymxu I think I figured out the issue causing tests to fail. Here:

val predictDisableShapeCheck = new Param[Boolean](this, "predictDisableShapeCheck"

you need to use BooleanParam instead of Param[Boolean].
I believe the test failures will be fixed then and serialization will pass.

imatiach-msft
imatiach-msft previously approved these changes Nov 30, 2021
val testModel = LightGBMClassificationModel.loadNativeModelFromString(oldModelString)
testModel.setPredictDisableShapeCheck(true)

assert(testModel.transform(testDF).collect().length > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

looks like on cognitive services python test failed:

[error] (run-main-8) java.util.NoSuchElementException: None.get
[error] java.util.NoSuchElementException: None.get
[error] at scala.None$.get(Option.scala:529)
[error] at scala.None$.get(Option.scala:527)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.pgId$lzycompute(FaceSuite.scala:196)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.pgId(FaceSuite.scala:194)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.id$lzycompute(FaceSuite.scala:242)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.id(FaceSuite.scala:238)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.testObjects(FaceSuite.scala:268)
[error] at com.microsoft.azure.synapse.ml.core.test.fuzzing.Fuzzing.pyTestObjects(Fuzzing.scala:305)
[error] at com.microsoft.azure.synapse.ml.core.test.fuzzing.Fuzzing.pyTestObjects$(Fuzzing.scala:305)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.pyTestObjects(FaceSuite.scala:179)
[error] at com.microsoft.azure.synapse.ml.core.test.fuzzing.PyTestFuzzing.saveTestData(Fuzzing.scala:74)
[error] at com.microsoft.azure.synapse.ml.core.test.fuzzing.PyTestFuzzing.saveTestData$(Fuzzing.scala:72)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.saveTestData(FaceSuite.scala:179)
[error] at com.microsoft.azure.synapse.ml.core.test.fuzzing.PyTestFuzzing.makePyTestFile(Fuzzing.scala:149)
[error] at com.microsoft.azure.synapse.ml.core.test.fuzzing.PyTestFuzzing.makePyTestFile$(Fuzzing.scala:147)
[error] at com.microsoft.azure.synapse.ml.cognitive.split1.IdentifyFacesSuite.makePyTestFile(FaceSuite.scala:179)
[error] at com.microsoft.azure.synapse.ml.codegen.TestGen$.$anonfun$generatePythonTests$1(TestGen.scala:25)
[error] at com.microsoft.azure.synapse.ml.codegen.TestGen$.$anonfun$generatePythonTests$1$adapted(TestGen.scala:23)
[error] at scala.collection.immutable.List.foreach(List.scala:392)
[error] at com.microsoft.azure.synapse.ml.codegen.TestGen$.generatePythonTests(TestGen.scala:23)
[error] at com.microsoft.azure.synapse.ml.codegen.TestGen$.main(TestGen.scala:81)
[error] at com.microsoft.azure.synapse.ml.codegen.TestGen.main(TestGen.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] stack trace is suppressed; run 'last cognitive / Test / bgRunMain' for the full output

let me requeue the build and retry, I'm guessing it's some transient failure

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@nhymxu
Copy link
Contributor Author

nhymxu commented Nov 30, 2021

Ya, look like all things done.

Thank you for your support in my first PR here.

Should I squash and rebase with master branch?

@mhamilton723 mhamilton723 merged commit db03b01 into microsoft:master Nov 30, 2021
@mhamilton723
Copy link
Collaborator

THank you @imatiach-msft and @nhymxu

@chickenfingerwu
Copy link

@mhamilton723 is it possible to use a snapshot of this version?

@nhymxu
Copy link
Contributor Author

nhymxu commented Dec 6, 2021

@chickenfingerwu
I think you can try master-snapshot version labeled on README.md file

Screen Shot 2021-12-06 at 19 43 37

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants