New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[jvm-packages] Issue in saving Xgboost model in spark scala #3704
Comments
@superpipal-yi since you are building from local, would you please instrument Line 73 in aa53e9f
|
sure by adding print p.name, the parameter which causes trouble is seed. I have it as integer in paramap |
so if you make it as Long, the problem solved? Line 149 in aa53e9f
|
#3716 this should fix the issue, will add the unit test to cover persisting bestModel but you can try to pick up code in the pull request and test |
does it solve the issue if you pick up the change in #3716? |
No. I have to force the casting from integer to long to save the model now. |
what's the version are you using? I didn't find any big difference between your code and the example https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/SparkMLlibPipeline.scala but I cannot reproduce your issue with the example code |
val sparkVersion = "2.3.1" |
can you reproduce by running the example code? |
@CodingCat Another user is reporting the same issue: https://discuss.xgboost.ai/t/jvm-packages-error-xgb-spark-save-model-in-version-0-80 |
cannot reproduce |
I trained a xgboost model in spark scala as follows:
val xgbParamGrid = new ParamGridBuilder()
.addGrid(xgb.maxDepth, Array(5, 10))
.addGrid(xgb.maxBins, Array(2))
.addGrid(xgb.minChildWeight, Array(0.2))
.addGrid(xgb.eta, Array(0.015))
.addGrid(xgb.alpha, Array(0.8, 0.9))
.addGrid(xgb.lambda, Array(0.9, 1.0))
.build()
// Create the XGBoost pipeline
val pipeline = new Pipeline().setStages(Array(xgb))
val evaluator = new MulticlassClassificationEvaluator()
.setLabelCol("label")
.setPredictionCol("prediction")
.setMetricName("accuracy")
val cv = new CrossValidator()
.setEstimator(pipeline)
.setEvaluator(evaluator)
.setEstimatorParamMaps(xgbParamGrid)
.setNumFolds(3)
val xgbModel = cv.fit(trainingData)
val xgbBest = xgbModel.bestModel.asInstanceOf[PipelineModel].stages(0).asInstanceOf[XGBoostClassificationModel]
then I tried to save it as
xgbBest.write.overwrite.save(modelSavePath)
but I got exceptions as
Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:105) at org.apache.spark.ml.param.LongParam.jsonEncode(params.scala:480) at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:73) at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:71) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1.apply(DefaultXGBoostParamsWriter.scala:71) at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1.apply(DefaultXGBoostParamsWriter.scala:69) at scala.Option.getOrElse(Option.scala:121) at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.getMetadataToSave(DefaultXGBoostParamsWriter.scala:69) at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.saveMetadata(DefaultXGBoostParamsWriter.scala:51) at ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel$XGBoostClassificationModelWriter.saveImpl(XGBoostClassifier.scala:480) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103) at classificationWithXgboost$.main(classificationWithXgboost.scala:125) at classificationWithXgboost.main(classificationWithXgboost.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am using xgboost 0.81_snapshot. Cmake and maven build from local. Then include .m2 maven repo in build.sbt
please help
The text was updated successfully, but these errors were encountered: