[jvm-packages] Issue in saving Xgboost model in spark scala #3704

superpipal-yi · 2018-09-14T21:33:57Z

I trained an XGBoost model in Spark (Scala) as follows:
val xgbParamGrid = new ParamGridBuilder()
  .addGrid(xgb.maxDepth, Array(5, 10))
  .addGrid(xgb.maxBins, Array(2))
  .addGrid(xgb.minChildWeight, Array(0.2))
  .addGrid(xgb.eta, Array(0.015))
  .addGrid(xgb.alpha, Array(0.8, 0.9))
  .addGrid(xgb.lambda, Array(0.9, 1.0))
  .build()

// Create the XGBoost pipeline
val pipeline = new Pipeline().setStages(Array(xgb))

val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(xgbParamGrid)
  .setNumFolds(3)

val xgbModel = cv.fit(trainingData)

val xgbBest = xgbModel.bestModel.asInstanceOf[PipelineModel].stages(0).asInstanceOf[XGBoostClassificationModel]

Then I tried to save it with:
xgbBest.write.overwrite.save(modelSavePath)

but got the following exception:
Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
    at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:105)
    at org.apache.spark.ml.param.LongParam.jsonEncode(params.scala:480)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:73)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:71)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1.apply(DefaultXGBoostParamsWriter.scala:71)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1.apply(DefaultXGBoostParamsWriter.scala:69)
    at scala.Option.getOrElse(Option.scala:121)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.getMetadataToSave(DefaultXGBoostParamsWriter.scala:69)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.saveMetadata(DefaultXGBoostParamsWriter.scala:51)
    at ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel$XGBoostClassificationModelWriter.saveImpl(XGBoostClassifier.scala:480)
    at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
    at classificationWithXgboost$.main(classificationWithXgboost.scala:125)
    at classificationWithXgboost.main(classificationWithXgboost.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I am using xgboost 0.81-SNAPSHOT, built locally with CMake and Maven. My build.sbt then resolves it from the local .m2 Maven repository.

Please help.

CodingCat · 2018-09-19T20:56:24Z

@superpipal-yi Since you are building locally, could you instrument line 73 (at commit aa53e9f) of

xgboost/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/DefaultXGBoostParamsWriter.scala

p.name -> parse(p.jsonEncode(v))

by adding a line that prints p.name, so we can see which parameter triggers the exception?
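
A minimal sketch of the kind of instrumentation suggested here, written without Spark so it can run on its own. `ParamDebug`, `encodeLong`, and `findBadParams` are hypothetical names for illustration and are not part of xgboost4j-spark; `encodeLong` only imitates the unboxing step that appears at the top of the stack trace. Reporting the runtime class of the value next to the parameter name makes a type mismatch visible immediately.

```scala
object ParamDebug {
  // Hypothetical stand-in for jsonEncode on a Long-typed parameter:
  // unboxing an Any that holds a java.lang.Integer throws ClassCastException.
  def encodeLong(value: Any): String = value.asInstanceOf[Long].toString

  // Returns one description per (name, value) pair whose encoding fails,
  // including the runtime class of the offending value.
  def findBadParams(pairs: Seq[(String, Any)]): Seq[String] =
    pairs.flatMap { case (name, value) =>
      try {
        encodeLong(value)
        None
      } catch {
        case _: ClassCastException =>
          Some(s"$name: runtime class ${value.getClass.getName}")
      }
    }
}
```

In the real writer the equivalent change would be a print of p.name (and possibly v.getClass) just before the p.jsonEncode(v) call.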

superpipal-yi · 2018-09-20T23:24:06Z

Sure. After adding a print of p.name, the parameter that causes the trouble is seed. I have it as an Integer in the param map.

CodingCat · 2018-09-20T23:27:09Z

So if you pass it as a Long, is the problem solved?

xgboost/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/GeneralParams.scala

Line 149 in aa53e9f

final val seed = new LongParam(this, "seed", "random seed")
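
As a caller-side sketch of "make it a Long": assuming the classifier is configured from a Map[String, Any] (the thread does not show how xgb was constructed), the seed entry can be widened before the map is handed over. `SeedParams` and `withLongSeed` are hypothetical helper names, not part of the library.

```scala
object SeedParams {
  // Widens an Int-valued "seed" entry to Long so it matches a LongParam;
  // every other entry is returned unchanged.
  def withLongSeed(params: Map[String, Any]): Map[String, Any] =
    params.get("seed") match {
      case Some(i: Int) => params.updated("seed", i.toLong)
      case _            => params
    }
}
```

Writing the literal as 42L in the original map achieves the same thing; the helper is only useful when the map comes from configuration that is parsed elsewhere.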

superpipal-yi · 2018-09-20T23:35:07Z

My copy has exactly this line:
final val seed = new LongParam(this, "seed", "random seed")
However, I am not sure why seed shows up as an Integer in the param map of the final XGBoostClassificationModel.
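
One plausible mechanism, which the thread does not confirm, is JVM type erasure: a slot declared for Long can end up holding a boxed Integer if the value arrives through an untyped path, and the mismatch only surfaces when something unboxes it as a Long. The sketch below illustrates that behaviour in plain Scala; `Slot`, `setUntyped`, and `readAsLong` are hypothetical names and do not model Spark's actual ParamMap implementation.

```scala
object ErasureDemo {
  // Hypothetical generic holder, loosely analogous to a typed parameter slot.
  final class Slot[T] {
    private var stored: Any = null
    // Accepts any value: nothing checks it against T at runtime.
    def setUntyped(v: Any): Unit = { stored = v }
    // The cast to T is erased, so this never fails by itself.
    def get: T = stored.asInstanceOf[T]
  }

  // Reads the slot as a primitive Long; this is where a boxed Integer fails.
  def readAsLong(slot: Slot[Long]): Either[String, Long] =
    try {
      val n: Long = slot.get // unboxing to Long happens on this line
      Right(n)
    } catch {
      case e: ClassCastException => Left(String.valueOf(e.getMessage))
    }
}
```

Storing 42 (an Int) succeeds silently and only the later read fails, which would be consistent with the exception appearing at save time rather than when the parameter was set.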

CodingCat · 2018-09-20T23:38:34Z

#3716 should fix the issue. I will add a unit test to cover persisting bestModel.

In the meantime, you can pick up the code in the pull request and test it.

CodingCat · 2018-09-21T20:05:04Z

Does it solve the issue if you pick up the change in #3716?

superpipal-yi · 2018-09-21T20:59:51Z

No. For now I have to force the cast from Integer to Long in order to save the model.
The dirty fix I am using (in DefaultXGBoostParamsWriter.scala) is:
val jsonParams = paramMap.getOrElse(render(params.filter {
  case ParamPair(p, _) => p != null
}.map {
  case ParamPair(p, v) =>
    try {
      p.name -> parse(p.jsonEncode(v))
    }
    catch {
      case e: Exception =>
        println(p.name)
        p.name -> parse(p.jsonEncode(v.asInstanceOf[Number].longValue))
    }
}.toList))

CodingCat · 2018-09-22T21:18:00Z

Which version are you using?

I didn't find any significant difference between your code and the example at https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/SparkMLlibPipeline.scala

but I cannot reproduce your issue with the example code.

superpipal-yi · 2018-09-24T16:44:07Z

val sparkVersion = "2.3.1"
scalaVersion := "2.11.8"
resolvers += Resolver.mavenLocal
libraryDependencies += "ml.dmlc" % "xgboost4j" % "0.81-SNAPSHOT"
libraryDependencies += "ml.dmlc" % "xgboost4j-spark" % "0.81-SNAPSHOT"

CodingCat · 2018-09-25T16:39:21Z

Can you reproduce it by running the example code?

hcho3 · 2018-10-17T10:50:40Z

@CodingCat Another user is reporting the same issue: https://discuss.xgboost.ai/t/jvm-packages-error-xgb-spark-save-model-in-version-0-80

CodingCat · 2019-03-07T04:51:23Z

Cannot reproduce.

hcho3 changed the title ~~Issue in saving Xgboost model in spark scala~~ [jvm-packages] Issue in saving Xgboost model in spark scala Sep 14, 2018

CodingCat closed this as completed Mar 7, 2019

lock bot locked as resolved and limited conversation to collaborators Jun 5, 2019
