[jvm-packages] Issue in saving Xgboost model in spark scala #3704

Closed
superpipal-yi opened this issue Sep 14, 2018 · 12 comments

superpipal-yi commented Sep 14, 2018

I trained an XGBoost model in Spark (Scala) as follows:
val xgbParamGrid = new ParamGridBuilder()
  .addGrid(xgb.maxDepth, Array(5, 10))
  .addGrid(xgb.maxBins, Array(2))
  .addGrid(xgb.minChildWeight, Array(0.2))
  .addGrid(xgb.eta, Array(0.015))
  .addGrid(xgb.alpha, Array(0.8, 0.9))
  .addGrid(xgb.lambda, Array(0.9, 1.0))
  .build()

// Create the XGBoost pipeline
val pipeline = new Pipeline().setStages(Array(xgb))

val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(xgbParamGrid)
  .setNumFolds(3)

val xgbModel = cv.fit(trainingData)

val xgbBest = xgbModel.bestModel.asInstanceOf[PipelineModel].stages(0).asInstanceOf[XGBoostClassificationModel]

Then I tried to save it with
xgbBest.write.overwrite.save(modelSavePath)

but I got the following exception:
Exception in thread "main" java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
    at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:105)
    at org.apache.spark.ml.param.LongParam.jsonEncode(params.scala:480)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:73)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1$$anonfun$3.apply(DefaultXGBoostParamsWriter.scala:71)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1.apply(DefaultXGBoostParamsWriter.scala:71)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$$anonfun$1.apply(DefaultXGBoostParamsWriter.scala:69)
    at scala.Option.getOrElse(Option.scala:121)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.getMetadataToSave(DefaultXGBoostParamsWriter.scala:69)
    at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.saveMetadata(DefaultXGBoostParamsWriter.scala:51)
    at ml.dmlc.xgboost4j.scala.spark.XGBoostClassificationModel$XGBoostClassificationModelWriter.saveImpl(XGBoostClassifier.scala:480)
    at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:103)
    at classificationWithXgboost$.main(classificationWithXgboost.scala:125)
    at classificationWithXgboost.main(classificationWithXgboost.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I am using xgboost 0.81-SNAPSHOT, built locally with CMake and Maven. I then include the local .m2 Maven repository in build.sbt.

Please help.

@hcho3 hcho3 changed the title Issue in saving Xgboost model in spark scala [jvm-packages] Issue in saving Xgboost model in spark scala Sep 14, 2018
@CodingCat
Member

@superpipal-yi since you are building from local, would you please instrument

by adding a line to print p.name and see which parameter is bringing the exception?

@superpipal-yi
Author

Sure. After adding a print of p.name, the parameter that causes the trouble is seed. I have it as an Integer in the ParamMap.

@CodingCat
Member

CodingCat commented Sep 20, 2018

So if you make it a Long, is the problem solved?

@superpipal-yi
Author

superpipal-yi commented Sep 20, 2018

I have this line of code exactly as
final val seed = new LongParam(this, "seed", "random seed")
However, I am not sure why seed shows up as an Integer in the ParamMap of the final XGBoostClassificationModel.
image
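
For readers wondering how a value declared through a LongParam can end up as an Integer: generic type arguments are erased on the JVM, so a typed parameter only protects call sites that go through its typed API. The sketch below is a minimal illustration in plain Scala, not Spark's or xgboost4j-spark's code. `ParamSketch`, `LongParamSketch`, and `encodeErased` are hypothetical stand-ins, and the assumption is that the value reaches the writer through some untyped path (for example a `Map[String, Any]` holding `"seed" -> 42` rather than `42L`).

```scala
// Hypothetical stand-in for a typed parameter; NOT Spark's Param class.
class ParamSketch[T](val name: String) {
  def jsonEncode(value: T): String = value.toString
}

// Hypothetical stand-in for a Long-typed parameter. Overriding with a
// primitive Long means erased callers go through a bridge that unboxes.
class LongParamSketch(name: String) extends ParamSketch[Long](name) {
  override def jsonEncode(value: Long): String = value.toString
}

object ErasedParamDemo {
  val seed = new LongParamSketch("seed")

  // Encode through an erased reference, the way a writer iterating over
  // untyped (param, value) pairs would. Nothing checks the value's type here.
  def encodeErased(param: ParamSketch[_], value: Any): String =
    param.asInstanceOf[ParamSketch[Any]].jsonEncode(value)

  def main(args: Array[String]): Unit = {
    // A boxed Long is unboxed without trouble.
    println(encodeErased(seed, 42L))
    // A boxed Integer only fails when the encoder unboxes it to Long.
    try {
      println(encodeErased(seed, 42))
    } catch {
      case e: ClassCastException => println(s"${seed.name} failed: ${e.getMessage}")
    }
  }
}
```

If this is what happens here, writing the seed as a Long literal (42L) wherever it is first set would avoid the mismatch. Whether that applies to this particular setup is an assumption, not something confirmed in this thread.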

@CodingCat
Member

#3716 should fix the issue. I will add a unit test to cover persisting bestModel.

In the meantime, you can pick up the code in the pull request and test it.

@CodingCat
Member

Does it solve the issue if you pick up the change in #3716?

@superpipal-yi
Author

No. For now I have to force the cast from Integer to Long in order to save the model.
The dirty fix I am using is
(DefaultXGBoostParamsWriter.scala)
val jsonParams = paramMap.getOrElse(render(params.filter {
  case ParamPair(p, _) => p != null
}.map {
  case ParamPair(p, v) =>
    try {
      p.name -> parse(p.jsonEncode(v))
    }
    catch {
      case e: Exception =>
        println(p.name)
        p.name -> parse(p.jsonEncode(v.asInstanceOf[Number].longValue))
    }
}.toList))
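
The workaround above catches every exception and converts any Number, so a fractional value would be silently truncated. A narrower variant is sketched below in plain Scala, under the assumption that only integral values should ever be widened. `SeedCoercion` and `asLong` are hypothetical names for illustration and are not part of xgboost4j-spark.

```scala
object SeedCoercion {
  // Widen integral values to Long; return None for anything else so the
  // caller can decide how to handle an unexpected type.
  def asLong(value: Any): Option[Long] = value match {
    case l: Long  => Some(l)
    case i: Int   => Some(i.toLong)
    case s: Short => Some(s.toLong)
    case b: Byte  => Some(b.toLong)
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    // The last two values are deliberately not integral.
    val samples: Seq[Any] = Seq(42, 42L, 0.5, "seed")
    samples.foreach(v => println(s"$v -> ${asLong(v)}"))
  }
}
```

A helper like this could be applied before encoding a Long-typed parameter, instead of relying on a catch-all exception handler.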

@CodingCat
Member

Which version are you using?

I didn't find any big difference between your code and the example https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-example/src/main/scala/ml/dmlc/xgboost4j/scala/example/spark/SparkMLlibPipeline.scala

but I cannot reproduce your issue with the example code.

@superpipal-yi
Author

superpipal-yi commented Sep 24, 2018

val sparkVersion = "2.3.1"
scalaVersion := "2.11.8"
resolvers += Resolver.mavenLocal
libraryDependencies += "ml.dmlc" % "xgboost4j" % "0.81-SNAPSHOT"
libraryDependencies += "ml.dmlc" % "xgboost4j-spark" % "0.81-SNAPSHOT"

@CodingCat
Member

Can you reproduce it by running the example code?

@hcho3
Collaborator

hcho3 commented Oct 17, 2018

@CodingCat
Member

cannot reproduce

@lock lock bot locked as resolved and limited conversation to collaborators Jun 5, 2019