Cannot save Pipeline model in Apache Spark 3.2.0 with xgboost4j-spark_2.12 1.3.1 / 1.4.1 / 1.5.0; they are not compatible with Spark 3.2.0 #7384
Comments
Could you please try nightly? Seems to be fixed by #7376
Thanks. Anyway, I tried to find a nightly build of version 1.5.1 on AWS, but there is nothing 😢.
Can you visit this link https://xgboost.readthedocs.io/en/latest/install.html#id4 and see if it helps?
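For reference, using a snapshot build from the repository mentioned in this thread usually means adding the snapshot repository to your `pom.xml`. A sketch (the exact repository path and version string are placeholders; pick a real artifact from the listing page linked above):

```xml
<!-- Sketch of a pom.xml fragment. The version shown is a placeholder;
     substitute an actual snapshot version from the repo listing. -->
<repositories>
  <repository>
    <id>xgboost-maven-repo</id>
    <url>https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/snapshot/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>1.6.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```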
Feel free to close the issue if it's fixed.
I searched the Maven repository at https://s3-us-west-2.amazonaws.com/xgboost-maven-repo/list.html and tried the nightly build of 1.5.0 ("xgboost4j_2.12-1.5.0-20210927.120858-111.jar", built on 27 Sep 2021), but it does not work with Spark 3.2. I also tested the 1.6.0 snapshot (xgboost4j-spark_2.12-1.6.0-20211030.101815-26.jar) and encountered these errors:
This morning, I re-built the JVM packages in Docker and encountered these errors:
So @FelixDuong, you failed to build the xgboost 1.6.0 snapshot?
Yes, I built it successfully the first time in an Ubuntu VM (VMware), and it failed the second time in an Ubuntu Docker container. Maybe I lack experience configuring OpenBLAS with JNI/Java.
Hmm? XGBoost doesn't use OpenBLAS.
From your log, I think you have a weird Python installation (like 2.7) in your Docker environment. Could you please verify that?
Yes. When running XGBoost on Spark, the Rabit tracker runs with Python 2.7 (it was painful before I realized that). Early on I had only Python 3.8 installed, and errors appeared at the step where the tracker connects to the workers; once I installed Python 2.7, the errors went away (this was with Spark 3.1.2 and xgboost4j-spark 1.4.1 / 1.3.1, in an Ubuntu 20.04 Docker environment). So when I upgraded to Spark 3.2.0, both Python 2.7 and 3.8 were still on the system.
That's an unexpected result. I would expect it to work with Python 3.8 and stop working with Python 2.7. I'm using Python 3.8.
I pushed a PR (#7389) for a better shebang in the Python data preprocessing script. For the JVM package, those scripts are only used in testing.
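The shebang problem discussed above can be verified with a quick check: a bare `python` shebang may silently resolve to a 2.7 interpreter on older images, while these scripts need Python 3. A minimal sketch (the function name is mine, not from XGBoost):

```python
import sys

def running_python3() -> bool:
    """Return True if the current interpreter is Python 3 or newer.

    Useful as a guard at the top of a script whose shebang might
    resolve to Python 2.7 on some systems.
    """
    return sys.version_info.major >= 3

print(running_python3())
```

Running this under each interpreter on the machine (`python script.py`, `python3 script.py`) shows which one a bare shebang would pick up.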
Closing. We have merged the fix for compatibility, the doc update, and the shebang improvement. It should be working now. Feel free to reopen if the issue remains in your tests.
Hi,
The Python version is:
Hi, the minimum required version is 3.8. We are working to remove the Python dependency, but it will take some time.
Dear all,
I encountered these errors when training a model. I searched and found that the same errors already occurred with some previous versions of xgboost4j and xgboost4j-spark.
I hope to find a solution that runs with Spark 3.2 (which has improved BLAS performance). Thanks, everyone!
```
java.lang.NoSuchMethodError: 'org.json4s.JsonDSL$JsonAssoc org.json4s.JsonDSL$.pair2Assoc(scala.Tuple2, scala.Function1)'
  at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.getMetadataToSave(DefaultXGBoostParamsWriter.scala:75)
  at ml.dmlc.xgboost4j.scala.spark.params.DefaultXGBoostParamsWriter$.saveMetadata(DefaultXGBoostParamsWriter.scala:51)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$XGBoostRegressionModelWriter.saveImpl(XGBoostRegressor.scala:454)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$5(Pipeline.scala:257)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent(events.scala:174)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent$(events.scala:169)
  at org.apache.spark.ml.util.Instrumentation.withSaveInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$4(Pipeline.scala:257)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$4$adapted(Pipeline.scala:254)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$1(Pipeline.scala:254)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.$anonfun$saveImpl$1$adapted(Pipeline.scala:247)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.saveImpl(Pipeline.scala:247)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.saveImpl(Pipeline.scala:346)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:168)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.super$save(Pipeline.scala:344)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$4(Pipeline.scala:344)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent(events.scala:174)
  at org.apache.spark.ml.MLEvents.withSaveInstanceEvent$(events.scala:169)
  at org.apache.spark.ml.util.Instrumentation.withSaveInstanceEvent(Instrumentation.scala:42)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$3(Pipeline.scala:344)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.$anonfun$save$3$adapted(Pipeline.scala:344)
  at org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
  at org.apache.spark.ml.PipelineModel$PipelineModelWriter.save(Pipeline.scala:344)
  ... 75 elided
```
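For context, a minimal sketch of the failing save path (the toy data and object name are my assumptions, not from the report; on Spark 3.2.0 with xgboost4j-spark 1.5.0 or older, the `save` call throws the json4s `NoSuchMethodError` shown above, consistent with Spark 3.2 having upgraded its bundled json4s):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import ml.dmlc.xgboost4j.scala.spark.XGBoostRegressor

object PipelineSaveRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("xgb-pipeline-save-repro")
      .getOrCreate()
    import spark.implicits._

    // Toy training data: two numeric features and a label.
    val df = Seq((1.0, 2.0, 3.0), (2.0, 3.0, 5.0), (3.0, 4.0, 7.0))
      .toDF("f1", "f2", "label")

    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")
    val xgb = new XGBoostRegressor(Map("num_round" -> 5, "num_workers" -> 1))
      .setFeaturesCol("features")
      .setLabelCol("label")

    val model = new Pipeline().setStages(Array(assembler, xgb)).fit(df)

    // On Spark 3.2.0 with xgboost4j-spark <= 1.5.0 this line throws
    // java.lang.NoSuchMethodError for org.json4s.JsonDSL$.pair2Assoc.
    model.write.overwrite().save("/tmp/xgb-pipeline-model")

    spark.stop()
  }
}
```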