
[jvm-packages] Persist CrossValidator model with xgboost4j-spark error #2115

Closed · Widerstehen opened this issue Mar 17, 2017 · 35 comments

@Widerstehen commented Mar 17, 2017

Environment info

Operating System: RedHat 6.5 (with Spark 2.1.0)
Package: jvm, xgboost4j-spark
XGBoost version: latest

I want to save a CrossValidator model, but I get the following error:

java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: XGBoostEstimator_88624dc1e519 of type class ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
  at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:231)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:228)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:228)
  at org.apache.spark.ml.Pipeline$PipelineWriter.<init>(Pipeline.scala:202)
  at org.apache.spark.ml.Pipeline.write(Pipeline.scala:188)
  at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:154)
  at org.apache.spark.ml.Pipeline.save(Pipeline.scala:96)
  at org.apache.spark.ml.tuning.ValidatorParams$.saveImpl(ValidatorParams.scala:148)
  at org.apache.spark.ml.tuning.CrossValidatorModel$CrossValidatorModelWriter.saveImpl(CrossValidator.scala:256)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:111)
  ... 50 elided

My code:

import org.apache.spark.ml.Pipeline
import ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
import org.apache.spark.ml.evaluation._
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.sql.Row
import scala.collection.mutable
import scala.util.Random

// xgboost parameters
def get_param(): mutable.HashMap[String, Any] = {
  val params = new mutable.HashMap[String, Any]()
  params += "eta" -> 0.1
  params += "max_depth" -> 4
  params += "min_child_weight" -> 4
  params += "num_rounds" -> 20
  params += "silent" -> 1
  params += "objective" -> "binary:logistic"
  params += "booster" -> "gbtree"
  params += "gamma" -> 0.0
  params += "colsample_bylevel" -> 1

  params
}

val r = new Random(0)

val training = spark.createDataFrame(
  Seq.fill(10000)(r.nextInt(2)).map(i => (i, i))
).toDF("feature", "label")

val test = spark.createDataFrame(
  Seq.fill(10000)(r.nextInt(2)).map(i => (i, i))
).toDF("feature", "label")


// create pipeline
val assembler = new VectorAssembler()
  .setInputCols(Array("feature"))
  .setOutputCol("features")

val xgb = new XGBoostEstimator(get_param().toMap)
  .setFeaturesCol("features")

val pipeline = new Pipeline()
  .setStages(Array(assembler, xgb))


// grid
val paramGrid = new ParamGridBuilder()
//  .addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
  .addGrid(xgb.round, Array(10, 30, 50, 100))
  .addGrid(xgb.maxDepth, Array(5, 6, 8, 10))
  .addGrid(xgb.minChildWeight, Array(0.5, 0.7, 1.0))
  .build()
// cv
// val evaluator = new BinaryClassificationEvaluator().setRawPredictionCol("probabilities")
val evaluator = new RegressionEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("rmse")
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(5)


// Run cross-validation, and choose the best set of parameters.
val cvModel = cv.fit(training)
cvModel.write.overwrite.save("/tmp/xgbModel")
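For context, the exception comes from Spark validating that every pipeline stage implements MLWritable before anything is written. A self-contained sketch of that check, using hypothetical stand-in traits rather than Spark's real classes:

```scala
// Stand-in traits mirroring the shape of Spark's writability check
// (a hypothetical simplification, not Spark's actual classes).
trait PipelineStage { def uid: String }
trait MLWritable extends PipelineStage

case class WritableStage(uid: String) extends PipelineStage with MLWritable
case class NonWritableStage(uid: String) extends PipelineStage

// Mirrors Pipeline$SharedReadWrite.validateStages: fail on the first
// stage that does not implement MLWritable.
def validateStages(stages: Seq[PipelineStage]): Unit =
  stages.foreach {
    case _: MLWritable => ()
    case other =>
      throw new UnsupportedOperationException(
        "Pipeline write will fail on this Pipeline because it contains a " +
        s"stage which does not implement Writable. Non-Writable stage: ${other.uid}")
  }

val stages = Seq(WritableStage("vecAssembler_1"), NonWritableStage("XGBoostEstimator_1"))
val failed =
  try { validateStages(stages); false }
  catch { case _: UnsupportedOperationException => true }
println(s"write rejected: $failed")
```

Because CrossValidatorModel persistence delegates to Pipeline.save, one non-writable stage is enough to abort the whole save.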

@CodingCat (Member)

@geoHeil can you look at what is happening?

@Widerstehen (Author)

@CodingCat I guess there is no write method for CrossValidatorModel in xgboost4j-spark, but I find that a PipelineModel can be saved in the latest xgboost version.

geoHeil added a commit to geoHeil/xgboost that referenced this issue Mar 17, 2017
@geoHeil (Contributor) commented Mar 17, 2017

@frank111 I added your sample https://github.com/geoHeil/xgboost/blob/518/jvm-packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/XGBoostSparkPipelinePersistence.scala#L105-L180 as a unit test. So far it is failing with:

XGBoostModel training failed
ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed and
17/03/17 08:11:46 ERROR RabitTracker: Uncaught exception thrown by worker:
java.lang.IllegalStateException: SparkContext has been shutdown
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anon$2.run(XGBoost.scala:295)
17/03/17 08:11:46 ERROR RabitTracker: java.lang.InterruptedException: sleep interrupted

Can you please check whether you see the same?

@Widerstehen (Author) commented Mar 17, 2017

@geoHeil Thank you for your reply. I ran your code from L105 to L180 in spark-shell; I cannot reproduce your error, but I get the following error:

scala> val tempPath = TestUtils.createTempDir("cvModel")
tempPath: String = /tmp/cvModel

scala>   cvModel.write.overwrite.save(tempPath)
java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: XGBoostEstimator_3a7b71829693 of type class ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
  at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:231)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:228)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:228)
  at org.apache.spark.ml.Pipeline$PipelineWriter.<init>(Pipeline.scala:202)
  at org.apache.spark.ml.Pipeline.write(Pipeline.scala:188)
  at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:154)
  at org.apache.spark.ml.Pipeline.save(Pipeline.scala:96)
  at org.apache.spark.ml.tuning.ValidatorParams$.saveImpl(ValidatorParams.scala:148)
  at org.apache.spark.ml.tuning.CrossValidatorModel$CrossValidatorModelWriter.saveImpl(CrossValidator.scala:256)
  at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:111)
  ... 54 elided

@geoHeil (Contributor) commented Mar 17, 2017

@frank111 I tried to rebuild xgboost on my laptop (git clone into a clean folder), but I am facing a strange problem: a lot of failing test cases:

ml.dmlc.xgboost4j.java.XGBoostError: [15:57:28] src/objective/regression_obj.cc:41: Check failed: base_score > 0.0f && base_score < 1.0f base_score must be in (0,1) for logistic loss

Stack trace returned 2 entries:
[bt] (0) 0   libxgboost4j5544838633935815740.dylib 0x000000012733ba99 _ZN4dmlc15LogMessageFatalD1Ev + 41
[bt] (1) 1   libstdc++.6.dylib                   0x00000001276f7f60 _ZTVNSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEEE + 16

	at ml.dmlc.xgboost4j.java.JNIErrorHandle.checkCall(JNIErrorHandle.java:48)
	at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:133)
	at ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:115)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:53)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:83)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:132)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:111)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
[The same base_score XGBoostError and stack trace repeats for tasks 1.0, 4.0, 3.0, 5.0, 0.0, and 7.0 in stage 0.0.]
17/03/17 15:57:28 ERROR TaskSetManager: Task 2 in stage 0.0 failed 1 times; aborting job
Tracker started, with env={}
- test transformLeaf *** FAILED ***
  ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:322)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithRDD(XGBoost.scala:303)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:119)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:34)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:96)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithDataFrame(XGBoost.scala:187)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$2.apply$mcV$sp(XGBoostDFSuite.scala:90)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$2.apply(XGBoostDFSuite.scala:85)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$2.apply(XGBoostDFSuite.scala:85)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  ...
17/03/17 15:57:42 ERROR RabitTracker: Uncaught exception thrown by worker:
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:619)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:925)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:923)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
	at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:923)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anon$2.run(XGBoost.scala:295)
rabit::Init is already called in this thread
rabit::Init is already called in this thread
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] xgboost-jvm ........................................ SUCCESS [  3.329 s]
[INFO] xgboost4j .......................................... SUCCESS [02:09 min]
[INFO] xgboost4j-spark .................................... FAILURE [01:02 min]
[INFO] xgboost4j-flink .................................... SKIPPED
[INFO] xgboost4j-example .................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:15 min
[INFO] Finished at: 2017-03-17T15:57:43+01:00
[INFO] Final Memory: 49M/1029M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:1.0:test (test) on project xgboost4j-spark: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :xgboost4j-spark

My computer has been having very strange problems for a while; I'm not sure whether it is related to that. For now I don't know why all these tests fail. I installed with -DskipTests, but am facing the issue above. Unfortunately, I will only be able to help you once I have fixed, or at least understand, why all these test cases fail now.
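The "Check failed" message in the traces above is the native parameter guard in src/objective/regression_obj.cc. The same precondition can be expressed on the JVM side (a hypothetical pre-flight validator, not part of xgboost4j) to catch a bad base_score before distributed training even starts:

```scala
// Hypothetical pre-flight check mirroring the native guard in
// src/objective/regression_obj.cc: for binary:logistic, base_score
// must lie strictly inside (0, 1). Defaults are assumptions here.
def validateBaseScore(params: Map[String, Any]): Unit = {
  val objective = params.getOrElse("objective", "reg:linear").toString
  if (objective == "binary:logistic") {
    val baseScore = params.get("base_score").map(_.toString.toFloat).getOrElse(0.5f)
    require(baseScore > 0.0f && baseScore < 1.0f,
      s"base_score must be in (0,1) for logistic loss, got $baseScore")
  }
}

validateBaseScore(Map("objective" -> "binary:logistic"))                      // implicit default: ok
validateBaseScore(Map("objective" -> "binary:logistic", "base_score" -> 0.3)) // in range: ok
```

Validating on the driver gives a single clear failure instead of one native abort per Spark task.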

@geoHeil (Contributor) commented Mar 17, 2017

@CodingCat do you maybe have an idea why I see all these test cases failing on my system?

@geoHeil (Contributor) commented Mar 17, 2017

#2116 also references a similar issue. I tried it on another (working) laptop with Arch Linux. Again I see some test failures. Strangely, this is a different bug, i.e. there seem to be problems with the embedded C library.

 test consistency and order preservation of dataframe-based model *** FAILED ***
  ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:322)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithRDD(XGBoost.scala:303)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:119)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:34)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:96)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithDataFrame(XGBoost.scala:187)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$1.apply$mcV$sp(XGBoostDFSuite.scala:67)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$1.apply(XGBoostDFSuite.scala:50)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$1.apply(XGBoostDFSuite.scala:50)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  ...
Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=192.168.0.15, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=8}
rabit::Init is already called in this thread
rabit::Init is already called in this thread
rabit::Init is already called in this thread
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7e0a7db615, pid=26468, tid=0x00007f7d08dd3700
#
# JRE version: OpenJDK Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
# Java VM: OpenJDK 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] xgboost-jvm ........................................ SUCCESS [  1.901 s]
[INFO] xgboost4j .......................................... SUCCESS [ 17.152 s]
[INFO] xgboost4j-spark .................................... FAILURE [ 34.604 s]
[INFO] xgboost4j-flink .................................... SKIPPED
[INFO] xgboost4j-example .................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 53.740 s
[INFO] Finished at: 2017-03-17T22:11:23+01:00
[INFO] Final Memory: 41M/442M

@Widerstehen
Author

@geoHeil yeah, I also noticed it is similar to #2116. I downloaded xgboost three days ago and compiled it successfully; was there any change during this period?

@CodingCat
Member

I think it is related to rabit....will look into it with @tqchen

@CodingCat CodingCat changed the title Persist CrossValidator model with xgboost4j-spark error [jvm-packages] Persist CrossValidator model with xgboost4j-spark error Mar 18, 2017
@CodingCat
Member

@geoHeil , what's your OS?

@CodingCat
Member

and would you please post the output when you run

python dmlc-core/tracker/dmlc_tracker/tracker.py --log-level=DEBUG --num-workers=(any number you like)

@geoHeil
Contributor

geoHeil commented Mar 19, 2017

@CodingCat one is Archlinux, (Manjaro) the other one OS X 10.12.3

on osx I get

python dmlc-core/tracker/dmlc_tracker/tracker.py --log-level=DEBUG --num-workers=2
2017-03-19 09:32:06,649 WARNING gethostbyname(socket.getfqdn()) failed... trying on hostname()
Traceback (most recent call last):
  File "dmlc-core/tracker/dmlc_tracker/tracker.py", line 389, in get_host_ip
    hostIP = socket.gethostbyname(socket.getfqdn())
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dmlc-core/tracker/dmlc_tracker/tracker.py", line 475, in <module>
    main()
  File "dmlc-core/tracker/dmlc_tracker/tracker.py", line 470, in main
    start_rabit_tracker(args)
  File "dmlc-core/tracker/dmlc_tracker/tracker.py", line 432, in start_rabit_tracker
    rabit = RabitTracker(hostIP=get_host_ip(args.host_ip), nslave=args.num_workers)
  File "dmlc-core/tracker/dmlc_tracker/tracker.py", line 392, in get_host_ip
    hostIP = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

When checking for a hostname in bash, below is the output:

hostname                                                                                                                   
Georgs-MacBook-Pro.local
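The crash above is the tracker's hostname lookup failing: both `socket.gethostbyname(socket.getfqdn())` and the `socket.gethostname()` fallback raise `gaierror` because DNS cannot resolve the macOS `.local` hostname. A sketch of a more robust lookup, for illustration only (this is not the tracker's actual code); the connected-UDP-socket trick reads the outbound interface's address without sending any packets:

```python
import socket

def get_host_ip():
    """Best-effort local IPv4 lookup; illustrative sketch, not tracker.py itself."""
    # First try resolving the machine's own names, as the tracker does.
    for name in (socket.getfqdn(), socket.gethostname()):
        try:
            return socket.gethostbyname(name)
        except socket.gaierror:
            pass
    # Fallback: connecting a UDP socket selects an outbound interface
    # without transmitting anything; its local address is our IP.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 53))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        s.close()
```

Note also that the traceback shows the tracker already accepts an explicit host IP (`args.host_ip`), so supplying one on the command line sidesteps the lookup entirely.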

Retrying the installation on the Arch Linux machine, I see the following test failure when running mvn clean install on xgboost:

- test consistency and order preservation of dataframe-based model *** FAILED ***
  ml.dmlc.xgboost4j.java.XGBoostError: XGBoostModel training failed
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.postTrackerReturnProcessing(XGBoost.scala:322)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithRDD(XGBoost.scala:303)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:119)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:34)
  at org.apache.spark.ml.Predictor.fit(Predictor.scala:96)
  at ml.dmlc.xgboost4j.scala.spark.XGBoost$.trainWithDataFrame(XGBoost.scala:187)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$1.apply$mcV$sp(XGBoostDFSuite.scala:67)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$1.apply(XGBoostDFSuite.scala:50)
  at ml.dmlc.xgboost4j.scala.spark.XGBoostDFSuite$$anonfun$1.apply(XGBoostDFSuite.scala:50)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)

@CodingCat
Member

The main reason for the failed test cases is that the Rabit Tracker cannot get the hostname and IP address (after some changes). I haven't tracked down the change causing all of these troubles. @tqchen, any idea on who is the bad actor?

@geoHeil
Contributor

geoHeil commented Mar 20, 2017

@CodingCat the suggestion of #2166 is only a partial solution. Downgrading to Python 3.5.3 (which had worked previously) did not help to fix the problem. Here is a list of test failures for 3.6: https://gist.github.com/geoHeil/17ea7fa96f402f10a9a90517406330f6 and here are the problems for 3.5.3, which are the same as on my Arch Linux laptop with 3.6 and are causing the inconsistencies error.


at ml.dmlc.xgboost4j.java.JNIErrorHandle.checkCall(JNIErrorHandle.java:48)
	at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:133)
	at ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:115)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:53)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:83)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:132)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:111)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Strangely, JNI only seems to be a problem for the Spark tests.

[INFO] xgboost-jvm ........................................ SUCCESS [  2.849 s]
[INFO] xgboost4j .......................................... SUCCESS [ 23.449 s]
[INFO] xgboost4j-spark .................................... FAILURE [01:01 min]

@CodingCat
Member

@terrytangyuan, @tqchen it seems that the current implementation does not work with Python 3.6; any idea on how to fix it?

@geoHeil
Contributor

geoHeil commented Mar 21, 2017

@CodingCat I'm not really sure, because the Arch Linux laptop was running 3.6 and only showed the JNI problems. Do you have an idea how to fix these?

@CodingCat
Member

It's a problem with the Python tracker, not the JNI code; if you look at your log, no tracker was started.

@codeexplorer

I am seeing a similar issue when saving cvModel after building the latest xgboost with the Spark JVM package on Linux:
scala> model.write.overwrite.save(modelDir)
java.lang.UnsupportedOperationException: Pipeline write will fail on this Pipeline because it contains a stage which does not implement Writable. Non-Writable stage: XGBoostEstimator_d334e220bcbe of type class ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:225)
at org.apache.spark.ml.Pipeline$SharedReadWrite$$anonfun$validateStages$1.apply(Pipeline.scala:222)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at org.apache.spark.ml.Pipeline$SharedReadWrite$.validateStages(Pipeline.scala:222)
at org.apache.spark.ml.Pipeline$PipelineWriter.(Pipeline.scala:198)
at org.apache.spark.ml.Pipeline.write(Pipeline.scala:184)
at org.apache.spark.ml.util.MLWritable$class.save(ReadWrite.scala:160)
at org.apache.spark.ml.Pipeline.save(Pipeline.scala:92)
at org.apache.spark.ml.tuning.ValidatorParams$.saveImpl(ValidatorParams.scala:148)
at org.apache.spark.ml.tuning.CrossValidatorModel$CrossValidatorModelWriter.saveImpl(CrossValidator.scala:250)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:114)
... 48 elided

Is XGBoostEstimator not writable?

@geoHeil
Contributor

geoHeil commented Mar 23, 2017 via email

@Widerstehen
Author

I can compile the latest version of xgboost. Environment: Red Hat 6.5, Python 2.7.

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] xgboost-jvm ....................................... SUCCESS [7.574s]
[INFO] xgboost4j ......................................... SUCCESS [57.097s]
[INFO] xgboost4j-spark ................................... SUCCESS [2:28.354s]
[INFO] xgboost4j-flink ................................... SUCCESS [34.185s]
[INFO] xgboost4j-example ................................. SUCCESS [37.245s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4:44.711s
[INFO] Finished at: Sun Apr 02 21:14:45 EDT 2017
[INFO] Final Memory: 442M/1060M
[INFO] ------------------------------------------------------------------------

@geoHeil
Contributor

geoHeil commented Apr 3, 2017

This must be something else. I created a conda environment with 2.7.13 and still see the

at ml.dmlc.xgboost4j.java.JNIErrorHandle.checkCall(JNIErrorHandle.java:48)
	at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:133)
	at ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:115)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:53)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:83)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:132)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:111)

please see the error log here https://gist.github.com/geoHeil/913dc5cf5f48af1614c3e1550a294815

What is strange, though, is that xgboost-jvm and xgboost4j build and test fine, but xgboost4j-spark shows these problems.

@geoHeil
Contributor

geoHeil commented Apr 3, 2017

@CodingCat

To further clarify the problem, please see the following Dockerfile:

FROM buildpack-deps:xenial-scm

RUN git clone --recursive https://github.com/dmlc/xgboost
RUN apt-get update && apt-get install -y maven make g++ python openjdk-8-jdk
RUN cd xgboost && make -j4
RUN cd xgboost && cd jvm-packages && mvn clean install


[22:59:25] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 0 pruned nodes, max_depth=2
- fast histogram algorithm parameters are exposed correctly *** FAILED ***
  0.16945997 was not less than 0.1 (XGBoostDFSuite.scala:205)
- test with fast histo depthwidth with max depth and max bin *** FAILED ***
  0.16945997 was not less than 0.1 (XGBoostGeneralSuite.scala:200)

This reports one test failure regarding the Spark histogram test (unfortunately not yet the error I see above).

@CodingCat
Member

are they consistently failing?

@CodingCat
Member

@geoHeil so far, I cannot reproduce this error

@geoHeil
Contributor

geoHeil commented Apr 4, 2017

The histogram tests yes, and unfortunately the others as well.

@CodingCat
Member

I cannot reproduce this on an Azure machine, on Travis CI, or on a MacBook

@stefan-nikolic

Hi, I have a similar issue when installing xgboost with:
mvn -Dspark.version=2.1.0 clean package

Only the 'xgboost4j-spark' module fails with this error:

at ml.dmlc.xgboost4j.java.JNIErrorHandle.checkCall(JNIErrorHandle.java:48)
	at ml.dmlc.xgboost4j.java.Booster.update(Booster.java:133)
	at ml.dmlc.xgboost4j.java.XGBoost.train(XGBoost.java:115)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:53)
	at ml.dmlc.xgboost4j.scala.XGBoost$.train(XGBoost.scala:83)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:132)
	at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:111)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:796)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:336)
	at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:334)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:957)
	at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:888)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:948)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:694)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/04/06 22:55:52 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
ml.dmlc.xgboost4j.java.XGBoostError: [22:55:52] src/objective/regression_obj.cc:41: Check failed: base_score > 0.0f && base_score < 1.0f base_score must be in (0,1) for logistic loss

I have followed the installation instructions from this page: http://xgboost.readthedocs.io/en/latest/jvm/
My setup is Ubuntu 16.04 with Python 2.7.12. Do you have any suggestions as to what could be wrong?
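Regarding the `base_score` check in the log above: for the logistic objective, xgboost converts the global `base_score` into an initial prediction margin via the logit function (this is the check in `src/objective/regression_obj.cc`), and the logit is only defined on the open interval (0, 1), hence the hard failure. A plain-Python illustration of the constraint; the function names here are my own, not xgboost API:

```python
import math

def logit(p):
    """Inverse of the sigmoid; defined only for 0 < p < 1."""
    if not (0.0 < p < 1.0):
        # Mirrors the xgboost check that aborted the task above.
        raise ValueError("base_score must be in (0,1) for logistic loss")
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

print(logit(0.5))           # 0.0 -- base_score = 0.5 means a zero initial margin
print(sigmoid(logit(0.2)))  # ~0.2 -- the round trip recovers the base score
```

So if your parameter map sets `"base_score"` explicitly, make sure it stays strictly between 0 and 1 whenever a logistic objective is used.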

@Widerstehen
Author

Widerstehen commented Apr 9, 2017

@stefan-nikolic My environment is Red Hat 6.5 with Python 2.7.10:

cd xgboost-master
make -j4
cd jvm-packages
mvn install

You can try that.

@Widerstehen
Author

@CodingCat Has this been fixed in the latest changes?

@CodingCat
Member

No, I haven't looked at this one yet

@CodingCat
Member

I got a chance to look at the problem this afternoon. It is simply because we didn't implement MLWritable for XGBoostEstimator; the problem should be fixed by #2265

@Widerstehen
Author

Yeah, I will test it later. Thanks for your reply!

@Widerstehen
Author

It works well now, thanks!

@LucasEstevam

Getting the same test errors as @geoHeil, with the fast-histogram-related tests failing. Did you ever manage to figure out what the problem was, @geoHeil?

@geoHeil
Contributor

geoHeil commented May 16, 2017 via email
