[XGBoost4J-Spark] Problem for (de)serialization of Spark implementation of custom objective and eval #7274

nicovdijk · 2021-09-28T18:34:58Z

Suggested solution to the issue in the title.

trivialfis · 2021-09-30T09:03:12Z

Thanks for the PR!

@wbo4958 Could you please take a look when you are available?

nicovdijk · 2021-09-30T18:48:52Z

Looking forward to getting some feedback. Also, I'm just realizing that my implementation does not add type hints when loading is the first thing that happens (now the type hints are added when instantiating the custom obj/eval for the first time). I guess one could make sure one always creates an instance before loading, but there must be a better way. I would appreciate any suggestions. Thanks!

wbo4958 · 2021-10-01T01:02:13Z

@nicovdijk What issue does your PR try to solve? Could you add more descriptions?

nicovdijk · 2021-10-01T05:12:32Z

@nicovdijk What issue does your PR try to solve? Could you add more descriptions?

@wbo4958: This PR is a solution to issue #7224 with the same name.

wbo4958

@nicovdijk I just figured out what your PR tries to fix. Overall, You PR can fix this issue. But it's not straight forward for users who would like to customize their obj/eval, it requires users explicitly set the type hint.

Could we have callback way to do it by xgboost?

...packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/PersistenceSuite.scala

wbo4958 · 2021-10-05T08:32:22Z

jvm-packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/CustomObj.scala

+import ml.dmlc.xgboost4j.scala.{DMatrix, ObjectiveTrait}
+import ml.dmlc.xgboost4j.scala.spark.params.CustomObjParam._
+import org.apache.commons.logging.LogFactory
+import org.json4s.ShortTypeHints


this is not used. better to remove it

nicovdijk · 2021-10-05T10:31:33Z

@nicovdijk I just figured out what your PR tries to fix. Overall, You PR can fix this issue. But it's not straight forward for users who would like to customize their obj/eval, it requires users explicitly set the type hint.

Could we have callback way to do it by xgboost?

@wbo4958, I would like to automate it, yes, but I'm not sure what is the best way.

We could implement adding the type hint in abstract classes SparkCustomObjective and SparkCustomEval, see the new commit. What do you think?

Also:

I was also wondering how persistence works in xgboost4j. Is it possible there to save and load including a custom objective and eval? I don't see any persistence test, but maybe I'm not looking in the right place...
The type hints are now added when an instance of a custom objective or eval is created. How can I make sure the hints are added when creating the classes? Right now, saving a model, then restarting the java program and trying to load would not work...

PS The comments have been fixed (I don't seem to have a proper linter).

wbo4958 · 2021-10-10T23:15:21Z

XGBoost2MLlibParams https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/GeneralParams.scala#L260 will check and set the param, Could we add some matches for CustomObjParam and CustomEvalParam and set the TypeHints here?

nicovdijk · 2021-10-11T09:39:53Z

XGBoost2MLlibParams https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/GeneralParams.scala#L260 will check and set the param, Could we add some matches for CustomObjParam and CustomEvalParam and set the TypeHints here?

That works nicely, thanks for the suggestion.

The only problem left is that we need to make sure that the type hint gets added, also before creating an instance for the first time (now it gets added when we save first). Any suggestions?

Maybe one would like to create a dummy instance somewhere when creating the class itself. However, I don't really how this should work for someone using XGBoost..

wbo4958 · 2021-10-11T22:03:23Z

XGBoost2MLlibParams https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/GeneralParams.scala#L260 will check and set the param, Could we add some matches for CustomObjParam and CustomEvalParam and set the TypeHints here?

That works nicely, thanks for the suggestion.

The only problem left is that we need to make sure that the type hint gets added, also before creating an instance for the first time (now it gets added when we save first). Any suggestions?

Maybe one would like to create a dummy instance somewhere when creating the class itself. However, I don't really how this should work for someone using XGBoost..

We could add define an Interface (Trait) for this

nicovdijk · 2021-10-12T07:30:58Z

We could add define an Interface (Trait) for this

I'm not sure I understand.. Ideally we should register the type hint before instantiation. However, it seems to me that we cannot do that for someone using XGBoost. They would have to do that themselves.

We would like to do the registration when defining the class (maybe in the companion object?). In python, for instance, you can run some code when importing a module, but in Scala I'm not sure how to accomplish this (objects are lazy). I don't know how a Trait can help me with this. Can you explain more?

The following code demonstrates the issue (I think):

import org.scalatest.FunSuite

class MyTest extends FunSuite {

  trait TestTrait {}

  object TestTrait {
    private var typeHintsAdded = Set[String]()

    def addTypeHint(instance: Any): Unit = {
      val clazz = instance.getClass()
      val className = clazz.getSimpleName()
      if (!typeHintsAdded.contains(className)) {
        typeHintsAdded += className
      }
    }

    def printTypeHints() {
      println(s"Type hints:\n$typeHintsAdded")
    }
  }

  class TestClass extends TestTrait {
    TestTrait.addTypeHint(this)
  }

  object TestClass extends TestTrait {
  }

  test("testing adding type hint while defining class") {
    println("Before creating instance\n")
    TestTrait.printTypeHints()
    val testClass = new TestClass
    println("After creating instance\n")
    TestTrait.printTypeHints()
  }
}

wbo4958 · 2021-10-12T10:18:24Z

If users customize their obj, then they must extends ObjectiveTrait, So what am I think is whether we can add another trait like ABC.scala where define some methods to add TypeHints, and let ObjectiveTrait extends ABC. Then when users try to customize, they will be forced to implement the method adding TypeHint.

nicovdijk · 2021-10-12T11:01:32Z

I would like a general TypeHint adding trait. However, modifying ObjectiveTrait in that way would also force users that do not use Spark to implement that method... Is that a good idea?

Also, I still don't see how that solves the problem about adding type hints before instantiation of the corresponding class. Sorry if I'm slow..

wbo4958 · 2021-10-12T22:06:55Z

would like a general TypeHint adding trait. However, modifying ObjectiveTrait in that way would also force users that do not use Spark to implement that method... Is that a good idea?

Also, I still don't see how that solves the problem about adding type hints before instantiation of the corresponding class. Sorry if I'm slow..

End users customize obj, then they must implement the method adding type hint. So the customized obj has the instance of type hint.
We can add a case to match CustomObjParam and add another method to set the type hint from the customized obj into the CustomObjParam in XGBoost2MLlibParams

Please correct me.

nicovdijk · 2021-10-13T08:28:52Z

would like a general TypeHint adding trait. However, modifying ObjectiveTrait in that way would also force users that do not use Spark to implement that method... Is that a good idea?
Also, I still don't see how that solves the problem about adding type hints before instantiation of the corresponding class. Sorry if I'm slow..

End users customize obj, then they must implement the method adding type hint. So the customized obj has the instance of type hint.

We can add a case to match CustomObjParam and add another method to set the type hint from the customized obj into the CustomObjParam in XGBoost2MLlibParams

Please correct me.

The issue is not that a method has to be implemented (we can implement the method for the end user), but that the user needs to execute that method in order to actually add the TypeHint (see the code in one of my previous comments).
I had already implemented two additional cases in XGBoost2MLlibParams that took care of adding TypeHints without the need for an additional method.

However, the downside of adding TypeHints in XGBoost2MLlibParams is that creating an instance of the CustomObj or CustomEval is no longer enough. One would need to use them as parameters in an XGBoostRegressor or XGBoostClassifier... I think it's better to execute the logic upon instantiation of a custom objective or eval.

wbo4958 · 2021-10-13T21:56:33Z

Hi @nicovdijk, Personally, this still seems like a workaround instead of a solution. I will try to make a commit locally to see if my suggestion can work.

nicovdijk · 2021-10-13T22:01:53Z

@wbo4958, I'm still not really sure what you mean, so I would like to see any suggestion you have, thanks!

By the way, I did not include the type hint trait in the objective trait, because that's in the xgboost4j package which does not serialize the parameters, if I'm correct. I don't think it has the json4s dependency either. But that's definitely possible, if that's what you mean. Is it?

wbo4958 · 2021-10-14T09:23:21Z

@nicovdijk Thx for your feed back. Yeah, I just realize the ObjectiveTrait is only in XGBoost4j.

BTW, Have you tested the scenario just load the model with customized obj/eval without any other train process?

nicovdijk · 2021-10-14T09:34:09Z

@wbo4958, I haven't tested it using XGBoost (the tests create temporary files, to do this test we would need a saved model as resource), but the script I provided earlier demonstrates the issue. So I'm pretty sure that it would fail at the moment.

The user would have to create temporary instances of all custom obj/evals when starting the program, then everything works. Unless we have some initialization logic where we could do this for the user.

wbo4958 · 2021-10-14T10:41:15Z

Ok, Could you refine your PR, then we can review it

nicovdijk · 2021-10-14T10:49:39Z

Ok, Could you refine your PR, then we can review it

@wbo4958 Sorry, what refinements exactly would you like?

...kages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/AddTypeHints.scala

...kages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/CustomParams.scala

...packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/PersistenceSuite.scala

codecov-commenter · 2021-10-15T08:41:58Z

Codecov Report

Merging #7274 (3ad1414) into master (4fd149b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #7274   +/-   ##
=======================================
  Coverage   83.68%   83.68%           
=======================================
  Files          13       13           
  Lines        3885     3885           
=======================================
  Hits         3251     3251           
  Misses        634      634

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4fd149b...3ad1414. Read the comment docs.

nicovdijk · 2021-10-15T08:45:39Z

@wbo4958, I refined 😄 the PR.

There is one last thing I would like to add. We use the Spark API in a Java project. Using traits with implementations is a bit cumbersome then. Can we add an abstract class SparkCustomObjective or JavaCustomObjective (or any other name) to add support for that? It should probably extends IObjective with TypeHintsTrait.

wbo4958 · 2021-10-15T10:31:44Z

@wbo4958, I refined smile the PR.

There is one last thing I would like to add. We use the Spark API in a Java project. Using traits with implementations is a bit cumbersome then. Can we add an abstract class SparkCustomObjective or JavaCustomObjective (or any other name) to add support for that? It should probably extends IObjective with TypeHintsTrait.

In that case, SparkCustomObjective will break the public API.

Anyway, this PR looks good for now. We can merge it for now and we will refine this once we figured out a better way for this.

Thx @nicovdijk

wbo4958 · 2021-10-15T10:32:30Z

@trivialfis Please help to review it, and please change the [XGBoost-spark] to [jvm-packages]

nicovdijk · 2021-10-15T10:51:57Z

There is one last thing I would like to add. We use the Spark API in a Java project. Using traits with implementations is a bit cumbersome then. Can we add an abstract class SparkCustomObjective or JavaCustomObjective (or any other name) to add support for that? It should probably extends IObjective with TypeHintsTrait.

In that case, SparkCustomObjective will break the public API.

Just to be clear: I would add a Scala abstract class that would be easier to use in Java code, nothing more. On the other hand, I guess there is a work-around by manually using the SavedTypeHints object to add TypeHints.

wbo4958 · 2021-10-15T11:34:53Z

There is one last thing I would like to add. We use the Spark API in a Java project. Using traits with implementations is a bit cumbersome then. Can we add an abstract class SparkCustomObjective or JavaCustomObjective (or any other name) to add support for that? It should probably extends IObjective with TypeHintsTrait.

In that case, SparkCustomObjective will break the public API.

Just to be clear: I would add a Scala abstract class that would be easier to use in Java code, nothing more. On the other hand, I guess there is a work-around by manually using the SavedTypeHints object to add TypeHints.

You mean xgboost4j introduces the trait from xgboost4j-spark?

nicovdijk · 2021-10-15T12:05:19Z

You mean xgboost4j introduces the trait from xgboost4j-spark?

No, I only want to add the following to xgboost4j-spark:

abstract class SparkCustomEval extends EvalTrait with TypeHintsTrait

abstract class SparkCustomObjective extends ObjectiveTrait with TypeHintsTrait

Then in Java, you use SparkCustomEval and SparkCustomObjective directly (you cannot have multiple inheritance in Java).

nicovdijk · 2021-10-18T14:37:54Z

@wbo4958, I think I did what you wanted. It was the first time I used rebasing, so please check if the result is correct, thanks.

jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/Utils.scala

wbo4958

LGTM

wbo4958 · 2021-10-20T03:16:18Z

@trivialfis Could you help to review it

trivialfis

Look good to me. Thanks for the nice work!

abugh · 2021-11-24T15:06:59Z

@trivialfis @nicovdijk Hi I'm a user of XGBoost and get caught by this issue for a long time. Just as @wbo4958 said, is it possible for XGBoost to solve it so that XGboostModel can be integrated as a stage of a Spark PipelineMode when we are using custom objectives. For now, we must use XGBoostClassificationModel.load(), or else it will cause errors. It's really inconvenient for the users.

nicovdijk · 2021-11-24T15:18:49Z

@abugh, I believe this merged PR does exactly that. Is this not the case?

wbo4958 · 2021-11-25T00:03:17Z

@abugh, Yeah, Please try the newest release

abugh · 2021-11-25T01:24:13Z

@nicovdijk But I see that in the jvm-packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/PersistenceSuite.scala unit-test, you are still using XGBoostClassificationModel .save() and XGBoostClassificationModel .load() .

How about if we transfer XGBoostClassificationModel into a stage of Spark PipelineModel, just like this val pipelineModel = new Pipeline().setStages(Array(XGBoostClassificationModelxxx)).fit(xxx)? and then use pipelineModel.save and PipelineModel.load to save and load. Do you mean that the newest xgboost support this now? I didn't see such case in unit-test in this PR.
Thanks for replying.

abugh · 2021-11-25T05:21:48Z

Now I'm using XGBoost-Spark 1.3.1, with scala 2.1.2 . When I'm using XGBoostRegressionModel with Customized Objectives, I can't even load the model with XGBoostRegressionModel.load() and (I used XGBoostRegressionModel.save() to save the model).
I thought that the customized objectives are the core module of the XGBoost, but seems that it doesn't work for me even when I'm doing the standard operations(XGBoostRegressionModel.load()). Is there anything wrong in my operations? Or is there anyway for me to solve this problem except by upgrading the xgboost?
If I must upgrade the xgboost version, which version at least I need to upgrade to?

abugh · 2021-11-25T06:28:39Z

And I see that in the latest version(1.5.1), the unit-test about customized objectives is replaced by using integrated objective.
https://github.com/dmlc/xgboost/blob/release_1.5.1/jvm-packages/xgboost4j-spark/src/test/scala/ml/dmlc/xgboost4j/scala/spark/PersistenceSuite.scala

nicovdijk · 2021-11-25T08:25:48Z

@abugh, this PR is not included in the 1.5.0 release or the patch release 1.5.1. It is included in the master branch. I do not know why it's not included yet nor when it will be.

I have even tested saving and loading in a pipeline in master - it works (which is expected, since it's just using the save and load implementation of the xgboost stage).

abugh · 2021-11-25T08:49:16Z

@nicovdijk Really thanks for your applying. So do you mean that loading in a pipeline is same to loading in XGBoostRegressionModel? Or else, If loading in XGBoostRegressionModel works well, loading in a pipeline should also work well?

I have even tested saving and loading in a pipeline in master - it works Yeah I think it makes sense, as masters branch has already admit this PR, and as you said, this PR is just used to solve this problem. But if I'm using the release Patch(1.5.1 or 1.3.1)m it doesn't work. Any suggestion you can give for me if I can only use release patch for some reasons?

abugh · 2021-11-25T08:52:07Z

Oh! good news. I change to use 1.5.1 release patch, seems it works now when using loading in XGBoostRegressionModel(loading in pipelineMode isn't tested yet, but I guess it works.)

I don't know the reason, as you mentioned that 1.5.1 release patch doesn't admit this PR. But I just share this information to you.

nicovdijk · 2021-11-25T22:42:23Z

@nicovdijk Really thanks for your applying. So do you mean that loading in a pipeline is same to loading in XGBoostRegressionModel? Or else, If loading in XGBoostRegressionModel works well, loading in a pipeline should also work well?

I have even tested saving and loading in a pipeline in master - it works Yeah I think it makes sense, as masters branch has already admit this PR, and as you said, this PR is just used to solve this problem. But if I'm using the release Patch(1.5.1 or 1.3.1)m it doesn't work. Any suggestion you can give for me if I can only use release patch for some reasons?

@trivialfis, do you know if/when the code in this PR gets released?

abugh · 2021-11-26T06:20:20Z

And also could you tell me which spark version is needed when using xgboost4j-spark_1.51 with Scala 2.12?
Is 3.2.0 needed? Or just 3.1.x is OK?

abugh · 2021-11-27T12:33:05Z

Oh! good news. I change to use 1.5.1 release patch, seems it works now when using loading in XGBoostRegressionModel(loading in pipelineMode isn't tested yet, but I guess it works.)

I don't know the reason, as you mentioned that 1.5.1 release patch doesn't admit this PR. But I just share this information to you.

Unfortunately, seems that the successful run was a coincidence. I can't recurrent it even with the same conditions, and not know the reason.

It would be nice if you can tell me the plan/schedule to release this PR in the release patch. I will wait for it.

nicovdijk force-pushed the master branch from c8fb713 to 3d237e8 Compare September 28, 2021 18:50

nicovdijk mentioned this pull request Sep 28, 2021

[XGBoost4J-Spark] Problem for (de)serialization of Spark implementation of custom objective and eval #7224

Closed

wbo4958 reviewed Oct 5, 2021

View reviewed changes

wbo4958 reviewed Oct 15, 2021

View reviewed changes

nicovdijk and others added 10 commits October 18, 2021 16:19

impl in GeneralParams, removed abstract classes

85aff26

reverting unnecessary change to EvalError

57b9429

using a single SavedTypeHints object

4384bc2

moved logic to AddTypeHints trait

cbac4f9

style change

643a064

fixed review comments

8e51b6c

scalastyle changes

6272a72

a new way to handle json encode/decode

f447df8

moved some logic (CustomGeneralParam and params.Utils)

0247104

updated copyright year and a code comment

aa6c2b4

nicovdijk force-pushed the master branch from b24f9ec to aa6c2b4 Compare October 18, 2021 14:35

wbo4958 reviewed Oct 18, 2021

View reviewed changes

jvm-packages/xgboost4j-spark/src/main/scala/ml/dmlc/xgboost4j/scala/spark/params/Utils.scala Outdated Show resolved Hide resolved

wbo4958 approved these changes Oct 18, 2021

View reviewed changes

removed comment as suggested

3ad1414

trivialfis approved these changes Oct 21, 2021

View reviewed changes

trivialfis merged commit 31a307c into dmlc:master Oct 21, 2021

[XGBoost4J-Spark] Problem for (de)serialization of Spark implementation of custom objective and eval #7274

[XGBoost4J-Spark] Problem for (de)serialization of Spark implementation of custom objective and eval #7274

Conversation

nicovdijk commented Sep 28, 2021

trivialfis commented Sep 30, 2021

nicovdijk commented Sep 30, 2021 via email • edited

wbo4958 commented Oct 1, 2021

nicovdijk commented Oct 1, 2021 • edited

wbo4958 left a comment

Choose a reason for hiding this comment

wbo4958 Oct 5, 2021

Choose a reason for hiding this comment

nicovdijk commented Oct 5, 2021 • edited

wbo4958 commented Oct 10, 2021

nicovdijk commented Oct 11, 2021

wbo4958 commented Oct 11, 2021

nicovdijk commented Oct 12, 2021 • edited

wbo4958 commented Oct 12, 2021

nicovdijk commented Oct 12, 2021

wbo4958 commented Oct 12, 2021

nicovdijk commented Oct 13, 2021 • edited

wbo4958 commented Oct 13, 2021

nicovdijk commented Oct 13, 2021 • edited

wbo4958 commented Oct 14, 2021

nicovdijk commented Oct 14, 2021 • edited

wbo4958 commented Oct 14, 2021

nicovdijk commented Oct 14, 2021 • edited

codecov-commenter commented Oct 15, 2021 • edited

Codecov Report

nicovdijk commented Oct 15, 2021 • edited

wbo4958 commented Oct 15, 2021

wbo4958 commented Oct 15, 2021

nicovdijk commented Oct 15, 2021

wbo4958 commented Oct 15, 2021 • edited

nicovdijk commented Oct 15, 2021 • edited

nicovdijk commented Oct 18, 2021

wbo4958 left a comment

Choose a reason for hiding this comment

wbo4958 commented Oct 20, 2021

trivialfis left a comment

Choose a reason for hiding this comment

abugh commented Nov 24, 2021 • edited

nicovdijk commented Nov 24, 2021

wbo4958 commented Nov 25, 2021

abugh commented Nov 25, 2021

abugh commented Nov 25, 2021 • edited

abugh commented Nov 25, 2021

nicovdijk commented Nov 25, 2021

abugh commented Nov 25, 2021

abugh commented Nov 25, 2021

nicovdijk commented Nov 25, 2021

abugh commented Nov 26, 2021 • edited

abugh commented Nov 27, 2021

nicovdijk commented Sep 30, 2021 via email •

edited

nicovdijk commented Oct 1, 2021 •

edited

nicovdijk commented Oct 5, 2021 •

edited

nicovdijk commented Oct 12, 2021 •

edited

nicovdijk commented Oct 13, 2021 •

edited

nicovdijk commented Oct 13, 2021 •

edited

nicovdijk commented Oct 14, 2021 •

edited

nicovdijk commented Oct 14, 2021 •

edited

codecov-commenter commented Oct 15, 2021 •

edited

nicovdijk commented Oct 15, 2021 •

edited

wbo4958 commented Oct 15, 2021 •

edited

nicovdijk commented Oct 15, 2021 •

edited

abugh commented Nov 24, 2021 •

edited

abugh commented Nov 25, 2021 •

edited

abugh commented Nov 26, 2021 •

edited