[jvm-packages] cross-version spark support #4350

Closed
CodingCat opened this Issue Apr 9, 2019 · 5 comments

CodingCat commented Apr 9, 2019

@hcho3 I am going to work on supporting Spark 2.4.1, with compatibility tests against Spark 2.3.

My plan is to trigger two builds, one each for Spark 2.4 and Spark 2.3, and to add version-specific tests to ensure compatibility.

Shall I wait for the Java workers to be ready in Jenkins, or should I work on Travis instead?

CodingCat self-assigned this Apr 9, 2019

hcho3 commented Apr 9, 2019

Let me add Java workers to Jenkins. Can you provide commands to compile JARs?

CodingCat commented Apr 9, 2019

Yes, just mvn package.

srowen commented Apr 11, 2019

That's great @CodingCat -- would be great to get a 2.4.x build going as 2.3.x is EOL in a few months. I suspect you have this well in hand, but if you're hitting weird problems updating to 2.4 (shouldn't be much) I'd be happy to try to debug.

CodingCat commented Apr 16, 2019

I would limit the definition of cross-version support in XGBoost to "support loading models trained in a previous version".

I have run several experiments with an XGBoost built against Spark 2.4 running on Spark 2.3, and vice versa. The most significant problem comes from the libraries that Spark depends on, which bring breaking changes of their own. Because of that, we cannot guarantee that a version built against Spark 2.4 will run on a Spark 2.3 runtime.

Even to "support loading models trained in a previous version", we need some code to handle (1) breaking changes in XGBoost parameters, e.g. reg:linear doesn't exist anymore; (2) breaking changes in Spark, e.g. VectorAssembler will fail with Float.NaN by default...
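
As a rough illustration of point (1), the handling code could look something like the sketch below. This is not XGBoost's actual loading code: the class `LegacyParamMigration` and its `migrate` method are hypothetical names, and the replacement value `reg:squarederror` is an assumption about what a saved `reg:linear` objective should map to. It only shows the general idea of rewriting renamed parameter values when reading a model saved by an older version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of xgboost4j: rewrites parameter values
// that were renamed between versions before a saved model's params are used.
class LegacyParamMigration {
    // Old objective name -> assumed current name. Only reg:linear comes from
    // the discussion above; the target name is an assumption.
    private static final Map<String, String> RENAMED_OBJECTIVES = new HashMap<>();
    static {
        RENAMED_OBJECTIVES.put("reg:linear", "reg:squarederror");
    }

    // Returns a copy of the saved params with legacy objective names replaced.
    // The input map is left untouched.
    static Map<String, Object> migrate(Map<String, Object> savedParams) {
        Map<String, Object> migrated = new HashMap<>(savedParams);
        Object objective = migrated.get("objective");
        if (objective instanceof String && RENAMED_OBJECTIVES.containsKey(objective)) {
            migrated.put("objective", RENAMED_OBJECTIVES.get(objective));
        }
        return migrated;
    }
}
```

Calling `LegacyParamMigration.migrate(params)` on the parameters read from an old model would leave every other key as it was and only swap the objective name. Keeping the renames in one table means a later rename needs a single new entry.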

srowen commented Apr 16, 2019

Personally, I'd relax that condition if needed -- require the same version of xgboost / Spark to read/write things.
