Upgrade XgBoost to 1.7.x #362

Merged: 7 commits, Mar 6, 2023

malav-shastri
Contributor

@malav-shastri malav-shastri commented Jan 24, 2023

Issue #, if available:

  • We are upgrading XgBoost to 1.7.3 (update: now 1.7.4). This first revision of the PR targets the unit test failures that resulted from the upgrade. See https://quip-amazon.com/sZqYAQpmTGtR/Breaking-features-in-Xgboost152-173
    (Note: Per the version upgrade process, this PR will be merged only after the integration tests and container tests pass, with any required changes made in later revisions.)

Description of changes:

  • XgBoost introduced a new callback interface for the Python package in version 1.3.x. The old callback style was deprecated at that point, but its code was not removed until 1.6.x. As a result, the 1P algorithms side still uses old-style callback functions on 1.5.2, specifically in checkpointing.py. Because 1.7.3 no longer supports the deprecated callback style, checkpointing.py and test_checkpointing.py need to be updated to the new callback style.
  • In the open source XgBoost training code, callbacks are applied at the end of each iteration. In this new implementation I have overridden the after_training() function of the new callback API in our existing set of callbacks. According to the open source XgBoost code, this is called after the update function, which updates the model for an iteration. Once update has run, the code calls the after_iteration function of the CallbackContainer class, which in turn invokes the after_iteration function defined by each callback in the list.
  • test_training.py still fails, but the stack trace points to the smdebug hook class. We already have a ticket with the debugger team to make their code compatible with the latest XgBoost version, and we will follow up on it. Update: we ran all unit and integration tests without the debugger code to check this implementation and have seen no failures so far. Update: the debugger team has fixed the issue, and there are no further failures or errors.
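
For readers unfamiliar with the new callback style, the call order described above can be modeled in plain Python. This is only a toy sketch, not the code in checkpointing.py and not XgBoost's implementation: it does not import xgboost, and `ToyModel`, `ToyContainer`, `CheckpointEvery`, and `toy_train` are hypothetical names. The hook names and signatures are assumed to follow `xgboost.callback.TrainingCallback`, where `after_iteration` returns True to request that training stop.

```python
# Toy model of the new-style callback flow; it does not import xgboost.
# ToyModel, ToyContainer, CheckpointEvery and toy_train are hypothetical names.


class ToyModel:
    """Stand-in for a booster: update() just counts boosting rounds."""

    def __init__(self):
        self.rounds = 0

    def update(self):
        self.rounds += 1


class CheckpointEvery:
    """Callback that mirrors the TrainingCallback hook names and signatures."""

    def __init__(self, every):
        self.every = every
        self.saved = []        # epochs at which a checkpoint would be written
        self.finished = False

    def after_iteration(self, model, epoch, evals_log):
        if (epoch + 1) % self.every == 0:
            self.saved.append(epoch)
        return False  # returning True would request an early stop

    def after_training(self, model):
        self.finished = True  # e.g. flush any pending work
        return model


class ToyContainer:
    """Fans each hook out to every callback in the list."""

    def __init__(self, callbacks):
        self.callbacks = callbacks

    def after_iteration(self, model, epoch, evals_log):
        # Run every callback, then stop if any of them asked to.
        results = [cb.after_iteration(model, epoch, evals_log) for cb in self.callbacks]
        return any(results)

    def after_training(self, model):
        for cb in self.callbacks:
            model = cb.after_training(model)
        return model


def toy_train(model, num_rounds, callbacks):
    container = ToyContainer(callbacks)
    for epoch in range(num_rounds):
        model.update()  # the model is updated first
        if container.after_iteration(model, epoch, evals_log={}):
            break       # then the callbacks run and may stop training
    return container.after_training(model)


checkpoint = CheckpointEvery(every=2)
trained = toy_train(ToyModel(), num_rounds=5, callbacks=[checkpoint])
print(trained.rounds, checkpoint.saved, checkpoint.finished)
```

With the real library, a class like `CheckpointEvery` would presumably subclass `xgboost.callback.TrainingCallback` and be passed to `xgboost.train` through its `callbacks` argument; the container and training loop above only stand in for what the library does internally.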

Testing:

  • pytest test/unit/
  • flake8 tests
  • Integration tests

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@amzn-choeric
Contributor

amzn-choeric commented Jan 24, 2023

Just adding a side note. Please check to see if the override for tracker.py is really necessary at this point. I believe it was added as an ad hoc fix until the open source library rolled out the changes on their side, but I have not looked into it too deeply. Also, please make sure that the environment flag for disabling container support is added by default in serving_mms.py, as opposed to requiring an opt-in.

@amzn-choeric
Contributor

Also, you should be able to run integration tests prior to merging.

@malav-shastri
Contributor Author

malav-shastri commented Jan 24, 2023

> Also, you should be able to run integration tests prior to merging.

Sure, thanks, that's the plan. I will not merge this until the integration tests pass.

@malav-shastri malav-shastri marked this pull request as draft January 25, 2023 00:09
@dewan-c
Contributor

dewan-c commented Jan 27, 2023

Looks like you're planning on submitting another revision and testing. LGTM otherwise.

@malav-shastri
Contributor Author

> Just adding a side note. Please check to see if the override for tracker.py is really necessary at this point. I believe it was added as an ad hoc fix until the open source library rolled out the changes on their side, but I have not looked into it too deeply. Also, please make sure that the environment flag for disabling container support is added by default in serving_mms.py, as opposed to requiring an opt-in.

I think I'll sync with you offline on this, as I have some questions

@malav-shastri
Contributor Author

malav-shastri commented Feb 1, 2023

Updated the description with some more details on new callback API

Contributor

@mabunday mabunday left a comment

Looks good to me, thanks for the detailed description and deep dive into the new callback behavior.

Optionally, feel free to run black -l 120 src/sagemaker_xgboost_container/checkpointing.py and black -l 120 test/unit/test_checkpointing.py, but this can also be saved for another PR.

src/sagemaker_xgboost_container/checkpointing.py (outdated, resolved)
@@ -334,7 +330,6 @@ def interaction_constraints_validator(value, dependencies):
hpv.ContinuousHyperparameter(
name="aft_loss_distribution_scale", range=hpv.Interval(min_closed=0), required=False
),
hpv.CategoricalHyperparameter(name="single_precision_histogram", range=["true", "false"], required=False),
Contributor

Is this not supported in 1.7 anymore?


@malav-shastri malav-shastri marked this pull request as ready for review March 3, 2023 20:23
@malav-shastri malav-shastri merged commit 6dcd442 into aws:master Mar 6, 2023