Fix script mode training hang with logging enabled #77

Merged 1 commit on Dec 4, 2019

aws-patlin
Contributor

@aws-patlin aws-patlin commented Dec 2, 2019

Description of changes:
A user discovered that, after a recent release, their training script stopped working once they added a stdout stream handler to the logger. The issue was traced back to this commit in sagemaker-containers.
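
For reference, a minimal sketch of the kind of logger setup described above, assuming the training script uses the standard-library `logging` module. The function name `build_logger`, the logger name, and the format string are illustrative and not taken from the user's report:

```python
import logging
import sys


def build_logger(name="train"):
    """Return a logger with a handler that writes to stdout (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # The stdout stream handler is the addition that coincided with the hang.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


logger = build_logger()
logger.info("starting training")
```

On its own this script runs normally; the hang was only observed when it ran inside the container with capture_error enabled.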

capture_error=True appends stderr to the error message that gets thrown if training fails. For context, the flag was specifically a workaround for PyTorch, which can throw a specific error even if training succeeds, so I don't believe it is necessary for XGBoost.
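
To illustrate what "appends stderr to the error message" means, here is a rough sketch of the general pattern using only the standard library. It is not the sagemaker-containers implementation: `run_script` and `TrainingError` are hypothetical names, and the sketch does not reproduce the hang itself.

```python
import subprocess
import sys


class TrainingError(Exception):
    """Hypothetical error type raised when the training process fails."""


def run_script(cmd, capture_error=False):
    """Run cmd; on a non-zero exit raise TrainingError, optionally with stderr."""
    # With capture_error, the child's stderr goes to a pipe instead of the console.
    stderr_target = subprocess.PIPE if capture_error else None
    process = subprocess.Popen(cmd, stderr=stderr_target)
    # communicate() reads the pipe (if any) and waits for the child to exit.
    _, stderr = process.communicate()
    if process.returncode != 0:
        message = "Command failed with exit code {}".format(process.returncode)
        if capture_error and stderr:
            message += ":\n" + stderr.decode("utf-8", errors="replace")
        raise TrainingError(message)
    return process.returncode


try:
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    run_script(failing, capture_error=True)
except TrainingError as err:
    print(err)
```

With capture_error disabled, the child's stderr goes straight to the parent's stderr and the raised message carries only the exit code.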

Corresponding PR for 0.90-2: #78

Testing:
Using the prod image with capture_error enabled, the training script would hang with no log output. With capture_error disabled on a custom image, I was able to complete the training job successfully with the expected log output.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@aws-patlin aws-patlin merged commit 363e39d into aws:master Dec 4, 2019
aws-patlin added a commit to aws-patlin/sagemaker-xgboost-container that referenced this pull request Dec 5, 2019
aws-patlin added a commit that referenced this pull request Dec 5, 2019
edwardjkim pushed a commit to edwardjkim/sagemaker-xgboost-container that referenced this pull request Mar 17, 2021
edwardjkim pushed a commit that referenced this pull request Mar 17, 2021
… image (#179)

* Bump Python to 3.7.10

* Merge commits from 0.90-1 back to reverted master

* Fix CSV Pipe parsing argument to use weight instead of weights. Fix requirements for tox. (#81)

* Fix script mode training hang with logging enabled. (#77)

* Fix training unit test to match PR #77. (#84)

* Fix label concatenation for RecordIO-protobuf dmatrix (#85)

Closes #83

* Add verbosity to hyperparameter validation. (#87)

* Add verbosity to hyperparameter validation.

* Set scipy requirement to 1.2.2 for sagemaker-containers.

* Add missing eval_metrics to hp validation. (#82)

* Added aucpr and cox-nloglik to eval_metric hp validation.
* Add two separate list for MAXIMIZE and MINIMIZE metrics.

Co-authored-by: ericangelokim <39601338+ericangelokim@users.noreply.github.com>
Co-authored-by: Patrick Lin <52252844+aws-patlin@users.noreply.github.com>
Co-authored-by: rizwangilani <rizwan.gl@gmail.com>