fix: Fixing issue #82 #83
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
@@ -150,6 +150,7 @@ def _generate_mms_config_properties():
     "default_workers_per_model": env.model_server_workers,
     "inference_address": "http://0.0.0.0:{}".format(env.inference_http_port),
     "management_address": "http://0.0.0.0:{}".format(env.management_http_port),
+    "vmargs": "-XX:-UseContainerSupport",
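For context, the hunk above adds a `vmargs` entry to the dict that `_generate_mms_config_properties()` serializes into the MMS `config.properties` file. A minimal sketch of that pattern (function signature and argument names here are illustrative, not the toolkit's actual code, which reads these values from its `env` object):

```python
def generate_mms_config_properties(inference_port=8080, management_port=8081, workers=1):
    """Build the key/value lines written to the MMS config.properties file.

    Simplified sketch: the real toolkit derives these values from its
    environment object rather than from function arguments.
    """
    user_defined = {
        "default_workers_per_model": workers,
        "inference_address": "http://0.0.0.0:{}".format(inference_port),
        "management_address": "http://0.0.0.0:{}".format(management_port),
        # Disable the JVM's container detection so availableProcessors()
        # reports the host CPU count rather than a mis-detected limit.
        "vmargs": "-XX:-UseContainerSupport",
    }
    return "\n".join("{}={}".format(k, v) for k, v in user_defined.items())
```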
Wondering if we should make this configurable via a SageMaker env variable? Not sure if it would require additional changes anywhere else. @dhanainme
"vmargs": env.vmargs if env.vmargs else "-XX:-UseContainerSupport"
+1. TS_VM_ARGS could be the env variable where we can pick up from
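The override suggested in this thread could be sketched as below. Note that `TS_VM_ARGS` is only the variable name proposed here, not an existing toolkit setting, and `resolve_vmargs` is a hypothetical helper name:

```python
import os

def resolve_vmargs(default="-XX:-UseContainerSupport"):
    """Prefer user-supplied JVM args from the environment, else the default.

    TS_VM_ARGS is the env-variable name proposed in this review thread;
    it is not guaranteed to exist in any released version of the toolkit.
    """
    return os.environ.get("TS_VM_ARGS") or default
```

With this shape, the generated config line becomes `"vmargs": resolve_vmargs()`, so users who need different JVM flags can set them without rebuilding the container.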
This change needs to happen here in this file
self._module_name = os.environ.get(parameters.USER_PROGRAM_ENV, DEFAULT_MODULE_NAME)
We have tried the fix in this PR, which should solve #82, but we see no difference in the number of CPUs logged in CloudWatch. So we are not sure whether more changes are involved, but this fix as a separate change does not seem to solve the issue. The container we use (PyTorch 1.7.1, TorchServe 0.4.0) runs JDK 11, in which the `UseContainerSupport` flag is enabled by default.
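One way to cross-check what the container itself reports, independently of the JVM, is to compare the raw CPU count with the scheduler affinity mask. This is a Linux-only diagnostic sketch, not part of the toolkit:

```python
import os

def report_cpu_visibility():
    """Compare the kernel-exposed CPU count with the affinity-restricted count.

    Inside a container restricted via a cpuset, these two numbers can differ.
    A container-aware JVM (the JDK 11 default, -XX:+UseContainerSupport)
    sizes availableProcessors() from such container limits, which is what
    disabling the flag is meant to bypass.
    """
    total = os.cpu_count()                 # CPUs the kernel exposes
    usable = len(os.sched_getaffinity(0))  # CPUs this process may run on
    return total, usable
```

Running this inside the container alongside the value the model server logs can help narrow down whether the JVM flag is actually taking effect.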
I think it won't work for PyTorch >= 1.6 containers, since the TorchServe model server is used there rather than MMS.
The reason this doesn't fix PT >= 1.6 is that the PyTorch inference toolkit needs a similar fix.
Curious to know why this PR is still pending?
Hi, we are facing the same issue and would like to use this fix in SageMaker. Is there a plan to cut a release anytime soon?
Issue #, if available:
Fixing issue #82
Description of changes:
The change includes using the option `-XX:-UseContainerSupport` as suggested in the documentation.
Testing done:
I have built a custom container using the patched version, and the reported CPU count matches the CPU count available in the container.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.