Correct use of model_server_workers #1275
Comments
Hi Antoine, Thanks for reaching out! I couldn't reproduce it with an arbitrary estimator. Would you please provide the exact sagemaker-python-sdk code that you used to run into this?
Hi @knakad, the version of the sagemaker package is 1.42.1. My code looks like this:
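A minimal sketch, assuming an MXNetModel deployment with model_server_workers left at its default; the bucket, role, and entry point names are placeholders:

```python
from sagemaker.mxnet import MXNetModel

# model_server_workers is deliberately left unset (None), which should fall
# back to the documented default of one worker per CPU.
model = MXNetModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role="MySageMakerRole",                    # placeholder
    entry_point="inference.py",                # placeholder
    framework_version="1.4.1",
    py_version="py3",
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")
```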
For more info, I printed the result of the function MXNetModel.prepare_container_def:
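A sketch of that inspection, assuming the model object from the snippet above:

```python
# With model_server_workers=None, the Environment dict in the container
# definition should not contain SAGEMAKER_MODEL_SERVER_WORKERS.
container_def = model.prepare_container_def("ml.c5.2xlarge")
print(container_def["Environment"])
```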
As expected, there is no mention of the SAGEMAKER_MODEL_SERVER_WORKERS variable, because the model_server_workers parameter is None. I think this is the correct behaviour. My guess is that the problem is in the code that is loaded on the inference instance (sagemaker_inference_toolkit or sagemaker-mxnet-serving-container), which does not correctly handle the case where SAGEMAKER_MODEL_SERVER_WORKERS is not set. I have printed the environment variables from the inference instance:
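One way to dump those variables is from the entry point itself; a minimal sketch (model loading omitted):

```python
# inference.py: log every environment variable to the endpoint's
# CloudWatch logs when the model is loaded.
import os

def model_fn(model_dir):
    for key, value in sorted(os.environ.items()):
        print("%s=%s" % (key, value))
    # ... load and return the MXNet model as usual ...
```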
I tried to reproduce this, and got this in my logs:
"Default workers per model: 4" does match the number of CPUs. edit: did some more digging - the inference toolkit does not have a defined default for the number of workers, and the underlying model server's default is the number of CPUs, so from a code standpoint, things look as though they should align with the documentation as well. |
That is strange!!
I have deployed another time with:
Why do my logs show a number of CPUs equal to 1? I have seen that in your log you have:
According to https://aws.amazon.com/ec2/physicalcores/, c5.2xlarge has a physical core count of 4. Is C5.2xlarge the same as ml.c5.2xlarge?
Actually, when I print the number of CPUs from my deployed script, I get different results:
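For reference, different Python APIs can report different counts inside a container; a quick check:

```python
import multiprocessing
import os

print("multiprocessing.cpu_count():", multiprocessing.cpu_count())
print("os.cpu_count():", os.cpu_count())
# Linux only: respects the process's CPU affinity mask.
print("len(os.sched_getaffinity(0)):", len(os.sched_getaffinity(0)))
```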
In my previous response, I had been running batch transform jobs (I think with ml.m4.xlarge instances) because I happened to have those handy. I tried again, this time modifying this notebook for a basic endpoint deployment, and saw what you got:
I tried this with an ml.c5.2xlarge, ml.c5.xlarge, and ml.m4.2xlarge. Since I replicated the issue you're encountering through deploying to an endpoint but not when using batch transform, I'm going to reach out to the team that owns SageMaker Hosting and see if they have any insight.
Hi @laurenyu, at least you have been able to reproduce it! ;-) Small question: do you know what image you are using with batch transform? Is it the same as 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3?
I've passed along this issue link, so we'll keep updates here for now. For batch transform, I was using the same image, which is why I wonder if there is something happening with the hosting platform rather than the MXNet serving image itself.
I ran a simple Java snippet on the SageMaker inference instance to get the number of CPUs, Runtime.getRuntime().availableProcessors(), and it returns "1". This should be investigated on the SageMaker side.
This happens because Runtime.getRuntime().availableProcessors() returns 1 in a Docker environment by default: a container-aware JVM derives the processor count from the container's CPU limits rather than from the host's CPUs. More details:
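A hedged illustration of the underlying mechanism: container-aware runtimes derive the CPU count from the cgroup CPU quota rather than the host's processors (cgroup v1 paths shown below; the paths differ under cgroup v2):

```python
def cgroup_cpu_limit():
    """Return the container's effective CPU limit, or None if unlimited."""
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
    except OSError:
        return None  # cgroup v1 files not present
    if quota <= 0:
        return None  # a quota of -1 means no limit
    return quota / period  # e.g. 400000 / 100000 -> 4.0 CPUs

print("cgroup CPU limit:", cgroup_cpu_limit())
```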
I have the same problem deploying models via the AWS CDK. Image: "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3" Log:
@laurenyu Is there any update on this issue?
I have the same problem as @romavlasov when deploying with ml.g4dn.xlarge.
@ldong87 Did you find any workaround?
@romavlasov It's possible they confuse vCPUs and physical CPUs in virtualization. I tried with ml.g4dn.2xlarge and the default number of workers is OK, I think.
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Describe the bug
The documentation states that when I deploy a model with model_server_workers = None, the server should use one worker per CPU.
However, what I found is that when I deploy my model on an ml.c5.2xlarge (8 vCPUs, one physical CPU I guess), it only uses 1 worker (see logs below).
If I pass the parameter to the deploy function, it correctly sets "Default workers per model" to the number I have specified through the model_server_workers parameter.
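A sketch of that workaround, pinning the worker count on the model instead of relying on the CPU-count default (model data and role are placeholders; the same parameter can be passed through deploy on a framework estimator):

```python
from sagemaker.mxnet import MXNetModel

# Passing model_server_workers explicitly makes the SDK set the
# SAGEMAKER_MODEL_SERVER_WORKERS env var, which the container honors.
model = MXNetModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role="MySageMakerRole",                    # placeholder
    entry_point="inference.py",                # placeholder
    framework_version="1.4.1",
    py_version="py3",
    model_server_workers=8,  # match the instance's vCPU count explicitly
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")
```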
In conclusion, either the documentation is out of date, or the default behaviour when model_server_workers = None does not work.
To reproduce
Deploy any model on an ml.c5.2xlarge, then check the log for the entry "Default workers per model".
Expected behavior
The endpoint should start with a number of model server workers equal to the number of CPUs on the instance, as documented for model_server_workers = None.
Screenshots or logs
This is an extract of the log from the endpoint:
System information
- SageMaker Python SDK version: 1.42.1
- Framework / image: MXNet, 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3
- Python version: py3
- CPU or GPU: CPU