
Correct use of model_server_workers #1275

Closed

anotinelg opened this issue Jan 30, 2020 · 15 comments

Comments

@anotinelg

Describe the bug
The documentation states that when I deploy a model with model_server_workers = None,

model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.

However, what I found is that when I deploy my model on an ml.c5.2xlarge (8 vCPUs, one CPU I guess), it only uses 1 worker (see logs below).

If I pass the parameter into the deploy function, it correctly sets "Default workers per model" to the number I have specified through the model_server_workers parameter.
As a conclusion, either the documentation is out of date, or the behaviour when model_server_workers = None does not work.
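
For reference, a minimal sketch of that explicit workaround (the S3 paths, role, and entry point below are placeholders, not my real values):

from sagemaker.mxnet.model import MXNetModel

# Placeholder model data, role and entry point; setting model_server_workers is the point here.
model = MXNetModel(
    model_data='s3://my-bucket/model.tar.gz',
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    entry_point='inference_script.py',
    framework_version='1.4.1',
    py_version='py3',
    model_server_workers=8,  # explicitly one worker per vCPU on ml.c5.2xlarge
)
model.deploy(instance_type='ml.c5.2xlarge', initial_instance_count=1)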

To reproduce
Deploy any model on a ml.c5.2xlarge, check the log and the entry Default workers per model

Expected behavior
When model_server_workers is None, the endpoint should start one worker per vCPU (8 workers on an ml.c5.2xlarge).

Screenshots or logs

This is an extract of the log from the endpoint:

**Number of CPUs: 1**
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
**Default workers per model: 1**

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: '1.42.1'
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): custom script on MXNET 1.4.1
  • Framework version: 1.4.1
  • Python version: 3
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): AWS docker


@anotinelg anotinelg added the bug label Jan 30, 2020
@knakad
Contributor

knakad commented Jan 30, 2020

Hi Antoine,

Thanks for reaching out!

I couldn't reproduce it with an arbitrary estimator. Would you please provide the exact sagemaker-python-sdk code that you used to run into this?

@anotinelg
Author

Hi @knakad,
I am using the MXNetModel object to deploy, in a standard way. I think it does not depend on the script; any dummy one will do the job.

version of sagemaker package = 1.42.1

My code looks like this:

from sagemaker.mxnet.model import MXNetModel

args = {
    'model_data': 's3://../model.tar.gz',
    'name': 'mymodel',
    'role': 'arn:aws:iam::...',
    'entry_point': '.anyscript_will_do_the_job',
    'dependencies': [],
    'framework_version': '1.4.1',
    'py_version': 'py3',
    'code_location': 's3://..',
    'vpc_config': None,
    'sagemaker_session': None,
    'model_server_workers': None,
}

net = MXNetModel(**args)
net.deploy(instance_type='ml.c5.xlarge', initial_instance_count=1)

For more info, I print the result of the function MXNetModel.prepare_container_def:

{'Image': 'CCCCCCCC.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3', 'Environment': {'SAGEMAKER_PROGRAM': 'inference_script.py', 'SAGEMAKER_SUBMIT_DIRECTORY': 's3://.../model.tar.gz', 'SAGEMAKER_ENABLE_CLOUDWATCH_METRICS': 'false', 'SAGEMAKER_CONTAINER_LOG_LEVEL': '20', 'SAGEMAKER_REGION': 'us-east-1'}, 'ModelDataUrl': 's3://.../model.tar.gz'}

As expected, there is no mention of the SAGEMAKER_MODEL_SERVER_WORKERS variable, because the model_server_workers parameter is None; I think this is the correct behaviour. My guess is that the problem is in the code that is loaded on the inference instance (sagemaker_inference_toolkit or sagemaker-mxnet-serving-container), which does not correctly handle the case where SAGEMAKER_MODEL_SERVER_WORKERS is unset.

I have printed the environment variables from the inference instance:

2020-02-11 10:34:20,782 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - environ({'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/301670ea-2a02-4517-a501-3dd3b5c6a4c4', 'SAGEMAKER_ENABLE_CLOUDWATCH_METRICS': 'false', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'AWS_DEFAULT_REGION': 'us-east-1', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2', 'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/0f7fade6-e2c8-42ef-a2cb-144678ab33f3', 'MXNET_KVSTORE_REDUCTION_NTHREADS': '1', 'LANG': 'C.UTF-8', 'SAGEMAKER_CONTAINER_LOG_LEVEL': '20', 'LD_LIBRARY_PATH': ':/usr/local/lib', 'SAGEMAKER_SUBMIT_DIRECTORY': 's3://letgo-data-science-data/sagemaker/listingclass/models/dev-listingclass-v100-0-0-2020-02-11-11-29-28/model.tar.gz', 'PYTHONPATH': '/opt/ml/model/code::/.sagemaker/mms/models/model', '__KMP_REGISTERED_LIB_1': '0x7fddcff64a08-cafe1df8-libiomp5.so', 'AWS_REGION': 'us-east-1', 'PYTHONIOENCODING': 'UTF-8', 'SAGEMAKER_REGION': 'us-east-1', 'OMP_NUM_THREADS': '1', 'PYTHONDONTWRITEBYTECODE': '1', 'MXNET_CPU_PRIORITY_NTHREADS': '1', 'MXNET_CPU_WORKER_NTHREADS': '1', 'TEMP': '/home/model-server/tmp', 'PYTHONUNBUFFERED': '1', 'SAGEMAKER_SAFE_PORT_RANGE': '25000-25999', 'HOSTNAME': 'model.aws.local', 'LC_ALL': 'C.UTF-8', 'HOME': '/root', 'SAGEMAKER_PROGRAM': 'inference_script.py', 'MMS_DECODE_INPUT_REQUEST': 'false', 'MXNET_USE_OPERATOR_TUNING': '0'})

@laurenyu
Contributor

laurenyu commented Feb 14, 2020

I tried to reproduce this, and got this in my logs:

2020-02-14 00:05:38,498 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /usr/local/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 2766 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

"Default workers per model: 4" does match the number of CPUs.

edit: did some more digging - the inference toolkit does not have a defined default for the number of workers, and the underlying model server's default is the number of CPUs, so from a code standpoint, things look as though they should align with the documentation as well.
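
A sketch of that resolution logic (this is not the actual toolkit/MMS code, just the behaviour one would expect given the env var and the CPU-count fallback described above):

import multiprocessing
import os

def default_workers_per_model():
    # If the SDK set SAGEMAKER_MODEL_SERVER_WORKERS, that value should win.
    env_value = os.environ.get('SAGEMAKER_MODEL_SERVER_WORKERS')
    if env_value:
        return int(env_value)
    # Otherwise the model server falls back to the number of CPUs it detects,
    # which is where the "Default workers per model" count in the logs comes from.
    return multiprocessing.cpu_count()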

@anotinelg
Author

anotinelg commented Feb 14, 2020

That is strange!
What version of the SageMaker SDK are you using? 1.42.1? Can I test your code from my side?

@anotinelg
Author

I have deployed another time with:

  • image 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3
  • ml.c5.2xlarge
  • instance = 1
  • Single model type

Why do my logs show a number of CPUs equal to 1? I have seen that in your log you have Number of CPUs: 4.

2020-02-14 11:36:09,267 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /usr/local/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

According to https://aws.amazon.com/ec2/physicalcores/, c5.2xlarge has a physical core count of 4. Is c5.2xlarge the same as ml.c5.2xlarge?

@anotinelg
Author

Actually, when I print the number of CPUs from my deployed script, I get different results:

logging.warning(f"CPU COUNT: {multiprocessing.cpu_count()}")
logging.warning(f"CPU COUNT (os): {os.cpu_count()}")
2020-02-14 14:50:40,497 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - CPU COUNT: 8
2020-02-14 14:50:40,497 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - CPU COUNT (os): 8
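
A few more visibility checks that could be added to the same script, as a sketch (the cgroup v1 file paths are an assumption about how the container is set up):

import os

print('os.cpu_count():', os.cpu_count())
# Linux only: the set of CPUs this process is actually allowed to run on.
print('sched_getaffinity:', len(os.sched_getaffinity(0)))

# cgroup v1 CPU quota, if the container is limited that way (assumed paths).
try:
    with open('/sys/fs/cgroup/cpu/cpu.cfs_quota_us') as f:
        quota = int(f.read())
    with open('/sys/fs/cgroup/cpu/cpu.cfs_period_us') as f:
        period = int(f.read())
    print('cgroup CPU quota:', 'unlimited' if quota < 0 else quota / period)
except OSError:
    print('no cgroup v1 CPU quota files found')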

@laurenyu
Contributor

In my previous response, I had been running batch transform jobs (I think with ml.m4.xlarge instances) because I happened to have those handy. I tried again, this time modifying this notebook for a basic endpoint deployment, and saw what you got:

[INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /usr/local/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

I tried this with an ml.c5.2xlarge, ml.c5.xlarge, and ml.m4.2xlarge.

Since I replicated the issue you're encountering through deploying to an endpoint but not when using batch transform, I'm going to reach out to the team that owns SageMaker Hosting and see if they have any insight.

@anotinelg
Author

Hi @laurenyu, at least you have been able to reproduce it! ;-)
Will you post updates in this ticket, or in another one created in https://github.com/aws/sagemaker-mxnet-serving-container?

Small question: do you know what image you are using with batch transform? Is it the same as 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3?

@laurenyu
Contributor

I've passed along this issue link, so we'll keep updates here for now.

For batch transform, I was using the same image, which is why I wonder if there is something happening with the hosting platform rather than the MXNet serving image itself.

@muhyun

muhyun commented Mar 30, 2020

I ran a simple piece of Java code on the SageMaker inference instance to get the number of CPUs:

Runtime.getRuntime().availableProcessors()

and it returns "1". This should be investigated from the SageMaker side.

@dotgc

dotgc commented May 14, 2020

This happens because Runtime.getRuntime().availableProcessors() returns 1 in a Docker environment by default.

More details:

@romavlasov

romavlasov commented Jan 13, 2021

I have the same problem deploying models via AWS CDK.

Image: "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3"
Instance: "ml.g4dn.xlarge" (number of vCPU: 4)

Log:

main org.pytorch.serve.ModelServer
Torchserve version: 0.2.1
TS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 1
Max heap size: 3806 M
Python executable: /opt/conda/bin/python
Config file: /etc/sagemaker-ts.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Metrics address: http://127.0.0.1:8082
Model Store: /.sagemaker/ts/models

@laurenyu Is there any update about this issue?

@ldong87

ldong87 commented Feb 5, 2021

I have the same problem as @romavlasov when deploying with ml.g4dn.xlarge.

@romavlasov

@ldong87 Did you find any workaround?

@ldong87

ldong87 commented Mar 26, 2021

@romavlasov it's possible they confuse vCPUs and physical CPUs in virtualization. I tried with ml.g4dn.2xlarge and the default number of workers looks OK, I think.
In the meantime, setting model_server_workers in the endpoint deployment code resolves the problem for my PyTorch code on ml.g4dn.xlarge.
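
For example, a sketch of that workaround (assuming PyTorchModel accepts model_server_workers the same way the MXNet example earlier in this thread does; the paths and role are placeholders):

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data='s3://my-bucket/model.tar.gz',
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    entry_point='inference.py',
    framework_version='1.6.0',
    py_version='py3',
    model_server_workers=4,  # match the 4 vCPUs of ml.g4dn.xlarge explicitly
)
predictor = model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)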

@aws aws locked and limited conversation to collaborators May 20, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
