
Correct use of model_server_workers #1275

Closed

anotinelg opened this issue Jan 30, 2020 · 15 comments

Comments

@anotinelg

Describe the bug
The documentation states that when I deploy a model with model_server_workers = None,

model_server_workers (int) – Optional. The number of worker processes used by the inference server. If None, server will use one worker per vCPU.

However, what I found is that when I deploy my model on an ml.c5.2xlarge (8 vCPUs, one CPU I guess), it only uses 1 worker (see logs below).

If I pass the parameter into the deploy function, it correctly sets "Default workers per model" to the number I have specified through the model_server_workers parameter.
As a conclusion, either the documentation is out of date, or the behaviour when model_server_workers = None does not work.
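
For reference, a minimal sketch of that explicit workaround (the S3 paths, role, and entry point below are placeholders, not my real values):

from sagemaker.mxnet.model import MXNetModel

# Placeholder model data, role and entry point; setting model_server_workers is the point here.
model = MXNetModel(
    model_data='s3://my-bucket/model.tar.gz',
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    entry_point='inference_script.py',
    framework_version='1.4.1',
    py_version='py3',
    model_server_workers=8,  # explicitly one worker per vCPU on ml.c5.2xlarge
)
model.deploy(instance_type='ml.c5.2xlarge', initial_instance_count=1)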

To reproduce
Deploy any model on a ml.c5.2xlarge, check the log and the entry Default workers per model

Expected behavior
When model_server_workers is None, the endpoint should start one worker per vCPU (8 workers on an ml.c5.2xlarge).

Screenshots or logs

This is an extract of the log from the endpoint:

**Number of CPUs: 1**
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
**Default workers per model: 1**

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: '1.42.1'
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): custom script on MXNET 1.4.1
  • Framework version: 1.4.1
  • Python version: 3
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): AWS docker


@anotinelg anotinelg added the bug label Jan 30, 2020
@knakad
Contributor

knakad commented Jan 30, 2020

Hi Antoine,

Thanks for reaching out!

I couldn't reproduce it with an arbitrary estimator. Would you please provide the exact sagemaker-python-sdk code that you used to run into this?

@anotinelg
Author

Hi @knakad,
I am using the MXNetModel object to deploy, in a standard way. I think it does not depend on the script; any dummy one will do the job.

version of sagemaker package = 1.42.1

My code looks like this:

from sagemaker.mxnet.model import MXNetModel

args = {
    'model_data': 's3://../model.tar.gz',
    'name': 'mymodel',
    'role': 'arn:aws:iam::...',
    'entry_point': '.anyscript_will_do_the_job',
    'dependencies': [],
    'framework_version': '1.4.1',
    'py_version': 'py3',
    'code_location': 's3://..',
    'vpc_config': None,
    'sagemaker_session': None,
    'model_server_workers': None,
}

net = MXNetModel(**args)
net.deploy(instance_type='ml.c5.xlarge', initial_instance_count=1)

For more info, I print the result of the function MXNetModel.prepare_container_def:

{'Image': 'CCCCCCCC.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3', 'Environment': {'SAGEMAKER_PROGRAM': 'inference_script.py', 'SAGEMAKER_SUBMIT_DIRECTORY': 's3://.../model.tar.gz', 'SAGEMAKER_ENABLE_CLOUDWATCH_METRICS': 'false', 'SAGEMAKER_CONTAINER_LOG_LEVEL': '20', 'SAGEMAKER_REGION': 'us-east-1'}, 'ModelDataUrl': 's3://.../model.tar.gz'}

As expected, there is no mention of the SAGEMAKER_MODEL_SERVER_WORKERS variable, because the model_server_workers parameter is None; I think this is the correct behaviour. My guess is that the problem is in the code that is loaded on the inference instance (sagemaker_inference_toolkit or sagemaker-mxnet-serving-container), which does not correctly handle the case where SAGEMAKER_MODEL_SERVER_WORKERS is unset.

I have printed the environment variables from the inference instance:

2020-02-11 10:34:20,782 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - environ({'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI': '/v2/credentials/301670ea-2a02-4517-a501-3dd3b5c6a4c4', 'SAGEMAKER_ENABLE_CLOUDWATCH_METRICS': 'false', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'AWS_DEFAULT_REGION': 'us-east-1', 'AWS_EXECUTION_ENV': 'AWS_ECS_EC2', 'ECS_CONTAINER_METADATA_URI': 'http://169.254.170.2/v3/0f7fade6-e2c8-42ef-a2cb-144678ab33f3', 'MXNET_KVSTORE_REDUCTION_NTHREADS': '1', 'LANG': 'C.UTF-8', 'SAGEMAKER_CONTAINER_LOG_LEVEL': '20', 'LD_LIBRARY_PATH': ':/usr/local/lib', 'SAGEMAKER_SUBMIT_DIRECTORY': 's3://letgo-data-science-data/sagemaker/listingclass/models/dev-listingclass-v100-0-0-2020-02-11-11-29-28/model.tar.gz', 'PYTHONPATH': '/opt/ml/model/code::/.sagemaker/mms/models/model', '__KMP_REGISTERED_LIB_1': '0x7fddcff64a08-cafe1df8-libiomp5.so', 'AWS_REGION': 'us-east-1', 'PYTHONIOENCODING': 'UTF-8', 'SAGEMAKER_REGION': 'us-east-1', 'OMP_NUM_THREADS': '1', 'PYTHONDONTWRITEBYTECODE': '1', 'MXNET_CPU_PRIORITY_NTHREADS': '1', 'MXNET_CPU_WORKER_NTHREADS': '1', 'TEMP': '/home/model-server/tmp', 'PYTHONUNBUFFERED': '1', 'SAGEMAKER_SAFE_PORT_RANGE': '25000-25999', 'HOSTNAME': 'model.aws.local', 'LC_ALL': 'C.UTF-8', 'HOME': '/root', 'SAGEMAKER_PROGRAM': 'inference_script.py', 'MMS_DECODE_INPUT_REQUEST': 'false', 'MXNET_USE_OPERATOR_TUNING': '0'})

@laurenyu
Contributor

laurenyu commented Feb 14, 2020

I tried to reproduce this, and got this in my logs:

2020-02-14 00:05:38,498 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /usr/local/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 4
Max heap size: 2766 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 4
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

"Default workers per model: 4" does match the number of CPUs.

edit: did some more digging - the inference toolkit does not have a defined default for the number of workers, and the underlying model server's default is the number of CPUs, so from a code standpoint, things look as though they should align with the documentation as well.
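
A sketch of that resolution logic (this is not the actual toolkit/MMS code, just the behaviour one would expect given the env var and the CPU-count fallback described above):

import multiprocessing
import os

def default_workers_per_model():
    # If the SDK set SAGEMAKER_MODEL_SERVER_WORKERS, that value should win.
    env_value = os.environ.get('SAGEMAKER_MODEL_SERVER_WORKERS')
    if env_value:
        return int(env_value)
    # Otherwise the model server falls back to the number of CPUs it detects,
    # which is where the "Default workers per model" count in the logs comes from.
    return multiprocessing.cpu_count()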

@anotinelg
Author

anotinelg commented Feb 14, 2020

That is strange!
What version of the SageMaker SDK are you using? 1.42.1? Can I test your code from my side?

@anotinelg
Author

I have deployed another time with:

  • image 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3
  • ml.c5.2xlarge
  • instance = 1
  • Single model type

Why do my logs show a number of CPUs equal to 1? I have seen that in your log you have Number of CPUs: 4.

2020-02-14 11:36:09,267 [INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /usr/local/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

According to https://aws.amazon.com/ec2/physicalcores/, c5.2xlarge has a physical core count of 4. Is c5.2xlarge the same as ml.c5.2xlarge?

@anotinelg
Author

Actually, when I print the number of CPUs from my deployed script, I get different results:

logging.warning(f"CPU COUNT: {multiprocessing.cpu_count()}")
logging.warning(f"CPU COUNT (os): {os.cpu_count()}")
2020-02-14 14:50:40,497 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - CPU COUNT: 8
2020-02-14 14:50:40,497 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - CPU COUNT (os): 8
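
A few more visibility checks that could be added to the same script, as a sketch (the cgroup v1 file paths are an assumption about how the container is set up):

import os

print('os.cpu_count():', os.cpu_count())
# Linux only: the set of CPUs this process is actually allowed to run on.
print('sched_getaffinity:', len(os.sched_getaffinity(0)))

# cgroup v1 CPU quota, if the container is limited that way (assumed paths).
try:
    with open('/sys/fs/cgroup/cpu/cpu.cfs_quota_us') as f:
        quota = int(f.read())
    with open('/sys/fs/cgroup/cpu/cpu.cfs_period_us') as f:
        period = int(f.read())
    print('cgroup CPU quota:', 'unlimited' if quota < 0 else quota / period)
except OSError:
    print('no cgroup v1 CPU quota files found')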

@laurenyu
Contributor

In my previous response, I had been running batch transform jobs (I think with ml.m4.xlarge instances) because I happened to have those handy. I tried again, this time modifying this notebook for a basic endpoint deployment, and saw what you got:

[INFO ] main com.amazonaws.ml.mms.ModelServer - 
MMS Home: /usr/local/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 0
Number of CPUs: 1
Max heap size: 3739 M
Python executable: /usr/local/bin/python3.6
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://127.0.0.1:8081
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500

I tried this with an ml.c5.2xlarge, ml.c5.xlarge, and ml.m4.2xlarge.

Since I replicated the issue you're encountering through deploying to an endpoint but not when using batch transform, I'm going to reach out to the team that owns SageMaker Hosting and see if they have any insight.

@anotinelg
Author

Hi @laurenyu, at least you have been able to reproduce it! ;-)
Will you post updates in this ticket, or in another one created in https://github.com/aws/sagemaker-mxnet-serving-container?

Small question: do you know what image you are using with batch transform? Is it the same as 763104351884.dkr.ecr.us-east-1.amazonaws.com/mxnet-inference:1.4.1-cpu-py3?

@laurenyu
Contributor

I've passed along this issue link, so we'll keep updates here for now.

For batch transform, I was using the same image, which is why I wonder if there is something happening with the hosting platform rather than the MXNet serving image itself.

@muhyun

muhyun commented Mar 30, 2020

I ran a simple piece of Java code on the SageMaker inference instance to get the number of CPUs:

Runtime.getRuntime().availableProcessors()

and it returns "1". This should be investigated from the SageMaker side.

@dotgc

dotgc commented May 14, 2020

This happens because Runtime.getRuntime().availableProcessors() returns 1 in a Docker environment by default.

More details:

@romavlasov

romavlasov commented Jan 13, 2021

I have the same problem deploying models via AWS CDK.

Image: "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.6.0-gpu-py3"
Instance: "ml.g4dn.xlarge" (number of vCPU: 4)

Log:

main org.pytorch.serve.ModelServer
Torchserve version: 0.2.1
TS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of GPUs: 1
Number of CPUs: 1
Max heap size: 3806 M
Python executable: /opt/conda/bin/python
Config file: /etc/sagemaker-ts.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Metrics address: http://127.0.0.1:8082
Model Store: /.sagemaker/ts/models

@laurenyu Is there any update about this issue?

@ldong87

ldong87 commented Feb 5, 2021

I have the same problem as @romavlasov when deploying with ml.g4dn.xlarge.

@romavlasov

@ldong87 Did you find any workaround?

@ldong87

ldong87 commented Mar 26, 2021

@romavlasov it's possible they confuse vCPUs and physical CPUs in virtualization. I tried with ml.g4dn.2xlarge and the default number of workers looks OK, I think.
In the meantime, setting model_server_workers in the endpoint deployment code resolves the problem for my PyTorch code on ml.g4dn.xlarge.
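
For example, a sketch of that workaround (assuming PyTorchModel accepts model_server_workers the same way the MXNet example earlier in this thread does; the paths and role are placeholders):

from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data='s3://my-bucket/model.tar.gz',
    role='arn:aws:iam::123456789012:role/MySageMakerRole',
    entry_point='inference.py',
    framework_version='1.6.0',
    py_version='py3',
    model_server_workers=4,  # match the 4 vCPUs of ml.g4dn.xlarge explicitly
)
predictor = model.deploy(instance_type='ml.g4dn.xlarge', initial_instance_count=1)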

@aws aws locked and limited conversation to collaborators May 20, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
