Describe the bug
I am trying to deploy a serverless endpoint on SageMaker with an inference script and a requirements.txt.
```python
from sagemaker.tensorflow import TensorFlowModel
from sagemaker import get_execution_role
from sagemaker import Session
import boto3
from sagemaker.serverless import ServerlessInferenceConfig

print('starting ...')

model_data = "s3://datascience--sagemaker/model_repository/project_name/model.tar.gz"
role = get_execution_role()
sess = Session()
bucket = sess.default_bucket()
region = boto3.Session().region_name
tf_framework_version = '2.0.0'

sm_model = TensorFlowModel(
    model_data=model_data,
    framework_version=tf_framework_version,
    role=role,
    container_log_level=10,
    source_dir='code',
    entry_point='inference.py',
)

predictor = sm_model.deploy(
    endpoint_name='classifier-serverless',
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,
        max_concurrency=1,
    ),
)
```
The inference script and requirements.txt are in the expected location (the `code` directory).
During deployment, the packages are downloaded but not installed because of the following error:
2023-12-06 05:20:37.931396: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
Collecting numpy
  Downloading https://files.pythonhosted.org/packages/14/32/d3fa649ad7ec0b82737b92fefd3c4dd376b0bb23730715124569f38f3a08/numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8MB)
Collecting Pillow
  Downloading https://files.pythonhosted.org/packages/ea/0f/2fa195c2d8c6fe0b3dc2df5fc6ac6b8dbd005ea30aaa0fa43eca88b8c664/Pillow-8.4.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1MB)
Installing collected packages: numpy, Pillow
ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/dist-packages/numpy'
Consider using the --user option or check the permissions.
WARNING: You are using pip version 19.3.1; however, version 21.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ERROR:__main__:failed to install required packages, exiting.
INFO:__main__:stopping services
Traceback (most recent call last):
  File "/sagemaker/serve.py", line 212, in _setup_gunicorn
    subprocess.check_call(pip_install_cmd.split())
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['pip3', 'install', '-r', '/opt/ml/model/code/requirements.txt']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/sagemaker/serve.py", line 388, in <module>
    ServiceManager().start()
  File "/sagemaker/serve.py", line 355, in start
    self._setup_gunicorn()
  File "/sagemaker/serve.py", line 215, in _setup_gunicorn
    self._stop()
  File "/sagemaker/serve.py", line 303, in _stop
    os.kill(self._nginx.pid, signal.SIGQUIT)
AttributeError: 'NoneType' object has no attribute 'pid'
Traceback (most recent call last):
  File "/usr/local/bin/dockerd-entrypoint.py", line 24, in <module>
    subprocess.check_call(shlex.split(' '.join(sys.argv[1:])))
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['serve']' returned non-zero exit status 1.
On the other hand, if I deploy the same model to a real-time endpoint, it deploys successfully without any errors.
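Because the same packages install on a real-time endpoint, it may be worth comparing whether the dist-packages directory is writable by the serving process in each environment. The following is a minimal sketch of such a check, not something taken from the container code. `describe_write_access` is a hypothetical helper, and the default path is copied from the `Permission denied` message in the log. It would have to run inside the container, for example from `inference.py`. On the serverless endpoint that probably means temporarily removing requirements.txt, since the traceback suggests `pip3 install` runs before the handler is loaded.

```python
import os

# Path copied from the "[Errno 13] Permission denied" message in the log;
# adjust it if the container uses a different Python version.
DIST_PACKAGES = "/usr/local/lib/python3.6/dist-packages"


def describe_write_access(path=DIST_PACKAGES):
    """Report whether the current process could install files under `path`.

    Hypothetical diagnostic helper; it only inspects permissions and does
    not modify anything.
    """
    return {
        "path": path,
        "exists": os.path.isdir(path),
        # os.access returns False for a missing path as well as a read-only one.
        "writable": os.access(path, os.W_OK),
        # The effective user id helps show whether the process runs as root.
        "euid": os.geteuid(),
    }


if __name__ == "__main__":
    # With container_log_level=10, printed output should appear in the endpoint logs.
    print(describe_write_access())
```

If `writable` is `False` on the serverless endpoint but `True` on the real-time one, that would point to a difference in the user or filesystem permissions between the two, rather than a problem with requirements.txt itself.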