Unable to update existing endpoint with newly trained model #101

Closed
professoroakz opened this issue Mar 19, 2018 · 10 comments

@professoroakz

professoroakz commented Mar 19, 2018

Hello!

I am investigating the Sagemaker API for use in production (without notebooks). I am able to train a model, create an endpoint and delete the endpoint without any problems with the API.

However, in a very common situation where I have a newly trained model on new data, I would like to be able to update/change the model that is currently serving in the specified endpoint and not have to update other services. In production, I would like to update the model serving without any downtime.

Currently, when I try to do this by training a new model and deploying it to the existing endpoint with deploy:

    def deploy(self):
        self.estimator.deploy(
            initial_instance_count=1000,
            instance_type="ml.c4.xlarge",
            endpoint_name="iris"
        )

I get the following error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateEndpoint operation: Cannot create already existing endpoint "arn:aws:sagemaker:eu-west-1:166488713907:endpoint/iris".

Am I missing something here? Do I have to / can I do this operation manually with the boto3 api instead?

Thank you

professoroakz changed the title from "Unable to update existing endpoint with new trained model" to "Unable to update existing endpoint with newly trained model" on Mar 19, 2018
@winstonaws
Contributor

Unfortunately, the feature for allowing you to update existing endpoints directly with .deploy is still on our backlog. We'll look again at its prioritization. In the meantime, you can try the workaround described in this issue: #58
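
The original question asks whether this can be done manually with the boto3 API. A minimal sketch of that approach follows, assuming a SageMaker model named `model_name` has already been created and that `client` is a boto3 SageMaker client. The function names and the variant name "AllTraffic" are illustrative choices, not something taken from this thread; only `list_endpoints`, `create_endpoint_config`, `create_endpoint`, and `update_endpoint` are boto3 calls.

```python
def endpoint_exists(client, endpoint_name):
    """Return True if an endpoint with exactly this name is listed."""
    # NameContains is a substring filter, so names are compared exactly.
    # Only the first page of results is inspected (assumption: fewer than
    # 100 endpoints contain this substring).
    response = client.list_endpoints(NameContains=endpoint_name, MaxResults=100)
    return any(e["EndpointName"] == endpoint_name for e in response["Endpoints"])


def create_or_update_endpoint(client, endpoint_name, config_name, model_name,
                              instance_type, instance_count):
    """Create a new endpoint config and point the endpoint at it."""
    # The config name must not already exist, otherwise this call fails.
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",  # illustrative variant name
            "ModelName": model_name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
        }],
    )
    if endpoint_exists(client, endpoint_name):
        # Existing endpoint: switch it to the new config.
        client.update_endpoint(EndpointName=endpoint_name,
                               EndpointConfigName=config_name)
        return "updated"
    # No such endpoint yet: create it.
    client.create_endpoint(EndpointName=endpoint_name,
                           EndpointConfigName=config_name)
    return "created"
```

With a real client this might be called as `create_or_update_endpoint(boto3.client("sagemaker"), "iris", "iris-config-2", "iris-model-2", "ml.c4.xlarge", 1)`, where every name except "iris" and the instance type is hypothetical.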

@professoroakz
Author

professoroakz commented Apr 15, 2018

I've made this work; the implementation is pretty straightforward. I'd contribute if I weren't busy building our ML infra around SageMaker. Here's what I did:

    def deploy(self):
        """ Deploy a new model """
        self.logger.info(
            'Deploying a new model with name %s to new endpoint with name %s' %
            (self.config.model_name, self.config.endpoint_name)
        )

        if self.config.update_endpoint:
            self.update()
            return

        try:
            self.estimator.deploy(
                initial_instance_count=self.config.initial_instance_count,
                instance_type=self.config.instance_type,
                endpoint_name=self.config.endpoint_name
            )
        except RuntimeError:
            self.logger.info(
                '%s %s %s' % (
                    'raise RuntimeError: Estimator has not been fit yet,',
                    'AWS Expects to train & deploy in same step.',
                    'Please copy job name from AWS Sagemaker and set in model config.'
                )
            )

    def update(self):
        """ Deploy a new model to existing endpoint """
        try:
            self.create_endpoint_configuration()
        except botocore.exceptions.ClientError:
            pass

        self.update_endpoint()

    def postdeploy(self):
        """ Deploy a trained model, create corresponding endpoint configuration and endpoint """
        self.create_model_from_job()
        self.create_endpoint_configuration()
        self.session.create_endpoint(
            endpoint_name=self.config.endpoint_name,
            config_name=self.endpoint_config_name,
        )

    def create_model_from_job(self):
        """ Create a model from the trained Tensorflow Model """
        self.logger.info(
            'Creating a new model with name %s from training job %s' %
            (self.config.model_name, self.training_job_name)
        )

        self.session.create_model_from_job(
            training_job_name=self.training_job_name,
            name=self.config.model_name,
            role=self.config.role
        )

    def create_endpoint_configuration(self):
        self.logger.info(
            'Creating new endpoint config with name: %s, instance count: %d instance_type: %s' % (
                self.endpoint_name,
                self.config.initial_instance_count,
                self.config.instance_type,
            )
        )

        self.endpoint_config_name = self.session.create_endpoint_config(
            name=self.endpoint_name,
            model_name=self.config.model_name,
            initial_instance_count=self.config.initial_instance_count,
            instance_type=self.config.instance_type
        )

    def update_endpoint(self):
        """ Updates an existing endpoint with EndpointName
            and updates its corresponding Endpoint
            configuration with a new EndpointConfigName
        """
        self.logger.info(
            'Updating endpoint with endpoint name: %s with train job name: %s' %
            (self.config.endpoint_name, self.training_job_name)
        )
        self.client.update_endpoint(
            EndpointName=self.config.endpoint_name, EndpointConfigName=self.endpoint_config_name
        )
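
In the `update()` method above, a `ClientError` from `create_endpoint_configuration` is silently ignored, which is what happens when a config with the same name already exists. One way to avoid such collisions is to derive a fresh config name for every deployment. The sketch below assumes that config names are limited to 63 characters of letters, digits, and hyphens (worth checking against the current API reference); `unique_config_name` is a hypothetical helper, not part of the SDK.

```python
import re
import time


def unique_config_name(endpoint_name, now=None, max_len=63):
    """Build a config name of the form '<endpoint>-<UTC timestamp>'."""
    # now=None means "current time"; pass a number of seconds to pin it.
    stamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime(now))
    # Replace anything other than letters, digits, and hyphens.
    prefix = re.sub(r"[^a-zA-Z0-9-]+", "-", endpoint_name).strip("-")
    # Truncate the prefix rather than the timestamp so names stay distinct.
    prefix = prefix[: max_len - len(stamp) - 1].rstrip("-")
    return f"{prefix}-{stamp}"
```

Two deployments of the same endpoint within one second would still collide; adding a short random suffix would be one way around that.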

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
@ChoiByungWook
Contributor

This feature was added in PR #606. To update an existing endpoint, set update_endpoint=True in the deploy method. A usage example can be found here: example

@ygcao

ygcao commented Feb 17, 2020

@ChoiByungWook what is the availability impact of the in-place update? Also, the example link appears to be broken. Thanks!
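
The thread does not settle the availability question. As a way to at least observe an update from the client side, here is a minimal sketch that polls the endpoint status until it is back in service. It assumes `client` is a boto3 SageMaker client; `wait_for_endpoint` and its parameters are hypothetical names, and only `describe_endpoint` and its `EndpointStatus` field come from boto3.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30,
                      timeout_seconds=3600, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService or Failed."""
    waited = 0
    while True:
        response = client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} has status Failed")
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Endpoint {endpoint_name} still {status} after {waited}s")
        # Any other status (for example Updating) is treated as in progress.
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the loop can be exercised without real delays. boto3 also ships an `endpoint_in_service` waiter that covers the same ground.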

@itsderek23

I'm still seeing this error w/1.55.0 when trying to deploy a PyTorch Model:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateEndpointConfig operation: Cannot create already existing endpoint configuration

Example code:

from sagemaker.pytorch import PyTorchModel

# `env` is a project-specific settings helper that is not shown here.
pytorch_model = PyTorchModel(
    model_data=env.setting('model_data_path'),
    name=env.setting('model_name'),
    framework_version='1.4.0',
    role=env.setting("aws_role"),
    env={"DEPLOY_ENV": env.current_env()},
    entry_point='deploy/sagemaker/serve.py')

predictor = pytorch_model.deploy(
    instance_type=env.setting('instance_type'),
    update_endpoint=True,
    initial_instance_count=1)

@laurenyu
Contributor

laurenyu commented Apr 1, 2020

can you try specifying endpoint_name to be something else in the deploy call?

@itsderek23

Hi @laurenyu - seems like I get the same error including a new endpoint_name in the call:

predictor = pytorch_model.deploy(
    endpoint_name=env.setting('model_name') + "-1",
    instance_type=env.setting('instance_type'),
    update_endpoint=True,
    initial_instance_count=1)

@laurenyu
Contributor

laurenyu commented Apr 1, 2020

could you open a new issue in this repo? (sorry for the inconvenience, but it'll help with our internal tracking and making sure we respond)

@hubtub2

hubtub2 commented Sep 2, 2020

Still not working on my side. I get the same error as in the original bug report, even when using update_endpoint=True. Did anyone open a new issue?

@laurenyu
Contributor

laurenyu commented Sep 2, 2020

@hubtub2 the behavior around this changed with v2.0+, so it's probably best if you open a new issue and include your specific code and Python SDK version
