Unable to update existing endpoint with newly trained model #101

Closed
OktayGardener opened this issue Mar 19, 2018 · 3 comments

Comments

@OktayGardener
commented Mar 19, 2018

Hello!

I am investigating the SageMaker API for use in production (without notebooks). I can train a model, create an endpoint, and delete the endpoint through the API without any problems.

However, in a very common situation where I have a newly trained model on new data, I would like to be able to update/change the model that is currently serving in the specified endpoint and not have to update other services. In production, I would like to update the model serving without any downtime.

Currently, when I try to do this by training a new model and deploying it to the existing endpoint with deploy:

    def deploy(self):
        self.estimator.deploy(
                initial_instance_count=1000,
                instance_type="ml.c4.xlarge",
                endpoint_name="iris"
            )

I get the following error:

    botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateEndpoint operation: Cannot create already existing endpoint "arn:aws:sagemaker:eu-west-1:166488713907:endpoint/iris".

Am I missing something here? Do I have to, or can I, do this operation manually with the boto3 API instead?

Thank you

@OktayGardener changed the title from "Unable to update existing endpoint with new trained model" to "Unable to update existing endpoint with newly trained model" on Mar 19, 2018

@winstonaws

Contributor

commented Mar 19, 2018

Unfortunately, the feature for allowing you to update existing endpoints directly with .deploy is still on our backlog. We'll look again at its prioritization. In the meantime, you can try the workaround described in this issue: #58

@OktayGardener

Author

commented Apr 15, 2018

I've made this work; the implementation is pretty straightforward. I'd contribute if I weren't busy building our ML infra around SageMaker. Here's what I did:

    def deploy(self):
        """ Deploy a new model """
        self.logger.info(
            'Deploying a new model with name %s to new endpoint with name %s' %
            (self.config.model_name, self.config.endpoint_name)
        )

        if self.config.update_endpoint:
            self.update()
            return

        try:
            self.estimator.deploy(
                initial_instance_count=self.config.initial_instance_count,
                instance_type=self.config.instance_type,
                endpoint_name=self.config.endpoint_name
            )
        except RuntimeError:
            self.logger.info(
                '%s %s %s' % (
                    'raise RuntimeError: Estimator has not been fit yet,',
                    'AWS Expects to train & deploy in same step.',
                    'Please copy job name from AWS Sagemaker and set in model config.'
                )
            )

    def update(self):
        """ Deploy a new model to existing endpoint """
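        try:
            self.create_endpoint_configuration()
        except botocore.exceptions.ClientError:
            # Ignore the error (for example, the endpoint config may already
            # exist) and go on to update the endpoint.
            pass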

        self.update_endpoint()

    def postdeploy(self):
        """ Deploy a trained model, create corresponding endpoint configuration and endpoint """
        self.create_model_from_job()
        self.create_endpoint_configuration()
        self.session.create_endpoint(
            endpoint_name=self.config.endpoint_name,
            config_name=self.endpoint_config_name,
        )

    def create_model_from_job(self):
        """ Create a model from the trained Tensorflow Model """
        self.logger.info(
            'Creating a new model with name %s from training job %s' %
            (self.config.model_name, self.training_job_name)
        )

        self.session.create_model_from_job(
            training_job_name=self.training_job_name,
            name=self.config.model_name,
            role=self.config.role
        )

    def create_endpoint_configuration(self):
        self.logger.info(
            'Creating new endpoint config with name: %s, instance count: %d instance_type: %s' % (
                self.endpoint_name,
                self.config.initial_instance_count,
                self.config.instance_type,
            )
        )

        self.endpoint_config_name = self.session.create_endpoint_config(
            name=self.endpoint_name,
            model_name=self.config.model_name,
            initial_instance_count=self.config.initial_instance_count,
            instance_type=self.config.instance_type
        )

    def update_endpoint(self):
        """ Updates an existing endpoint with EndpointName
            and updates its corresponding Endpoint
            configuration with a new EndpointConfigName
        """
        self.logger.info(
            'Updating endpoint with endpoint name: %s with train job name: %s' %
            (self.config.endpoint_name, self.training_job_name)
        )
        self.client.update_endpoint(
            EndpointName=self.config.endpoint_name, EndpointConfigName=self.endpoint_config_name
        )
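
For readers who want the same flow with plain boto3 calls, here is a minimal sketch and not the SDK's own implementation. It assumes the model already exists in SageMaker and that `client` is a boto3 SageMaker client. The helper names `endpoint_exists` and `deploy_or_update`, the variant name `AllTraffic`, and the timestamp-based config name are illustrative choices, not part of the code above. It also assumes a missing endpoint is reported by `describe_endpoint` as a `ValidationException`, which is worth verifying in your account.

```python
import time


def endpoint_exists(client, endpoint_name):
    """Return True if describe_endpoint succeeds for this endpoint name."""
    try:
        client.describe_endpoint(EndpointName=endpoint_name)
        return True
    except client.exceptions.ClientError as err:
        # Assumption: a missing endpoint surfaces as a ValidationException.
        if err.response["Error"]["Code"] == "ValidationException":
            return False
        raise


def deploy_or_update(client, endpoint_name, model_name, instance_type, instance_count):
    """Create the endpoint if it is missing, otherwise swap in a new config."""
    # An endpoint config cannot be edited after creation, so each rollout
    # gets a new config with a unique (hypothetical) timestamped name.
    config_name = "%s-%d" % (endpoint_name, int(time.time()))
    client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": instance_count,
            }
        ],
    )

    if endpoint_exists(client, endpoint_name):
        # Existing endpoint: point it at the new config.
        client.update_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=config_name
        )
    else:
        # First deployment: create the endpoint from the new config.
        client.create_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=config_name
        )

    # Block until the endpoint reports InService again.
    client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return config_name
```

A call might look like `deploy_or_update(boto3.client("sagemaker"), "iris", "my-model", "ml.c4.xlarge", 1)`, where `my-model` is a placeholder for a model you have already created. Using a fresh config name on every rollout sidesteps the name clash that the `try`/`except` in `update` above works around.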

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018

@ChoiByungWook

Contributor

commented Feb 13, 2019

This feature was added in PR #606. To update an existing endpoint, pass update_endpoint=True to the deploy method. A usage example can be found here: example
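
As a rough illustration of that flag, the call from the original question might be adapted as below. This is a sketch based only on the description in this comment. The wrapper name `redeploy` and its defaults are hypothetical, and `estimator` is assumed to be an already fitted SageMaker estimator. The `deploy` signature may differ between SDK versions, so check the one you have installed.

```python
def redeploy(estimator, endpoint_name, instance_type="ml.c4.xlarge", instance_count=1):
    """Deploy the estimator's newly trained model to an existing endpoint."""
    return estimator.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
        endpoint_name=endpoint_name,
        # Flag described in this comment: update the existing endpoint
        # instead of trying to create a new one with the same name.
        update_endpoint=True,
    )
```

With a fitted estimator, `redeploy(estimator, "iris")` would target the endpoint named in the original report.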
