Fix: JS Model with non-TGI/non-DJL deployment failure #4688

Merged: 10 commits, May 16, 2024

Conversation

makungaj1
Collaborator

@makungaj1 makungaj1 commented May 16, 2024

Issue #, if available:

Description of changes:
Fix: JS Model with non-TGI/non-DJL deployment failure
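
To illustrate the kind of failure the title describes, here is a minimal sketch of choosing a serving stack from a container image URI, with a fallback for images that are neither TGI nor DJL. The function name, the enum, the substring markers, and the example URIs are all hypothetical and are not taken from the SDK; the sketch only shows why a missing fallback branch would leave such models without a deployment path.

```python
from enum import Enum


class ServingStack(Enum):
    """Serving stacks considered by this sketch (names are illustrative)."""

    TGI = "tgi"
    DJL = "djl"
    MMS = "mms"  # fallback used here for every other container


def pick_serving_stack(image_uri: str) -> ServingStack:
    """Guess the serving stack from an image URI.

    The substring markers below are assumptions made for illustration only.
    """
    if "djl-inference" in image_uri:
        return ServingStack.DJL
    if "tgi-inference" in image_uri:
        return ServingStack.TGI
    # Without a fallback branch like this one, a model packaged with any
    # other container would have nowhere to go.
    return ServingStack.MMS


if __name__ == "__main__":
    # Placeholder URIs, not real registry paths.
    for uri in (
        "example.invalid/djl-inference:0.0",
        "example.invalid/huggingface-pytorch-tgi-inference:0.0",
        "example.invalid/huggingface-pytorch-inference:0.0",
    ):
        print(uri, "->", pick_serving_stack(uri).value)
```

The fallback is placed last so that the two specific markers are always checked first.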

Testing done:
Notebook

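from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder

# sample_input and sample_output are example request/response payloads
# defined elsewhere in the notebook (not shown here).
model_builder = ModelBuilder(
    model="huggingface-sentencesimilarity-bge-m3",  # JumpStart model ID
    schema_builder=SchemaBuilder(sample_input, sample_output),
)

# Build the deployable model, then deploy it to an endpoint.
model = model_builder.build()
predictor = model.deploy()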
INFO:sagemaker:2024-05-16T06:05:27,302 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 9161
INFO:sagemaker:2024-05-16T06:05:27,302 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1
INFO:sagemaker:2024-05-16T06:05:29,865 [INFO ] pool-2-thread-3 ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0
INFO:sagemaker:2024-05-16T06:05:34,865 [INFO ] pool-2-thread-3 ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0
INFO:sagemaker:2024-05-16T06:05:39,865 [INFO ] pool-2-thread-3 ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0
INFO:sagemaker:2024-05-16T06:05:44,865 [INFO ] pool-2-thread-3 ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0
INFO:sagemaker:Created endpoint with name hf-sentencesimilarity-bge-m3-2024-05-16-06-01-26-451
INFO:sagemaker:2024-05-16T06:05:49,865 [INFO ] pool-2-thread-3 ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0
ModelBuilder: DEBUG:     ModelBuilder metrics emitted.
CPU times: user 514 ms, sys: 47 ms, total: 561 ms
Wall time: 4min 36s
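
The log excerpt above shows the container answering `GET /ping` with HTTP 200 about every five seconds. As a rough sketch, the health checks can be pulled out of such lines with the standard library alone. The regular expression is written against the ACCESS_LOG format seen in this excerpt only, and the helper name `ping_statuses` is hypothetical.

```python
import re

# Matches ACCESS_LOG lines shaped like the ones in the excerpt, e.g.
#   ... ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0
ACCESS_RE = re.compile(
    r'ACCESS_LOG - /(?P<client>\S+) '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def ping_statuses(log_lines):
    """Return the HTTP status of every /ping request found in the lines."""
    statuses = []
    for line in log_lines:
        match = ACCESS_RE.search(line)
        if match and match.group("path") == "/ping":
            statuses.append(int(match.group("status")))
    return statuses


# Three lines copied from the excerpt; the middle one is not an access log.
SAMPLE = [
    'INFO:sagemaker:2024-05-16T06:05:29,865 [INFO ] pool-2-thread-3 '
    'ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0',
    'INFO:sagemaker:Created endpoint with name '
    'hf-sentencesimilarity-bge-m3-2024-05-16-06-01-26-451',
    'INFO:sagemaker:2024-05-16T06:05:34,865 [INFO ] pool-2-thread-3 '
    'ACCESS_LOG - /169.254.178.2:46500 "GET /ping HTTP/1.1" 200 0',
]

if __name__ == "__main__":
    found = ping_statuses(SAMPLE)
    print(found)                          # [200, 200]
    print(all(s == 200 for s in found))   # True
```

Lines that do not match the access-log shape are skipped rather than treated as failures, so interleaved SDK messages do not affect the result.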

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • [x] I have read the CONTRIBUTING doc
  • [x] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • [x] I used the commit message format described in CONTRIBUTING
  • [x] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • [x] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • [x] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • [x] I have checked that my tests are not configured for a specific region or account (if appropriate)
  • [x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@makungaj1 makungaj1 marked this pull request as ready for review May 16, 2024 17:13
@makungaj1 makungaj1 requested a review from a team as a code owner May 16, 2024 17:13
@makungaj1 makungaj1 requested review from nargokul and removed request for a team May 16, 2024 17:13
@samruds samruds requested review from mufaddal-rohawala and removed request for nargokul May 16, 2024 20:56
@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: b345aff
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mufaddal-rohawala mufaddal-rohawala merged commit 06e6f9d into aws:master May 16, 2024
12 checks passed
@makungaj1 makungaj1 deleted the local-mode branch May 16, 2024 22:27
knikure added a commit that referenced this pull request Jun 5, 2024
* fix: mainline alt config parsing (#4602)

* fix: parsing

* fix: commit tests

* fix: types

* updated

* fix

* Add Triton v24.03 URI (#4605)

Co-authored-by: Nikhil Kulkarni <nikhilsk@amazon.com>

* feature: support session tag chaining for training job (#4596)

* feature: support session tag chaining for training job

* fix: resolve typo

* fix: resolve typo and build failure

* fix: resolve typo and unit test failure

---------

Co-authored-by: Jessica Zhu <jessicazhu3@106775307+jessicazhu3@users.noreply.github.com>

* prepare release v2.217.0

* update development version to v2.217.1.dev0

* fix: properly close files in lineage queries and tests (#4587)

Closes #4458

* feature: set default allow_pickle param to False (#4557)

* breaking: set default allow_pickle param to False

* breaking: fix unit tests and linting

NumpyDeserializer will not allow deserialization
unless allow_pickle flag is set to True explicitly

* fix: black-check

---------

Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>

* Fix: invalid component error with new metadata (#4634)

* fix: invalid component name

* tests

* format

* fix vulnerable model integ tests llama 2

* updated

* fix: training dataset location

* prepare release v2.218.0

* update development version to v2.218.1.dev0

* chore: update skipped flaky tests (#4644)

* Update skipped flaky tests

* flake8

* format

* format

* chore: release tgi 2.0.1 (#4642)

* chore: release tgi 2.0.1

* minor fix

---------

Co-authored-by: Zhaoqi <52220743+zhaoqizqwang@users.noreply.github.com>

* fix: Fix UserAgent logging in Python SDK (#4647)

* prepare release v2.218.1

* update development version to v2.218.2.dev0

* feature: allow choosing js payload by alias in private method

* Updates for SMP v2.3.1 (#4660)

Co-authored-by: Suhit Kodgule <skodgule@amazon.com>

* chore(deps): bump jinja2 from 3.1.3 to 3.1.4 in /doc (#4655)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump tqdm from 4.66.2 to 4.66.3 in /tests/data/serve_resources/mlflow/pytorch (#4650)

Bumps [tqdm](https://github.com/tqdm/tqdm) from 4.66.2 to 4.66.3.
- [Release notes](https://github.com/tqdm/tqdm/releases)
- [Commits](tqdm/tqdm@v4.66.2...v4.66.3)

---
updated-dependencies:
- dependency-name: tqdm
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump jinja2 from 3.1.3 to 3.1.4 in /requirements/extras (#4654)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* prepare release v2.219.0

* update development version to v2.219.1.dev0

* fix: skip flakey tests pending investigation (#4667)

* change: update image_uri_configs  05-09-2024 07:17:41 PST

* Add tensorflow_serving support for mlflow models and enable lineage tracking for mlflow models (#4662)

* Initial commit for tensorflow_serving support of MLflow

* Add integ tests for mlflow tf_serving

* fix style issues

* remove unused attributes from tf builder

* Add deep ping for tf_serving local mode

* Initial commit for lineage impl

* Initial commit for tensorflow_serving support of MLflow

* Add integ tests for mlflow tf_serving

* fix style issues

* remove unused attributes from tf builder

* Add deep ping for tf_serving local mode

* Add integ tests and uts

* fix local mode for tf_serving

* Allow lineage tracking only in sagemaker endpoint mode

* fix regex pattern

* fix style issues

* fix regex pattern and hard coded py version in ut

* fix missing session

* Resolve pr comments and fix regex for mlflow registry and ids

* fix: model builder race condition on sagemaker session (#4673)

Co-authored-by: Jonathan Makunga <makung@amazon.com>

* feat: Add telemetry support for mlflow models (#4674)

* Initial commit for telemetry support

* Fix style issues and add more logger messages

* fix value error messages in ut

* feat: add new images for HF TGI release (#4677)

* chore: add new images for HF TGI release

* test

* feature: AutoGluon 1.1.0 image_uris update (#4679)

Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-154.us-west-2.compute.internal>

* change: add debug logs to workflow container dist creation (#4682)

* prepare release v2.220.0

* update development version to v2.220.1.dev0

* fix: Image URI should take precedence for HF models (#4684)

* Fix: Image URI should take precedence for HF models

* Fix formatting

* Fix formatting

* Fix formatting

* Increase coverage -  UT pass

* feat: onboard tei image config to pysdk (#4681)

* feat: onboard tei image config to pysdk

* fix formatting issue

* minor fix func name

* fix unit tests

---------

Co-authored-by: Mufaddal Rohawala <89424143+mufaddal-rohawala@users.noreply.github.com>

* fix: model builder limited container support for endpoint mode. (#4683)

* Allow ModelBuilder's endpoint mode for Jumpstart models packaged with containers other than TGI and DJL

* increase coverage

* Add JS Support for MMS Serving

* Add JS Support for MMS Serving

* Unit tests

* Refactoring

* Refactoring

* Refactoring

---------

Co-authored-by: Jonathan Makunga <makung@amazon.com>

* change: Add more debugging (#4687)

* change: cover tei with image_uris.retrieve API (#4689)

* fix: JS Model with non-TGI/non-DJL deployment failure (#4688)

* Debug

* Debug

* Debug

* Debug

* Debug

* Debug

* fix docstyle

* Refactoring

* Add Integ tests

---------

Co-authored-by: Jonathan Makunga <makung@amazon.com>

* Feat: Pull latest tei container for sentence similarity models on HuggingFace hub (#4686)

* Update: Pull latest tei container for sentence similarity models

* Fix formatting

* Address PR comments

* Fix formatting

* Fix check

* Switch sentence similarity to be deployed on tgi

* Fix formatting

* Fix formatting

* Fix formatting

* Fix formatting

* Introduce TEI builder with TGI server

* Fix formatting

* Add integ test

* Fix formatting

* Add integ test

* Add integ test

* Add integ test

* Add integ test

* Add integ test

* Fix formatting

* Move to G5 for integ test

* Fix formatting

* Integ test updates

* Integ test updates

* Integ test updates

* Fix formatting

* Integ test updates

* Move back to generate for ping

* Integ test updates

* Integ test updates

* Fix: Add Image URI overrides for transformers models (#4693)

* Fix: Add Image URI overrides for transformers models

* Increase coverage

* Fix formatting

* prepare release v2.221.0

* update development version to v2.221.1.dev0

* Add tei cpu image (#4695)

* Add tei cpu image

* fix format issue

* fix unit tests

* fix typo

* fix typo

* Feat: Add TEI support for ModelBuilder (#4694)

* Add TEI Serving

* Add TEI Serving

* Add TEI Serving

* Add TEI Serving

* Add TEI Serving

* Add TEI Serving

* Notebook testing

* Notebook testing

* Notebook testing

* Refactoring

* Refactoring

* UT

* UT

* Refactoring

* Test coverage

* Refactoring

* Refactoring

---------

Co-authored-by: Jonathan Makunga <makung@amazon.com>

* Convert pytorchddp distribution to smdistributed distribution (#4698)

* rewrite pytorchddp to smdistributed

* remove instance type check

* Update estimator.py

* remove validate_pytorch_distribution

* fix

* fix unit tests

* fix formatting

* check instance type not None

* prepare release v2.221.1

* update development version to v2.221.2.dev0

* Update: SM Endpoint Routing Strategy Support. (#4702)

* RoutingConfig

* Refactoring

* Docstring

* UT

* Refactoring

* Refactoring

---------

Co-authored-by: Jonathan Makunga <makung@amazon.com>

* change: update image_uri_configs  05-29-2024 07:17:35 PST

* Making project name in workflow files dynamic (#4708)

* fix: Fix ci unit-tests (#4713)

* chore(deps): bump requests from 2.31.0 to 2.32.2 in /tests/data/serve_resources/mlflow/pytorch (#4709)

Bumps [requests](https://github.com/psf/requests) from 2.31.0 to 2.32.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](psf/requests@v2.31.0...v2.32.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump apache-airflow from 2.9.0 to 2.9.1 in /requirements/extras (#4703)

* chore(deps): bump apache-airflow in /requirements/extras

Bumps [apache-airflow](https://github.com/apache/airflow) from 2.9.0 to 2.9.1.
- [Release notes](https://github.com/apache/airflow/releases)
- [Changelog](https://github.com/apache/airflow/blob/main/RELEASE_NOTES.rst)
- [Commits](apache/airflow@2.9.0...2.9.1)

---
updated-dependencies:
- dependency-name: apache-airflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update tox.ini to bump apache-airflow

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kalyani Nikure <110067132+knikure@users.noreply.github.com>

* chore(deps): bump mlflow from 2.10.2 to 2.12.1 in /tests/data/serve_resources/mlflow/pytorch (#4690)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.10.2 to 2.12.1.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.10.2...v2.12.1)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump mlflow from 2.11.1 to 2.12.1 in /tests/data/serve_resources/mlflow/xgboost (#4692)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.11.1 to 2.12.1.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.11.1...v2.12.1)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump mlflow from 2.11.1 to 2.12.1 in /tests/data/serve_resources/mlflow/tensorflow (#4691)

Bumps [mlflow](https://github.com/mlflow/mlflow) from 2.11.1 to 2.12.1.
- [Release notes](https://github.com/mlflow/mlflow/releases)
- [Changelog](https://github.com/mlflow/mlflow/blob/master/CHANGELOG.md)
- [Commits](mlflow/mlflow@v2.11.1...v2.12.1)

---
updated-dependencies:
- dependency-name: mlflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* change: Updates for DJL 0.28.0 release (#4701)

* Sync Branch

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Haotian An <33510317+Captainia@users.noreply.github.com>
Co-authored-by: Nikhil Kulkarni <knikhil29@gmail.com>
Co-authored-by: Nikhil Kulkarni <nikhilsk@amazon.com>
Co-authored-by: jessicazhu3 <106775307+jessicazhu3@users.noreply.github.com>
Co-authored-by: Jessica Zhu <jessicazhu3@106775307+jessicazhu3@users.noreply.github.com>
Co-authored-by: ci <ci>
Co-authored-by: Justin <justinm088@hotmail.com>
Co-authored-by: ASHWIN KRISHNA <38850354+akrishna1995@users.noreply.github.com>
Co-authored-by: Ashwin Krishna <ashwikri@amazon.com>
Co-authored-by: Haixin Wang <98612668+haixiw@users.noreply.github.com>
Co-authored-by: Zhaoqi <52220743+zhaoqizqwang@users.noreply.github.com>
Co-authored-by: Kalyani Nikure <110067132+knikure@users.noreply.github.com>
Co-authored-by: Keerthan Vasist <kvasist@amazon.com>
Co-authored-by: SuhitK <kodgule.suhit@gmail.com>
Co-authored-by: Suhit Kodgule <skodgule@amazon.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sagemaker-bot <sagemaker-bot@amazon.com>
Co-authored-by: jiapinw <95885824+jiapinw@users.noreply.github.com>
Co-authored-by: Jonathan Makunga <makung@amazon.com>
Co-authored-by: Prateek M Desai <prateekmdesai04@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-154.us-west-2.compute.internal>
Co-authored-by: Mufaddal Rohawala <89424143+mufaddal-rohawala@users.noreply.github.com>
Co-authored-by: Samrudhi Sharma <154457034+samruds@users.noreply.github.com>
Co-authored-by: Tom Bousso <tombousso@gmail.com>
Co-authored-by: Zhaoqi <jzhaoqwa@amazon.com>
Co-authored-by: Tyler Osterberg <tyoster@amazon.com>
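
One of the synced commits listed above (#4557) notes that NumpyDeserializer will not deserialize unless `allow_pickle` is set to True explicitly. The sketch below shows the underlying NumPy behaviour such a flag controls, assuming the deserializer forwards it to `numpy.load` (an assumption, not confirmed by this page). The helper `roundtrip` is hypothetical and uses only NumPy and the standard library.

```python
import io

import numpy as np


def roundtrip(array, allow_pickle):
    """Save an array to memory, then load it with the given allow_pickle."""
    buffer = io.BytesIO()
    np.save(buffer, array, allow_pickle=True)  # the writer may pickle objects
    buffer.seek(0)
    return np.load(buffer, allow_pickle=allow_pickle)


numeric = np.array([1.0, 2.0, 3.0])
objects = np.array([{"a": 1}, "text"], dtype=object)

# Plain numeric arrays never need pickle, so they load either way.
loaded_numeric = roundtrip(numeric, allow_pickle=False)

# Object arrays are stored with pickle and are refused unless opted in.
try:
    roundtrip(objects, allow_pickle=False)
    rejected = False
except ValueError:
    rejected = True

if __name__ == "__main__":
    print(loaded_numeric.tolist())
    print("object array rejected without allow_pickle:", rejected)
```

Callers that relied on object arrays would need to opt in explicitly after such a default change, while numeric payloads are unaffected.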

4 participants