feat: Add transformers 4.29 - PyTorch 2.0.0 training and inference #3904

Closed
wants to merge 3 commits into from

@phall1 phall1 commented Jun 6, 2023

Issue #, if available: #3896

Description of changes:
Update Image URI for:

Transformers 4.29 - PyTorch 2.0 Inference DLC
Transformers 4.29 - PyTorch 2.0 Training DLC
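For illustration, here is a minimal sketch of how the two image URIs could be composed. It assumes the tag layout seen in the CI error later in this thread (`2.0.0-transformers4.29.0-gpu-py310-cu118-ubuntu20.04`). The helper name `build_hf_dlc_uri` is hypothetical and is not part of the SageMaker Python SDK; the inference repository name and the rule that CPU tags omit the CUDA segment are assumptions made by analogy, not facts confirmed in this PR.

```python
def build_hf_dlc_uri(
    scope: str,
    pytorch_version: str = "2.0.0",
    transformers_version: str = "4.29.0",
    processor: str = "gpu",
    py_version: str = "py310",
    cuda: str = "cu118",
    os_version: str = "ubuntu20.04",
    account: str = "763104351884",
    region: str = "us-west-2",
) -> str:
    """Hypothetical helper: compose a Hugging Face DLC image URI.

    The defaults mirror the training image named in the CI error log.
    """
    if scope not in ("training", "inference"):
        raise ValueError(f"scope must be 'training' or 'inference', got {scope!r}")

    parts = [pytorch_version, f"transformers{transformers_version}", processor, py_version]
    if processor == "gpu":
        # Assumption: only GPU tags carry a CUDA segment.
        parts.append(cuda)
    parts.append(os_version)
    tag = "-".join(parts)

    # Assumption: the repository is named "huggingface-pytorch-<scope>",
    # as it is for the training image in the error log.
    repository = f"huggingface-pytorch-{scope}"
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


training_uri = build_hf_dlc_uri("training")
inference_uri = build_hf_dlc_uri("inference")
```

A URI built this way only resolves if the DLC team has actually published the corresponding image.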

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@phall1 phall1 requested a review from a team as a code owner June 6, 2023 02:01
@phall1 phall1 requested review from knikure and removed request for a team June 6, 2023 02:01
@phall1 phall1 changed the title Add transformers 4.29 - PyTorch 2.0.0 training and inference feat: Add transformers 4.29 - PyTorch 2.0.0 training and inference Jun 6, 2023
@knikure knikure linked an issue Jun 6, 2023 that may be closed by this pull request
@knikure knikure self-assigned this Jun 6, 2023
Contributor

@knikure knikure left a comment

/bot run all

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 9be32b3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 9be32b3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 9be32b3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 9be32b3
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 9be32b3
  • Result: FAILED
  • Build Logs (available for 30 days)

@codecov-commenter

codecov-commenter commented Jun 6, 2023

Codecov Report

Attention: 32 lines in your changes are missing coverage. Please review.

Comparison is base (3057f21) 90.27% compared to head (a494586) 90.23%.
Report is 64 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3904      +/-   ##
==========================================
- Coverage   90.27%   90.23%   -0.04%     
==========================================
  Files         993      316     -677     
  Lines       87861    29282   -58579     
==========================================
- Hits        79313    26424   -52889     
+ Misses       8548     2858    -5690     
Files Coverage Δ
src/sagemaker/base_predictor.py 90.90% <100.00%> (ø)
src/sagemaker/config/config_schema.py 100.00% <100.00%> (ø)
src/sagemaker/estimator.py 92.88% <ø> (ø)
src/sagemaker/feature_store/feature_group.py 78.82% <ø> (ø)
...emaker/feature_store/feature_processor/__init__.py 100.00% <ø> (ø)
...eature_store/feature_processor/_config_uploader.py 97.43% <100.00%> (ø)
...er/feature_store/feature_processor/_data_source.py 100.00% <100.00%> (ø)
...ore/feature_processor/_feature_processor_config.py 100.00% <100.00%> (ø)
.../feature_store/feature_processor/_params_loader.py 100.00% <ø> (ø)
...ker/feature_store/feature_processor/_validation.py 84.05% <100.00%> (ø)
... and 32 more

... and 1267 files with indirect coverage changes
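The percentages in the coverage diff above can be reproduced from the hit and line counts. The sketch below is only an arithmetic check. It assumes the displayed figures are truncated (not rounded) to two decimals, which is inferred from the numbers in this report and not from Codecov documentation; `coverage_percent` is a hypothetical helper name.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Coverage in percent, truncated to two decimals (assumed display rule)."""
    return math.floor(hits / lines * 10000) / 100


# Counts taken from the coverage diff in the report.
base_hits, base_lines = 79313, 87861
head_hits, head_lines = 26424, 29282

base = coverage_percent(base_hits, base_lines)
head = coverage_percent(head_hits, head_lines)
delta = round(head - base, 2)

# Misses are simply the lines that were not hit.
base_misses = base_lines - base_hits
head_misses = head_lines - head_hits

print(f"base={base}% head={head}% delta={delta}%")
print(f"misses: base={base_misses} head={head_misses}")
```

The large drop in file and line counts reflects the report being 64 commits behind master, so the small change in percentage says little about this PR's own changes.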

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

phall1 added a commit to phall1/sagemaker-python-sdk that referenced this pull request Jun 6, 2023
@phall1
Author

phall1 commented Jun 6, 2023

Updated the Python version for PyTorch 2.0 compatibility, per the failing tests in the failed builds.

knikure pushed a commit to phall1/sagemaker-python-sdk that referenced this pull request Jun 7, 2023
@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: a84574b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: a84574b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: a84574b
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: a84574b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@phall1
Author

phall1 commented Jun 7, 2023

Looks like the image isn't available yet, per here:

https://github.com/aws/deep-learning-containers/blob/master/available_images.md

Do we know if this is being worked on or how I could contribute to getting this image available?
Error from logs:

sagemaker.exceptions.UnexpectedStatusException: Error for Processing job test-huggingface-2023-06-07-04-47-23-957: Failed. Reason: ClientError: API error (404): manifest for 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.29.0-gpu-py310-cu118-ubuntu20.04 not found: manifest unknown: Requested image not found
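To make the failure easier to triage, the missing image can be pulled out of the error text and split into its parts. The following is a rough sketch using only the standard library; `missing_image_from_error` and the regular expression are hypothetical and not part of the SageMaker SDK, and the pattern assumes the ECR URI shape seen in this particular log line.

```python
import re

# Assumed shape: <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
IMAGE_URI_RE = re.compile(
    r"(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[\w./-]+):(?P<tag>[\w.-]+)"
)


def missing_image_from_error(message: str):
    """Return the parts of the image URI named in a 'manifest unknown' error.

    Returns None if the message is not a missing-manifest error or if no
    ECR image URI can be found in it.
    """
    if "manifest unknown" not in message:
        return None
    match = IMAGE_URI_RE.search(message)
    return match.groupdict() if match else None


# The error text copied from the build log above.
ERROR = (
    "sagemaker.exceptions.UnexpectedStatusException: Error for Processing job "
    "test-huggingface-2023-06-07-04-47-23-957: Failed. Reason: ClientError: "
    "API error (404): manifest for 763104351884.dkr.ecr.us-west-2.amazonaws.com/"
    "huggingface-pytorch-training:2.0.0-transformers4.29.0-gpu-py310-cu118-ubuntu20.04 "
    "not found: manifest unknown: Requested image not found"
)

info = missing_image_from_error(ERROR)
print(info["repository"], info["tag"])
```

The extracted repository and tag are the values to look for in the list of available images linked above.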

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: a84574b
  • Result: FAILED
  • Build Logs (available for 30 days)

Member

@mufaddal-rohawala mufaddal-rohawala left a comment

/bot run all

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: aeb5cc3
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: aeb5cc3
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: aeb5cc3
  • Result: FAILED
  • Build Logs (available for 30 days)

Contributor

@knikure knikure left a comment

/bot run all

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 2b1464f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 2b1464f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 2b1464f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 2b1464f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 2b1464f
  • Result: FAILED
  • Build Logs (available for 30 days)

@phall1
Author

phall1 commented Jun 16, 2023

Looks like the image isn't available yet, per here:

https://github.com/aws/deep-learning-containers/blob/master/available_images.md

Do we know if this is being worked on or how I could contribute to getting this image available?

Error from logs:


sagemaker.exceptions.UnexpectedStatusException: Error for Processing job test-huggingface-2023-06-07-04-47-23-957: Failed. Reason: ClientError: API error (404): manifest for 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.29.0-gpu-py310-cu118-ubuntu20.04 not found: manifest unknown: Requested image not found

Same error as before, so I'm bumping my question.

@knikure
Contributor

knikure commented Jun 16, 2023

Looks like the image isn't available yet, per here:
aws/deep-learning-containers@master/available_images.md
Do we know if this is being worked on or how I could contribute to getting this image available?
Error from logs:


sagemaker.exceptions.UnexpectedStatusException: Error for Processing job test-huggingface-2023-06-07-04-47-23-957: Failed. Reason: ClientError: API error (404): manifest for 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.29.0-gpu-py310-cu118-ubuntu20.04 not found: manifest unknown: Requested image not found

Same error as before, so I'm bumping my question.

We will have to check with the DLC team to verify the image's availability.
@ShiboXing Can you help answer this question?

@knikure
Contributor

knikure commented Aug 14, 2023

@phall1 Can you check with someone on the DLC team whether transformers 4.29 is supported with PT 2.0? If not, can we close this PR?

@oleksiishevchuk

It would be great to get some feedback on this, as we would like to add support for further transformers versions (e.g. #4076).

Member

@mufaddal-rohawala mufaddal-rohawala left a comment

/bot run all

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: a494586
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: a494586
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: a494586
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: a494586
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: a494586
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@oleksiishevchuk

Hi @knikure, it seems the same issue with the unavailable image still exists. Do you have any suggestions on how to get feedback from the DLC team (@phall1 @ShiboXing), or on how we could contribute to resolving it?

@knikure
Contributor

knikure commented Nov 2, 2023

Hi @knikure, it seems the same issue with the unavailable image still exists. Do you have any suggestions on how to get feedback from the DLC team (@phall1 @ShiboXing), or on how we could contribute to resolving it?

@oleksiishevchuk It is best to first check what DLC has already released for training and inference, and to raise requests like #3896 with the DLC team so that they can address them. PySDK is downstream and cannot help release these images.

@knikure
Contributor

knikure commented Mar 8, 2024

Closing this PR as we are tracking this as a feature request.

@knikure knikure closed this Mar 8, 2024
Labels
None yet
Projects
Status: Closed
Development

Successfully merging this pull request may close these issues.

Add transformers v4.29.0 compatibility to sagemaker.huggingface
7 participants