
feature: Add optional CodeArtifact login to FrameworkProcessing job script #4145

Open
wants to merge 16 commits into master from processing-job-codeartifact-support

Conversation


@akuma12 akuma12 commented Sep 27, 2023

Issue #, if available:
#4144

Description of changes:
This PR adds an optional codeartifact_repo_arn parameter to the FrameworkProcessor.run() method. Providing this ARN allows the _generate_framework_script() method to call _get_codeartifact_index(), which parses the ARN into a CodeArtifact repository URL, retrieves an authentication token, and writes an index option into the pip install -r requirements.txt call generated by _generate_framework_script().

If codeartifact_repo_arn is not provided, then _get_codeartifact_index() will not be called and nothing new will be injected into the runproc.sh script.
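The ARN-to-index-URL flow described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the helper names (parse_codeartifact_arn, build_pip_index_url) are made up, and only the ARN format and index-URL shape are taken from the PR and its tests.

```python
import re

def parse_codeartifact_arn(arn: str) -> dict:
    """Split a CodeArtifact repository ARN into its components.

    Expected form (per the AWS docs referenced in the PR):
    arn:{partition}:codeartifact:{region}:{account}:repository/{domain}/{repository}
    """
    m = re.match(
        r"arn:(?P<partition>[^:]+):codeartifact:(?P<region>[^:]+):"
        r"(?P<account>\d+):repository/(?P<domain>[^/]+)/(?P<repository>[^/]+)$",
        arn,
    )
    if m is None:
        # Mirrors the ValueError the PR's bad-ARN unit test expects.
        raise ValueError(f"Invalid CodeArtifact repository ARN: {arn}")
    return m.groupdict()

def build_pip_index_url(domain: str, account: str, region: str,
                        repository: str, token: str) -> str:
    # Index URL shape matches the endpoint used in the PR's unit tests,
    # with the auth token embedded as the password for the "aws" user.
    return (
        f"https://aws:{token}@{domain}-{account}.d.codeartifact."
        f"{region}.amazonaws.com/pypi/{repository}/simple/"
    )
```

In the real method, the token would come from a get_authorization_token call rather than being passed in directly.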

The _get_codeartifact_index() code is copied from the sagemaker-training-toolkit. All credit to @humanzz for that update.

Testing done:
Validated a PyTorchProcessor job both with and without the codeartifact_repo_arn parameter. Downloaded the generated runproc.sh file from S3 and verified that the index option is written to the file when the ARN is provided and omitted when it is not.

I could use some advice when it comes to automated testing, however. Since _get_codeartifact_index() interacts with CodeArtifact via Boto3, I was unsure of the best way to handle this. In unit tests, I would typically use moto or patch the Boto3 make_api_call method. With the integration tests, I wasn't sure how I should interact with CodeArtifact, or if I should even add an integration test.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@akuma12 akuma12 requested a review from a team as a code owner September 27, 2023 21:12
@akuma12 akuma12 requested review from akrishna1995 and removed request for a team September 27, 2023 21:12
@akrishna1995 akrishna1995 self-assigned this Oct 2, 2023
Contributor

@akrishna1995 akrishna1995 left a comment


/bot run all

Contributor

@akrishna1995 akrishna1995 left a comment


Please get a review from one member of your team, and please follow best practices: add unit tests and integration tests.

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 53c46b0
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 53c46b0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 53c46b0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 53c46b0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 53c46b0
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@goelakash goelakash force-pushed the processing-job-codeartifact-support branch from 53c46b0 to 61190de Compare October 9, 2023 23:48
@akuma12
Author

akuma12 commented Oct 12, 2023

@akrishna1995 Can you re-run the tests? I added 3 more unit tests around the code and fixed the issues that popped up in the last run.

@akuma12
Author

akuma12 commented Nov 21, 2023

Curious if anyone has had a chance to take a look at this. Would appreciate another review.

@mohanasudhan
Contributor

@akuma12 Can you rebase your change? I had a brief look and it LGTM. It would be good to get all the tests passing.

@humanzz
Contributor

humanzz commented Jan 17, 2024

Hi @mohanasudhan and @akuma12,
I've been keeping an eye on this, and recently within my team, a need has arisen for using processing jobs and we'd really benefit from this PR being merged/released.

I pulled this PR and ran export IGNORE_COVERAGE=- ; tox -e py38 -- -s -vv tests/unit/test_processing.py::test_pytorch_processor_with_required_parameters ; unset IGNORE_COVERAGE to check what's failing, and all tests seem to be passing.

The only thing that happened was that black wanted to reformat the files, resulting in the following changes:

diff --git a/src/sagemaker/processing.py b/src/sagemaker/processing.py
index b4a063ba..59b8c980 100644
--- a/src/sagemaker/processing.py
+++ b/src/sagemaker/processing.py
@@ -1852,7 +1852,7 @@ class FrameworkProcessor(ScriptProcessor):
         # `arn:${Partition}:codeartifact:${Region}:${Account}:repository/${Domain}/${Repository}`
         https://docs.aws.amazon.com/codeartifact/latest/ug/python-configure-pip.html
         https://docs.aws.amazon.com/service-authorization/latest/reference/list_awscodeartifact.html#awscodeartifact-resources-for-iam-policies
-        
+
         Args:
             codeartifact_repo_arn: arn of the codeartifact repository
             codeartifact_client: boto3 client for codeartifact (used for testing)
@@ -1882,9 +1882,13 @@ class FrameworkProcessor(ScriptProcessor):
         )
         try:
             if not codeartifact_client:
-                codeartifact_client = self.sagemaker_session.boto_session.client("codeartifact", region_name=region)
-            
-            auth_token_response = codeartifact_client.get_authorization_token(domain=domain, domainOwner=owner)
+                codeartifact_client = self.sagemaker_session.boto_session.client(
+                    "codeartifact", region_name=region
+                )
+
+            auth_token_response = codeartifact_client.get_authorization_token(
+                domain=domain, domainOwner=owner
+            )
             token = auth_token_response["authorizationToken"]
             endpoint_response = codeartifact_client.get_repository_endpoint(
                 domain=domain, domainOwner=owner, repository=repository, format="pypi"
diff --git a/tests/unit/test_processing.py b/tests/unit/test_processing.py
index fb55e2fe..3663b35b 100644
--- a/tests/unit/test_processing.py
+++ b/tests/unit/test_processing.py
@@ -1107,27 +1107,33 @@ def test_pyspark_processor_configuration_path_pipeline_config(
 
 @patch("sagemaker.workflow.utilities._pipeline_config", MOCKED_PIPELINE_CONFIG)
 def test_get_codeartifact_index(pipeline_session):
-    codeartifact_repo_arn = "arn:aws:codeartifact:us-west-2:012345678901:repository/test-domain/test-repository"
+    codeartifact_repo_arn = (
+        "arn:aws:codeartifact:us-west-2:012345678901:repository/test-domain/test-repository"
+    )
     codeartifact_url = "test-domain-012345678901.d.codeartifact.us-west-2.amazonaws.com/pypi/test-repository/simple/"
 
-    client = boto3.client('codeartifact', region_name=REGION)
+    client = boto3.client("codeartifact", region_name=REGION)
     stubber = Stubber(client)
-        
+
     get_auth_token_response = {
         "authorizationToken": "mocked_token",
-        "expiration": datetime.datetime(2045, 1, 1, 0, 0, 0)
+        "expiration": datetime.datetime(2045, 1, 1, 0, 0, 0),
     }
     auth_token_expected_params = {"domain": "test-domain", "domainOwner": "012345678901"}
-    stubber.add_response("get_authorization_token", get_auth_token_response, auth_token_expected_params)
+    stubber.add_response(
+        "get_authorization_token", get_auth_token_response, auth_token_expected_params
+    )
 
     get_repo_endpoint_response = {"repositoryEndpoint": f"https://{codeartifact_url}"}
     repo_endpoint_expected_params = {
         "domain": "test-domain",
         "domainOwner": "012345678901",
         "repository": "test-repository",
-        "format": "pypi"
+        "format": "pypi",
     }
-    stubber.add_response("get_repository_endpoint", get_repo_endpoint_response, repo_endpoint_expected_params)
+    stubber.add_response(
+        "get_repository_endpoint", get_repo_endpoint_response, repo_endpoint_expected_params
+    )
 
     processor = PyTorchProcessor(
         role=ROLE,
@@ -1139,8 +1145,10 @@ def test_get_codeartifact_index(pipeline_session):
     )
 
     with stubber:
-        codeartifact_index = processor._get_codeartifact_index(codeartifact_repo_arn=codeartifact_repo_arn, codeartifact_client=client)
-    
+        codeartifact_index = processor._get_codeartifact_index(
+            codeartifact_repo_arn=codeartifact_repo_arn, codeartifact_client=client
+        )
+
     assert codeartifact_index == f"https://aws:mocked_token@{codeartifact_url}"
 
 
@@ -1149,24 +1157,28 @@ def test_get_codeartifact_index_bad_repo_arn(pipeline_session):
     codeartifact_repo_arn = "arn:aws:codeartifact:us-west-2:012345678901:repository/test-domain"
     codeartifact_url = "test-domain-012345678901.d.codeartifact.us-west-2.amazonaws.com/pypi/test-repository/simple/"
 
-    client = boto3.client('codeartifact', region_name=REGION)
+    client = boto3.client("codeartifact", region_name=REGION)
     stubber = Stubber(client)
-        
+
     get_auth_token_response = {
         "authorizationToken": "mocked_token",
-        "expiration": datetime.datetime(2045, 1, 1, 0, 0, 0)
+        "expiration": datetime.datetime(2045, 1, 1, 0, 0, 0),
     }
     auth_token_expected_params = {"domain": "test-domain", "domainOwner": "012345678901"}
-    stubber.add_response("get_authorization_token", get_auth_token_response, auth_token_expected_params)
+    stubber.add_response(
+        "get_authorization_token", get_auth_token_response, auth_token_expected_params
+    )
 
     get_repo_endpoint_response = {"repositoryEndpoint": f"https://{codeartifact_url}"}
     repo_endpoint_expected_params = {
         "domain": "test-domain",
         "domainOwner": "012345678901",
         "repository": "test-repository",
-        "format": "pypi"
+        "format": "pypi",
     }
-    stubber.add_response("get_repository_endpoint", get_repo_endpoint_response, repo_endpoint_expected_params)
+    stubber.add_response(
+        "get_repository_endpoint", get_repo_endpoint_response, repo_endpoint_expected_params
+    )
 
     processor = PyTorchProcessor(
         role=ROLE,
@@ -1179,32 +1191,42 @@ def test_get_codeartifact_index_bad_repo_arn(pipeline_session):
 
     with stubber:
         with pytest.raises(ValueError):
-            processor._get_codeartifact_index(codeartifact_repo_arn=codeartifact_repo_arn, codeartifact_client=client)
+            processor._get_codeartifact_index(
+                codeartifact_repo_arn=codeartifact_repo_arn, codeartifact_client=client
+            )
 
 
 @patch("sagemaker.workflow.utilities._pipeline_config", MOCKED_PIPELINE_CONFIG)
 def test_get_codeartifact_index_client_error(pipeline_session):
-    codeartifact_repo_arn = "arn:aws:codeartifact:us-west-2:012345678901:repository/test-domain/test-repository"
+    codeartifact_repo_arn = (
+        "arn:aws:codeartifact:us-west-2:012345678901:repository/test-domain/test-repository"
+    )
     codeartifact_url = "test-domain-012345678901.d.codeartifact.us-west-2.amazonaws.com/pypi/test-repository/simple/"
 
-    client = boto3.client('codeartifact', region_name=REGION)
+    client = boto3.client("codeartifact", region_name=REGION)
     stubber = Stubber(client)
-        
+
     get_auth_token_response = {
         "authorizationToken": "mocked_token",
-        "expiration": datetime.datetime(2045, 1, 1, 0, 0, 0)
+        "expiration": datetime.datetime(2045, 1, 1, 0, 0, 0),
     }
     auth_token_expected_params = {"domain": "test-domain", "domainOwner": "012345678901"}
-    stubber.add_client_error("get_authorization_token", service_error_code="404", expected_params=auth_token_expected_params)
+    stubber.add_client_error(
+        "get_authorization_token",
+        service_error_code="404",
+        expected_params=auth_token_expected_params,
+    )
 
     get_repo_endpoint_response = {"repositoryEndpoint": f"https://{codeartifact_url}"}
     repo_endpoint_expected_params = {
         "domain": "test-domain",
         "domainOwner": "012345678901",
         "repository": "test-repository",
-        "format": "pypi"
+        "format": "pypi",
     }
-    stubber.add_response("get_repository_endpoint", get_repo_endpoint_response, repo_endpoint_expected_params)
+    stubber.add_response(
+        "get_repository_endpoint", get_repo_endpoint_response, repo_endpoint_expected_params
+    )
 
     processor = PyTorchProcessor(
         role=ROLE,
@@ -1217,7 +1239,9 @@ def test_get_codeartifact_index_client_error(pipeline_session):
 
     with stubber:
         with pytest.raises(RuntimeError):
-            processor._get_codeartifact_index(codeartifact_repo_arn=codeartifact_repo_arn, codeartifact_client=client)
+            processor._get_codeartifact_index(
+                codeartifact_repo_arn=codeartifact_repo_arn, codeartifact_client=client
+            )
 
 
 def _get_script_processor(sagemaker_session):

With no access to the build logs, I wonder if that is the only issue?

@humanzz
Copy link
Contributor

humanzz commented Jan 18, 2024

One more thought: my teammate @Stacy-D has started looking into using processing jobs and has been experimenting with a different approach for setting up CodeArtifact.

Rather than uploading a script that hardcodes the index in pip install -r requirements.txt {index_option}, she's written a script leveraging the AWS CLI (assuming it's present in the image — we confirmed it is on PyTorch containers) to configure pip using

aws codeartifact login --tool pip --repository "$CODEARTIFACT_REPO" --domain "$CODEARTIFACT_DOMAIN"  --domain-owner "${CODEARTIFACT_OWNER}" --region "${CODEARTIFACT_REGION}"

The script then looks something along the lines of

#!/bin/bash

cd /opt/ml/processing/input/code/
tar -xzf sourcedir.tar.gz

# Exit on any error. SageMaker uses error code to mark failed job.
set -e

aws codeartifact login --tool pip --repository "$CODEARTIFACT_REPO" --domain "$CODEARTIFACT_DOMAIN"  --domain-owner "${CODEARTIFACT_OWNER}" --region "${CODEARTIFACT_REGION}"

if [[ -f 'requirements.txt' ]]; then
    # Some py3 containers have typing, which may break pip install
    pip uninstall --yes typing

    pip install -r requirements.txt
fi

python "$THE_SCRIPT" "$@"

To make CodeArtifact optional, the aws codeartifact login call would need to be wrapped in an if condition that checks whether the environment variables are set.
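That guard could look something like the following. This is an illustrative sketch, not the PR's code: the variable names follow the script above, the function name is made up, and the fallback message is a placeholder.

```shell
#!/bin/bash

# Run the CodeArtifact login only when the relevant environment
# variables are set; otherwise fall back to the default pip index.
codeartifact_login_if_configured() {
    if [[ -n "${CODEARTIFACT_REPO:-}" && -n "${CODEARTIFACT_DOMAIN:-}" ]]; then
        aws codeartifact login --tool pip \
            --repository "$CODEARTIFACT_REPO" \
            --domain "$CODEARTIFACT_DOMAIN" \
            --domain-owner "${CODEARTIFACT_OWNER}" \
            --region "${CODEARTIFACT_REGION}"
    else
        echo "CodeArtifact env vars not set; using the default pip index"
    fi
}
```

The function would be called just before the pip install step in the generated script.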

@akuma12
Author

akuma12 commented Mar 15, 2024

I had almost forgotten about this. Thank you @humanzz for your feedback. I'll look into that script change and see if I can modify the code to make use of that.


codecov bot commented Mar 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.44%. Comparing base (31190c4) to head (cfe8139).

Current head cfe8139 differs from pull request most recent head f9deaa5

Please upload reports for the commit f9deaa5 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4145      +/-   ##
==========================================
+ Coverage   86.70%   87.44%   +0.74%     
==========================================
  Files         409      389      -20     
  Lines       39067    36904    -2163     
==========================================
- Hits        33872    32272    -1600     
+ Misses       5195     4632     -563     


@akuma12
Author

akuma12 commented Mar 15, 2024

@humanzz The process using the AWS CLI is muuuuuch simpler. I don't have to rely on boto3, and I confirmed that the PyTorch training images have the AWS CLI installed. Updated the code and added some additional unit tests.

@akuma12
Author

akuma12 commented Mar 21, 2024

@mohanasudhan Looks like all tests and lints are passing now, if you could take one last look.

@akuma12
Author

akuma12 commented Apr 19, 2024

@akrishna1995 I'd love to get a final review on this, if possible. Thanks!
