SageMaker Processing Step not working with ScriptProcessor() #4471

Closed
mccartni-aws opened this issue Mar 4, 2024 · 1 comment
Labels
bug component: pipelines Relates to the SageMaker Pipeline Platform

Comments

mccartni-aws commented Mar 4, 2024

Describe the bug
The error message is ValueError: either step_args or processor need to be given, but not both. It indicates that the ProcessingStep constructor in a SageMaker pipeline accepts either the processor (here initialized as script_processor) or step_args (the return value of script_processor.run()), but never both. Exactly one of the two should be provided to define the processing step.

To reproduce

To reproduce this bug, you would create a SageMaker ScriptProcessor and use it to run a script with some arguments. Then, create a ProcessingStep passing in both step_args generated from the script_processor.run() method and the processor itself. The following pseudo-code demonstrates the essence of the steps that would lead to such an error:

script_processor = ScriptProcessor(...)
step_args = script_processor.run(code="script.py", arguments=[...])

finetune_step = ProcessingStep(
    name="FinetuneStep",
    processor=script_processor,  # This should not be provided along with step_args 
    step_args=step_args
)

In theory, the fix is to drop the processor argument whenever step_args is provided, since step_args already contains the execution context needed for the processing job. However, I am not passing the processor and the error is still raised. See the code snippet for my implementation.
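One way this error can appear even when the processor is left out: if step_args itself evaluates to None, then neither argument is set and the same "exactly one of" check fails. As far as I understand, script_processor.run() only returns step arguments when the processor was created with a pipeline session; with a regular session it may return None. The toy model below only illustrates that logic. check_step_inputs and outcome are hypothetical names, not SDK functions, and the check is an assumption about how the validation behaves, not the SDK's actual code.

```python
from typing import Any, Optional


def check_step_inputs(step_args: Optional[Any] = None,
                      processor: Optional[Any] = None) -> None:
    """Hypothetical stand-in for an 'exactly one of' check (not SDK code)."""
    # Raise when both are set or when both are missing.
    if (step_args is None) == (processor is None):
        raise ValueError(
            "either step_args or processor need to be given, but not both."
        )


def outcome(step_args: Optional[Any] = None,
            processor: Optional[Any] = None) -> str:
    """Return 'ok' if the check passes, otherwise the exception type name."""
    try:
        check_step_inputs(step_args=step_args, processor=processor)
        return "ok"
    except ValueError:
        return "ValueError"


# Only step_args given: passes.
print(outcome(step_args={"code": "script.py"}))
# Both given: fails, which is the case the message describes.
print(outcome(step_args={"code": "script.py"}, processor=object()))
# step_args is None and no processor: fails with the same message.
print(outcome(step_args=None))
```

The third case is the one that matches "I am only passing step_args" while still hitting the ValueError.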

Expected behavior
The ProcessingStep should be created without raising a ValueError, as only step_args is explicitly passed into ProcessingStep(). The processing step would then use the provided arguments to execute the specified Python script (finetune.py) within the configured SageMaker processing environment. The job would process the input data as specified, perform the finetuning operation, and write the results to the defined output path.

Screenshots or logs
Screenshot 2024-03-04 at 15 38 45
Screenshot 2024-03-04 at 15 38 53

System information
A description of your system. Please provide:

  • SageMaker Python SDK version:
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans):
  • Framework version:
  • Python version:
  • CPU or GPU:
  • Custom Docker image (Y/N):

Additional context
Add any other context about the problem here.

@knikure knikure added the component: pipelines Relates to the SageMaker Pipeline Platform label Mar 4, 2024
@mccartni-aws
Author

This has been resolved. I was not passing in a pipeline session as the SageMaker session.
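For anyone landing here with the same error, a minimal sketch of what that fix can look like. The image URI, role, instance settings, and the build_finetune_step helper name are placeholders and assumptions, not the values from my pipeline. The imports are done inside the function only so that the snippet can be defined without the SDK or AWS credentials being available.

```python
def build_finetune_step(image_uri: str, role: str):
    """Sketch: build the ProcessingStep using a PipelineSession.

    All argument values below are placeholders, not the original setup.
    """
    # Lazy imports: calling this function requires the sagemaker SDK
    # and a configured AWS region/credentials.
    from sagemaker.processing import ScriptProcessor
    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.steps import ProcessingStep

    # The key change: a PipelineSession instead of a regular Session.
    pipeline_session = PipelineSession()

    script_processor = ScriptProcessor(
        image_uri=image_uri,
        command=["python3"],
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # placeholder instance type
        sagemaker_session=pipeline_session,
    )

    # With a PipelineSession, run() is expected to return step arguments
    # rather than start a processing job immediately.
    step_args = script_processor.run(code="finetune.py")

    # Pass only step_args; do not also pass processor=script_processor.
    return ProcessingStep(name="FinetuneStep", step_args=step_args)
```

The only difference from the failing setup is the sagemaker_session argument on the ScriptProcessor; the ProcessingStep call itself still receives step_args alone.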
