Describe the bug
I created a pipeline with a single ProcessingStep based on a FrameworkProcessor. When I upsert() and start() the pipeline from a notebook, it runs fine. The code lives in a SageMaker project generated from an AWS-provided template. When I push a change to CodeCommit, however, the processing job fails because of a missing package: although the package is listed in source_dir/requirements.txt, it is never installed.
Traceback (most recent call last):
  File "preprocess.py", line 7, in <module>
    import sagemaker
ModuleNotFoundError: No module named 'sagemaker'
To reproduce
import os

import sagemaker
import sagemaker.session
import sagemaker.sklearn.estimator
from sagemaker.processing import FrameworkProcessor, ProcessingOutput
from sagemaker.workflow.parameters import ParameterInteger, ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep


def get_pipeline(
    region,
    sagemaker_project_arn=None,
    role=None,
    default_bucket=None,
    model_package_group_name="MstfPackageGroup",
    pipeline_name="MstfPipeline",
    base_job_prefix="Mstf",
    feature_group_name="ninja-x5-y3-feature-group-02-14-30-03",
    from_date="2021-05-13",
    to_date="2022-05-13",
):
    # get_session() and BASE_DIR come from the project template
    sagemaker_session = get_session(region, default_bucket)
    s3_default_bucket = sagemaker_session.default_bucket()
    if role is None:
        role = sagemaker.session.get_execution_role(sagemaker_session)

    # parameters for pipeline execution
    processing_instance_count = ParameterInteger(
        name="ProcessingInstanceCount", default_value=1
    )
    processing_instance_type = ParameterString(
        name="ProcessingInstanceType", default_value="ml.m5.xlarge"
    )
    training_instance_type = ParameterString(
        name="TrainingInstanceType", default_value="ml.m5.xlarge"
    )
    model_approval_status = ParameterString(
        name="ModelApprovalStatus", default_value="PendingManualApproval"
    )
    feature_group_name_input = ParameterString(
        name="InputFeatureGroupName", default_value=feature_group_name
    )
    input_s3_bucket = ParameterString(
        name="DefaultS3Bucket",
        default_value=f"s3://{s3_default_bucket}/{base_job_prefix}",
    )
    input_from_date = ParameterString(name="DataFromDate", default_value=from_date)
    input_to_date = ParameterString(name="DataToDate", default_value=to_date)

    est_cls = sagemaker.sklearn.estimator.SKLearn
    framework_version_str = "0.23-1"
    script_processor = FrameworkProcessor(
        estimator_cls=est_cls,
        framework_version=framework_version_str,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session,
    )
    processor_run_args = script_processor.get_run_args(
        code="preprocess.py",
        source_dir=os.path.join(BASE_DIR, "preprocessing"),
        inputs=[],
        outputs=[
            ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
            ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
            ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
        ],
        arguments=[
            "--feature_group_name", feature_group_name_input,
            "--input_s3_bucket", input_s3_bucket,
            "--from_date", input_from_date,
            "--to_date", input_to_date,
        ],
    )
    step_process = ProcessingStep(
        name="PreprocessMstfData",
        processor=script_processor,
        inputs=processor_run_args.inputs,
        outputs=processor_run_args.outputs,
        job_arguments=processor_run_args.arguments,
        code=processor_run_args.code,
    )

    # pipeline instance
    pipeline = Pipeline(
        name=pipeline_name,
        parameters=[
            processing_instance_type,
            processing_instance_count,
            training_instance_type,
            model_approval_status,
            feature_group_name_input,
            input_s3_bucket,
            input_from_date,
            input_to_date,
        ],
        steps=[step_process],
        sagemaker_session=sagemaker_session,
    )
    return pipeline
Expected behavior
The sagemaker package should be installed, since it is listed in source_dir/requirements.txt, and preprocess.py should then run successfully.
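Until the underlying issue is resolved, one stdlib-only fallback (a workaround sketch on my part, not documented SDK behavior) is to let the entry script install a missing dependency at runtime before importing it:

```python
# Runtime fallback for preprocess.py: if the bootstrap did not install
# requirements.txt, pip-install the missing package into the current
# interpreter before importing it. A workaround, not the SDK's mechanism.
import importlib
import subprocess
import sys


def ensure_installed(package: str) -> None:
    """Try to import `package`; on ImportError, pip-install it and retry."""
    try:
        importlib.import_module(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        importlib.import_module(package)


# In preprocess.py one would then call, e.g.:
# ensure_installed("sagemaker")
# import sagemaker
```

When the package is already present, the function is a no-op apart from the import, so it is safe to leave in place once the packaging bug is fixed.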
System information
- SageMaker Python SDK version: 2.86.2
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): SKLearn
- Framework version: 0.23-1
- Python version: 3.7
- CPU or GPU: CPU
- Custom Docker image (Y/N): N