Labels: component: processing (Relates to the SageMaker Processing Platform), type: bug
Description
Describe the bug
When running a SageMaker pipeline with a VPC network config, it throws the following error:
ClientError: ClientError: expected DHCP options to include keys domain-name-servers and domain-name, but missing one or more attributes: {DhcpOptionsId:dopt-XXXXXXXXXXXXXXXXX DomainNameServers:[0xc009bd0e08] DomainNameSearch:<nil>}.Please refer to https://docs.aws.amazon.com/vpc/latest/userguide/DHCPOptionSet.html for more details
The validation apparently requires the DHCP options set to define both domain-name and domain-name-servers, but ours does not define domain-name, as shown below (the boto3 sketch after the list shows one way to confirm this).
Our DHCP options set:
- DHCP option set ID: dopt-XXXXXXXXXXXXXXXXX
- Domain name: –
- Domain name servers: XX.X.X.X
- NetBIOS name servers: –
- NetBIOS node type: –
- NTP servers: –
- Owner: XXXXXXXXXXXX
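For reference, the missing key can be confirmed with boto3's describe_dhcp_options; a minimal sketch, using the placeholder dopt- ID from above:

import boto3

ec2 = boto3.client("ec2")
resp = ec2.describe_dhcp_options(DhcpOptionsIds=["dopt-XXXXXXXXXXXXXXXXX"])
for cfg in resp["DhcpOptions"][0]["DhcpConfigurations"]:
    # For our VPC this prints only domain-name-servers; domain-name is
    # absent, matching the attribute the ClientError complains about.
    print(cfg["Key"], [v["Value"] for v in cfg["Values"]])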
I'm reporting this as a bug because SageMaker Studio and numerous other AWS services run fine in the same VPC.
To reproduce
- Create a SageMaker pipeline, and add the network config to its first processing step.
- Create the NetworkConfig instance:
import sagemaker.network

network_config = sagemaker.network.NetworkConfig(
    security_group_ids=["sg-XXXXXXXXXXXXXXXXX"],
    subnets=["subnet-XXXXXXXXXXXXXXXXX"],
)
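For context, the SDK should translate this into the NetworkConfig block of the underlying CreateProcessingJob request roughly as follows. This is a sketch of the assumed request shape, not captured from a real request; expected_network_config is just an illustrative name:

# Assumed shape of the NetworkConfig sent to CreateProcessingJob
# (an assumption, not part of the repro):
expected_network_config = {
    "VpcConfig": {
        "SecurityGroupIds": ["sg-XXXXXXXXXXXXXXXXX"],
        "Subnets": ["subnet-XXXXXXXXXXXXXXXXX"],
    },
    "EnableNetworkIsolation": False,  # NetworkConfig default when left unset
}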
- Define the pipeline parameters and create the processor that uses the network config:
import os

from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
from sagemaker.workflow.parameters import ParameterInteger, ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig, ProcessingStep

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount", default_value=1
)
processing_instance_type = ParameterString(
    name="ProcessingInstanceType", default_value="ml.t3.medium"
)
training_instance_type = ParameterString(
    name="TrainingInstanceType", default_value="ml.t3.medium"
)
param_postgres_db = ParameterString(
    name="PostgreSqlDBName",
)
param_clickhouse_db = ParameterString(
    name="ClickHouseDBName",
)
param_event_id = ParameterString(
    name="EventId",
)
param_role_arn = ParameterString(name="SagemakerExecutionRoleARN")

pipeline_session = PipelineSession()
cache_config = CacheConfig(enable_caching=True, expire_after="PT12H")
BASE_DIR = os.path.dirname(os.path.realpath(__file__))
default_bucket = <ADD_BUCKET>
custom_image_uri = <CUSTOM_IMAGE_URI>
# Step 1 - query and build dataset
dataset_builder_processor = ScriptProcessor(
    role=param_role_arn,
    image_uri=custom_image_uri,
    command=["python"],
    instance_count=processing_instance_count,
    instance_type=processing_instance_type,
    volume_size_in_gb=20,
    max_runtime_in_seconds=7200,
    base_job_name="Dataset-Builder",
    sagemaker_session=pipeline_session,
    network_config=network_config,
)
- Create the processing step:
dataset_process = ProcessingStep(
    name="Query-Data-and-Build-Features",
    processor=dataset_builder_processor,
    inputs=[
        ProcessingInput(
            source=(
                f"s3://{default_bucket}/"
                "Feature-Groups-Builder-Pipeline/src_module/"
            ),
            destination="/opt/ml/processing/input/code/src/",
            input_name="src_module",
        )
    ],
    outputs=[
        ProcessingOutput(
            output_name="visitor_features_df",
            source="/opt/ml/processing/output/visitor_features_df",
        ),
        ProcessingOutput(
            output_name="exhibitor_features_df",
            source="/opt/ml/processing/output/exhibitor_features_df",
        ),
        ProcessingOutput(
            output_name="visitor_embeddings_df",
            source="/opt/ml/processing/output/visitor_embeddings_df",
        ),
        ProcessingOutput(
            output_name="exhibitor_embeddings_df",
            source="/opt/ml/processing/output/exhibitor_embeddings_df",
        ),
        ProcessingOutput(
            output_name="vis_exh_interactions_df",
            source="/opt/ml/processing/output/vis_exh_interactions_df",
        ),
        ProcessingOutput(
            output_name="idx_mapper",
            source="/opt/ml/processing/output/idx_mapper",
        ),
    ],
    code=os.path.join(BASE_DIR, "dataset_builder.py"),
    job_arguments=[
        "--postgres_db",
        param_postgres_db,
        "--clickhouse_db",
        param_clickhouse_db,
        "--event_id",
        param_event_id.to_string(),
        "--role_arn",
        param_role_arn,
    ],
    cache_config=cache_config,
)
- Create the pipeline instance:
pipeline_name = <PIPELINE_NAME>

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        training_instance_type,
        param_postgres_db,
        param_clickhouse_db,
        param_event_id,
        param_role_arn,
    ],
    # There are also other steps, but execution never reaches them
    steps=[dataset_process],
    sagemaker_session=pipeline_session,
)
- Upsert and run the pipeline:
upsert_response = pipeline.upsert(
    role_arn=<SAGEMAKER_PIPELINE_EXECUTION_ROLE>,
    description="Build number: 181",
)
execution = pipeline.start(
    parameters={
        "PostgreSqlDBName": "postgres_db_name",
        "ClickHouseDBName": "ch_db_name",
        "EventId": "99",
        "SagemakerExecutionRoleARN": <SAGEMAKER_PIPELINE_EXECUTION_ROLE>,
    }
)
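To surface the failed step's reason programmatically, something like this should work (an optional debugging sketch, not part of the original repro):

try:
    execution.wait()
except Exception:
    pass  # the execution fails with the DHCP options error
for step in execution.list_steps():
    print(step["StepName"], step["StepStatus"], step.get("FailureReason", ""))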
Expected behavior
The SageMaker pipeline should not throw this error. Instead, the step fails with "This step failed. For more information, view the logs" and the same ClientError quoted in the description.
System information
- SageMaker Python SDK version: 2.161.0
- Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): -
- Framework version: -
- Python version: 3.9
- CPU or GPU: CPU
- Custom Docker image (Y/N): Yes (Base image python:3.9-slim-buster)
Additional context
- Custom image used in the processing step:
FROM python:3.9-slim-buster

RUN pip3 install --no-cache-dir \
    numpy==1.24.3 \
    pandas==2.0.2 \
    psycopg2-binary==2.9.6 \
    sqlalchemy==2.0.15 \
    clickhouse-driver==0.2.6 \
    sentence-transformers==2.2.2 \
    sagemaker==2.161.0 \
    boto3==1.26.145

ENV PYTHONUNBUFFERED=TRUE

ENTRYPOINT ["python"]
- There are no private subnets in our VPC
- The VPC only has an IPv4 CIDR
- The DHCP options set is associated with the VPC
- SageMaker Studio also runs in this VPC, and neither Studio nor any other AWS service has this problem
- Other sagemaker.network.NetworkConfig parameters were left at their defaults
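A possible workaround (untested; an assumption based on the linked DHCPOptionSet docs rather than anything SageMaker-specific) would be to create a DHCP options set that defines both required keys and associate it with the VPC, e.g.:

import boto3

ec2 = boto3.client("ec2")

# Hypothetical workaround: define BOTH domain-name and domain-name-servers,
# then attach the new options set to the VPC (IDs are placeholders).
opts = ec2.create_dhcp_options(
    DhcpConfigurations=[
        # "ec2.internal" applies to us-east-1; other regions use
        # "<region>.compute.internal" per the linked docs.
        {"Key": "domain-name", "Values": ["ec2.internal"]},
        {"Key": "domain-name-servers", "Values": ["XX.X.X.X"]},  # keep the existing DNS server
    ]
)
ec2.associate_dhcp_options(
    DhcpOptionsId=opts["DhcpOptions"]["DhcpOptionsId"],
    VpcId="vpc-XXXXXXXXXXXXXXXXX",
)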