
SparkMLModel/PipelineModel not supported through LocalSession  #1846

Description

@njgerner

Describe the bug
I am unable to run a SparkMLModel through a PipelineModel in local mode, because the LocalSession method create_model does not support the Containers parameter.

To reproduce

import boto3
from sagemaker import LocalSession
from sagemaker.pipeline import PipelineModel
from sagemaker.sparkml.model import SparkMLModel

session = LocalSession(boto3.Session(region_name='us-east-1'))

sparkml_model = SparkMLModel(model_data="</path/to/model>", sagemaker_session=session)

sm_model = PipelineModel(
        name="spark-pipeline-model"
        role="arn:aws:iam::<account_id>:role/role-name",
        models=[sparkml_model],
        sagemaker_session=session
)

transformer = sm_model.transformer(
        instance_count=1,
        instance_type="local",
        output_path="path/to/output",
        accept="text/csv",
        assemble_with='Line',
        env={"SAGEMAKER_SPARKML_SCHEMA": SPARK_ML_SCHEMA}
)

Running this produces the error shown below.

Expected behavior
LocalSession's create_model should support the Containers parameter, so that a PipelineModel (Inference Pipeline) can be created and run in local mode.
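
For illustration, a minimal sketch of what that could look like (names mirror the boto3 CreateModel API; this is not the actual sagemaker.local implementation):

    # Illustrative only; not the actual sagemaker.local implementation.
    # A local-mode client whose create_model accepts either a single
    # PrimaryContainer or a Containers list, like the real CreateModel API.
    class LocalSagemakerClientSketch:
        def __init__(self):
            self._models = {}

        def create_model(self, ModelName, PrimaryContainer=None, Containers=None,
                         *args, **kwargs):
            if PrimaryContainer is None and Containers is None:
                raise ValueError("Either PrimaryContainer or Containers is required")
            # Normalize both call styles to a list of container definitions so a
            # pipeline of containers is stored the same way as a single model.
            self._models[ModelName] = Containers if Containers is not None else [PrimaryContainer]
            return {"ModelName": ModelName}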

Screenshots or logs

    def create_model(
        self,
        name,
        role,
        container_defs,
        vpc_config=None,
        enable_network_isolation=False,
        primary_container=None,
        tags=None,
    ):
        """Create an Amazon SageMaker ``Model``.
        Specify the S3 location of the model artifacts and Docker image containing
        the inference code. Amazon SageMaker uses this information to deploy the
        model in Amazon SageMaker. This method can also be used to create a Model for an Inference
        Pipeline if you pass the list of container definitions through the containers parameter.
    
        Args:
            name (str): Name of the Amazon SageMaker ``Model`` to create.
            role (str): An AWS IAM role (either name or full ARN). The Amazon SageMaker training
                jobs and APIs that create Amazon SageMaker endpoints use this role to access
                training data and model artifacts. You must grant sufficient permissions to this
                role.
            container_defs (list[dict[str, str]] or [dict[str, str]]): A single container
                definition or a list of container definitions which will be invoked sequentially
                while performing the prediction. If the list contains only one container, then
                it'll be passed to SageMaker Hosting as the ``PrimaryContainer`` and otherwise,
                it'll be passed as ``Containers``. You can also specify the return value of
                ``sagemaker.get_container_def()`` or ``sagemaker.pipeline_container_def()``,
                which will be used to create more advanced container configurations, including model
                containers which need artifacts from S3.
            vpc_config (dict[str, list[str]]): The VpcConfig set on the model (default: None)
                * 'Subnets' (list[str]): List of subnet ids.
                * 'SecurityGroupIds' (list[str]): List of security group ids.
            enable_network_isolation (bool): Whether the model requires network isolation or not.
            primary_container (str or dict[str, str]): Docker image which defines the inference
                code. You can also specify the return value of ``sagemaker.container_def()``,
                which is used to create more advanced container configurations, including model
                containers which need artifacts from S3. This field is deprecated, please use
                container_defs instead.
            tags(List[dict[str, str]]): Optional. The list of tags to add to the model.
    
        Example:
            >>> tags = [{'Key': 'tagname', 'Value': 'tagvalue'}]
            For more information about tags, see https://boto3.amazonaws.com/v1/documentation\
            /api/latest/reference/services/sagemaker.html#SageMaker.Client.add_tags
    
    
        Returns:
            str: Name of the Amazon SageMaker ``Model`` created.
        """
        if container_defs and primary_container:
            raise ValueError("Both container_defs and primary_container can not be passed as input")
    
        if primary_container:
            msg = (
                "primary_container is going to be deprecated in a future release. Please use "
                "container_defs instead."
            )
            warnings.warn(msg, DeprecationWarning)
            container_defs = primary_container
    
        role = self.expand_role(role)
    
        if isinstance(container_defs, list):
            container_definition = container_defs
        else:
            container_definition = _expand_container_def(container_defs)
    
        create_model_request = _create_model_request(
            name=name, role=role, container_def=container_definition, tags=tags
        )
    
        if vpc_config:
            create_model_request["VpcConfig"] = vpc_config
    
        if enable_network_isolation:
            create_model_request["EnableNetworkIsolation"] = True
    
        LOGGER.info("Creating model with name: %s", name)
        LOGGER.debug("CreateModel request: %s", json.dumps(create_model_request, indent=4))
    
        try:
>           self.sagemaker_client.create_model(**create_model_request)
E           TypeError: create_model() missing 1 required positional argument: 'PrimaryContainer'
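
The TypeError seems to come from the local client's signature: when a PipelineModel is used, the request carries a Containers key instead of PrimaryContainer, which the local create_model requires. Roughly (an assumption based on the traceback, not a verbatim copy of local_session.py):

    # Assumed shape of the local-mode method that raises the TypeError above;
    # it requires PrimaryContainer and has no Containers parameter.
    class LocalSagemakerClient:
        def create_model(self, ModelName, PrimaryContainer, *args, **kwargs):
            ...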


Additional context
This works as expected when not running in local mode (i.e., against the real SageMaker service), but local mode support is necessary for our automated testing in CI.
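
Until this is supported, a monkey-patch along these lines might serve as a stopgap in CI (untested sketch; it assumes the pipeline holds a single model, so the Containers list can be collapsed into PrimaryContainer, and that LocalSagemakerClient.create_model takes ModelName and PrimaryContainer as shown in the traceback):

    # Untested stopgap: collapse a one-element Containers list into
    # PrimaryContainer before delegating to the original local create_model.
    from sagemaker.local.local_session import LocalSagemakerClient

    _original_create_model = LocalSagemakerClient.create_model

    def _create_model_with_containers(self, ModelName, PrimaryContainer=None,
                                      Containers=None, *args, **kwargs):
        if PrimaryContainer is None and Containers is not None and len(Containers) == 1:
            PrimaryContainer = Containers[0]
        return _original_create_model(self, ModelName, PrimaryContainer, *args, **kwargs)

    LocalSagemakerClient.create_model = _create_model_with_containers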
