
Framework estimator .transform() fails with 'NoneType' object has no attribute '_create_sagemaker_model' #2726

@Dan-Treacher

Describe the bug

  • I have defined a custom estimator class that subclasses sagemaker.estimator.Framework.
  • I've successfully trained a model in a bring-your-own (BYO) container that uses the sagemaker-training library (so .fit and .transform don't pass train or serve as CLI arguments to docker; they pass user-defined hyperparameters instead).
  • After training, I try to create a transformer for batch inference, but instead I get the error 'NoneType' object has no attribute '_create_sagemaker_model'.

To reproduce
Dockerfile:

FROM ubuntu:16.04
RUN apt-get update && \
    apt-get -y install build-essential libatlas-dev git wget curl nginx jq libatlas3-base
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -bfp /miniconda3 && \
    rm Miniconda3-latest-Linux-x86_64.sh
ENV PATH=/miniconda3/bin:${PATH} 
RUN apt-get update && apt-get install -y python-pip && pip install sagemaker-training catboost
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONIOENCODING=UTF-8

Python script:

%%writefile catboost_training.py

import sys
import argparse
import logging
import os

from catboost import CatBoostRegressor
import numpy as np
import pandas as pd

print(sys.argv)

if __name__ == '__main__':

    print('extracting arguments')
    print('pd version', pd.__version__)
    parser = argparse.ArgumentParser()
    
    parser.add_argument('--mode', type=str)  # This is where you pass 'train' or 'serve'
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='boston_train.csv')
    parser.add_argument('--test-file', type=str, default='boston_test.csv')
    parser.add_argument('--model-name', type=str, default='catboost_model.dump')
    parser.add_argument('--features', type=str, default='CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT')  # in this script we ask user to explicitly name features
    parser.add_argument('--target', type=str, default='target') # in this script we ask user to explicitly name the target
    parser.add_argument('--depth', type=int, default=2)
    parser.add_argument('--learning_rate', type=float)
    
    args, _ = parser.parse_known_args()
    print(args.mode)
    print(sys.argv)
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def training(args):

        logging.info('reading data')
        train_df = pd.read_csv(os.path.join(args.train, args.train_file))
        test_df = pd.read_csv(os.path.join(args.test, args.test_file))

        logging.info('building training and testing datasets')
        X_train = train_df[args.features.split()]
        X_test = test_df[args.features.split()]
        y_train = train_df[args.target]
        y_test = test_df[args.target]

        # define and train model, passing through the parsed hyperparameters
        model = CatBoostRegressor(depth=args.depth, learning_rate=args.learning_rate)

        model.fit(X_train, y_train, eval_set=(X_test, y_test), logging_level='Silent') 
        print('depth', model.get_all_params()['depth'])

        # print abs error
        logging.info('validating model')
        abs_err = np.abs(model.predict(X_test) - y_test)

        # print couple perf metrics
        for q in [10, 50, 90]:
            logging.info('AE-at-{}th-percentile: {}'.format(
                q, np.percentile(a=abs_err, q=q)))

        # persist model
        path = os.path.join(args.model_dir, args.model_name)
        logging.info('saving to {}'.format(path))
        model.save_model(path)
    
    def serve(args):
        print(sys.argv)
        print("I'm doing inference and stuff")
        
    
    if args.mode == 'train':
        print('Training started')
        training(args)
        print('Training ended')
    elif args.mode == 'serve':
        print('Inference started')
        serve(args)
        print('Inference ended')
    else:
        print('Invalid entry for CLI parameter: mode', args.mode)
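For context on how `--mode` reaches this script: as I understand the sagemaker-training toolkit, each estimator hyperparameter (here `mode`, `features` and `target`, set in the notebook code) is handed to the entry point as a `--name value` pair on its command line, and `parse_known_args` quietly collects anything the parser doesn't declare. The helper `hyperparameters_to_argv` is a hypothetical stand-in for that conversion, not the toolkit's API, and the parser is a cut-down copy of the one in `catboost_training.py`.

```python
import argparse


def hyperparameters_to_argv(hyperparameters):
    """Hypothetical stand-in for the toolkit: flatten a dict into CLI args."""
    argv = []
    for name, value in hyperparameters.items():
        # Each value stays a single argv element, even if it contains spaces.
        argv.extend(["--" + name, str(value)])
    return argv


def build_parser():
    # Cut-down copy of the parser in catboost_training.py.
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str)
    parser.add_argument("--features", type=str, default="CRIM ZN")
    parser.add_argument("--target", type=str, default="target")
    parser.add_argument("--depth", type=int, default=2)
    return parser


hyperparameters = {
    "mode": "train",
    "features": "CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT",
    "target": "target",
}
argv = hyperparameters_to_argv(hyperparameters)

# parse_known_args tolerates extra flags instead of exiting with an error.
args, unknown = build_parser().parse_known_args(argv + ["--unexpected", "1"])
print(args.mode)                   # train
print(len(args.features.split()))  # 13
print(unknown)                     # ['--unexpected', '1']
```

Under this assumption, `args.mode` is only ever `'serve'` if someone passes that value explicitly; nothing in the snippet itself switches modes.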

SageMaker notebook commands:

from sagemaker.estimator import Framework

class CatBoostEstimator(Framework):
    def __init__(
        self,
        entry_point,
        source_dir=None,
        hyperparameters=None,
        py_version="py3",
        framework_version=None,
        image_name=None,
        distributions=None,
        **kwargs):
        
        super(CatBoostEstimator, self).__init__(
            entry_point, source_dir, hyperparameters, image_name=image_name, **kwargs)
    
    def _configure_distribution(self, distributions):
        return
    
    def create_model(
        self,
        model_server_workers=None,
        role=None,
        vpc_config_override=None,
        entry_point=None,
        source_dir=None,
        dependencies=None,
        image_name=None,
        **kwargs):
        
        return None

I then instantiate the custom estimator class:

output_path = 's3://' + bucket + '/catboost/training_jobs'

catboost = CatBoostEstimator(
    image_name=container_image_uri,
    role=role,
    entry_point='catboost_training.py',
    output_path=output_path,
    train_instance_count=1,
    train_instance_type='local',  # or 'ml.m5.xlarge'
    # dependencies=['inference_test.py'],  # paths to other scripts the entry point needs (e.g. module_tools)
    hyperparameters={'mode': 'train',
                     'features': 'CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT',
                     'target': 'target'})

and call fit, which runs the Python script successfully:

catboost.fit({'train': train_location, 'test': test_location}, logs=True)

Immediately afterwards, I try to create a transformer to use for batch inference:

transformer = catboost.transformer(instance_count=1, instance_type="ml.m5.large")

but this fails with this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-42-ab38193d5da6> in <module>
      1 #catboost.model_uri = 's3://' + bucket + '/catboost/training_jobs/' + catboost.latest_training_job.name + '/model.tar.gz'
----> 2 transformer = catboost.transformer(instance_count=1, instance_type="ml.m5.large")

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in transformer(self, instance_count, instance_type, strategy, assemble_with, output_path, output_kms_key, accept, env, max_concurrent_transforms, max_payload, tags, role, model_server_workers, volume_kms_key, entry_point, vpc_config_override, enable_network_isolation, model_name)
   2637                 name=model_name,
   2638             )
-> 2639             model._create_sagemaker_model(instance_type, tags=tags)
   2640 
   2641             transform_env = model.env.copy()

AttributeError: 'NoneType' object has no attribute '_create_sagemaker_model'
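The traceback points at the likely cause: `transformer()` takes whatever `create_model()` returns and immediately calls `_create_sagemaker_model` on it, while `CatBoostEstimator.create_model` in the code above returns `None`. This is a simplified mimic of that call order using hypothetical stand-in classes (`FakeFramework`, `FixedFramework`, `FakeModel`, and a placeholder S3 URI), not the SDK's actual code. In the real SDK, the override would presumably need to return a model object (for example a `sagemaker.model.FrameworkModel`) instead of `None`.

```python
class FakeModel:
    """Hypothetical stand-in for an SDK model object."""

    def __init__(self, model_data):
        self.model_data = model_data
        self.env = {}
        self.created = False

    def _create_sagemaker_model(self, instance_type, tags=None):
        # The real method registers the model with SageMaker; here we only flag it.
        self.created = True


class FakeFramework:
    """Mimics only the call order visible in the traceback."""

    def create_model(self, **kwargs):
        return None  # same as CatBoostEstimator.create_model in the issue

    def transformer(self, instance_count, instance_type, tags=None):
        model = self.create_model()
        model._create_sagemaker_model(instance_type, tags=tags)
        return model


class FixedFramework(FakeFramework):
    def create_model(self, **kwargs):
        # Returning a model object instead of None lets transformer() proceed.
        return FakeModel(model_data="s3://example-bucket/model.tar.gz")


try:
    FakeFramework().transformer(instance_count=1, instance_type="ml.m5.large")
    error_message = None
except AttributeError as err:
    error_message = str(err)
print(error_message)  # 'NoneType' object has no attribute '_create_sagemaker_model'

model = FixedFramework().transformer(instance_count=1, instance_type="ml.m5.large")
print(model.created)  # True
```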

Expected behavior
I expected the last command to return a transformer that can run batch transforms on data in S3.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.59.3
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): Custom
  • Framework version: N/A
  • Python version: 3.6.13
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y

Additional context
I have many questions about how SageMaker's tools work that the documentation doesn't seem to answer. If anyone is feeling charitable, I'd appreciate your thoughts on the following questions, just to check whether I understand things correctly:

  1. Do you need to deploy something to an endpoint in order to use batch transform? (I don't think you do.)
  2. Following on from question 1: if you don't need to deploy an endpoint to use batch transform, why does the .transform() method in all the examples keep throwing errors about being unable to find the serve executable?
  3. When using a Dockerfile that installs the sagemaker-training library together with a custom estimator that uses sagemaker.estimator.Framework as its base class, is it correct that the .fit() method no longer invokes the container with docker run <image_name> train, but instead uses docker run <image_name> <list_of_user_defined_environment_variables>?
  4. Do I need the sagemaker-inference library installed in the Dockerfile to do batch inference, or is it only needed for hosting endpoints?
  5. Once a Framework estimator has been trained and has produced a model.tar.gz file in S3, should you then create a transformer from a sagemaker.model.FrameworkModel instance, which lets you pass an entry point and model data, rather than from the 'regular' sagemaker.model.Model class?
  6. Broadly speaking, you might want to decouple training and inference (for example, train quarterly but run batch transform every week). Is the best way to do that to save a model in SageMaker (so it appears in the SageMaker console under Inference > Models), then load it in a notebook instance and define a transformer from it?
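On questions 1, 2 and 4: as I understand SageMaker's custom-container contract, batch transform does not need a deployed endpoint, but it does start the image with `docker run <image> serve` and then talks to the container over HTTP on port 8080 (`GET /ping` for health checks, `POST /invocations` for the data). That would explain the "can't find serve" errors when the image only knows how to train. Below is a minimal standard-library sketch of such a server; `predict` is a hypothetical placeholder for loading and calling the CatBoost model, the JSON payload format is an assumption, and this is not the sagemaker-inference toolkit (which is one way, not the only way, to provide such a server).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Hypothetical placeholder: a real container would load the CatBoost
    model saved under /opt/ml/model and call model.predict(rows) here."""
    return [sum(row) for row in rows]


class InvocationHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check endpoint.
        if self.path == "/ping":
            self._reply(200, b"")
        else:
            self._reply(404, b"")

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, b"")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            # Assumption: the payload is a JSON list of numeric rows
            # (batch transform jobs often send CSV instead).
            rows = json.loads(self.rfile.read(length))
        except ValueError:
            self._reply(400, b"expected a JSON list of rows")
            return
        body = json.dumps(predict(rows)).encode("utf-8")
        self._reply(200, body, "application/json")

    def _reply(self, status, body, content_type="text/plain"):
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def make_server(host="0.0.0.0", port=8080):
    """Port 8080 is what SageMaker connects to; port=0 picks a free port."""
    return HTTPServer((host, port), InvocationHandler)


def serve_in_background(server):
    """Run the server on a daemon thread, e.g. for a local smoke test."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Inside the image, an executable named `serve` on the `PATH` could simply call `make_server().serve_forever()`; the training side would keep using sagemaker-training as before.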
