Framework estimator .transformer() fails with 'NoneType' object has no attribute '_create_sagemaker_model' #2726

**Describe the bug**

- I have defined a custom estimator class that subclasses `sagemaker.estimator.Framework`.
- I've successfully trained a model in a BYO container using `sagemaker-training`. With this setup, `.fit` and `.transform` don't pass `train` or `serve` as CLI arguments to Docker; they pass user-defined hyperparameters instead.
- After training, I try to create a `transformer` for batch inference, but instead I get the error **'NoneType' object has no attribute '_create_sagemaker_model'**.

**To reproduce**
Dockerfile:
```
FROM ubuntu:16.04
RUN apt-get update && \
    apt-get -y install build-essential libatlas-dev git wget curl nginx jq libatlas3-base
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -bfp /miniconda3 && \
    rm Miniconda3-latest-Linux-x86_64.sh
ENV PATH=/miniconda3/bin:${PATH} 
RUN apt-get update && apt-get install -y python-pip && pip install sagemaker-training catboost
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONIOENCODING=UTF-8
```
Python script:
```
%%writefile catboost_training.py

import sys
import argparse
import logging
import os

from catboost import CatBoostRegressor
import numpy as np
import pandas as pd

print(sys.argv)

if __name__ == '__main__':

    print('extracting arguments')
    print('pd version', pd.__version__)
    parser = argparse.ArgumentParser()
    
    parser.add_argument('--mode', type=str)  # This is where you pass 'train' or 'serve'
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='boston_train.csv')
    parser.add_argument('--test-file', type=str, default='boston_test.csv')
    parser.add_argument('--model-name', type=str, default='catboost_model.dump')
    parser.add_argument('--features', type=str, default='CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT')  # in this script we ask user to explicitly name features
    parser.add_argument('--target', type=str, default='target') # in this script we ask user to explicitly name the target
    parser.add_argument('--depth', type=int, default=2)
    parser.add_argument('--learning_rate', type=float)
    
    args, _ = parser.parse_known_args()
    print(args.mode)
    print(sys.argv)
    
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    
    def training(args):

        logging.info('reading data')
        train_df = pd.read_csv(os.path.join(args.train, args.train_file))
        test_df = pd.read_csv(os.path.join(args.test, args.test_file))

        logging.info('building training and testing datasets')
        X_train = train_df[args.features.split()]
        X_test = test_df[args.features.split()]
        y_train = train_df[args.target]
        y_test = test_df[args.target]

        # define and train model, passing the parsed hyperparameters through
        model = CatBoostRegressor(depth=args.depth, learning_rate=args.learning_rate)

        model.fit(X_train, y_train, eval_set=(X_test, y_test), logging_level='Silent') 
        print('depth', model.get_all_params()['depth'])

        # print abs error
        logging.info('validating model')
        abs_err = np.abs(model.predict(X_test) - y_test)

        # print couple perf metrics
        for q in [10, 50, 90]:
            logging.info('AE-at-' + str(q) + 'th-percentile: '
                  + str(np.percentile(a=abs_err, q=q)))

        # persist model
        path = os.path.join(args.model_dir, args.model_name)
        logging.info('saving to {}'.format(path))
        model.save_model(path)
    
    def serve(args):
        print(sys.argv)
        print("I'm doing inference and stuff")
        
    
    if args.mode == 'train':
        print('Training started')
        training(args)
        print('Training ended')
    elif args.mode == 'serve':
        print('Inference started')
        serve(args)
        print('Inference ended')
    else:
        print('Invalid entry for CLI parameter: mode', args.mode)
```
SageMaker notebook commands:
```
from sagemaker.estimator import Framework

class CatBoostEstimator(Framework):
    def __init__(
        self,
        entry_point,
        source_dir=None,
        hyperparameters=None,
        py_version="py3",
        framework_version=None,
        image_name=None,
        distributions=None,
        **kwargs):
        
        super(CatBoostEstimator, self).__init__(
            entry_point, source_dir, hyperparameters, image_name=image_name, **kwargs)
    
    def _configure_distribution(self, distributions):
        return
    
    def create_model(
        self,
        model_server_workers=None,
        role=None,
        vpc_config_override=None,
        entry_point=None,
        source_dir=None,
        dependencies=None,
        image_name=None,
        **kwargs):
        
        return None
```
I then instantiate the extended framework class:
```
output_path = 's3://' + bucket + '/catboost/training_jobs'

catboost = CatBoostEstimator(
    image_name=container_image_uri,
    role=role,
    entry_point='catboost_training.py',
    output_path=output_path,
    train_instance_count=1, 
    train_instance_type='local',#'ml.m5.xlarge',
    #dependencies=['inference_test.py'],  # List of strings that are the paths to other scripts required by the main entrypoint (e.g. module_tools)
    hyperparameters={'mode': 'train',
                     'features': 'CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT',
                     'target': 'target'})
```
and call `fit`, which successfully executes the Python script:
```
catboost.fit({'train': train_location, 'test': test_location}, logs=True)
```
Immediately afterwards, I try to create a transformer to use for batch inference:
```
transformer = catboost.transformer(instance_count=1, instance_type="ml.m5.large")
```
but this fails with this error:
```
AttributeError                            Traceback (most recent call last)
<ipython-input-42-ab38193d5da6> in <module>
      1 #catboost.model_uri = 's3://' + bucket + '/catboost/training_jobs/' + catboost.latest_training_job.name + '/model.tar.gz'
----> 2 transformer = catboost.transformer(instance_count=1, instance_type="ml.m5.large")

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in transformer(self, instance_count, instance_type, strategy, assemble_with, output_path, output_kms_key, accept, env, max_concurrent_transforms, max_payload, tags, role, model_server_workers, volume_kms_key, entry_point, vpc_config_override, enable_network_isolation, model_name)
   2637                 name=model_name,
   2638             )
-> 2639             model._create_sagemaker_model(instance_type, tags=tags)
   2640 
   2641             transform_env = model.env.copy()

AttributeError: 'NoneType' object has no attribute '_create_sagemaker_model'
```
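Reading the traceback, my guess is that `transformer()` calls my `create_model()` override and then uses whatever it returns, so the `return None` in my class would be the direct cause. Here is a minimal sketch of that mechanism using hypothetical stand-in classes (`StubFramework` and `StubModel` are mine; nothing here is the real SDK):

```python
class StubModel:
    """Hypothetical stand-in for a SageMaker model object."""

    def __init__(self):
        self.env = {}
        self.created = False

    def _create_sagemaker_model(self, instance_type, tags=None):
        # The real SDK would register the model with SageMaker here.
        self.created = True


class StubFramework:
    """Hypothetical stand-in that mimics the calls visible in the traceback."""

    def create_model(self, **kwargs):
        raise NotImplementedError

    def transformer(self, instance_count, instance_type, tags=None):
        model = self.create_model()
        # Fails if create_model() returned None.
        model._create_sagemaker_model(instance_type, tags=tags)
        return model.env.copy()


class ReturnsNone(StubFramework):
    def create_model(self, **kwargs):
        return None  # same as my CatBoostEstimator.create_model


class ReturnsModel(StubFramework):
    def create_model(self, **kwargs):
        return StubModel()


def try_transformer(estimator):
    """Return 'ok' on success, or the AttributeError message on failure."""
    try:
        estimator.transformer(instance_count=1, instance_type="ml.m5.large")
        return "ok"
    except AttributeError as err:
        return str(err)


broken_result = try_transformer(ReturnsNone())
working_result = try_transformer(ReturnsModel())
print(broken_result)
print(working_result)
```

If this is right, `create_model` would presumably need to return a real model object (for example, something built from the trained `model.tar.gz`) rather than `None`; I haven't confirmed which class the SDK expects here.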

**Expected behavior**
The last command should return a transformer that can run batch transforms on data in S3.

**System information**
A description of your system. Please provide:
- **SageMaker Python SDK version**: 2.59.3
- **Framework name (eg. PyTorch) or algorithm (eg. KMeans)**: Custom
- **Framework version**: N/A
- **Python version**: 3.6.13
- **CPU or GPU**: CPU
- **Custom Docker image (Y/N)**: Y

**Additional context**
I have many questions about how SageMaker's tools work that the documentation doesn't seem to answer. If anyone is feeling charitable, would you mind sharing your thoughts on the following questions, so I can check whether I understand things correctly?

1. Do you need to deploy something to an endpoint in order to use batch transform? (I don't think you do.)
2. Following on from question 1: if you don't need to deploy an endpoint to use batch transform, why does the `.transform()` method in all the examples keep throwing errors about being unable to find the `serve` executable?
3. When using a `Dockerfile` that installs the `sagemaker-training` library together with a custom estimator based on `sagemaker.estimator.Framework`, is it correct that `.fit()` no longer invokes the container with `docker run <image_name> train`, and instead uses `docker run <image_name> <list_of_user_defined_environment_variables>`?
4. Do I need the `sagemaker-inference` library installed in the Dockerfile to do batch inference? Or is it only related to hosting endpoints?
5. Once a Framework estimator has been trained and a `model.tar.gz` file produced in S3, should you then create the transformer from a `sagemaker.model.FrameworkModel` instance, which lets you pass entry points and model data, rather than from the 'regular' `model.Model` class?
6. Broadly speaking, you might want to decouple training and inference (for example, train quarterly but run batch transform every week). Is the best way to do that to save a model in SageMaker (so it appears in the SageMaker console under Inference > Models), then load it in a notebook instance and create a transformer from it?
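
To make question 3 concrete, this is my mental model of how the `hyperparameters` dict reaches `catboost_training.py`: each entry becomes a `--key value` pair on the command line, which `argparse` then picks up. This is a sketch under that assumption; `to_cli_args` is a hypothetical helper of mine, not a function from `sagemaker-training`, and `extra_flag` is a made-up hyperparameter:

```python
import argparse


def to_cli_args(hyperparameters):
    """Hypothetical helper: flatten a dict into ['--key', 'value', ...]."""
    cli_args = []
    for key, value in hyperparameters.items():
        cli_args.extend(["--" + key, str(value)])
    return cli_args


def parse(argv):
    """Parse a subset of the flags declared in catboost_training.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str)
    parser.add_argument("--target", type=str, default="target")
    parser.add_argument("--depth", type=int, default=2)
    # parse_known_args returns undeclared flags separately instead of failing.
    return parser.parse_known_args(argv)


hyperparameters = {"mode": "train", "target": "target", "extra_flag": "1"}
argv = to_cli_args(hyperparameters)
args, unknown = parse(argv)
print(argv)
print(args.mode, args.depth, unknown)
```

If that model is correct, `mode` is just another hyperparameter from the container's point of view, which would explain why my script only sees `train` when I pass it explicitly.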
