Description
Describe the bug
- I have defined a custom class from the `sagemaker.estimator.Framework` base class.
- I've successfully trained a model in a BYO container using `sagemaker-training` (so `.fit` and `.transform` don't pass `train` or `serve` as CLI arguments to docker; they pass user-defined hyperparameters instead).
- After training, I try to define a `transformer` in order to do batch inference, but I instead get the error `'NoneType' object has no attribute '_create_sagemaker_model'`.
To reproduce
Dockerfile:
FROM ubuntu:16.04
RUN apt-get update && \
apt-get -y install build-essential libatlas-dev git wget curl nginx jq libatlas3-base
RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3-latest-Linux-x86_64.sh -bfp /miniconda3 && \
rm Miniconda3-latest-Linux-x86_64.sh
ENV PATH=/miniconda3/bin:${PATH}
RUN apt-get update && apt-get install -y python-pip && pip install sagemaker-training catboost
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONIOENCODING=UTF-8
Python script:
%%writefile catboost_training.py
import sys
import argparse
import logging
import os
from catboost import CatBoostRegressor
import numpy as np
import pandas as pd

print(sys.argv)

if __name__ == '__main__':
    print('extracting arguments')
    print('pd version', pd.__version__)
    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', type=str)  # This is where you pass 'train' or 'serve'
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='boston_train.csv')
    parser.add_argument('--test-file', type=str, default='boston_test.csv')
    parser.add_argument('--model-name', type=str, default='catboost_model.dump')
    parser.add_argument('--features', type=str, default='CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT')  # in this script we ask user to explicitly name features
    parser.add_argument('--target', type=str, default='target')  # in this script we ask user to explicitly name the target
    parser.add_argument('--depth', type=int, default=2)
    parser.add_argument('--learning_rate', type=float)
    args, _ = parser.parse_known_args()
    print(args.mode)
    print(sys.argv)
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def training(args):
        logging.info('reading data')
        train_df = pd.read_csv(os.path.join(args.train, args.train_file))
        test_df = pd.read_csv(os.path.join(args.test, args.test_file))
        logging.info('building training and testing datasets')
        X_train = train_df[args.features.split()]
        X_test = test_df[args.features.split()]
        y_train = train_df[args.target]
        y_test = test_df[args.target]
        # define and train model
        model = CatBoostRegressor()
        model.fit(X_train, y_train, eval_set=(X_test, y_test), logging_level='Silent')
        print('depth', model.get_all_params()['depth'])
        # compute abs error
        logging.info('validating model')
        abs_err = np.abs(model.predict(X_test) - y_test)
        # print couple perf metrics
        for q in [10, 50, 90]:
            logging.info('AE-at-' + str(q) + 'th-percentile: '
                         + str(np.percentile(a=abs_err, q=q)))
        # persist model
        path = os.path.join(args.model_dir, args.model_name)
        logging.info('saving to {}'.format(path))
        model.save_model(path)

    def serve(args):
        print(sys.argv)
        print("I'm doing inference and stuff")

    if args.mode == 'train':
        print('Training started')
        training(args)
        print('Training ended')
    elif args.mode == 'serve':
        print('Inference started')
        serve(args)
        print('Inference ended')
    else:
        print('Invalid entry for CLI parameter: mode', args.mode)
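The script above depends on hyperparameters reaching it as command-line flags. Here is a minimal sketch of that hand-off, assuming (as described above for `sagemaker-training`) that each hyperparameter arrives as a `--key value` pair. The helper `hyperparameters_to_argv` is hypothetical and only mimics the idea; it is not the toolkit's own code.

```python
import argparse


def hyperparameters_to_argv(hyperparameters):
    """Hypothetical helper: flatten a dict into ['--key', 'value', ...]."""
    argv = []
    for key, value in hyperparameters.items():
        argv.extend(['--' + key, str(value)])
    return argv


def parse_mode(argv):
    """Parse the flags the script knows about and collect the rest."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--mode', type=str)
    parser.add_argument('--target', type=str, default='target')
    # parse_known_args tolerates flags the parser was never told about
    return parser.parse_known_args(argv)


argv = hyperparameters_to_argv({'mode': 'train', 'target': 'target', 'extra': 1})
args, unknown = parse_mode(argv)
print(args.mode)
print(unknown)
```

Because the script uses `parse_known_args`, any additional flags passed by the toolkit end up in the second return value instead of raising an error.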
Sagemaker notebook commands:
from sagemaker.estimator import Framework

class CatBoostEstimator(Framework):
    def __init__(
            self,
            entry_point,
            source_dir=None,
            hyperparameters=None,
            py_version="py3",
            framework_version=None,
            image_name=None,
            distributions=None,
            **kwargs):
        super(CatBoostEstimator, self).__init__(
            entry_point, source_dir, hyperparameters, image_name=image_name, **kwargs)

    def _configure_distribution(self, distributions):
        return

    def create_model(
            self,
            model_server_workers=None,
            role=None,
            vpc_config_override=None,
            entry_point=None,
            source_dir=None,
            dependencies=None,
            image_name=None,
            **kwargs):
        return None
I then instantiate the extended framework class:
output_path = 's3://' + bucket + '/catboost/training_jobs'
catboost = CatBoostEstimator(
    image_name=container_image_uri,
    role=role,
    entry_point='catboost_training.py',
    output_path=output_path,
    train_instance_count=1,
    train_instance_type='local',  # 'ml.m5.xlarge',
    # dependencies=['inference_test.py'],  # List of strings that are the paths to other scripts required by the main entrypoint (e.g. module_tools)
    hyperparameters={'mode': 'train',
                     'features': 'CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT',
                     'target': 'target'})
and call the fit function to successfully execute the python script:
catboost.fit({'train': train_location, 'test': test_location}, logs=True)
Immediately after, I try to define a transformer that I could then use for batch inference:
transformer = catboost.transformer(instance_count=1, instance_type="ml.m5.large")
but this fails with this error:
AttributeError Traceback (most recent call last)
<ipython-input-42-ab38193d5da6> in <module>
1 #catboost.model_uri = 's3://' + bucket + '/catboost/training_jobs/' + catboost.latest_training_job.name + '/model.tar.gz'
----> 2 transformer = catboost.transformer(instance_count=1, instance_type="ml.m5.large")
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in transformer(self, instance_count, instance_type, strategy, assemble_with, output_path, output_kms_key, accept, env, max_concurrent_transforms, max_payload, tags, role, model_server_workers, volume_kms_key, entry_point, vpc_config_override, enable_network_isolation, model_name)
2637 name=model_name,
2638 )
-> 2639 model._create_sagemaker_model(instance_type, tags=tags)
2640
2641 transform_env = model.env.copy()
AttributeError: 'NoneType' object has no attribute '_create_sagemaker_model'
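The traceback shows `transformer()` calling `_create_sagemaker_model` on the value it gets back from model creation, and the `create_model` override above returns `None`. The sketch below reproduces only that call order using stand-in classes. `BaseEstimator`, `FakeModel`, `ReturnsNone` and `ReturnsModel` are hypothetical names, not SDK classes, and the real `transformer()` does considerably more.

```python
class BaseEstimator:
    """Stand-in for the SDK estimator; mimics only the call order in the traceback."""

    def create_model(self, **kwargs):
        raise NotImplementedError

    def transformer(self, instance_count, instance_type):
        model = self.create_model()
        # Corresponds to the failing line in the traceback above
        model._create_sagemaker_model(instance_type)
        return 'transformer for ' + instance_type


class ReturnsNone(BaseEstimator):
    def create_model(self, **kwargs):
        return None  # same behaviour as the CatBoostEstimator override


class FakeModel:
    """Stand-in for a model object that exposes the expected method."""

    def _create_sagemaker_model(self, instance_type, tags=None):
        pass


class ReturnsModel(BaseEstimator):
    def create_model(self, **kwargs):
        return FakeModel()


try:
    ReturnsNone().transformer(1, 'ml.m5.large')
except AttributeError as err:
    error_message = str(err)
    print(error_message)

result = ReturnsModel().transformer(1, 'ml.m5.large')
print(result)
```

Under these assumptions, the error goes away once `create_model` returns an object that exposes the method the estimator expects.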
Expected behavior
The last command should give me a transformer that can run batch transforms on data in S3.
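One possible route to such a transformer is to build a `sagemaker.model.Model` from the training artifact and ask it for a transformer. This sketch uses SDK v2 parameter names (`image_uri`, `model_data`) and is not a confirmed fix. The function name `build_transformer` and all of its arguments are placeholders, and the container would still need to handle the `serve` command for the transform job itself to succeed.

```python
def build_transformer(image_uri, model_data, role, output_path,
                      instance_type='ml.m5.large'):
    """Sketch: create a batch transformer from an existing model artifact."""
    # Imported lazily so the function can be defined without the SDK installed
    from sagemaker.model import Model

    # model_data is the S3 URI of the model.tar.gz produced by training
    model = Model(image_uri=image_uri, model_data=model_data, role=role)
    return model.transformer(
        instance_count=1,
        instance_type=instance_type,
        output_path=output_path,
    )
```

The returned object could then be used with something like `transformer.transform(data=<s3_input_prefix>, content_type='text/csv')`, where the input prefix and content type depend on the data being scored.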
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.59.3
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): Custom
- Framework version: N/A
- Python version: 3.6.13
- CPU or GPU: CPU
- Custom Docker image (Y/N): Y
Additional context
I have so many questions about how any of sagemaker's tools work, but the documentation fails to answer any of them. If anyone is feeling charitable, would they mind giving their thoughts on the following questions just to see if I understand things correctly:
- Do you need to deploy something to an endpoint in order to use batch transformation? (I don't think you do.)
- Following question 1, if you don't need to deploy an endpoint to use batch transform, then why does the `.transform()` method in all the examples keep throwing errors about being unable to find the `serve` executable?
- When using a `Dockerfile` that's installed the `sagemaker-training` library, in conjunction with a custom estimator using `sagemaker.estimator.Framework` as a base class, is it correct that the `.fit()` method no longer invokes the container with `docker run <image_name> train`, and instead uses `docker run <image_name> <list_of_user_defined_environment_variables>`?
- Do I need the `sagemaker-inference` library installed in the Dockerfile to do batch inference? Or is it only related to hosting endpoints?
- Once a Framework estimator has been trained, and a `model.tar.gz` file produced in S3, should you then define and declare a transformer from a `sagemaker.model.FrameworkModel` instance that allows you to pass entrypoints and model data rather than the 'regular' `model.Model` class?
- Broadly speaking, you might want to decouple training and inference (train quarterly, but perform batch transform every week for example). Is the best way of doing that saving a model in SageMaker (so it appears in the SageMaker console inference>models tab), then loading that in a notebook instance and defining a transformer from it?
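On the first two questions: batch transform does not require a deployed endpoint, but SageMaker still starts the container with the `serve` argument and talks to it over HTTP. That is why a missing `serve` executable surfaces as an error. Below is a minimal standard-library sketch of that contract (`GET /ping` and `POST /invocations` on port 8080). `predict_lines` is a hypothetical placeholder for real CatBoost scoring, and `make_server` is an illustrative name.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict_lines(body):
    """Hypothetical placeholder: return one '0.0' per non-empty CSV row."""
    rows = [line for line in body.decode('utf-8').splitlines() if line.strip()]
    return '\n'.join('0.0' for _ in rows).encode('utf-8')


class InferenceHandler(BaseHTTPRequestHandler):
    def _reply(self, status, payload=b''):
        self.send_response(status)
        self.send_header('Content-Length', str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        # Health check: answer 200 on /ping, 404 elsewhere
        self._reply(200 if self.path == '/ping' else 404)

    def do_POST(self):
        if self.path != '/invocations':
            self._reply(404)
            return
        length = int(self.headers.get('Content-Length', 0))
        self._reply(200, predict_lines(self.rfile.read(length)))

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=8080):
    """Build the HTTP server; the caller decides when to start serving."""
    return HTTPServer(('0.0.0.0', port), InferenceHandler)


# In the container, the 'serve' branch of the script could end with:
# make_server().serve_forever()
```

A production container would normally load the saved model once at startup and use a sturdier web server; this sketch only illustrates the shape of the contract.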