AttributeError: 'Model' object has no attribute '__framework_name__' #1201

Closed
cfregly opened this issue Dec 27, 2019 · 15 comments

@cfregly

cfregly commented Dec 27, 2019


System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Framework Version: 2.0.0
  • Python Version: Python3
  • CPU or GPU: CPU
  • Python SDK Version: 1.49.0
  • StepFunctions SDK: 1.0.0.3
  • Are you using a custom image: No

Describe the problem


Here is the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-23cc1e7f4991> in <module>()
      3     role=workflow_execution_role,
      4     inputs=training_data_uri,
----> 5     s3_bucket=model_output_path
      6 )

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/template/pipeline/train.py in __init__(self, estimator, role, inputs, s3_bucket, client, **kwargs)
     64             self.pipeline_name = 'training-pipeline-{date}'.format(date=self._generate_timestamp())
     65 
---> 66         self.definition = self.build_workflow_definition()
     67         self.input_template = self._extract_input_template(self.definition)
     68 

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/template/pipeline/train.py in build_workflow_definition(self)
     95             instance_type=train_instance_type,
     96             model=model,
---> 97             model_name=default_name
     98         )
     99 

~/anaconda3/envs/python3/lib/python3.6/site-packages/stepfunctions/steps/sagemaker.py in __init__(self, state_id, model, model_name, instance_type, **kwargs)
    171         """
    172         if isinstance(model, FrameworkModel):
--> 173             parameters = model_config(model=model, instance_type=instance_type, role=model.role, image=model.image)
    174             if model_name:
    175                 parameters['ModelName'] = model_name

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/workflow/airflow.py in model_config(instance_type, model, role, image)
    577 
    578     if isinstance(model, sagemaker.model.FrameworkModel):
--> 579         container_def = prepare_framework_container_def(model, instance_type, s3_operations)
    580     else:
    581         container_def = model.prepare_container_def(instance_type)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/workflow/airflow.py in prepare_framework_container_def(model, instance_type, s3_operations)
    519         deploy_image = fw_utils.create_image_uri(
    520             region_name,
--> 521             model.__framework_name__,
    522             instance_type,
    523             model.framework_version,

AttributeError: 'Model' object has no attribute '__framework_name__'

We are doing this through the stepfunctions SDK with TrainingPipeline

PyTorch is OK.

pytorch/model.py

tensorflow/model.py

Minimal repro / logs

Please provide any logs and a bare minimum reproducible test case, as this will be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

  • Exact command to reproduce:
...
pipeline = TrainingPipeline(
    estimator=tensorflow_mnist_estimator,
    role=workflow_execution_role,
    inputs=training_data_uri,
    s3_bucket=model_output_path
)
...

Here is the link to the successful PyTorch version: https://github.com/awslabs/amazon-sagemaker-examples/blob/346811d27ba9e1cd6a3db94a90f584b951f7aa33/step-functions-data-science-sdk/training_pipeline_pytorch_mnist/training_pipeline_pytorch_mnist.ipynb

There is no equivalent example using TensorFlow.

@cfregly
Author

cfregly commented Dec 27, 2019

The error is here:

model.__framework_name__,

@cfregly
Author

cfregly commented Dec 27, 2019

Further debugging indicates that the model instance is not being created properly. Many attributes are None, including entry_point and framework_version.
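For anyone who wants to repeat this kind of check, here is a minimal sketch of an attribute report. It does not use the SageMaker SDK: `StandInModel` and `report_missing` are hypothetical names, and the stand-in only mimics the symptoms described in this thread (some attributes set to None, `__framework_name__` absent).

```python
def report_missing(obj, names):
    """Classify each attribute name as 'missing', 'None', or 'set'."""
    report = {}
    for name in names:
        if not hasattr(obj, name):
            report[name] = "missing"
        elif getattr(obj, name) is None:
            report[name] = "None"
        else:
            report[name] = "set"
    return report


class StandInModel:
    """Hypothetical stand-in for the object returned by create_model()."""

    def __init__(self):
        self.entry_point = None
        self.framework_version = None
        self.role = "SM_EXECUTION_ROLE"


status = report_missing(
    StandInModel(),
    ["entry_point", "framework_version", "role", "__framework_name__"],
)
print(status)
```

With a real model object, passing it in place of `StandInModel()` should show which of the attributes read by the airflow helper are absent and which are merely None.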

@nadiaya
Contributor

nadiaya commented Jan 3, 2020

Could you provide a more detailed code sample you have been running?

The PyTorch and TensorFlow support implementations differ (TF has script mode and framework mode for older versions, as well as TFS for inference). For TF 2.0 serving, you need to use a different Model:

class Model(sagemaker.model.FrameworkModel):

@cfregly
Author

cfregly commented Jan 4, 2020

Below is a more-complete code sample. Can you provide more details on how to use the different Model you mention above?

Note: I'm using the TrainingPipeline class from the step-functions SDK. Does the step-functions SDK implementation need to change to use this new model?

I'm following this example...

https://github.com/awslabs/amazon-sagemaker-examples/blob/346811d27ba9e1cd6a3db94a90f584b951f7aa33/step-functions-data-science-sdk/training_pipeline_pytorch_mnist/training_pipeline_pytorch_mnist.ipynb

... and adapting to TensorFlow as seen below.

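from sagemaker.tensorflow import TensorFlow
from stepfunctions.template.pipeline import TrainingPipeline

# sagemaker_execution_role, workflow_execution_role, training_data_uri and
# model_output_path are assumed to be defined earlier in the notebook.
tensorflow_mnist_estimator = TensorFlow(
    entry_point='mnist_keras_tf2.py',
    source_dir='./src',
    role=sagemaker_execution_role,
    framework_version='2.0.0',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    py_version='py3',
    distributions={'parameter_server': {'enabled': True}},
)

# Constructing the pipeline is the call that raises the AttributeError.
pipeline = TrainingPipeline(
    estimator=tensorflow_mnist_estimator,
    role=workflow_execution_role,
    inputs=training_data_uri,
    s3_bucket=model_output_path
)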

@cfregly
Author

cfregly commented Jan 7, 2020

any update?

@laurenyu
Contributor

laurenyu commented Jan 8, 2020

Sorry for the delay in response. I've reached out to the team who originally authored the notebook to see if they have any insight. Thanks for your patience!

@cfregly
Author

cfregly commented Jan 8, 2020

Note: The original notebook is working OK because it’s using PyTorch. I’m trying to use TensorFlow.

@shunjd

shunjd commented Jan 9, 2020

I think the issue here is that TensorFlow returns a different model object.

The TrainingPipeline is expecting to have model.TensorFlowModel, but instead, the actual model is serving.Model.

class TensorFlowModel(FrameworkModel):

class Model(sagemaker.model.FrameworkModel):

After some code analysis, I found that script_mode is always on when the Python version is set to 'py3', which makes the create_model call always return serving.Model.

script_mode (bool): If set to True will the estimator will use the Script Mode
containers (default: False). This will be ignored if py_version is set to 'py3'.

if endpoint_type == "tensorflow-serving" or self._script_mode_enabled():
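To make the dispatch concrete, here is a rough sketch of the behavior described above. It is not the SDK's code: the two classes are stand-ins for model.TensorFlowModel and serving.Model, and `script_mode_enabled` and `create_model` are simplified, hypothetical versions that only follow the docstring and condition quoted in this comment. The "tensorflow-serving" value for FRAMEWORK_NAME is an assumption.

```python
class LegacyTensorFlowModel:
    """Stand-in for model.TensorFlowModel, which defines __framework_name__."""

    __framework_name__ = "tensorflow"


class ServingModel:
    """Stand-in for serving.Model, which defines FRAMEWORK_NAME instead."""

    FRAMEWORK_NAME = "tensorflow-serving"  # assumed value, for illustration


def script_mode_enabled(py_version, script_mode=False):
    # Per the quoted docstring, script_mode is ignored when py_version is 'py3',
    # so 'py3' effectively forces script mode on.
    return py_version == "py3" or script_mode


def create_model(py_version, script_mode=False, endpoint_type=None):
    # Mirrors the quoted condition: either trigger selects the serving model.
    if endpoint_type == "tensorflow-serving" or script_mode_enabled(py_version, script_mode):
        return ServingModel()
    return LegacyTensorFlowModel()


py3_model = create_model("py3")
py2_model = create_model("py2")
print(type(py3_model).__name__, hasattr(py3_model, "__framework_name__"))
print(type(py2_model).__name__, hasattr(py2_model, "__framework_name__"))
```

Under these assumptions, any 'py3' estimator ends up with the serving-style model, which is the one lacking `__framework_name__`.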

@laurenyu Could you provide some insight into why TensorFlow has two different models?

@laurenyu
Contributor

laurenyu commented Jan 9, 2020

legacy reasons - there were two different families of pre-built images for supporting TFS. sagemaker.tensorflow.serving.Model is the one we recommend/actively support at this point.

@shunjd

shunjd commented Jan 14, 2020

I dug into the code a little, and I think we need a fix in the sagemaker.workflow.airflow package.

The following code can reproduce the error easily.

from sagemaker.tensorflow import TensorFlow
from sagemaker.workflow.airflow import model_config, training_config

sagemaker_execution_role = 'SM_EXECUTION_ROLE'

mnist_estimator = TensorFlow(entry_point='mnist_keras_tf2.py',
                             source_dir='./scripts',
                             role=sagemaker_execution_role,
                             framework_version='2.0.0',
                             train_instance_count=1,
                             train_instance_type='ml.p3.2xlarge',
                             py_version='py3',
                             distributions={'parameter_server': {'enabled': True}})

train = training_config(mnist_estimator)
model = mnist_estimator.create_model()

print(model_config('ml.p3.2xlarge', model)) # <-- Will throw the exception

The issue is related to the following code blocks.

In the airflow package, the method requires three mandatory attributes (model.__framework_name__, model.framework_version, model.py_version), which are missing from serving.Model.

if not deploy_image:
    region_name = model.sagemaker_session.boto_session.region_name
    deploy_image = fw_utils.create_image_uri(
        region_name,
        model.__framework_name__,
        instance_type,
        model.framework_version,
        model.py_version,
    )

Instead, serving.Model has its own logic for creating the model image. I believe the behavior should be consistent across all the Models.

def _get_image_uri(self, instance_type, accelerator_type=None):
    """
    Args:
        instance_type:
        accelerator_type:
    """
    if self.image:
        return self.image
    region_name = self.sagemaker_session.boto_region_name
    return create_image_uri(
        region_name,
        Model.FRAMEWORK_NAME,
        instance_type,
        self._framework_version,
        accelerator_type=accelerator_type,
    )
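One possible shape for making the two paths consistent is sketched below. This is only an illustration of the idea, not the fix that was eventually merged and not SDK code: `resolve_deploy_image`, `stand_in_create_image_uri`, both model classes, and the made-up URI format are all hypothetical. The idea is to prefer an explicit image, then the model's own `_get_image_uri`, and only then fall back to the generic attributes.

```python
def stand_in_create_image_uri(region, framework, instance_type, version, py_version=None):
    """Hypothetical replacement for create_image_uri; the output format is made up."""
    parts = [region, framework, version, instance_type]
    if py_version:
        parts.append(py_version)
    return "/".join(parts)


class LegacyStyleModel:
    """Stand-in exposing the three attributes the airflow helper reads."""

    __framework_name__ = "tensorflow"
    framework_version = "1.12"
    py_version = "py2"
    image = None


class ServingStyleModel:
    """Stand-in for a model that resolves its own image, like serving.Model."""

    FRAMEWORK_NAME = "tensorflow-serving"  # assumed value, for illustration
    image = None
    _framework_version = "2.0.0"
    region = "us-west-2"

    def _get_image_uri(self, instance_type, accelerator_type=None):
        if self.image:
            return self.image
        return stand_in_create_image_uri(
            self.region, self.FRAMEWORK_NAME, instance_type, self._framework_version
        )


def resolve_deploy_image(model, instance_type, region):
    """Prefer an explicit image, then the model's own logic, then the generic path."""
    if getattr(model, "image", None):
        return model.image
    if hasattr(model, "_get_image_uri"):
        return model._get_image_uri(instance_type)
    return stand_in_create_image_uri(
        region,
        model.__framework_name__,
        instance_type,
        model.framework_version,
        model.py_version,
    )


serving_image = resolve_deploy_image(ServingStyleModel(), "ml.c5.xlarge", "us-west-2")
legacy_image = resolve_deploy_image(LegacyStyleModel(), "ml.c5.xlarge", "us-west-2")
print(serving_image)
print(legacy_image)
```

With this ordering, a caller never touches `__framework_name__` on a model that knows how to resolve its own image, so the AttributeError from the traceback would not be reached for the serving-style stand-in.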

@cfregly
Author

cfregly commented Jan 14, 2020

Any update on this? @shunjd

And should this issue be labeled "bug" rather than "question"? Because of this issue, TrainingPipeline is not working for TensorFlow. This seems like a bug.

Thanks!

@laurenyu
Contributor

thanks for looking into this, @shunjd!

yes, you're right that it's a bug. we'll work on getting a fix out.

@ajaykarpur
Contributor

@shunjd

shunjd commented Jan 20, 2020

@laurenyu @ajaykarpur Thank you for the fix, but the problem still exists.

The problem is that when TensorFlow.create_model() is called, the entry_point from the estimator is not passed into the model. PyTorch does not have this problem, as you can see by comparing the code below.

https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/pytorch/estimator.py#L164-L169

https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L572-L579
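The difference between the two create_model implementations can be illustrated with stand-ins. Nothing here is SDK code: `StandInModel`, `StandInEstimator`, `DroppingEstimator`, and `ForwardingEstimator` are hypothetical classes that only model whether entry_point is forwarded to the model, as described above.

```python
class StandInModel:
    """Hypothetical model object; entry_point defaults to None if not passed."""

    def __init__(self, model_data, role, entry_point=None):
        self.model_data = model_data
        self.role = role
        self.entry_point = entry_point


class StandInEstimator:
    """Hypothetical estimator holding the values a model would need."""

    def __init__(self, entry_point, role, model_data):
        self.entry_point = entry_point
        self.role = role
        self.model_data = model_data


class DroppingEstimator(StandInEstimator):
    """Models the reported behavior: entry_point is not forwarded."""

    def create_model(self):
        return StandInModel(self.model_data, self.role)


class ForwardingEstimator(StandInEstimator):
    """Models the behavior that works: entry_point is forwarded."""

    def create_model(self):
        return StandInModel(self.model_data, self.role, entry_point=self.entry_point)


args = ("mnist_keras_tf2.py", "SM_EXECUTION_ROLE", "s3://bucket/model.tar.gz")
dropped = DroppingEstimator(*args).create_model()
forwarded = ForwardingEstimator(*args).create_model()
print(dropped.entry_point, forwarded.entry_point)
```

The first model ends up with entry_point set to None, which matches the earlier observation in this thread that the model's attributes were None.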

@ajaykarpur
Contributor

ajaykarpur commented Jan 20, 2020

@shunjd I've just merged another fix for the issue you identified (#1252). It will be released later today.

5 participants