Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON examples for SageMaker / TF serving #18

Closed
nkconnor opened this issue Dec 12, 2017 · 5 comments
Closed

JSON examples for SageMaker / TF serving #18

nkconnor opened this issue Dec 12, 2017 · 5 comments
Labels

Comments

@nkconnor
Copy link
Contributor

cols = [
    tf.feature_column.numeric_column("age"),
    tf.feature_column.categorical_column_with_vocabulary_list("gender", ["m", "f", "other"]),
    tf.feature_column.categorical_column_with_hash_bucket("city", hash_bucket_size=15000)
]

example_spec = tf.feature_column.make_parse_example_spec

print(example_spec)
# {'gender': VarLenFeature(dtype=tf.string), 'age': FixedLenFeature(shape=(1,), dtype=tf.float32, default_value=None), 'city': VarLenFeature(dtype=tf.string)}

srv_fun = tf.estimator.export.build_parsing_serving_input_receiver_fn(example_spec)()

print(str(srv_fun))
#ServingInputReceiver(features={'city': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f83c7509ad0>, 'age': <tf.Tensor 'ParseExample_7/ParseExample:6' shape=(?, 1) dtype=float32>, 'gender': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f83c75093d0>}, receiver_tensors={'examples': <tf.Tensor 'input_example_tensor_8:0' shape=(?,) dtype=string>}, receiver_tensors_alternatives=None)

What format can we use to send predict requests using the Sagemaker SDK for input functions like the above?

The JSON serializer only handles arrays.. so it seems like tf_estimator.predict({"city":"Paris", "gender":"m", "age":22}) is out. I tried variations of Array input and get cryptic errors from the TF serving proxy client (that source code is not available to my knowledge)

Looking at the TF Iris DNN example notebook: it uses a syntax like iris_predictor.predict([6.4, 3.2, 4.5, 1.5]) though the FeatureSpec is like {'input': IrisArrayData}. So perhaps the feature spec needs a top level?

@mvsusp
Copy link
Contributor

mvsusp commented Dec 13, 2017

Hi @nkconnor,

Thanks for your report, I've confirmed that it is a bug. I'm investigating it and will give you a better update tomorrow.

@mvsusp mvsusp added the bug label Dec 13, 2017
@mvsusp
Copy link
Contributor

mvsusp commented Dec 13, 2017

@nkconnor,

predictor.predict should be able to handle a dictionary so tf_estimator.predict({"city":"Paris", "gender":"m", "age":22}) is the correct way to send requests like you mentioned above. The predictor logic defined here needs to handle that case.

The container source code is not open sourced yet, we are planning on releasing it very soon.
I will open a bug in the container side as well, to improve the logging when the prediction fails.

I have a temporary solution to unblock you until this issue is fixed. The solution is to create a instance of a RealTimePredictor that takes a tensorflow_serving.apis request as parameter instead of a dictionary:

from sagemaker.tensorflow.predictor import tf_serializer, tf_deserializer
from tensorflow.python.saved_model.signature_constants import DEFAULT_SERVING_SIGNATURE_DEF_KEY
from tensorflow_serving.apis import classification_pb2

def _create_feature(v):
    if type(v) == int:
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))
    if type(v) == str:
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[v]))
    if type(v) == float:
        return tf.train.Feature(float_list=tf.train.FloatList(value=[v]))
    raise ValueError('invalid type')

endpoint = 'my-endpoint'
predictor = RealTimePredictor(endpoint=endpoint,
                              deserializer=tf_deserializer, 
                              serializer=tf_serializer,
                              content_type='application/octet-stream')

data = {'age': 39., 'workclass': 'Private', 'fnlwgt': 77516, 'education': 'Bachelors', 
        'education_num': 13.,  'marital_status': 'Never-married', 'occupation': 'Adm-clerical', 
        'relationship': 'Husband', 'race': '', 'gender': '', 'capital_gain': 2174., 'capital_loss': 0., 
        'hours_per_week': 40., 'native_country': '',  'income_bracket': '<=50K'}

features = {k: _create_feature(v) for k, v in data.items()}

request = classification_pb2.ClassificationRequest()
request.model_spec.name = "generic_model"
request.model_spec.signature_name = DEFAULT_SERVING_SIGNATURE_DEF_KEY

example = tf.train.Example(features=tf.train.Features(feature=features))

request.input.example_list.examples.extend([example])

print(predictor.predict(request))

The tensorflow_serving classification request above will be send direct to the tensorflow serving hosted inside the hosting container.

@nkconnor
Copy link
Contributor Author

@mvsusp thanks for jumping on this and the workaround example using protos.

On a related note - is there any examples for the same use case but with a text/csv serializer? Wondering how a CSV should be formatted considering the feature specification is "unordered".

From what I can tell, it does not use a header column.

File "/opt/amazon/lib/python2.7/site-packages/tf_container/serve.py", line 184, in _default_input_fn
data = self._parse_csv_request(serialized_data)
File "/opt/amazon/lib/python2.7/site-packages/tf_container/serve.py", line 141, in _parse_csv_request
full_array = [float(i) for i in row]

Wondering if perhaps this functionality is not implemented yet on the proxy side. LMK if you would rather me open a new issue.

@iquintero
Copy link
Contributor

@nkconnor
I think opening a new issue is the best course of action for that. As of today it is not implemented on the proxy side. If you think this is something worth exploring for us I would create the new issue as a feature request and describe the use case 👍

@laurenyu
Copy link
Contributor

This was now fixed in #62, which was released as part of 1.1.0 today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants