Add support for string types in TensorFlowTransformer #2545

@zeahmed


There have been a couple of requests to support strings in TensorFlowTransformer. TensorFlow handles strings quite differently from other data types because strings are variable-length data. To represent strings as a tensor, TensorFlow requires the tensor buffer to be laid out as described in the following comment from c_api.h, cf.
https://github.com/tensorflow/tensorflow/blob/01cf864bb0d82370c259866c0735c0358e33377c/tensorflow/c/c_api.h#L206.

// --------------------------------------------------------------------------
// TF_Tensor holds a multi-dimensional array of elements of a single data type.
// For all types other than TF_STRING, the data buffer stores elements
// in row major order.  E.g. if data is treated as a vector of TF_DataType:
//
//   element 0:   index (0, ..., 0)
//   element 1:   index (0, ..., 1)
//   ...
//
// The format for TF_STRING tensors is:
//   start_offset: array[uint64]
//   data:         byte[...]
//
//   The string length (as a varint), followed by the contents of the string
//   is encoded at data[start_offset[i]]]. TF_StringEncode and TF_StringDecode
//   facilitate this encoding.
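
To make the layout in the comment above concrete, here is a minimal sketch that packs a list of byte strings into that format and reads them back. It is an illustration only, not the TensorFlow or ML.NET implementation: the function names are hypothetical, and it assumes little-endian `uint64` offsets that are measured from the start of the `data` section (right after the offset table) and a protobuf-style base-128 varint for the length prefix.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (assumed format)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at buf[pos]; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7


def encode_string_tensor(strings):
    """Hypothetical helper: build [offset table][data] for a list of bytes."""
    offsets = []
    data = bytearray()
    for s in strings:
        # Assumption: offsets are relative to the start of the data section.
        offsets.append(len(data))
        data += encode_varint(len(s)) + s
    # Assumption: offsets are stored as little-endian uint64 values.
    header = struct.pack(f"<{len(offsets)}Q", *offsets)
    return header + bytes(data)


def decode_string_tensor(buf: bytes, count: int):
    """Hypothetical helper: recover `count` strings from the packed buffer."""
    offsets = struct.unpack_from(f"<{count}Q", buf, 0)
    data = buf[8 * count:]
    result = []
    for off in offsets:
        length, pos = decode_varint(data, off)
        result.append(bytes(data[pos:pos + length]))
    return result
```

A round trip such as `decode_string_tensor(encode_string_tensor([b"hello", b""]), 2)` should return the original list. Because each element carries its own length prefix and offset, the elements can have different lengths, which is why a fixed-stride row-major copy, as used for the other types, would not work for string columns.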

Labels

enhancement: New feature or request
