Add support for mapping additional structured types to Python Schemas #19938

@damccorm

Description

Currently we can convert between a NamedTuple type and its Schema protos using named_tuple_from_schema and named_tuple_to_schema. I'd like to introduce a system to support additional types, starting with structured types like attrs, dataclasses, and TypedDict.

I've only just started digesting the code, but this task seems pretty straightforward. For example, I think the type-to-schema code would look roughly like this:


def typing_to_runner_api(type_):
  # type: (Type) -> schema_pb2.FieldType
  structured_handler = _get_structured_handler(type_)
  if structured_handler:
    schema = None
    if hasattr(type_, 'id'):
      schema = SCHEMA_REGISTRY.get_schema_by_id(type_.id)
    if schema is None:
      fields = structured_handler.get_fields()
      type_id = str(uuid4())
      schema = schema_pb2.Schema(fields=fields, id=type_id)
      SCHEMA_REGISTRY.add(type_, schema)

    return schema_pb2.FieldType(
        row_type=schema_pb2.RowType(
            schema=schema))


The rest of the work would be implementing a class hierarchy for working with structured types: for example, getting the list of fields from an instance and instantiating a type from a list of fields. Eventually we can extend this behavior to arbitrary, unstructured types.

Going in the schema-to-type direction, we have the problem of choosing which type to use for a given schema. I believe that as long as typing_to_runner_api() has been called on our structured type in the current Python session, the type will have been added to the registry and will round-trip correctly, so I think we just need a public function for registering schemas for structured types.
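
A minimal sketch of the registration idea, under some loud assumptions: the registry class, `register_structured_type`, and `type_from_schema` are hypothetical names, and a plain dict stands in for `schema_pb2.Schema` so the example runs without Beam installed. It only illustrates the round trip: once a type is registered, its schema id resolves back to the same type within the session.

```python
import uuid


class SchemaTypeRegistry:
  """Hypothetical in-memory registry mapping schema ids to Python types."""

  def __init__(self):
    self._by_id = {}    # schema id -> (type, schema)
    self._by_type = {}  # type -> schema

  def add(self, type_, schema):
    self._by_id[schema['id']] = (type_, schema)
    self._by_type[type_] = schema

  def get_schema_by_type(self, type_):
    return self._by_type.get(type_)

  def get_type_by_id(self, schema_id):
    entry = self._by_id.get(schema_id)
    return entry[0] if entry is not None else None


SCHEMA_REGISTRY = SchemaTypeRegistry()


def register_structured_type(type_, field_names, registry=SCHEMA_REGISTRY):
  """Registers type_ and returns its schema (a dict stand-in for the proto).

  Registering the same type twice returns the schema created the first time.
  """
  schema = registry.get_schema_by_type(type_)
  if schema is None:
    schema = {'id': str(uuid.uuid4()), 'fields': list(field_names)}
    registry.add(type_, schema)
  return schema


def type_from_schema(schema, registry=SCHEMA_REGISTRY):
  """Returns the registered type for schema, or None if it was never registered."""
  return registry.get_type_by_id(schema['id'])
```

A schema whose id was never registered in the current session resolves to `None` here; what the real fallback should be in that case (for example, generating a NamedTuple) is left open.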

[~bhulette] Did you want to tackle this or are you ok with me going after it?

 

Imported from Jira BEAM-8732. Original Jira may contain additional context.
Reported by: chadrik.
