[Task]: Python schema generated types uses schema registry and coder registry #37893

@Abacn

Description

What needs to happen?

Currently, a Python Row (encoded with RowCoder) that goes through serialization and deserialization comes back as a schema-generated NamedTuple type instead of the original user type. This behavior has many caveats.

With cloudpickle becoming the default and the schema registry and coder registry saved at pipeline submission, we should be able to use the schema ID registered in the schema registry to obtain the user type, then use the coder registry to look up the (row) coder registered for that user type. That way, user_type -> GBK still produces user_type.
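To make the intended lookup concrete, here is a minimal sketch in plain Python. It does not call Beam. `ToyRowCoder`, `schema_registry`, `coder_registry`, `register`, and `coder_for_schema` are hypothetical stand-ins invented for illustration, and the two dicts are assumed to be available on the worker the same way the saved registries would be. The real Beam registries and RowCoder may differ in shape.

```python
import collections
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    """Example user type; the method is what a generated tuple would lose."""
    item: str
    price: float

    def with_tax(self, rate):
        return self.price * (1 + rate)


class ToyRowCoder:
    """Hypothetical stand-in for a row coder: a value becomes a field tuple."""

    def __init__(self, schema_id, field_names, user_type=None):
        self.schema_id = schema_id
        self.field_names = tuple(field_names)
        # With no user type, decode() falls back to a generated named tuple.
        self._decoded_type = user_type or collections.namedtuple(
            "Generated_" + schema_id.replace("-", "_"), self.field_names)

    def encode(self, value):
        return tuple(getattr(value, name) for name in self.field_names)

    def decode(self, encoded):
        return self._decoded_type(*encoded)


# Hypothetical registries, both assumed to be restored on the worker.
schema_registry = {}  # schema id -> user type
coder_registry = {}   # user type -> registered coder


def register(schema_id, user_type, field_names):
    schema_registry[schema_id] = user_type
    coder_registry[user_type] = ToyRowCoder(schema_id, field_names, user_type)


def coder_for_schema(schema_id, field_names):
    """Resolve schema id -> user type -> registered coder, else fall back."""
    user_type = schema_registry.get(schema_id)
    if user_type is not None and user_type in coder_registry:
        return coder_registry[user_type]
    return ToyRowCoder(schema_id, field_names)


fields = ("item", "price")
original = Purchase("book", 10.0)
encoded = ToyRowCoder("purchase-v1", fields).encode(original)

# Current behavior: nothing is registered, so decoding yields a generated type.
fallback = coder_for_schema("purchase-v1", fields).decode(encoded)

# Proposed behavior: the registries map the schema id back to Purchase.
register("purchase-v1", Purchase, fields)
restored = coder_for_schema("purchase-v1", fields).decode(encoded)
```

In this sketch `fallback` carries the same field values but is not a `Purchase` and has no `with_tax` method, which is one of the caveats mentioned above. `restored` is a `Purchase` again because the schema ID resolved to the user type and its registered coder.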

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
