[BEAM-9384] Add SchemaRegistry.getSchemaCoder to get SchemaCoders for registered types #10974

iemejia · 2020-02-26T07:59:37Z

reuvenlax · 2020-02-26T17:24:20Z

sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaRegistry.java

+        getSchema(typeDescriptor),
+        typeDescriptor,
+        getToRowFunction(typeDescriptor),
+        getFromRowFunction(typeDescriptor));
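For readers without the Beam sources at hand, the shape of this hunk can be sketched with plain JDK types: the new method only composes lookups the registry already offers (schema, to-row function, from-row function) into a single coder-like object. Everything below is a hypothetical stand-in written for illustration (`MiniSchemaRegistry`, `MiniSchemaCoder`, `User`, a `Map` in place of a Beam `Row`, a `Class` in place of a `TypeDescriptor`); it is not Beam's implementation and says nothing about Beam's exact signatures.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demo entry point; none of the classes in this sketch are Beam classes.
final class SchemaCoderSketch {
  static MiniSchemaRegistry demoRegistry() {
    MiniSchemaRegistry registry = new MiniSchemaRegistry();
    registry.register(
        User.class,
        Arrays.asList("name", "age"),
        user -> {
          Map<String, Object> row = new LinkedHashMap<>();
          row.put("name", user.name);
          row.put("age", user.age);
          return row;
        },
        row -> new User((String) row.get("name"), (Integer) row.get("age")));
    return registry;
  }

  public static void main(String[] args) {
    MiniSchemaCoder<User> coder = demoRegistry().getSchemaCoder(User.class);
    Map<String, Object> row = coder.toRow.apply(new User("ada", 36));
    User back = coder.fromRow.apply(row);
    System.out.println(coder.schema + " " + row + " " + back.name);
  }
}

// Stand-in for a schema-aware coder: a schema plus conversions to and from a row.
final class MiniSchemaCoder<T> {
  final List<String> schema; // field names only, to keep the sketch short
  final Class<T> type;
  final Function<T, Map<String, Object>> toRow;
  final Function<Map<String, Object>, T> fromRow;

  MiniSchemaCoder(
      List<String> schema,
      Class<T> type,
      Function<T, Map<String, Object>> toRow,
      Function<Map<String, Object>, T> fromRow) {
    this.schema = schema;
    this.type = type;
    this.toRow = toRow;
    this.fromRow = fromRow;
  }
}

// Stand-in registry: three per-type lookups and one method that composes them.
final class MiniSchemaRegistry {
  private static final class Entry<T> {
    final List<String> schema;
    final Function<T, Map<String, Object>> toRow;
    final Function<Map<String, Object>, T> fromRow;

    Entry(
        List<String> schema,
        Function<T, Map<String, Object>> toRow,
        Function<Map<String, Object>, T> fromRow) {
      this.schema = schema;
      this.toRow = toRow;
      this.fromRow = fromRow;
    }
  }

  private final Map<Class<?>, Entry<?>> entries = new HashMap<>();

  <T> void register(
      Class<T> type,
      List<String> schema,
      Function<T, Map<String, Object>> toRow,
      Function<Map<String, Object>, T> fromRow) {
    entries.put(type, new Entry<>(schema, toRow, fromRow));
  }

  @SuppressWarnings("unchecked") // safe here: register() stores an Entry<T> under Class<T>
  private <T> Entry<T> entry(Class<T> type) {
    Entry<T> found = (Entry<T>) entries.get(type);
    if (found == null) {
      throw new IllegalArgumentException("No schema registered for " + type.getName());
    }
    return found;
  }

  <T> List<String> getSchema(Class<T> type) {
    return entry(type).schema;
  }

  <T> Function<T, Map<String, Object>> getToRowFunction(Class<T> type) {
    return entry(type).toRow;
  }

  <T> Function<Map<String, Object>, T> getFromRowFunction(Class<T> type) {
    return entry(type).fromRow;
  }

  // Mirrors the shape of the hunk above: four values in, one coder out.
  <T> MiniSchemaCoder<T> getSchemaCoder(Class<T> type) {
    return new MiniSchemaCoder<>(
        getSchema(type), type, getToRowFunction(type), getFromRowFunction(type));
  }
}

// Hypothetical user type that has been registered with a schema.
final class User {
  final String name;
  final int age;

  User(String name, int age) {
    this.name = name;
    this.age = age;
  }
}
```

In this sketch a caller that only knows the type can obtain a ready-made coder from the registry instead of assembling the three pieces by hand.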


This looks fine, but can you explain the use case? The goal was for users to never have to deal with SchemaCoder (the fact that schemas are implemented via a special coder should be a Beam implementation detail), but I'd understand if we have cases where the coder is still needed.

See #10978 for a concrete case. PTransform authors may benefit from being able to infer a SchemaCoder for a given type from the SchemaRegistry.
We could even add this to PubsubIO to read JavaBeans and query them with Beam SQL.

I agree with you that SchemaCoder is an internal detail that regular users (pipeline authors) should not have to care about. I was tempted to mark this method as @Internal; however, PTransform authors (e.g. IO authors) will find it useful (as I did in the PR mentioned above for KafkaIO schema support), so it is probably worth leaving it available. I also cannot think of a better place for this method than here.

It's useful when integrating schema code with code that does not yet understand schemas.

In the KafkaIO example, I think the ideal solution would be to allow a Schema on KafkaRecord (this probably requires us to add Java generic type awareness to schema inference, though), in which case the keyCoder and valueCoder aren't needed. I agree that easy inference of a SchemaCoder allows lower-effort integration of schemas in code like this, though hopefully this is just a short-term solution.

Interesting. I had not thought about making KafkaRecord schema-aware; good point. Some consequences of that are still not clear to me (like how we would deal with the runtime resolution of Schemas for KV that we do now with the Confluent Schema Registry support). I am going to give it a try and ping you once I have something in the other PR, #10978. Let's continue that discussion there.

mwalenia · 2020-02-27T09:03:09Z

Run seed job

mwalenia · 2020-02-27T09:29:26Z

retest this please

mwalenia · 2020-02-27T09:30:16Z

I thought retesting would clear the nonexistent job :(

iemejia · 2020-02-27T09:39:29Z

@mwalenia I just rebased and force-pushed to make the unavailable Java11 job disappear, thanks for the tip!


iemejia · 2020-02-27T14:59:31Z

sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/SchemaCoder.java

@@ -59,7 +60,7 @@
 @Experimental(Kind.SCHEMAS)
 public class SchemaCoder<T> extends CustomCoder<T> {
  // This contains a map of primitive types to their coders.
-  public static final ImmutableMap<TypeName, Coder> CODER_MAP =
+  public static final Map<TypeName, Coder> CODER_MAP =


This change is required because we should not expose Guava (even our own vendored version) in the public API; the problem was surfaced by a GcpCoreApiSurfaceTest failure.
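A minimal JDK-only sketch of the idea behind this hunk: the public field is declared with the `java.util.Map` interface so that no third-party collection type appears in the API surface, while the object behind it can still be read-only. The enum subset, the string values standing in for coder instances, and the class name `CoderMapSketch` are assumptions made for illustration; the sketch uses `Collections.unmodifiableMap` and does not claim to show what implementation Beam uses behind `CODER_MAP`.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration; not Beam's SchemaCoder.
final class CoderMapSketch {
  // Hypothetical subset of field type names.
  enum TypeName {
    BYTE,
    INT32,
    STRING
  }

  // The declared type is the JDK interface, so callers never see a library-specific class.
  public static final Map<TypeName, String> CODER_MAP = buildCoderMap();

  private static Map<TypeName, String> buildCoderMap() {
    Map<TypeName, String> coders = new EnumMap<>(TypeName.class);
    // Plain strings stand in for coder instances to keep the sketch dependency-free.
    coders.put(TypeName.BYTE, "ByteCoder");
    coders.put(TypeName.INT32, "VarIntCoder");
    coders.put(TypeName.STRING, "StringUtf8Coder");
    // The wrapper rejects writes, so the field stays effectively immutable.
    return Collections.unmodifiableMap(coders);
  }

  public static void main(String[] args) {
    System.out.println(CODER_MAP.get(TypeName.INT32));
    try {
      CODER_MAP.put(TypeName.STRING, "SomethingElse");
    } catch (UnsupportedOperationException e) {
      System.out.println("CODER_MAP rejects modification");
    }
  }
}
```

The trade-off in this sketch is that immutability is no longer visible in the declared type, so it has to be documented or enforced by the implementation instead.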

iemejia · 2020-02-28T15:13:59Z

The tests are finally green. Can you PTAL, @reuvenlax?

reuvenlax · 2020-02-29T16:28:22Z

lgtm

iemejia requested a review from aromanenko-dev February 26, 2020 07:59

probot-autolabeler bot added the java label Feb 26, 2020

iemejia force-pushed the BEAM-9384-schemaregistry-getschemacoder branch from 61f1c49 to afe344b Compare February 26, 2020 08:03

iemejia mentioned this pull request Feb 26, 2020

[WIP][BEAM-7336] Add schema inferring for KafkaIO when reading Avro values #10966

Closed

4 tasks

iemejia force-pushed the BEAM-9384-schemaregistry-getschemacoder branch from afe344b to e6e4a11 Compare February 26, 2020 17:21

iemejia requested a review from reuvenlax February 26, 2020 17:22

reuvenlax reviewed Feb 26, 2020


iemejia force-pushed the BEAM-9384-schemaregistry-getschemacoder branch from e6e4a11 to 17d8c10 Compare February 27, 2020 09:38

[BEAM-9384] Add SchemaRegistry.getSchemaCoder to get SchemaCoders for…

612d3d1

… registered types

iemejia force-pushed the BEAM-9384-schemaregistry-getschemacoder branch from 17d8c10 to 612d3d1 Compare February 27, 2020 14:57

iemejia commented Feb 27, 2020


iemejia merged commit 116c5e8 into apache:master Mar 1, 2020

iemejia deleted the BEAM-9384-schemaregistry-getschemacoder branch March 1, 2020 07:07
