
[FLINK-33045] Make it possible to disable auto-registering schema in Schema Registry #26662


Open

MartijnVisser wants to merge 7 commits into master from FLINK-33045_support-auto-register-schema

Conversation

MartijnVisser
Contributor

What is the purpose of the change

This PR is based on #25410 and completes the remaining work. It introduces auto.register.schemas as a table option. Compared to the linked PR, it adds unit tests, a new IT case, and updated documentation.
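For illustration, a minimal sketch of how the new option could be used from the Table API. The table schema, connector properties, and the exact option key ('avro-confluent.auto.register.schemas', assuming the usual format-option prefix) are placeholders, not the final syntax from this PR:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class AutoRegisterSchemasSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical table: topic, bootstrap servers and Schema Registry URL are
        // placeholders; the option key assumes the usual 'avro-confluent.' prefix.
        tEnv.executeSql(
                "CREATE TABLE user_behavior (\n"
                        + "  name STRING,\n"
                        + "  score INT\n"
                        + ") WITH (\n"
                        + "  'connector' = 'kafka',\n"
                        + "  'topic' = 'user_behavior',\n"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                        + "  'format' = 'avro-confluent',\n"
                        + "  'avro-confluent.url' = 'http://localhost:8081',\n"
                        + "  'avro-confluent.auto.register.schemas' = 'false'\n"
                        + ")");
    }
}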

Brief change log

  • Introduces new table option auto.register.schemas
  • Adds unit tests
  • Adds a new AvroConfluentITCase
  • Removes previous (currently disabled) bash-based tests
  • Bumps certain dependencies

Verifying this change

This change added tests and can be verified as follows:

  • Run AvroConfluentITCase, which writes to and reads from Kafka using avro-confluent, with the table option set to true (the default) to show that Flink can register the schema, and to false, where it relies on schema registration outside of Flink

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs / JavaDocs

@flinkbot
Collaborator

flinkbot commented Jun 10, 2025

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@MartijnVisser
Contributor Author

@flinkbot run azure


@MartijnVisser force-pushed the FLINK-33045_support-auto-register-schema branch from 94811df to ba6bd2e on June 11, 2025 06:52
@@ -58,7 +77,7 @@ public ConfluentSchemaRegistryCoder(String subject, SchemaRegistryClient schemaR
* @param schemaRegistryClient client to connect schema registry
*/
public ConfluentSchemaRegistryCoder(SchemaRegistryClient schemaRegistryClient) {
- this.schemaRegistryClient = schemaRegistryClient;
+ this(null, schemaRegistryClient, null);
Contributor

Nit: it is a bit strange to have an optional first parameter; I would expect the mandatory parameters first for all these methods, i.e. for the constructors to all start with
ConfluentSchemaRegistryCoder(SchemaRegistryClient schemaRegistryClient ...
I realise this is existing code, so it is your call if you want to change it.

Contributor Author

That's a valid nit, but the original commit that I took over had already changed that, and it's included in this PR, see

<tr>
<td><h5>auto.register.schemas</h5></td>
<td>optional</td>
<td>yes</td>
Contributor

Why is this line here but not in the Chinese docs?

Contributor Author

Because I don't know Chinese :) So I'll follow the process that's at https://flink.apache.org/how-to-contribute/contribute-documentation/#chinese-documentation-translation after this PR is merged

Contributor

@davidradl Jun 12, 2025

I had assumed we would just add <td>yes</td> here, since the 'optional' above is not translated, as per the note in the link you sent. But it sounds like you have this in hand.

@@ -289,7 +289,7 @@ under the License.
<!-- Indirectly accessed in pyflink_gateway_server -->
<groupId>org.apache.flink</groupId>
<artifactId>flink-sql-connector-kafka</artifactId>
- <version>3.0.0-1.17</version>
+ <version>4.0.0-2.0</version>
Contributor

is this related to the fix?

Contributor Author

It's about keeping versions in sync. Ideally the Python dependencies on connector versions should be moved out of the Flink repo and into the connector repos, but we're not there yet.

@MartijnVisser force-pushed the FLINK-33045_support-auto-register-schema branch from 59f3a77 to 7304dde on June 11, 2025 18:03
@MartijnVisser force-pushed the FLINK-33045_support-auto-register-schema branch from 7304dde to 0b8f50b on June 12, 2025 07:39
Contributor

@fapaul left a comment

Thanks for working on this feature. It's a great quality of life improvement.

<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>7.2.2</version>
Contributor

AFAIK the serializer is versioned similarly to Kafka; 7.2.2 should correspond to Kafka 3.2. Can we upgrade the serializer to 7.9.0 to be in line with the used Kafka version 3.9.0?

Contributor Author

Yes, good one!

"type": "record",
"name": "record",
"fields": [
{"name": "name", "type": ["null", "string"], "default": null},
Contributor

Nit: Why did you change the type used in the example? Is this related to registration?

Contributor Author

@MartijnVisser Jun 20, 2025

Disabling auto registration only means that Flink won't try to register the schema in Schema Registry during every run. However, the schema that an external service has registered in Schema Registry must still be either exactly how Flink would have registered it (so with that specific namespace), or how the user provided it via the avro-confluent.schema table property.
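As a rough illustration of the "registered outside of Flink" path, a sketch that registers the exact schema out of band through the Schema Registry REST API; the subject name, schema string, and registry URL are placeholders, not taken from this PR:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSchemaOutOfBand {
    public static void main(String[] args) throws Exception {
        // Placeholder subject and schema; in practice these must match exactly what
        // Flink expects (including the record namespace) or what is configured via
        // the avro-confluent.schema table property.
        String subject = "user_behavior-value";
        String schema =
                "{\"type\":\"record\",\"name\":\"record\",\"fields\":["
                        + "{\"name\":\"name\",\"type\":[\"null\",\"string\"],\"default\":null}]}";
        String body = "{\"schema\": " + toJsonString(schema) + "}";

        HttpRequest request =
                HttpRequest.newBuilder(
                                URI.create("http://localhost:8081/subjects/" + subject + "/versions"))
                        .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();

        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // e.g. {"id":1}
    }

    // Minimal JSON string escaping for the embedded Avro schema.
    private static String toJsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}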

"SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS",
INTER_CONTAINER_KAFKA_ALIAS + ":9092")
.dependsOn(KAFKA);

Contributor

You can use

Suggested change:
@RegisterExtension

to avoid manually starting the Flink cluster and let JUnit handle the lifecycle.

Comment on lines +131 to +143
@BeforeAll
public static void setup() throws Exception {
KAFKA.start();
SCHEMA_REGISTRY.start();
FLINK.start();
}

@AfterAll
public static void tearDown() {
FLINK.stop();
SCHEMA_REGISTRY.stop();
KAFKA.stop();
}
Contributor

These methods should be obsolete, since for the Kafka/SR containers the lifecycle is already handled by the @Container annotation, and for the Flink container see the comment above.
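A sketch of what that could look like; class names, image tags, and the Flink test resource are placeholders, and the actual IT case may wire this differently:

import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@Testcontainers
class AvroConfluentITCaseSketch {

    // Started/stopped by the Testcontainers JUnit 5 extension; no manual
    // @BeforeAll/@AfterAll lifecycle methods are needed.
    @Container
    static final KafkaContainer KAFKA =
            new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.9.0"));

    @Container
    static final GenericContainer<?> SCHEMA_REGISTRY =
            new GenericContainer<>(DockerImageName.parse("confluentinc/cp-schema-registry:7.9.0"))
                    .withExposedPorts(8081)
                    .dependsOn(KAFKA);

    // If the Flink test resource implements the JUnit 5 extension callbacks,
    // it can be registered with @RegisterExtension instead of being started manually:
    // @RegisterExtension static final FlinkContainers FLINK = ...;
}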

}

@Test
public void testAvroConfluentIntegrationWithManualRegister() throws Exception {
Contributor

I would have expected to see a test that runs a query and fails if the schema isn't registered.

The current test IMO doesn't fully cover the behavior, since it can also pass if the SQL query does the registration.
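For instance, something along these lines; the table names and the expected message are placeholders, and the surrounding container/DDL setup from the IT case is assumed:

import static org.assertj.core.api.Assertions.assertThatThrownBy;

import org.apache.flink.table.api.TableEnvironment;

class MissingSchemaNegativeTestSketch {

    // With auto.register.schemas = false and no schema registered out of band,
    // the insert job should fail rather than silently registering the schema.
    void assertInsertFailsWithoutRegisteredSchema(TableEnvironment tEnv) {
        assertThatThrownBy(
                        () ->
                                tEnv.executeSql(
                                                "INSERT INTO sink_without_registered_schema SELECT * FROM source")
                                        .await())
                .hasMessageContaining("subject"); // placeholder assertion on the error message
    }
}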


# Set root logger level to OFF to not flood build logs
# set manually to INFO for debugging purposes
rootLogger.level = INFO
Contributor

Please set this to OFF before merging

<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
- <version>3.2.3</version>
+ <version>3.9.0</version>
Contributor

Nit: Since you also upgraded the Flink connector to be compatible with Kafka 4.0, can you also use Kafka 4.0 here?

Contributor Author

I don't think we should, given that the Flink Kafka connector v4.0 itself depends on Kafka Client 3.9.0: https://github.com/apache/flink-connector-kafka/blob/v4.0.0/pom.xml#L53

out.write(schemaIdBytes);
}

private boolean registerSchema() {
Contributor

Why is registerSchema a method, instead of parsing the value once in the ctor?
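For illustration, the shape of the suggested refactor; names are placeholders, not the actual coder:

// Sketch: evaluate the flag once at construction time instead of re-checking
// it on every serialized record.
public class SchemaRegistryCoderSketch {

    private final boolean registerSchema;

    public SchemaRegistryCoderSketch(boolean registerSchema) {
        this.registerSchema = registerSchema; // resolved once in the ctor
    }

    int resolveSchemaId(/* subject, schema, client ... */) {
        if (registerSchema) {
            // register the schema and return the assigned id
        } else {
            // only look up the id of an already registered schema
        }
        return -1; // placeholder
    }
}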

try {
registeredId = schemaRegistryClient.getId(subject, schema);
} catch (RestClientException e) {
throw new IOException("Could not retrieve schema in registry", e);
Contributor

Does it throw an IOException if the schema is not present?

Nit: Can we maybe throw a better exception, e.g. FlinkException, to make it clear that this is "expected" when the schema is not found?

@@ -229,6 +229,74 @@ public void testSerializationSchemaWithInvalidOptionalSchema() {
null, SCHEMA.toPhysicalRowDataType()));
}

@Test
Contributor

@fapaul Jun 20, 2025

Nit: Can we turn the three tests (or at least the first and third) into a parameterized test to avoid the code duplication?

It's a bit unclear to me why only the second test verifies the behavior for both sinks and sources while the other tests only look at sinks.
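For reference, a minimal shape of such a parameterized test; the helper and assertion are placeholders:

import static org.assertj.core.api.Assertions.assertThat;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;

class AutoRegisterSchemasOptionSketchTest {

    // One test body exercised with both values of the option instead of
    // near-duplicate test methods.
    @ParameterizedTest
    @ValueSource(booleans = {true, false})
    void createsSerializationSchemaForEitherSetting(boolean autoRegister) {
        Object serializationSchema = createSerializationSchema(autoRegister);
        assertThat(serializationSchema).isNotNull();
    }

    // Placeholder for whatever builds the avro-confluent (de)serialization
    // schema with auto.register.schemas = autoRegister in the real test.
    private Object createSerializationSchema(boolean autoRegister) {
        return new Object();
    }
}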
