fix: register schema within sandbox #8614

mjsax · 2022-01-19T02:03:28Z

mjsax · 2022-01-19T02:16:58Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java


    return LimitedProxyBuilder.forClass(SchemaRegistryClient.class)
-        .swallow("register", anyParams(), 123)
+        .forward("register", methodParams(String.class, ParsedSchema.class), sandboxSchemaRegistryCache)


This is the "main" fix -- instead of swallowing the registration, we capture it within the sandbox.
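For readers less familiar with the sandbox, here is a minimal sketch of the difference between swallowing and capturing a registration. Everything in it is a hypothetical stand-in: `Registry`, `SwallowingRegistry`, and `CapturingRegistry` are not ksqlDB or Schema Registry classes, schemas are plain strings, and the negative id counter is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the small part of a schema registry client used here.
interface Registry {
  int register(String subject, String schema);

  Optional<String> latestSchema(String subject);
}

// Old behavior: the registration is dropped, so later lookups cannot see it.
final class SwallowingRegistry implements Registry {
  @Override
  public int register(final String subject, final String schema) {
    return 123; // fixed dummy id, nothing is stored
  }

  @Override
  public Optional<String> latestSchema(final String subject) {
    return Optional.empty();
  }
}

// New behavior: the registration is kept in memory for the lifetime of the sandbox.
final class CapturingRegistry implements Registry {
  private final Map<String, String> schemasBySubject = new HashMap<>();
  private int nextId = -1; // assumption: negative ids mark sandbox-only schemas

  @Override
  public int register(final String subject, final String schema) {
    schemasBySubject.put(subject, schema);
    return nextId--;
  }

  @Override
  public Optional<String> latestSchema(final String subject) {
    return Optional.ofNullable(schemasBySubject.get(subject));
  }
}
```

With the capturing variant, a statement validated later in the same request can look up the schema that an earlier statement would have registered.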

How are we planning to test this?

Working on tests... The PR is still WIP. Just opened it for early feedback.

config/ksql-server.properties

lihaosky · 2022-01-25T19:14:52Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+        final ParsedSchema parsedSchema,
+        final int version,
+        final int id) {
+      return -1; // swallow


Why is this swallowed, whereas the previous register does actual work?

The old code swallowed both: .swallow("register", anyParams(), 123)

I don't think ksqlDB actually uses this method? I just added it to make the existing unit tests pass.

lihaosky · 2022-01-25T19:15:54Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+    @Override
+    public int register(final String subject, final ParsedSchema parsedSchema) {
+      if (subjectCache.containsKey(subject)) {
+        throw new IllegalStateException("Subject '" + subject + "' already in use.");


Why does this throw instead of returning the existing id for the subject, which is the behavior of Schema Registry?

Shouldn't it depend on the compatibility rules whether updating the schema actually works? I'm not sure to what extent we need to fully mimic SR logic. I tried to keep it simple in the hope it would be sufficient, but I'm not totally sure, to be fair. Thoughts?

Will this make the following fail because they share the same subject?
CREATE STREAM test_stream_1 (id int key, name varchar) with (kafka_topic='some_topic', partitions=1, format='avro');
CREATE STREAM test_stream_2 (id int key, name varchar, age int) with (kafka_topic='some_topic', partitions=1, format='avro');

I think we should err on the side of false positives rather than false negatives. It'd be bad to have statements that actually work fail validation, because then we might be introducing a regression (if a customer uses such statements in their workflows). In contrast, a false positive means we let some statements through validation that then fail in execution. That's not great, since multi-statement requests might have been partially executed, but it also doesn't prevent any existing workflows.
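One way to follow this advice is to make repeated registration lenient rather than throwing. The sketch below is hypothetical: `LenientSandboxRegistry` and its methods are illustrative names, schemas are plain strings, and no compatibility check is attempted. Registering the same schema again returns the existing id, and a different schema for the same subject is simply accepted as a new version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sandbox-side registry that never rejects a registration.
final class LenientSandboxRegistry {
  private final Map<String, List<String>> versionsBySubject = new HashMap<>();
  private final Map<String, Integer> idBySchema = new HashMap<>();
  private int nextId = -1; // assumption: negative ids mark sandbox-only schemas

  int register(final String subject, final String schema) {
    final List<String> versions =
        versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
    if (!versions.contains(schema)) {
      versions.add(schema); // accepted as a new version, no compatibility check
    }
    // Reuse the id if this schema text was seen before, otherwise assign a new one.
    return idBySchema.computeIfAbsent(schema, s -> nextId--);
  }

  int versionCount(final String subject) {
    return versionsBySubject.getOrDefault(subject, new ArrayList<>()).size();
  }
}
```

Under this scheme, the two CREATE statements above would both pass validation. Any real incompatibility would only surface at execution time, which is the false-positive trade-off described in the comment.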

lihaosky · 2022-01-25T19:19:08Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+      final int schemaId = nextSchemaId--;
+      subjectCache.put(subject, parsedSchema);
+      subjectToId.put(subject, schemaId);
+      idCache.put(schemaId, parsedSchema);


Could you use MockSchemaRegistryClient? https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/MockSchemaRegistryClient.java

Maybe. What would we gain?

It would make maintaining the three caches here easier and less error-prone. MockSchemaRegistryClient seems to have similar logic inside.

It would also make the behavior match the production Schema Registry. For example, MockSchemaRegistryClient may handle registering the same subject a second time differently instead of throwing.

+1 if we can use the MockSchemaRegistryClient rather than maintaining our own caches for the newly registered subjects, that'd be nice. OTOH if it's a lot of effort I think the current approach is fine too. Let's just be careful about not throwing errors where we shouldn't be (see above).

lihaosky · 2022-01-25T19:25:36Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+        if (e.getStatus() == HttpStatus.SC_NOT_FOUND) {
+          final ParsedSchema schema = idCache.get(id);
+          if (schema != null) {
+            return schema;
+          }


Maybe add a comment explaining why we do this?

lihaosky · 2022-01-25T19:25:46Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+        if (e.getStatus() == HttpStatus.SC_NOT_FOUND) {
+          final ParsedSchema schemaByName = subjectCache.get(subject);
+          final ParsedSchema schemaById = idCache.get(id);
+          if (schemaByName != null && schemaByName == schemaById) {
+            return schemaByName;


lihaosky · 2022-01-25T19:26:48Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+        if (e.getStatus() == HttpStatus.SC_NOT_FOUND && subjectToId.containsKey(subject)) {
+          return subjectToId.get(subject);
+        }


vcrfxia

Thanks @mjsax -- great test coverage! Comments inline.

vcrfxia · 2022-01-26T17:13:35Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+      try {
+        return delegate.getSchemaById(id);
+      } catch (RestClientException e) {
+        // if we don't find the schema is SR, we try to get it from the sandbox cache


nit: typo. Also three more occurrences of the same typo in this file.

Suggested change

-        // if we don't find the schema is SR, we try to get it from the sandbox cache
+        // if we don't find the schema in SR, we try to get it from the sandbox cache

vcrfxia · 2022-01-26T17:13:47Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+      try {
+        return delegate.getLatestSchemaMetadata(subject);
+      } catch (final RestClientException e) {
+        // if we don't find the schema metadata is SR, but there subject is registered inside


nit: typos

Suggested change

-        // if we don't find the schema metadata is SR, but there subject is registered inside
+        // if we don't find the schema metadata in SR, but the subject is registered inside

vcrfxia · 2022-01-26T17:13:53Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

@@ -0,0 +1,238 @@
+/*
+ * Copyright 2018 Confluent Inc.


vcrfxia · 2022-01-26T17:14:22Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

+            "http://foo:8080")
+        .withAdditionalConfig(
+            KsqlConfig.KSQL_SHARED_RUNTIME_ENABLED,
+            sharedRuntimes)


Why do we believe that shared runtimes might affect the behavior tested in this integration test? (Do we really need to test both versions of this?)

I just blindly copied from EndToEndIntegrationTest -- not even sure what the difference is. Happy to remove it.

vcrfxia · 2022-01-26T17:16:26Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

+      "CREATE STREAM avro_input (a INT KEY, b VARCHAR)"
+        + " WITH (KAFKA_TOPIC='t1', PARTITIONS=1, FORMAT='AVRO', WRAP_SINGLE_VALUE=false);"
+
+      // Then:


This usage of "When"/"Then" is different from how the pattern is used in other tests in the repo. "Then" is typically used for validations, but here the only validation is that the combination of statements successfully executes. Maybe we remove the "When"/"Then" and just add a comment like:

Suggested change

-      // Then:
+      // dependent statement also executes successfully

?

vcrfxia · 2022-01-26T17:17:05Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

+  }
+
+  @Test
+  public void shouldRegisterAvroSchemaInSandboxViaCS() {


Why do we think primitive vs non-primitive schemas might be handled differently? I think we can cut the number of tests in half by only testing one of them.

vcrfxia · 2022-01-26T17:17:51Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

+
+@RunWith(Parameterized.class)
+@Category({IntegrationTest.class})
+public class DependentStatementsIntegrationTest {


Great test coverage! Out of curiosity, did we ever add the analogous tests for statements with topic dependencies? Would be nice to do that if not (can be separate from this PR).

Should be covered (cf #4800)?

The PR that fixed that issue only added unit test coverage, and no integration test coverage. I meant to say that it'd be nice to add an integration test into this new DependentStatementsIntegrationTest.java file for the topic dependency case too.

lihaosky

LGTM overall! Some nits, and please take a look at the test failure.

lihaosky · 2022-01-26T21:25:41Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

+      final String... args
+  ) {
+    final String formatted = format(statement, (Object[])args);
+    log.debug("Sending statement: {}", formatted);


nit: remove logging in test?

ksqldb-engine/src/test/java/io/confluent/ksql/services/SandboxedSchemaRegistryClientTest.java

vcrfxia

Thanks @mjsax ! LGTM with some minor comments/questions inline.

vcrfxia · 2022-01-27T15:43:22Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

  }

  private SandboxedSchemaRegistryClient() {
  }
+
+  static final class SandboxSchemaRegistryCache implements SchemaRegistryClient {
+    private final MockSchemaRegistryClient mockedClient = new MockSchemaRegistryClient();


Can we add comments explaining the difference between delegate and mockedClient here, i.e., what each is used for/represents? Without this context, this code is very difficult to understand.

We might also want to rename mockedClient to sandboxCacheClient or similar, in order to clarify within the code itself as well.

vcrfxia · 2022-01-27T15:48:38Z

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

+      } catch (RestClientException e) {
+        // if we don't find the schema in SR, we try to get it from the sandbox cache
+        if (e.getStatus() == HttpStatus.SC_NOT_FOUND) {
+          return mockedClient.getSchemaById(id);


The earlier version of this PR only returned from this line if the schema was present in the sandbox cache, and otherwise threw on the line below. This latest version of the PR now always returns from the mocked client representing the sandbox cache. Does the mocked client throw an error if the schema is not present? Otherwise there is a behavior change here (which might be fine, wondering if it's intentional).

Same comment/question for the other methods below too (getSchemaBySubjectAndId, getLatestSchemaMetadata, getVersion, and getId).

It also throws if not found: https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/MockSchemaRegistryClient.java#L123-L143.
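The lookup order discussed in this thread can be sketched as follows. This is a simplified, hypothetical model rather than the PR's code: `LookupException`, `InMemorySchemaStore`, and `FallbackLookup` are illustrative names, schemas are plain strings, and the exception is unchecked only to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception carrying an HTTP-like status code.
final class LookupException extends RuntimeException {
  private final int status;

  LookupException(final int status, final String message) {
    super(message);
    this.status = status;
  }

  int getStatus() {
    return status;
  }
}

// Hypothetical in-memory store that throws a 404 when the id is unknown.
final class InMemorySchemaStore {
  private final Map<Integer, String> schemasById = new HashMap<>();

  void put(final int id, final String schema) {
    schemasById.put(id, schema);
  }

  String getSchemaById(final int id) {
    final String schema = schemasById.get(id);
    if (schema == null) {
      throw new LookupException(404, "Schema " + id + " not found");
    }
    return schema;
  }
}

// Reads go to the real registry first; only a 404 falls through to the sandbox.
final class FallbackLookup {
  private static final int NOT_FOUND = 404;
  private final InMemorySchemaStore delegate;
  private final InMemorySchemaStore sandbox;

  FallbackLookup(final InMemorySchemaStore delegate, final InMemorySchemaStore sandbox) {
    this.delegate = delegate;
    this.sandbox = sandbox;
  }

  String getSchemaById(final int id) {
    try {
      return delegate.getSchemaById(id);
    } catch (final LookupException e) {
      if (e.getStatus() == NOT_FOUND) {
        return sandbox.getSchemaById(id); // throws again if the sandbox lacks it too
      }
      throw e;
    }
  }
}
```

Because the sandbox store itself throws when the id is unknown, a caller still sees a not-found error when neither side has the schema, so no explicit presence check is needed before falling back.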

mjsax · 2022-01-28T03:44:28Z

Test failed a second time, but it passes locally:

Expected: is <3>
     but: was <2>
	at io.confluent.ksql.physical.scalablepush.locator.AllHostsLocatorTest.shouldLocate(AllHostsLocatorTest.java:56)

Any idea? Let's see if it fails again in the next build.

lihaosky · 2022-01-28T06:42:31Z

Is the AllHostsLocatorTest failure related to #8665? The latest build failure seems to be a different one, though:

KsqlResourceFunctionalTest.shouldInsertIntoValuesForAvroTopic:201->makeKsqlRequest:267 Failed to await result.msg: Failed to insert values into 'BOOKS'. Could not serialize key: ['Metamorphosis']

vcrfxia

Thanks @mjsax , a couple more comments inline.

The AllHostsLocatorTest failure was unrelated (see #8664). Not sure about the new one.

ksqldb-engine/src/main/java/io/confluent/ksql/services/SandboxedSchemaRegistryClient.java

vcrfxia · 2022-01-28T16:27:12Z

...b-engine/src/test/java/io/confluent/ksql/integration/DependentStatementsIntegrationTest.java

+  }
+
+  @Test
+  public void shouldCreateDependentTopicWithDefaultReplicationInSandbox() {


This test doesn't actually test a topic dependency, does it? I thought the dependency issue fixed in the previous PR was if the first CREATE STREAM creates a topic and then the second one uses the newly created topic. (Second statement should reference the topic, not the stream.)

This dependency case (CS followed by CSAS) should already be tested in other integration tests throughout the codebase.

No, the fix was about "inherit the replication factor". The second statement creates an output topic and was not able to create it, because it neither inherited the replication factor from the upstream topic nor fetched the default replication factor from Kafka.

Happy to remove this test again -- I thought you asked to add it. I can also add a different one that uses two CS statements, but that also seems redundant?

Oh, huh! I totally misunderstood the original ticket for that issue then, haha. If this integration test covers the fix, then that sounds great to me. Thanks for adding it!

vcrfxia

Thanks @mjsax ! LGTM.

mjsax · 2022-01-31T18:03:37Z

Another (different) test failure:

java.lang.AssertionError: Failed to await result.msg: Failed to insert values into 'BOOKS'. Could not serialize key: ['Metamorphosis']
	at io.confluent.ksql.rest.integration.KsqlResourceFunctionalTest.makeKsqlRequest(KsqlResourceFunctionalTest.java:267)
	at io.confluent.ksql.rest.integration.KsqlResourceFunctionalTest.shouldInsertIntoValuesForAvroTopic(KsqlResourceFunctionalTest.java:201)

Failed locally, too. Is this related to a broken master?

mjsax requested a review from a team as a code owner January 19, 2022 02:03

mjsax commented Jan 19, 2022

suhas-satish approved these changes Jan 20, 2022

mjsax changed the title ~~[WIP -- DO NOT MERGE] fix: register schema within sandbox~~ fix: register schema within sandbox Jan 25, 2022

mjsax commented Jan 25, 2022

lihaosky reviewed Jan 25, 2022

vcrfxia reviewed Jan 26, 2022

lihaosky reviewed Jan 26, 2022

vcrfxia reviewed Jan 27, 2022

vcrfxia reviewed Jan 28, 2022

vcrfxia approved these changes Jan 28, 2022

mjsax and others added 14 commits January 31, 2022 10:33

fix: register schema within sandbox

73a671d

add tests

ab642a9

add file

7571b5d

fix test

7e81ef9

foo

9c91f11

add comments

008a713

fix tests

d05a202

github comments

3d53b4b

use mocked-sr-client

751a36f

fix checkstyle

87c76ff

Github comments

4d6f379

Github comments

e16da90

Update ksqldb-engine/src/main/java/io/confluent/ksql/services/Sandbox…

15a4045

…edSchemaRegistryClient.java Co-authored-by: Victoria Xia <victoria.f.xia281@gmail.com>

Update ksqldb-engine/src/main/java/io/confluent/ksql/services/Sandbox…

17b0659

…edSchemaRegistryClient.java Co-authored-by: Victoria Xia <victoria.f.xia281@gmail.com>

mjsax added 2 commits January 31, 2022 10:33

Github comment + rebased

d349301

fix test

d301376

mjsax merged commit ba572e0 into confluentinc:master Feb 3, 2022

mjsax deleted the gh1394-schema-inference branch February 3, 2022 03:00
