Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: register schema within sandbox #8614

Merged
merged 16 commits into from
Feb 3, 2022
Merged

fix: register schema within sandbox #8614

merged 16 commits into from
Feb 3, 2022

Conversation

mjsax
Copy link
Member

@mjsax mjsax commented Jan 19, 2022

Closes #1394

@mjsax mjsax requested a review from a team as a code owner January 19, 2022 02:03

return LimitedProxyBuilder.forClass(SchemaRegistryClient.class)
.swallow("register", anyParams(), 123)
.forward("register", methodParams(String.class, ParsedSchema.class), sandboxSchemaRegistryCache)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the "main" fix -- instead of swallowing the registration, we capture it within the sandbox.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we planning to test this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on tests... The PR is still WIP. Just opened if for early feedback.

@mjsax mjsax changed the title [WIP -- DO NOT MERGE] fix: register schema within sandbox fix: register schema within sandbox Jan 25, 2022
config/ksql-server.properties Outdated Show resolved Hide resolved
config/ksql-server.properties Outdated Show resolved Hide resolved
final ParsedSchema parsedSchema,
final int version,
final int id) {
return -1; // swallow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this swallowed whereas previous register does actual work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code swallows both: .swallow("register", anyParams(), 123)

I don't think ksqlDB actually uses this method? I just added it to make the existing unit tests pass.

@Override
public int register(final String subject, final ParsedSchema parsedSchema) {
if (subjectCache.containsKey(subject)) {
throw new IllegalStateException("Subject '" + subject + "' already in use.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this throw instead of returning existing id for the subject which is the behavior of schema registry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it not depend on the compatibility rules if updating the schema actually works? Not sure to what extend we need to fully mimic SR logic? -- I tired to keep it simple in the hope it would be sufficient. Not totally sure to be fair. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the following fail because they share same subject?
CREATE test_stream_1 (id int key, name varchar) with (kafka_topic='some_topic', partitions=1, format='avro')
CREATE test_stream_2 (id int key, name varchar, age int) with (kafka_topic='some_topic', partitions=1, format='avro')

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should err on the side of false positives rather than false negatives. It'd be bad to have statements that actually work fail validation because then we might be introducing a regression (if a customer uses such statements in their workflows). In contrast, a false positive means we let some statements through validation that then fail in execution. Not great since multi-statements requests might have been partially executed, but also doesn't prevent any existing workflows.

Comment on lines 77 to 80
final int schemaId = nextSchemaId--;
subjectCache.put(subject, parsedSchema);
subjectToId.put(subject, schemaId);
idCache.put(schemaId, parsedSchema);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. What would we gain?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Make maintaining the 3 caches here easier and less error prone? MockSchemaRegistryClient seems to have similar logic inside.
  2. Make behavior same as prod schema registry. For example, MockSchemaRegistryClient could have register same subject differently instead of throwing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 if we can use the MockSchemaRegistryClient rather than maintaining our own caches for the newly registered subjects, that'd be nice. OTOH if it's a lot of effort I think the current approach is fine too. Let's just be careful about not throwing errors where we shouldn't be (see above).

Comment on lines 98 to 102
if (e.getStatus() == HttpStatus.SC_NOT_FOUND) {
final ParsedSchema schema = idCache.get(id);
if (schema != null) {
return schema;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put some comment why we do this?

Comment on lines 116 to 120
if (e.getStatus() == HttpStatus.SC_NOT_FOUND) {
final ParsedSchema schemaByName = subjectCache.get(subject);
final ParsedSchema schemaById = idCache.get(id);
if (schemaByName != null && schemaByName == schemaById) {
return schemaByName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 223 to 255
if (e.getStatus() == HttpStatus.SC_NOT_FOUND && subjectToId.containsKey(subject)) {
return subjectToId.get(subject);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjsax -- great test coverage! Comments inline.

@Override
public int register(final String subject, final ParsedSchema parsedSchema) {
if (subjectCache.containsKey(subject)) {
throw new IllegalStateException("Subject '" + subject + "' already in use.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should err on the side of false positives rather than false negatives. It'd be bad to have statements that actually work fail validation because then we might be introducing a regression (if a customer uses such statements in their workflows). In contrast, a false positive means we let some statements through validation that then fail in execution. Not great since multi-statements requests might have been partially executed, but also doesn't prevent any existing workflows.

Comment on lines 77 to 80
final int schemaId = nextSchemaId--;
subjectCache.put(subject, parsedSchema);
subjectToId.put(subject, schemaId);
idCache.put(schemaId, parsedSchema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 if we can use the MockSchemaRegistryClient rather than maintaining our own caches for the newly registered subjects, that'd be nice. OTOH if it's a lot of effort I think the current approach is fine too. Let's just be careful about not throwing errors where we shouldn't be (see above).

try {
return delegate.getSchemaById(id);
} catch (RestClientException e) {
// if we don't find the schema is SR, we try to get it from the sandbox cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo. Also three more occurrences of the same typo in this file.

Suggested change
// if we don't find the schema is SR, we try to get it from the sandbox cache
// if we don't find the schema in SR, we try to get it from the sandbox cache

try {
return delegate.getLatestSchemaMetadata(subject);
} catch (final RestClientException e) {
// if we don't find the schema metadata is SR, but there subject is registered inside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typos

Suggested change
// if we don't find the schema metadata is SR, but there subject is registered inside
// if we don't find the schema metadata in SR, but the subject is registered inside

@@ -0,0 +1,238 @@
/*
* Copyright 2018 Confluent Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: year

"http://foo:8080")
.withAdditionalConfig(
KsqlConfig.KSQL_SHARED_RUNTIME_ENABLED,
sharedRuntimes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we believe that shared runtimes might affect the behavior tested in this integration test? (Do we really need to test both versions of this?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just blindly copied from EndToEndIntegrationTest -- not even sure that the difference is. Happy to remove it.

"CREATE STREAM avro_input (a INT KEY, b VARCHAR)"
+ " WITH (KAFKA_TOPIC='t1', PARTITIONS=1, FORMAT='AVRO', WRAP_SINGLE_VALUE=false);"

// Then:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This usage of "When"/"Then" is different from how the pattern is used in other tests in the repo. "Then" is typically used for validations, but here the only validation is that the combination of statements successfully executes. Maybe we remove the "When"/"Then" and just add a comment like:

Suggested change
// Then:
// dependent statement also executes successfully

?

}

@Test
public void shouldRegisterAvroSchemaInSandboxViaCS() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we think primitive vs non-primitive schemas might be handled differently? I think we can cut the number of tests in half by only testing one of them.


@RunWith(Parameterized.class)
@Category({IntegrationTest.class})
public class DependentStatementsIntegrationTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great test coverage! Out of curiosity, did we ever add the analogous tests for statements with topic dependencies? Would be nice to do that if not (can be separate from this PR).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be covered (cf #4800)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR that fixed that issue only added unit test coverage, and no integration test coverage. I meant to say that it'd be nice to add an integration test into this new DependentStatementsIntegrationTest.java file for the topic dependency case too.

Copy link
Member

@lihaosky lihaosky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall! Some nits and please take a look at test failure.

final String... args
) {
final String formatted = format(statement, (Object[])args);
log.debug("Sending statement: {}", formatted);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove logging in test?

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjsax ! LGTM with some minor comments/questions inline.

}

private SandboxedSchemaRegistryClient() {
}

static final class SandboxSchemaRegistryCache implements SchemaRegistryClient {
private final MockSchemaRegistryClient mockedClient = new MockSchemaRegistryClient();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add comments explaining the difference between delegate and mockedClient here, i.e., what each is used for/represents? Without this context, this code is very difficult to understand.

We might also want to rename mockedClient to sandboxCacheClient or similar, in order to clarify within the code itself as well.

} catch (RestClientException e) {
// if we don't find the schema in SR, we try to get it from the sandbox cache
if (e.getStatus() == HttpStatus.SC_NOT_FOUND) {
return mockedClient.getSchemaById(id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The earlier version of this PR only returned from this line if the schema was present in the sandbox cache, and otherwise threw on the line below. This latest version of the PR now always returns from the mocked client representing the sandbox cache. Does the mocked client throw an error if the schema is not present? Otherwise there is a behavior change here (which might be fine, wondering if it's intentional).

Same comment/question for the other methods below too (getSchemaBySubjectAndId, getLatestSchemaMetadata, getVersion, and getId).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


@RunWith(Parameterized.class)
@Category({IntegrationTest.class})
public class DependentStatementsIntegrationTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR that fixed that issue only added unit test coverage, and no integration test coverage. I meant to say that it'd be nice to add an integration test into this new DependentStatementsIntegrationTest.java file for the topic dependency case too.

@mjsax
Copy link
Member Author

mjsax commented Jan 28, 2022

Test failed a second time, but it passes locally:

Expected: is <3>
     but: was <2>
	at io.confluent.ksql.physical.scalablepush.locator.AllHostsLocatorTest.shouldLocate(AllHostsLocatorTest.java:56)

Any idea? Let's see if it fails again in the next build.

@lihaosky
Copy link
Member

Is AllHostsLocatorTest failure related to #8665? Latest build failure seems to be a different one though:

KsqlResourceFunctionalTest.shouldInsertIntoValuesForAvroTopic:201->makeKsqlRequest:267 Failed to await result.msg: Failed to insert values into 'BOOKS'.        Could not serialize key: ['Metamorphosis']

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjsax , a couple more comments inline.

The AllHostsLocatorTest failure was unrelated (see #8664). Not sure about the new one.

}

@Test
public void shouldCreateDependentTopicWithDefaultReplicationInSandbox() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't actually test a topic dependency, does it? I thought the dependency issue fixed in the previous PR was if the first CREATE STREAM creates a topic and then the second one uses the newly created topic. (Second statement should reference the topic, not the stream.)

This dependency case (CS followed by CSAS) should already be tested in other integration tests throughout the codebase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the fix was about "inherit the replication factor". The second statement creates an output topic, and was not able to create it, because it did neither inherit the replication factor from upstream topic, nor did if fetch default replication factor from Kafka.

Happy to remove this test again -- I thought you asked to add it. I can also add a different one that use two CS statements, but it also seems redundant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, huh! I totally misunderstood the original ticket for that issue then, haha. If this integration test covers the fix, then that sounds great to me. Thanks for adding it!

Copy link
Contributor

@vcrfxia vcrfxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjsax ! LGTM.

@mjsax
Copy link
Member Author

mjsax commented Jan 31, 2022

Another (different) test failure:

java.lang.AssertionError: Failed to await result.msg: Failed to insert values into 'BOOKS'. Could not serialize key: ['Metamorphosis']
	at io.confluent.ksql.rest.integration.KsqlResourceFunctionalTest.makeKsqlRequest(KsqlResourceFunctionalTest.java:267)
	at io.confluent.ksql.rest.integration.KsqlResourceFunctionalTest.shouldInsertIntoValuesForAvroTopic(KsqlResourceFunctionalTest.java:201)

Failed locally, too. Is this related to broken master ?

@mjsax mjsax merged commit ba572e0 into confluentinc:master Feb 3, 2022
@mjsax mjsax deleted the gh1394-schema-inference branch February 3, 2022 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants