Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-15307: update/note deprecated configs #14360

Merged
merged 20 commits into from May 10, 2024

Conversation

Cerchie
Copy link
Contributor

@Cerchie Cerchie commented Sep 8, 2023

First pass at identifying deprecated configs listed in https://kafka.apache.org/21/documentation/streams/developer-guide/config-streams

There might be some discussion to be had regarding what to say about why these are deprecated.

Also, in the StreamsConfig.java line 1355, some logic throws errors when EXACTLY_ONCE, EXACTLY_ONCE_BETA, and RETRIES are used. Should that be updated as well, considering that there are new pieces of config with the @Deprecated annotation?

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@@ -259,7 +259,7 @@ <h4><a class="toc-backref" href="#id23">num.standby.replicas</a><a class="header
</tr>
<tr class="row-even"><td>cache.max.bytes.buffering</td>
<td>Medium</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads.</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads. Deprecated.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add "Deprecated" to the "head line" in L260 instead? (Similar below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! done

@mjsax
Copy link
Member

mjsax commented Sep 12, 2023

There might be some discussion to be had regarding what to say about why these are deprecated.

Not sure if we need to put this into the docs? If we want to say something about it, we could add a link to the corresponding KIP, ie, "Deprecated via KIP-xxx" -- the KIP will contain the necessary information.

What we should do in any case is, link to configs that should be used instead of the deprecated ones (seems we cannot do this in this PR though, as some new config are still missing in the doc -- or we add them in this PR directly)

Also, in the StreamsConfig.java line 1355, some logic throws errors when EXACTLY_ONCE, EXACTLY_ONCE_BETA, and RETRIES are used. Should that be updated as well, considering that there are new pieces of config with the @deprecated annotation?

Yes, very good idea to extend this code and also log an error if other deprecated configs are used.

@Cerchie
Copy link
Contributor Author

Cerchie commented Sep 12, 2023

There might be some discussion to be had regarding what to say about why these are deprecated.

Not sure if we need to put this into the docs? If we want to say something about it, we could add a link to the corresponding KIP, ie, "Deprecated via KIP-xxx" -- the KIP will contain the necessary information.

What we should do in any case is, link to configs that should be used instead of the deprecated ones (seems we cannot do this in this PR though, as some new config are still missing in the doc -- or we add them in this PR directly)

Also, in the StreamsConfig.java line 1355, some logic throws errors when EXACTLY_ONCE, EXACTLY_ONCE_BETA, and RETRIES are used. Should that be updated as well, considering that there are new pieces of config with the @deprecated annotation?

Yes, very good idea to extend this code and also log an error if other deprecated configs are used.

happy to add those new config in this PR -- shall I also add that StreamsConfig.java logic update in this PR, or separate it out since it's a code update rather than a docs one?

@mjsax
Copy link
Member

mjsax commented Sep 13, 2023

happy to add those new config in this PR

Great.

shall I also add that StreamsConfig.java logic update in this PR, or separate it out since it's a code update rather than a docs one?

Guess both ways work, but I tend to think splitting out a new PR might be better.

@Cerchie
Copy link
Contributor Author

Cerchie commented Sep 25, 2023

happy to add those new config in this PR

Great.

shall I also add that StreamsConfig.java logic update in this PR, or separate it out since it's a code update rather than a docs one?

Guess both ways work, but I tend to think splitting out a new PR might be better.

Here's the new PR: #14448

Copy link

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or appropriate release branch)

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions bot added the stale Stale PRs label Dec 25, 2023
@Cerchie Cerchie requested a review from mjsax January 16, 2024 18:40
@github-actions github-actions bot removed the stale Stale PRs label Jan 17, 2024
<tr class="row-even"><td>cache.max.bytes</td>
<td>Medium</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads. Note that at the debug level you can use cache.size to monitor the actual size of the Kafka Streams cache.</td>
<td>10485760 bytes</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<td>10485760 bytes</td>
<td>10485760</td>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description already say Maximum number of memory bytes so seems we can drop bytes here?

<tr class="row-even"><td>cache.max.bytes.buffering</td>
<tr class="row-even"><td>cache.max.bytes</td>
<td>Medium</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads. Note that at the debug level you can use cache.size to monitor the actual size of the Kafka Streams cache.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads. Note that at the debug level you can use cache.size to monitor the actual size of the Kafka Streams cache.</td>
<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads. Note that at the debug level you can use <code>cache.size</code> to monitor the actual size of the Kafka Streams cache.</td>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% sue what

Note that at the debug level you can use cache.size to monitor the actual size of the Kafka Streams cache

actually mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not done it myself, basing it on what is said in the release https://www.confluent.io/blog/apache-kafka-3-4-0-new-features-and-updates/

@@ -257,7 +258,12 @@ <h4><a class="toc-backref" href="#id23">num.standby.replicas</a><a class="header
<td colspan="2">The maximum number of records to buffer per partition.</td>
<td>1000</td>
</tr>
<tr class="row-even"><td>cache.max.bytes.buffering</td>
<tr class="row-even"><td>cache.max.bytes</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<tr class="row-even"><td>cache.max.bytes</td>
<tr class="row-even"><td>statestore.cache.max.bytes</td>

Serde for the inner class of a windowed record. Must implement the org.apache.kafka.common.serialization.Serde interface.
</p>
<p>
Note that setting this config in KafkaStreams application would result in an error as it is meant to be used only from Plain consumer client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence confuses me. I realize that it's also in the Java code, but not sure what it actually means? Maybe @ableegoldman who reviewed the original PR can help out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bear with me here because it's kind of a long story in full and I only half remember it...and also this statement is only half true. A more accurate statement would be something like this:

"Note that this config is only used by plain consumer/producer clients that set a windowed de/serializer type via configs. For Kafka Streams applications that deal with windowed types you must pass in the inner serde type when you instantiate the windowed serde object for your topology"

In other words, there's no concept of a "default windowed serde" type in Streams because you always have to pass in the actual Serde object, and need to pass in the window size and inner serde type at that time. The same can also be done for plain producer/consumer clients but someone pointed out that the console clients have to be configured via properties, which means they can only invoke the default constructor based on the class name passed in for the key/value de/serializer configs. Which meant we needed to add the window size and inner serde type as configs that could be used to configure a windowed Serde that was instantiated via default constructor

I guess we should update the docs string in StreamsConfig as well. I assume this error slipped in because of the window.size config, which is only needed for the consumer client (since the producer client throws away the window size when encoding a windowed type). But both producer and consumer need to know the inner serde type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Sophie! Changed the sentence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sophie.

Note that this config is only used by plain consumer/producer clients

For this case, why are we documenting is in KS docs -- should it not be in clients docs? (Also, this applies to window.size.ms introduced in 2.8 via https://cwiki.apache.org/confluence/display/KAFKA/KIP-659%3A+Improve+TimeWindowedDeserializer+and+TimeWindowedSerde+to+handle+window+size, right, but not to windowed.inner.serde.class which is a KS config added in 3.0 via https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177047930

In the end, KS module provides the window-serdes and thus it does not make sense to add window.size.ms to ClientConfig -- especially, as you pointed out, because is only necessary for console consumer.

Thus, while StreamsConfig#WINDOW_SIZE_MS_CONFIG must exist as a variable name, I am wondering if it's actually correct that we added it as a StreamsConfig, ie, via define(...)? Mabye we should do a small KIP and remove it? -- For use, we should not mention window.size.ms in KS docs on the web-page (at least not for "top level config" -- we should either add it to a "windowed serde" section, or to (console) consumer config section where it belong to)?

window.inner.serde.class is a KS config and should just be documented in the regular way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to the blog post I don't know where the cache.size metric is coming from-- because in the quoted KIP-770 it appears to be cache-size-bytes-total .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess window.size.ms is just in a weird place somewhere between a client config and a streams config. Given that it's not for use in an actual Kafka Streams application, it makes sense to me that we should not include it in the Streams config docs. Are you proposing to move it to ClientConfig or just not .define it in any class? In the end, we added it via KIP and would need to move/remove it via KIP, and I'm not sure it's worth doing a KIP over. We can just leave it out of the Streams config docs

Also -- this applies to both window.size.ms and window.inner.serde.class, no? I don't understand this bit:

window.inner.serde.class is a KS config and should just be documented in the regular way.

They are essentially the same kind of config, they just refer to the two different parameters of a windowed serde. If window.size.ms is not a KS config, then neither is window.inner.serde.class -- and vice versa

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you proposing to move it to ClientConfig or just not .define it in any class? In the end, we added it via KIP and would need to move/remove it via KIP, and I'm not sure it's worth doing a KIP over. We can just leave it out of the Streams config docs

Yes, the idea was to maybe do a small KIP and remove it from StreamsConfig as it does not belong there. If Lucia is interested in doing it -- I agree it's not super critical, and just removing from the docs is also fine with me.

About window.inner.serde.class it is indeed used by Kafka Streams itself (cf TimeWindowedSerializer and others). If one would set default serde to TimeWindowedSerializer it would be used, so I think it's ok to have it in the docs, and also correct to have it as StreamsConfig registered via define().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did some more digging, and now I actually think that @ableegoldman is right, we might want to treat windowed.inner.serde.class similar to window.size... (ie, maybe remove from StreamsConfig -- we could add this to the KIP Lucia started).

I also understand now, why the docs says, using it would result in an error (for both configs): Kafka Streams will always pass window-size and inner-serde via the constructor and we will also verify that we don't get an parameter set twice (or zero time), and throw an error inside configure() method of the windowed serdes...

Thus, we might want to not add windowed.inner.serde.class to the docs in this PR to begin with...

Sorry for the back and forth. Reading and understanding code is hard...

@Cerchie Cerchie force-pushed the update/docs-deprecated-configs branch from 14fb849 to 5a258b4 Compare February 23, 2024 21:58
docs/streams/developer-guide/config-streams.html Outdated Show resolved Hide resolved
docs/streams/developer-guide/config-streams.html Outdated Show resolved Hide resolved
docs/streams/developer-guide/config-streams.html Outdated Show resolved Hide resolved
docs/streams/developer-guide/config-streams.html Outdated Show resolved Hide resolved
Serde for the inner class of a windowed record. Must implement the org.apache.kafka.common.serialization.Serde interface.
</p>
<p>
Note that setting this config in KafkaStreams application would result in an error as it is meant to be used only from Plain consumer client.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did some more digging, and now I actually think that @ableegoldman is right, we might want to treat windowed.inner.serde.class similar to window.size... (ie, maybe remove from StreamsConfig -- we could add this to the KIP Lucia started).

I also understand now, why the docs says, using it would result in an error (for both configs): Kafka Streams will always pass window-size and inner-serde via the constructor and we will also verify that we don't get an parameter set twice (or zero time), and throw an error inside configure() method of the windowed serdes...

Thus, we might want to not add windowed.inner.serde.class to the docs in this PR to begin with...

Sorry for the back and forth. Reading and understanding code is hard...

@Cerchie
Copy link
Contributor Author

Cerchie commented Apr 30, 2024

tagging @mjsax here, made some edits in response to the last roung

<td colspan="2">Maximum number of memory bytes to be used for record caches across all threads.</td>
<td>10485760</td>
</tr>
<tr class="row-even"><td>cache.max.bytes.buffering (Deprecated. Use cache.max.bytes instead.)</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<tr class="row-even"><td>cache.max.bytes.buffering (Deprecated. Use cache.max.bytes instead.)</td>
<tr class="row-even"><td>cache.max.bytes.buffering (Deprecated. Use statestore.cache.max.bytes instead.)</td>

@@ -326,8 +331,15 @@ <h4><a class="toc-backref" href="#id23">num.standby.replicas</a><a class="header
the <code>org.apache.kafka.streams.state.DslStoreSuppliers</code> interface.
</td>
<td><code>BuiltInDslStoreSuppliers.RocksDBDslStoreSuppliers</code></td>
<td colspan="2">Default serializer/deserializer for the inner class of windowed keys, implementing the <code class="docutils literal"><span class="pre">Serde</span></code> interface.</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this line does not belong to dsl.store.suppliers.class config?

</tr>
<tr class="row-even"><td>max.task.idle.ms</td>
<tr class="row-even"><td>default.windowed.value.serde.inner (Deprecated.)</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems default.windowed.value.serde.inner exist already? Why do we add it here?

@@ -493,7 +493,8 @@ public class StreamsConfig extends AbstractConfig {
public static final String BUILT_IN_METRICS_VERSION_CONFIG = "built.in.metrics.version";
private static final String BUILT_IN_METRICS_VERSION_DOC = "Version of the built-in metrics to use.";

/** {@code cache.max.bytes.buffering} */
/** {@code cache.max.bytes.buffering}
* @deprecated since 3.4.0 Use {@link #CACHE_MAX_BYTES_CONFIG "cache.max.bytes"} instead. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @deprecated since 3.4.0 Use {@link #CACHE_MAX_BYTES_CONFIG "cache.max.bytes"} instead. */
* @deprecated since 3.4.0 Use {@link #STATESTORE_CACHE_MAX_BYTES_CONFIG "statestore.cache.max.bytes"} instead. */

@SuppressWarnings("WeakerAccess")
@Deprecated
public static final String DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS = "default.windowed.value.serde.inner";
private static final String DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS_DOC = "Default serializer / deserializer for the inner class of a windowed value. Must implement the " +
"<code>org.apache.kafka.common.serialization.Serde</code> interface.";

public static final String WINDOWED_INNER_CLASS_SERDE = "windowed.inner.class.serde";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we cannot just remove it, but also marked as deprecated only instead? Also, given that deprecation is part of KIP-1020, should this be only done in the KIP-1020 PR, and this PR would only do the docs cleanup?

@@ -647,7 +645,8 @@ public class StreamsConfig extends AbstractConfig {
@SuppressWarnings("WeakerAccess")
public static final String METRICS_SAMPLE_WINDOW_MS_CONFIG = CommonClientConfigs.METRICS_SAMPLE_WINDOW_MS_CONFIG;

/** {@code auto.include.jmx.reporter} */
/** {@code auto.include.jmx.reporter
* @deprecated and will removed in 4.0.0 Use {@link JMX_REPORTER "jmx.reporter"} instead.} */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find! Did not even know about this one.

@@ -300,7 +305,7 @@ <h4><a class="toc-backref" href="#id23">num.standby.replicas</a><a class="header
set by the user or all serdes must be passed in explicitly (see also default.key.serde).</td>
<td><code class="docutils literal"><span class="pre">null</span></code></td>
</tr>
<tr class="row-even"><td>default.windowed.key.serde.inner</td>
<tr class="row-even"><td>default.windowed.key.serde.inner (Deprecated.)</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is default.window.value.serde.inner which was also deprecated, right? (L308 original, new L313)

@Cerchie Cerchie requested a review from mjsax May 3, 2024 20:10
Copy link
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one last comment, but I think you can address it via #14448.

<td>Medium</td>
<td colspan="2">Default serializer/deserializer for the inner class of windowed keys, implementing the <code class="docutils literal"><span class="pre">Serde</span></code> interface.</td>
<td><code class="docutils literal"><span class="pre">null</span></code></td>
</tr>
<tr class="row-odd"><td>default.windowed.value.serde.inner</td>
<tr class="row-odd"><td>default.windowed.value.serde.inner (Deprecated. Use windowed.inner.class.serde instead.)</td>
<td>Medium</td>
<td colspan="2">Default serializer/deserializer for the inner class of windowed values, implementing the <code class="docutils literal"><span class="pre">Serde</span></code> interface.</td>
<td><code class="docutils literal"><span class="pre">null</span></code></td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just see by change, that we use a different formatting for marking a deprecated config:

<tr class="row-even"><td>default.dsl.store</td>
  <td>Low</td>
  <td colspan="2">
    DEPRECATED] The default state store type used by DSL operators. Deprecated in
    favor of <code>dsl.store.suppliers.class</code>
    </td>

I think we should use a unified formatting -- don't have a preference which one.

@mjsax mjsax merged commit 31528f5 into apache:trunk May 10, 2024
1 check failed
@mjsax
Copy link
Member

mjsax commented May 10, 2024

Thanks for the PR @Cerchie! Merged to trunk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants