Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not Merge - Mistakenly Opened #6960

Closed
wants to merge 71 commits into from
Closed

Do not Merge - Mistakenly Opened #6960

wants to merge 71 commits into from

Conversation

gokhansari
Copy link

@gokhansari gokhansari commented Jun 18, 2019

Mistakenly opened on wrong branch

colinhicks and others added 30 commits May 20, 2019 15:25
Some versions of OpenJDK 11 do not properly handle external javadocs links referencing previous Java versions. See: https://bugs.openjdk.java.net/browse/JDK-8212233.

Failure symptom:
`> Task :connect:api:javadoc
javadoc: error - The code being documented uses modules but the packages defined in https://docs.oracle.com/javase/8/docs/api/ are in the unnamed module.
1 error`

This PR conditionally sets the Java api docs link for the affected Gradle tasks. I verified that the links render correctly in the generated documentation when building with `1.8.0_181` and `11.0.3`. For example, in `build/docs/javadoc/org/apache/kafka/connect/source/SourceTask.html` the hyperlink to `java.nio.channels.Selector` points to a valid page on Oracle's site in both cases.

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>

Closes #5041 from omkreddy/KAFKA-3143

(cherry picked from commit d77bac1)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
Fix a javadoc mistake introduced in https://github.com/apache/kafka/pull/5911/files#diff-35e3523474fa277a63e36a3fe9e22af8.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
…6582)

Reviewers: Jason Gustafson <jason@confluent.io>, Colin Patrick McCabe <cmccabe@confluent.io>, Andrew Olson <aolson1@cerner.com>, José Armando García Sancio <jsancio@users.noreply.github.com>
Reviewers: Boyang Chen <bchen11@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>
Reviewer: Randall Hauch <rhauch@gmail.com>
Return a copy of the ConfigDef in Client Configs. Related to KIP-458.

Author: Magesh Nandakumar <magesh.n.kumar@gmail.com
Reviewer: Randall Hauch <rhauch@gmail.com>
Reviewers: Jason Gustafson <jason@confluent.io>
KIP-345 and KIP-392 introduced a couple breaking changes for old versions of bumped protocols. This patch fixes them.

Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Boyang Chen <bchen11@outlook.com>, Guozhang Wang <wangguoz@gmail.com>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #6787 from omkreddy/transaction_test

(cherry picked from commit 5ca6a2e)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
…am (#6779)

As title states. We plan to merge this to both trunk and 2.3 if it could fix the stream system tests globally.
Reference implementation: #6673

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>
…mmit (#6579)

Prior to this change, the next commit time advances
_each_ time a commit happens -- including when a commit happens
because it was requested by the `Task`. When a `Task` requests a
commit several times, the clock advances far into the future
which prevents expected periodic commits from happening.

This commit changes the behavior, we reset `nextCommit` relative
to the time of the commit.

Reviewers: Jason Gustafson <jason@confluent.io>
…ing testing docker containers (#6797)

Reviewers: Colin P. McCabe <cmccabe@apache.org>
… excluded from class loading isolation (#6796)

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
…ent overrides (#6789)

Because of how config values are converted into strings in the `AbstractHerder.validateClientOverrides()` method after being validated by the client override policy, an exception is thrown if the value returned by the policy isn't already parsed as the type expected by the client `ConfigDef`. The fix here involves parsing client override properties before passing them to the override policy.

A unit test is added to ensure that several different types of configs are validated properly by the herder.

Author: Chris Egerton <chrise@confluent.io>
Reviewers: Magesh Nandakumar <magesh.n.kumar@gmail.com>, Randall Hauch <rhauch@gmail.com>
These are important to ensure we don't break compatibility.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Gwen Shapira

Closes #6794 from ijuma/update-version-compat-tests
Important fix: FasterXML/jackson-databind#2326

Reviewers: Colin P. McCabe <cmccabe@apache.org>
Reviewers: John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Boyang Chen <boyang@confluent.io>
The old docs here used a now deprecated method to set the block cache size. In switching over to the new one we would now need to construct a Cache object and therefore also need to close it, so this is a good opportunity to demonstrate the RocksDBConfigSetter#close method that will need to be implemented by users.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>
This gets hit when debug logging is enabled.

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
…pports latest version (#6806)

In the olden days, OffsetForLeaderEpoch was exclusively an inter-broker protocol and
required Cluster level permission. With KIP-320, clients can use this API as well and
so we lowered the required permission to Topic Describe. The only way the client can
be sure that the new permissions are in use is to require version 3 of the protocol
which was bumped for 2.3. If the broker does not support this version, we skip the
validation and revert to the old behavior.

Additionally, this patch fixes a problem with the newly added replicaId field when
parsed from older versions which did not have it. If the field was not present, then
we used the consumer's sentinel value, but this would limit the range of visible
offsets by the high watermark. To get around this problem, this patch adds a
separate "debug" sentinel similar to APIs like Fetch and ListOffsets.

Reviewers: Ismael Juma <ismael@juma.me.uk>
…ents (#6722)

When cleaning transactional data, we need to keep track of which transactions still have data associated with them so that we do not remove the markers. We had logic to do this, but the state was not being carried over when beginning cleaning for a new set of segments. This could cause the cleaner to incorrectly believe a transaction marker was no longer needed. The fix here carries the transactional state between groups of segments to be cleaned.

Reviewers: Dhruvil Shah <dhruvil@confluent.io>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jun Rao <junrao@gmail.com>
…ion is possible (#6823)

The consumer should await api version information before determining whether the broker supports offset validation. In KAFKA-8422, we skip the validation if we don't have api version information, which means we always skip validation the first time we connect to a node. This bug was detected by the failing system test `tests/client/truncation_test.py`. The test passes again with this fix.

Reviewers: Ismael Juma <ismael@juma.me.uk>
* add doc changes for static membership release

* address comments

* address comments for a separate page

* address Matthias' comment
Remove created metrics when shutting down `ControllerEventManager`. This fixes transient failures in `ControllerEventManagerTest.testEventQueueTime` and is generally good hygiene.

Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>
…6808)

Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>,  Bill Bejeck <bbejeck@gmail.com>
…ker (#6812)

Authorized operations must be null when talking to a pre-KIP-430 broker.
If we present this as the empty set instead, it is impossible for clients
to know if they have no permissions, or are talking to an old broker.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Boyang Chen <boyang@confluent.io>
…6831)

As we are planning to add on more supporting features for rebalancing under static membership, we need to make sure the behavior for `group.instance.id` is consistent throughout the whole stack. This patch ensures that the default value is null in the JoinGroup response.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
vvcephei and others added 27 commits May 31, 2019 14:01
See also #6684

KTable processors must be supplied with a KTableProcessorSupplier, which in turn requires implementing a ValueGetter, for use with joins and groupings.

For suppression, a correct view only includes the previously emitted values (not the currently buffered ones), so this change also involves pushing the Change value type into the suppression buffer's interface, so that it can get the prior value upon first buffering (which is also the previously emitted value).

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
…d value is null (#6842)

When the restored record value is null, we are in danger of NPE during restoration phase.

Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
…aves (#6859)

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Reviewers: Randall Hauch <rhauch@gmail.com>, Matthias J. Sax <mjsax@apache.org>,  Bill Bejeck <bbejeck@gmail.com>
…6795)

Since the originals map passed to AbstractConfig constructor may be immutable, avoid updating this map while resolving indirect config variables. Instead a new ResolvingMap instance is now used to store resolved configs.

Reviewers: Randall Hauch <rhauch@gmail.com>, Boyang Chen <bchen11@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
…h KIP-297 (#6750)

According to KIP-297 a parameter is passed to ConfigProvider with syntax "config.providers.{name}.param.{param-name}". Currently AbstractConfig allows parameters of the format "config.providers.{name}.{param-name}". With this fix AbstractConfig will be consistent with KIP-297 syntax.

Reviewers: Robert Yokota <rayokota@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
…ative rebalancing (#6850)

Restart task on reconfiguration under incremental cooperative rebalancing, and keep execution paths separate for config updates between eager and cooperative. Include the group generation in the log message when the worker receives its assignment.

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
Temporarily restore the SslFactory.sslContext() function, which some connectors use. This function is not a public API and it will be removed eventually. For now, we will mark it as deprecated.
…ebalancing (#6872)

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
…EST API (#6791)

When Connect forwards a REST request from one worker to another, the Authorization header was not forwarded. This commit changes the Connect framework to add include the authorization header when forwarding requests to other workers.

Author: Hai-Dang Dam <damquanghaidang@gmail.com>
Reviewers: Robert Yokota <rayokota@gmail.com>, Randall Hauch <rhauch@gmail.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Boyang Chen <boyang@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confuent.io>
Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Gwen Shapira, David Arthur, James Cheng, Vahid Hashemian

Closes #6869 from cmccabe/KAFKA-4893

(cherry picked from commit e6563aa)
Signed-off-by: Gwen Shapira <cshapi@gmail.com>
Jason Gustafson <jason@confluent.io>
…6762)

The Dead state in the coordinator is used for groups which are either pending deletion or migration to a new coordinator. Currently requests received while in this state result in an UNKNOWN_MEMBER_ID which causes consumers to reset the memberId. This is a problem for KIP-345 since it can cause an older member to fence a newer member. This patch changes the error code returned in this state to COORDINATOR_NOT_AVAILABLE, which causes the consumer to rediscover the coordinator, but not reset the memberId.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
In RocksDBTimestampedStore#openRocksDB we try to open a db with two column families. If this succeeds but the first column family is empty (db.newIterator.seekToFirst.isValid() == false) we never actually close its ColumnFamilyHandle

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
…ctionDisabled Test

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Boyang Chen <boyang@confluent.io>

Closes #6887 from omkreddy/unclean-leader

(cherry picked from commit 060701f)
Signed-off-by: Manikumar Reddy <manikumar@confluent.io>
We see this failure from time to time:
```
java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:633)
	at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:512)
```
The cause is probably that we are using `consumeRecordsFor` which has no expectation on the number of records to fetch and a timeout of just 1s. This patch changes the code to use `consumeRecords` and the default 15s timeout.

Note we have also fixed a bug in the test case itself, which was using the wrong topic for the second write, which meant it could never have failed in the anticipated way anyway.

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Gwen Shapira

Closes #6905 from hachikuji/fix-flaky-transaction-test

(cherry picked from commit bb8de0b)
Signed-off-by: Gwen Shapira <cshapi@gmail.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
We see the upgrade test failing from time to time. I looked into it and found that the root cause is basically that the test throughput can be too high for the 0.9 producer to make progress. Eventually it reaches a point where it has a huge backlog of timed out requests in the accumulator which all have to be expired. We see a long run of messages like this in the output:

```
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335160","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386132,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335163","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335166","key":null}
{"exception":"class org.apache.kafka.common.errors.TimeoutException","time_ms":1559907386133,"name":"producer_send_error","topic":"test_topic","message":"Batch Expired","class":"class org.apache.kafka.tools.VerifiableProducer","value":"335169","key":null}
```
This can continue for a long time (I have observed up to 1 min) and prevents the producer from successfully writing any new data. While it is busy expiring the batches, no data is getting delivered to the consumer, which causes it to eventually raise a timeout.
```
kafka.consumer.ConsumerTimeoutException
at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:50)
at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:109)
at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:69)
at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:47)
at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
```
The fix here is to reduce the throughput, which seems reasonable since the purpose of the test is to verify the upgrade, which does not demand heavy load. Note that I investigated several failing instances of this test going back to 1.0 and saw a similar pattern, so there does not appear to be a regression.

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Gwen Shapira

Closes #6907 from hachikuji/lower-throughput-for-upgrade-test

(cherry picked from commit c7c310b)
Signed-off-by: Gwen Shapira <cshapi@gmail.com>
We've seen `ReplicaVerificationToolTest.test_replica_lags` fail occasionally due to errors such as the following:
```
RemoteCommandError: ubuntuworker7: Command 'kill -15 2896' returned non-zero exit status 1. Remote error message: bash: line 0: kill: (2896) - No such process
```
The problem seems to be a shutdown race condition when using `max_messages` with the producer. The process may already be gone which will cause the signal to fail.

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Gwen Shapira

Closes #6906 from hachikuji/fix-failing-replicat-verification-test

(cherry picked from commit 2feb44e)
Signed-off-by: Gwen Shapira <cshapi@gmail.com>
Reviewers: Joel Hamill <joel@confluent.io>, Jim Galasyn <jim.galasyn@confluent.io>, Matthias J. Sax <matthias@confluent.io>
The ResetIntegrationTest has experienced several failures and it seems the current timeout of 10 seconds may not be enough time

Reviewers: Matthias J. Sax <mjsax@apache.org>, Boyang Chen <boyang@confluent.io>
This PR fixes a bug in static group membership. Previously we limit the `member.id` replacement in JoinGroup to only cases when the group is in Stable. This is error-prone and could potentially allow duplicate consumers reading from the same topic. For example, imagine a case where two unknown members join in the `PrepareRebalance` stage at the same time.

The PR fixes the following things:

1. Replace `member.id` at any time we see a known static member rejoins group with unknown member.id
2. Immediately fence any ongoing join/sync group callback to early terminate the duplicate member.
3. Clearly handle Dead/Empty cases as exceptional.
4. Return old leader id upon static member leader rejoin to avoid trivial member assignment being triggered.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
The idempotent producer attempts to detect spurious UNKNOWN_PRODUCER_ID errors and handle them by reassigning sequence numbers to the inflight batches. The inflight batches are tracked in a PriorityQueue. The problem is that the reassignment of sequence numbers depends on the iteration order of PriorityQueue, which does not guarantee any ordering. So this can result in sequence numbers being assigned in the wrong order.  This patch fixes the problem by using a sorted set instead of a priority queue so that the iteration order preserves the sequence order. Note that resetting sequence numbers is an exceptional case.

This patch also fixes KAFKA-8484, which can cause an IllegalStateException when the producerId is reset while there are pending produce requests inflight. The solution is to ensure that sequence numbers are only reset if the producerId of a failed batch corresponds to the current producerId.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
…andler

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
(cherry picked from commit 3581429)
Update docs with KIP-415 details for incremental cooperative rebalancing

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
@gokhansari gokhansari closed this Jun 18, 2019
@gokhansari gokhansari changed the title KAFKA-8554 Generate Topic/Key from Json Transform Do not Merge - KAFKA-8554 Jun 18, 2019
@gokhansari gokhansari changed the title Do not Merge - KAFKA-8554 Do not Merge - Mistakenly Opened Jun 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet