KAFKA-4547 (0.10.1 hotfix): Avoid unnecessary offset commit that could lead to an invalid offset position if partition is paused #2415

Closed
vahidhashemian wants to merge 130 commits

Conversation

vahidhashemian
Contributor

No description provided.

hachikuji and others added 30 commits September 19, 2016 21:19
Author: Ben Stopford <benstopford@gmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1881 from benstopford/KAFKA-4193
Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Damian Guy <damian.guy@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes apache#1887 from enothereska/hotfix-metadata-unavailable
…strapTwoBrokersWithFollowerThrottle

Build is unstable, so it's hard to validate this change. Of the various builds up until 11am BST the test ran twice and passed twice.

Author: Ben Stopford <benstopford@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1873 from benstopford/KAFKA-4184
The ReassignPartitionsTest system test doesn't reassign any replicas (i.e. move data).

This is a simple issue. It uses a 3 node cluster with replication factor of 3, so whilst the replicas are jumbled around, nothing actually is moved from machine to machine when the assignment is executed.

This fix just ups the number of nodes to 4 so things move.
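
Why nothing moved can be sketched in a few lines. This is an illustration only, not code from the system test: `brokers_gaining_data` and the sample assignments are hypothetical. It compares each partition's replica set before and after a reassignment and reports which brokers would have to copy data.

```python
def brokers_gaining_data(before, after):
    """Map each partition to the brokers that newly host it (hypothetical helper)."""
    moves = {}
    for partition, old_replicas in before.items():
        gained = set(after[partition]) - set(old_replicas)
        if gained:
            moves[partition] = gained
    return moves


# 3 brokers, replication factor 3: every broker already holds every partition,
# so reordering the replica list copies no data.
three_before = {0: [0, 1, 2], 1: [1, 2, 0]}
three_after = {0: [2, 0, 1], 1: [0, 1, 2]}

# 4 brokers, replication factor 3: a partition's replica set can actually change.
four_before = {0: [0, 1, 2], 1: [1, 2, 3]}
four_after = {0: [1, 2, 3], 1: [0, 1, 2]}

print(brokers_gaining_data(three_before, three_after))  # no brokers gain data
print(brokers_gaining_data(four_before, four_after))    # brokers 3 and 0 gain data
```

With three brokers the result is empty for any shuffle, which is the condition the original test could never escape.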

Tests pass locally.
There are runs pending on the two branch builders

Passes:
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/551/
https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/94/
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/553/
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/554/
https://jenkins.confluent.io/job/system-test-kafka-branch-builder-2/95

Failures:
https://jenkins.confluent.io/job/system-test-kafka-branch-builder/552 => _RuntimeError: There aren't enough available nodes to satisfy the resource request. Total cluster size: 1, Requested: 4, Already allocated: 1, Available: 0._ I assume this has to do with the test env.

Author: Ben Stopford <benstopford@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1892 from benstopford/fix_reassignment_test
missing javadoc on public API method PersistentKeyValueFactory.enableCaching

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Eno Thereska, Guozhang Wang

Closes apache#1891 from dguy/minor-java-doc
Remove isValidCleanupPolicy and related fields as they are never used.

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Eno Thereska, Guozhang Wang

Closes apache#1888 from dguy/minor-remove-unused
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Guozhang Wang

Closes apache#1898 from hachikuji/KAFKA-3782
Minor comment fixes.

Author: Elias Levy <fearsome.lucidity@gmail.com>

Reviewers: Guozhang Wang

Closes apache#1885 from eliaslevy/fix-test-comments
The original commit interval of 30 seconds might be too large in some cases, e.g., when the verifier finishes before those 30 seconds have elapsed.

Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Damian Guy, Guozhang Wang

Closes apache#1899 from enothereska/hotfix-smoke-test-commit-interval
…rver`

We had a number of failures recently due to these timeouts being too low. It's a particular problem if multiple forks are used while running the tests.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1889 from ijuma/increase-zk-timeout-in-tests
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1866 from hachikuji/rebalance-delay-test-cases
… topic

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>

Closes apache#1859 from hachikuji/KAFKA-3590
Technically this does not strictly adhere to RFC-952; however, it is valid for domain names, URLs and URIs, so we should loosen the requirements a tad.

Author: Ryan Pridgeon <ryan.n.pridgeon@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1856 from rnpridgeon/KAFKA-3719
- Updated implementation docs with details on Cluster Id generation.
- Mention cluster id in "noteworthy changes for 0.10.1.0" in upgrade docs.

Author: Sumit Arrawatia <sumit.arrawatia@gmail.com>
Author: arrawatia <sumit.arrawatia@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1895 from arrawatia/kip-78-docs
…ified

Author: Arun Mahadevan <aiyer@hortonworks.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>

Closes apache#1376 from arunmahadevan/cons-consumer-fix
Fix the existing client-id quota test, which currently doesn't configure quota overrides correctly. Add new tests for user and (user, client-id) quota overrides and default quotas.

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes apache#1860 from rajinisivaram/KAFKA-4055
… be caught

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes apache#1907 from hachikuji/catch-wakeup-worker-sink-task
… are bounced during reassignment

There is a corner case bug, where during partition reassignment, if the
controller and a broker receiving a new replica are bounced at the same
time, the partition reassignment fails.

The cause of this bug is a block of code in the KafkaController which
fails the reassignment if aliveNewReplicas != newReplicas, i.e. if
some of the new replicas are offline at the time a controller fails
over.

The fix is to have the controller listen for ISR change events even for
new replicas which are not alive when the controller boots up. Once the
said replicas come online, they will be in the ISR set, and the new
controller will detect this, and then mark the reassignment as
successful.
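
The difference between the two behaviours can be sketched as follows. This is a simplified model, not the KafkaController code: the function names, return strings, and broker ids are hypothetical, and only the `aliveNewReplicas != newReplicas` comparison is taken from the description above.

```python
def old_failover_check(new_replicas, alive_brokers):
    """Pre-fix behaviour as described: fail if any new replica is offline."""
    alive_new_replicas = [r for r in new_replicas if r in alive_brokers]
    if alive_new_replicas != new_replicas:
        return "failed"
    return "in_progress"


def new_isr_change_check(new_replicas, isr):
    """Post-fix behaviour as described: keep listening until all new replicas are in the ISR."""
    if set(new_replicas) <= set(isr):
        return "completed"
    return "in_progress"


new_replicas = [1, 2, 4]
alive_at_failover = {1, 2}  # broker 4 is bouncing together with the controller

print(old_failover_check(new_replicas, alive_at_failover))  # reassignment is dropped
print(new_isr_change_check(new_replicas, isr=[1, 2]))       # still waiting for broker 4
print(new_isr_change_check(new_replicas, isr=[1, 2, 4]))    # broker 4 caught up
```

The point of the second function is that an offline replica is treated as "not yet" rather than as a terminal failure.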

Interestingly, the block of code in question was introduced in
KAFKA-990, where a concern about this exact scenario was raised :)

This bug was revealed in the system tests in apache#1904.
The relevant tests will be enabled in either this or a followup PR when PR-1904 is merged.

Thanks to junrao for identifying the issue and providing the patch.

Author: Apurva Mehta <apurva.1618@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes apache#1910 from apurvam/KAFKA-4214
…fault partition assignment strategy to round robin

This patch adds a proper warning message and the necessary doc updates for changing the default partition assignment strategy of Mirror Maker from range to round robin. The actual switch would occur as part of a major release cycle (to be scheduled).

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1499 from vahidhashemian/KAFKA-3831
…r.poll

Author: Magnus Reftel <magnus.reftel@skatteetaten.no>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1901 from reftel/feature/poll_javadoc
Simple jira which alters two things:

1. kafka-reassign-partitions --verify prints "Throttle was removed" regardless of whether a throttle was applied. It should only print this if the value was actually changed.

2. --verify should throw an exception if the --throttle argument is supplied (check generate too).

To test this I extracted all validation logic into a separate method and added a test which covers the majority of combinations. The validation logic was retained as is, other than implementing (2) and adding validation to the --broker-list option, which you can currently apply to any of the main actions (where it is ignored). Requirement 1 was tested manually (as it's just println).
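
A separate validation method of this kind might look like the sketch below. It is not the tool's actual code (which is not shown here): `validate_args` is a hypothetical function, and the exact rules, that --throttle is accepted only with --execute and --broker-list only with --generate, are an assumption drawn from the description above.

```python
ACTIONS = ("generate", "execute", "verify")


def validate_args(args):
    """Check option combinations; `args` maps option names present on the command line to values."""
    chosen = [action for action in ACTIONS if action in args]
    if len(chosen) != 1:
        raise ValueError("exactly one of --generate, --execute, --verify is required")
    action = chosen[0]
    # Assumed rule: a throttle only makes sense when a reassignment is executed.
    if "throttle" in args and action != "execute":
        raise ValueError("--throttle cannot be used with --" + action)
    # Assumed rule: reject --broker-list where it would otherwise be silently ignored.
    if "broker-list" in args and action != "generate":
        raise ValueError("--broker-list cannot be used with --" + action)
    return action
```

Keeping the checks in one pure function is what makes it practical to test most option combinations without starting a cluster.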

Testing:
- Build passes locally.
- System test reassign_partitions_test.py also passes.

Author: Ben Stopford <benstopford@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes apache#1896 from benstopford/KAFKA-4200
This small PR pulls ThrottledReplicationRateLimit out of KafkaConfig and puts it in a class that defines Dynamic Configs. Client configs are also placed in this class and validation added.

Author: Ben Stopford <benstopford@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes apache#1864 from benstopford/KAFKA-4177
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1914 from hachikuji/mm-default-new-consumer
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1905 from ijuma/no-new-consumer-switch-in-examples
…lled

If some StreamsMetadataState methods were called before the onChange method had been called, a NullPointerException was thrown. Added a null check for cluster in the isInitialized method.
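
The guard can be illustrated with a small stand-in class. This is a sketch and not the StreamsMetadataState source: the class, its dictionary-shaped `cluster`, and the `topics` check (standing in for whatever isInitialized tested before the fix) are all hypothetical.

```python
class MetadataState:
    """Hypothetical stand-in for a state holder that is populated by a callback."""

    def __init__(self):
        self.cluster = None  # stays None until on_change() is called

    def on_change(self, cluster):
        self.cluster = cluster

    def is_initialized(self):
        # The None check comes first, so the topics lookup is never attempted on None.
        return self.cluster is not None and len(self.cluster["topics"]) > 0

    def all_hosts(self):
        if not self.is_initialized():
            return []
        return list(self.cluster["hosts"])


state = MetadataState()
print(state.all_hosts())  # safe before on_change(): returns an empty list
state.on_change({"topics": ["input"], "hosts": ["host-a:9092"]})
print(state.all_hosts())
```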

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#1920 from dguy/fix-npe-streamsmetadata
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Eno Thereska <eno.thereska@gmail.com>

Closes apache#1919 from guozhangwang/minor-error-message-fixes
…tores with logging disabled

Adding the test so we know that the State Stores with logging disabled or without a topic don't throw any exceptions.

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#1916 from dguy/state-store-logging-disabled
KafkaExceptions thrown from within StreamThread/StreamTask currently bubble up without any additional context. This makes it hard to figure out where something went wrong, e.g., which topic had the serialization exception.
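
The general pattern, catching a low-level error and rethrowing it with the missing context attached, can be sketched as below. This is an illustration in Python rather than the Streams code: `StreamsError`, `deserialize`, and the topic name are hypothetical.

```python
class StreamsError(Exception):
    """Hypothetical wrapper exception that carries processing context."""


def deserialize(topic, raw, deserializer):
    """Run a deserializer and attach the topic name to any failure."""
    try:
        return deserializer(raw)
    except Exception as exc:
        # Chain the original error so its stack trace is preserved as __cause__.
        raise StreamsError(
            "failed to deserialize record from topic '%s'" % topic
        ) from exc


try:
    deserialize("orders", b"\xff", bytes.decode)  # invalid UTF-8 input
except StreamsError as err:
    print(err)
    print(type(err.__cause__).__name__)
```

The wrapped message names the topic, while the chained cause still shows the underlying decoding failure.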

Author: Damian Guy <damian.guy@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#1819 from dguy/kafka-3708 and squashes the following commits:

d6feaa8 [Damian Guy] address comments
15b89e7 [Damian Guy] merge trunk
6b8a8af [Damian Guy] catch exceptions in various places and throw more informative versions
c86eeda [Damian Guy] fix conflicts
8f37e2c [Damian Guy] add some context to exceptions
Terminate topic purgatory thread in AdminManager during server shutdown to avoid threads being left around in unit tests.

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#1927 from rajinisivaram/KAFKA-4227
himani1 and others added 25 commits October 18, 2016 22:12
Author: himani1 <1himani.arora@gmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#2035 from himani1/code_refactored
…r to speedup shutdown

Author: Alexey Ozeritsky <aozeritsky@yandex-team.ru>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>

Closes apache#2023 from resetius/AbstractFetcherManager-shutdown-speedup
Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Damian Guy, Guozhang Wang

Closes apache#2038 from enothereska/hotfix-put-cache
There are 32 failing tests on both trunk and my branch.

Author: jozi-k <jozef.koval@protonmail.ch>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#2036 from jozi-k/update-rocksdb-4.11.2
Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Michael G. Noll, Matthias J. Sax, Guozhang Wang

Closes apache#2030 from enothereska/minor-kip63-docs
…meaningful error message

…eaningful error message

Author: bbejeck <bbejeck@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#2042 from bbejeck/KAFKA-4312_write_as_text_throws_NPE_empty_string
Signed-off-by: radai-rosenblatt <radai.rosenblatt@gmail.com>

Author: radai-rosenblatt <radai.rosenblatt@gmail.com>

Reviewers: Joel Koshy <jjkoshy.w@gmail.com>

Closes apache#1961 from radai-rosenblatt/extensibility
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#2039 from mjsax/hotfix-ktableLeftJoin
…t of topicGroups method

…d out of topicGroups method. The topicGroups method is only called from StreamPartitionAssignor when the KafkaStreams object is the leader, but the code needs to be executed for clients.

Author: bbejeck <bbejeck@gmail.com>

Reviewers: Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>

Closes apache#2005 from bbejeck/KAFKA-4269_multiple_kstream_instances_mult_consumers_npe
Author: Jun Rao <junrao@gmail.com>

Reviewers: Ben Stopford <benstopford@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes apache#2043 from junrao/kafka-4313
 - fixed leftJoin -> outerJoin test bug
 - simplified to only use values
 - fixed inner KTable-KTable join
 - fixed left KTable-KTable join
 - fixed outer KTable-KTable join
 - fixed inner, left, and outer left KStream-KStream joins
 - added inner KStream-KTable join
 - fixed left KStream-KTable join

Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>

Closes apache#1777 from mjsax/kafka-4001-joins
Existing VMs will need to be re-provisioned or re-created to pick up this change.

Reference docs:
https://www.vagrantup.com/docs/synced-folders/rsync.html

Author: Magnus Edenhill <magnus@edenhill.se>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#2047 from edenhill/fix_vm_rsync_exclude
Author: Ben Stopford <benstopford@gmail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#2034 from benstopford/throttling-system-test-kafka-changes
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#2016 from hachikuji/KAFKA-4296
Author: Xavier Léauté <xavier@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes apache#2052 from xvrl/test-add-list-topics
…upCommand

This PR makes a couple of enhancements to the `--describe` option of `ConsumerGroupCommand`:
1. Listing members with no assigned partitions.
2. Showing the member id along with the owner of each partition (owner is supposed to be the logical application id and all members in the same group are supposed to set the same owner).
3. Printing a warning indicating whether ZooKeeper based or new consumer API based information is being reported.

It also adds unit tests to verify the added functionality.

Note: The third request on the corresponding JIRA (listing active offsets for empty groups of new consumers) is not implemented as part of this PR, and has been moved to its own JIRA (KAFKA-3853).

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Jun Rao <junrao@gmail.com>, Jason Gustafson <jason@confluent.io>

Closes apache#1336 from vahidhashemian/KAFKA-3144
…grationTest.testReprocessingFromScratchAfterReset

 - fixed consumer group dead condition
 - disabled state store cache

Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#2056 from mjsax/KAFKA-4058-instableResetToolTest
Author: Andrew Stevenson <andrew@datamountaineer.com>

Reviewers: Shikhar Bhushan <shikhar@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes apache#2055 from andrewstevenson/kafka-4334
…group for each topic

- reworked to use a single KafkaConsumer and subscribe only once

Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes apache#2049 from mjsax/improveResetTool
Increase timeout in test to avoid transient failures due to long GC or slow machine.

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes apache#2057 from rajinisivaram/KAFKA-2089
There should be only one case where these clean-ups have a functional impact: replaced repeated identical logs with a single log for the stale controller epoch case.

The rest should just make the code easier to read and make it a bit less wasteful. I did this exercise because unused variables sometimes mask bugs.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes apache#1985 from ijuma/remove-unused
…in SimpleAclAuthorizer

Also use named parameters in KafkaServer for clarity (even though it was correct previously).

Author: Matt <wangm92@163.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes apache#1646 from wangzzu/wangzzu
… describe output

Author: Vahid Hashemian <vahidhashemian@us.ibm.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#2061 from vahidhashemian/KAFKA-4339
Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes apache#2027 from rajinisivaram/KAFKA-4301
…alid offset position if partition is paused (Hotfix for 0.10.1)
@vahidhashemian vahidhashemian deleted the KAFKA-4547-0.10.1 branch January 21, 2017 04:11
@asfbot

asfbot commented Jan 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/1083/
Test FAILed (JDK 8 and Scala 2.12).

@asfbot

asfbot commented Jan 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/1085/
Test PASSed (JDK 8 and Scala 2.11).

@asfbot

asfbot commented Jan 21, 2017

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.10/1083/
Test PASSed (JDK 7 and Scala 2.10).
