KAFKA-6096: Add multi-threaded tests for group coordinator, txn manager #4122

Merged: 2 commits, Jan 9, 2018

rajinisivaram
Contributor

No description provided.

@rajinisivaram
Contributor Author

@hachikuji These are the tests I have so far. They currently test only the good paths, but would have been sufficient to detect the deadlocks in KAFKA-5970 and KAFKA-6042. All the error paths need to be added as well, but it would be good if you could review the current code to see if there is a simpler way of adding all the cases to the test.


@Test
def verifyGoodPathConcurrency() {
  val operations: Seq[GroupOperation[ _, _]] = Seq(

I like the approach. But I guess we don't have any tests with multiple members in the same group yet?

Contributor Author

The tests create multiple members in each group (5 groups with 5 members each at the moment, run concurrently across 5 threads).
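A minimal sketch of that layout, not the PR's actual harness: `Member`, `createMembers` and `runAll` are hypothetical names introduced here for illustration. It builds 5 groups of 5 members and runs one action per member on a pool of 5 threads. The bounded wait is an assumption about how such a test could surface a deadlock; the real test may detect it differently.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object ConcurrentMembersSketch {
  val numGroups = 5
  val membersPerGroup = 5
  val numThreads = 5

  // Hypothetical stand-in for a group member; not a class from the PR.
  final case class Member(groupId: String, memberId: String)

  // 5 groups x 5 members = 25 members in total.
  def createMembers(): Seq[Member] =
    for {
      group <- 0 until numGroups
      member <- 0 until membersPerGroup
    } yield Member(s"group$group", s"member$member")

  // Runs one action per member on a fixed pool of numThreads threads.
  // Returns true if every task returned or threw before the timeout,
  // false if some task was still running (for example, stuck on a lock).
  def runAll(members: Seq[Member], timeoutMs: Long)(action: Member => Unit): Boolean = {
    val executor = Executors.newFixedThreadPool(numThreads)
    try {
      members.foreach { member =>
        executor.execute(new Runnable { override def run(): Unit = action(member) })
      }
      executor.shutdown()
      executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)
    } finally {
      executor.shutdownNow()
    }
  }
}
```

A caller would pass the operation under test as `action` and assert that `runAll` returns `true`.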

import scala.collection.mutable
import scala.collection.JavaConverters._

class TransactionStateManagerConcurrencyTest extends AbstractCoordinatorConcurrencyTest[Transaction] {

Out of curiosity, why do this at the lower level instead of the way you did the group coordinator tests?

Contributor Author

@hachikuji Thank you for the review. I was trying to use logic from existing mock tests in both cases to trigger the operations. The transaction coordinator unit tests use a mock TransactionStateManager and various other mocks, so it looked like a lot more work to get those to trigger the path with appendRecords. But it does make sense to write the tests at the coordinator level, so I will update.

@rajinisivaram
Contributor Author

retest this please

@ijuma
Contributor

ijuma commented Oct 31, 2017

Looks like testFencingOnTransactionExpiration is flaky (unrelated to this PR). We should file a JIRA, cc @apurvam.

java.lang.AssertionError: expected:<1> but was:<0>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at kafka.api.TransactionsTest.testFencingOnTransactionExpiration(TransactionsTest.scala:486)

@hachikuji hachikuji left a comment

Thanks for the patch. This testing is very valuable. I added a few minor comments and questions.

var hasMore = true
while (hasMore) {
  hasMore = false
  val head = taskQueue.synchronized {

nit: a little unconventional to use ".synchronized" instead of a space.
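For illustration, the two spellings side by side. This is a sketch with a hypothetical `SynchronizedStyleSketch` object and a plain `mutable.Queue` standing in for the test's `taskQueue`; both forms call the same `synchronized` method on the same monitor, so the difference is purely stylistic.

```scala
import scala.collection.mutable

object SynchronizedStyleSketch {
  // Stand-in for the queue in the snippet under review.
  private val taskQueue = mutable.Queue.empty[String]

  // Dot notation, as written in the PR snippet.
  def enqueue(task: String): Unit = taskQueue.synchronized {
    taskQueue += task
    ()
  }

  // Infix notation, the style the review comment suggests.
  def dequeue(): Option[String] = taskQueue synchronized {
    if (taskQueue.nonEmpty) Some(taskQueue.dequeue()) else None
  }
}
```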

// Run some random operations
RandomOperationSequence(createMembers(s"random$i"), operations).run()

// Check that proper sequences till work correctly

typo: "still"?

abstract class OperationSequence(members: Set[M], operations: Seq[Operation]) {
  def actionSequence: Seq[Set[Action]]
  def run(): Unit = {
    actionSequence.foreach { actions => verifyConcurrentActions(actions) }

nit: could just be actionSequence.foreach(verifyConcurrentActions)
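A small sketch of why the shorter form is equivalent: when a function value is expected, Scala converts the method name into a function automatically. The `Action` alias and the recording buffer below are hypothetical stand-ins so the example is self-contained; they are not the PR's types.

```scala
import scala.collection.mutable

object ForeachStyleSketch {
  // Hypothetical stand-in for the PR's Action type.
  type Action = String

  // Records the size of each action set that was "verified".
  val verifiedSizes = mutable.ArrayBuffer.empty[Int]

  def verifyConcurrentActions(actions: Set[Action]): Unit = {
    verifiedSizes += actions.size
    ()
  }

  // Explicit lambda, as in the snippet under review.
  def runWithLambda(actionSequence: Seq[Set[Action]]): Unit =
    actionSequence.foreach { actions => verifyConcurrentActions(actions) }

  // Method passed directly, as the review comment suggests.
  def runWithMethodReference(actionSequence: Seq[Set[Action]]): Unit =
    actionSequence.foreach(verifyConcurrentActions)
}
```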

}

override def appendRecords(timeout: Long,
requiredAcks: Short,

nit: should align with the previous argument?

override def setUp() {
super.setUp()

// make two partitions of the group topic to make sure some partitions are not owned by the coordinator

I'm not sure I understand this comment. How do we actually ensure that one of the partitions remains unowned by the coordinator? Also, what is the point of doing so?

Contributor Author

Sorry, copy-paste from another test. Removed comment.

new InitProducerIdOperation(),
new AddPartitionsToTransactionOperation(Set(new TopicPartition("topic", 0))),
new EndTransactionOperation(TransactionResult.COMMIT),
new EndTransactionOperation(TransactionResult.ABORT))

I'm not sure I understand why we have two EndTxn operations. If we run them sequentially, the last one would cause an unexpected transition. Is that a useful case to check?

Contributor Author

I wanted to test abort as well as commit, but you are right, this doesn't really help. I have changed the code to do commit for half of the transactions and abort for the other half, so that the sequence makes sense.
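One way such a split could look, as a sketch only: the updated PR code is not shown in this thread, so the even/odd rule and the names `TxnResultSplitSketch`, `resultFor` and `assign` are assumptions. A local `TxnResult` type stands in for Kafka's `TransactionResult` to keep the example self-contained.

```scala
object TxnResultSplitSketch {
  // Local stand-in for Kafka's TransactionResult (COMMIT / ABORT).
  sealed trait TxnResult
  case object Commit extends TxnResult
  case object Abort extends TxnResult

  // Assumed rule: even-indexed transactions commit, odd-indexed ones abort,
  // so each transaction ends exactly once.
  def resultFor(txnIndex: Int): TxnResult =
    if (txnIndex % 2 == 0) Commit else Abort

  def assign(txnIds: Seq[String]): Map[String, TxnResult] =
    txnIds.zipWithIndex.map { case (id, index) => id -> resultFor(index) }.toMap
}
```

With this shape, every transaction follows a valid sequence (init, add partitions, one end operation) while both outcomes are still exercised across the test run.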

}
}

private def loadUnloadActions(firstPartitionSet: Set[Int], secondPartitionSet: Set[Int]): Set[Action] = {

Are there any better names we can choose for these arguments?

}

@Test
def testConcurrentLoadUnloadPartitions(): Unit = {

Some high level comments would be helpful for these load/unload test cases since the scenario is a little more complex.

}
}

class AddPartitionsToTransactionOperation(partitions: Set[TopicPartition]) extends TxnOperation[Errors] {

nit: Maybe we could just use Txn instead of Transaction like we do elsewhere.


def createResponse(request: WriteTxnMarkersRequest): WriteTxnMarkersResponse = {
  val pidErrorMap = request.markers.asScala.map { marker =>
    (marker.producerId().asInstanceOf[java.lang.Long], marker.partitions.asScala.map { tp => (tp, Errors.NONE) }.toMap.asJava)

nit: drop parentheses after producerId
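As a sketch of the convention behind this nit: Scala allows a zero-argument method defined in Java to be called without parentheses, and that form is conventionally used for side-effect-free accessors. The example uses `java.lang.Long.longValue` from the JDK as a stand-in for `producerId`; `AccessorParensSketch` is a hypothetical name.

```scala
object AccessorParensSketch {
  // A boxed Java value, standing in for the marker in the snippet.
  private val boxed: java.lang.Long = java.lang.Long.valueOf(42L)

  // Java style: explicit empty argument list.
  def withParens: Long = boxed.longValue()

  // Scala style for accessors: parentheses dropped, same call.
  def withoutParens: Long = boxed.longValue
}
```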

@rajinisivaram
Contributor Author

@hachikuji Thank you for reviewing this PR. And sorry about the delay in addressing the comments. I have updated the PR and rebased. Can you take another look when you have some time? Thanks.

@hachikuji hachikuji left a comment

LGTM. Thanks for the patch!

@hachikuji

retest this please

@hachikuji

The failures seem unrelated. I will merge to trunk and 1.0.

@hachikuji hachikuji merged commit 6396d01 into apache:trunk Jan 9, 2018

@hachikuji

Well, I had intended to merge to 1.0, but unfortunately there are conflicts with the new zk client. Maybe that's ok since we're unlikely to make major changes in 1.0 anyway.
