KAFKA-5259: TransactionalId auth implies ProducerId auth #3075

hachikuji · 2017-05-16T23:54:12Z

No description provided.

asfbot · 2017-05-16T23:57:32Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4037/
Test FAILed (JDK 7 and Scala 2.11).

asfbot · 2017-05-16T23:59:58Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4023/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-17T01:01:19Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4036/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-17T01:02:01Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4050/
Test FAILed (JDK 7 and Scala 2.11).

asfbot · 2017-05-17T01:12:23Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4039/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-17T01:12:57Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4053/
Test FAILed (JDK 7 and Scala 2.11).

ijuma · 2017-05-17T13:34:59Z

Btw, you may want to update the AclCommand in this PR too.

hachikuji · 2017-05-17T18:52:17Z

@ijuma Yes, thanks for reminding me.

asfbot · 2017-05-18T22:12:01Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4143/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-18T22:12:11Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4156/
Test FAILed (JDK 7 and Scala 2.11).

asfbot · 2017-05-19T01:16:29Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4157/
Test FAILed (JDK 7 and Scala 2.11).

asfbot · 2017-05-19T01:16:36Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4144/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-19T08:28:48Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4161/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-19T08:28:51Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4174/
Test FAILed (JDK 7 and Scala 2.11).

hachikuji · 2017-05-19T08:48:50Z

@ijuma I know you're busy, but perhaps you can help review some of the changes in KafkaApis and AclCommand?

asfbot · 2017-05-19T08:49:15Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4162/
Test FAILed (JDK 8 and Scala 2.12).

asfbot · 2017-05-19T08:49:32Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4175/
Test FAILed (JDK 7 and Scala 2.11).

ijuma · 2017-05-19T09:09:35Z

core/src/main/scala/kafka/server/KafkaApis.scala

-          internalTopics.map { tp => (tp, Errors.TOPIC_AUTHORIZATION_FAILED) }
+        // Any failed partition check causes the entire request to fail. We only send back error responses
+        // for the partitions that failed to avoid needing to send an ambiguous error code for the partitions
+        // which succeeded.


I will take a deeper look. However, I had one question: would it be clearer if we had an error code for this? Something referring to the fact that the partition failed because the request failed (i.e. the request is transactional, so either all of it succeeds or none succeeds).

Yes, I debated this, but I couldn't think of a suitable error. Maybe OPERATION_NOT_ATTEMPTED or something like that?

I think an OPERATION_NOT_ATTEMPTED for the partitions which were abandoned would be nice to have.

Another way to deal with this is to make it clear that the operation for a specific partition was successfully completed only if the status for that partition is Errors.NONE. If there is no entry for a partition, then you can assume that the particular operation was not attempted. This is the status quo today, and I think it is sufficient.

I had forgotten to reply to this. It should not block this PR and if there is no time to change it before the release, it's fine. However, the OPERATION_NOT_ATTEMPTED option seems a little better.
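
For reference, the status-quo convention described above (a partition succeeded only if its status is Errors.NONE, and a missing entry means the operation was not attempted) could be read on the client side roughly as follows. This is a minimal sketch under stated assumptions: `PartitionOutcomes`, `Err`, and `Outcome` are hypothetical stand-ins invented for illustration, not Kafka classes, and partitions are plain strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartitionOutcomes {
    // Hypothetical stand-in for the relevant values of Kafka's Errors enum.
    enum Err { NONE, TOPIC_AUTHORIZATION_FAILED, UNKNOWN_TOPIC_OR_PARTITION }

    // What the client can conclude about one requested partition.
    enum Outcome { SUCCEEDED, FAILED, NOT_ATTEMPTED }

    // A missing entry means the broker never attempted the operation for
    // this partition; only an explicit NONE counts as success.
    static Outcome classify(String partition, Map<String, Err> responseErrors) {
        Err err = responseErrors.get(partition);
        if (err == null)
            return Outcome.NOT_ATTEMPTED;
        return err == Err.NONE ? Outcome.SUCCEEDED : Outcome.FAILED;
    }

    // Classifies every partition that was part of the original request,
    // preserving request order.
    static Map<String, Outcome> classifyAll(List<String> requested, Map<String, Err> responseErrors) {
        Map<String, Outcome> outcomes = new LinkedHashMap<>();
        for (String partition : requested)
            outcomes.put(partition, classify(partition, responseErrors));
        return outcomes;
    }
}
```

An explicit OPERATION_NOT_ATTEMPTED code would replace the `null` check with an ordinary enum comparison, which is the clarity gain the thread is weighing.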

asfbot · 2017-05-19T17:23:02Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4191/
Test FAILed (JDK 7 and Scala 2.11).

asfbot · 2017-05-19T17:49:12Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4178/
Test FAILed (JDK 8 and Scala 2.12).

hachikuji · 2017-05-19T17:51:24Z

cc @apurvam

apurvam

Reviewed the client side code and left comments.

apurvam · 2017-05-19T18:31:23Z

clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java

@@ -605,7 +606,9 @@ public void abortTransaction() throws ProducerFencedException {
     * Implementation of asynchronously send a record to a topic.
     */
    private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
-        ensureProperTransactionalState();
+        if (transactionManager != null)


This may be a matter of preference, but I find it easier to do such checks in the callee: it generally creates fewer branches in the top level method.

I agree it's subjective. The reason I preferred moving the check here is that it's easier to dismiss the function if you know you are not concerned about the transactional/idempotent cases. Otherwise, you have to descend into it to find that it just returns. My preference is not too strong either way though.

apurvam · 2017-05-19T18:37:26Z

clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java

@@ -640,7 +643,7 @@ public void abortTransaction() throws ProducerFencedException {
            long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
            log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
            // producer callback will make sure to call both 'callback' and interceptor callback
-            Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp, transactionManager);
+            Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);


Before adding transactions, we only had an interceptor callback if an interceptor was defined. We should probably go back to that pattern now that we have moved the call of transactionManager.setError to Sender.failBatch.

apurvam · 2017-05-19T18:42:31Z

clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java

+
+            if (transactionManager.isInErrorState() && accumulator.hasUnflushedBatches()) {
+                log.error("Aborting producer batches due to fatal error", transactionManager.lastError());
+                accumulator.abortBatches(transactionManager.lastError());


So in this case, we will still call sendProducerData which will do nothing, and then we will poll again. Would it be better to just return here?

apurvam · 2017-05-19T19:14:05Z

clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java

@@ -512,12 +495,10 @@ private void completeBatch(ProducerBatch batch, ProduceResponse.PartitionRespons
                final RuntimeException exception;
                if (error == Errors.TOPIC_AUTHORIZATION_FAILED)
                    exception = new TopicAuthorizationException(batch.topicPartition.topic());
+                else if (error == Errors.CLUSTER_AUTHORIZATION_FAILED)
+                    exception = new ClusterAuthorizationException("The producer is not authorized to do idempotent sends");


It would be better to use Errors.CLUSTER_AUTHORIZATION_FAILED.exception() here and elsewhere. This way the error messages can be updated consistently everywhere for the same error type.

I have already started doing this for the ProducerFencedException in an upcoming patch.

I understand the point, but the problem is that the message for CLUSTER_AUTHORIZATION_FAILED is very generic since there are a number of cases that this handles. I wanted to be able to give a better message since we know the specific operation that would have failed.
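
One possible middle ground between the two positions, sketched here purely for illustration and not taken from this PR: keep the error-to-exception mapping in one place, but let the call site pass an operation-specific message. `AuthErrorMessages`, `Err`, `ClusterAuthException`, and the default message text are all hypothetical stand-ins, not the Kafka classes.

```java
final class AuthErrorMessages {
    // Hypothetical stand-in for ClusterAuthorizationException.
    static class ClusterAuthException extends RuntimeException {
        ClusterAuthException(String message) {
            super(message);
        }
    }

    // Hypothetical stand-in for one value of the Errors enum.
    enum Err {
        CLUSTER_AUTHORIZATION_FAILED("Cluster authorization failed.");

        private final String defaultMessage;

        Err(String defaultMessage) {
            this.defaultMessage = defaultMessage;
        }

        // Generic form: every call site gets the same message.
        RuntimeException exception() {
            return new ClusterAuthException(defaultMessage);
        }

        // Specific form: same exception type, but the caller names the
        // operation that was actually denied.
        RuntimeException exception(String message) {
            return new ClusterAuthException(message);
        }
    }
}
```

With this shape a caller could write `Err.CLUSTER_AUTHORIZATION_FAILED.exception("The producer is not authorized to do idempotent sends")`, keeping the specific message while the exception type is still chosen in a single place.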

apurvam · 2017-05-19T19:16:03Z

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java

-                requestHandler.fatal(new KafkaException(errorMessage, lastError));
-            else
-                requestHandler.fatal(new KafkaException(errorMessage));
+    private boolean maybeTerminateRequestWithError(TxnRequestHandler requestHandler) {


nit: In this class, all the private methods come after the public and package-private methods. It would be nice to maintain that consistency.

apurvam · 2017-05-19T19:19:51Z

clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java

@@ -596,6 +606,8 @@ public void handleResponse(AbstractResponse response) {
                reenqueue();
            } else if (error == Errors.COORDINATOR_LOAD_IN_PROGRESS || error == Errors.CONCURRENT_TRANSACTIONS) {
                reenqueue();
+            } else if (error == Errors.CLUSTER_AUTHORIZATION_FAILED) {
+                fatal(new ClusterAuthorizationException("The producer is not authorized to generate a producerId for idempotence"));


As mentioned before, we should be using the Errors.CLUSTER_AUTHORIZATION_FAILED.exception() here so that the messages are consistent and can be updated consistently in the future.

asfbot · 2017-05-19T20:44:25Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4197/
Test FAILed (JDK 7 and Scala 2.11).

asfbot · 2017-05-24T00:26:40Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4326/
Test PASSed (JDK 7 and Scala 2.11).

asfbot · 2017-05-24T00:57:35Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4313/
Test PASSed (JDK 8 and Scala 2.12).

junrao

@hachikuji : Thanks for the updated patch. Just one more comment.

junrao · 2017-05-24T02:07:06Z

core/src/main/scala/kafka/server/KafkaApis.scala

+        // which succeeded.
+        val partitionErrors = (unauthorizedForWriteRequestInfo.map(_ -> Errors.TOPIC_AUTHORIZATION_FAILED) ++
+          nonExistingOrUnauthorizedForDescribeTopics.map(_ -> Errors.UNKNOWN_TOPIC_OR_PARTITION) ++
+          internalTopics.map(_ ->Errors.TOPIC_AUTHORIZATION_FAILED)).toMap


Would it be better to just return a response level error instead of a partition level error since the failure always happens to all partitions? We can probably have an error message in the response that describes the specific topics that caused the error.

That might be the simplest way to convey the intended semantic. One minor annoyance is that we wouldn't be able to set the unauthorized topic list in TopicAuthorizationException (unless we tried to parse the error message). I don't have a strong preference. What do you think @ijuma?
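
To illustrate the annoyance mentioned above: with partition-level errors, a client can recover the unauthorized topics as structured data, which a response-level error plus a free-text message would not provide without parsing. The sketch below is an assumption-laden illustration only; `AddPartitionsErrorShapes` and `Err` are hypothetical names, and partitions are encoded as "topic-partition" strings rather than Kafka's TopicPartition.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AddPartitionsErrorShapes {
    // Hypothetical stand-in for the relevant values of Kafka's Errors enum.
    enum Err { NONE, TOPIC_AUTHORIZATION_FAILED, UNKNOWN_TOPIC_OR_PARTITION }

    // Collects the topics whose partitions failed authorization. Keys are
    // "topic-partition" strings; the topic is everything before the last '-'.
    static Set<String> unauthorizedTopics(Map<String, Err> partitionErrors) {
        Set<String> topics = new TreeSet<>();
        for (Map.Entry<String, Err> entry : partitionErrors.entrySet()) {
            if (entry.getValue() == Err.TOPIC_AUTHORIZATION_FAILED) {
                String topicPartition = entry.getKey();
                topics.add(topicPartition.substring(0, topicPartition.lastIndexOf('-')));
            }
        }
        return topics;
    }
}
```

The resulting set is the kind of value that could be handed to an exception carrying the unauthorized topic list; a response-level error would need an additional structured field to offer the same thing.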

asfbot · 2017-05-24T21:23:50Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk7-scala2.11/4361/
Test PASSed (JDK 7 and Scala 2.11).

asfbot · 2017-05-24T21:35:34Z

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/4347/
Test PASSed (JDK 8 and Scala 2.12).

hachikuji · 2017-05-24T21:44:05Z

I've opened https://issues.apache.org/jira/browse/KAFKA-5322 to resolve the question of the AddPartitions error codes. I'm going to go ahead and merge this to trunk and 0.11.0.

Author: Jason Gustafson <jason@confluent.io> Reviewers: Apurva Mehta <apurva@confluent.io>, Jun Rao <junrao@gmail.com> Closes #3075 from hachikuji/KAFKA-5259-FIXED (cherry picked from commit 38f6cae) Signed-off-by: Jason Gustafson <jason@confluent.io>

hachikuji force-pushed the KAFKA-5259-FIXED branch from f5823e8 to 5cfedd9 Compare May 18, 2017 22:03

hachikuji force-pushed the KAFKA-5259-FIXED branch from 5cfedd9 to 185b4b1 Compare May 18, 2017 22:15

hachikuji force-pushed the KAFKA-5259-FIXED branch 3 times, most recently from 8b300aa to 08d4783 Compare May 19, 2017 08:25

hachikuji force-pushed the KAFKA-5259-FIXED branch from 08d4783 to 030839d Compare May 19, 2017 08:46

hachikuji changed the title ~~KAFKA-5259 [WIP]: TransactionalId auth implies ProducerId auth~~ KAFKA-5259: TransactionalId auth implies ProducerId auth May 19, 2017

ijuma reviewed May 19, 2017

apurvam reviewed May 19, 2017

junrao reviewed May 24, 2017

hachikuji added 21 commits May 24, 2017 13:22

KAFKA-5259: TransactionalId auth implies ProducerId auth

9d048ff

Update transactional response error codes

1de2d6e

A few minor producer cleanups

7e11840

More cleanups

11ec7a6

Implement AclCommand, add more tests, and cleanup producer error hand…

7a60e5f

…ling

Fix checkstyle and test failure

b4416a7

Fix failing transactions tests

27b7323

Revert handling of unsupported version and disconnect

ac58cb7

Fix Sender.run() behavior and add a few comments

4784e69

Address a couple review comments

fb3e3c8

Fix issue removing idempotent ACL in integration test

036a10f

Fix some rebase breakage

bc632ac

A bunch of logging improvements

4e1e1bb

Test cluster auth errors in Sender

e5b2714

Add some auth tests to TransactionManagerTest

d52c26e

Add the rest of the authorization test cases

652d68d

Fix failing auth integration tests

60ae9bf

Finer-grained checking for produce response error tests

3747b8b

Address review comments in AclCommandTest

9cdabdf

Fix compilation error from upstream changes

9550405

Fix broken GroupCoordinator tests

8d41288

hachikuji force-pushed the KAFKA-5259-FIXED branch from 9e024ca to 8d41288 Compare May 24, 2017 20:24

asfgit closed this in 38f6cae May 24, 2017
