Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KafkaError.str() raises UnicodeDecodeError #129

Closed
johnrgregg3 opened this issue Feb 17, 2017 · 8 comments
Closed

KafkaError.str() raises UnicodeDecodeError #129

johnrgregg3 opened this issue Feb 17, 2017 · 8 comments
Labels

Comments

@johnrgregg3
Copy link

I am running confluent_kafka.version ('0.9.2', 590080), confluent_kafka.libversion ('0.9.3', 590847)
In my on_delivery callback:

def fe_analytics_kafka_cb(err, msg):
if err:
LOG.error("fe_analytics_kafka_cb: error: code %d, name %s",
err.code(), str(err.name()))
fred = err.str()
LOG.error("fe_analytics_kafka_cb: error: code %d, name %s: %s",
err.code(), str(err.name()), str(err.str()))

Note the first LOG message does not have the err.str(), so I can see the code and name. They are:

fe_analytics_kafka_cb: error: code -192, name _MSG_TIMED_OUT

Interestingly, I get hit with the exception merely by calling err.str(), not from trying to log it. That is, the fred = err.str() gets it.

I get this exception:

File "/a/bin/vocfe_app/vocfe.py", line 195, in fe_analytics_kafka_cb
fred = err.str()
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 76: invalid start byte

Of course I'd like to be able to see the string, otherwise I only have the name to go on.

I think what is going on is that for _MSG_TIMED_OUT, str() just echos the message that was sent, but the length is off, so random memory beyond the message is attempted to be displayed. I'm sending 76 bytes, and the exceptions are in position 76 or 77, beyond the end of my message. Moreover, sometimes it does not raise the exception, but displays weird stuff after the text of my message (which was a JSON blob):

fe_analytics_kafka_cb: error: code -192, name _MSG_TIMED_OUT: {"test_message": "Fri Feb 17 16:43:53 2017 INITIAL TEST FROM 23.79.234.201"}^Y

See the ^Y at the end? I guess sometimes it gets lucky and the random memory looks like valid unicode, and sometimes it doesn't.

@johnrgregg3
Copy link
Author

It strikes me that even if the message length issue is sorted out, str() should not just echo the content of the message. What if I'm sending binary data over Kafka? Won't I still get this exception? It seems like str() should do some sanitizing in general.

@johnrgregg3
Copy link
Author

rd_kafka_message_errstr(const rd_kafka_message_t *rkmessage) {
if (!rkmessage->err)
return NULL;

    if (rkmessage->payload)
            return (const char *)rkmessage->payload;

    return rd_kafka_err2str(rkmessage->err);

}

Just returns a pointer to the payload, no NUL-termination, no guarantee that the payload is a trging or unicode. Later, it is passed to KafkaError_new_or_None as argument str, who does this:

PyObject *KafkaError_new_or_None (rd_kafka_resp_err_t err, const char *str) {
if (!err)
Py_RETURN_NONE;
return KafkaError_new0(err, "%s", str);
}

@edenhill edenhill added the bug label Feb 17, 2017
@edenhill
Copy link
Contributor

edenhill commented Feb 17, 2017

@johnrgregg3 Great work finding the root cause of this, I'm impressed!

The key here is this small comment in rdkafka.h:

void   *payload;           /**< Producer: original message payload.
				    * Consumer: Depends on the value of \c err :
				    * - \c err==0: Message payload.
				    * - \c err!=0: Error string */

https://github.com/edenhill/librdkafka/blob/master/src/rdkafka.h#L832

What this means in practice is that rd_kafka_message_errstr() should never be used on a producer message, and that misuse is a bug in the Python client.

edenhill added a commit that referenced this issue Feb 27, 2017
Proper error string handling in Producer, issue #129
@edenhill edenhill closed this as completed Mar 2, 2017
@johnrgregg3
Copy link
Author

Do you know which release this fix will be in? Will it be the upcoming 0.9.4, or the one after that?

@edenhill
Copy link
Contributor

edenhill commented Mar 3, 2017

This just missed 0.9.4, so it'll be available in the next release (no timeline yet)

@edenhill
Copy link
Contributor

edenhill commented Mar 3, 2017

If this turns out to be a blocker we'll ship a maintenance release

@johnrgregg3
Copy link
Author

I would like to emphasize that in the cases where I could get the str(), it was quite useful, so it would be nice to simply not have that function on the producer side. In particular, "198.18.130.13:9092/bootstrap: Connect to ipv4#198.18.130.13:9092 failed: Connection refused" tells me a lot more than simply the code/name, "_TRANSPORT". Similarly, the detail in _RESOLVE helps a lot.

@edenhill
Copy link
Contributor

The per-message delivery report errors are sparse - they only have the error code (which can be humanized) that only provides a static error indication.

But I think that is okay in this specific case because the error codes you mention, _TRANSPORT, _RESOLVE, etc, will not be propagated to the delivery report handler since they are temporary errors that librdkafka will handle.
The error code for a failed message will be more abstracted, e.g., _TIMED_OUT if the message could not be delivered in message.timeout.ms (for whatever underlying reasons), or a direct error code from the broker such as REPLICA_NOT_AVAILABLE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants