Assertion failed: rd_slice_abs_offset #689
Comments
Can you reproduce this with ...? Please also fill out the issue checklist (consumer config, broker version).
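For reference, here is a minimal sketch (not from this thread) of how the requested client debug logs could be captured with the confluent-kafka Python client; the broker, group, and topic names are placeholders:

from confluent_kafka import Consumer

conf = {
    "bootstrap.servers": "broker-1:9092",  # placeholder broker
    "group.id": "assert-repro",            # placeholder group
    "auto.offset.reset": "earliest",
    # librdkafka debug contexts; the output goes to stderr by default.
    # 'fetch,protocol,broker' covers the fetch/response handling involved here.
    "debug": "fetch,protocol,broker",
}

consumer = Consumer(conf)
consumer.subscribe(["my-topic"])           # placeholder topic
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        # hand msg.value() to the sink here
finally:
    consumer.close()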
Thank you for the quick response! Below are the requested logs; the brokers (3) are all running Kafka 2.3.0.
Thank you. It would be very useful to know where exactly this is happening in the code. Could you try the following and provide us with the output?

$ ulimit -c unlimited
$ <reproduce the issue>
$ gdb $(/usr/bin/env python) core   # on OSX the core file will be in /cores/core.<pid>
(gdb) bt
(gdb) exit
This was originally running in a Docker container in our K8s cluster. I created a VM and ran it there with the same parameters. The gdb output is:
Okay, that backtrace isn't very helpful, unfortunately :( Another way that might work is this:

$ gdb $(/usr/bin/env python)
(gdb) run /path/to/your/script.py and its arguments
# wait for crash
(gdb) bt full

If this method works, please stay in gdb so I can walk you through more commands.
Unfortunately, it looks pretty much the same:
Ouch, okay, too bad :( Can you provide your full consumer configuration? Do you know if the topic has transactional messages on it? (e.g. from Kafka Streams or KSQL)
I think I got it fixed by adding the
There's nobody else doing transactional messages on the topic. What's worth mentioning is that we are actually consuming data from two topics.
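Purely as an illustration of the setup described above (this is not the reporter's actual code, and the names are placeholders), a single consumer reading from two topics looks roughly like this:

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker-1:9092",  # placeholder broker
    "group.id": "sink-writer",             # placeholder group
})
# One consumer subscribed to both topics, as described above.
consumer.subscribe(["topic-a", "topic-b"])  # placeholder topic names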
Well, it actually stopped crashing, but there's a more visible error now:
What versions of librdkafka have you tried?
So far only these two: 1.1.0 and 1.2.0.
This is in the FetchResponse parsing code; it is executed before the application asks for the messages, so it is not likely to be related. Are you building librdkafka separately or using the binary wheels of the Confluent Python client?
Do you know if these messages are compressed in the topic? |
We're using the librdkafka package from Alpine and the messages are not compressed. |
Would you be OK with building your own version of librdkafka, to get some debug symbols?

$ apk del librdkafka librdkafka-dev
$ apk add gcc make openssl-dev zlib-dev git   # and possibly others
$
$ git clone https://github.com/edenhill/librdkafka
$ cd librdkafka
$ ./configure --disable-optimization --enable-devel --prefix=/usr
$ make
$ sudo make install
$
$ gdb $(/usr/bin/env python)
(gdb) run /your/py/script.py
# wait for crash
(gdb) bt full

At this point, please keep the terminal open, as I might want to extract some object fields.
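Not part of the instructions above, but a quick way to confirm that the Python client actually picked up the freshly built librdkafka is to check the linked library version from Python:

import confluent_kafka

# Both calls return a (version string, hex int) tuple.
print("confluent-kafka-python:", confluent_kafka.version())
print("librdkafka:", confluent_kafka.libversion())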
Has the root cause of this issue been identified? confluent-kafka-go version 100
Hello! Same question about version 1.3.0. Here is a backtrace for the failed thread:
Please upgrade to v1.7.0 and try to reproduce.
@edenhill, thank you, I'll try to test our application with 1.7.0. I've analyzed the coredump from 1.3.0 and found that the thread was reading a slice with length
But there is zero-filled memory at this offset, after an absolutely valid previous message:
As a result, we fell back to
I've clarified the size of the zero-filled memory: it's about 400 bytes of zeroes, then a partial record with a corrupted start, then valid records up to the end of the slice. The last valid record before the zeroes has offset 12183895 and an empty offset delta; the first valid record after the zeroes has offset 12183899.
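For anyone repeating this kind of analysis, here is a hypothetical helper (not from the original report; the dump filename is a placeholder) that scans a fetch buffer extracted from a coredump for long runs of zero bytes, like the roughly 400-byte gap described above:

def zero_runs(buf: bytes, min_len: int = 64):
    """Return (offset, length) pairs for runs of at least min_len zero bytes."""
    runs, start = [], None
    for i, b in enumerate(buf):
        if b == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    if start is not None and len(buf) - start >= min_len:
        runs.append((start, len(buf) - start))
    return runs

with open("fetch_buffer.bin", "rb") as f:  # placeholder: buffer dumped from the core file
    for offset, length in zero_runs(f.read()):
        print(f"zero run at offset {offset}, length {length}")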
FWIW, we're seeing this same assertion failure in both librdkafka 1.8.2 and librdkafka 2.3.0 (in C++, not in Python).
Either as a cause of the problem or as an effect of it (we're not sure yet), at the time of the crash there were around 2,500 threads running, all starving each other out. This caused us to see this message in the logs prior to the crash:
But we weren't actually running a super old broker version; in fact, our two brokers were running Apache Kafka 2.6.1 and 2.6.3 respectively. So my working hypothesis is that "Timed out" here had nothing to do with the broker version; it had to do with the thread getting starved out. We've seen librdkafka suffer from "misbehavior on unexpected CPU starvation" before (see confluentinc/librdkafka#4100). Is it possible that, under unexpected CPU starvation, we reach a "This should never happen" codepath which then goes on to read uninitialized memory or otherwise act on garbage data, and that's how we end up in this assertion failure?
Here's our 1.8.2 backtrace (if I've picked the right thread), up to where it starts showing as
Description
While reading data from a consumer and writing to a sink, I get the following error:
Assertion failed: rd_slice_abs_offset(new_slice) <= new_slice->end (rdbuf.c: rd_slice_narrow_copy: 1027)
No additional information is provided. One thing I've noticed is that the message is not always the last one. I've upgraded the client and librdkafka to 1.2.0 and this is still occurring.
Thank you
Checklist
Please provide the following information:
- confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
- Client configuration: {...}
- Provide client logs (with 'debug': '..' as necessary)