Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception when sending read requests #1970

Closed
merlimat opened this issue Mar 1, 2019 · 9 comments · Fixed by #2111
Closed

Exception when sending read requests #1970

merlimat opened this issue Mar 1, 2019 · 9 comments · Fixed by #2111
Labels

Comments

@merlimat
Copy link
Contributor

merlimat commented Mar 1, 2019

BUG REPORT

Describe the bug

Got reports from Pulsar users (with BK 4.9.0) of these errors:

17:24:16.302 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught 
java.lang.ArrayIndexOutOfBoundsException: -2
    at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:60) ~[com.google.guava-guava-21.0.jar:?]
    at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.sendNextRead(PendingReadOp.java:400) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
    at org.apache.bookkeeper.client.PendingReadOp$SequenceReadRequest.read(PendingReadOp.java:382) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
    at org.apache.bookkeeper.client.PendingReadOp.initiate(PendingReadOp.java:526) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
    at org.apache.bookkeeper.client.PendingReadOp.safeRun(PendingReadOp.java:536) ~[org.apache.bookkeeper-bookkeeper-server-4.9.0.jar:4.9.0]
    at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.9.0.jar:4.9.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
@eolivelli
Copy link
Contributor

Do you already have a reproducer?

@merlimat
Copy link
Contributor Author

merlimat commented Mar 1, 2019

No, I haven't seen this error myself yet

@grantwwu
Copy link

grantwwu commented Mar 1, 2019

I was the user reporting this - my current cluster is throwing this pretty much continuously - let me know if there's anything I can do.

@ivankelly
Copy link
Contributor

Relates to apache/pulsar#3715

@chairmank
Copy link

chairmank commented Mar 28, 2019

I have observed this bug with BookKeeper 4.9.0 when using the tiered storage feature of Pulsar 2.3.0.

I added logging to org.apache.bookkeeper.client.RoundRobinDistributionSchedule and found that the getWriteSet method is being called with entryId = -2. getWriteSet calls the static create method of WriteSetImpl, which calls the private reset method, which looks like this:

        private void reset(int ensembleSize, int writeQuorumSize,
                           long entryId) {
            setSize(writeQuorumSize);
            for (int w = 0; w < writeQuorumSize; w++) {
                set(w, (int) ((entryId + w) % ensembleSize));
            }
        }

The % operator returns a negative number when the first operand is negative. This is bad because (entryId + w) % ensembleSize) must be a valid integer index into a list of available bookies in the ensemble, between 0 (inclusive) and ensembleSize (exclusive).

We can avoid the resulting ArrayIndexOutOfBoundsException by replacing % with java.lang.Math.floorMod. I think that this is a good defensive change.

But changing the WriteSet implementation is not a real solution because it doesn't address the real bug: Why is entryId a negative number? I don't know whether this is a bug in Pulsar or BookKeeper.

@sijie
Copy link
Member

sijie commented Mar 28, 2019

I don't think bookkeeper should accept any reads reading entries with negative ids. there is already a defensive sanity check in readAsync call.

        // Little sanity check
        if (firstEntry < 0 || firstEntry > lastEntry) {
            LOG.error("IncorrectParameterException on ledgerId:{} firstEntry:{} lastEntry:{}",
                    ledgerId, firstEntry, lastEntry);
            return FutureUtils.exception(new BKIncorrectParameterException());
        }

@klevy-toast
Copy link

Hi. I am also seeing this, when running the some benchmarks with 100GB of backlog.

@sijie
Copy link
Member

sijie commented Mar 29, 2019

FYI I am currently looking into this issue as well. Will post updates once I got any.

sijie added a commit to sijie/pulsar that referenced this issue Jun 13, 2019
*Motivation*

2.3.x releases turn on StickyRead by default.
But StickyRead causes a lot of ArrayIndexOutOfBoundException.

See: apache/bookkeeper#1970 and apache#3715

*Modifications*

Disable sticy read by default until the bug is fixed and a new bookkeeper version is released.
@sijie
Copy link
Member

sijie commented Jun 13, 2019

for Pulsar users who are following this issue, please refer this comment to get around the issue.

The negative entry id was introduced by a random generator used by StickyRead. I am sending a pull request to fix the issue.

sijie added a commit to sijie/bookkeeper that referenced this issue Jun 13, 2019
Master Issue: apache#1970

*Motivation*

Fixes apache#1970

By default bookie uses a random generator to generate a bookie index as the sticky
read bookie index. However the random generator will generate negative numbers. Hence
it will generate negative bookie indexes in write set and cause ArrayIndexOutOfBoundException
when bookkeeper attempts to read entries.

*Modifications*

Make sure getStickyReadBookieIndex not return negative number.

*Verify this change*

This problem introduced by a random generator. It is very hard to write a unit test to reproduce this issue.
Existing StickyRead tests are good to cover this code change.
codelipenghui pushed a commit to apache/pulsar that referenced this issue Jun 14, 2019
*Motivation*

2.3.x releases turn on StickyRead by default.
But StickyRead causes a lot of ArrayIndexOutOfBoundException.

See: apache/bookkeeper#1970 and #3715

*Modifications*

Disable sticy read by default until the bug is fixed and a new bookkeeper version is released.
sijie added a commit that referenced this issue Jun 21, 2019
Descriptions of the changes in this PR:

Master Issue: #1970
Related Issues: apache/pulsar#3715 apache/pulsar#4526

*Motivation*

Fixes #1970

By default bookie uses a random generator to generate a bookie index as the sticky
read bookie index. However the random generator will generate negative numbers. Hence
it will generate negative bookie indexes in write set and cause ArrayIndexOutOfBoundException
when bookkeeper attempts to read entries.

*Modifications*

Make sure getStickyReadBookieIndex not return negative number.

*Verify this change*

This problem introduced by a random generator. It is very hard to write a unit test to reproduce this issue.
Existing StickyRead tests are good to cover this code change.




Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Jia Zhai <zhaijia@apache.org>, Yong Zhang <zhangyong1025.zy@gmail.com>

This closes #2111 from sijie/fix_sticky_read, closes #1970
sijie added a commit that referenced this issue Jun 21, 2019
Descriptions of the changes in this PR:

Master Issue: #1970
Related Issues: apache/pulsar#3715 apache/pulsar#4526

*Motivation*

Fixes #1970

By default bookie uses a random generator to generate a bookie index as the sticky
read bookie index. However the random generator will generate negative numbers. Hence
it will generate negative bookie indexes in write set and cause ArrayIndexOutOfBoundException
when bookkeeper attempts to read entries.

*Modifications*

Make sure getStickyReadBookieIndex not return negative number.

*Verify this change*

This problem introduced by a random generator. It is very hard to write a unit test to reproduce this issue.
Existing StickyRead tests are good to cover this code change.




Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Jia Zhai <zhaijia@apache.org>, Yong Zhang <zhangyong1025.zy@gmail.com>

This closes #2111 from sijie/fix_sticky_read, closes #1970
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants