Skip to content

Use READ_ENTRY_SCHEDULING_DELAY to stable stickyReadsWithFailures#3628

Merged
hangc0276 merged 1 commit into
apache:masterfrom
wenbingshen:wenbing/fixStickyReadsWithFailuresTest
Dec 7, 2022
Merged

Use READ_ENTRY_SCHEDULING_DELAY to stable stickyReadsWithFailures#3628
hangc0276 merged 1 commit into
apache:masterfrom
wenbingshen:wenbing/fixStickyReadsWithFailuresTest

Conversation

@wenbingshen
Copy link
Copy Markdown
Member

@wenbingshen wenbingshen commented Nov 9, 2022

Motivation

I found the following flaky-test: org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures:
https://github.com/apache/bookkeeper/actions/runs/3367374609/jobs/5584792353

Error:  Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.925 s <<< FAILURE! - in org.apache.bookkeeper.bookie.BookieStickyReadsTest
Error:  org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures  Time elapsed: 1.752 s  <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: -1, Size: 3
	at java.base/java.util.LinkedList.checkElementIndex(LinkedList.java:559)
	at java.base/java.util.LinkedList.get(LinkedList.java:480)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.serverByIndex(BookKeeperClusterTestCase.java:369)
	at org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures(BookieStickyReadsTest.java:153)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:829)

In the stickyReadsWithFailures test, the client successfully reads the entry, but the READ_ENTRY_REQUEST metric does not get bonuses. After reading the READ_ENTRY_REQUESTupdate logic, I found the metric updating after netty channel sends the response successfully, the metric is updated through the ChannelFutureListener callback, and the asynchronous update causes the above test to fail.

protected void sendResponse(StatusCode code, Object response, OpStatsLogger statsLogger) {
         ......
         if (channel.isActive()) {
            channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
                @Override
                public void operationComplete(ChannelFuture future) throws Exception {
                    long writeElapsedNanos = MathUtils.elapsedNanos(writeNanos);
                    if (!future.isSuccess()) {
                        requestProcessor.getRequestStats().getChannelWriteStats()
                                .registerFailedEvent(writeElapsedNanos, TimeUnit.NANOSECONDS);
                    } else {
                        requestProcessor.getRequestStats().getChannelWriteStats()
                                .registerSuccessfulEvent(writeElapsedNanos, TimeUnit.NANOSECONDS);
                    }
                    if (StatusCode.EOK == code) {
                        statsLogger.registerSuccessfulEvent(MathUtils.elapsedNanos(enqueueNanos), TimeUnit.NANOSECONDS);
                    } else {
                        statsLogger.registerFailedEvent(MathUtils.elapsedNanos(enqueueNanos), TimeUnit.NANOSECONDS);
                    }
                }
            });
        }
         ......
}

Changes

The READ_ENTRY_SCHEDULING_DELAY metric is processed before the Read request is processed, which proved that bookie receives the read request from the client and can well meet the needs of BookieStickyReadsTest.

This makes the BookieStickyReadsTest test more stabled.

@wenbingshen
Copy link
Copy Markdown
Member Author

ping @hangc0276 @dlg99 @zymap @shoothzj PTAL. Thanks.

@wenbingshen
Copy link
Copy Markdown
Member Author

ping @zymap @hangc0276 @shoothzj @dlg99 If you have time, can you help take a look. Thanks.

@hangc0276 hangc0276 merged commit eda9c42 into apache:master Dec 7, 2022
yaalsn pushed a commit to yaalsn/bookkeeper that referenced this pull request Jan 30, 2023
…ache#3628)

### Motivation
I found the following flaky-test: org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures:
https://github.com/apache/bookkeeper/actions/runs/3367374609/jobs/5584792353
```
Error:  Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.925 s <<< FAILURE! - in org.apache.bookkeeper.bookie.BookieStickyReadsTest
Error:  org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures  Time elapsed: 1.752 s  <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: -1, Size: 3
	at java.base/java.util.LinkedList.checkElementIndex(LinkedList.java:559)
	at java.base/java.util.LinkedList.get(LinkedList.java:480)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.serverByIndex(BookKeeperClusterTestCase.java:369)
	at org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures(BookieStickyReadsTest.java:153)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:829)
```

In the stickyReadsWithFailures test, the client successfully reads the entry, but the `READ_ENTRY_REQUEST` metric does not get bonuses. After reading the `READ_ENTRY_REQUEST`update logic, I found the metric updating after netty channel sends the response successfully, the metric is updated through the `ChannelFutureListener` callback, and the asynchronous update causes the above test to fail.

```java
protected void sendResponse(StatusCode code, Object response, OpStatsLogger statsLogger) {
         ......
         if (channel.isActive()) {
            channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
                @OverRide
                public void operationComplete(ChannelFuture future) throws Exception {
                    long writeElapsedNanos = MathUtils.elapsedNanos(writeNanos);
                    if (!future.isSuccess()) {
                        requestProcessor.getRequestStats().getChannelWriteStats()
                                .registerFailedEvent(writeElapsedNanos, TimeUnit.NANOSECONDS);
                    } else {
                        requestProcessor.getRequestStats().getChannelWriteStats()
                                .registerSuccessfulEvent(writeElapsedNanos, TimeUnit.NANOSECONDS);
                    }
                    if (StatusCode.EOK == code) {
                        statsLogger.registerSuccessfulEvent(MathUtils.elapsedNanos(enqueueNanos), TimeUnit.NANOSECONDS);
                    } else {
                        statsLogger.registerFailedEvent(MathUtils.elapsedNanos(enqueueNanos), TimeUnit.NANOSECONDS);
                    }
                }
            });
        }
         ......
}
```

### Changes

The `READ_ENTRY_SCHEDULING_DELAY` metric is processed before the Read request is processed, which proved that bookie receives the read request from the client and can well meet the needs of `BookieStickyReadsTest`.

This makes the `BookieStickyReadsTest` test more stabled.
@hangc0276 hangc0276 added this to the 4.16.0 milestone Mar 21, 2023
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
…ache#3628)

### Motivation
I found the following flaky-test: org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures:
https://github.com/apache/bookkeeper/actions/runs/3367374609/jobs/5584792353
```
Error:  Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.925 s <<< FAILURE! - in org.apache.bookkeeper.bookie.BookieStickyReadsTest
Error:  org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures  Time elapsed: 1.752 s  <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: -1, Size: 3
	at java.base/java.util.LinkedList.checkElementIndex(LinkedList.java:559)
	at java.base/java.util.LinkedList.get(LinkedList.java:480)
	at org.apache.bookkeeper.test.BookKeeperClusterTestCase.serverByIndex(BookKeeperClusterTestCase.java:369)
	at org.apache.bookkeeper.bookie.BookieStickyReadsTest.stickyReadsWithFailures(BookieStickyReadsTest.java:153)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:829)
```

In the stickyReadsWithFailures test, the client successfully reads the entry, but the `READ_ENTRY_REQUEST` metric does not get bonuses. After reading the `READ_ENTRY_REQUEST`update logic, I found the metric updating after netty channel sends the response successfully, the metric is updated through the `ChannelFutureListener` callback, and the asynchronous update causes the above test to fail.

```java
protected void sendResponse(StatusCode code, Object response, OpStatsLogger statsLogger) {
         ......
         if (channel.isActive()) {
            channel.writeAndFlush(response).addListener(new ChannelFutureListener() {
                @OverRide
                public void operationComplete(ChannelFuture future) throws Exception {
                    long writeElapsedNanos = MathUtils.elapsedNanos(writeNanos);
                    if (!future.isSuccess()) {
                        requestProcessor.getRequestStats().getChannelWriteStats()
                                .registerFailedEvent(writeElapsedNanos, TimeUnit.NANOSECONDS);
                    } else {
                        requestProcessor.getRequestStats().getChannelWriteStats()
                                .registerSuccessfulEvent(writeElapsedNanos, TimeUnit.NANOSECONDS);
                    }
                    if (StatusCode.EOK == code) {
                        statsLogger.registerSuccessfulEvent(MathUtils.elapsedNanos(enqueueNanos), TimeUnit.NANOSECONDS);
                    } else {
                        statsLogger.registerFailedEvent(MathUtils.elapsedNanos(enqueueNanos), TimeUnit.NANOSECONDS);
                    }
                }
            });
        }
         ......
}
```

### Changes

The `READ_ENTRY_SCHEDULING_DELAY` metric is processed before the Read request is processed, which proved that bookie receives the read request from the client and can well meet the needs of `BookieStickyReadsTest`.

This makes the `BookieStickyReadsTest` test more stabled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants