Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TASK][EASY] Flaky test: support to interrupt the thrift request if remote engine is broken #6172

Closed
3 of 8 tasks
pan3793 opened this issue Mar 12, 2024 · 3 comments
Closed
3 of 8 tasks
Assignees
Labels

Comments

@pan3793
Copy link
Member

pan3793 commented Mar 12, 2024

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

What kind of test improve?

  • Fix flaky tests.
  • Fix bug in tests.
  • Increase test coverage.
  • Other kinds of test improve.

Describe the issues of the existing tests or improvements for new tests

KyuubiOperationPerConnectionSuite:
- support to interrupt the thrift request if remote engine is broken *** FAILED ***
  The code passed to eventually never returned normally. Attempted 206 times over 3.011519466 seconds. Last failure message: session.client.asyncRequestInterrupted was false. (KyuubiOperationPerConnectionSuite.scala:343)

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • No. I cannot submit a PR at this time.
@beryllw
Copy link
Contributor

beryllw commented Apr 3, 2024

Working in something related to this, i will try to fix this.

@pan3793
Copy link
Member Author

pan3793 commented Apr 3, 2024

@beryllw thanks, this should be a side-effect of upgrading thrift 0.16

@pan3793 pan3793 changed the title [TEST] Flaky test: support to interrupt the thrift request if remote engine is broken [TASK][EASY] Flaky test: support to interrupt the thrift request if remote engine is broken Apr 3, 2024
@beryllw
Copy link
Contributor

beryllw commented Apr 30, 2024

fix upload unit test log first. #6244

beryllw added a commit to beryllw/incubator-kyuubi that referenced this issue May 9, 2024
…est immediately after marking the engine not alive
turboFei pushed a commit that referenced this issue May 9, 2024
…mediately after marking the engine not alive

Support to interrupt the thrift request immediately after marking the engine not alive

# 🔍 Description
## Issue References 🔗

This pull request fixes #6172

## Describe Your Solution 🔧

https://github.com/apache/kyuubi/blob/12c5568c9b020b1212c9514f046c67fcb267467e/kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala#L103-L110

When probe fails and exceeds engineAliveTimeout, not interrupt the thrift request immediately, only marked `remoteEngineBroken` and wait next `engineAliveProbeInterval` to interrupt.
Unit test `KyuubiOperationPerConnectionSuite` assert timeout 3s.

https://github.com/apache/kyuubi/blob/12c5568c9b020b1212c9514f046c67fcb267467e/kyuubi-server/src/test/scala/org/apache/kyuubi/operation/KyuubiOperationPerConnectionSuite.scala#L344-L346

Exception log
```
03:25:15.125 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:BD 23 DE B6 16 56 4E 2B 97 3C 09 74 89 F5 C9 26, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11886 WARN KyuubiSyncThriftClient: The engine[local-1714879506640] alive probe fails
org.apache.kyuubi.shaded.thrift.transport.TTransportException: Socket is closed by peer.
	...
03:25:16.126 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:BD 23 DE B6 16 56 4E 2B 97 3C 09 74 89 F5 C9 26, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11886 WARN KyuubiSyncThriftClient: The engine[local-1714879506640] alive probe fails
org.apache.kyuubi.shaded.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed)
	...
03:25:16.126 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:BD 23 DE B6 16 56 4E 2B 97 3C 09 74 89 F5 C9 26, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11886 ERROR KyuubiSyncThriftClient: Mark the engine[local-1714879506640] not alive with no recent alive probe success: 2001 ms exceeds timeout 1000 ms
```

Success log
```
16:57:46.859 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:B4 9D 71 83 6D 15 4D 8D BE DA 65 75 27 5D 4E D8, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11609 WARN KyuubiSyncThriftClient: The engine[local-1715101059872] alive probe fails
...
16:57:46.860 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:B4 9D 71 83 6D 15 4D 8D BE DA 65 75 27 5D 4E D8, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11609 ERROR KyuubiSyncThriftClient: Mark the engine[local-1715101059872] not alive with no recent alive probe success: 1001 ms exceeds timeout 1000 ms
16:57:47.860 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:B4 9D 71 83 6D 15 4D 8D BE DA 65 75 27 5D 4E D8, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11609 WARN KyuubiSyncThriftClient: Removing Clients for TSessionHandle(sessionId:THandleIdentifier(guid:9D AA D5 C2 9B E4 43 D7 BE 81 D0 99 EA 5B 9E 37, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38))
```

## Types of changes :bookmark:

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request :coffin:

#### Behavior With This Pull Request :tada:

#### Related Unit Tests

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6375 from beryllw/kyuubi_6172.

Closes #6172

991798b [wangjunbo] [KYUUBI #6172][TASK][EASY] Support to interrupt the thrift request immediately after marking the engine not alive

Authored-by: wangjunbo <wangjunbo@qiyi.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
(cherry picked from commit 1fc2b35)
Signed-off-by: Wang, Fei <fwang12@ebay.com>
turboFei pushed a commit that referenced this issue May 9, 2024
…mediately after marking the engine not alive

Support to interrupt the thrift request immediately after marking the engine not alive

# 🔍 Description
## Issue References 🔗

This pull request fixes #6172

## Describe Your Solution 🔧

https://github.com/apache/kyuubi/blob/12c5568c9b020b1212c9514f046c67fcb267467e/kyuubi-server/src/main/scala/org/apache/kyuubi/client/KyuubiSyncThriftClient.scala#L103-L110

When probe fails and exceeds engineAliveTimeout, not interrupt the thrift request immediately, only marked `remoteEngineBroken` and wait next `engineAliveProbeInterval` to interrupt.
Unit test `KyuubiOperationPerConnectionSuite` assert timeout 3s.

https://github.com/apache/kyuubi/blob/12c5568c9b020b1212c9514f046c67fcb267467e/kyuubi-server/src/test/scala/org/apache/kyuubi/operation/KyuubiOperationPerConnectionSuite.scala#L344-L346

Exception log
```
03:25:15.125 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:BD 23 DE B6 16 56 4E 2B 97 3C 09 74 89 F5 C9 26, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11886 WARN KyuubiSyncThriftClient: The engine[local-1714879506640] alive probe fails
org.apache.kyuubi.shaded.thrift.transport.TTransportException: Socket is closed by peer.
	...
03:25:16.126 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:BD 23 DE B6 16 56 4E 2B 97 3C 09 74 89 F5 C9 26, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11886 WARN KyuubiSyncThriftClient: The engine[local-1714879506640] alive probe fails
org.apache.kyuubi.shaded.thrift.transport.TTransportException: java.net.SocketException: Broken pipe (Write failed)
	...
03:25:16.126 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:BD 23 DE B6 16 56 4E 2B 97 3C 09 74 89 F5 C9 26, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11886 ERROR KyuubiSyncThriftClient: Mark the engine[local-1714879506640] not alive with no recent alive probe success: 2001 ms exceeds timeout 1000 ms
```

Success log
```
16:57:46.859 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:B4 9D 71 83 6D 15 4D 8D BE DA 65 75 27 5D 4E D8, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11609 WARN KyuubiSyncThriftClient: The engine[local-1715101059872] alive probe fails
...
16:57:46.860 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:B4 9D 71 83 6D 15 4D 8D BE DA 65 75 27 5D 4E D8, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11609 ERROR KyuubiSyncThriftClient: Mark the engine[local-1715101059872] not alive with no recent alive probe success: 1001 ms exceeds timeout 1000 ms
16:57:47.860 engine-alive-probe-TSessionHandle(sessionId:THandleIdentifier(guid:B4 9D 71 83 6D 15 4D 8D BE DA 65 75 27 5D 4E D8, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38)): Thread-11609 WARN KyuubiSyncThriftClient: Removing Clients for TSessionHandle(sessionId:THandleIdentifier(guid:9D AA D5 C2 9B E4 43 D7 BE 81 D0 99 EA 5B 9E 37, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38))
```

## Types of changes :bookmark:

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request :coffin:

#### Behavior With This Pull Request :tada:

#### Related Unit Tests

---

# Checklist 📝

- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

**Be nice. Be informative.**

Closes #6375 from beryllw/kyuubi_6172.

Closes #6172

991798b [wangjunbo] [KYUUBI #6172][TASK][EASY] Support to interrupt the thrift request immediately after marking the engine not alive

Authored-by: wangjunbo <wangjunbo@qiyi.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
(cherry picked from commit 1fc2b35)
Signed-off-by: Wang, Fei <fwang12@ebay.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants