[fix][fn] Fix Deadlock in Functions Worker LeaderService #21711

Merged

liangyepianzhou merged 2 commits into apache:master on Dec 12, 2023

Technoboy-
Contributor

Fixes #21501

Motivation

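The `isLeader()` method in `LeaderService` does not need to be `synchronized`. Holding the `LeaderService` monitor there creates a lock-order inversion: `becameInactive()` holds the `LeaderService` lock and waits for the `FunctionMetaDataManager` lock, while `updateFunctionOnLeader()` holds the `FunctionMetaDataManager` lock and waits for the `LeaderService` lock inside `isLeader()`.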

Deadlock stack trace:

"pulsar-external-listener-44525-1":
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.giveupLeadership(FunctionMetaDataManager.java)
	- waiting to lock <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.LeaderService.becameInactive(LeaderService.java:167)
	- locked <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$activeConsumerChanged$27(ConsumerImpl.java:1136)
	at org.apache.pulsar.client.impl.ConsumerImpl$$Lambda$2606/0x00007f854ce9cb10.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.8.1/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.8.1/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.8.1/Thread.java:833)
"pulsar-web-44514-6":
	at org.apache.pulsar.functions.worker.LeaderService.isLeader(LeaderService.java)
	- waiting to lock <0x000010001344c6d8> (a org.apache.pulsar.functions.worker.LeaderService)
	at org.apache.pulsar.functions.worker.SchedulerManager.scheduleInternal(SchedulerManager.java:200)
	at org.apache.pulsar.functions.worker.SchedulerManager.schedule(SchedulerManager.java:229)
	at org.apache.pulsar.functions.worker.FunctionMetaDataManager.updateFunctionOnLeader(FunctionMetaDataManager.java:251)
	- locked <0x0000100013535c90> (a org.apache.pulsar.functions.worker.FunctionMetaDataManager)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.internalProcessFunctionRequest(ComponentImpl.java:1775)
	at org.apache.pulsar.functions.worker.rest.api.ComponentImpl.updateRequest(ComponentImpl.java:996)
	at org.apache.pulsar.functions.worker.rest.api.FunctionsImpl.registerFunction(FunctionsImpl.java:222)
	at org.apache.pulsar.broker.admin.impl.FunctionsBase.registerFunction(FunctionsBase.java:196)
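
The trace shows each thread holding one monitor while waiting for the other's. The Java sketch below forces that interleaving deterministically with latches and shows why dropping `synchronized` from `isLeader()` breaks the cycle. `Leader`, `Manager`, `runScenario`, and the latch parameters are hypothetical stand-ins for `LeaderService`, `FunctionMetaDataManager`, and the two threads in the trace; this is not Pulsar's actual code, and the `volatile` flag is an assumption about how a lock-free `isLeader()` can keep its reads visible across threads.

```java
import java.util.concurrent.CountDownLatch;

public class LeaderDeadlockSketch {

    // Hypothetical stand-in for LeaderService (not the real Pulsar class).
    static class Leader {
        // volatile: readers see the latest value without acquiring this object's monitor.
        private volatile boolean leader = true;
        private Manager manager;

        void setManager(Manager manager) { this.manager = manager; }

        // Mirrors becameInactive(): takes the Leader monitor, then needs the Manager monitor.
        synchronized void becameInactive(CountDownLatch holdingLeaderLock) {
            leader = false;
            holdingLeaderLock.countDown();
            manager.giveUpLeadership();
        }

        // The fix: NOT synchronized. A synchronized version would complete the lock cycle.
        boolean isLeader() {
            return leader;
        }
    }

    // Hypothetical stand-in for FunctionMetaDataManager.
    static class Manager {
        private final Leader leaderService;

        Manager(Leader leaderService) { this.leaderService = leaderService; }

        synchronized void giveUpLeadership() {
            // Leader-only state would be cleared here.
        }

        // Mirrors updateFunctionOnLeader(): takes the Manager monitor, then asks the Leader.
        synchronized boolean updateOnLeader(CountDownLatch holdingManagerLock,
                                            CountDownLatch otherHoldsLeaderLock)
                throws InterruptedException {
            holdingManagerLock.countDown();
            otherHoldsLeaderLock.await(); // wait until the other thread holds the Leader monitor
            return leaderService.isLeader();
        }
    }

    /** Returns true if both threads finish within the timeout, i.e. no deadlock occurred. */
    public static boolean runScenario(long timeoutMillis) throws InterruptedException {
        Leader leader = new Leader();
        Manager manager = new Manager(leader);
        leader.setManager(manager);

        CountDownLatch webHoldsManager = new CountDownLatch(1);
        CountDownLatch listenerHoldsLeader = new CountDownLatch(1);

        Thread web = new Thread(() -> {
            try {
                manager.updateOnLeader(webHoldsManager, listenerHoldsLeader);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "web");
        Thread listener = new Thread(() -> {
            try {
                webHoldsManager.await(); // start only once "web" holds the Manager monitor
                leader.becameInactive(listenerHoldsLeader);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "listener");

        // Daemon threads so a deadlocked variant cannot keep the JVM alive.
        web.setDaemon(true);
        listener.setDaemon(true);
        web.start();
        listener.start();
        web.join(timeoutMillis);
        listener.join(timeoutMillis);
        return !web.isAlive() && !listener.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "completed without deadlock: true"
        System.out.println("completed without deadlock: " + runScenario(2_000));
    }
}
```

Marking `isLeader()` as `synchronized` again should make `runScenario` return `false` after the timeout, because "web" then blocks on the `Leader` monitor held by "listener" while "listener" blocks on the `Manager` monitor held by "web".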

Documentation

  • doc-not-needed

@Technoboy- self-assigned this Dec 12, 2023
@Technoboy- added this to the 3.2.0 milestone Dec 12, 2023
@Technoboy- changed the title from [fix][function] Fix Deadlock in Functions Worker LeaderService to [fix][fn] Fix Deadlock in Functions Worker LeaderService Dec 12, 2023
@github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Dec 12, 2023
@codecov-commenter

Codecov Report

Merging #21711 (b0f8772) into master (495b141) will increase coverage by 36.63%.
Report is 2 commits behind head on master.
The diff coverage is 100.00%.

@@              Coverage Diff              @@
##             master   #21711       +/-   ##
=============================================
+ Coverage     36.75%   73.39%   +36.63%     
- Complexity    12271    32761    +20490     
=============================================
  Files          1717     1893      +176     
  Lines        131197   140714     +9517     
  Branches      14339    15502     +1163     
=============================================
+ Hits          48220   103270    +55050     
+ Misses        76596    29335    -47261     
- Partials       6381     8109     +1728     
Flag Coverage Δ
inttests 24.10% <100.00%> (-0.01%) ⬇️
systests 24.79% <100.00%> (+0.01%) ⬆️
unittests 72.67% <100.00%> (+40.80%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
.../apache/pulsar/functions/worker/LeaderService.java 82.66% <100.00%> (+8.00%) ⬆️

... and 1463 files with indirect coverage changes

@liangyepianzhou merged commit 3396065 into apache:master Dec 12, 2023
48 checks passed
Technoboy- added a commit that referenced this pull request Dec 14, 2023
Technoboy- added a commit that referenced this pull request Jan 3, 2024
Technoboy- added a commit that referenced this pull request Jan 3, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 4, 2024
(cherry picked from commit ac11655)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jan 8, 2024
(cherry picked from commit ac11655)
Development

Successfully merging this pull request may close these issues.

[Bug] Deadlock in Functions Worker service at shutdown in tests
6 participants