
Excessive CPU use in idle mode #14015

Closed
rustyx opened this issue Jan 28, 2022 · 12 comments · Fixed by #15212
Labels
type/bug

Comments

rustyx commented Jan 28, 2022

The Pulsar 2.9.1 Docker container uses ~10-20% CPU while idle.

Profiler snapshot ("Pulsar Excessive CPU use in idle"):

  81.0% - 2,384 s io.netty.util.concurrent.FastThreadLocalRunnable.run
  80.7% - 2,376 s io.netty.util.internal.ThreadExecutorMap$2.run
  80.7% - 2,376 s io.netty.util.concurrent.SingleThreadEventExecutor$4.run
  80.7% - 2,376 s io.netty.channel.epoll.EpollEventLoop.run
  47.5% - 1,398 s io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange
  47.5% - 1,398 s io.netty.channel.epoll.Native.epollWait
  47.5% - 1,398 s io.netty.channel.epoll.Native.epollWait
  47.5% - 1,398 s io.netty.channel.epoll.Native.epollWait
  33.2% - 978 s io.netty.channel.epoll.EpollEventLoop.epollWait
  33.2% - 978 s io.netty.channel.epoll.Native.epollWait
  33.2% - 978 s io.netty.channel.epoll.Native.epollWait0
  0.0% - 17,667 µs io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks
  0.0% - 9,033 µs io.netty.channel.nio.NioEventLoop.run
  0.2% - 6,377 ms java.util.concurrent.ThreadPoolExecutor$Worker.run
  0.0% - 1,118 ms io.netty.util.HashedWheelTimer$Worker.run
  16.6% - 489 s io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run
  16.6% - 489 s io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run
  16.6% - 489 s io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run
  16.6% - 489 s io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run
  9.5% - 279 s io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.epollWait
  7.1% - 209 s io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.epollWaitNoTimerChange
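
For reference, one way a CPU profile like the tree above can be captured is with async-profiler. A minimal sketch, assuming the container was started with --name pulsar and that async-profiler has been unpacked into /tmp/async-profiler inside the container (neither is part of the original report; the Pulsar image does not bundle a profiler, and output format depends on the async-profiler version):

# Sample CPU for 60 seconds on the Pulsar JVM (PID 1 inside the container)
docker exec -it pulsar /tmp/async-profiler/profiler.sh -e cpu -d 60 -f /tmp/idle-cpu.html 1
# Copy the resulting flamegraph to the host for inspection
docker cp pulsar:/tmp/idle-cpu.html .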

To Reproduce
Steps to reproduce the behavior:

  1. docker run -d -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.9.1 bin/pulsar standalone
  2. Add a postgres source: pulsar-admin source localrun --source-config-file pg.yaml (not sure if this is relevant)
    pg.yaml:
    tenant: "public"
    namespace: "default"
    name: "pg-source"
    topicName: "pg-topic"
    archive: "/mnt/pulsar-io-debezium-postgres-2.9.1.nar"
    parallelism: 1
    configs:
      plugin.name: "pgoutput"
      database.hostname: "postgres"
      database.port: "5432"
      database.user: "postgres"
      database.password: "postgres"
      database.dbname: "postgres"
      database.server.name: "postgres"
      schema.whitelist: "public"
      database.history.pulsar.service.url: "pulsar://localhost:6650"
    
  3. Wait a couple of minutes
  4. Notice 10-20% CPU use by the java process inside the container (a host-side cross-check is sketched after this list).
    Running top inside the container:
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
        1 root      20   0   10.5g   1.1g  34452 S  14.0   9.1   3:27.74 java
      711 root      20   0    4244   3528   2988 S   0.0   0.0   0:00.04 bash
      724 root      20   0    6124   3288   2760 R   0.0   0.0   0:00.00 top
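
A quick host-side cross-check of the container's CPU use (assuming the container was started with --name pulsar, which the step-1 command does not do; otherwise substitute the container ID):

# One-shot CPU/memory snapshot instead of a continuous stream
docker stats --no-stream pulsar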
    

Desktop (please complete the following information):

  • OS: Windows 10 x64, WSL2 Linux x64 running Docker

Logs:
Couldn't see anything wrong in the logs. See attached:
pulsar.log

rustyx added the type/bug label on Jan 28, 2022
lhotari (Member) commented Jan 28, 2022

Possibly related: #12045, #4722, #9958

lhotari (Member) commented Jan 28, 2022

@rustyx Do you experience the same behavior with Pulsar 2.8.2 and Pulsar 2.7.4?

rustyx (Author) commented Jan 28, 2022

Yes, exact same issue with 2.8.2 and 2.7.4.

docker run -d -p 6650:6650 -p 8080:8080 --name pulsar apachepulsar/pulsar:2.8.2 bin/pulsar standalone

(wait 5-20 minutes)

docker exec -it pulsar top

top - 17:12:41 up 1 day, 23:03,  0 users,  load average: 0.37, 0.35, 0.34
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.3 us,  1.3 sy,  0.0 ni, 96.9 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem :  12688.4 total,    208.0 free,   2197.7 used,  10282.7 buff/cache
MiB Swap:   4096.0 total,   4095.0 free,      1.0 used.   9767.9 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    1 root      20   0   10.5g   1.1g  34052 S  13.0   9.2   2:02.98 java
 1178 root      20   0    6084   3196   2700 R   0.0   0.0   0:00.03 top

nicoloboschi (Contributor) commented

Maybe this can help: #13981
From your dump it does not look like the same issue that was fixed there, but it's worth a try.

Are you able to run the test from current master? Otherwise I can push a Docker image later.

nicoloboschi (Contributor) commented

I tried it locally; same behaviour with current master.

lhotari (Member) commented Jan 29, 2022

#9958 (comment)

It seems that adjusting the managedLedgerCacheEvictionFrequency configuration value is one way to work around the performance issue. The description of this setting is "Configure the cache eviction frequency for the managed ledger cache (evictions/sec)"; the default value is 100.0.

Please try running with the value 1000 to see if CPU usage decreases:

docker run -it --name pulsar -p 8080:8080 -p 6650:6650 \
  -e managedLedgerCacheEvictionFrequency=1000 apachepulsar/pulsar:2.8.2 bash -c \
  "bin/apply-config-from-env.py conf/standalone.conf && bin/pulsar standalone --no-functions-worker --no-stream-storage"

lhotari (Member) commented Jan 29, 2022

Might be related to #3650.
Try passing -e PULSAR_GC=-XX:+UseG1GC (a full command is sketched below).
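
A minimal sketch of passing that option to standalone (PULSAR_GC is picked up by conf/pulsar_env.sh and overrides the default GC flags):

docker run -it -p 6650:6650 -p 8080:8080 \
  -e PULSAR_GC="-XX:+UseG1GC" \
  apachepulsar/pulsar:2.8.2 bin/pulsar standalone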

lhotari (Member) commented Jan 29, 2022

--no-functions-worker --no-stream-storage were also in the command-line example I provided. Those disable the functions worker and the stream storage, which aren't used in many cases; a minimal invocation is sketched below.
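
A minimal invocation with just those two flags, without the config override from the earlier example:

docker run -it -p 6650:6650 -p 8080:8080 apachepulsar/pulsar:2.8.2 \
  bin/pulsar standalone --no-functions-worker --no-stream-storage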

rustyx (Author) commented Jan 31, 2022

Did some tests; not much improvement from managedLedgerCacheEvictionFrequency=1000 or PULSAR_GC=-XX:+UseG1GC.
--no-functions-worker --no-stream-storage together bring ~15% CPU usage down to ~4%. Around 2/3 of that improvement seems to come from --no-stream-storage alone.
Still, it is not ideal that an idle server uses 4% CPU; it drains the battery.

Jason918 (Contributor) commented Feb 7, 2022

Could be related to #14125.
epollWait running actively while idle is mostly because too many scheduled tasks have been started in the EventLoop (one way to observe this is sketched below).
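
A sketch of observing those wakeups directly: count epoll_wait syscalls for a short window with strace. strace is not in the Pulsar image by default, and attaching to a process typically needs extra privileges (e.g. running the container with --cap-add SYS_PTRACE):

# Inside the container, PID 1 is the Pulsar JVM; -f follows all threads,
# -c prints a per-syscall count when strace exits after the 10s timeout.
# Some setups use epoll_pwait instead, hence tracing both.
timeout 10 strace -f -e trace=epoll_wait,epoll_pwait -c -p 1

An idle broker showing thousands of epoll_wait calls in 10 seconds points at busy event-loop timers rather than real I/O.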

github-actions (bot) commented

The issue has had no activity for 30 days; marking it with the Stale label.

lhotari (Member) commented Mar 13, 2022

This might be related to https://twitter.com/normanmaurer/status/1499778839563718664

"Much ❤️ for people who include flamegraphs in issues when reporting performance problems in #netty. This one was a nice one netty/netty#11695 … And here is the proposed fix netty/netty#12145"

The fix is scheduled for the Netty 4.1.76.Final release.
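
Once a release picks up that version, the bundled Netty can be checked directly in the image. A sketch, assuming the distribution's usual /pulsar/lib layout and groupId-artifactId-version jar naming (both may vary by version):

# Lists the bundled netty-common jar, whose filename carries the Netty version
docker run --rm apachepulsar/pulsar:2.9.1 bash -c 'ls /pulsar/lib | grep -i netty-common'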

lhotari added a commit that referenced this issue Apr 19, 2022
Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final (#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes #14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this issue Apr 20, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004
lhotari added a commit to datastax/pulsar that referenced this issue Apr 20, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
gaoran10 pushed a commit that referenced this issue Apr 21, 2022
)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes #14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
gaoran10 pushed a commit that referenced this issue Apr 21, 2022
)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes #14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
lhotari added a commit to datastax/pulsar that referenced this issue Apr 22, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
lhotari added a commit to datastax/pulsar that referenced this issue Apr 22, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
codelipenghui pushed a commit that referenced this issue Apr 28, 2022
)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes #14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
nicoloboschi pushed a commit to datastax/pulsar that referenced this issue Apr 28, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
Jason918 pushed a commit to Jason918/pulsar that referenced this issue Aug 1, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)
Jason918 pushed a commit to Jason918/pulsar that referenced this issue Aug 1, 2022
…che#15212)

* Upgrade Netty to 4.1.76.Final and Netty Tcnative to 2.0.51.Final

Fixes apache#14015
- release notes https://netty.io/news/2022/04/12/4-1-76-Final.html
  - contains fix for netty/netty#11695

* Upgrade grpc to 1.45.1 and protobuf to 3.19.2

- grpc < 1.45.1 is not compatible with Netty > 4.1.74.Final
  - grpc/grpc-java#9004

(cherry picked from commit 332a3c7)