Suspected resource leak in worker recovery feature #145

Open
ashie opened this issue Feb 22, 2024 · 0 comments
ashie commented Feb 22, 2024

In fluent-package v5.0.2 for Windows, we observed that the supervisor process terminated unexpectedly after about one hour of repeated worker recovery.
After digging into this issue, we found a handle leak in the supervisor process.

fluent/fluent-package-builder#620 (comment)

In this situation, the number of open handles increased continuously, eventually exceeding 8400.
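That growth pattern is consistent with one or more handles being duplicated per recovery cycle and never closed. The following is a minimal, hypothetical sketch of that failure mode (it is not the actual supervisor code):

```ruby
# Hypothetical sketch: leaking one descriptor per "recovery" cycle.
# IO#dup creates a new OS-level handle; if nothing closes it, the
# count grows with every cycle until spawn eventually fails with
# Errno::EMFILE ("Too many open files - dup").
leaked = []
10.times do
  leaked << $stdout.dup  # stands in for a handle left open after a worker restart
end
puts leaked.size         # 10 handles accumulated after 10 cycles
leaked.each(&:close)     # the cleanup step that appears to be missing
```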

We got the following backtrace when the supervisor process terminated:

2024-02-21 15:34:23 +0900 [debug]: fluent/log.rb:341:debug: Got Win32 event "fluentd_7928_STOP_EVENT_THREAD"
Unexpected error undefined method `pid' for nil:NilClass
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:417:in `after_start'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:77:in `ensure in start_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:77:in `start_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:175:in `delayed_start_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:159:in `restart_worker'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:125:in `block in keepalive_workers'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each_with_index'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `keepalive_workers'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:58:in `run'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:50:in `run'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/server.rb:128:in `main'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:119:in `main'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:68:in `run'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:796:in `supervise'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:582:in `run_supervisor'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
  <internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  <internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
  C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
  C:/opt/fluent/bin/fluentd:32:in `load'
  C:/opt/fluent/bin/fluentd:32:in `<main>'

In the above log, the root cause is squashed by the `ensure` block: the `NoMethodError` raised during cleanup replaces the exception that was originally in flight. So I fetched an additional backtrace of the underlying exception:

#<Errno::EMFILE: Too many open files - dup>
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/process_manager.rb:190:in `spawn'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:413:in `spawn'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:75:in `start_worker'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:175:in `delayed_start_worker'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:159:in `restart_worker'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:125:in `block in keepalive_workers'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `each_with_index'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:102:in `keepalive_workers'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_worker_server.rb:58:in `run'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/multi_spawn_server.rb:50:in `run'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/server.rb:128:in `main'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:119:in `main'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/serverengine-2.3.2/lib/serverengine/daemon.rb:68:in `run'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:796:in `supervise'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/supervisor.rb:582:in `run_supervisor'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/lib/fluent/command/fluentd.rb:352:in `<top (required)>'
<internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
<internal:C:/opt/fluent/lib/ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:85:in `require'
C:/opt/fluent/lib/ruby/gems/3.2.0/gems/fluentd-1.16.3/bin/fluentd:15:in `<top (required)>'
C:/opt/fluent/bin/fluentd:32:in `load'
C:/opt/fluent/bin/fluentd:32:in `<main>'

Although I'm not sure yet whether this is a Fluentd issue or a ServerEngine issue, we'll track it here until we clarify the root cause.
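The masking behavior described above can be reproduced with a minimal Ruby sketch (the method and variable names are hypothetical, modeled loosely on `start_worker`/`after_start`): an exception raised inside an `ensure` block replaces the exception already in flight, though Ruby still records the original as the new exception's `cause`.

```ruby
# Minimal sketch of exception masking by `ensure` (hypothetical names).
def start_worker
  # The real failure: resource exhaustion while spawning a worker.
  raise Errno::EMFILE, "dup"
ensure
  # Cleanup code that assumes a worker was created; with no worker,
  # this raises NoMethodError and replaces the EMFILE in flight.
  worker = nil
  worker.pid
end

begin
  start_worker
rescue => e
  # The surfaced error is the NoMethodError from `ensure` ...
  puts e.class        # NoMethodError
  # ... but Ruby keeps the squashed exception reachable as its cause.
  puts e.cause.class  # Errno::EMFILE
end
```

Inspecting `Exception#cause` is one way to recover the underlying exception, which matches how the second backtrace above was obtained.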
