Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow service start with Ruby 3 #84

Open
daipom opened this issue Feb 15, 2024 · 3 comments · May be fixed by #85
Open

Slow service start with Ruby 3 #84

daipom opened this issue Feb 15, 2024 · 3 comments · May be fixed by #85
Assignees

Comments

@daipom
Copy link

daipom commented Feb 15, 2024

Description

Service startup is noticeably slower with Ruby 3.
I'm talking about the time to reach the Running state. In other words, the time to finish StartServiceCtrlDispatcher.
With Ruby 2, it takes about 5 seconds, but with Ruby 3, it can take more than 20 seconds.
The application implementation is irrelevant.

Windows service needs to finish StartServiceCtrlDispatcher in 30 seconds by default.
With Ruby 3 or later, this problem can cause timeout errors.

EventID 7000

The testservice service failed to start due to the following error: 
The service did not respond to the start or control request in a timely fashion.

EventID 7009

A timeout was reached (30000 milliseconds) while waiting for the testservice service to connect.

This problem occurs when both of the following conditions are met.

  • Ruby 3 (Ruby 3.0.0 ~ 3.3.0)
  • Mulitple CPU
    • This problem does not occur with single CPU (such as EC2 t2.small type)

Gem Version

2.3.2

Windows Version

I confirm this problem in the following:

  • Windows 10 Home
  • Windows Server 2019 Base EC2 Instance (t2.medium, t2.large, c5.large)

Replication Case

Here is a graph of the number of seconds to start a sample service for each Ruby version:

2024-02-15_20h41_38

(I will explain 322 with patch at the end)

Measured using a minimum implementation service:

require "win32/daemon"

class TestService < Win32::Daemon
  def service_main
    while running?
      sleep 10
    end
  end
end

TestService.new.mainloop

Environment:

  • Windows 10 Home 64bit
  • 4 core CPU, 8 logical processors
    • Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99GHz
  • 16GB RAM

Steps:

  • Register the sample service above (C:\test\run.rb)
require "win32/service"

# Specify ruby bin path
ruby_path = ""

Win32::Service.create(
  service_name: "testservice",
  binary_path_name: "\"#{ruby_path}\" C:\\test\\run.rb",
  display_name: "testservice",
)
  • Measure the time with the following PowerShell script
while($true)
{
    $datetime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    echo "$datetime, Start launching the service"
    Start-Service testservice
    $datetime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    echo "$datetime, Finish launching the service"
    Start-Sleep -seconds 10
    Stop-Service testservice
    Start-Sleep -seconds 10
}
  • After measuring, delete the service.
require "win32/service"

Win32::Service.delete("testservice")

Additional Information

  • I confirm the same tendency in the AWS EC2 instance Windows Server 2019 Base t2.medium, t2.large, c5.large
  • Interestingly, this problem does not occur in t2.small
  • I think it occurs in environments with multiple CPUs
  • Maybe some implementation about threading has changed with Ruby 3 series, and this seems to be influenced by that

Possible causes and solutions

Maybe the problem is on Ruby's side, but I think it could be avoided by implementation on this side.

In the current implementation, it switches between multiple Ruby threads and non-Ruby threads.
This complex process may be unsuitable for Ruby 3.

https://github.com/chef/win32-service/blob/win32-service-2.3.2/lib/win32/daemon.rb#L257

CreateThread makes a non-Ruby thread, and it calls Ruby Proc, and it makes a Ruby thread, and ...
Also, StartServiceCtrlDispatcher creates a non-Ruby thread, and it makes a new Ruby thread again and executes Service_Main Ruby Proc...

Although this is still PoC, I found that using Ruby thread from the beginning, as shown below, was very fast.
(It is 322 with patch data in the above graph).

lib\win32\daemon.rb
@@ -254,26 +254,21 @@
         raise SystemCallError.new("CreateEvent", FFI.errno)
       end
 
-      hThread = CreateThread(nil, 0, ThreadProc, Service_Main, 0, nil)
-
-      if hThread == 0
-        raise SystemCallError.new("CreateThread", FFI.errno)
-      end
-
-      events = FFI::MemoryPointer.new(:pointer, 2)
-      events.put_pointer(0, FFI::Pointer.new(hThread))
-      events.put_pointer(FFI::Pointer.size, FFI::Pointer.new(@@hStartEvent))
-
-      while (index = WaitForMultipleObjects(2, events, 0, 1000)) == WAIT_TIMEOUT
-      end
-
-      if index == WAIT_FAILED
-        raise SystemCallError.new("WaitForMultipleObjects", FFI.errno)
-      end
-
-      # The thread exited, so the show is off.
-      if index == WAIT_OBJECT_0
-        raise "Service_Main thread exited abnormally"
+      hThread = Thread.new(Service_Main) do |service_proc|
+        ste = FFI::MemoryPointer.new(SERVICE_TABLE_ENTRY, 2)
+
+        s = SERVICE_TABLE_ENTRY.new(ste[0])
+        s[:lpServiceName] = FFI::MemoryPointer.from_string("")
+        s[:lpServiceProc] = service_proc
+
+        s = SERVICE_TABLE_ENTRY.new(ste[1])
+        s[:lpServiceName] = nil
+        s[:lpServiceProc] = nil
+
+        StartServiceCtrlDispatcher(ste)
+      end
+
+      while WaitForSingleObject(@@hStartEvent, 1000) == WAIT_TIMEOUT
       end
 
       thr = Thread.new do
@@ -322,6 +317,7 @@
       end
 
       thr.join
+      hThread.join
     end
 
     # Returns the state of the service (as an constant integer) which can be any
@daipom
Copy link
Author

daipom commented Feb 15, 2024

Is there any reason using CreateThread Win32 API to call StartServiceCtrlDispatcher?

https://github.com/chef/win32-service/blob/win32-service-2.3.2/lib/win32/daemon.rb#L257

Is it related to GVL?

@daipom
Copy link
Author

daipom commented Feb 26, 2024

Looks like this PR tries to fix the same problem.

daipom added a commit to fluent-plugins-nursery/win32-service that referenced this issue Feb 26, 2024
This does not change any specification.
This just changes to use a Ruby thread instead of `CreateThread`.
It is very very fast to use a Ruby thread from the beginning.

Service startup is noticeably slower with Ruby 3.
(Please see chef#84).

Windows service needs to finish `StartServiceCtrlDispatcher` in
30 seconds by default.
With Ruby 3 or later, this problem can cause timeout errors.

The more CPUs, the slower it tends to be.
Maybe some implementation about threading has changed with Ruby 3 series.

The current implementation switches between multiple Ruby threads and
non-Ruby threads.
This complex process may be unsuitable for Ruby 3.
`CreateThread` makes a non-Ruby thread, and it calls Ruby Proc, and it
makes a Ruby thread, and ...

Signed-off-by: Daijiro Fukuda <fukuda@clear-code.com>
@daipom daipom linked a pull request Feb 26, 2024 that will close this issue
3 tasks
@daipom
Copy link
Author

daipom commented Feb 26, 2024

I have made a PR #85 to solve this issue.
I would like to consider #85 together with #83.

@tpowell-progress tpowell-progress self-assigned this Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants