Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crystal::System::Process#spawn forks, should call posix_spawn() or posix_spawnp() instead #11337

Open
BrucePerens opened this issue Oct 18, 2021 · 7 comments

Comments

@BrucePerens
Copy link

pid = self.fork(will_exec: true)

On Unix lookalikes, the posix_spawn() and posix_spawnp() functions should be available, which would allow this to be implemented without fork(). Obviously fork() is going to break multithread, and even though this implementation hints that it will exec(), there still seems to be unacceptable overhead for large-memory processes.

@straight-shoota
Copy link
Member

According to the manpage, posix_spawn is intended as a workaround when fork + exec is unavailable. It's apparently not supposed to be a replacement on systems where that already works. And in fact, it works pretty similar to our Process.spawn implementation. Only difference on Linux seems to be that it uses Linux-specific clone instead of fork.

So I'm not sure what actual benefit we would get from using posix_spawn. And it doesn't appear to me how it would work different in a multithreaded environment.

@j8r
Copy link
Contributor

j8r commented Oct 19, 2021

Here is a related Golang issue with useful insights: golang/go#5838.
Even if posix_spawn is not used, using directly clone or vfork can be considered to improve performance. It has been done in Go: https://go-review.googlesource.com/c/go/+/37439/.

@asterite
Copy link
Member

Oooh...! fork doesn't work at all in interpreted mode, but I don't know why. Maybe we could at least avoid it in that scenario?

@BrucePerens
Copy link
Author

BrucePerens commented Oct 19, 2021

I wrote a benchmark, using clone on Linux, which attempts to do exactly what Process#new does. It runs 5 times faster. If you remove the 3GB of allocated RAM, there is a 2x difference - still significant but the effect of a large resident set is evident. Add even more memory, and the present Crystal implementation slows down linearly, so it seems that the present Crystal implementation spends a significant amount of time copying pages. The Linux clone seems to have more to it than just vfork.

$ time ./spawn 0
Running version 0

real	0m0.759s
user	0m0.497s
sys	0m0.484s
bruce@server:~/Software/UserClub/userclub$ !2444
$ time ./spawn 1
Running version 1

real	0m0.134s
user	0m0.012s
sys	0m0.096s

lib LibC
  fun clone(func : Void* -> LibC::Int, stack : Void*, flags : LibC::Int) : LibC::Int
  fun execv(LibC::Char*, LibC::Char**) : LibC::Int 
  fun dup2(LibC::Int, LibC::Int) : LibC::Int
end

def do_exec(opaque : Void*) : LibC::Int
  LibC.dup2(STDOUT.fd, 1)
  command = "/bin/echo\0".to_unsafe
  args = Array(LibC::Char*).new
  args << command
  args << "-n\0".to_unsafe
  args << "\0".to_unsafe
  args << Pointer(LibC::Char).new(0)
  error = LibC.execv(command, args.to_unsafe)
  return error
end

CLONE_VFORK = 0x00004000
CLONE_VM = 0x00000100

stack_array = Array(LibC::Long).new(1000)
# I am told that everything Linux runs upon except for PowerPC grows the stack downwards.
stack = stack_array.to_unsafe + stack_array.size

big_array1 = Array(UInt8).new(1024 * 1024 * 1024)
# Linux reserves pages when they are allocated, and allocates them when they are written.
big_array1.map! { 0u8 }
big_array2 = big_array1.clone
big_array3 = big_array2.clone

if ARGV.size > 0
  version = ARGV[0].to_i
else
  version = 0
end
STDERR.puts "Running version #{version}"
case version
when 0
  1000.times do
    Process.new(command: "/bin/echo", args: ["-n", ""], output: STDOUT)
  end
when 1
  1000.times do
    LibC.clone(->(opaque : Void *){ do_exec(opaque) }, stack, CLONE_VFORK|CLONE_VM)
  end
end

@yxhuvud
Copy link
Contributor

yxhuvud commented Oct 21, 2021

The change in go that @j8r referred to had some pretty happy users: https://about.gitlab.com/blog/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/

Aside: Personally I'd prefer to simply not use fork for anything in the codebase - it is not portable and it doesn't work all that well with the runtime (ie, it prohibits the GC from being incremental and complicates anything that relates to files or the event loop). That said those issues probably doesn't matter for this particular issue :)

@stakach
Copy link
Contributor

stakach commented Oct 29, 2021

Also in the pro-vfork camp, having out-of-memory issues when trying to shell out to a really small executable (plenty of memory available)

ignore me, probably just some weird overcommit settings issues on a particular clients server. Lots of processes failing all over the place with out-of-memory and upwards of 8GB of memory free
(p.s. it was an overcommit setting)

@BrucePerens
Copy link
Author

BrucePerens commented Oct 29, 2021

@stakach Crystal is using vfork, the issue is that clone is faster. posix_spawn uses clone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants