
perf: reduce allocs when creating $_SERVER #540

Merged (22 commits) on Mar 12, 2024

Conversation

dunglas
Owner

@dunglas dunglas commented Feb 1, 2024

The main idea is to allocate the memory needed to populate the $_SERVER environment variables only once, on the Go side.

All the strings.Clone() calls are a workaround for golang/go#65286 (comment); they will no longer be necessary in Go 1.22.

On my machine, this implementation is around 5% faster than the current one (BenchmarkServerSuperGlobal, introduced in #539).

Please note that when running Go benchmarks, the alloc counter doesn't take C allocations into account.

@dunglas dunglas changed the title refactor: prevent C allocs when populating $_SERVER perf: reduce allocs when creating $_SERVER Feb 1, 2024
@dunglas
Owner Author

dunglas commented Feb 1, 2024

I did some more benchmarks using the k6 script we provide, and they are inconclusive (the gap is small and the winner varies between runs...). @withinboredom, would you mind checking whether you notice any improvement or deterioration using your benchmarks?

In the meantime, let's mark this patch as a draft.

@dunglas dunglas marked this pull request as draft February 1, 2024 22:50
@withinboredom
Collaborator

I also looked into optimizing this back when I was digging into the memory leak. Looking at flame graphs, most of our overhead these days is in Go -> C -> Go stack switches, so optimizing anything else will be negligible. Anything that reduces those switches would speed FrankenPHP up by quite a bit.

I was using TinyGo for a while, which can compile the entire stack via LLVM, and it sped things up quite a bit. However, we're now using some cgo features it doesn't support, so it won't compile.

@dunglas
Owner Author

dunglas commented Feb 12, 2024

That's weird, because according to recent benchmarks, cgo now has negligible overhead as long as we batch calls (and we do).

@dunglas dunglas force-pushed the refactor/env-var-creation branch 2 times, most recently from f6aea4d to 9b05728 on February 12, 2024 10:46
@withinboredom
Collaborator

Oh yeah, that's to say the stack switching is still ridiculously fast, not that it's slow. But back then that was literally the slowest part (which is a good thing).

@dunglas dunglas force-pushed the refactor/env-var-creation branch 2 times, most recently from cfc4cb9 to 3dfc3e2 on February 12, 2024 21:55
@dunglas
Owner Author

dunglas commented Feb 12, 2024

With the latest changes, the gains are more significant: 32% fewer allocations and slightly less memory used in Go (probably much less in C but it's hard to measure).

Before:

goos: darwin
goarch: arm64
pkg: github.com/dunglas/frankenphp
BenchmarkServerSuperGlobal
BenchmarkServerSuperGlobal-10    	    6571	    178113 ns/op	   19977 B/op	      93 allocs/op
PASS
ok  	github.com/dunglas/frankenphp	2.435s

After:

goos: darwin
goarch: arm64
pkg: github.com/dunglas/frankenphp
BenchmarkServerSuperGlobal
BenchmarkServerSuperGlobal-10    	    6859	    173809 ns/op	   19955 B/op	      63 allocs/op
PASS
ok  	github.com/dunglas/frankenphp	2.325s

Resolved review threads: cgi.go, frankenphp.c, frankenphp.go, worker.go.
@dunglas dunglas force-pushed the refactor/env-var-creation branch 2 times, most recently from 45a1d3b to aeac7b1 on February 13, 2024 23:32
@dunglas
Owner Author

dunglas commented Feb 15, 2024

K6 benchmark on a Macbook Pro (M1 Pro):

Before:

     execution: local
        script: load-test.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
              * default: 100 looping VUs for 30s (gracefulStop: 30s)


     ✓ is status 200
     ✓ is echoed

     checks.........................: 100.00% ✓ 435614      ✗ 0     
     data_received..................: 910 MB  30 MB/s
     data_sent......................: 1.1 GB  36 MB/s
     http_req_blocked...............: avg=2.1µs    min=0s     med=0s      max=4.59ms  p(90)=1µs     p(95)=1µs    
     http_req_connecting............: avg=1.2µs    min=0s     med=0s      max=3.06ms  p(90)=0s      p(95)=0s     
     http_req_duration..............: avg=13.72ms  min=1.25ms med=13.43ms max=67.75ms p(90)=15.46ms p(95)=16.43ms
       { expected_response:true }...: avg=13.72ms  min=1.25ms med=13.43ms max=67.75ms p(90)=15.46ms p(95)=16.43ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 217807
     http_req_receiving.............: avg=881.68µs min=8µs    med=527µs   max=40.65ms p(90)=1.68ms  p(95)=2.38ms 
     http_req_sending...............: avg=14.61µs  min=8µs    med=12µs    max=3.28ms  p(90)=19µs    p(95)=23µs   
     http_req_tls_handshaking.......: avg=0s       min=0s     med=0s      max=0s      p(90)=0s      p(95)=0s     
     http_req_waiting...............: avg=12.83ms  min=444µs  med=12.67ms max=63.73ms p(90)=14.21ms p(95)=14.82ms
     http_reqs......................: 217807  7256.993086/s
     iteration_duration.............: avg=13.77ms  min=1.36ms med=13.48ms max=67.8ms  p(90)=15.51ms p(95)=16.48ms
     iterations.....................: 217807  7256.993086/s
     vus............................: 100     min=100       max=100 
     vus_max........................: 100     min=100       max=100 


running (0m30.0s), 000/100 VUs, 217807 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  30s

After:

     execution: local
        script: load-test.js
        output: -

     scenarios: (100.00%) 1 scenario, 100 max VUs, 1m0s max duration (incl. graceful stop):
              * default: 100 looping VUs for 30s (gracefulStop: 30s)


     ✓ is status 200
     ✓ is echoed

     checks.........................: 100.00% ✓ 437732      ✗ 0     
     data_received..................: 915 MB  31 MB/s
     data_sent......................: 1.1 GB  36 MB/s
     http_req_blocked...............: avg=2.2µs    min=0s     med=0s      max=4.43ms  p(90)=1µs     p(95)=1µs    
     http_req_connecting............: avg=1.21µs   min=0s     med=0s      max=2.97ms  p(90)=0s      p(95)=0s     
     http_req_duration..............: avg=13.65ms  min=1.33ms med=13.38ms max=60.16ms p(90)=15.36ms p(95)=16.25ms
       { expected_response:true }...: avg=13.65ms  min=1.33ms med=13.38ms max=60.16ms p(90)=15.36ms p(95)=16.25ms
     http_req_failed................: 0.00%   ✓ 0           ✗ 218866
     http_req_receiving.............: avg=878.69µs min=8µs    med=522µs   max=23.27ms p(90)=1.63ms  p(95)=2.32ms 
     http_req_sending...............: avg=14.71µs  min=7µs    med=12µs    max=3.78ms  p(90)=19µs    p(95)=23µs   
     http_req_tls_handshaking.......: avg=0s       min=0s     med=0s      max=0s      p(90)=0s      p(95)=0s     
     http_req_waiting...............: avg=12.76ms  min=305µs  med=12.63ms max=57.49ms p(90)=14.17ms p(95)=14.71ms
     http_reqs......................: 218866  7292.430161/s
     iteration_duration.............: avg=13.7ms   min=1.37ms med=13.43ms max=60.21ms p(90)=15.41ms p(95)=16.31ms
     iterations.....................: 218866  7292.430161/s
     vus............................: 100     min=100       max=100 
     vus_max........................: 100     min=100       max=100 


running (0m30.0s), 000/100 VUs, 218866 complete and 0 interrupted iterations
default ✓ [======================================] 100 VUs  30s

(a ~0.5% improvement). Memory usage seems improved too.

It would be nice if someone could try the benchmark on Linux.

I will also try to use sync.Pool to prevent memory allocations.

@ChrisRiddell

> (a ~0.5% improvement). Memory usage seems improved too.
>
> It would be nice if someone could try the benchmark on Linux.

Where is the k6 file located, so I can run the same benchmark on Linux?

@withinboredom
Collaborator

The one in test-data/load-test.js is usually the one I use. I won't be able to test it until after I get back from vacation at the end of the week.

@dunglas
Owner Author

dunglas commented Mar 5, 2024

@maypok86 sorry to bother you again (and I hope that it's not for nothing) but it looks like the latest failure is also related to Otter. It may be this known Go bug: https://pkg.go.dev/sync/atomic#pkg-note-BUG

@maypok86

maypok86 commented Mar 5, 2024

> @maypok86 sorry to bother you again (and I hope that it's not for nothing) but it looks like the latest failure is also related to Otter. It may be this known Go bug: https://pkg.go.dev/sync/atomic#pkg-note-BUG

@dunglas Damn, yeah, I completely forgot about that when refactoring, and there were no tests on 32-bit archs.

Try the dev version (go get -u github.com/maypok86/otter@dev); it seems to pass the tests on 32-bit architectures.

@dunglas
Owner Author

dunglas commented Mar 5, 2024

@maypok86 thanks for this swift fix! I bumped Otter; let's see if the tests are green.

@maypok86

maypok86 commented Mar 5, 2024

The tests seem to have passed, so I'll create a release with the bug fix now.

@maypok86

maypok86 commented Mar 5, 2024

Done.

@dunglas dunglas marked this pull request as ready for review March 11, 2024 15:42
@dunglas dunglas merged commit 07a74e5 into main Mar 12, 2024
41 checks passed
@dunglas dunglas deleted the refactor/env-var-creation branch March 12, 2024 17:31