
restart an HTTP server on schema updates #5

Merged
merged 12 commits into from
Nov 5, 2021

Conversation


@Geal Geal commented Nov 4, 2021

In a previous version of the code, the schema that was shared behind a lock
between the state machine and the HTTP server was removed, so schema updates
were broken.
We want to fix this without reintroducing a lock in the middle of the hot
path for queries. Our solution here is to launch a new HTTP server with
the new schema configuration on schema updates, since launching a server is
very cheap.

We need to replace warp's HTTP server with our own loop though, to get
the ability to reuse the TCP listener socket from one server to the next
and avoid losing TCP connections.
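
The mechanism, roughly, looks like this (a minimal sketch with assumed names, not the actual router code): the state machine owns the listener, hands it to a server task, and takes it back on every schema update so the next server starts on the same socket.

use tokio::net::TcpListener;
use tokio::sync::{mpsc, oneshot};
use tokio::task::JoinHandle;

struct RunningServer {
    shutdown: oneshot::Sender<()>,
    // the server task resolves with its listener so the next server can reuse it
    task: JoinHandle<TcpListener>,
}

async fn run(mut schema_updates: mpsc::Receiver<String>) {
    let mut listener = TcpListener::bind("127.0.0.1:4000").await.unwrap();
    let mut schema = schema_updates.recv().await.unwrap();

    loop {
        let server = start_server(listener, schema.clone());
        // wait for the next schema, then restart the server on the same socket
        match schema_updates.recv().await {
            Some(new_schema) => schema = new_schema,
            None => return,
        }
        let _ = server.shutdown.send(());
        listener = server.task.await.unwrap();
    }
}

fn start_server(listener: TcpListener, schema: String) -> RunningServer {
    let (shutdown, mut shutdown_rx) = oneshot::channel::<()>();
    let task = tokio::spawn(async move {
        let _ = schema; // build the query pipeline from the new schema here
        loop {
            tokio::select! {
                // asked to stop: hand the socket back to the state machine
                _ = &mut shutdown_rx => return listener,
                // accept connections and spawn one session task per connection (omitted)
                accepted = listener.accept() => { let _ = accepted; }
            }
        }
    });
    RunningServer { shutdown, task }
}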

It is still missing the graceful shutdown feature and some options on the HTTP session.

Checklist for merge

  • Every team member has read and analyzed the mechanism and understands how it works, with a reasonably deep understanding.
  • Benchmarks have been re-run and performance is at least equivalent to what it was before.
  • Last pass on renaming/commenting code to clarify it.


Geal commented Nov 5, 2021

This pull request introduced a perf regression, which should now be fixed with 02a51d3, and more than fixed: performance actually improves compared to the main branch.

I have been testing locally with the Rust-based subservices, limiting the router to 2 threads so I can saturate it more easily:

-#[tokio::main]
-async fn main() -> Result<()> {
+//#[tokio::main]
+fn main() -> Result<()> {
+    let runtime = tokio::runtime::Builder::new_multi_thread()
+        .enable_all()
+        .worker_threads(2)
+        .build()
+        .unwrap();
+    runtime.block_on(rt_main())
+}
+
+async fn rt_main() -> Result<()> {

I was benchmarking using hey:

hey -m POST -H 'Content-type: application/json' \
  -d '{"query":"query { products(limit: 2) { review { body } related { price price1 price2 price3 } } }\n","variables":{}}' \
  http://127.0.0.1:4000/

benchmark results: 200000 requests, 200 concurrent clients

main

Summary:                                                  
  Total:        21.4130 secs                              
  Slowest:      0.0432 secs                               
  Fastest:      0.0016 secs                               
  Average:      0.0213 secs                               
  Requests/sec: 9340.1292                                 
                                                          
                                                          
Response time histogram:                                  
  0.002 [1]     |                                         
  0.006 [309]   |                                         
  0.010 [1982]  |■                                        
  0.014 [12292] |■■■■■■■■                                 
  0.018 [38284] |■■■■■■■■■■■■■■■■■■■■■■■■                 
  0.022 [63965] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 
  0.027 [54758] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■       
  0.031 [22980] |■■■■■■■■■■■■■■                           
  0.035 [4872]  |■■■                                      
  0.039 [544]   |                                         
  0.043 [13]    |        
                                 
Latency distribution:
  10% in 0.0149 secs 
  25% in 0.0180 secs 
  50% in 0.0213 secs 
  75% in 0.0246 secs 
  90% in 0.0275 secs 
  95% in 0.0293 secs 
  99% in 0.0327 secs 

PR

Summary:                                                         
  Total:        20.3336 secs                                     
  Slowest:      0.2140 secs                                      
  Fastest:      0.0013 secs                                      
  Average:      0.0201 secs                                      
  Requests/sec: 9835.9168                                        
                                                                 
                                                                 
Response time histogram:                                         
  0.001 [1]      |
  0.023 [133232] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.044 [27366]  |■■■■■■■■
  0.065 [39186]  |■■■■■■■■■■■■
  0.086 [13]     |
  0.108 [6]      |
  0.129 [5]      |
  0.150 [10]     |
  0.171 [78]     |
  0.193 [85]     |
  0.214 [18]     |

Latency distribution:
  10% in 0.0061 secs
  25% in 0.0071 secs
  50% in 0.0085 secs
  75% in 0.0430 secs
  90% in 0.0454 secs
  95% in 0.0470 secs
  99% in 0.0494 secs

benchmark results: 200000 requests, 400 concurrent clients

main

Summary:                                                         
  Total:        21.5871 secs                                     
  Slowest:      0.6736 secs                                      
  Fastest:      0.0010 secs                                      
  Average:      0.0418 secs                                      
  Requests/sec: 9264.7878                                        
                                                                 
                                                                 
Response time histogram:                                         
  0.001 [1]      |
  0.068 [173710] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.135 [9496]   |■■
  0.203 [11967]  |■■■
  0.270 [3806]   |■
  0.337 [714]    |
  0.405 [221]    |
  0.472 [66]     |
  0.539 [15]     |
  0.606 [2]      |
  0.674 [2]      |

Latency distribution:
  10% in 0.0151 secs 
  25% in 0.0195 secs 
  50% in 0.0251 secs 
  75% in 0.0320 secs 
  90% in 0.1205 secs 
  95% in 0.1670 secs 
  99% in 0.2423 secs 

PR

Summary:            
  Total:        22.5670 secs
  Slowest:      0.3571 secs
  Fastest:      0.0011 secs
  Average:      0.0432 secs         
  Requests/sec: 8862.5084                                         
                                                                  
                                                                  
Response time histogram:                                          
  0.001 [1]      |
  0.037 [108173] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.072 [69116]  |■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.108 [9664]   |■■■■
  0.143 [7103]   |■■■
  0.179 [3767]   |■
  0.215 [1489]   |■
  0.250 [475]    |
  0.286 [149]    |
  0.321 [48]     |
  0.357 [15]     |

Latency distribution:
  10% in 0.0167 secs
  25% in 0.0222 secs
  50% in 0.0326 secs
  75% in 0.0496 secs
  90% in 0.0823 secs
  95% in 0.1206 secs
  99% in 0.1816 secs

benchmark results: 200000 requests, 800 concurrent clients

main

Summary:                                                          
  Total:        22.1926 secs                                      
  Slowest:      1.5100 secs                                       
  Fastest:      0.0011 secs                                       
  Average:      0.0829 secs                                       
  Requests/sec: 9012.0180                                         
                                                                  
                                                                  
Response time histogram:                                          
  0.001 [1]      |
  0.152 [172874] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.303 [2524]   |■
  0.454 [12209]  |■■■
  0.605 [10396]  |■■
  0.756 [982]    |
  0.906 [562]    |
  1.057 [342]    |
  1.208 [82]     |
  1.359 [14]     |
  1.510 [14]     |

Latency distribution: 
  10% in 0.0157 secs  
  25% in 0.0200 secs  
  50% in 0.0256 secs  
  75% in 0.0330 secs  
  90% in 0.3754 secs  
  95% in 0.4750 secs  
  99% in 0.6042 secs  

PR

Summary:             
  Total:        22.4960 secs
  Slowest:      1.3021 secs
  Fastest:      0.0008 secs
  Average:      0.0836 secs
  Requests/sec: 8890.4800
                    
                       
Response time histogram:
  0.001 [1]      |
  0.131 [170423] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.261 [2849]   |■
  0.391 [14959]  |■■■■
  0.521 [9903]   |■■
  0.651 [662]    |
  0.782 [728]    |
  0.912 [404]    |
  1.042 [47]     |
  1.172 [10]     |
  1.302 [14]     |

Latency distribution:
  10% in 0.0170 secs
  25% in 0.0224 secs
  50% in 0.0326 secs
  75% in 0.0502 secs
  90% in 0.3258 secs
  95% in 0.4050 secs
  99% in 0.5106 secs

Those results are interesting: with 200 clients, the PR has slightly higher tail latency but much higher throughput, while with 400 clients, the PR has lower throughput but lower latency across the distribution. This is good, because for a highly scalable server we want to optimize for a large number of clients each doing few queries rather than a small number of clients doing many queries.

(These are 20-second benchmarks, so not a lot of data, but from what I've seen the results do not change when running the benchmark longer, and in any case a benchmark on localhost on a laptop is not conclusive.)

Next: running this through the Gatling benchmarks.


Geal commented Nov 5, 2021

Current benchmarks with 50 fields, breadth 1, depth 2, and a 40ms backend response time:
On the main branch at 3de1a12, the breaking point is at 10099 rps.

[Screenshot from 2021-11-05 12-39-02]

On this PR at 02a51d3, the breaking point is at 10659 rps.

[Screenshot from 2021-11-05 12-39-37]

Both of them keep a very stable p99 latency until the breaking point.

With this I declare that this PR can be merged 🥳


@o0Ignition0o o0Ignition0o left a comment


LGTM :shipit:

Geal and others added 12 commits November 5, 2021 15:19
Fixes #235

In a previous PR, the schema that was shared behind a lock between the state
machine and the HTTP server was removed, so schema updates were broken.
We want to fix this without reintroducing a lock in the middle of the hot
path for queries. Our solution here is to launch a new HTTP server with
the new schema configuration on schema updates, since launching a server is
very cheap.

We need to replace warp's HTTP server with our own loop though, to get
the ability to reuse the TCP listener socket from one server to the next
and avoid losing TCP connections.

Since we do not have access to the private hyper structs and traits used
to implement it (Graceful, Watch, Exec, ConnStreamExec...), it is
challenging to make a struct wrapper for a Connection, especially if we
want to satisfy the bounds of
https://docs.rs/hyper/0.14.13/hyper/server/conn/struct.Connection.html#method.graceful_shutdown

What we can do, though, is to select over the shutdown watcher and the
connection (see the sketch below):
- if the connection finishes first, exit there
- if the shutdown watcher exits first, call graceful_shutdown() on the
  connection, then await the connection
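
A minimal sketch of that select, using only hyper 0.14's public connection API (the service and names here are placeholders, not the router's actual code):

use hyper::server::conn::Http;
use hyper::service::service_fn;
use hyper::{Body, Response};
use std::convert::Infallible;
use tokio::net::TcpStream;
use tokio::sync::oneshot;

async fn serve_one_connection(
    stream: TcpStream,
    shutdown: oneshot::Receiver<()>,
) -> Result<(), hyper::Error> {
    // placeholder service; the router would plug its query pipeline in here
    let service = service_fn(|_req| async {
        Ok::<_, Infallible>(Response::new(Body::from("ok")))
    });
    let conn = Http::new().serve_connection(stream, service);
    tokio::pin!(conn);

    tokio::select! {
        // the connection finished first: nothing more to do
        result = conn.as_mut() => result,
        // the shutdown watcher fired first: start a graceful shutdown,
        // then await the connection until in-flight requests are done
        _ = shutdown => {
            conn.as_mut().graceful_shutdown();
            conn.await
        }
    }
}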

this will help in isolating the TcpListener dance

this matches the initial API more closely, with only a oneshot::Sender<()> in
HttpServerHandle. Everything else is handled internally in the
implementation of WarpHttpServerFactory

move the socket unwrap to the spawned session task

that way, if that accept() call fails, it only affects the current
session and not future ones

Co-authored-by: Cecile Tonglet <cecile.tonglet@cecton.com>
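
A hypothetical sketch of that change (names assumed): the Result from accept() is moved into the per-session task and unwrapped there, so a failed accept only aborts that session's task while the loop keeps serving new connections.

use tokio::net::TcpListener;

async fn accept_loop(listener: TcpListener) {
    loop {
        let accepted = listener.accept().await;
        tokio::spawn(async move {
            // unwrapping here means an accept error only panics this session task
            let (stream, _addr) = accepted.expect("failed to accept connection");
            // hand `stream` to the HTTP session handling here
            let _ = stream;
        });
    }
}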

Example case: the new configuration sets up override addresses for
backend services, so the HttpServiceRegistry used by the graph fetcher
must be recreated

maybe those are causing a perf regression

it is much faster than watch