
Observed Performance Decrease at the Request Filter Level with 403 Responses; Suspecting Blocked I/O in the Service #190

Closed
OshoOS opened this issue Apr 8, 2024 · 5 comments
Labels
question Further information is requested

Comments

@OshoOS

OshoOS commented Apr 8, 2024

Describe the bug

I created a gateway service with the pingora proxy, following the gateway example. While benchmarking the service, I found that latency for requests rejected by the request filter increased much faster than for requests that passed the filter, even though CPU utilization remained stable. I suspect there is blocked I/O in the service, triggered by
session.respond_error(403).await;

Pingora info

Please include the following information about your environment:

Pingora version: 0.1.0
Rust version: 1.75
Operating system version: CentOS 7, x86_64 kernel

Steps to reproduce

Create a service from the gateway example and compare the max RPS of requests rejected by the request filter with that of requests that pass it.

Expected results

Max RPS for rejected requests should be much higher than for passed requests, since rejected requests never call upstream.
CPU utilization should increase as the number of concurrent requests goes up.

Observed results

Max RPS for rejected requests (1,200 RPS) is much lower than for passed requests (3,200 RPS), and CPU utilization remains stable even as latency increases.

Additional context

Both tests were run in release mode with the following configuration:

---
version: 1
threads: 4
pid_file: /run/load_balancer.pid
error_log: /tmp/load_balancer_error.log
upgrade_sock: /tmp/load_balancer.sock
upstream_keepalive_pool_size: 10240

The tests were run with Apache JMeter.

@vicanso

vicanso commented Apr 8, 2024

Because session.respond_error closes the downstream connection. Every rejected request then forces the client to open a new connection, which is why latency rises while CPU stays flat.

pub async fn respond_error(&mut self, error: u16) {
    let resp = match error {
        /* common error responses are pre-generated */
        502 => error_resp::HTTP_502_RESPONSE.clone(),
        400 => error_resp::HTTP_400_RESPONSE.clone(),
        _ => error_resp::gen_error_response(error),
    };

    // TODO: we shouldn't be closing downstream connections on internally generated errors
    // and possibly other upstream connect() errors (connection refused, timeout, etc)
    //
    // This change is only here because we DO NOT re-use downstream connections
    // today on these errors and we should signal to the client that pingora is dropping it
    // rather than misleading the client with 'keep-alive'
    self.set_keepalive(None);

    self.write_response_header(Box::new(resp))
        .await
        .unwrap_or_else(|e| {
            error!("failed to send error response to downstream: {e}");
        });
}

@vicanso

vicanso commented Apr 8, 2024

You should use session.write_response_header and session.write_response_body instead of session.respond_error.

My performance test results:

        let path = session.req_header().uri.path();
        if path.starts_with("/login") {
            let _ = session.respond_error(403).await;
            return Ok(true);
        }
        if path.starts_with("/logout") {
            let _ = HttpResponse {
                status: StatusCode::FORBIDDEN,
                ..Default::default()
            }
            .send(session)
            .await;
            return Ok(true);
        }
wrk 'http://127.0.0.1:6188/login'
Running 10s test @ http://127.0.0.1:6188/login
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   129.34us   83.50us   3.14ms   96.79%
    Req/Sec    14.88k     7.84k   20.66k    72.73%
  16286 requests in 10.09s, 2.36MB read
  Non-2xx or 3xx responses: 16286
Requests/sec:   1613.95
Transfer/sec:    239.57KB

wrk 'http://127.0.0.1:6188/logout'
Running 10s test @ http://127.0.0.1:6188/logout
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    64.49us   22.15us   1.51ms   81.94%
    Req/Sec    75.88k     2.08k   79.69k    98.02%
  1524149 requests in 10.10s, 203.50MB read
  Non-2xx or 3xx responses: 1524149
Requests/sec: 150901.18
Transfer/sec:     20.15MB
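
(Editor's note: HttpResponse above is a helper type from the commenter's own code, not a pingora API. A minimal sketch of the same keep-alive-preserving 403 using only pingora's session calls might look like the following. deny_keepalive is a hypothetical helper called from request_filter; import paths and signatures may differ between pingora versions, so treat this as an illustration rather than a drop-in.)

// Hypothetical sketch (not from the pingora repo): send a 403 from
// request_filter without closing the downstream connection.
// Import paths assume the `pingora` facade crate with the "proxy" feature.
use pingora::http::ResponseHeader;
use pingora::proxy::Session;
use pingora::Result;

async fn deny_keepalive(session: &mut Session) -> Result<bool> {
    // Build a bare 403 with an empty body; with Content-Length: 0 no
    // write_response_body call is needed (its signature varies by version).
    let mut resp = ResponseHeader::build(403, None)?;
    resp.insert_header("Content-Length", "0")?;

    // Unlike respond_error(), do not call session.set_keepalive(None),
    // so the client can keep reusing this connection.
    session.write_response_header(Box::new(resp)).await?;

    // Returning Ok(true) from request_filter tells pingora the request was
    // fully handled here and should not be proxied upstream.
    Ok(true)
}

Inside request_filter, the rejected branch would simply `return deny_keepalive(session).await;` instead of calling session.respond_error(403).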

@OshoOS
Author

OshoOS commented Apr 9, 2024

Nice. I was able to reproduce it as well. Thank you!

@drcaramelsyrup added the question (Further information is requested) label on Apr 9, 2024
@drcaramelsyrup
Contributor

Thanks @vicanso! I will close this; let us know if there are further concerns.

@WENPIN1

WENPIN1 commented Apr 12, 2024

Although sending a 'Connection: Close' header along with an error status is common behavior, it would be good to add a caveat about the connection close to the "Return error pages" section of the docs, as a precaution for load tests (too many new connection attempts after each connection close):
https://github.com/cloudflare/pingora/blob/main/docs/user_guide/modify_filter.md#return-error-pages

self.set_keepalive(None);

Thank you all !
