Skip to content

App time calculation is incorrect #348

@domdom82

Description

@domdom82

Is this a security vulnerability?

No.

Issue

  • With the attempt details commit the calculation for the app backend time was moved from the proxy to proxy_round_tripper
  • This had a subtle side effect.
  • In the original code, the AppRequestFinishedAt timestamp was set only after all of the middleware had finished, including ReverseProxy
  • ReverseProxy first calls out to Transport (which in our case is proxy_round_tripper) just to send the request and receive a response object. -> By this time we had mistakingly set AppRequestFinishedAt already <-
  • Then, the response body is copied to the client which blocks until the entire body has been sent. -> This is where the actual time is spent on the backend, if a response body exists <-
  • In the new code, we set the AppRequestFinishedAt timestamp early on in proxy_round_tripper (which is actually the Transport used for ReverseProxy).
  • This causes the time that is needed to copy the response body to client to be attributed to gorouter_time because we subtract appTime from roundTripTime to get gorouter_time. Since appTime is so little as it misses the response body copy time, this skews gorouter_time to be much larger than it actually is.

Affected Versions

Routing-Release v0.272.0 and greater

CF-Deployment v30.6.0 and greater

Steps to Reproduce

You can easily reproduce this issue by pushing two apps that are slow in different ways:

App A is "slow" because it waits for 10s before sending a response:

func slowResponse(w http.ResponseWriter, r *http.Request) {
	time.Sleep(10 * time.Second)

	_, _ = w.Write([]byte("OK after 10s"))
}

This will yield an access_log such as this:

(...) gorouter_time:0.001506 backend_time:10.016147

App B is "slow" because it waits for 10s before sending a body:

func slowBody(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(200)

	if flusher, ok := w.(http.Flusher); ok {
		flusher.Flush()
	}

	time.Sleep(10 * time.Second)

	_, _ = w.Write([]byte("OK after 10s"))
}

This will yield an access_log such as this:

(...) gorouter_time:10.010651 backend_time:0.006369

Expected result

  • gorouter_time is very small and backend_time is about 10s for both app A and app B

Current result

  • gorouter_time is very small for app A
  • gorouter_time is about 10s for app B

Possible Fix

  • Gorouter needs to take the time for transmitting the response body into account again
  • This can be done by moving the calculation of app_time from proxy_round_tripper back to proxy as before
  • The cases where retries happened during RoundTrip because one, several or all backend instances were unavailable will also have to be taken into account. A possible solution will be to calculate this retry time separately and subtract it from app_time to get the "real" time spent at the backend app.
  • If none of the attempts worked (i.e. the app could not be reached at all), we must make sure that app_time is zero.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions