Gateway errors in production #443

esheehan-gsl · 2023-11-15T21:06:50Z

Describe the bug

We keep getting 502 Bad Gateway and 504 Gateway Timeout errors in production. The timeout typically happens on the time series endpoint when you load the application. This seems to trigger an out-of-memory error that causes Kubernetes to kill the container. While Kubernetes is restarting the container, you start to see the Bad Gateway errors for the entire application and all of the data endpoints.

esheehan-gsl · 2023-11-15T22:04:12Z

The working theory is that the history endpoint runs out of RAM because we have no limit on how far back we pull data. We end up loading all of the data in the store, which keeps growing, so this endpoint will require increasing amounts of RAM over time.

Limit the amount of data read in for the historical data to just the two weeks prior to the initialization time. This should reduce memory usage in production and allow the application to continue working, solving #443 (I hope). In the future, we may make this range configurable by users instead of hard-coding a two-week limit.
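A minimal sketch of the two-week cutoff described above, not the actual change in #446. The `Record` type, `history_window`, `limit_history`, and the `HISTORY_WINDOW` constant are all hypothetical names chosen for illustration; the real store and query layer are not shown in this issue.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, NamedTuple, Tuple


class Record(NamedTuple):
    """Hypothetical stand-in for one row of historical data."""

    init_time: datetime
    value: float


# Hard-coded for now; the issue suggests this may become user-configurable.
HISTORY_WINDOW = timedelta(weeks=2)


def history_window(
    init_time: datetime, window: timedelta = HISTORY_WINDOW
) -> Tuple[datetime, datetime]:
    """Return the (start, end) bounds of the history to read, both inclusive."""
    return init_time - window, init_time


def limit_history(
    records: Iterable[Record],
    init_time: datetime,
    window: timedelta = HISTORY_WINDOW,
) -> Iterator[Record]:
    """Lazily yield only the records that fall inside the history window."""
    start, end = history_window(init_time, window)
    for record in records:
        if start <= record.init_time <= end:
            yield record


# Usage: 30 days of daily records, trimmed to the two weeks before init_time.
init_time = datetime(2023, 11, 15)
all_records = [Record(init_time - timedelta(days=i), float(i)) for i in range(30)]
recent = list(limit_history(all_records, init_time))
```

Filtering like this only bounds memory if the records arrive lazily. In practice the `start` and `end` bounds would more likely be passed to the store's own query or slicing mechanism so that older data is never read at all.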

esheehan-gsl added the bug label Nov 15, 2023

esheehan-gsl added this to the Cycle 2023.5 milestone Nov 15, 2023

esheehan-gsl self-assigned this Nov 15, 2023

esheehan-gsl mentioned this issue Nov 16, 2023

Limit time series display to 2 weeks #446

Merged

esheehan-gsl linked a pull request Nov 16, 2023 that will close this issue

esheehan-gsl closed this as completed in #446 Nov 16, 2023
