
[Feature] better work with large query responses #385

Closed
kasimtj opened this issue Dec 25, 2023 · 7 comments

Comments

@kasimtj
Contributor

kasimtj commented Dec 25, 2023

Is your feature request related to a problem? Please describe.
We ran into high memory consumption in chproxy and found two problems.

  1. Reading a large payload from Redis allocates a new 2 MiB buffer on every cycle, and the garbage collector is unable to free the memory promptly.
    [screenshot: heap profile, 2023-12-22]

  2. If a proxied request fails or is canceled, chproxy tries to extract the error reason, and a large tmp file containing the partial response is read into memory.
    [screenshot: heap profile, 2023-12-25]

Describe the solution you'd like

  1. We worked around the first problem with max_payload_size by simply not caching large responses (config sketch below). It would still be better to reuse buffers instead of allocating new ones, although that seems to be a limitation of the go-redis library.
  2. Setting wait_end_of_query=1 in ClickHouse might solve our problem, but it would not stop chproxy from reading large tmp files entirely, and it increases proxy request latency. Possible solutions (see the sketch after this list):
  • not reading the tmp file if the response code is 200 (no partial response is read)
  • limiting the size of the error reason, analogous to max_payload_size
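
For reference, this is roughly how the max_payload_size workaround looks in the cache config. A sketch only: the field layout follows our reading of the chproxy docs for a Redis cache, and the address and size values are illustrative:

```yaml
caches:
  - name: "redis_cache"
    mode: "redis"
    redis:
      addresses: ["127.0.0.1:6379"]  # illustrative address
    expire: 1h
    # Responses larger than this are never cached, so they are also never
    # read back from Redis in 2 MiB blocks.
    max_payload_size: 100MB
```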
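
And a minimal sketch of the two bullets above. Everything here is illustrative: fetchErrorReason and errorReasonLimit are hypothetical names, not chproxy's actual identifiers:

```go
package example

import (
	"io"
	"net/http"
)

// errorReasonLimit caps how much of a failed response is pulled back into
// memory, in the same spirit as max_payload_size (hypothetical constant).
const errorReasonLimit = 1 << 20 // 1 MiB

// fetchErrorReason skips the tmp file entirely on success and reads at most
// errorReasonLimit bytes of it on failure.
func fetchErrorReason(statusCode int, tmpFile io.Reader) (string, error) {
	if statusCode == http.StatusOK {
		// Success: the partial response stays on disk untouched.
		return "", nil
	}
	reason, err := io.ReadAll(io.LimitReader(tmpFile, errorReasonLimit))
	if err != nil {
		return "", err
	}
	return string(reason), nil
}
```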

Describe alternatives you've considered

Additional context

@kasimtj
Copy link
Contributor Author

kasimtj commented Dec 25, 2023

Example fix for the second problem: #386

@mga-chka
Collaborator

mga-chka commented Jan 3, 2024

Hi, I took a quick look at the code.
Indeed, the code fetches data from Redis in 2 MB blocks for large queries. Since go-redis doesn't provide an API to reuse a buffer, the fetch call allocates a new 2 MB buffer on each cycle.

That being said, those buffers should be garbage collected, and I don't see where in the code a memory leak could occur. Just to be sure, did you verify during your test that the garbage collector was actually triggered?
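
For illustration, the reuse you are asking for would be the standard sync.Pool pattern; this is a generic sketch, not something go-redis currently lets us plug in:

```go
package example

import "sync"

const chunkSize = 2 << 20 // 2 MiB, matching the block size above

// bufPool recycles fetch buffers so each cycle reuses memory instead of
// leaving a fresh 2 MiB allocation behind for the garbage collector.
var bufPool = sync.Pool{
	New: func() any { b := make([]byte, chunkSize); return &b },
}

// Usage per fetch cycle:
//
//	buf := bufPool.Get().(*[]byte)
//	n, _ := src.Read(*buf) // fill from Redis
//	// ... consume (*buf)[:n] ...
//	bufPool.Put(buf)
```

(Storing *[]byte rather than []byte avoids an extra allocation when the slice is boxed into the pool's interface value.)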

NB: I'll take a look at your fix for problem 2 this week.

@kasimtj
Contributor Author

kasimtj commented Jan 8, 2024

Hi, thank you for the response; fixing these bugs is really important for our team right now.

With Redis, we think the main problem is that the garbage collector is not triggered immediately, so the heap grows much faster than it gets cleaned up. We will run additional tests on it; it's possible the profiling showed total allocated memory rather than peak memory usage (see the commands below).
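
Concretely, we'll re-check which sample index the heap profile used; these are standard go tool pprof flags (the binary and profile names are just ours):

```sh
# alloc_space counts every byte ever allocated (cumulative total),
# while inuse_space shows only memory still live at profile time.
go tool pprof -sample_index=alloc_space ./chproxy heap.pprof
go tool pprof -sample_index=inuse_space ./chproxy heap.pprof
```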

If your team is OK with the general idea of the fix for the second problem, I can send a proper PR with tests etc.

@mga-chka
Collaborator

mga-chka commented Jan 8, 2024

Yes, we're OK with your second fix.
If you make the changes I asked for in your PR, we'll merge it.

@mga-chka
Collaborator

I'm closing this issue since you have a fix for 1 and your fix for 2 will soon be merged.
Feel free to reopen it if needed.

@Blokje5
Collaborator

Blokje5 commented Jan 16, 2024

@kasimtj 1.26.0 is released and includes your fix.

@kasimtj
Contributor Author

kasimtj commented Jan 17, 2024

> @kasimtj 1.26.0 is released and includes your fix.

Thanks a lot for the quick response!
