Processes reading from cache blocked by generational gc process #197

Closed
szajbus opened this issue Mar 13, 2023 · 1 comment · Fixed by #210

Comments

Contributor

szajbus commented Mar 13, 2023

We use nebulex to cache responses to web requests, with Nebulex.Adapters.Local and the :ets backend.

We observed periodic spikes in response times for a small number of requests. The spikes were correlated in time with generational garbage collection.

I believe the problem has a similar root cause to #121, i.e. a race condition: gc may start while there are still processes accessing the ets table.

In #121 this was mitigated by removing all of the table's data right away while delaying the deletion of the ets table itself, so that other processes can still access it.

However, :ets.delete_all_objects/1 is an "atomic and isolated" operation, which means that the same processes that used to crash before #121 will now need to wait until this operation finishes.

In our case, with a cache of almost 1 GB, deleting all objects takes more than 1 second, which unfortunately is a noticeable problem.
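To get a rough feel for that cost on a given machine, the call can be timed with `:timer.tc/1`. The following is a minimal sketch only: the module name `DeleteAllTiming`, the table name, and the 128-byte payload are assumptions made for illustration, and the numbers it produces will not match a real 1 GB cache.

```elixir
# Hypothetical micro-benchmark; names and sizes are illustrative only.
defmodule DeleteAllTiming do
  # Fills a fresh ETS table with `n` entries and returns the time, in
  # microseconds, that :ets.delete_all_objects/1 takes to clear it.
  def measure(n) when is_integer(n) and n > 0 do
    table = :ets.new(:timing_demo, [:set, :public, read_concurrency: true])

    # Each entry carries a small binary payload (assumed size).
    Enum.each(1..n, fn i ->
      :ets.insert(table, {i, :binary.copy(<<0>>, 128)})
    end)

    {micros, true} = :timer.tc(fn -> :ets.delete_all_objects(table) end)

    # The table itself survives; only its contents are gone.
    0 = :ets.info(table, :size)
    :ets.delete(table)
    micros
  end
end
```

Calling `DeleteAllTiming.measure(1_000_000)` returns the elapsed microseconds for one run; any process that touches the table during that window has to wait for the call to finish.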

I'm happy to submit a PR with a solution, but I'm not sure what the best approach would be. Some options:

  • instead of calling :ets.delete_all_objects/1 immediately, schedule it to happen after some (perhaps configurable) delay, when a race condition is highly unlikely
  • don't call :ets.delete_all_objects/1 at all and wait until the next gc run, which simply deletes the backend table
Owner

cabol commented Mar 22, 2023

This is a bit tricky. I agree we should avoid calling :ets.delete_all_objects/1. Of the two options, I prefer the first: schedule it to happen after some (perhaps configurable) delay. The problem with the second is that the next GC run may be a long time away, depending on the configured interval, so memory would not be released when it is supposed to be. The first option therefore seems better to me, and since the deletion is scheduled anyway, perhaps we can use :ets.delete/1 instead, which would be much better.
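A rough sketch of that idea, assuming a simplified process that keeps a single live generation: on rotation a new table becomes current and the old one is dropped with `:ets.delete/1` only after a delay. The module `DeferredDropSketch`, the `:delete_delay` option, and the message shape are hypothetical and are not Nebulex's actual GC code or options.

```elixir
defmodule DeferredDropSketch do
  use GenServer

  # Hypothetical names throughout; this is not Nebulex's GC module.

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Makes a new table current and schedules the previous one for
  # deletion instead of clearing it synchronously. Returns the old table.
  def rotate(pid), do: GenServer.call(pid, :rotate)

  def current(pid), do: GenServer.call(pid, :current)

  @impl true
  def init(opts) do
    # Assumed option: delay in milliseconds before the old table is dropped.
    delay = Keyword.get(opts, :delete_delay, 1_000)
    {:ok, %{current: new_table(), delay: delay}}
  end

  @impl true
  def handle_call(:rotate, _from, %{current: old, delay: delay} = state) do
    # Readers that already fetched `old` can keep using it until the timer fires.
    Process.send_after(self(), {:drop, old}, delay)
    {:reply, old, %{state | current: new_table()}}
  end

  def handle_call(:current, _from, state), do: {:reply, state.current, state}

  @impl true
  def handle_info({:drop, table}, state) do
    # Drops the whole table. A reader that still holds this reference
    # afterwards would raise ArgumentError, hence the delay.
    :ets.delete(table)
    {:noreply, state}
  end

  defp new_table do
    :ets.new(:generation, [:set, :public, read_concurrency: true])
  end
end
```

With `{:ok, pid} = DeferredDropSketch.start_link(delete_delay: 5_000)`, a call to `DeferredDropSketch.rotate(pid)` returns immediately, and the old table's memory is released about five seconds later. The trade-off is the one described above: a longer delay makes the race less likely but holds memory for longer.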
