
Fragment cache sometimes returns null despite cached data loading initially #72

Closed
RudiBoshoff opened this issue Aug 30, 2021 · 20 comments

@RudiBoshoff

Hi there,

ENV:
We are currently using your gem in our Rails API, which passes data through to an Apollo Angular app. We originally had a pure Rails solution but got better performance results with your gem.

Issue:
The problem is that the cache sometimes returns null. This doesn't happen locally, but it does happen when we use Redis as the cache store for staging and production. The cached data is returned most of the time but sometimes comes back null. We have investigated as much as we could: we first thought it might be the front end, but we have narrowed it down to the cache returning null.

How to reproduce:
This only reproduces some of the time:

  1. load the site the first time to populate the cache (works)
  2. hard refresh and reload the site (network results show that cache is working)
    (screenshot)
  3. wait a while (5-10 minutes) and load the page again (returns null). The cache is set to expire after 2 hours; checking Redis shows that the key is still present when this failure occurs.
    (screenshot)

Question:
Do you have any ideas on what might be causing this or what we might be able to do? Thank you in advance.

@DmitryTsepelev
Owner

Hi @RudiBoshoff!

That sounds really strange, because the gem does the reading in a dead simple way:

# If read returns nil but the key exists, treat it as a deliberately cached nil
FragmentCache.cache_store.read(cache_key).tap do |cached|
  return NIL_IN_CACHE if cached.nil? && FragmentCache.cache_store.exist?(cache_key)
end

In this case FragmentCache.cache_store is a Redis adapter, and if you see that the key is still there, the value should be loaded without any issues. Since you know the key, could you please call FragmentCache.cache_store.read(cache_key) from the console and make sure the value is not nil?

Reading happens here; it looks like it also does nothing more than a nil check.

@DmitryTsepelev
Owner

Closing this for now, feel free to reopen 🙂

@gsdean
Contributor

gsdean commented Jan 11, 2023

FWIW: we're seeing similar behavior.

Isn't there a race condition?

  1. FragmentCache.cache_store.read(cache_key) returns nil for an uncached value.
  2. Someone else populates the cache key.
  3. FragmentCache.cache_store.exist?(cache_key) returns true (even though the key wasn't cached at step 1).

Thoughts? @DmitryTsepelev
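
That interleaving can be sketched with a plain Hash standing in for the Redis-backed store, with the three steps run sequentially to make the race deterministic (NIL_IN_CACHE here is a local stand-in for the gem's constant):

```ruby
# Plain Hash standing in for FragmentCache.cache_store; "post-1" was
# never cached, so the correct outcome here is a plain miss.
NIL_IN_CACHE = Object.new # stand-in for the gem's sentinel constant

store = {}

# Step 1: read misses because the key is absent
cached = store["post-1"]

# Step 2: another process populates the key between the two calls
store["post-1"] = "serialized fragment"

# Step 3: exist? now sees the key, so the nil from step 1 is
# misread as a deliberately cached nil
result = cached.nil? && store.key?("post-1") ? NIL_IN_CACHE : cached
result.equal?(NIL_IN_CACHE) # => true, even though a real value is cached
```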

@gsdean
Contributor

gsdean commented Jan 12, 2023

We've been doing some testing in production with the following:

def value_from_cache
  # Bail out early when the key is absent: a concurrent write arriving later
  # can no longer turn this miss into a spurious NIL_IN_CACHE
  return nil unless FragmentCache.cache_store.exist?(cache_key)

  FragmentCache.cache_store.read(cache_key).tap do |cached|
    return NIL_IN_CACHE if cached.nil? && FragmentCache.cache_store.exist?(cache_key)
  end
end

It seems to have addressed the race condition.
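
Replaying the earlier interleaving against the patched order of checks shows why the leading exist? helps (again a plain Hash stands in for the store):

```ruby
store = {} # key absent: the value was never cached

# The leading exist? check decides "miss" before any concurrent write lands
miss = !store.key?("post-1")

# A write arriving afterwards can no longer flip that decision: the reader
# already chose to re-resolve instead of reporting a cached nil
store["post-1"] = "serialized fragment"

miss # => true: the reader re-resolves, which is the correct outcome
```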

@RudiBoshoff @DmitryTsepelev I think we should reopen this.

@tsugitta

tsugitta commented Feb 7, 2023

@gsdean
Same here. Can you explain how the workaround works? I understand the problem derives from fetching the cache multiple times, but is it safe to fetch the cache multiple times in that way?

@gsdean
Contributor

gsdean commented Feb 7, 2023

The workaround isn't perfect. There is still a chance a race condition could cause a misread from the cache (specifically when an eviction occurs after the initial read, and then the cache is repopulated). Basically it reduces the risk of value_from_cache incorrectly returning nil from probable to less probable. I'll let you decide if that is safe or not.

For what it's worth, we've tested both with and without this patch in a high-load, high-concurrency production environment. Without this patch we see numerous intermittent errors where nil is returned incorrectly; with the patch we have yet to see an error.

DmitryTsepelev reopened this Feb 7, 2023
@DmitryTsepelev
Owner

Sorry guys, I don't know how, but GitHub does not send me events from some projects. I'll take a look tomorrow morning.

@DmitryTsepelev
Owner

If I understand correctly, the problem is that the race condition makes the cache return nil even if the value is not nil anymore. This is only possible when it was nil before the cache was rewritten. I think this behavior is somewhat acceptable, because the user does not get wrong data, just outdated data, and the tradeoff of a cache is fast data vs. fresh data.

I guess we could add it to the docs and call it a day, what do you think?

@gsdean
Contributor

gsdean commented Feb 8, 2023

The cache returns a wrong value, not an outdated one. I do not believe documentation alone is sufficient; in a moderately concurrent deployment, this is a serious issue.

@tsugitta

tsugitta commented Feb 8, 2023

If I understand correctly, the problem is that race condition makes cache to return nil even if it's not nil anymore.

In my situation, the problem seems to be that the race condition makes the cache return nil even though the value has never been nil.

In reality, there are only two possibilities: the cache key exists and holds a value, or the key does not exist. However, it behaves as if the key exists and the value is nil.

@DmitryTsepelev
Owner

The only way to synchronize threads here is using some kind of lock: we could make it configurable and let users opt in. It would fix the issue completely but would slow things down a bit. WDYT?

@DmitryTsepelev
Owner

Also, it sounds like we could change the behavior to store something other than nil in the cache. In that case we would have no problem differentiating the scenarios "cache contains nil" and "cache does not have this key".
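
A minimal sketch of that sentinel idea, assuming a Hash-like store; the names CACHED_NIL, write_fragment, and read_fragment are illustrative, not the gem's API:

```ruby
# Marker written in place of nil so a single read can tell
# "cached nil" apart from "no such key"
CACHED_NIL = "__fragment_cache_nil__"

def write_fragment(store, key, value)
  store[key] = value.nil? ? CACHED_NIL : value
end

def read_fragment(store, key)
  raw = store[key]
  return :miss if raw.nil?         # key absent: caller runs the resolver
  raw == CACHED_NIL ? nil : raw    # cached nil vs. a real value
end

store = {}
write_fragment(store, "post-1", nil)
read_fragment(store, "post-1") # => nil (cached nil, no re-resolution needed)
read_fragment(store, "post-2") # => :miss (caller runs the resolver)
```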

@gsdean
Contributor

gsdean commented Feb 8, 2023

My vote would be to either:

  1. Just take the hit on the extra read (as we've done in our workaround). This is a fairly pragmatic solution. Maybe start here and add an issue to revisit the extra read.
  2. Avoid caching nil by default (as you've suggested). Perhaps we could also add a parameter cache_nil: true to preserve compatibility and use the extra read from 1.

I would actively discourage locks. As you've said, they would be slow, but they are also very hard to manage in a multi-server deployment such as the one we have in production (a load balancer in front of multiple web servers).

@tsugitta

tsugitta commented Feb 8, 2023

I'm not familiar with the cache API or the instrumentation API, but it seems that the read method records whether or not there was a cache hit, so by using that, can't we call read only once, without calling exist? at all?

https://github.com/rails/rails/blob/v7.0.4.2/activesupport/lib/active_support/cache.rb#L379
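
That information can be mimicked with a self-contained stand-in: a read that reports both the value and whether the key was present, which is what the :hit key of the cache_read.active_support notification payload exposes. The read_with_hit helper below is illustrative, not Rails code:

```ruby
# One logical lookup answers both questions: was there a hit, and what
# was the value? A hit plus a nil value means nil was cached deliberately.
def read_with_hit(store, key)
  [store[key], store.key?(key)] # a real store reports the hit from one read
end

store = { "post-1" => nil } # nil cached on purpose

value, hit = read_with_hit(store, "post-1")
# hit == true, value == nil -> cached nil, don't re-resolve

value, hit = read_with_hit(store, "post-2")
# hit == false -> genuine miss, run the resolver
```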

@gsdean
Contributor

gsdean commented Feb 8, 2023

@tsugitta I agree, this is the most correct fix, but it seemed like a big change 🤷

@DmitryTsepelev
Owner

A big change is not an issue; we can make a major release. The problem I'm trying to solve is that read returns nil both when there's no such key at all (so we need to run the resolver method) and when there is a key holding nil (so we want to return nil without running the resolution again). We could also add an option to force re-resolution when there's nil for any reason (e.g., resolve_when_nil: true).

Do you want to create the PR for the workaround?

@tsugitta

tsugitta commented Feb 9, 2023

The problem I'm trying to solve is that read returns nil both when there's no such key at all (so we need to run the resolver method) and when there is a key holding nil (so we want to return nil without running the resolution again).

Yes, I'm also trying to solve the same problem. As I commented above, read seems to notify us via instrumentation whether there was a cache hit, so by looking at that, we can determine whether the cache doesn't exist or the cache exists and the value is nil.
In addition, for the case where nil is regarded as a valid cache value, the two requests (read and exist?) can be reduced to one.

Aside from that, the workaround seems great. I think it would be nice if I could submit a PR, but I haven't figured out the structure of the library yet, so I don't know where to start and it may take some time. So I don't think I'm the right person, at least for now.

@tsugitta

tsugitta commented Feb 10, 2023

I had overlooked that the version I was using was outdated. After upgrading this library from 1.9.1 to 1.18.1 (and Rails from 6.1.4.4 to 7.0.4), this problem, which used to happen dozens of times a day in our application that accesses the cache frequently, never happened again (it has not been reported to Sentry since).

@DmitryTsepelev
Owner

Interesting 🤔 I don't remember us addressing it here in the gem; maybe something in Rails changed for the better.

@gsdean
Contributor

gsdean commented Jun 3, 2023

As far as I can tell this is still an issue. Any thoughts on the PR? It would be really great to either merge that or leave this open until it’s resolved.
