Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Deprecate ContentCache.invalidateAll #10494

Merged
merged 3 commits into from
Oct 14, 2024
Merged

Conversation

findepi
Copy link
Member

@findepi findepi commented Jun 14, 2024

This method does only best-effort invalidation and is susceptible to a
race condition. If the caller changed the state that could be cached
(perhaps files on the storage) and calls this method, there is no
guarantee that the cache will not contain stale entries some time after
this method returns. This is a similar problem as the one described at
google/guava#1881. ContentCache doesn't use
a Guava Cache, it uses Caffeine. Caffeine offers partial solution to
this issue, but not for invalidateAll call. To avoid accidental
incorrect use of ContentCache, deprecate the invalidateAll method,
which can be deceptive for the caller and remove it later.

@github-actions github-actions bot added the core label Jun 14, 2024
@findepi
Copy link
Member Author

findepi commented Jun 14, 2024

BTW i did a lot of work on google/guava#1881 problem in Trino.
The result of this was the EvictableCache library in Trino https://github.com/trinodb/trino/blob/8b2c12e83c039025c2cbac28987d5b9829560538/lib/trino-cache/src/main/java/io/trino/cache/EvictableCache.java (which i also made available for people outside of Trino as https://github.com/findepi/evictable-cache)

@findepi findepi force-pushed the findepi/cc branch 2 times, most recently from c12502c to 1567b6e Compare June 14, 2024 17:14
@findepi findepi changed the title Remove ContentCache.invalidateAll Deprecate ContentCache.invalidateAll Jun 14, 2024
@findepi findepi requested review from rdblue and Fokko June 14, 2024 17:44
@findepi
Copy link
Member Author

findepi commented Jun 14, 2024

cc @rizaon @szehon-ho

@rizaon
Copy link
Contributor

rizaon commented Jun 14, 2024

Hi @findepi, thank you for looking into this. google/guava#1881 also came to my attention recently in an Impala code review.

I don't mind marking invalidateAll as deprecated and remove it later. Impala does not actively call this method and relies on time and byte based eviction.
https://github.com/apache/impala/blob/9f5fbcd841a87b144f3410891fe6bf1f8fe4288c/testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.py#L129-L134
Although, I think it is less of an issue in Iceberg since Iceberg manifest files are immutable?

Can you also review if ManifestFiles.dropCache is safe? Maybe we can document that dropping the whole ContentCache should achieve the same goal as ContentCache.invalidateAll.

/** Drop manifest file cache object for a FileIO if exists. */
public static synchronized void dropCache(FileIO fileIO) {
CONTENT_CACHES.invalidate(fileIO);
CONTENT_CACHES.cleanUp();
}

@findepi
Copy link
Member Author

findepi commented Jun 14, 2024

@rizaon thank you for your feedback!

Although, I think it is less of an issue in Iceberg since Iceberg manifest files are immutable?

Good point, I also thought about this. The ContentCache class is designed as generic purpose cache of file contents. Given immutability, we shouldn't need invalidation at all.

Can you also review if ManifestFiles.dropCache is safe?

Thanks for asking! I don't see a problem with that method

@findepi
Copy link
Member Author

findepi commented Jun 27, 2024

Thank you @rizaon @jbonofre @ajantha-bhat for your review!

@amogh-jahagirdar @jackye1995 can you please take a look?

@findepi
Copy link
Member Author

findepi commented Jul 16, 2024

@amogh-jahagirdar please take another look, thanks!

@findepi
Copy link
Member Author

findepi commented Jul 19, 2024

BTW Trino challenges around cache invalidation led me to believe this topic is complex and deserving a presentation. there was one at the Trino summit 2023 (slides).

This method does only best-effort invalidation and is susceptible to a
race condition. If the caller changed the state that could be cached
(perhaps files on the storage) and calls this method, there is no
guarantee that the cache will not contain stale entries some time after
this method returns. This is a similar problem as the one described at
google/guava#1881. `ContentCache` doesn't use
a Guava Cache, it uses Caffeine. Caffeine offers partial solution to
this issue, but not for `invalidateAll` call.  To avoid accidental
incorrect use of ContentCache, deprecate the `invalidateAll` method,
which can be deceptive for the caller and remove it later.
@findepi findepi requested review from nastra, RussellSpitzer and stevenzwu and removed request for amogh-jahagirdar October 6, 2024 20:03
@RussellSpitzer
Copy link
Member

@findepi This seems like a great find, is there anything we need to do right now though? I just want to make sure we aren't leaving open any issues by only marking it as deprecated?

@findepi
Copy link
Member Author

findepi commented Oct 14, 2024

thank you @RussellSpitzer for your review.
i think this one could deserve a fix: #10493

@findepi findepi changed the title Deprecate ContentCache.invalidateAll Core: Deprecate ContentCache.invalidateAll Oct 14, 2024
@findepi findepi merged commit 6fdc69a into apache:main Oct 14, 2024
49 checks passed
@findepi findepi deleted the findepi/cc branch October 14, 2024 19:57
public void invalidate(String key) {
cache.invalidate(key);
}

/**
* @deprecated since 1.6.0, will be removed in 1.7.0; This method does only best-effort
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this deprecation message still accurate? Since it says it was deprecated in 1.6.0 but actually it was deprecated with 1.7.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thanks! see #11325

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants