OAK-11946: migrate cache APIs to Caffeine #2807

Open: rishabhdaim wants to merge 7 commits into trunk from OAK-11946

@rishabhdaim (Contributor) commented Mar 19, 2026

Summary

Migrates all Guava cache API usages to Caffeine (com.github.benmanes.caffeine) across the Oak codebase.

Changes

  • oak-core-spi: CacheLIRS, CacheStats, AbstractCacheStats, EmpiricalWeigher — replace Guava Cache/CacheBuilder/Weigher/CacheStats; bump package-info.java version
  • oak-store-document: DocumentNodeStore, DocumentNodeStoreBuilder, NodeDocumentCache, NodeCache, PersistentCache, CachingCommitValueResolver, LocalDiffCache, MemoryDiffCache, TieredDiffCache, NodeDocument, ForwardingListener, EvictionListener — replace Guava cache imports; adapt Callable → Function, reload() → asyncReload()
  • oak-segment-tar: SegmentCache, ReaderCache, RecordCache, WriterCacheManager, PriorityCache, CacheWeights, RecordCacheStats, SegmentCacheStats — replace Guava cache imports; add .executor(Runnable::run) and cleanUp() for synchronous eviction
  • oak-blob-plugins: FileCache, CompositeDataStoreCache, UploadStagingCache, CachingBlobStore, AbstractSharedCachingDataStore, DataStoreBlobStore — replace Guava Cache/CacheBuilder/CacheLoader/AbstractCache/Weigher/RemovalCause
  • oak-blob: BlobIdSet — replace CacheBuilder with Caffeine
  • oak-blob-cloud: S3Backend — replace CacheBuilder with Caffeine
  • oak-blob-cloud-azure: AzureBlobStoreBackend, AzureBlobStoreBackendV8 — replace CacheBuilder with Caffeine
  • oak-search: ExtractedTextCache — replace Guava Cache/CacheBuilder/Weigher with Caffeine
  • oak-search-elastic: ElasticIndexStatistics — replace CacheBuilder/CacheLoader/LoadingCache/Ticker; reload() → asyncReload() returning CompletableFuture; getUnchecked() → get()
  • oak-run-commons: DocumentNodeStoreHelper — replace Guava Cache import
  • oak-benchmarks: PersistentCacheTest — replace Guava Cache import
  • oak-core: CacheStatsMetricsTest — adapt to Caffeine CacheStats.of()
  • pom.xml (oak-blob, oak-blob-cloud, oak-blob-cloud-azure, oak-search, oak-search-elastic, oak-run-commons, oak-benchmarks) — add Caffeine dependency
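To make two of the adaptations listed above concrete, here is a minimal JDK-only sketch. It assumes that "Callable → Function" refers to moving from a Guava-style `get(key, Callable)` loader to a Caffeine-style `get(key, Function)` mapping function, and that the reload change means returning a `CompletableFuture` instead of a Guava future. The class `CacheMigrationAdapters` and both method names are hypothetical and do not come from the PR.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical helpers for illustration only; not code from this PR. */
final class CacheMigrationAdapters {

    private CacheMigrationAdapters() {
    }

    /**
     * Adapts a key-independent Callable loader to a Function, so it can be
     * passed where a mapping function is expected. Checked exceptions are
     * wrapped in CompletionException (an arbitrary choice for this sketch).
     */
    static <K, V> Function<K, V> asFunction(Callable<? extends V> loader) {
        return key -> {
            try {
                return loader.call();
            } catch (RuntimeException e) {
                throw e;
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        };
    }

    /**
     * Runs a synchronous reload computation on the given executor and
     * exposes the result as a CompletableFuture.
     */
    static <K, V> CompletableFuture<V> reloadAsync(
            K key,
            V oldValue,
            BiFunction<? super K, ? super V, ? extends V> reload,
            Executor executor) {
        return CompletableFuture.supplyAsync(() -> reload.apply(key, oldValue), executor);
    }
}
```

Passing `Runnable::run` as the executor runs the reload on the calling thread, which is the same same-thread executor idiom the oak-segment-tar item uses to get synchronous eviction.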

Test Plan

  • mvn test -pl oak-core-spi — CacheLIRS, CacheStats tests
  • mvn test -pl oak-store-document — NodeCache, DocumentNodeStore, persistent cache tests
  • mvn test -pl oak-segment-tar — SegmentCache, PriorityCache tests
  • mvn test -pl oak-blob-plugins — FileCache, CompositeDataStoreCache tests
  • mvn test -pl oak-search-elastic — ElasticIndexStatistics tests

Links

@reschke (Contributor) left a comment

  1. We should do this in smaller pieces; easier to review. Let's start with oak-store-document.
  2. If we just replace Guava with Caffeine, we swap one present problem for a potential future one. We need to get rid of Oak APIs that depend on an implementation we do not control. In practice this means that we need to define an Oak caching API and implement wrappers.

@rishabhdaim (Contributor, Author) commented

@reschke how would you suggest splitting this? We are only removing one Guava import, and everything else is linked to it.

@rishabhdaim rishabhdaim requested a review from reschke March 21, 2026 03:45
@sonarqubecloud commented

Quality Gate failed

Failed conditions
62.3% Coverage on New Code (required ≥ 80%)

@jsedding (Contributor) commented

I share the concern of @reschke. Do we really want all our modules (or many of them) to depend on caffeine? Do we know if caffeine follows semantic versioning of its OSGi exports?

Defining an Oak API and implementing it with a (thin) wrapper would compartmentalize this issue to a single module. I would suggest for this wrapper to live in oak-commons (or in a separate module). The implementing bundle could just embed caffeine, without exporting it. This would allow us to update or swap out the implementation without any impact on downstream users.
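A minimal sketch of what such a thin wrapper could look like. Everything here is an assumption made for illustration: the names `OakCache`, `LruOakCache`, and `OakCaches` are hypothetical, and a plain access-ordered `LinkedHashMap` stands in for the embedded cache library so that the example runs on the JDK alone. Callers would compile only against the interface, so the implementation could be replaced without touching them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical Oak-owned cache contract; names and methods are illustrative. */
interface OakCache<K, V> {
    V get(K key, Function<? super K, ? extends V> loader);

    V getIfPresent(K key);

    void put(K key, V value);

    void invalidate(K key);

    long estimatedSize();
}

/**
 * Stand-in implementation backed by an LRU LinkedHashMap. A real bundle
 * would delegate to the embedded cache library here instead.
 */
final class LruOakCache<K, V> implements OakCache<K, V> {
    private final Map<K, V> entries;

    LruOakCache(final int maxEntries) {
        // accessOrder=true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    @Override
    public synchronized V get(K key, Function<? super K, ? extends V> loader) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    @Override
    public synchronized V getIfPresent(K key) {
        return entries.get(key);
    }

    @Override
    public synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    @Override
    public synchronized void invalidate(K key) {
        entries.remove(key);
    }

    @Override
    public synchronized long estimatedSize() {
        return entries.size();
    }
}

/** Hypothetical factory: the only place that names a concrete implementation. */
final class OakCaches {
    private OakCaches() {
    }

    static <K, V> OakCache<K, V> lru(int maxEntries) {
        return new LruOakCache<>(maxEntries);
    }
}
```

With this shape, only the factory and the implementing class would need to change if the backing library were updated or swapped.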

@rishabhdaim (Contributor, Author) commented Mar 23, 2026

> I share the concern of @reschke. Do we really want all our modules (or many of them) to depend on caffeine? Do we know if caffeine follows semantic versioning of its OSGi exports?
>
> Defining an Oak API and implementing it with a (thin) wrapper would compartmentalize this issue to a single module. I would suggest for this wrapper to live in oak-commons (or in a separate module). The implementing bundle could just embed caffeine, without exporting it. This would allow us to update or swap out the implementation without any impact on downstream users.

@jsedding We discussed this topic in https://issues.apache.org/jira/browse/OAK-11791 and, IIUC, we decided to move to Caffeine.

Also, all the packages exporting these types are internal, so it won't be a problem if we want to switch again.

That said, I am open to discussing Oak APIs as well; I just don't see a solid reason for them apart from the internally exported packages.

cc @thomasmueller @reschke

@jsedding (Contributor) commented Mar 23, 2026

In the past, guava has been problematic, because it needed to be installed alongside the oak bundles in the same OSGi container. Applications built on top of Oak would then start using guava as well.

This caused two points of contention, a bit of a catch-22:

  1. the application wanted to use a newer guava version but couldn't update due to Oak and
  2. we wanted to update the guava version in Oak, but couldn't due to applications depending on the lower guava version.

Matters were made worse in guava's case due to the fact that guava does not follow semantic versioning.

IIUC, the whole exercise of removing guava is done in order to get rid of this sort of problem. Switching caches to caffeine, imported as a dependency in OSGi, does not IMHO solve the problem.

Defining a custom caching API (which could be pretty much a copy of the caffeine API, or whatever) creates a level of decoupling. Embedding the implementation in the bundle implementing this API maximizes the level of decoupling and, for me, is optional, as it could be done at a later time if needed.

BTW, the ticket you linked is closed "Won't do" and concerns itself only with oak-documentstore IIUC.

BTW 2: I am not objecting to caffeine as the implementation, it looks like a solid choice 🙂 I only object to splashing its API all over Oak.

@reschke (Contributor) commented Mar 23, 2026

Yup.


3 participants