OAK-10494 - Add cache to reduce number of remote blobstore calls. #1155

Open · wants to merge 20 commits into trunk

Conversation

ahanikel (Contributor)

The cache size should be made configurable and I should probably add some tests, too, but we can still start the discussion.

ahanikel marked this pull request as ready for review on October 17, 2023, 08:30
ahanikel (Contributor, Author)

@amit-jain Could you have a look when you have time? Thanks a lot! The test is kinda stupid but I couldn't think of a better one...

amit-jain (Contributor)

@ahanikel the PR looks good but I am not sure if the invalidation also has to happen somewhere.
The tests are in CachingDataStoreTest for this class.

Overall, I think the call need not go to the backend in case the corresponding file is available in the cache.

ahanikel (Contributor, Author) commented Oct 18, 2023

@amit-jain I think the least recently used entries are automatically evicted from the cache when maxSize is reached, but I've still added an expiration of 15 minutes, so we don't hold on to memory unnecessarily.
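
For reference, a minimal sketch of a Guava cache configured along these lines (the field name recordCache follows the PR; the maximum size of 500 is a placeholder, since the size still has to be made configurable):

import java.util.concurrent.TimeUnit;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import org.apache.jackrabbit.core.data.DataRecord;

// Size-bounded cache that evicts the least recently used entries once
// maximumSize is reached, plus a 15-minute idle expiry to free memory.
Cache<String, DataRecord> recordCache = CacheBuilder.newBuilder()
        .maximumSize(500)                        // placeholder; to be made configurable
        .expireAfterAccess(15, TimeUnit.MINUTES) // drop entries idle for 15 minutes
        .build();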

I've added my test to the tests in CachingDataStoreTest. I've also tried to use Mockito but it does not seem to work in this case (modifying the class variable seems to be ignored somehow in the mocking process).

Overall, I think the call need not go to the backend in case the corresponding file is available in the cache.

Yes, but my understanding is that we try to avoid loading the blob if we don't need it, so if we only call backend.getRecord() the record never ends up in the existing cache, and getRecordIfStored() therefore always falls back to backend.getRecord(). If my understanding is wrong, then there must be something else at play, because the recordCache has a measurable impact on performance (the details are unfortunately on my internal git.corp account; the link is in GRANITE-47685).
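
For illustration, the lookup pattern being described is roughly the following (a hypothetical sketch, not the actual code on the branch; the real getRecordIfStored also consults the local file cache, and types come from org.apache.jackrabbit.core.data):

// Check the in-memory record cache first; only fall back to the remote
// backend on a miss, and remember the result for later calls.
public DataRecord getRecordIfStored(DataIdentifier identifier) throws DataStoreException {
    String id = identifier.toString();
    DataRecord cached = recordCache.getIfPresent(id);
    if (cached != null) {
        return cached;                                 // no remote blobstore call
    }
    DataRecord record = backend.getRecord(identifier); // remote call on cache miss
    if (record != null) {
        recordCache.put(id, record);
    }
    return record;
}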

@@ -638,4 +642,165 @@ private void waitFinish() {
e.printStackTrace();
}
}
private AbstractSharedCachingDataStore createDataStore() throws Exception {
amit-jain (Contributor)

Can we not use the existing dummy objects to construct the datastore? For the variables, we can create instance variables instead of static variables to make it easier to inject.

ahanikel (Contributor, Author)

@amit-jain I don't know what I was thinking when I used static variables for the recordCache configuration, sorry about that. I'm using the existing test datastore now, but I had to adapt it a bit to simulate a delay on (remote) backend requests. Could you have another look? Thank you very much!

LOG.trace("" + dataStore.getRecordIfStored(di)); // LOG.trace to avoid the call being optimised away
}
long timeCached = System.nanoTime() - start;

amit-jain (Contributor)

Ah! Now I understand why you are adding a delay. Assertions based on timing can be quite fragile. Can't we just assert that the cache object is not empty and that the record with the identifier is present?

ahanikel (Contributor, Author)

Yes, I wanted to "prove" the effectiveness of the cache. Although I only checked for a 5x improvement where the difference should be at least 100x for this test case, I agree it is still fragile and should be avoided.

I've changed the test to only check that the record is in the cache after the first access, and to ensure that it is loaded from the cache when accessed a second time. Is that what you meant?

amit-jain (Contributor) commented Nov 2, 2023

Yes. If you need to add test performance numbers, maybe you can also add them to oak-benchmarks.

assertNotNull("Record with id " + id + " should be in the recordCache now",
dataStore.recordCache.get().getIfPresent(id));
// make sure the record is loaded from the cache
backend.deleteRecord(di);
amit-jain (Contributor)

I think it's better to go through AbstractSharedCachingDataStore#deleteRecord. That way we can also assert that the record is not available in the cache. The backend's deleteRecord wouldn't be called directly, imo.

ahanikel (Contributor, Author)

@amit-jain The thing is that I'm deleting the record from the cache in AbstractSharedCachingDataStore#deleteRecord as well (https://github.com/ahanikel/jackrabbit-oak/blob/issues/OAK-10494/oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/AbstractSharedCachingDataStore.java#L331), so the following assert would fail.
The reason I'm deleting from the backend here is just to prove that the record is actually read from the cache.

ahanikel (Contributor, Author)

Ah, you meant to make sure the record is not read from the existing cache either? I've added an invalidate for that in the commit below.

amit-jain (Contributor)

What I meant was: if you call AbstractSharedCachingDataStore#deleteRecord, then invalidate is called implicitly, and the test then also covers that change.
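
In other words, something like the following shape (a sketch of the delete path only; the actual implementation is the one linked above on the PR branch and may differ in detail):

// Deleting through the datastore invalidates the record cache implicitly,
// so a test going through this method also covers the invalidation.
public void deleteRecord(DataIdentifier identifier) throws DataStoreException {
    recordCache.invalidate(identifier.toString()); // drop cached record metadata first
    backend.deleteRecord(identifier);              // then delete from the remote backend
}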

dataStore.cache.invalidate(id);
assertNull("Record with id " + id + " should not be in the backend anymore",
backend.getRecord(di));
assertNotNull("The record could not be loaded from the cache",
amit-jain (Contributor)

This has to be changed to assertNull, because deleteRecord (and with it the cache invalidation) would have been called.

ahanikel (Contributor, Author)

Ah, you want to make sure that the record is no longer in the cache after deletion, right? I'll add a check for that in the next commit. The check above is there to ensure that the record is loaded from the cache, without touching the backend, when accessed a second time. That's why I'm deleting it from the backend first; that way I can be sure it comes from the cache.
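
To make the intended flow explicit, a rough outline of the test being discussed (di is the record's DataIdentifier and id its string form, as in the snippets above; the exact setup lives in CachingDataStoreTest):

// 1. First access populates the record cache.
dataStore.getRecordIfStored(di);
assertNotNull(dataStore.recordCache.get().getIfPresent(id));
// 2. Delete from the backend only; a second access must still succeed,
//    which proves the record is served from the record cache.
backend.deleteRecord(di);
dataStore.cache.invalidate(id); // rule out the existing file cache as well
assertNotNull(dataStore.getRecordIfStored(di));
// 3. Delete through the datastore; the cache entry must be gone too.
dataStore.deleteRecord(di);
assertNull(dataStore.recordCache.get().getIfPresent(id));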
