Skip to content

[#8710] feat(core): Support cache entities in relation operations.#8712

Merged
jerryshao merged 21 commits intoapache:mainfrom
yuqi1129:issue_8710
Nov 20, 2025
Merged

[#8710] feat(core): Support cache entities in relation operations.#8712
jerryshao merged 21 commits intoapache:mainfrom
yuqi1129:issue_8710

Conversation

@yuqi1129
Copy link
Contributor

What changes were proposed in this pull request?

Cache entities in relation operations.

Why are the changes needed?

It's a big improvement.

Fix: #8710

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

UTs and ITs.

@yuqi1129 yuqi1129 self-assigned this Sep 28, 2025
@yuqi1129
Copy link
Contributor Author

More effort should be taken to support get and list operations for relation operations at the same time, and the current change only supports list operations.

@yuqi1129 yuqi1129 marked this pull request as ready for review September 29, 2025 13:28
@yuqi1129
Copy link
Contributor Author

@Abyss-lord do you have time to take a look?

@Abyss-lord
Copy link
Contributor

@Abyss-lord do you have time to take a look?

sure, plz assign to me.

@yuqi1129 yuqi1129 requested a review from Abyss-lord October 9, 2025 01:30
@yuqi1129 yuqi1129 requested a review from roryqi October 9, 2025 12:57
*/
private void syncEntitiesToCache(EntityCacheRelationKey key, List<Entity> newEntities) {
if (key.relationType() != null) return;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why do you remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If if (key.relationType() != null) return; exists here, entity store cache will not cache entities with relation, that is why the cache does not support store relational operations.

Note: key.relationType() is not null when we are going to put relational entities into cache.

@jerryshao
Copy link
Contributor

The main thing is that caching the relations will significantly increase the memory usage, we should 1. make sure that policy/tags should be kicked out first if the threshold is reached; 2. something like statistics should not be cached, this will waste lots of memory.

@jerryshao
Copy link
Contributor

I don't see any UTs about this PR, can you add some tests about it?

@yuqi1129
Copy link
Contributor Author

The main thing is that caching the relations will significantly increase the memory usage, we should 1. make sure that policy/tags should be kicked out first if the threshold is reached; 2. something like statistics should not be cached, this will waste lots of memory.

  1. make sure that policy/tags should be kicked out first if the threshold is reached;

That makes sense. I will open an issue to track it.

  1. something like statistics should not be cached, this will waste lots of memory.

I have opened an issue aimed at it. Please see #8785

@yuqi1129
Copy link
Contributor Author

I don't see any UTs about this PR, can you add some tests about it?

done.

*/
public class ReverseIndexCache {
private RadixTree<EntityCacheKey> reverseIndex;
private RadixTree<List<EntityCacheKey>> reverseIndex;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change aims to fix #8817.

@yuqi1129 yuqi1129 requested a review from jerryshao October 20, 2025 06:49
@yuqi1129 yuqi1129 requested review from diqiu50 and mchades November 17, 2025 09:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements caching for entities in relation operations, providing a significant performance improvement by reducing backend database queries. The changes introduce cache support for tag-to-metadata-object and policy-to-metadata-object relations, with appropriate cache invalidation logic when entities are updated.

Key changes include:

  • Added cache support in RelationalEntityStore.getEntityByRelation() to retrieve and cache entity relations
  • Introduced new reverse index rules for PolicyEntity, TagEntity, and GenericEntity to track bidirectional relationships
  • Enhanced cache invalidation logic to clear both sides of relation mappings when entities are updated
  • Updated error message checks in TagManager and PolicyManager to match the new generic error format

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
core/src/test/java/org/apache/gravitino/storage/TestEntityStorage.java Added comprehensive test testTagRelationCache to verify caching behavior for tag relations including cache population and invalidation on entity updates
core/src/main/java/org/apache/gravitino/tag/TagManager.java Updated error message check from "No such tag entity" to "No such entity" for consistency with new error format
core/src/main/java/org/apache/gravitino/storage/relational/RelationalEntityStore.java Implemented cache-aware getEntityByRelation() method and added getCache() getter for testing
core/src/main/java/org/apache/gravitino/policy/PolicyManager.java Updated error message check from "No such policy entity" to "No such entity" for consistency
core/src/main/java/org/apache/gravitino/cache/ReverseIndexRules.java Added three new reverse index rules for policy, tag, and generic entity relationships
core/src/main/java/org/apache/gravitino/cache/ReverseIndexCache.java Registered new reverse index rules for PolicyEntity, TagEntity, and GenericEntity
core/src/main/java/org/apache/gravitino/cache/CaffeineEntityCache.java Enhanced cache invalidation to handle bidirectional relation clearing and modified empty list caching behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@yuqi1129
Copy link
Contributor Author

All have been resolved.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jerryshao
Copy link
Contributor

I'm OK with the current change. Please fix the above issue.

Besides, my feeling is that this code is quite complex and hard to understand by others. We'd better add more comments as possible as we can, so others can hand over this part.

@yuqi1129
Copy link
Contributor Author

my feeling is that this code is quite complex and hard to understand by others.

I think so, it introduces radix tree, segment lock, caffeine cache, and maintains both key/value data and relational data. I will update the comment in the next few improvements, it should be raised in the release 1.1.0.

@yuqi1129
Copy link
Contributor Author

All resolved.

@jerryshao jerryshao merged commit 8ddb174 into apache:main Nov 20, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Cache entities in relation operations

5 participants