Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cache): add hazelcast distributed cache option #6645

Merged
merged 9 commits into from
Jan 19, 2023

Conversation

RyanHolstien
Copy link
Collaborator

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Dec 5, 2022
@github-actions
Copy link

github-actions bot commented Dec 5, 2022

Unit Test Results (build & test)

638 tests  +17   634 ✔️ +17   16m 0s ⏱️ -1s
165 suites +  8       4 💤 ±  0 
165 files   +  8       0 ±  0 

Results for commit 988a71b. ± Comparison against base commit fdcb731.

This pull request removes 4 and adds 21 tests. Note that renamed tests count towards both.
com.datahub.authorization.ranger.RangerAuthorizerTest ‑ testAuthorizationAllow
com.datahub.authorization.ranger.RangerAuthorizerTest ‑ testAuthorizationDeny
com.datahub.authorizer.plugin.ranger.TestDataHubRangerAuthPlugin ‑ testLookupResource
com.datahub.authorizer.plugin.ranger.TestDataHubRangerAuthPlugin ‑ testValidateConfig
com.datahub.plugins.auth.TestConfig ‑ testConfig
com.datahub.plugins.auth.TestConfigProvider ‑ testConfigurationLoading
com.datahub.plugins.auth.TestConfigValidationUtils ‑ testListShouldNotBeEmpty
com.datahub.plugins.auth.TestConfigValidationUtils ‑ testListShouldNotHaveDuplicate
com.datahub.plugins.auth.TestConfigValidationUtils ‑ testMapShouldNotBeEmpty
com.datahub.plugins.auth.TestConfigValidationUtils ‑ testWhiteSpacesValidation
com.datahub.plugins.auth.TestIsolatedClassLoader ‑ testAuthenticatorPlugin
com.datahub.plugins.auth.TestIsolatedClassLoader ‑ testAuthorizerPlugin
com.datahub.plugins.auth.TestIsolatedClassLoader ‑ testDuplicatePluginName
com.datahub.plugins.auth.TestIsolatedClassLoader ‑ testIncorrectImplementation
…

♻️ This comment has been updated with latest results.

import lombok.Data;


@Data
public class CachedEntityLineageResult {
private final EntityLineageResult entityLineageResult;
private final String entityLineageResult;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are serializing this result? interesting

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is essentially a DTO wrapper around the result to be able to include the timestamp (and anything else we want to add in later) onto the cached value so any RecordTemplate used within it has to be serialized

private int cacheMaxSize;

@Value("${searchService.cache.hazelcast.serviceName:hazelcast-service}")
private String hazelcastServiceName;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have hazelcast in the name of variables here?
isn't the idea that we can swap in differnt impls for the caffeine cache?
does this seem overly-specific?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see - so we are only accounting for 2 cases: caffeine and hazel. i think i get it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spring Cache Manager is intended for very very simple use cases when switching out cache implementations. Unfortunately our use cases are complex enough where it requires using the provider level interfaces so we would need to reimplement any other implementations we support. This is unchanged from how it previously was, but just exposes the implementation name in the config. It's possible that other providers would also require a headless K8s service for their distributed cache implementation, but unlikely (if we even do implement another underlying supported cache) so I think it's okay to be specific here.

@@ -82,6 +82,10 @@ graphService:
searchService:
resultBatchSize: ${SEARCH_SERVICE_BATCH_SIZE:100}
enableCache: ${SEARCH_SERVICE_ENABLE_CACHE:false}
cacheImplementation: ${SEARCH_SERVICE_CACHE_IMPLEMENTATION:caffeine}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor minor: in terms of config layout, what if we did this

cache:
    type: "hazelcase"
    config:
         hazelcast:
                 serviceName: .... same

this may read a bit easier. thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I had it at the upper level to be in line with the enableCache variable, but I think it's fine to start a new pattern here.

Timer.Context cacheAccess = MetricUtils.timer(this.getClass(), "getBatch_cache_access").time();
K cacheKey = cacheKeyGenerator.apply(batch);
String json = cache.get(cacheKey, String.class);
result = json != null ? toRecordTemplate(SearchResult.class, json) : null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hazelcast requires that the cached object is a string? Or serializable? Cannot record template be serialized by itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires that it be serializable, RecordTemplate does not implement Serializable and any RecordTemplate being used in the key or value was throwing errors. Looked into trying to inject a custom deserializer into the Hazelcast deserialization config, but this was much easier.

@@ -0,0 +1,64 @@
package com.linkedin.gms.factory.search;

import com.hazelcast.config.Config;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test

@aditya-radhakrishnan aditya-radhakrishnan added platform PR-s that make changes to core parts of the platform and removed product PR or Issue related to the DataHub UI/UX labels Dec 19, 2022
public class CachedEntityLineageResult {
private final EntityLineageResult entityLineageResult;
public class CachedEntityLineageResult implements Serializable {
private final byte[] entityLineageResult;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Ideally this detail would be hidden to consumer of the CachedEntityLineageResult. We should auto deserialize it on "getEntityLineageResult"


@Data
public class CachedEntityLineageResult implements Serializable {
private final byte[] entityLineageResult;
private final long timestamp;

public CachedEntityLineageResult(EntityLineageResult lineageResult, long timestamp) {
this.entityLineageResult = gzipCompress(toJsonString(lineageResult));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice encapsulation!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely better than having external consumers needing to know about this internal detail

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if it looks good to others.

@jjoyce0510 jjoyce0510 merged commit 8323128 into datahub-project:master Jan 19, 2023
@RyanHolstien RyanHolstien deleted the feat/distCache branch January 19, 2023 17:06
ericyomi pushed a commit to ericyomi/datahub that referenced this pull request Feb 8, 2023
…6645)

Co-authored-by: Aseem Bansal <asmbansal2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
platform PR-s that make changes to core parts of the platform
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants