
Conversation

@PatrickRen
Contributor

What is the purpose of the change

This pull request adds partial caching functionality to the sync and async lookup runners, as part of FLIP-221.

Brief change log

  • Support partial caching for the sync lookup table
  • Port TestValuesLookupFunction onto the new LookupFunction API and add tests for sync lookup
  • Support partial caching for the async lookup table
  • Port AsyncTestValuesLookupFunction onto the new AsyncLookupFunction API and add tests for async lookup

Verifying this change

This change modifies some existing tests, including LookupJoinHarnessTest and LookupJoinITCase.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Aug 3, 2022

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@wuchong
Member

Thanks for the contribution, @PatrickRen.

My biggest concern with the implementation is that the partial cache invades the LookupJoinRunners (4 classes). The cache implementation could instead be a simple wrapper around the LookupFunction, such as a CachedLookupFunction. This would cleanly separate the caching (partial or full), the fetcher, and the join runner logic, which is very important for long-term maintenance. I can imagine the lookup runners becoming much more complex, with many if/else branches, once we introduce full caching.

What do you think about refactoring this into a simple PartialCachedLookupFunction extends LookupFunction that wraps the user's LookupFunction and the LookupCache? In the future we could support the full cache in the same way, where adding a new class is all that is needed.
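
For illustration, a minimal sketch of that wrapper idea, assuming the FLIP-221 LookupFunction#lookup(RowData) and LookupCache#getIfPresent/put signatures; the class name and the delegation details are placeholders rather than the final implementation:

```java
// Hypothetical sketch only: names and details are assumptions, not the final code.
import org.apache.flink.table.connector.source.lookup.cache.LookupCache;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.functions.FunctionContext;
import org.apache.flink.table.functions.LookupFunction;

import java.io.IOException;
import java.util.Collection;

public class CachedLookupFunction extends LookupFunction {

    private final LookupFunction delegate; // the user's fetcher
    private final LookupCache cache;       // the partial cache

    public CachedLookupFunction(LookupFunction delegate, LookupCache cache) {
        this.delegate = delegate;
        this.cache = cache;
    }

    @Override
    public void open(FunctionContext context) throws Exception {
        delegate.open(context);
        // The real implementation would also open the cache here
        // (LookupCache#open takes a cache metric group).
    }

    @Override
    public Collection<RowData> lookup(RowData keyRow) throws IOException {
        Collection<RowData> cached = cache.getIfPresent(keyRow);
        if (cached != null) {
            return cached; // cache hit: no call to the external system
        }
        Collection<RowData> fetched = delegate.lookup(keyRow);
        cache.put(keyRow, fetched);
        return fetched;
    }

    @Override
    public void close() throws Exception {
        delegate.close();
        cache.close();
    }
}
```

With such a wrapper, the join runners only see a LookupFunction and stay unaware of whether (and which kind of) caching is applied.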

* table for which it is serving.
*/
@Internal
public class LookupCacheManager {

It seems this is only used internally (in flink-table-runtime), so why put it in a common module?

Comment on lines +59 to +67
public synchronized LookupCache registerCache(String tableIdentifier, LookupCache cache) {
    checkNotNull(cache, "Could not register null cache in the manager");
    if (cachesByTableIdentifier.containsKey(tableIdentifier)) {
        return cachesByTableIdentifier.get(tableIdentifier);
    } else {
        cachesByTableIdentifier.put(tableIdentifier, cache);
        return cache;
    }
}

I have several concerns about this method:

  1. The same dimension table can be used multiple times in a SQL query with different table options (e.g. cache TTL). I think we shouldn't reuse the cache if the configuration differs. That means we may need to introduce a LookupCache#getIdentifier() method to obtain an identifier for a specific cache.
  2. tableIdentifier is not enough to identify a cache, because in session mode different jobs may use the same table identifier to refer to different external tables. Maybe we should add the JobID as part of the cache id.
  3. Therefore, the final registered key should be a composite of the JobID, the table identifier, and the cache identifier.
  4. From the method signature, the cache parameter looks like the one that gets registered, but it may not be. Maybe registerCacheIfAbsent would be a better name (similar to Map#putIfAbsent); see the sketch after this list.
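
Pulling points 1-4 together, a rough sketch of what the registration could look like; the composite-key format, the cacheIdentifier argument, and registerCacheIfAbsent are only illustrations of the suggestion, not existing API:

```java
// Hypothetical sketch; names are placeholders for the reviewer's suggestion.
import org.apache.flink.api.common.JobID;
import org.apache.flink.table.connector.source.lookup.cache.LookupCache;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class LookupCacheManagerSketch {

    private final Map<String, LookupCache> cachesByKey = new ConcurrentHashMap<>();

    /** Registers the cache if absent and returns whichever instance ends up registered. */
    public LookupCache registerCacheIfAbsent(
            JobID jobId, String tableIdentifier, String cacheIdentifier, LookupCache cache) {
        Objects.requireNonNull(cache, "Could not register null cache in the manager");
        // Composite key: JobID + table identifier + cache identifier, so that different
        // jobs in session mode and differently configured caches never share an instance.
        String key = jobId + "|" + tableIdentifier + "|" + cacheIdentifier;
        return cachesByKey.computeIfAbsent(key, k -> cache);
    }
}
```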

if (provider instanceof PartialCachingLookupProvider) {
    LookupCache cache = ((PartialCachingLookupProvider) provider).getCache();
    if (cache == null) {
        return Optional.empty();

Shall we allow a null cache for PartialCachingLookupProvider?

  1. Users should use LookupFunctionProvider if no cache is needed (see the sketch below).
  2. PartialCachingLookupProvider#getCache() is not annotated @Nullable.
  3. I think we also don't allow a null getScanRuntimeProvider or getCacheReloadTrigger for FullCachingLookupProvider.
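
As a sketch of the alternative, the connector would pick the provider up front instead of handing a nullable cache to the planner; the provider names assume the FLIP-221 interfaces, and chooseProvider is a hypothetical helper for illustration:

```java
// Hedged sketch: provider names assume the FLIP-221 interfaces; the helper is illustrative.
import org.apache.flink.table.connector.source.LookupTableSource;
import org.apache.flink.table.connector.source.lookup.LookupFunctionProvider;
import org.apache.flink.table.connector.source.lookup.PartialCachingLookupProvider;
import org.apache.flink.table.connector.source.lookup.cache.LookupCache;
import org.apache.flink.table.functions.LookupFunction;

import javax.annotation.Nullable;

public final class ProviderChoiceSketch {

    static LookupTableSource.LookupRuntimeProvider chooseProvider(
            LookupFunction lookupFunction, @Nullable LookupCache cache) {
        if (cache == null) {
            // No cache configured: return the plain provider rather than a
            // caching provider carrying a null cache.
            return LookupFunctionProvider.of(lookupFunction);
        }
        return PartialCachingLookupProvider.of(lookupFunction, cache);
    }
}
```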

Comment on lines +283 to +288
LookupJoinCodeGenerator.generateLeftTableKeyProjection(
tableConfig,
classLoader,
leftTableRowType,
keyRowType,
lookupKeys),

Please use org.apache.flink.table.planner.plan.utils.KeySelectorUtil#getRowDataSelector to create the key selector/projection, which guarantees it always generates BinaryRowData for consistent hashCode() and equals() behavior.

leftTableRowType,
keyRowType,
lookupKeys),
new RowDataSerializer(keyRowType),

cacheKeySerializer is not needed once we use the key selector above.

@Nullable private final RowDataSerializer cacheKeySerializer;
@Nullable private final RowDataSerializer cacheValueSerializer;

private LookupCache cache;

transient?

Comment on lines +105 to +111
if (cacheKeySerializer != null && cacheValueSerializer != null) {
    RowData copiesKey = cacheKeySerializer.copy(key);
    Collection<RowData> copiedValues =
            value.stream()
                    .map(cacheValueSerializer::copy)
                    .collect(Collectors.toCollection(ArrayList::new));
    getCache().put(copiesKey, copiedValues);

Why copy?

Comment on lines +125 to +129
* <p>- Input from left table (id, name): +I(1, Alice)
*
* <p>- Value return by user's fetcher (id, age, gender): +I(1, 18, female)
*
* <p>Then the entry stored in the cache would be: +I(1), +I(1, 18, female), even calculation

The change flag can be omitted when explaining the cache strategy because the cache doesn't store any changelogs and all records should be insert-only.

+I(1), +I(1, 18, female) ==> key=(1), value=(1, 18, female)

HEAP_BACKEND,
ENABLE_OBJECT_REUSE,
AsyncOutputMode.ALLOW_UNORDERED,
DISABLE_CACHE),

I don't think we need to duplicate the cases for dynamic table sources; mixing the cache cases into the existing 4 dynamic-table-source cases should be enough.

}
}
if (cacheHandler != null) {
cacheHandler.getCache().close();

We should unregister the cache here as well; otherwise it leaks memory. That means LookupCacheManager may need to track a reference count. See the sketch below.
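
One way this could look, extending the registration idea above with a reference count so the cache is only closed and removed when its last user unregisters; RefCountedCache and unregisterCache are hypothetical names for illustration:

```java
// Hypothetical sketch of reference-counted registration/unregistration.
import org.apache.flink.table.connector.source.lookup.cache.LookupCache;

import java.util.HashMap;
import java.util.Map;

public class RefCountingCacheManagerSketch {

    private static final class RefCountedCache {
        final LookupCache cache;
        int refCount;

        RefCountedCache(LookupCache cache) {
            this.cache = cache;
        }
    }

    private final Map<String, RefCountedCache> cachesByKey = new HashMap<>();

    public synchronized LookupCache registerCacheIfAbsent(String key, LookupCache cache) {
        RefCountedCache entry = cachesByKey.computeIfAbsent(key, k -> new RefCountedCache(cache));
        entry.refCount++;
        return entry.cache;
    }

    /** Decrements the reference count; closes and removes the cache once it has no users. */
    public synchronized void unregisterCache(String key) throws Exception {
        RefCountedCache entry = cachesByKey.get(key);
        if (entry != null && --entry.refCount == 0) {
            entry.cache.close();
            cachesByKey.remove(key);
        }
    }
}
```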

@PatrickRen
Contributor Author

Thanks for the review, @wuchong! I made a refactoring based on your comments and opened a new PR, #20480. I will close this one for now.

@PatrickRen closed this on Aug 7, 2022