Spark 3.5: Support executor cache locality #9563
Conversation
```java
private static boolean isPartitioned(PartitionScanTask task) {
  return task.partition() != null && task.partition().size() > 0;
}
```
How about checking `task.spec().isPartitioned()` instead? Otherwise, a `VoidTransform` in a V1 table is not handled by the code above.
That's what I started with, but `task.spec()` may be quite expensive:
```java
@Override
public PartitionSpec spec() {
  if (spec == null) {
    synchronized (this) {
      if (spec == null) {
        this.spec = PartitionSpecParser.fromJson(schema(), specString);
      }
    }
  }
  return spec;
}
```
Even though there is a cache in `PartitionSpecParser`, we need to do a lookup by the spec JSON. I don't think void transforms will be an issue, as we check for the presence of deletes and V1 tables can't have them.
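The `spec()` accessor quoted above lazily parses the spec on first access. A small, self-contained sketch of that double-checked locking pattern (class and method names here are illustrative, not Iceberg's; `trim()` stands in for the expensive JSON parse):

```java
// Illustrative sketch of lazy initialization with double-checked locking,
// as used by the spec() accessor discussed above.
public class LazyValue {
  private final String raw;
  private volatile String value; // volatile for safe publication across threads
  private int parseCount = 0;    // counts how many times the "parse" actually ran

  LazyValue(String raw) {
    this.raw = raw;
  }

  String get() {
    if (value == null) {          // fast path: no lock once initialized
      synchronized (this) {
        if (value == null) {      // second check under the lock
          parseCount++;
          value = raw.trim();     // stands in for PartitionSpecParser.fromJson(...)
        }
      }
    }
    return value;
  }

  int parseCount() {
    return parseCount;
  }

  public static void main(String[] args) {
    LazyValue v = new LazyValue("  {\"spec-id\": 0}  ");
    System.out.println(v.get());
    System.out.println(v.parseCount()); // 1: the parse ran exactly once
  }
}
```

Repeated calls pay only a volatile read, which is why the concern above is about the first parse, not steady-state access.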
> That's what I started with, but `task.spec()` may be quite expensive.
I think we already call `specs()` in `SparkPartitioningAwareScan` to preserve data grouping. If calling `task.spec()` is quite expensive, it would be better to reduce that overhead.

First thing that came to mind: how about adding an `int specId()` method to the `PartitionScanTask` interface? The spec ID should be easier to store and retrieve, and on the driver side we can leverage `table.specs()` to look up the actual spec.
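The suggestion above can be sketched as follows. This is a hypothetical, self-contained illustration: the `Spec` class and the `SPECS` map stand in for Iceberg's `PartitionSpec` and `table.specs()`, and tasks would only carry the cheap `int` ID:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tasks store only an int spec ID; the driver resolves
// the actual spec from a map keyed by ID (standing in for table.specs()).
public class SpecIdLookup {
  // Stand-in for PartitionSpec (the real type lives in org.apache.iceberg).
  static class Spec {
    final int id;
    final boolean partitioned;

    Spec(int id, boolean partitioned) {
      this.id = id;
      this.partitioned = partitioned;
    }
  }

  // Stand-in for table.specs(): spec ID -> spec, built once on the driver.
  static final Map<Integer, Spec> SPECS = new HashMap<>();
  static {
    SPECS.put(0, new Spec(0, false)); // unpartitioned spec
    SPECS.put(1, new Spec(1, true));  // partitioned spec
  }

  // A task only needs its int spec ID, not a parsed spec object.
  static boolean isPartitioned(int specId) {
    return SPECS.get(specId).partitioned;
  }

  public static void main(String[] args) {
    System.out.println(isPartitioned(0)); // false
    System.out.println(isPartitioned(1)); // true
  }
}
```

The appeal is that storing and comparing an `int` is trivially cheap, while spec parsing happens once per table on the driver.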
> I don't think void transforms will be an issue as we check for presence of deletes and V1 tables can't have them.
Maybe in some extreme case, such as: V1 table with `identity(c1)` -> drop the partition transform -> V1 table with a void transform -> upgrade to V2 -> V2 table with a void transform -> add deletes.
I think you are right. Let me profile that part and see how expensive it is.
It looks like the schema and spec caches are doing their jobs; I don't see much time spent there. I'll switch to using specs. Good call, @advancedxy!
I switched to proper hashes. I will add another JMH benchmark later, but it seems to perform alright.
```java
private static int hash(StructLike struct) {
```
Is it possible to reuse `org.apache.iceberg.types.JavaHash`(es) here?
It would require access to `task.spec()` to know the struct type, which I think is expensive to get.
This logic would cause issues if the underlying `StructLike` instances have different value representations for the same column. For instance, one struct has a `String` and the other has some other form of `CharSequence`. I am not sure how realistic that would be. Even if it happens, there would be no correctness issue, as such tasks would simply be assigned to different slots.
If there is a cheap way to leverage `JavaHash`es, I am all for it.
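The representation pitfall discussed above can be shown with a small, self-contained sketch. A `StringBuilder` inherits `Object`'s identity `hashCode`, so hashing the raw value and hashing a `String` with the same character content give different results; normalizing any `CharSequence` to `String` first avoids that. This is illustrative only, not Iceberg's actual hashing code:

```java
// Illustrative sketch: why raw hashCode() on mixed CharSequence
// representations is unstable, and a normalizing alternative.
public class CharSeqHash {
  // Naive hash: whatever hashCode() the runtime value happens to have.
  static int naiveHash(Object value) {
    return value == null ? 0 : value.hashCode();
  }

  // Normalized hash: convert any CharSequence to String before hashing.
  static int normalizedHash(Object value) {
    if (value == null) {
      return 0;
    }
    if (value instanceof CharSequence) {
      return value.toString().hashCode();
    }
    return value.hashCode();
  }

  public static void main(String[] args) {
    CharSequence asBuilder = new StringBuilder("2024-01-01");
    String asString = "2024-01-01";
    // StringBuilder inherits the identity hashCode, so the naive hashes
    // will (almost certainly) differ despite equal character content:
    System.out.println(naiveHash(asBuilder) == naiveHash(asString));
    System.out.println(normalizedHash(asBuilder) == normalizedHash(asString)); // true
  }
}
```

As noted above, a mismatch here is not a correctness bug for locality, only a missed cache-reuse opportunity, since the two tasks would just prefer different slots.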
```java
      .parse();
}

private boolean executorCacheLocalityEnabledInternal() {
```
I'm a little concerned that this doesn't play well with Spark's dynamic allocation, which is likely to be enabled in most production systems. Did you test how this would work with dynamic allocation enabled?
My original approach was to enable executor cache locality by default only if dynamic allocation is disabled. After thinking more about it, I decided to simply disable it by default no matter whether static or dynamic allocation is used. As of right now, folks have to opt in explicitly to enable executor cache locality. That way, we ensure no extra waits are added on our end, as we can't guarantee the locality would be beneficial.
> My original approach was to enable the executor cache locality by default only if dynamic allocation is disabled.

That was my first thought too. Then I realized: what if users want to enable this anyway? It should be up to the users to decide.

What about logging a warning when both dynamic allocation and executor cache locality are enabled?
My worry is that we don't really know if enabling this with dynamic allocation is going to hurt. For instance, it still may make sense if the min number of executors is big enough or if the cluster is hot. Given that we would also have to add logic to parse the dynamic allocation config, I'd probably not log it and trust the person setting this for now.
```java
sql("DELETE FROM %s WHERE id = 1", commitTarget());
sql("DELETE FROM %s WHERE id = 3", commitTarget());

assertEquals(
```
It seems this only checks record equality but doesn't verify the executor cache locality. Should we check the Spark RDD's `getPreferredLocations` instead?
I am afraid we can't really test this, as `SparkUtil$executorLocations` would return an empty list in our local testing env. This test simply ensures nothing breaks when run on the driver.
Ah, I see. Iceberg's Spark tests only support `local` mode. There is a `local-cluster` mode, but it requires extra setup. I think it's fine to leave it as is.
This looks reasonable to me, and low risk since it is disabled by default.
LGTM.
Thanks, @advancedxy @rdblue! I am going to test this with our RC on a cluster since I can't cover everything locally. I tested the initial prototype on a cluster and it worked well.
This PR adds the ability to enable executor cache locality. The new SQL property is off by default, as it does not make sense in all cases and may introduce unnecessary task waits. When enabled and deletes are present, this logic tries to co-locate tasks for one partition on the same executors to increase the probability of cache reuse.
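The co-location idea described above can be sketched in a few lines: hash a task's partition key and map it deterministically onto the list of known executor locations, so tasks for the same partition prefer the same executor (and therefore its delete-file cache). The helper and names below are hypothetical, not Iceberg's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of partition-hash -> executor-slot assignment for
// cache locality. Same partition hash always yields the same executor.
public class LocalitySketch {
  static String preferredExecutor(int partitionHash, List<String> executors) {
    if (executors.isEmpty()) {
      return null; // no locality preference, e.g. in a local test environment
    }
    // floorMod keeps the index non-negative even for negative hashes.
    int index = Math.floorMod(partitionHash, executors.size());
    return executors.get(index);
  }

  public static void main(String[] args) {
    List<String> executors = Arrays.asList("exec-1:7077", "exec-2:7077", "exec-3:7077");
    // Tasks sharing a partition hash map to the same executor every time.
    System.out.println(preferredExecutor(42, executors)); // exec-1:7077
    System.out.println(
        preferredExecutor(42, executors).equals(preferredExecutor(42, executors))); // true
  }
}
```

The empty-list branch mirrors the testing limitation discussed above: with no known executor locations, the scheduler falls back to having no preference, so nothing breaks on the driver.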