
[SPARK-33870][CORE] Enable spark.storage.replication.proactive by default #30876

Closed
wants to merge 2 commits into master from dongjoon-hyun:SPARK-33870

Conversation

@dongjoon-hyun (Member) commented Dec 21, 2020

What changes were proposed in this pull request?

This PR aims to enable spark.storage.replication.proactive by default for Apache Spark 3.2.0.

Why are the changes needed?

spark.storage.replication.proactive was added by SPARK-15355 in Apache Spark 2.2.0 and has been helpful in environments where block manager loss occurs frequently, such as K8s.
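
For reference, here is a minimal sketch of how an application opts in explicitly before this change flips the default. The application name and code layout are illustrative only, not taken from this PR; the same flag can also be passed via --conf on spark-submit.

  import org.apache.spark.{SparkConf, SparkContext}

  object ProactiveReplicationOptIn {
    def main(args: Array[String]): Unit = {
      // Explicit opt-in; until this PR the default value of the flag was false.
      val conf = new SparkConf()
        .setAppName("proactive-replication-opt-in")
        .set("spark.storage.replication.proactive", "true")
      val sc = new SparkContext(conf)
      // ... application logic that caches replicated RDDs ...
      sc.stop()
    }
  }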

Does this PR introduce any user-facing change?

Yes, this will make Spark jobs more robust.

How was this patch tested?

Pass the existing UTs.

@SparkQA commented Dec 21, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37773/

@dongjoon-hyun (Member, Author)

cc @cloud-fan

@SparkQA commented Dec 21, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37773/

@SparkQA commented Dec 21, 2020

Test build #133174 has finished for PR 30876 at commit 7c54051.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

github-actions bot added the CORE label Dec 21, 2020
@dongjoon-hyun (Member, Author)

Could you review this, @HyukjinKwon ?

@@ -384,7 +384,7 @@ package object config {
       "get the replication level of the block to the initial number")
     .version("2.2.0")
     .booleanConf
-    .createWithDefault(false)
+    .createWithDefault(true)
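
For context, the surrounding definition in core/src/main/scala/org/apache/spark/internal/config/package.scala looks roughly like the sketch below; the doc string is reconstructed from the diff context and may not match the source verbatim.

  private[spark] val STORAGE_REPLICATION_PROACTIVE =
    ConfigBuilder("spark.storage.replication.proactive")
      .doc("Enables proactive block replication for RDD blocks. " +
        "Cached RDD block replicas lost due to executor failures are replenished " +
        "if there are any existing available replicas. This tries to " +
        "get the replication level of the block to the initial number")
      .version("2.2.0")
      .booleanConf
      .createWithDefault(true)  // this PR changes the previous default of false
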
@HyukjinKwon (Member)

Should we maybe just enable this by default when we're in Kubernetes? I am okay with enabling it by default too if other people are fine. cc @tgravescs and @Ngone51 too FYI

@dongjoon-hyun (Member, Author) commented Dec 22, 2020

For the other resource managers, this will be helpful too, because this is a kind of self-healing feature. And this code has been here for a long time.

@mridulm (Contributor)

I am not sure how widely this is used, particularly as it is not enabled by default.
Especially in context of dynamic resource allocation, it can become very chatty when executors start getting dropped.

Given this, I am not very keen on enabling it, at least for YARN. Thoughts @tgravescs ?

@dongjoon-hyun (Member, Author) commented Dec 22, 2020

Hi, @mridulm . Actually, we are using it now and it's a good time to test it by default, isn't it?

I am not sure how widely this is used, particularly as it is not enabled by default.

For the following, Apache Spark usually drops only empty executors. If you are referring to a storage timeout configuration, I believe what we need is to improve the storage timeout behavior after enabling this. I guess the storage timeout had better not cause any chatty situation, of course.

Especially in context of dynamic resource allocation, it can become very chatty

@Ngone51 (Member)

Thanks for the ping. I think I'm OK with the change. And shall we document the behaviour change in core-migration-guide.md?

@dongjoon-hyun (Member, Author)

Sure, I'll update the PR, @Ngone51 .

@mridulm (Contributor)

In the past, I found this to be noisy for the cases where replication was enabled - but this was a while back, and I would like to understand better what the 'cost' of enabling this for nontrivial use cases is for master: disabled by default means only developers who specifically test for it pay the price, not everyone.
It is quite common for an application to have references to a persisted RDD even after its use - with the loss of the RDD blocks having little to no functional impact.
This is similar to loss of blocks for an unreplicated persisted RDD - we do not proactively recompute the lost blocks; but do so on demand.

If the idea is we enable this for master, and evaluate the impact over the next 6 months and revisit at the end, I am fine with that: but an evaluation would need to be done before this goes out - else anyone using replicated storage will also get hit with the impact of proactive replication, and will need to disable this for their applications.

@mridulm (Contributor)

but an evaluation would need to be done before this goes out

or perhaps identify the subset of conditions where it makes sense to enable it by default.

@dongjoon-hyun (Member, Author)

Also, cc @mridulm .

@dongjoon-hyun (Member, Author) left a comment

Hi, @mridulm.
I'm trying to understand the risk you mentioned for the YARN environment.
Could you give me more hints about your concerns regarding the YARN dynamic allocation situation? We can fix the behavior and move forward if that's valid.

@dongjoon-hyun (Member, Author)

Hi, All. The core migration guide is updated.

github-actions bot added the DOCS label Dec 23, 2020
@SparkQA commented Dec 23, 2020

Test build #133259 has finished for PR 30876 at commit 28fb534.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA commented Dec 23, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37857/

@Ngone51 (Member) commented Dec 23, 2020

LGTM if tests pass.

@SparkQA commented Dec 23, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37857/

@dongjoon-hyun (Member, Author)

Thank you, @Ngone51 . The one failure is the following, and it is already fixed on the master branch by the very latest commit.

SQLQueryTestSuite.transform.sql
org.scalatest.exceptions.TestFailedException: transform.sql
Expected "...h status 127. Error:[ /bin/bash: some_non_existent_command: command not found]", but got "...h status 127. Error:[]" Result did not match for query #2
SELECT TRANSFORM(a)
USING 'some_non_existent_command' AS (a)
FROM t

@dongjoon-hyun (Member, Author)

Thank you all! Merged to master for Apache Spark 3.2.0.
If there is an issue, we can fix the behavior in the Apache Spark 3.2.0 timeframe.

dongjoon-hyun deleted the SPARK-33870 branch December 23, 2020 06:00
@mridulm (Contributor) commented Dec 23, 2020

It would be nice if we held off on merging when there is an ongoing discussion, unless there is an immediate need to push changes (like a hotfix).

@dongjoon-hyun (Member, Author) commented Dec 23, 2020

Oh, sorry, @mridulm . Do you want me to revert this?

@dongjoon-hyun (Member, Author)

If you are reluctant about enabling this, we can revert it and I'll make another PR, @mridulm .

@dongjoon-hyun (Member, Author)

Also, I hope we can address your concerns as new official JIRA issues instead of abandoning the existing Spark features. The features behind a false configuration are still accessible to the users. We had better test and improve them. WDYT, @mridulm ?

@mridulm (Contributor) commented Dec 23, 2020

There are use cases for which this is an invaluable feature - particularly for applications with aggressive DRA or a flaky cluster environment.
I am not sure if this has to be enabled by default though ... which is why the functionality as such needs to be there (for applications/deployments which need to leverage it), but we need to discuss whether we need to enable it by default.

@mridulm (Contributor) commented Dec 23, 2020

Oh, sorry, @mridulm . Do you want me to revert this?

Let us continue with the discussion - this is a trivial enough PR to revert if we decide to do so.

@dongjoon-hyun (Member, Author) commented Dec 23, 2020

Thank you, @mridulm . Sure!

In terms of resource management, the K8s environment is currently more aggressive than the other existing resource managers. For example, not only does competition between heterogeneous apps (Spark/Hive/other kinds of jobs) occur in the cluster, but the cluster size itself is also dynamically adjusted (e.g. EKS).

Your recent series of external shuffle service patches is helpful, of course. In addition to that, there are more helpful options like this one. For example, there is Worker Decommission and its spark.storage.decommission.rddBlocks.enabled. If it's enabled, BlockManagerDecommissioner's rddBlockMigrationRunnable tries to migrate all RDD blocks by using BlockManager.replicateBlock. It's logically similar to spark.storage.replication.proactive=true.

  private def migrateBlock(blockToReplicate: ReplicateBlock): Boolean = {
    val replicatedSuccessfully = bm.replicateBlock(
      blockToReplicate.blockId,
      blockToReplicate.replicas.toSet,
      blockToReplicate.maxReplicas,
      maxReplicationFailures = Some(maxReplicationFailuresForDecommission))
    if (replicatedSuccessfully) {
      logInfo(s"Block ${blockToReplicate.blockId} offloaded successfully, Removing block now")
      bm.removeBlock(blockToReplicate.blockId)
      logInfo(s"Block ${blockToReplicate.blockId} removed")
    } else {
      logWarning(s"Failed to offload block ${blockToReplicate.blockId}")
    }
    replicatedSuccessfully
  }

However, sometimes K8s doesn't wait long enough and kills executors during migration processing once its predefined grace period expires. At that point, spark.storage.replication.proactive becomes important again for recovery. spark.storage.replication.proactive is much less chatty than BlockManagerDecommissioner, which tries to migrate all storage blocks including shuffles.

This is my focused use case and I'd love to hear your concerns. You have all my ears. 😄
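
To make the trade-off discussed in this thread concrete, here is a simplified, illustrative model of the per-block decision when an executor is lost; it is not the actual Spark internals, and every name below is made up for clarity.

  // Toy model: with proactive replication, a block that still has a surviving
  // replica can be copied again; otherwise Spark falls back to lineage-based
  // recomputation on demand.
  case class CachedBlock(blockId: String, replicaLocations: Set[String])

  def onExecutorLost(
      lostExecutor: String,
      blocks: Seq[CachedBlock],
      proactive: Boolean): Seq[String] = {
    blocks.filter(_.replicaLocations.contains(lostExecutor)).map { block =>
      val survivors = block.replicaLocations - lostExecutor
      if (proactive && survivors.nonEmpty) {
        // A surviving replica exists (the block was cached with replication > 1),
        // so a copy can be re-created without recomputation.
        s"ask ${survivors.head} to re-replicate ${block.blockId}"
      } else {
        // No surviving replica: the block is recomputed from lineage when next needed.
        s"recompute ${block.blockId} on demand"
      }
    }
  }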

@mridulm (Contributor) commented Dec 24, 2020

Thanks for the details, that definitely sounds like a good rationale to enable it by default for k8s: for other resource managers, this does not necessarily apply.
One option would be to do this by default for k8s as @HyukjinKwon suggested, and expand later as we understand how executor decommissioning and replication work in other environments ?

@dongjoon-hyun (Member, Author) commented Dec 24, 2020

Thank you for your advice. I added my replies.

If the idea is we enable this for master, and evaluate the impact over the next 6 months and revisit at the end, I am fine with that

Yes, that's the idea.

but an evaluation would need to be done before this goes out or perhaps identify the subset of conditions where it makes sense to enable it by default.

Sure, I'll make sure to identify the desirable and problematic conditions and try to document them officially in the Apache Spark 3.2.0 timeframe. This feature was added in 2.2.0 and seems to need more community love due to the recent environment changes, especially in cloud environments.

BTW, for the following comments (@mridulm and @HyukjinKwon ),

One option would be to do this by default for k8s as @HyukjinKwon suggested
for other resource managers, this does not necessarily apply.

The following questions come to mind.

  1. The SSD disks of YARN/Mesos cluster nodes sometimes wear out due to Spark's heavy access pattern, so SSD disk failures can happen. I'm aware of this SSD disk issue and, AFAIK, LinkedIn's external shuffle service also tried to attack those kinds of disk issues.
  2. A YARN/Mesos cluster node can also go offline due to failures or regular maintenance.
  3. YARN/Mesos clusters without an external shuffle service exist. In particular, Mesos clusters frequently run without it.
  4. In YARN/Mesos clusters with an external shuffle service, the external shuffle service itself is also not invincible. The YARN service can sometimes crash, and Apache Spark has configurations to mitigate that.

This is a kind of self-healing feature. So, I believe it can help with (1) to (4).

For the following, it's unclear to me.

It is quite common for an application to have references to a persisted RDD even after its use - with the loss of the RDD blocks having little to no functional impact.
This is similar to loss of blocks for an unreplicated persisted RDD - we do not proactively recompute the lost blocks; but do so on demand.

Where is the persisted RDD located in this context? Is it in external storage like S3 or HDFS?

In this PR, although Apache Spark RDDs maintain their lineage, the number of replicas and proactive recovery are a matter of performance trade-off, not a functional impact. The loss of data partially re-triggers the previous stages and their ancestor stages, and I'm observing that these retries can hurt performance severely.

@dongjoon-hyun (Member, Author)

Merry Christmas and Happy New Year, @mridulm , @HyukjinKwon , @Ngone51 . 😄

@mridulm (Contributor) commented Dec 25, 2020

@dongjoon-hyun proactive replication only applies to persisted RDD blocks, not shuffle blocks - not sure if I am missing something here.

Even for persisted RDD blocks, it specifically applies when an RDD is persisted with storage levels where replication > 1 [1].
I view the loss of all replicas of an RDD blockId similarly - whether replication is 1 or higher.
Having said that, specifically for use cases where the Spark cluster might be the source of truth (or the cost of recomputation is prohibitive), applications can of course enable proactive replication via this flag.
I am not sure I am seeing a concrete reason to turn this on for all applications.

Please let me know if I am missing something in my understanding.

[1] ESS serving disk-backed blocks might have some corner cases in this flow which I have not thought through.
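
As a concrete illustration of that precondition, here is a minimal spark-shell style sketch; it assumes sc is an active SparkContext and the data is arbitrary. Only the first persist below is affected by proactive replication.

  import org.apache.spark.storage.StorageLevel

  // MEMORY_ONLY_2 keeps two replicas of each cached partition, so if one replica
  // is lost, proactive replication can replenish it from the surviving copy.
  val replicated = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY_2)

  // MEMORY_ONLY keeps a single replica; once it is lost there is nothing to copy
  // from, and the partition is recomputed from lineage on demand instead.
  val single = sc.parallelize(1 to 1000000).persist(StorageLevel.MEMORY_ONLY)

  replicated.count()
  single.count()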

@dongjoon-hyun (Member, Author) commented Dec 28, 2020

@mridulm . Your comments are true and I see nothing wrong with them. I agree with you on every bit, and we can disable this again for the problematic cases before the Apache Spark 3.2.0 vote.

To do that, I believe we can agree that we need to identify the problematic corner cases in this thread. At the very least, we need to provide better documentation to the community about this option if some PMC members already have implicit knowledge about the reasons why this option should be prohibited in a YARN environment. That is invaluable knowledge for the community to share. Besides, we are continuing this discussion because your initial concerns are crucial to the community. AFAIK, nobody else has shared these concerns explicitly before.

  1. Especially in context of dynamic resource allocation, it can become very chatty when executors start getting dropped.
  2. In the past, I found this to be noisy for the cases where replication was enabled.

Could you elaborate on your concerns more specifically?

  1. What is the negative side effect of being very chatty and noisy?
  2. How severe was it?

Again, I'm not aiming to protect the default value of the configuration. It's just a configuration and the decision is up to us (you and me and all the community members). It's easy to disable or abandon this, while it's difficult to improve it for Apache Spark. I'm trying to understand why this should be prohibited in some resource managers or in a normal Spark operation environment, and trying to make Apache Spark better for those cases. That's the reason why I tried to go deeper into that part by proposing the potential points and asking you similar questions repeatedly in this thread. We will have many choices in Apache Spark 3.2.0 if the implicit knowledge is shared more.

  1. For the following, Apache Spark usually drops only empty executors. If you are referring to a storage timeout configuration, I believe what we need is to improve the storage timeout behavior after enabling this. I guess the storage timeout had better not cause any chatty situation, of course.
  2. I'm trying to understand the risk you mentioned for the YARN environment. Could you give me more hints about your concerns regarding the YARN dynamic allocation situation? We can fix the behavior and move forward if that's valid.
  3. This is my focused use case and I'd love to hear your concerns. You have all my ears.

So far, I haven't gotten your answers explicitly. Please let me know if I missed something there.

@mridulm (Contributor) commented Dec 29, 2020

(Sigh, GitHub prematurely posted my previous comment - fleshing it out here).

As I mentioned above, the flag helps applications which are fine with paying the overhead of proactive replication. I have sketched some cases where proactive replication does not help, and others where it could be useful - these are examples of course: but in the end, it is specific to the application.

Making it default will impact all applications which have replication > 1: given this PR is proposing to make it the default, I would like to know if there was any motivating reason to make this change ?

If the cost of proactive replication is close to zero now (my experiments were from a while back), of course the discussion is moot - did we have any results for this ?
What is the ongoing cost when an application holds RDD references, but they are not in active use for the rest of the application (not all references can be cleared by GC) - resulting in replication of blocks for an RDD which is legitimately not going to be used again ?

Note that the above is orthogonal to DRA evicting an executor via storage timeout configuration.
That just exacerbates the problem : since a larger number of executors could be lost.

@mridulm (Contributor) commented Dec 29, 2020

Specifically for this use case, we don't need to make it a Spark default, right ?
If I understood right, the following conditions are being met:
a) Application is using RDDs with replication > 1 (pre-req)
b) Application is fine with the proactive replication cost, and is coded such that out-of-scope RDDs are ensured to be GC'ed.
c) k8s is (aggressively) configured such that executor decommissioning is unable to do a sufficiently good job.
d) recomputation cost is high enough to be offset by proactive replication (either due to latency SLA or high cost of computation).

If true, then sure, for those applications/cluster environments, making proactive replication an application default might make sense.
But this feels sufficiently narrow enough not to require a global default, right ? It feels more like a deployment/application default and not a platform level default ?

In the scenario above though, how do we handle everything else ?
Shuffle ? Replicated RDD where replication == 1 ?
Perhaps better tuning for (c) might help more holistically ?

@dongjoon-hyun (Member, Author) commented Dec 29, 2020

Thank you, @mridulm . I really appreciate your replies. Let me follow your thoughts.

  1. For this question, I answered at the beginning that this is a kind of self-healing feature here

Making it default will impact all applications which have replication > 1: given this PR is proposing to make it the default, I would like to know if there was any motivating reason to make this change ?

  2. For the following question, I asked for your evidence first because I'm not aware of any. :)

If the cost of proactive replication is close to zero now (my experiments were from a while back), of course the discussion is moot - did we have any results for this ?

  3. For the following question, it seems that you assume that the current Spark behavior is the best. I don't think this question justifies that the loss of data inside Spark is good.

What is the ongoing cost when an application holds RDD references, but they are not in active use for the rest of the application (not all references can be cleared by GC) - resulting in replication of blocks for an RDD which is legitimately not going to be used again ?

  4. For the following, yes, but 'exacerbates' doesn't look like a proper term here, because we had better make Spark smarter to handle those cases, as I replied here already.

Note that the above is orthogonal to DRA evicting an executor via storage timeout configuration. That just exacerbates the problem : since a larger number of executors could be lost.

  5. For the following, I didn't make this PR for that specific use case. I made this PR to improve this feature in various environments in the Apache Spark 3.2.0 timeframe here.

Specifically for this use case, we don't need to make it a Spark default, right ? ...

  6. For the following, I replied that the YARN environment can also suffer from disk loss or executor loss here, because you insisted that YARN doesn't need this feature from the beginning. I'm still not sure that the YARN environment is that invincible.

But this feels sufficiently narrow enough not to require a global default, right ? It feels more like a deployment/application default and not a platform level default ?

  7. For replication == 1, spark.storage.replication.proactive only tries to replicate when at least one live replica still exists; when the single replica is lost, there is nothing to copy from, so replication doesn't occur.

Shuffle ? Replicated RDD where replication == 1 ?

  8. I'm trying to utilize all the features of Apache Spark and I'm open to that too. We are developing this, and Spark is not a bible written in stone.

Perhaps better tuning for (c) might help more holistically ?

I know that this is the holiday season and I'm really grateful for your opinions. If you don't mind, can we have a Zoom meeting when you are available, @mridulm ? I think we have different ideas about open source development and about the scope of this work. I want to make progress in this area in Apache Spark 3.2.0 by completing a document, a better implementation, or anything more. Please let me know if you can have a Zoom meeting. Thanks!

@mridulm (Contributor) commented Dec 29, 2020

Before answering specific queries below, I want to set the context.
a) Enabling proactive replication could result in reduced recomputation cost when executors fail.
b) Enabling it will result in increased transfers when executor(s) are lost.
(Ignoring other minor impacts)

I was trying to understand what the impact would be, what the tradeoffs involved are, when we enable by default:

  1. Are the replication costs (b) lower now ? How do we estimate that cost ?
    (There was a non-trivial impact when I had last done some experiments earlier)

  2. Are we (the community) running into cases where we benefit from (a) but are not (very) negatively impacted by (b) ?
    Is there any commonality when this happens ?
    (application types/characteristics ? resource manager ? almost all usage ?)

  3. What is the impact to the application (and cluster) when we have nontrivial executor loss - executor release in DRA is one example of this, preemption is another.

  4. Anything else to watch out for ?

As I mentioned earlier, I am fine with collecting data by enabling this flag by default.
I am hoping this and other discussions will help us understand what questions to better evaluate before we release 3.2.

  1. For this question, I answered at the beginning that this is a kind of self-healing feature here

Making it default will impact all applications which have replication > 1: given this PR is proposing to make it the default, I would like to know if there was any motivating reason to make this change ?

Spark is self-healing via lineage :-)
Having said that, as mentioned above, I want to understand what the tradeoffs of enabling this flag are.

  2. For the following question, I asked for your evidence first because I'm not aware of any. :)

If the cost of proactive replication is close to zero now (my experiments were from a while back), of course the discussion is moot - did we have any results for this ?

I am not proposing to change the default behavior, you are ... hence my query :-)
As I mentioned above, when I had looked at this in the past, it was very helpful for some applications but not others: it depended on the application and its requirements - replication > 1 itself was not very commonly used then.

  3. For the following question, it seems that you assume that the current Spark behavior is the best. I don't think this question justifies that the loss of data inside Spark is good.

What is the ongoing cost when an application holds RDD references, but they are not in active use for the rest of the application (not all references can be cleared by GC) - resulting in replication of blocks for an RDD which is legitimately not going to be used again ?

A couple of points here:
a) There is no data loss - Spark recomputes when a lost block is required (but at some recomputation cost).
b) My query was specifically about the cost of replication - given that what I described is a common pattern in user applications: I was not saying this is a desired code pattern, but it is commonly observed behavior.

  4. For the following, yes, but 'exacerbates' doesn't look like a proper term here, because we had better make Spark smarter to handle those cases, as I replied here already.

Note that the above is orthogonal to DRA evicting an executor via storage timeout configuration. That just exacerbates the problem : since a larger number of executors could be lost.

If we can do better on this, I am definitely very keen on it !
Until that happens, we need to continue supporting existing scenarios where DRA impacts use of this flag.

  5. For the following, I didn't make this PR for that specific use case. I made this PR to improve this feature in various environments in the Apache Spark 3.2.0 timeframe here.

Specifically for this use case, we don't need to make it a Spark default, right ? ...

This was in response to the scenario described.
Let us decouple discussion of that scenario from our discussion here - and focus on what we need to evaluate for enabling this by default.

  6. For the following, I replied that the YARN environment can also suffer from disk loss or executor loss here, because you insisted that YARN doesn't need this feature from the beginning. I'm still not sure that the YARN environment is that invincible.

But this feels sufficiently narrow enough not to require a global default, right ? It feels more like a deployment/application default and not a platform level default ?

I am not sure where we got this from my comments ("because you insisted that YARN doesn't need this feature from the beginning. I'm still not sure that the YARN environment is that invincible") ? I clearly miscommunicated something here !

My comment on YARN was in agreement with @HyukjinKwon's suggestion. The other was in response to the specific k8s scenario you presented - "the K8s environment is currently more aggressive than the other existing resource managers".

  7. For replication == 1, spark.storage.replication.proactive only tries to replicate when at least one live replica still exists; when the single replica is lost, there is nothing to copy from, so replication doesn't occur.

Shuffle ? Replicated RDD where replication == 1 ?

  8. I'm trying to utilize all the features of Apache Spark and I'm open to that too. We are developing this, and Spark is not a bible written in stone.

Perhaps better tuning for (c) might help more holistically ?

I know that this is the holiday season and I'm really grateful for your opinions. If you don't mind, can we have a Zoom meeting when you are available, @mridulm ? I think we have different ideas about open source development and about the scope of this work. I want to make progress in this area in Apache Spark 3.2.0 by completing a document, a better implementation, or anything more. Please let me know if you can have a Zoom meeting. Thanks!

Sure !

@dongjoon-hyun (Member, Author)

Thanks for this and the email, @mridulm . I replied to your email with my Zoom link.

flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
[SPARK-33870][CORE] Enable spark.storage.replication.proactive by default


Closes apache#30876 from dongjoon-hyun/SPARK-33870.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 90d6f86)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>