Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30556][SQL][2.4] Copy sparkContext.localproperties to child thread inSubqueryExec.executionContext #27340

Closed
wants to merge 1 commit into from

Conversation

ajithme
Copy link
Contributor

@ajithme ajithme commented Jan 23, 2020

What changes were proposed in this pull request?

In org.apache.spark.sql.execution.SubqueryExec#relationFuture make a copy of org.apache.spark.SparkContext#localProperties and pass it to the sub-execution thread in org.apache.spark.sql.execution.SubqueryExec#executionContext

Why are the changes needed?

Local properties set via sparkContext are not available as TaskContext properties when executing jobs and threadpools have idle threads which are reused

Explanation:
When SubqueryExec, the relationFuture is evaluated via a separate thread. The threads inherit the localProperties from sparkContext as they are the child threads.
These threads are created in the executionContext (thread pools). Each Thread pool has a default keepAliveSeconds of 60 seconds for idle threads.
Scenarios where the thread pool has threads which are idle and reused for a subsequent new query, the thread local properties will not be inherited from spark context (thread properties are inherited only on thread creation) hence end up having old or no properties set. This will cause taskset properties to be missing when properties are transferred by child thread via sparkContext.runJob/submitJob

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UT

@ajithme
Copy link
Contributor Author

ajithme commented Jan 23, 2020

cc @dongjoon-hyun @cloud-fan

This is a backport of #27267 fix
I had to do some modification:

  • org.apache.spark.util.Utils#cloneProperties was missing hence added
  • UT to use scalar subquery to reproduce the issue

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Jan 23, 2020

Test build #117303 has finished for PR 27340 at commit ede4044.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-30556][SQL][branch2.4] Copy sparkContext.localproperties to child thread inSubqueryExec.executionContext [SPARK-30556][SQL][2.4] Copy sparkContext.localproperties to child thread inSubqueryExec.executionContext Jan 23, 2020
@dongjoon-hyun
Copy link
Member

All tests passed. The R failure is a known incoming feasibility test flakiness. cc @viirya

* checking CRAN incoming feasibility ...Error in .check_package_CRAN_incoming(pkgdir) : 
  dims [product 26] do not match the length of object [0]

@dongjoon-hyun
Copy link
Member

Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Jan 23, 2020
…read inSubqueryExec.executionContext

### What changes were proposed in this pull request?
In `org.apache.spark.sql.execution.SubqueryExec#relationFuture` make a copy of `org.apache.spark.SparkContext#localProperties` and pass it to the sub-execution thread in `org.apache.spark.sql.execution.SubqueryExec#executionContext`

### Why are the changes needed?
Local properties set via sparkContext are not available as TaskContext properties when executing  jobs and threadpools have idle threads which are reused

Explanation:
When `SubqueryExec`, the relationFuture is evaluated via a separate thread. The threads inherit the `localProperties` from `sparkContext` as they are the child threads.
These threads are created in the `executionContext` (thread pools). Each Thread pool has a default keepAliveSeconds of 60 seconds for idle threads.
Scenarios where the thread pool has threads which are idle and reused for a subsequent new query, the thread local properties will not be inherited from spark context (thread properties are inherited only on thread creation) hence end up having old or no properties set. This will cause taskset properties to be missing when properties are transferred by child thread via `sparkContext.runJob/submitJob`

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added UT

Closes #27340 from ajithme/subquerylocalprop2.

Authored-by: Ajith <ajith2489@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Copy link
Member

Thank you, @ajithme and @cloud-fan .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants