Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-22340][PYTHON][FOLLOW-UP] Add a better message and improve documentation for pinned thread mode #26588

Closed
wants to merge 3 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Nov 19, 2019

What changes were proposed in this pull request?

This PR proposes to show different warning message when the pinned thread mode is enabled:

When enabled:

PYSPARK_PIN_THREAD feature is enabled. However, note that it cannot inherit the local properties from the parent thread although it isolates each thread on PVM and JVM with its own local properties.
To work around this, you should manually copy and set the local properties from the parent thread to the child thread when you create another thread.

When disabled:

Currently, 'setLocalProperty' (set to local properties) with multiple threads does not properly work.
Internally threads on PVM and JVM are not synced, and JVM thread can be reused for multiple threads on PVM, which fails to isolate local properties for each thread on PVM.
To work around this, you can set PYSPARK_PIN_THREAD to true (see SPARK-22340). However, note that it cannot inherit the local properties from the parent thread although it isolates each thread on PVM and JVM with its own local properties.
To work around this, you should manually copy and set the local properties from the parent thread to the child thread when you create another thread.

Why are the changes needed?

Currently, it shows the same warning message regardless of PYSPARK_PIN_THREAD being set. In the warning message it says "you can set PYSPARK_PIN_THREAD to true ..." which is confusing.

Does this PR introduce any user-facing change?

Documentation and warning message as shown above.

How was this patch tested?

Manually tested.

$ PYSPARK_PIN_THREAD=true ./bin/pyspark
sc.setJobGroup("a", "b")
.../pyspark/util.py:141: UserWarning: PYSPARK_PIN_THREAD feature is enabled. However, note that it cannot inherit the local properties from the parent thread although it isolates each thread on PVM and JVM with its own local properties.
To work around this, you should manually copy and set the local properties from the parent thread to the child thread when you create another thread.
  warnings.warn(msg, UserWarning)
$ ./bin/pyspark
sc.setJobGroup("a", "b")
.../pyspark/util.py:141: UserWarning: Currently, 'setJobGroup' (set to local properties) with multiple threads does not properly work.
Internally threads on PVM and JVM are not synced, and JVM thread can be reused for multiple threads on PVM, which fails to isolate local properties for each thread on PVM.
To work around this, you can set PYSPARK_PIN_THREAD to true (see SPARK-22340). However, note that it cannot inherit the local properties from the parent thread although it isolates each thread on PVM and JVM with its own local properties.
To work around this, you should manually copy and set the local properties from the parent thread to the child thread when you create another thread.
  warnings.warn(msg, UserWarning)

@HyukjinKwon
Copy link
Member Author

cc @squito and @WeichenXu123

@HyukjinKwon HyukjinKwon changed the title SPARK-22340][PYTHON][FOLLOW-UP] Add a better message and improve documentation for pinned thread mode [SPARK-22340][PYTHON][FOLLOW-UP] Add a better message and improve documentation for pinned thread mode Nov 19, 2019
@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114044 has finished for PR 26588 at commit cfc1277.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • Workaround class can be written as below (unofficial way):
  • >>> class CustomThread(threading.Thread):
  • Workaround class can be written as below (unofficial way):
  • >>> class CustomThread(threading.Thread):
  • Workaround class can be written as below (unofficial way):
  • >>> class CustomThread(threading.Thread):

@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114050 has finished for PR 26588 at commit f0b1cc3.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 19, 2019

Test build #114054 has finished for PR 26588 at commit f0b1cc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Copy link
Contributor

squito commented Nov 20, 2019

I like improving the error msg.

I'm not so sure about including the extra msg explaining the custom thread code, I have a feeling its more confusing than useful. I feel like users often won't even be creating threads themselves -- eg. they're running an http server which has a thread pool in it somewhere, but the developer doesn't even see it. It seems much more straightforward to just reset the relevant properties in each thread.

Maybe I'm not properly understanding how folks use the inheritable properties?

@HyukjinKwon
Copy link
Member Author

Hm, right. I just realised that the use cases can be different. In our case, it needs to launch multiple jobs in multiple threads with each group id. Later, in the main thread, it cancels the jobs by the job group id.

Let me just remove the workaround codes. I think I agree that it might more cause confusions.

@SparkQA
Copy link

SparkQA commented Nov 21, 2019

Test build #114182 has finished for PR 26588 at commit 5a4c142.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Let me merge this one as is. I think it's now pretty conservative.

@squito
Copy link
Contributor

squito commented Nov 21, 2019

👍

@HyukjinKwon HyukjinKwon deleted the SPARK-22340 branch March 3, 2020 01:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants