[SPARK-38073][PYTHON] Update atexit function to avoid issues with late binding #35396
Closed
HyukjinKwon
approved these changes
Feb 4, 2022
dongjoon-hyun
approved these changes
Feb 5, 2022
+1, LGTM. I also verified it in Python 3.10.2. Merged to master/3.2.
BEFORE
$ PYSPARK_DRIVER_PYTHON=$(which ipython3) bin/pyspark
Python 3.10.2 (main, Feb 1 2022, 14:05:50) [Clang 13.0.0 (clang-1300.0.29.30)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/04 20:17:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.3.0-SNAPSHOT
/_/
Using Python version 3.10.2 (main, Feb 1 2022 14:05:50)
Spark context Web UI available at http://172.16.0.51:4040
Spark context available as 'sc' (master = local[*], app id = local-1644034626009).
SparkSession available as 'spark'.
In [1]:
Do you really want to exit ([y]/n)? y
Exception ignored in atexit callback: <function <lambda> at 0x11f149510>
Traceback (most recent call last):
File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/shell.py", line 49, in <lambda>
atexit.register(lambda: sc.stop())
NameError: name 'sc' is not defined
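The traceback above is Python's late-binding name lookup at work: a lambda resolves a free name like `sc` when it is *called*, not when it is defined, so if the shell's namespace is torn down before `atexit` fires, the lookup fails. A minimal sketch of the mechanism (not Spark-specific; the string here is a hypothetical stand-in for the real `sc`):

```python
# Late binding: the lambda looks up the global 'sc' only when invoked,
# so deleting the name before the call reproduces the NameError above.
sc = "live context"  # hypothetical stand-in for the real SparkContext

late = lambda: sc.upper()                      # resolves 'sc' at call time
early = (lambda ctx: lambda: ctx.upper())(sc)  # captures the value at definition

del sc  # e.g. ipython tearing down its namespace before atexit callbacks run

print(early())  # works: the context was captured when 'early' was built
try:
    late()
except NameError as e:
    print(f"NameError: {e}")  # mirrors the failure in the log above
```

Capturing the context explicitly, as the early-binding variant does, is the shape of the fix this PR applies.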
AFTER
$ PYSPARK_DRIVER_PYTHON=$(which ipython3) bin/pyspark
Python 3.10.2 (main, Feb 1 2022, 14:05:50) [Clang 13.0.0 (clang-1300.0.29.30)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/04 20:18:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.3.0-SNAPSHOT
/_/
Using Python version 3.10.2 (main, Feb 1 2022 14:05:50)
Spark context Web UI available at http://172.16.0.51:4040
Spark context available as 'sc' (master = local[*], app id = local-1644034701684).
SparkSession available as 'spark'.
In [1]:
Do you really want to exit ([y]/n)? y
$
dongjoon-hyun
pushed a commit
that referenced
this pull request
Feb 5, 2022
…e binding

### What changes were proposed in this pull request?

This PR updates the function registered in the PySpark shell `atexit` to capture `SparkContext` instead of depending on the surrounding context.

**Note**: a simpler approach

```python
atexit.register(sc.stop)
```

is possible, but won't work properly for contexts with monkey-patched `stop` methods (for example, [pyspark-asyncactions](https://github.com/zero323/pyspark-asyncactions)).

I also considered using `_active_spark_context`:

```python
atexit.register(lambda: (
    SparkContext._active_spark_context.stop()
    if SparkContext._active_spark_context
    else None
))
```

but `SparkContext` is also out of scope, so that doesn't work without introducing a standard function within the scope.

### Why are the changes needed?

When using `ipython` as a driver with Python 3.8, `sc` goes out of scope before the `atexit` function is called. This leads to a `NameError` on exit. This is a mild annoyance and likely a bug in ipython (there are quite a few of these with similar behavior), but it is easy to address on our side, without causing regressions for users of earlier Python versions.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing to confirm that:

- `NameError` is no longer thrown on exit with ipython and Python 3.8 or later.
- `stop` is indeed invoked on exit with both plain interpreter and ipython shells.

Closes #35396 from zero323/SPARK-38073.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 3e0d489)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thanks guys!
kazuyukitanimura
pushed a commit
to kazuyukitanimura/spark
that referenced
this pull request
Aug 10, 2022
…e binding (same commit message as the Feb 5, 2022 commit above; closes apache#35396 from zero323/SPARK-38073, cherry picked from commit 3e0d489 and from commit 2e382c8)
What changes were proposed in this pull request?

This PR updates the function registered in the PySpark shell `atexit` to capture `SparkContext` instead of depending on the surrounding context.

Note: a simpler approach, `atexit.register(sc.stop)`, is possible, but won't work properly for contexts with monkey-patched `stop` methods (for example, pyspark-asyncactions). I also considered using `SparkContext._active_spark_context` (stopping it from the registered lambda if it is set), but `SparkContext` is also out of scope, so that doesn't work without introducing a standard function within the scope.

Why are the changes needed?

When using `ipython` as a driver with Python 3.8, `sc` goes out of scope before the `atexit` function is called. This leads to a `NameError` on exit. This is a mild annoyance and likely a bug in ipython (there are quite a few of these with similar behavior), but it is easy to address on our side, without causing regressions for users of earlier Python versions.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual testing to confirm that:

- `NameError` is no longer thrown on exit with ipython and Python 3.8 or later.
- `stop` is indeed invoked on exit with both plain interpreter and ipython shells.
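The note about monkey-patched `stop` methods can be made concrete: `atexit.register(sc.stop)` stores a bound method resolved once, at registration, so a later patch on the instance is never seen, while a lambda re-resolves `sc.stop` at call time. A sketch, assuming a hypothetical `Ctx` class standing in for `SparkContext`:

```python
class Ctx:
    """Hypothetical stand-in for SparkContext."""
    def stop(self):
        return "original stop"

sc = Ctx()

bound = sc.stop  # what atexit.register(sc.stop) would store: resolved now
sc.stop = lambda: "patched stop"  # later monkey patch, as pyspark-asyncactions does

print(bound())                # -> original stop: the patch is missed
print((lambda: sc.stop())())  # -> patched stop: the lambda resolves at call time
```

This is why the PR keeps a lambda (with the context captured) rather than registering the bound method directly.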