
[SPARK-38073][PYTHON] Update atexit function to avoid issues with late binding #35396

Closed
wants to merge 1 commit

Conversation


@zero323 zero323 commented Feb 4, 2022

What changes were proposed in this pull request?

This PR updates the function registered with `atexit` in the PySpark shell to capture the `SparkContext` instead of depending on the surrounding scope.

Note

A simpler approach,

```python
atexit.register(sc.stop)
```

is possible, but it won't work properly for contexts with monkey-patched `stop` methods (for example, [pyspark-asyncactions](https://github.com/zero323/pyspark-asyncactions)).
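The monkey-patching caveat can be seen in miniature (`Ctx` is a hypothetical stand-in, not Spark code): a bound method captured at registration time bypasses any `stop` assigned to the instance afterwards.

```python
class Ctx:
    def stop(self):
        return "original"

ctx = Ctx()
bound = ctx.stop              # reference captured at "registration" time
ctx.stop = lambda: "patched"  # monkey patch applied later

print(bound())     # original  (the patch is never seen)
print(ctx.stop())  # patched
```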

I also considered using `_active_spark_context`:

```python
atexit.register(lambda: (
    SparkContext._active_spark_context.stop()
    if SparkContext._active_spark_context
    else None
))
```

but `SparkContext` is also out of scope there, so that doesn't work without introducing a named function within the scope.
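One minimal way to sketch the capture this PR performs (illustrative only; `FakeContext` is a stand-in and this is not the actual `shell.py` diff) is a default-argument binding, which stores the object with the callback instead of looking up the global name at exit time:

```python
class FakeContext:  # hypothetical stand-in for SparkContext
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True

sc = FakeContext()
# Default-argument binding captures the object at definition time,
# so the callback no longer needs the global name 'sc' to survive.
callback = lambda sc=sc: sc.stop()

captured = sc
del sc       # the global can vanish (as it does under IPython at exit)
callback()   # still works: the context travels with the lambda
print(captured.stopped)  # True
```

Unlike `atexit.register(sc.stop)`, the attribute lookup `sc.stop()` still happens at call time, so a monkey-patched `stop` is honored.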

Why are the changes needed?

When using ipython as the driver with Python 3.8, `sc` goes out of scope before the `atexit` callback is invoked. This leads to a `NameError` on exit. This is a mild annoyance and likely a bug in ipython (there are quite a few reports of similar behavior), but it is easy to address on our side without causing regressions for users of earlier Python versions.
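The failure can be reproduced without Spark at all, since it is ordinary late binding: the lambda resolves the name `sc` only when it runs, by which time the shell has torn the name down (here `sc` is just a plain global used for illustration).

```python
sc = "an active SparkContext"  # stand-in for the shell's global
callback = lambda: sc          # 'sc' is looked up when called, not here
del sc                         # simulates IPython clearing globals at exit

try:
    callback()
except NameError as err:
    error = err

print(error)  # name 'sc' is not defined
```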

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual testing to confirm that:

  • `NameError` is no longer thrown on exit with ipython and Python 3.8 or later.
  • `stop` is indeed invoked on exit with both the plain interpreter and ipython shells.

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. I also verified it in Python 3.10.2. Merged to master/3.2.

BEFORE

$ PYSPARK_DRIVER_PYTHON=$(which ipython3) bin/pyspark
Python 3.10.2 (main, Feb  1 2022, 14:05:50) [Clang 13.0.0 (clang-1300.0.29.30)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/04 20:17:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0-SNAPSHOT
      /_/

Using Python version 3.10.2 (main, Feb  1 2022 14:05:50)
Spark context Web UI available at http://172.16.0.51:4040
Spark context available as 'sc' (master = local[*], app id = local-1644034626009).
SparkSession available as 'spark'.

In [1]:
Do you really want to exit ([y]/n)? y
Exception ignored in atexit callback: <function <lambda> at 0x11f149510>
Traceback (most recent call last):
  File "/Users/dongjoon/APACHE/spark-merge/python/pyspark/shell.py", line 49, in <lambda>
    atexit.register(lambda: sc.stop())
NameError: name 'sc' is not defined

AFTER

$ PYSPARK_DRIVER_PYTHON=$(which ipython3) bin/pyspark
Python 3.10.2 (main, Feb  1 2022, 14:05:50) [Clang 13.0.0 (clang-1300.0.29.30)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.0.1 -- An enhanced Interactive Python. Type '?' for help.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/02/04 20:18:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.0-SNAPSHOT
      /_/

Using Python version 3.10.2 (main, Feb  1 2022 14:05:50)
Spark context Web UI available at http://172.16.0.51:4040
Spark context available as 'sc' (master = local[*], app id = local-1644034701684).
SparkSession available as 'spark'.

In [1]:
Do you really want to exit ([y]/n)? y
$

dongjoon-hyun pushed a commit that referenced this pull request Feb 5, 2022
Closes #35396 from zero323/SPARK-38073.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 3e0d489)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

zero323 commented Feb 5, 2022

Thanks guys!

@zero323 zero323 deleted the SPARK-38073 branch February 5, 2022 09:27
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
Closes apache#35396 from zero323/SPARK-38073.

Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 3e0d489)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2e382c8)