[SPARK-44105][SQL] LastNonNull should be lazily resolved #41672

Closed

zhengruifeng
Contributor

What changes were proposed in this pull request?

`LastNonNull` should be lazily resolved.

Why are the changes needed?

To fix the issue raised in https://github.com/apache/spark/pull/41670/files#r1234805869

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing GitHub Actions (GA) checks, plus a manual check.

@github-actions github-actions bot added the SQL label Jun 20, 2023
@zhengruifeng
Contributor Author

zhengruifeng commented Jun 20, 2023

Manual check with #41670:

(spark_dev_309) ~/Dev/spark (interpolate ✗) bin/pyspark --remote "local[*]" 
Python 3.9.16 (main, Mar  8 2023, 04:29:24) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.13.2 -- An enhanced Interactive Python. Type '?' for help.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/06/20 15:53:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.5.0.dev0
      /_/

Using Python version 3.9.16 (main, Mar  8 2023 04:29:24)
Client connected to the Spark Connect server at localhost
SparkSession available as 'spark'.

In [1]: import pyspark.pandas as ps

In [2]: import numpy as np

In [3]: psser = ps.Series([1, np.nan, 3], name="a")

In [4]: psser.interpolate()
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/expressions.py:945: UserWarning: WARN WindowExpression: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
  warnings.warn(
Out[4]: 23/06/20 15:54:06 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:06 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:06 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:06 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:06 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:07 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:07 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:07 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
23/06/20 15:54:07 WARN WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.

0    1.0
1    2.0
2    3.0
Name: a, dtype: float64
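
For readers who want to sanity-check the expected values without a Spark session, here is a minimal pure-Python sketch of linear interpolation built on a running "last non-null" lookup. It is not the Spark `LastNonNull` expression or the pandas-on-Spark implementation; the helper names (`is_null`, `last_non_null`, `interpolate_linear`) are hypothetical, and leaving leading and trailing gaps as NaN is a simplification chosen for this sketch.

```python
import math


def is_null(v):
    # Treat both None and float NaN as "null" (assumption for this sketch).
    return v is None or (isinstance(v, float) and math.isnan(v))


def last_non_null(values):
    """For each position, return (index, value) of the most recent
    non-null entry seen so far, or None if there is none yet."""
    out, last = [], None
    for i, v in enumerate(values):
        if not is_null(v):
            last = (i, v)
        out.append(last)
    return out


def interpolate_linear(values):
    n = len(values)
    prev = last_non_null(values)
    # The next non-null entry is the last non-null entry of the reversed
    # sequence, with indices mapped back to the original order.
    rev = last_non_null(values[::-1])
    nxt = [None if p is None else (n - 1 - p[0], p[1]) for p in rev][::-1]

    result = []
    for i, v in enumerate(values):
        if not is_null(v):
            result.append(float(v))
        elif prev[i] is None or nxt[i] is None:
            # Simplification: gaps without a neighbour on both sides stay NaN.
            result.append(float("nan"))
        else:
            (i0, v0), (i1, v1) = prev[i], nxt[i]
            result.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    return result


print(interpolate_linear([1, float("nan"), 3]))  # [1.0, 2.0, 3.0]
```

The printed list matches the `1.0, 2.0, 3.0` values in the session output above.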


@zhengruifeng
Contributor Author

cc @HyukjinKwon @itholic

@zhengruifeng zhengruifeng changed the title [SPARK-44105][PS] LastNonNull should be lazily resolved [SPARK-44105][PS] LastNonNull should be lazily resolved Jun 20, 2023
@zhengruifeng zhengruifeng changed the title [SPARK-44105][PS] LastNonNull should be lazily resolved [SPARK-44105][PS][SQL] LastNonNull should be lazily resolved Jun 20, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-44105][PS][SQL] LastNonNull should be lazily resolved [SPARK-44105][SQL] LastNonNull should be lazily resolved Jun 20, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM for Apache Spark 3.5.0.

@zhengruifeng zhengruifeng deleted the ps_fix_last_not_null branch June 20, 2023 16:36
@zhengruifeng
Contributor Author

@dongjoon-hyun @HyukjinKwon Thank you for reviewing.

@itholic
Contributor

itholic commented Jun 21, 2023

Late review. Nice fix, thanks :-)

4 participants