Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix count(*) and count(col(*)) #39622

Closed
wants to merge 5 commits into from

Conversation

zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented Jan 17, 2023

What changes were proposed in this pull request?

1, add UnresolvedStar to expressions.py;
2, Fix count(*) and count(col(*)), should return Column(UnresolvedStar(None)) instead of Column(UnresolvedAttribute("*")), see:

def this(name: String) = this(name match {
case "*" => UnresolvedStar(None)
case _ if name.endsWith(".*") =>
val parts = UnresolvedAttribute.parseAttributeName(name.substring(0, name.length - 2))
UnresolvedStar(Some(parts))
case _ => UnresolvedAttribute.quotedString(name)
})

3, remove the count(*) -> count(1) transformation in group.py, since it's no longer needed.

Why are the changes needed?

#39636 fixed the count(*) issue in the server side, and then count(expr(*)) works after that PR.

This PR makes the corresponding changes in the Python Client side, in order to support count(*), and count(col(*))

Does this PR introduce any user-facing change?

yes

How was this patch tested?

enabled UT and added UT

@zhengruifeng
Copy link
Contributor Author

I think we should fix it in Scala side, but didn't find the right place.

@zhengruifeng
Copy link
Contributor Author

@zhengruifeng zhengruifeng changed the title [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix count(*) and count(expr(*)) [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix count(*), count(col(*)), count(expr(*)) Jan 17, 2023
@zhengruifeng
Copy link
Contributor Author

I checked that #39636 can resolve the count(*) issue in Spark Connect.

@zhengruifeng zhengruifeng changed the title [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix count(*), count(col(*)), count(expr(*)) [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix count(*) and count(col(*)) Jan 20, 2023
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI seems to complain at compilation failure. Could you double-check and re-trigger, @zhengruifeng ?

Error: SparkConnectPlannerSuite.scala:555:
value addTarget is not a member of 
org.apache.spark.connect.proto.Expression.UnresolvedStar.Builder

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@zhengruifeng
Copy link
Contributor Author

merged into master, thank you @cloud-fan @HyukjinKwon @dongjoon-hyun !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants