Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-26979][PYTHON] Add missing string column name support for some SQL functions #23882
[SPARK-26979][PYTHON] Add missing string column name support for some SQL functions #23882
Changes from all commits
4e22433
935bf4c
cf8ede4
23d0222
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does
lit
takes the string as column names? how do we create string literal?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and .. what's really "name function" ... ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No,
lit()
takes a literal value and creates a column with that literal. That's what it's for. It doesn't make sense for it to accept a column name, given its nature. So if you give it a string it will create a column with that string literal.The name "name function" is something I came up with just to distinguish these functions from the ones that take columns as input. They are defined by that distinction - they are "functions that take a column name as their argument", exclusively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lit
doesn't take a column name as their argument. Why did we use suchname function
category forlit
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair,
lit
is such a unique function semantically that it would warrant its own category. But since the implementation is exactly the same as those "name functions", I left it there for practical purposes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One (lit) of five (col, column, asc, desc, lit) doesn't sound like a special case tho. It had to have a better category if 20% of items doesn't fit to the category.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This had to stay in string functions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initcap
was also defined explicitly below, with better documentation, so I preserved that instance.lower
andupper
have been introduced in the 1.3 API, and were being defined there as well, so I prioritised that instance because it used the correct@since
number. Arguably we could've used a separate mapping likestring_functions_1_3
or something, but that didn't seem appropriate since the point of these is to minimise code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was saying
lower
andupper
. Looks it has been overwritten by this, so if we keep here, it keeps previous definition. Did you check which PR and JIRAs addedlower
andupper
?Also, strictly this should have not been removed in this PR as it doesn't target to remove overwritten functions. As you said, we should avoid such function definition way later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if it was an exception, it had to describe it specifically.
This doesn't look making sense if you read the codes from scratch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The java/scala API documentation says it was added in 1.3, but I just tracked down the JIRA/PR and it seems it actually was 1.0.
https://issues.apache.org/jira/browse/SPARK-1995
#936
As for removing overwritten functions, maybe it would've been better to make a separate PR, but the first fix did require removing them. When I changed the approach it seemed reasonable to keep the change, since the problem was obvious and easy to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you not add the column name support because Scala side has this signautre below?:
That's not allowed in Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I initially expected when I asked to whitelist them, @asmello
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you're saying here.
trim
with 2 arguments was never supported in PySpark, and that's a separate issue. What this patch changes is thattrim("value")
is now supported, when it wasn't previously.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was saying this because I didn't get why you excluded string functions.