[SPARK-16281][SQL][FOLLOW-UP] Add parse_url to functions.scala #22249
Conversation
@gatorsmile @cloud-fan @HyukjinKwon @mgaido91 Could you please review this PR and start tests? This is my first PR in Spark, so please guide me in case of any mistakes or additional requirements.
Can one of the admins verify this patch?
I remember that @rxin was against adding many of these functions through the various programmatic APIs: only the most used ones should be exposed, and I am not sure this is a frequently used one. So I am not sure about this PR. In any case, if we decide to add it at all, I think we should also add it to the Python API.
```scala
 * @group string_funcs
 * @since 2.4.0
 */
def parse_url(url: Column, partToExtract: String): Column = withExpr {
```
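For readers without the diff context, a minimal sketch of how such a wrapper is typically completed in `functions.scala`, assuming it delegates to the existing `ParseUrl` Catalyst expression the way neighboring string wrappers do; the PR's actual body may differ:

```scala
// Sketch only: delegates to the ParseUrl Catalyst expression, as other
// string wrappers in functions.scala do. Not necessarily the PR's exact body.
def parse_url(url: Column, partToExtract: String): Column = withExpr {
  ParseUrl(Seq(url.expr, lit(partToExtract).expr))
}
```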
When adding functions to the functions object, we need to check how useful they are.
This PR was created in response to this StackOverflow question: https://stackoverflow.com/questions/52041342/how-to-parse-url-in-spark-sqlscala/52043771
I'm not sure how often it is used; however, most functions are available in the functions object to keep the Scala and SQL interfaces similar. If you think it's not useful, please let me know and I'll close this PR.
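For reference, the workaround from that question: without a wrapper in `functions.scala`, callers have to reach `parse_url` through a SQL string (here `df` is assumed to be a DataFrame with a string column `url`):

```scala
import org.apache.spark.sql.functions.expr

// Today parse_url is reachable only through SQL expression strings.
val hosts = df.selectExpr("parse_url(url, 'HOST') AS host")
// The SQL function also accepts an optional key for QUERY parts:
val ids = df.select(expr("parse_url(url, 'QUERY', 'id')").as("id"))
```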
In the long term, I would suggest a method that returns a handler for any registered function, so that you can write: SqlFunction something = spark.(...?).getFunction("parse_url"). Currently spark.udf.register returns a handler for a UDF; something similar for getting any kind of registered function may be helpful. However, it's a lot more work, so for now I've just proposed adding this function :)
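A hedged sketch of the idea; `SqlFunction` and `getFunction` are hypothetical names from the comment above, not real Spark APIs. One possible stand-in resolves names at analysis time the way `functions.callUDF` already does:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{callUDF, col, lit}

// Hypothetical handler type from the comment above; not a real Spark API.
trait SqlFunction {
  def apply(args: Column*): Column
}

// Hypothetical getFunction: resolves a registered function by name and
// defers resolution to the analyzer, much as functions.callUDF does.
def getFunction(name: String): SqlFunction = new SqlFunction {
  override def apply(args: Column*): Column = callUDF(name, args: _*)
}

// Usage sketch (df is assumed to have a string column `url`):
val parseUrl = getFunction("parse_url")
val hosts = df.select(parseUrl(col("url"), lit("HOST")))
```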
@TomaszGaweda Returning a handler for built-in functions sounds like a good idea. cc @rxin
I like this idea too
Ok, tomorrow I will create a Jira and start working on it. Thanks for your comments! :)
Can't we just use expr instead?
@HyukjinKwon Thanks for the suggestion; however, users are now complaining about the stringly-typed APIs in Spark, and there are libraries like Frameless from Typelevel to achieve a bit more type safety. expr is stringly-typed, while functions in the functions object, or accessed via UserDefinedFunction, are a bit more type safe.
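To illustrate the distinction (column name and `df` are placeholders; `parse_url` here is the wrapper this PR adds):

```scala
import org.apache.spark.sql.functions.{col, expr}

// Stringly-typed: the whole expression lives in a string, so a typo in
// the function or column name surfaces only at analysis time.
df.select(expr("parse_url(url, 'HOST')"))

// Typed wrapper proposed by this PR: the column is a Column value that
// the compiler and IDE can check and refactor.
df.select(parse_url(col("url"), "HOST"))
```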
I mean,

> I would suggest a method that returns a handler for any registered function, so that you can write: SqlFunction something = spark.(...?).getFunction("parse_url")

Can this support a strongly typed one?
@HyukjinKwon I see now. Yeah, wrapping in a Column will be necessary; at least no string concatenation will be required.
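The concatenation being avoided, sketched with a dynamic argument (`part` is a placeholder for a runtime value):

```scala
val part = "HOST"

// With only expr/selectExpr, a dynamic argument forces string building,
// and quoting/escaping is on the caller:
df.selectExpr(s"parse_url(url, '$part')")

// With the typed wrapper, it is an ordinary argument:
df.select(parse_url(col("url"), part))
```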
What changes were proposed in this pull request?
Add a `parse_url` function to `functions.scala`. This will allow users to use this function without calling `selectExpr` or `spark.sql`.

How was this patch tested?

The `testUrl` function was changed to also test this change.

Please review http://spark.apache.org/contributing.html before opening a pull request.
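A usage sketch of the added wrapper, assuming the signature from the diff above and a DataFrame `df` with a string column `url`:

```scala
import org.apache.spark.sql.functions.{col, parse_url}

// After this PR, no selectExpr or spark.sql round-trip is needed:
val parts = df.select(
  parse_url(col("url"), "HOST").as("host"),
  parse_url(col("url"), "QUERY").as("query")
)
```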