Skip to content

Conversation

@TomaszGaweda
Copy link

What changes were proposed in this pull request?

Add parse_url function to functions.scala. This will allow users to use this functions without calling selectExpr or spark.sql

How was this patch tested?

testUrl function was changed to test also this change

Please review http://spark.apache.org/contributing.html before opening a pull request.

@TomaszGaweda
Copy link
Author

TomaszGaweda commented Aug 27, 2018

@gatorsmile @cloud-fan @HyukjinKwon @mgaido91 Could you please review this PR and start tests? This is my first PR in Spark, so please guide me in case of any mistakes or additional requirements

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@mgaido91
Copy link
Contributor

I remember that @rxin was against adding much of these functions here through the various programmatic APIs: only the most used ones should have been exposed and I am not sure this is a frequently used one. So I am not sure about this PR.

Anyway, I think we should add to the Python API too if we decide to add this at all.

* @group string_funcs
* @since 2.4.0
*/
def parse_url(url: Column, partToExtract: String): Column = withExpr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When adding the functions into object functions, we need to check how useful they are.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is created after this StackOverflow question: https://stackoverflow.com/questions/52041342/how-to-parse-url-in-spark-sqlscala/52043771

I'm not sure how often it is used, however most of functions are available in functions object to make Scala and SQL interfaces similar. If you think it's useless - please let me know, I'll just close this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In long term, I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url"). Now spark.udf.register returns a handler for UDF, something similar for getting any kind of registered function may be helpful. However, it's a lot more work, so now I've just proposed to add this function :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomaszGaweda This sounds a good idea by returning a handler for built-in functions. cc @rxin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea too

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, tomorrow I will create a Jira and start working on it. Thanks for your comments! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just use expr instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Thanks for the suggestion, however now users are complaining about stringly-typed system in Spark, there are libs like Frameless from Typelevel to archieve a bit more type safety. expr is springly-typed, while functions in functions object or accessed via UserDefinedFunction are a bit more type safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean,

I would suggest method that return handler for any registered function. So that you can write: SqlFunction something = spark.(...?).getFunction("parse_url")

Can this support strongly typed one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon I see now. Yeah, wrapping in the Column will be necessary, at least no string concatenation will be required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants