New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-39916][SQL][MLLIB][REFACTOR] Merge ml SchemaUtils to SQL #37336
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make the GitHub Action CI happy, @deshanxiao ?
Sure, seem that all checks have passed. |
Can one of the admins verify this patch? |
@@ -297,4 +296,117 @@ private[spark] object SchemaUtils { | |||
.replaceAll("\u000B", "\\\\v") | |||
.replaceAll("\u0007", "\\\\a") | |||
} | |||
|
|||
/** | |||
* Check whether the given schema contains a column of the required data type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference, but it seems a bit weird to define methods in SQL but only call them in MLlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference, but it seems a bit weird to define methods in SQL but only call them in MLlib.
Yes, but it seems more weird to have a SchemaUtils in the MLiib too. Do you have any good advice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-1
also feel weird since those utils are only dedicated for ML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can rename the SchemaUtils
in MLlib so that the name is more ML specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about it. WDYT @srowen @WeichenXu123 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm not sure it simplifies things, just moves them around. I'd be neutral on this
Very happy to receive these suggestions. Please let me know if anyone have any questions. Will close this PR. |
What changes were proposed in this pull request?
Today we have two SchemaUtils: SQL SchemaUtils and mllib SchemaUtils. This pr is try to remove SchemaUtils in mllib.
Why are the changes needed?
Two SchemaUtils are often confused for us. MLlib SchemaUtils add a TODO flag and now we can do it.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
exist UT