Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39916][SQL][MLLIB][REFACTOR] Merge ml SchemaUtils to SQL #37336

Closed
wants to merge 5 commits into from
Closed

[SPARK-39916][SQL][MLLIB][REFACTOR] Merge ml SchemaUtils to SQL #37336

wants to merge 5 commits into from

Conversation

deshanxiao
Copy link
Contributor

What changes were proposed in this pull request?

Today we have two SchemaUtils: SQL SchemaUtils and mllib SchemaUtils. This pr is try to remove SchemaUtils in mllib.

Why are the changes needed?

Two SchemaUtils are often confused for us. MLlib SchemaUtils add a TODO flag and now we can do it.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

exist UT

@deshanxiao
Copy link
Contributor Author

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make the GitHub Action CI happy, @deshanxiao ?

@deshanxiao
Copy link
Contributor Author

Could you make the GitHub Action CI happy, @deshanxiao ?

Sure, seem that all checks have passed.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@@ -297,4 +296,117 @@ private[spark] object SchemaUtils {
.replaceAll("\u000B", "\\\\v")
.replaceAll("\u0007", "\\\\a")
}

/**
* Check whether the given schema contains a column of the required data type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but it seems a bit weird to define methods in SQL but only call them in MLlib.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but it seems a bit weird to define methods in SQL but only call them in MLlib.

Yes, but it seems more weird to have a SchemaUtils in the MLiib too. Do you have any good advice?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1

also feel weird since those utils are only dedicated for ML.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rename the SchemaUtils in MLlib so that the name is more ML specific?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly about it. WDYT @srowen @WeichenXu123 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm not sure it simplifies things, just moves them around. I'd be neutral on this

@deshanxiao
Copy link
Contributor Author

Very happy to receive these suggestions. Please let me know if anyone have any questions. Will close this PR.

@deshanxiao deshanxiao closed this Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants