Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42169] Implement code generation for to_csv function (StructsToCsv) #39097

Closed
wants to merge 6 commits into from

Conversation

NarekDW
Copy link
Contributor

@NarekDW NarekDW commented Dec 16, 2022

What changes were proposed in this pull request?

This PR enhances StructsToCsv class with doGenCode function instead of extending it from CodegenFallback trait (performance improvement).

Why are the changes needed?

It will improve performance.

Does this PR introduce any user-facing change?

No

How was this patch tested?

by existing tests

@github-actions github-actions bot added the SQL label Dec 16, 2022
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@HyukjinKwon HyukjinKwon changed the title [SPARK-41049][SQL] make to_csv function deterministic [SPARK-41049][SQL] Make to_csv function deterministic Dec 18, 2022
@HyukjinKwon
Copy link
Member

Hm, is the issue from non-codegen part from Rand?

@HyukjinKwon
Copy link
Member

Would be easier to review if you describe how it happens and how this PR fixes.

@NarekDW
Copy link
Contributor Author

NarekDW commented Dec 18, 2022

Hi @HyukjinKwon I've added 2 sections in the header description - Debugging the root cause and How this PR changes such a behaviour with my point of view, how it happens and how this PR could improve the behaviour.
Could you review, please.

@HyukjinKwon
Copy link
Member

cc @cloud-fan FYI

@gboo-infa
Copy link

While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic child nodes; the problem is not that to_csv happens to be CodegenFallback. Other nodes are also CodegenFallback, and future nodes will be as well, and this change doesn't address that...

@NarekDW
Copy link
Contributor Author

NarekDW commented Dec 19, 2022

While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic child nodes; the problem is not that to_csv happens to be CodegenFallback. Other nodes are also CodegenFallback, and future nodes will be as well, and this change doesn't address that...

Yep, good point. This is just a partial solution, for to_csv function in particular.
But I don't know, should we rename this PR or create another task/bug in JIRA...

@NarekDW
Copy link
Contributor Author

NarekDW commented Dec 20, 2022

@HyukjinKwon any suggestion on this?
Should I keep this PR open?

@cloud-fan
Copy link
Contributor

I'm fixing the root cause at #39248

@cloud-fan
Copy link
Contributor

Can we refine the PR description? It still good to have codegen for to_csv, but it's only for performance, as the correctness issue has been fixed by #39248

@NarekDW NarekDW changed the title [SPARK-41049][SQL] Make to_csv function deterministic Implement code generation for to_csv function (doGenCode) Jan 1, 2023
@NarekDW
Copy link
Contributor Author

NarekDW commented Jan 1, 2023

Can we refine the PR description? It still good to have codegen for to_csv, but it's only for performance, as the correctness issue has been fixed by #39248

@cloud-fan Sure, I've renamed this PR (also changed the description). Unfortunately, I don't have an account in the JIRA, and I can't create a separate JIRA task for this.
Also, I left a test case in this PR:

test("SPARK-41049: make to_csv function deterministic") {
...

I think it might be useful, let me know if it is redundant.

@cloud-fan
Copy link
Contributor

Unfortunately, I don't have an account in the JIRA, and I can't create a separate JIRA task for this.

Can you sign up and create an account? The JIRA system allows anyone to register.

@NarekDW
Copy link
Contributor Author

NarekDW commented Jan 6, 2023

Unfortunately, I don't have an account in the JIRA, and I can't create a separate JIRA task for this.

Can you sign up and create an account? The JIRA system allows anyone to register.

Unfortunately, it is disabled
image

I mailed to private@spark.apache.org to get an account created, about 2 weeks ago, but didn't get any response...
If you can suggest anybody, whom I can address with such a question, it will be great.

@NarekDW NarekDW changed the title Implement code generation for to_csv function (doGenCode) [SPARK-42169] Implement code generation for to_csv function (StructsToCsv) Jan 24, 2023
@NarekDW NarekDW closed this Jan 24, 2023
@NarekDW NarekDW deleted the SPARK-41049 branch January 24, 2023 15:07
@NarekDW
Copy link
Contributor Author

NarekDW commented Jan 24, 2023

@cloud-fan finally I've got my JIRA account and created a separate ticket for this.
I've closed this PR as I've renamed my branch in accordance with new JIRA ticket.
Please check the new PR here - #39719

@weicm
Copy link

weicm commented Feb 1, 2023

@cloud-fan finally I've got my JIRA account and created a separate ticket for this. I've closed this PR as I've renamed my branch in accordance with new JIRA ticket. Please check the new PR here - #39719

Are you getting a JIRA account by sending an email to private@spark.apache.org?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants