[SPARK-42169] Implement code generation for to_csv function (StructsToCsv) #39097
Can one of the admins verify this patch? |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/csvExpressions.scala
Hm, is the issue from non-codegen part from |
Would be easier to review if you describe how it happens and how this PR fixes. |
Hi @HyukjinKwon, I've added two sections to the PR description, "Debugging the root cause" and "How this PR changes such a behaviour", giving my view of how the issue happens and how this PR could improve the behaviour. |
cc @cloud-fan FYI |
While this change may be good to have for its own reasons, it doesn't really address the problem in the JIRA. The problem is that CodegenFallback is incompatible with nondeterministic child nodes; the problem is not that to_csv happens to be CodegenFallback. Other nodes are also CodegenFallback, and future nodes will be as well, and this change doesn't address that... |
Yep, good point. This is just a partial solution, for |
@HyukjinKwon any suggestion on this? |
I'm fixing the root cause at #39248 |
Can we refine the PR description? It's still good to have codegen for |
@cloud-fan Sure, I've renamed this PR (and also changed the description). Unfortunately, I don't have a JIRA account, so I can't create a separate JIRA task for this.
test("SPARK-41049: make to_csv function deterministic") {
...
I think it might be useful, let me know if it is redundant. |
sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
Can you sign up and create an account? The JIRA system allows anyone to register. |
I emailed private@spark.apache.org about two weeks ago to get an account created, but didn't get any response... |
@cloud-fan finally I've got my JIRA account and created a separate ticket for this. |
Did you get your JIRA account by sending an email to private@spark.apache.org?
What changes were proposed in this pull request?
This PR enhances the StructsToCsv class with a doGenCode function instead of extending it from the CodegenFallback trait (performance improvement).
Why are the changes needed?
It will improve performance.
Does this PR introduce any user-facing change?
No
How was this patch tested?
By existing tests.
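To illustrate the distinction the description relies on, here is a minimal toy sketch in Scala. It is not Spark's Catalyst API: `Expr`, `Field`, `ToCsvFallback`, `ToCsvCodegen`, and the `thisExpr` name in the emitted text are all hypothetical stand-ins. The assumption behind it is that a fallback-style expression emits code that only calls back into its own interpreted `eval`, while a codegen-style expression emits source that inlines its children.

```scala
// Toy model only: Expr, Field and both ToCsv* classes are hypothetical
// stand-ins, far simpler than Spark's Catalyst Expression API.
sealed trait Expr {
  // Interpreted path: compute the value directly from the input row.
  def eval(row: Map[String, Any]): Any
  // Codegen path: return source text that computes the value from `rowVar`.
  def genCode(rowVar: String): String
}

final case class Field(name: String) extends Expr {
  def eval(row: Map[String, Any]): Any = row(name)
  def genCode(rowVar: String): String = s"""$rowVar("$name")"""
}

// Fallback style: the emitted code only calls back into the interpreter,
// so the children are never inlined into the generated source.
final case class ToCsvFallback(children: Seq[Expr]) extends Expr {
  def eval(row: Map[String, Any]): Any = children.map(_.eval(row)).mkString(",")
  def genCode(rowVar: String): String = s"thisExpr.eval($rowVar)"
}

// Codegen style: the emitted code inlines each child's generated code.
final case class ToCsvCodegen(children: Seq[Expr]) extends Expr {
  def eval(row: Map[String, Any]): Any = children.map(_.eval(row)).mkString(",")
  def genCode(rowVar: String): String =
    children.map(_.genCode(rowVar)).mkString("Seq(", ", ", ")") + """.mkString(",")"""
}

object Demo {
  val fields: Seq[Expr] = Seq(Field("a"), Field("b"))
  val row: Map[String, Any] = Map("a" -> 1, "b" -> "x")

  val interpreted: Any = ToCsvCodegen(fields).eval(row)
  val fallbackSource: String = ToCsvFallback(fields).genCode("row")
  val inlinedSource: String = ToCsvCodegen(fields).genCode("row")
}
```

In this sketch both styles give the same interpreted result; they differ only in the source text they emit, which is the part a code generator can compile together with the surrounding operators.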