dongjoon-hyun
Member

@dongjoon-hyun commented May 1, 2025

What changes were proposed in this pull request?

This PR aims to improve DataFrame.collect so that it returns rows containing the original typed values instead of String values.

Note that this PR supports simple value types first. More types, such as Decimal, will be added later.
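To illustrate the difference between the old String-only rows and typed rows, here is a minimal, self-contained Swift sketch. `ColumnKind`, `typedValue`, and `stringValue` are hypothetical names, not part of spark-connect-swift or Arrow Swift (the real implementation reads cells from Arrow arrays). Simple types keep their native Swift type, NULL stays nil, and a not-yet-supported kind (decimal here) falls back to its string form.

```swift
// Hypothetical column kinds for illustration only; not the Arrow Swift type system.
enum ColumnKind {
  case boolean, int32, int64, float64, string
  case decimal  // not mapped to a native value in this sketch
}

// Typed behavior: convert one raw cell into the value a Row would carry.
// Unsupported kinds fall back to the string form, like the old behavior.
func typedValue(_ raw: Any?, kind: ColumnKind) -> Any? {
  guard let value = raw else { return nil }  // keep NULL as nil
  switch kind {
  case .boolean: return value as? Bool
  case .int32: return value as? Int32
  case .int64: return value as? Int64
  case .float64: return value as? Double
  case .string: return value as? String
  case .decimal: return String(describing: value)
  }
}

// Old behavior: every non-null cell becomes a String.
func stringValue(_ raw: Any?) -> String? {
  raw.map { String(describing: $0) }
}

let kinds: [ColumnKind] = [.int32, .boolean, .string, .int64, .decimal]
let raws: [Any?] = [Int32(42), true, "spark", nil, "12.50"]

let typedRow = zip(raws, kinds).map { typedValue($0, kind: $1) }
let stringRow = raws.map(stringValue)
```

With the typed row, a caller can read the first column directly as an Int32 instead of parsing "42" back out of a String.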

Why are the changes needed?

The initial implementation was limited to returning rows of String values.

Does this PR introduce any user-facing change?

No, because there are no released versions yet.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

Could you review this PR, @viirya? Now, DataFrame.collect returns Row with the original values instead of String.

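case ArrowType.ArrowBinary:
  // Binary: render the value as a String, then take its UTF-8 view.
  values.append((array as! AsString).asString(i).utf8)
case .complexInfo(.strct):
  // Struct: nested types are not supported yet, so keep the string form.
  values.append((array as! AsString).asString(i))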
Member

We don't support nested types for now, right?

Member Author

Yes, all complex types are still under development.
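The reviewed switch uses two different Swift pattern forms. Assuming, as the snippet suggests, that `ArrowType.ArrowBinary` is a static constant of an Equatable type-info enum, `case ArrowType.ArrowBinary:` is an expression pattern matched through `~=` (which defaults to `==` for Equatable values), whereas `case .complexInfo(.strct):` is an enum case pattern with a nested case. The sketch below mirrors both with hypothetical types (`TypeId`, `Info`, `TypeConstants`); it is not the Arrow Swift definition.

```swift
// Hypothetical mirror types for illustration only; the real definitions
// live in Arrow Swift and may differ.
enum TypeId: Equatable {
  case binary, string, strct
}

// Equatable is synthesized because TypeId is Equatable.
enum Info: Equatable {
  case variableInfo(TypeId)
  case complexInfo(TypeId)
}

enum TypeConstants {
  // A static constant, playing the role of `ArrowType.ArrowBinary` in the diff.
  static let binary = Info.variableInfo(.binary)
}

func describe(_ info: Info) -> String {
  switch info {
  case TypeConstants.binary:
    // Expression pattern: matched via `~=`, i.e. `==` for Equatable values.
    return "binary"
  case .complexInfo(.strct):
    // Enum case pattern with a nested case: matches struct columns.
    return "struct"
  default:
    // Everything else, e.g. variable-length strings in this sketch.
    return "other"
  }
}
```

Either form selects the same kind of branch; the expression pattern simply lets the switch reuse a named constant instead of spelling out the enum case.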

@dongjoon-hyun
Member Author

Thank you, @viirya. Merged to main.

@dongjoon-hyun deleted the SPARK-51971 branch May 1, 2025 05:22