Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39722] [SQL] getString API for Dataset #40553

Closed
wants to merge 1 commit into from

Conversation

VindhyaG
Copy link

What changes were proposed in this pull request?

Add getString API for Dataset for usecases where string representation needs to be used other than stdout(println)
More details in the bug

Why are the changes needed?

But there are a lot of cases where we might need to get a String representation of the show output. For example

logging framework to which we need to push the representation of a df
send the string over a REST call from the driver
send the string to stderr instead of stdout

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Tested locally using spark-shell

@VindhyaG
Copy link
Author

@rxin @ScrapCodes @maropu can you please review . thanks!

Copy link
Contributor

@jaceklaskowski jaceklaskowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one tiny nit

* @param numRows
* Number of rows to show
* @group action
* @since 3.4.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why is this 3.4.0 while the latter changes are marked as 3.5.0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed changing version number . will change once i can figure out what is the current release cycle is on? Can you please point me where i can get that info .thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaceklaskowski updated to 3.5.0 based on recently(hours ago) closed bug fix version of SQL and Java API component

* will be truncated, and all cells will be aligned right.
*
* @group action
* @since 3.5.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Why is this 3.5.0?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing @jaceklaskowski . I took the version from one of the recent bugs but i realised i took the affected version instead of fix version which is wrong. I am trying to get details on how to get current release version to add but not able to find any links i can be confident on .

@ScrapCodes
Copy link
Member

Hi @VindhyaG, this might be useful - may be we can benefit from the usecase you have for this. Is it just for logging?
Not sure what others think, it might be good to limit the API surface.

@VindhyaG
Copy link
Author

Hi @VindhyaG, this might be useful - may be we can benefit from the usecase you have for this. Is it just for logging? Not sure what others think, it might be good to limit the API surface.

@ScrapCodes thanks for reviewing. Bug lists three ways API could be used which i have added in PR description above. Do you have any suggestions to limit API surface?

@ScrapCodes
Copy link
Member

I see this as developer facing API, So just having

def getString(numRows: Int, truncate: Int): String =
     getString(numRows, truncate, vertical = false)

would suffice.

@ScrapCodes
Copy link
Member

Do you think, a more interesting way can be returning a JSON representation?

@VindhyaG
Copy link
Author

I see this as developer facing API, So just having

def getString(numRows: Int, truncate: Int): String =
     getString(numRows, truncate, vertical = false)

would suffice.

my intention was to keep it consistent with show() where if numrows and truncate have default values

@VindhyaG
Copy link
Author

VindhyaG commented Mar 30, 2023

Do you think, a more interesting way can be returning a JSON representation?
@ScrapCodes If use case is to send the it via rest api(as listed in use case above) JSON would make more sense but for logging i suppose string in tabular form is more useful? may be have a separate API for that ?

@VindhyaG
Copy link
Author

VindhyaG commented Apr 5, 2023

@ScrapCodes can you pls suggest how can we go ahead on this?
@jaceklaskowski have updated the versions. can you pls review again.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jul 15, 2023
@github-actions github-actions bot closed this Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants