PrettyPrint Improvements #33527

Open
asfimport opened this issue Nov 17, 2022 · 3 comments

Comments

@asfimport
Collaborator

asfimport commented Nov 17, 2022

We have some pretty printing capabilities, but we may want to think at a high level about the design first.

Reporter: Will Jones / @wjones127

Related issues:

Note: This issue was originally created as ARROW-18359. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
Also linking ARROW-14799, as that is another high-level potential change for Tables/RecordBatches

@asfimport
Collaborator Author

Dewey Dunnington / @paleolimbot:
I'm not sure if this is covered by one of the subtasks, but really huge binary arrays take forever to print. I am guessing this is because it tries to convert the entire binary array to a string before selecting the few characters that will actually be shown:

library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp

really_big_raw <- raw(1e9)
really_big_binary <- Array$create(list(really_big_raw), type = binary())
system.time(really_big_binary$ToString())
#>    user  system elapsed 
#>  12.396   1.660  14.269

(I ran into that one because the current encoding for geospatial data in Parquet files is binary() and the elements can be huge)
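
To illustrate the guess above, here is a minimal base-R sketch of the alternative: slice the first few bytes and only then convert to text, so the cost depends on the preview width rather than the element size. `preview_raw()` and its `max_bytes` argument are hypothetical names made up for this example; this is not how the Arrow printer is implemented, only the general idea.

```r
# Hypothetical helper (not part of the arrow package): build a short hex
# preview of a raw vector by converting only the leading bytes.
preview_raw <- function(x, max_bytes = 8L) {
  stopifnot(is.raw(x))
  n <- length(x)
  # Slice first, so only at most `max_bytes` bytes are ever turned into text.
  shown <- x[seq_len(min(n, max_bytes))]
  hex <- paste(toupper(as.character(shown)), collapse = "")
  # Mark truncation so the reader knows the element is longer than shown.
  if (n > max_bytes) paste0(hex, "...") else hex
}

small <- as.raw(c(0x01, 0xab, 0xff))
print(preview_raw(small))

# A larger element: only the first 8 bytes are hex-encoded.
big <- raw(1e6)
print(preview_raw(big))
```

With this approach the work for `big` is the same as for any other element of at least 8 bytes, which is the behavior one would hope for from a truncating pretty printer.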

@asfimport
Collaborator Author

David Li / @lidavidm:
I'd say it's sorta related to ARROW-4099 but not quite the same, and worth a new subtask
