Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38916: [R] Simplify dataset and table print output #38917

Merged
merged 9 commits into from
Mar 13, 2024

Conversation

thisisnic
Copy link
Member

@thisisnic thisisnic commented Nov 28, 2023

Rationale for this change

When printing objects with data with lots of rows, the output is long and unwieldy.

What changes are included in this PR?

  • Truncates long schema print output and adds the number of columns to dataset print output.
  • Add number of columns to output so it's clear how many there are in total

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Before:

library(arrow)
x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> "u": string
#> "v": string
#> "w": string
#> "x": string
#> "y": string
#> "z": string
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>

After:

library(arrow)

x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> 26 columns 
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> ...
#> Use `schema()` to see entire schema
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema

Copy link

⚠️ GitHub issue #38916 has been automatically assigned in GitHub to PR creator.

@paleolimbot
Copy link
Member

I see this still has some tests failing...just making sure you weren't waiting on any of us for a review!

@thisisnic
Copy link
Member Author

I see this still has some tests failing...just making sure you weren't waiting on any of us for a review!

Nah, life has just been busy, but thanks for checking in on it. I'll ping you once I get time to take a look at it again!

@thisisnic
Copy link
Member Author

@paleolimbot This is now ready for review; mind giving this a look over when you get the chance?

r/R/schema.R Outdated Show resolved Hide resolved
print_schema_fields <- function(s, truncate = FALSE, max_fields = 20) {
if (truncate && length(s$fields) > max_fields) {
fields_out <- paste(map_chr(s$fields[1:max_fields], ~ .$ToString()), collapse = "\n")
fields_out <- paste0(fields_out, "\n...\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know how many fields we're not showing here? Rather than ..., giving a count of hidden fields would be a nice improvement. I'm thinking about how pillar does it:

> starwars
# A tibble: 87 × 14
   name          height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles
   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
 1 Luke Skywalk…    172    77 blond      fair       blue            19   male  mascu… Tatooine  Human   <chr> <chr>   
 2 C-3PO            167    75 NA         gold       yellow         112   none  mascu… Tatooine  Droid   <chr> <chr>   
 3 R2-D2             96    32 NA         white, bl… red             33   none  mascu… Naboo     Droid   <chr> <chr>   
 4 Darth Vader      202   136 none       white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
 5 Leia Organa      150    49 brown      light      brown           19   fema… femin… Alderaan  Human   <chr> <chr>   
 6 Owen Lars        178   120 brown, gr… light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
 7 Beru Whitesu…    165    75 brown      light      blue            47   fema… femin… Tatooine  Human   <chr> <chr>   
 8 R5-D4             97    32 NA         white, red red             NA   none  mascu… Tatooine  Droid   <chr> <chr>   
 9 Biggs Darkli…    183    84 black      light      brown           24   male  mascu… Tatooine  Human   <chr> <chr>   
10 Obi-Wan Keno…    182    77 auburn, w… fair       blue-gray       57   male  mascu… Stewjon   Human   <chr> <chr>   
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

r/R/schema.R Outdated Show resolved Hide resolved
@amoeba
Copy link
Member

amoeba commented Mar 8, 2024

Thanks @thisisnic. I see this as a nice improvement and don't see any issues with it. I left a few notes.

Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I have a few minor comments, but nothing major.

I'm not at all opposed to merging this since it's an improvement, but I think #32110 will be the big win here for most end users.

r/R/record-batch-reader.R Outdated Show resolved Hide resolved
r/tests/testthat/test-schema.R Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Mar 9, 2024
@github-actions github-actions bot added Component: Documentation awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Mar 10, 2024
@thisisnic
Copy link
Member Author

Thanks @jonkeane, I agree we should take a look at #32110 soon, but merge this as an interim thing in the meantime.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 10, 2024
@thisisnic thisisnic merged commit ac1708c into apache:main Mar 13, 2024
10 checks passed
@thisisnic thisisnic removed the awaiting change review Awaiting change review label Mar 13, 2024
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit ac1708c.

There was 1 benchmark result with an error:

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 19 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R] Simplify dataset and table print output
4 participants