[R] Simplify dataset and table print output #38916

Closed
thisisnic opened this issue Nov 28, 2023 · 0 comments · Fixed by #38917

Describe the enhancement requested

When we print a dataset, we get a short description of the dataset and then the full schema with one column on each line. This looks fine for datasets with few columns, but can grow unwieldy and messy. An example from a recent dataset I've been working with:

> pums_person
FileSystemDataset with 832 Parquet files
SPORDER: int32
RT: dictionary<values=string, indices=int32, ordered>
SERIALNO: string
PUMA: string
ST: string
ADJUST: int32
PWGTP: int32
AGEP: int32
CIT: dictionary<values=string, indices=int32, ordered>
COW: dictionary<values=string, indices=int32, ordered>
DDRS: dictionary<values=string, indices=int32, ordered>
DEYE: dictionary<values=string, indices=int32, ordered>
DOUT: dictionary<values=string, indices=int32, ordered>
DPHY: dictionary<values=string, indices=int32, ordered>
DREM: dictionary<values=string, indices=int32, ordered>
DWRK: dictionary<values=string, indices=int32, ordered>
ENG: dictionary<values=string, indices=int32, ordered>
FER: dictionary<values=string, indices=int32, ordered>
GCL: dictionary<values=string, indices=int32, ordered>
GCM: dictionary<values=string, indices=int32, ordered>
GCR: dictionary<values=string, indices=int32, ordered>
INTP: int32
JWMNP: int32
JWRIP: dictionary<values=string, indices=int32, ordered>
JWTR: dictionary<values=string, indices=int32, ordered>
LANX: dictionary<values=string, indices=int32, ordered>
MAR: dictionary<values=string, indices=int32, ordered>
MIG: dictionary<values=string, indices=int32, ordered>
MIL: dictionary<values=string, indices=int32, ordered>
MILY: dictionary<values=string, indices=int32, ordered>
MLPA: dictionary<values=string, indices=int32, ordered>
MLPB: dictionary<values=string, indices=int32, ordered>
MLPC: dictionary<values=string, indices=int32, ordered>
MLPD: dictionary<values=string, indices=int32, ordered>
MLPE: dictionary<values=string, indices=int32, ordered>
MLPF: dictionary<values=string, indices=int32, ordered>
MLPG: dictionary<values=string, indices=int32, ordered>
MLPH: dictionary<values=string, indices=int32, ordered>
MLPI: dictionary<values=string, indices=int32, ordered>
MLPJ: dictionary<values=string, indices=int32, ordered>
MLPK: dictionary<values=string, indices=int32, ordered>
NWAB: dictionary<values=string, indices=int32, ordered>
NWAV: dictionary<values=string, indices=int32, ordered>
NWLA: dictionary<values=string, indices=int32, ordered>
NWLK: dictionary<values=string, indices=int32, ordered>
NWRE: dictionary<values=string, indices=int32, ordered>
OIP: int32
PAP: int32
REL: string
RETP: int32
SCH: dictionary<values=string, indices=int32, ordered>
SCHG: string
SCHL: string
SEMP: int32
SEX: dictionary<values=string, indices=int32, ordered>
SSIP: int32
SSP: int32
WAGP: int32
WKHP: int32
WKL: dictionary<values=string, indices=int32, ordered>
WKW: dictionary<values=string, indices=int32, ordered>
YOEP: string
UWRK: dictionary<values=string, indices=int32, ordered>
ANC: dictionary<values=string, indices=int32, ordered>
ANC1P: string
ANC2P: string
DECADE: dictionary<values=string, indices=int32, ordered>
DRIVESP: dictionary<values=string, indices=int32, ordered>
DS: dictionary<values=string, indices=int32, ordered>
ESP: dictionary<values=string, indices=int32, ordered>
ESR: dictionary<values=string, indices=int32, ordered>
HISP: string
INDP: string
JWAP: string
JWDP: string
LANP: string
MIGPUMA: string
MIGSP: string
MSP: dictionary<values=string, indices=int32, ordered>
NAICSP: string
NATIVITY: dictionary<values=string, indices=int32, ordered>
OC: dictionary<values=string, indices=int32, ordered>
OCCP: string
PAOC: dictionary<values=string, indices=int32, ordered>
PERNP: int32
PINCP: int32
POBP: string
POVPIP: int32
POWPUMA: string
POWSP: string
QTRBIR: dictionary<values=string, indices=int32, ordered>
RAC1P: dictionary<values=string, indices=int32, ordered>
RAC2P: string
RAC3P: string
RACAIAN: dictionary<values=string, indices=int32, ordered>
RACASN: dictionary<values=string, indices=int32, ordered>
RACBLK: dictionary<values=string, indices=int32, ordered>
RACNHPI: dictionary<values=string, indices=int32, ordered>
RACNUM: int32
RACSOR: dictionary<values=string, indices=int32, ordered>
RACWHT: dictionary<values=string, indices=int32, ordered>
RC: dictionary<values=string, indices=int32, ordered>
SFN: dictionary<values=string, indices=int32, ordered>
SFR: dictionary<values=string, indices=int32, ordered>
SOCP: string
VPS: string
WAOB: dictionary<values=string, indices=int32, ordered>
FAGEP: dictionary<values=string, indices=int32, ordered>
FANCP: dictionary<values=string, indices=int32, ordered>
FCITP: dictionary<values=string, indices=int32, ordered>
FCOWP: dictionary<values=string, indices=int32, ordered>
FDDRSP: dictionary<values=string, indices=int32, ordered>
FDEYEP: dictionary<values=string, indices=int32, ordered>
FDOUTP: dictionary<values=string, indices=int32, ordered>
FDPHYP: dictionary<values=string, indices=int32, ordered>
FDREMP: dictionary<values=string, indices=int32, ordered>
FDWRKP: dictionary<values=string, indices=int32, ordered>
FENGP: dictionary<values=string, indices=int32, ordered>
FESRP: dictionary<values=string, indices=int32, ordered>
FFERP: dictionary<values=string, indices=int32, ordered>
FGCLP: dictionary<values=string, indices=int32, ordered>
FGCMP: dictionary<values=string, indices=int32, ordered>
FGCRP: dictionary<values=string, indices=int32, ordered>
FHISP: dictionary<values=string, indices=int32, ordered>
FINDP: dictionary<values=string, indices=int32, ordered>
FINTP: dictionary<values=string, indices=int32, ordered>
FJWDP: dictionary<values=string, indices=int32, ordered>
FJWMNP: dictionary<values=string, indices=int32, ordered>
FJWRIP: dictionary<values=string, indices=int32, ordered>
FJWTRP: dictionary<values=string, indices=int32, ordered>
FLANP: dictionary<values=string, indices=int32, ordered>
FLANXP: dictionary<values=string, indices=int32, ordered>
FMARP: dictionary<values=string, indices=int32, ordered>
FMIGP: dictionary<values=string, indices=int32, ordered>
FMIGSP: dictionary<values=string, indices=int32, ordered>
FMILPP: dictionary<values=string, indices=int32, ordered>
FMILSP: dictionary<values=string, indices=int32, ordered>
FMILYP: dictionary<values=string, indices=int32, ordered>
FOCCP: dictionary<values=string, indices=int32, ordered>
FOIP: dictionary<values=string, indices=int32, ordered>
FPAP: dictionary<values=string, indices=int32, ordered>
FPOBP: dictionary<values=string, indices=int32, ordered>
FPOWSP: dictionary<values=string, indices=int32, ordered>
FRACP: dictionary<values=string, indices=int32, ordered>
FRELP: dictionary<values=string, indices=int32, ordered>
FRETP: dictionary<values=string, indices=int32, ordered>
FSCHGP: dictionary<values=string, indices=int32, ordered>
FSCHLP: dictionary<values=string, indices=int32, ordered>
FSCHP: dictionary<values=string, indices=int32, ordered>
FSEMP: dictionary<values=string, indices=int32, ordered>
FSEXP: dictionary<values=string, indices=int32, ordered>
FSSIP: dictionary<values=string, indices=int32, ordered>
FSSP: dictionary<values=string, indices=int32, ordered>
FWAGP: dictionary<values=string, indices=int32, ordered>
FWKHP: dictionary<values=string, indices=int32, ordered>
FWKLP: dictionary<values=string, indices=int32, ordered>
FWKWP: dictionary<values=string, indices=int32, ordered>
FYOEP: dictionary<values=string, indices=int32, ordered>
PWGTP1: int32
PWGTP2: int32
PWGTP3: int32
PWGTP4: int32
PWGTP5: int32
PWGTP6: int32
PWGTP7: int32
PWGTP8: int32
PWGTP9: int32
PWGTP10: int32
PWGTP11: int32
PWGTP12: int32
PWGTP13: int32
PWGTP14: int32
PWGTP15: int32
PWGTP16: int32
PWGTP17: int32
PWGTP18: int32
PWGTP19: int32
PWGTP20: int32
PWGTP21: int32
PWGTP22: int32
PWGTP23: int32
PWGTP24: int32
PWGTP25: int32
PWGTP26: int32
PWGTP27: int32
PWGTP28: int32
PWGTP29: int32
PWGTP30: int32
PWGTP31: int32
PWGTP32: int32
PWGTP33: int32
PWGTP34: int32
PWGTP35: int32
PWGTP36: int32
PWGTP37: int32
PWGTP38: int32
PWGTP39: int32
PWGTP40: int32
PWGTP41: int32
PWGTP42: int32
PWGTP43: int32
PWGTP44: int32
PWGTP45: int32
PWGTP46: int32
PWGTP47: int32
PWGTP48: int32
PWGTP49: int32
PWGTP50: int32
PWGTP51: int32
PWGTP52: int32
PWGTP53: int32
PWGTP54: int32
PWGTP55: int32
PWGTP56: int32
PWGTP57: int32
PWGTP58: int32
PWGTP59: int32
PWGTP60: int32
PWGTP61: int32
PWGTP62: int32
PWGTP63: int32
PWGTP64: int32
PWGTP65: int32
PWGTP66: int32
PWGTP67: int32
PWGTP68: int32
PWGTP69: int32
PWGTP70: int32
PWGTP71: int32
PWGTP72: int32
PWGTP73: int32
PWGTP74: int32
PWGTP75: int32
PWGTP76: int32
PWGTP77: int32
PWGTP78: int32
PWGTP79: int32
PWGTP80: int32
NOP: dictionary<values=string, indices=int32, ordered>
ADJINC: double
CITWP: string
DEAR: dictionary<values=string, indices=int32, ordered>
DRAT: dictionary<values=string, indices=int32, ordered>
DRATX: dictionary<values=string, indices=int32, ordered>
HINS1: dictionary<values=string, indices=int32, ordered>
HINS2: dictionary<values=string, indices=int32, ordered>
HINS3: dictionary<values=string, indices=int32, ordered>
HINS4: dictionary<values=string, indices=int32, ordered>
HINS5: dictionary<values=string, indices=int32, ordered>
HINS6: dictionary<values=string, indices=int32, ordered>
HINS7: dictionary<values=string, indices=int32, ordered>
MARHD: dictionary<values=string, indices=int32, ordered>
MARHM: dictionary<values=string, indices=int32, ordered>
MARHT: dictionary<values=string, indices=int32, ordered>
MARHW: dictionary<values=string, indices=int32, ordered>
MARHYP: string
DIS: dictionary<values=string, indices=int32, ordered>
HICOV: dictionary<values=string, indices=int32, ordered>
PRIVCOV: dictionary<values=string, indices=int32, ordered>
PUBCOV: dictionary<values=string, indices=int32, ordered>
FCITWP: dictionary<values=string, indices=int32, ordered>
FDEARP: dictionary<values=string, indices=int32, ordered>
FDRATP: dictionary<values=string, indices=int32, ordered>
FDRATXP: dictionary<values=string, indices=int32, ordered>
FHINS1P: dictionary<values=string, indices=int32, ordered>
FHINS2P: dictionary<values=string, indices=int32, ordered>
FHINS3P: dictionary<values=string, indices=int32, ordered>
FHINS4P: dictionary<values=string, indices=int32, ordered>
FHINS5P: dictionary<values=string, indices=int32, ordered>
FHINS6P: dictionary<values=string, indices=int32, ordered>
FHINS7P: dictionary<values=string, indices=int32, ordered>
FMARHDP: dictionary<values=string, indices=int32, ordered>
FMARHMP: dictionary<values=string, indices=int32, ordered>
FMARHTP: dictionary<values=string, indices=int32, ordered>
FMARHWP: dictionary<values=string, indices=int32, ordered>
FMARHYP: dictionary<values=string, indices=int32, ordered>
WRK: dictionary<values=string, indices=int32, ordered>
FOD1P: string
FOD2P: string
SCIENGP: dictionary<values=string, indices=int32, ordered>
SCIENGRLP: dictionary<values=string, indices=int32, ordered>
FFODP: dictionary<values=string, indices=int32, ordered>
FHINS3C: dictionary<values=string, indices=int32, ordered>
FHINS4C: dictionary<values=string, indices=int32, ordered>
FHINS5C: dictionary<values=string, indices=int32, ordered>
RELP: string
FWRKP: dictionary<values=string, indices=int32, ordered>
FDISP: dictionary<values=string, indices=int32, ordered>
FPERNP: dictionary<values=string, indices=int32, ordered>
FPINCP: dictionary<values=string, indices=int32, ordered>
FPRIVCOVP: dictionary<values=string, indices=int32, ordered>
FPUBCOVP: dictionary<values=string, indices=int32, ordered>
RACNH: dictionary<values=string, indices=int32, ordered>
RACPI: dictionary<values=string, indices=int32, ordered>
SSPA: dictionary<values=string, indices=int32, ordered>
MLPCD: dictionary<values=string, indices=int32, ordered>
MLPFG: dictionary<values=string, indices=int32, ordered>
FHICOVP: dictionary<values=string, indices=int32, ordered>
DIVISION: dictionary<values=string, indices=int32, ordered>
REGION: dictionary<values=string, indices=int32, ordered>
HIMRKS: dictionary<values=string, indices=int32, ordered>
JWTRNS: dictionary<values=string, indices=int32, ordered>
RELSHIPP: string
WKWN: int32
FHIMRKSP: dictionary<values=string, indices=int32, ordered>
FJWTRNSP: dictionary<values=string, indices=int32, ordered>
FRELSHIPP: dictionary<values=string, indices=int32, ordered>
FWKWNP: dictionary<values=string, indices=int32, ordered>
MLPIK: dictionary<values=string, indices=int32, ordered>
year: int32
location: string

We could do something like the tibble preview, with instructions to call schema() to view the full schema. The tibble preview for the dataset, for contrast:

> head(pums_person, 0) %>% collect()
# A tibble: 0 × 311
# ℹ 311 variables: SPORDER <int>, RT <ord>, SERIALNO <chr>, PUMA <chr>, ST <chr>, ADJUST <int>, PWGTP <int>, AGEP <int>,
#   CIT <ord>, COW <ord>, DDRS <ord>, DEYE <ord>, DOUT <ord>, DPHY <ord>, DREM <ord>, DWRK <ord>, ENG <ord>, FER <ord>,
#   GCL <ord>, GCM <ord>, GCR <ord>, INTP <int>, JWMNP <int>, JWRIP <ord>, JWTR <ord>, LANX <ord>, MAR <ord>, MIG <ord>,
#   MIL <ord>, MILY <ord>, MLPA <ord>, MLPB <ord>, MLPC <ord>, MLPD <ord>, MLPE <ord>, MLPF <ord>, MLPG <ord>, MLPH <ord>,
#   MLPI <ord>, MLPJ <ord>, MLPK <ord>, NWAB <ord>, NWAV <ord>, NWLA <ord>, NWLK <ord>, NWRE <ord>, OIP <int>, PAP <int>,
#   REL <chr>, RETP <int>, SCH <ord>, SCHG <chr>, SCHL <chr>, SEMP <int>, SEX <ord>, SSIP <int>, SSP <int>, WAGP <int>,
#   WKHP <int>, WKL <ord>, WKW <ord>, YOEP <chr>, UWRK <ord>, ANC <ord>, ANC1P <chr>, ANC2P <chr>, DECADE <ord>, …
# ℹ Use `colnames()` to see all variable names
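
A minimal sketch of that truncation idea in base R, under stated assumptions. This is not the arrow implementation: the helper name `format_truncated_schema` and the `max_fields` cutoff are hypothetical, and a named character vector stands in for the schema so the example runs without arrow.

```r
# Hypothetical helper: format at most `max_fields` schema lines, then a hint.
# `fields` is a named character vector mapping column name -> type string
# (an illustrative stand-in for a real Arrow schema).
format_truncated_schema <- function(fields, max_fields = 20) {
  n <- length(fields)
  shown <- utils::head(fields, max_fields)
  lines <- c(
    sprintf("%d columns", n),
    sprintf("%s: %s", names(shown), unname(shown))
  )
  if (n > max_fields) {
    # Signal the cut-off and tell the reader how to get the full listing.
    lines <- c(lines, "...", "Use `schema()` to see entire schema")
  }
  lines
}

# 26 string columns named a..z, mirroring a wide dataset.
fields <- stats::setNames(rep("string", 26), letters)
cat(format_truncated_schema(fields), sep = "\n")
```

Returning the lines rather than printing them directly keeps the cutoff logic easy to test; the cutoff of 20 is only an illustrative default.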

Component(s)

R

thisisnic changed the title from "[R] Simplify dataset print output" to "[R] Simplify dataset and table print output" on Nov 28, 2023
thisisnic added a commit that referenced this issue Mar 13, 2024
### Rationale for this change

When printing objects with many columns, the output is long and unwieldy.

### What changes are included in this PR?

* Truncates long schema print output.
* Adds the number of columns to dataset print output so it's clear how many there are in total.

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes

Before:

``` r
library(arrow)
x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> "u": string
#> "v": string
#> "w": string
#> "x": string
#> "y": string
#> "z": string
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>
```

After:

``` r
library(arrow)

x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> 26 columns 
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> ...
#> Use `schema()` to see entire schema
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema
```

* Closes: #38916

Lead-authored-by: Nic Crane <thisisnic@gmail.com>
Co-authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
thisisnic added this to the 16.0.0 milestone Mar 13, 2024