Various enhancements to print.data.table #1523

Open
MichaelChirico opened this Issue Feb 6, 2016 · 48 comments

7 participants
@MichaelChirico
Contributor

MichaelChirico commented Feb 6, 2016

Current task list:

  • 1. Add .Rd file for print.data.table
  • 2. Ability to turn off row numbers [1) from #645/R-F#1957 - Yike Lu; handled in this commit, Nov. 12, 2013]
  • 3. Ability to turn off smart table wrapping [2) from #645/R-F#1957 - Yike Lu]
  • 4. Ability to force-print all entries [3) from #645/R-F#1957 - Yike Lu; handled in this commit, Sep. 14, 2012]
  • 5. Ability to demarcate by-groupings [4) from #645/R-F#1957 - Yike Lu]
  • 6. Demarcation of table border [part of 5) from #645/R-F#1957 - Yike Lu]
  • 7. Demarcation of key columns [part of 5) from #645/R-F#1957 - Yike Lu]
  • 8. Fungible option for whether row numbers are printed [#1097 - @smcinerney]
  • 9. Options for whether/which registers of column names to print [#1482 - Oleg Bondar on SO]
  • 10. Option for dplyr-like printing [see below - @MichaelChirico]
  • 11. Facilities for compact glance at data a la dplyr tbl_df [#1497 - @nverno; #2608 - @vlulla]
  • 12. Option for specifying a truncation character [#1374 - @jangorecki]
  • 13. Handling of empty-named data.table [#545/R-F#5253 - @arunsrinivasan]
  • 14. Improve printing of list/non-atomic columns [see below - @franknarf1 via SO; also #605; handled in #2562]
  • 15. POSIXct columns with timezones should include that information in printed output [#2842 - @MichaelChirico]

Some Notes

3 (tabled pending clarification)

As I understand it, this issue is a request to prevent the console output from wrapping around (i.e., to force all columns to appear parallel, regardless of how wide the table is).

If that's the case, this is (AFAICT) impossible, since that's something done by RStudio/R itself. I for one certainly don't know of any way to alter this behavior.

If someone does know of a way to affect this, or if they think I'm mis-interpreting, please pipe up and we can have this taken care of.

7

As I see it there are two options here. One is to treat all key columns the same; the other is to treat secondary, tertiary, etc. keys separately.

Example output:

library(data.table)
set.seed(01394)
DT <- data.table(key1 = rep(c("A","B"), each = 4),
                 key2 = rep(c("a","b"), 4),
                 V1 = rnorm(8), key = c("key1","key2"))

# Only demarcate key columns
DT
#    | key1 | | key2 |         V1
#1: |    A | |    a |  0.5994579
#2: |    A | |    a | -1.0898775
#3: |    A | |    b | -0.2285326
#4: |    A | |    b | -1.7858472
#5: |    B | |    a | -0.6269875
#6: |    B | |    a | -0.6633084
#7: |    B | |    b |  1.0367084
#8: |    B | |    b |  0.7364276

# Separately "emboss" keys based on key order
DT
#    | key1 | || key2 ||         V1
#1: |    A | ||    a ||  0.5994579
#2: |    A | ||    a || -1.0898775
#3: |    A | ||    b || -0.2285326
#4: |    A | ||    b || -1.7858472
#5: |    B | ||    a || -0.6269875
#6: |    B | ||    a || -0.6633084
#7: |    B | ||    b ||  1.0367084
#8: |    B | ||    b ||  0.7364276

And of course, add an option for deciding whether to demarcate with | or some other user's-choice character (*, +, etc.)

9 [DONE]

Some feedback from a closed PR that was a first stab at solving this:

From Arun regarding preferred options:

col.names = c("auto", "top", "none")

"auto": current behaviour

"top": only on top, data.frame-like

"none": no column names -- exclude rows in which column names would have been printed.
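A minimal base R sketch of how a three-way col.names argument like the one above could be resolved with match.arg(). The helper name format_rows and the 20-row threshold for repeating the header under "auto" are assumptions made for illustration; the sketch works on a narrow data.frame and is not the code inside print.data.table:

```r
# Hypothetical helper (not data.table code): returns the lines that would be printed.
format_rows <- function(df, col.names = c("auto", "top", "none")) {
  col.names <- match.arg(col.names)       # validates the choice; default is "auto"
  printed <- capture.output(print(df))    # line 1 is the header for a narrow table
  header <- printed[1L]
  rows <- printed[-1L]
  switch(col.names,
    none = rows,                          # drop the header line entirely
    top  = c(header, rows),               # header once, data.frame-like
    # assumed rule: repeat the header at the bottom of long tables
    auto = if (length(rows) > 20L) c(header, rows, header) else c(header, rows)
  )
}

small <- data.frame(a = 1:3, b = letters[1:3])
long  <- data.frame(a = 1:25)
cat(format_rows(small, "none"), sep = "\n")
```

Using match.arg() means a misspelled value fails early with a clear error instead of silently falling back to the default.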

10 [DONE]

It would be nice to have an option to print a row under the row of column names which gives each column's stored type, as is currently (I understand) the default for the output of dplyr operations.

Example from dplyr:

library(dplyr)
DF <- data.frame(n = numeric(1), c1 = complex(1), i = integer(1),
                 f = factor(1), D = as.Date("2016-02-06"), c2 = character(1),
                 stringsAsFactors = FALSE)
tbl_df(DF)
# Source: local data frame [1 x 6]
#
#       n     c1     i      f          D    c2
#   (dbl) (cmpl) (int) (fctr)     (date) (chr) # <- this row
#1     0   0+0i     0      1 2016-02-06      

Current best alternative is to do sapply(DF, class), but it's nice to have a preview of the data with this extra information.
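As a rough illustration of the extra row, the sketch below maps each column's first class to a short tag in base R. The function name class_abbrev and the abbreviation table are assumptions for this example, not an existing data.table or dplyr API:

```r
# Hypothetical helper: one short class tag per column.
class_abbrev <- function(x) {
  # assumed abbreviation table; classes not listed are shown in full
  lookup <- c(numeric = "num", integer = "int", character = "char",
              factor = "fctr", complex = "cplx", Date = "date", logical = "lgcl")
  cls <- vapply(x, function(col) class(col)[1L], character(1))
  abb <- unname(lookup[cls])
  abb[is.na(abb)] <- cls[is.na(abb)]      # fall back to the full class name
  setNames(paste0("<", abb, ">"), names(x))
}

DF <- data.frame(n = numeric(1), i = integer(1), c2 = character(1),
                 D = as.Date("2016-02-06"), stringsAsFactors = FALSE)
class_abbrev(DF)
```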

11

This seems closely related to 3. Current plan is to implement this as an alternative to 3 since it seems more tangible/doable.

Via @nverno:

Would it be useful for head.data.table to have an option to print only the head of columns that fit the screen width, and summarise the rest? I was imagining something like the printed output from the head of a tbl_df in dplyr. I think it is nice for tables with many columns.

and the guiding example from Arun:

require(data.table)
dt = setDT(lapply(1:100, function(x) 1:3))
dt
dplyr::tbl_dt(dt)
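One way to picture the "print what fits, summarise the rest" idea is the base R sketch below. The helper fit_columns is hypothetical, and its width estimate (widest cell or name plus one separator, plus room for row labels) is a simplifying assumption rather than how any print method actually measures columns:

```r
# Hypothetical helper: split column names into those that fit and those that do not.
fit_columns <- function(x, width = getOption("width")) {
  # space each column needs: its widest cell or its name, plus one separator
  col_width <- vapply(names(x), function(nm) {
    max(nchar(nm), nchar(format(x[[nm]]))) + 1L
  }, integer(1))
  gutter <- max(nchar(rownames(x)))             # room for the row labels
  fits <- cumsum(col_width) + gutter <= width   # TRUE for a leading run of columns
  list(shown = names(x)[fits], hidden = names(x)[!fits])
}

wide <- as.data.frame(setNames(lapply(1:100, function(i) 1:3), paste0("V", 1:100)))
cols <- fit_columns(wide, width = 40)
print(wide[, cols$shown, drop = FALSE])
cat("...and", length(cols$hidden), "more columns\n")
```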

12

Currently covered by @jangorecki's PR #1448; Jan, assuming #1529 is merged first, could you edit the print.data.table man page for your PR?

@MichaelChirico MichaelChirico changed the title from FR: option for dplyr-like data.table printing to Various enhancements to print.data.table Feb 8, 2016

Member

arunsrinivasan commented Feb 8, 2016

Just brilliant!

Member

arunsrinivasan commented Feb 8, 2016

No idea about 3 and 5 (as to what they mean).
I think a PR for 6 would be nice (seems straightforward from what Jan wrote there). Perhaps ?print.data.table is the time consuming part? Do you think you'd be up for this, @MichaelChirico ?
No idea as to what 7 means either..
8 is another great idea. PR would be great!

Member

arunsrinivasan commented Feb 8, 2016

It'd be really nice if GitHub would allow assigning tasks to people who aren't necessarily members of the project :-(.

Member

arunsrinivasan commented Feb 8, 2016

There's also #1497

Contributor

MichaelChirico commented Feb 9, 2016

@arunsrinivasan should I try and PR this one issue at a time? Or in one fell swoop? I've got 8 basically taken care of, just need to add tests.

Member

arunsrinivasan commented Feb 9, 2016

Michael, separate PRs.

nverno commented Feb 10, 2016

Very nice! Sorry to get back to you late on this, but Arun provided a nice example. It is just a nice convenience when interactively looking at tables with lots of columns, so your console isn't engulfed by a huge data dump when you take a look at the head. I'll close that other one.

arunsrinivasan added a commit that referenced this issue Mar 4, 2016

Merge pull request #1529 from MichaelChirico/print.data.table
#1523 progress: adds option for dplyr-inspired column class summary with printing

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 4, 2016

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 4, 2016

Closes #1097 (progress towards #1523), creates option for printing row names (TRUE by default)

adding notes about default

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 5, 2016

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 5, 2016

Closes #1097 (progress towards #1523), creates option for printing rownames

adding test

whoops set wrong default

fixing test

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 6, 2016

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 6, 2016

arunsrinivasan added a commit that referenced this issue Mar 6, 2016

Merge pull request #1570 from MichaelChirico/print_rownames
Closes #1097 (progress towards #1523), creates option for printing row names

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 6, 2016

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 6, 2016

Member

arunsrinivasan commented Mar 9, 2016

It'd be also nice to print:

primary key:
secondary indices: , etc..
<data.table>

by default. It's definitely informative to know what the keys and secondary indices are..

Member

arunsrinivasan commented Mar 9, 2016

Also, I think this is better output for:

print(DT, class=TRUE)
   <char> <int> <num>
     site  date     x
1:      A     1    10
2:      A     2    20
3:      A     3    30
4:      B     1    10
5:      B     2    20
6:      B     3    30

It's easier to copy/paste the data.table without the classes in the way. If we can do that, we can turn on printing classes by default.

Thoughts?

Contributor

MichaelChirico commented Mar 9, 2016

@arunsrinivasan about printing keys:

  • Isn't that the point of tables()? (though TBH I almost never use this function) BTW tables, to the extent that it's useful, could go for an update to add a secondary_indices column...
  • You don't consider this subsumed by point # 7 here? See this chat (interrupted in the middle) b/w Frank and I about possibilities for filling # 7. Or perhaps you'd like to replace point # 7 with your idea. What do you think?

About class:

This can be done, but will require a step of wrangling -- basically toprint <- rbind(rownames(toprint), toprint); rownames(toprint) <- abbs. Which is fine, I'm just curious why you're thinking of easier copy-pasting as a clear advantage? Not sure the cost of including class info in copy-pasted output. Happy to hear feedback.
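The wrangling step can be sketched in base R as follows. This assumes toprint is a character matrix with one column per table column, and it uses colnames() where the comment says rownames(), on the assumption that the column headers are what is meant. The contents of toprint and abbs are made up for the example:

```r
# Assumed stand-in for the formatted character matrix built before printing.
toprint <- cbind(site = c("A", "B"), date = c("1", "2"), x = c("10", "20"))
abbs <- c("<char>", "<int>", "<num>")   # hypothetical class abbreviations

# Push the column names down into the first row, then use the class
# abbreviations as the new header so they print above the names.
with_classes <- rbind(colnames(toprint), toprint)
colnames(with_classes) <- abbs

print(with_classes, quote = FALSE)
```

With the names kept directly above the values, the header row of abbreviations is the only line that needs to be skipped when copying the table.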

Member

arunsrinivasan commented Mar 9, 2016

About class: -- copy pasting from SO, for example to provide input to fread(). I also find it easier without the separation between column name and value (just used to seeing it).

On printing keys:

  • Yes, but it gives it for all tables, which is useful in itself. But if I'd like to see just the keys retained after a join operation, I don't necessarily want to have a look at all the tables' key.
  • I don't think point 7 (drawing lines) would work well.. since it can not (AFAICT) tell the order of key columns.. But stating:

primary key: <a, b>

clearly tells the first key column is "a", then "b"..

Does this clarify things a bit?

Member

arunsrinivasan commented Mar 9, 2016

I agree tables() could use an update.

Contributor

MichaelChirico commented Mar 9, 2016

@arunsrinivasan OK, I think I can get on board with that. Can ditch point # 7 then. I agree distinguishing key order at a glance was going to be tough. So how about:

  • If a table has a key, say c("key1", "key2"), print the following above the output of print.data.table:

    keys: key1, key2
    
  • If there is no key, print:

    keys: <unkeyed>
    
  • Secondary index printing is optional, but if activated will come below keys a la:

    Secondary indices: key2.1, key2.2, ...
                       key3.1, key3.2, ...
    

Lastly, I propose sending this output through message to help distinguish it from the data.table itself visually.

Member

arunsrinivasan commented Mar 9, 2016

My suggestion would be this:

  1. If either of these attributes is not present, don't print it. I think people will quickly learn that no keys are set (if none are displayed).
  2. Since there can be more than 1 secondary index, I suggest the format be:

Keys: <col1, col2> (only one)
Secondary Indices: , , <col1, col2>, ...
If there are more than 'x' (=5 to begin with?) indices, use a "...". They can always access it using key2().

I don't mind "<>" being replaced with "" if that'd be more aesthetically pleasing.. e.g., "col1,col2", "col1" etc..

Last proposal: seems nice, but I wonder if it might create issues with knitr when people suppress 'messages' in a chunk.. and print the output?
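A small base R sketch of the format suggested above: omit whatever is absent, wrap each column set in "<>", and cap the number of indices shown. The function key_header and its arguments are hypothetical. It takes the key and indices as plain character vectors instead of reading them from a data.table:

```r
# Hypothetical helper: build the header lines; returns character(0) if nothing is set.
key_header <- function(key = NULL, indices = list(), max_indices = 5L) {
  fmt <- function(cols) paste0("<", paste(cols, collapse = ", "), ">")
  out <- character(0)
  if (length(key)) out <- c(out, paste("Keys:", fmt(key)))
  if (length(indices)) {
    shown <- vapply(head(indices, max_indices), fmt, character(1))
    if (length(indices) > max_indices) shown <- c(shown, "...")   # truncate long lists
    out <- c(out, paste("Secondary Indices:", paste(shown, collapse = ", ")))
  }
  out
}

cat(key_header(c("col1", "col2"), list("col1", c("col1", "col2"))), sep = "\n")
```

Writing these lines with cat() rather than message() would keep them in the regular output stream, which sidesteps the knitr concern about suppressed messages at the cost of the visual separation.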

Member

arunsrinivasan commented Mar 9, 2016

It'd be great to have this and class=TRUE default for v1.9.8 already.. we'll see.

Member

arunsrinivasan commented Mar 9, 2016

One other thought:

Many people use "numeric" type when an integer type would suffice, and when "integer64" would fit the bill better. How about marking those columns somehow while printing?

instead of <num>, perhaps >num< ?? that'll allow people to be aware of such optimisations as well..

Member

arunsrinivasan commented Mar 9, 2016

OR "!num!"? There's a function isReallyReal that checks this. But this'll perhaps be too time consuming to run on all rows every time..

Contributor

MichaelChirico commented Mar 9, 2016

@arunsrinivasan Hmm I think it's definitely not something to be used as a part of print.data.table default.

Some initial musings:

  • Could add an option to do so, and a companion function (check_num_cols or the like) which runs this on an input table and spits out the candidate columns.
  • Could do this the first time only -- have some sort of global variable associated with each data.table in memory which we use to trigger the evaluation
  • Could have this as part of the standard (or verbose) output of fread (since I imagine that's where most data.tables are created in general; I guess setDT is the other big source).

Are you thinking of pushing 1.9.8 soon?

Oh, one more thing, what do you think about porting print.data.table to its own .R file?
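The companion-function idea from the first bullet might look roughly like this in base R. The name check_num_cols comes from the comment, but the body is an illustrative assumption: it is not data.table's internal isReallyReal, and it scans every value, which is exactly the cost concern raised above:

```r
# Sketch: names of double columns whose non-missing values are all whole
# numbers that fit in R's integer range (candidates for integer storage).
check_num_cols <- function(x) {
  is_candidate <- vapply(x, function(col) {
    # is.numeric() is FALSE for Date/POSIXct/factor; is.double() excludes integers
    if (!is.numeric(col) || !is.double(col)) return(FALSE)
    vals <- col[!is.na(col)]
    all(vals == trunc(vals)) && all(abs(vals) <= .Machine$integer.max)
  }, logical(1))
  names(x)[is_candidate]
}

df <- data.frame(a = c(1, 2, NA), b = c(1.5, 2, 3), c = 1:3,
                 d = c("x", "y", "z"), stringsAsFactors = FALSE)
check_num_cols(df)
```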

Member

arunsrinivasan commented Mar 9, 2016

Hm, yes, let's forget the marking of columns for now.

On pushing 1.9.8: trying as much as possible to wrap up the other marked issues as quickly as possible. I'd like to work on non-equi joins for this release.

On print.data.table to separate file, sure, sounds good.

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 10, 2016

Contributor

MichaelChirico commented Mar 10, 2016

@arunsrinivasan just a heads up that setting class = TRUE as the default is causing 100s of errors in the tests

Member

arunsrinivasan commented Mar 10, 2016

Okay thanks, will take a look.

Contributor

MichaelChirico commented Mar 10, 2016

@arunsrinivasan nvm, on second glance, it's a lot, but manageable. Have to fix ~ 25 tests. Working now...

Member

arunsrinivasan commented Mar 10, 2016

@MichaelChirico can we make the 'keys' argument FALSE for this release? Perhaps we can turn it on in the next one seeing how this one goes.

Contributor

MichaelChirico commented Mar 10, 2016

@arunsrinivasan sure. Will handle this after we iron out the update to class.

I agree with Frank that having it by default may be somewhat information overload... perhaps there's a middle ground (only print class if there's been a change in class for some column, e.g.).

Anyway happy to give setting class = TRUE as default a whirl.

Member

jangorecki commented Mar 11, 2016

Do we have any script that can be run to check packages that depend on data.table? Asking because potentially any package that tests output with Rout - Rout.save (or capture.output - I have 2 such non-CRAN pkgs) could be broken after changing the default print. It would be valuable to run such tests before and after to see the impact precisely. Then it would be best to decide depending on the percentage of affected CRAN packages.

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 13, 2016

updating class argument to print.data.table, #1523
whoops

whoops2

fix

Closes #1442. Added setindex, and warning to set2key. Will deprecate in next release.

More replacements of set2key to setindex.

fixing tests

fixing more tests

fix remaining tests

reverting print.data.table add

quick fix head print

fixing tests

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 13, 2016

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 13, 2016

print.data.table gains print.keys argument, #1523
adding tests, option to onload

Member

arunsrinivasan commented Mar 14, 2016

@jangorecki, good point. class=FALSE then for now. I'll come back to these issues later. Not important for now.

arunsrinivasan added a commit that referenced this issue Mar 14, 2016

Merge branch 'print_null' of https://github.com/MichaelChirico/data.table into MichaelChirico-print_null

* 'print_null' of https://github.com/MichaelChirico/data.table:
  Closes #545, with progress towards #1523 -- print handles blank-named tables properly

# Conflicts:
#	R/data.table.R

arunsrinivasan added a commit that referenced this issue Mar 14, 2016

Merge branch 'MichaelChirico-print_null'
* MichaelChirico-print_null:
  Closes #545, with progress towards #1523 -- print handles blank-named tables properly

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Mar 14, 2016

Member

jangorecki commented Mar 22, 2016

Any plans for minimalistic version of print key with * star prefix? or other nice ascii symbol? something like:

setkey(DT, site, date)
options("datatable.key.note"=TRUE)
print(DT)
#    *site *date     x
#1:      A     1    10
#2:      A     2    20

It would be my preferred one.

Contributor

MichaelChirico commented Mar 23, 2016

@jangorecki I'm fine with any way, but the resistance that cropped up with an approach like that is some people preferred to see key order as well, e.g.:

#    *site **date     x

In any case, if implemented, I would: set * as the default, and leave an option for making it whatever you want.
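A base R sketch of that idea, with a configurable marker and an optional mode that repeats the marker to show key order. The function mark_key_cols and its arguments are hypothetical and operate on plain character vectors rather than on a data.table:

```r
# Hypothetical helper: prefix key columns with a marker.
mark_key_cols <- function(col_names, key, marker = "*", show_order = FALSE) {
  pos <- match(col_names, key)          # position in the key, NA for non-key columns
  is_key <- !is.na(pos)
  # one marker per key column, or as many markers as the column's key position
  reps <- if (show_order) pos[is_key] else rep(1L, sum(is_key))
  col_names[is_key] <- paste0(strrep(marker, reps), col_names[is_key])
  col_names
}

mark_key_cols(c("site", "date", "x"), key = c("site", "date"))
mark_key_cols(c("site", "date", "x"), key = c("site", "date"), show_order = TRUE)
```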

Member

jangorecki commented Mar 23, 2016

@MichaelChirico On one hand multiple stars are OK, but what if you had 20 columns in the key? Maybe a single star only if the order of the key columns is the same as the order of the data columns; for me that would be ~99% of cases.

for up to 3 elements there are superscript number characters:

#    ¹*site ²*date     x

mbacou commented Jul 2, 2016

@MichaelChirico about 3) above, one can use R global options:

width.user <- getOption("width")           # remember the user's current console width
options(width = as.integer(howWideIsDT))   # howWideIsDT is a placeholder for the width DT needs
print(DT)
options(width = width.user)                # reset to user's preferences

Contributor

MichaelChirico commented Jul 13, 2016

@mbacou thanks for the input!

In RStudio, at least, I don't see a difference in output having done that.

franknarf1 commented Jul 13, 2016

@MichaelChirico You should see a difference. Try

library(data.table)
options(width=500)
(DT = data.table(matrix(1:1e3,1)))

RStudio wraps console output and offers no option to disable this "feature"; while base R console overflows with no wrapping until options()$width. Either way you should see a difference. Try resizing your console window to see the wrapping in action.
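To check the effect without relying on how a particular console displays it, the base R sketch below counts the lines print() produces for a wide one-row data.frame at two widths. The helper lines_at_width is hypothetical, and a plain data.frame is used so the example needs no packages:

```r
# Hypothetical helper: number of output lines when printing x at a given width.
lines_at_width <- function(x, width) {
  old <- options(width = width)   # options() returns the previous settings
  on.exit(options(old))           # restore them even if printing fails
  length(capture.output(print(x)))
}

wide <- as.data.frame(matrix(1:60, nrow = 1))
narrow_lines <- lines_at_width(wide, 40)     # columns are split into several blocks
wide_lines   <- lines_at_width(wide, 1000)   # header and row fit on one line each
c(narrow = narrow_lines, wide = wide_lines)
```

If the two counts differ, R itself is wrapping according to options("width"); any further wrapping seen on screen comes from the console.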

MichaelChirico Aug 1, 2016

Contributor

@mbacou not quite convinced of the utility of adding this to print.data.table when ascii::print and knitr::kable already seem to do a fine job...

mbacou Aug 2, 2016

Agreed. I'd vote for minimal output as well, but if you plan to provide more fancy printing options, then using a table format that pandoc can process would make sense.

franknarf1 Mar 22, 2017

A minor thing, but it might be a good idea to export print.data.table. I only noticed it was hidden when typing args(print.data.table) just now.

MichaelChirico Aug 10, 2017

Contributor

@franknarf1 any other reason? we have ?print.data.table now, and args(data.table:::print.data.table) has that covered. was just about to file the export in a PR, but stopped myself. i don't think it's uncommon for print methods to be hidden (see print.lm/print.glm in stats, e.g.)
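
For reference, a small base-R sketch of how a registered but unexported S3 method can still be inspected. It uses print.lm from the stats namespace as the stand-in, since that needs no extra packages; the variable names are illustrative.

```r
# print.lm is registered as an S3 method but is not exported from stats
exported <- "print.lm" %in% getNamespaceExports("stats")

# getS3method() looks the method up in the S3 registry, exported or not
m <- getS3method("print", "lm")

exported           # FALSE
is.function(m)     # TRUE
names(formals(m))  # the method's arguments, without needing ':::'
```

The same lookup with getS3method("print", "data.table") should work once data.table is loaded, which is one more reason exporting the method is not strictly needed.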

MichaelChirico added a commit to MichaelChirico/data.table that referenced this issue Aug 10, 2017

franknarf1 Aug 10, 2017

@MichaelChirico Nope. Not a problem unexported as you say; thanks for asking.

mattdowle added a commit that referenced this issue Aug 10, 2017

part of #1523 -- split print.data.table to its own file (#2291)
Moved print.data.table to its own file with a few closely associated methods, part of #1523

franknarf1 Aug 19, 2017

Another idea: an option dput = TRUE, that will write reproducible code (since dput(DT) doesn't work). Something like

dtput = function(DT){
  d0 = capture.output(dput(setattr(data.table:::shallow(DT), ".internal.selfref", NULL)))
  cat("data.table::alloc.col(", d0, ")\n", sep="\n")
}

# example
library(data.table)
DT = as.data.table(as.list(1:10))
dtput(DT)
# which writes...
data.table::alloc.col(
structure(list(V1 = 1L, V2 = 2L, V3 = 3L, V4 = 4L, V5 = 5L, V6 = 6L, 
    V7 = 7L, V8 = 8L, V9 = 9L, V10 = 10L), .Names = c("V1", "V2", 
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"), row.names = c(NA, 
-1L), class = c("data.table", "data.frame"))
)

... except less hacky and embedded in print.data.table. I guess if dput = TRUE, all the others can be ignored. Getting fancy, maybe allow dput = "file.txt" like dput() does. I figure it makes enough sense to put it in print, and it's not worth it to add a new function.

franknarf1 Dec 6, 2017

Another idea similar to those in #645: turn off smart truncation of list column display (example from SO).

I see this truncation pretty frequently, and in some cases it'd be nice to see printing as if list column v was sapply(v, toString) instead.
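
To make the sapply(v, toString) idea concrete, here is a minimal base-R sketch on a standalone list (the list v and the name shown are made up for illustration). It shows only the string each row would display, not a change to print.data.table itself.

```r
# A list column whose elements differ in length and type
v <- list(1:3, c("a", "b"), 10)

# toString() collapses each element with ", ", giving one string per row
shown <- sapply(v, toString)
shown
# "1, 2, 3" "a, b" "10"
```

Every element is shown in full, so a long element would widen the column accordingly.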

MichaelChirico Jan 10, 2018

Contributor

@franknarf1 i think a very easy fix would be here:

paste(c(format(head(x,6), justify=justify, ...), if(length(x)>6)""),collapse=",")

change "" to "...". What do you think? I like toString, but it should also come with a default width parameter; I'm not sure how to do that robustly.


actually, re-reading toString.default:

function (x, width = NULL, ...) 
{
    string <- paste(x, collapse = ", ")
    if (missing(width) || is.null(width) || width == 0) 
        return(string)
    if (width < 0) 
        stop("'width' must be positive")
    if (nchar(string, type = "w") > width) {
        width <- max(6, width)
        string <- paste0(strtrim(string, width - 4), "....")
    }
    string
}

It seems the default way of handling width is similar to what's currently implemented. I think limiting output based on on-screen width rather than truncating to the first few elements is better, no?

This approach also allows better user interaction since toString is S3-registered -- we (or end users) could write/customize toString.* methods for any use cases that arise. Perhaps add a colWidth parameter to print.data.table which would be dropped into width of toString.default...
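
A small base-R sketch of both points, under stated assumptions: the width argument of toString() caps the printed length, and a class can supply its own method. The "temperature" class and its method are invented purely for illustration, and nothing here is wired into print.data.table.

```r
x <- 1:100

full  <- toString(x)              # no width: every element, comma-separated
short <- toString(x, width = 20)  # width given: the string is cut to fit

nchar(full) > 20    # TRUE, the untruncated string is long
nchar(short) <= 20  # TRUE, the truncated one respects the width

# Because toString() is an S3 generic, a class can supply its own method.
# "temperature" is a made-up class used only for this illustration.
toString.temperature <- function(x, ...) {
  paste0(format(unclass(x), nsmall = 1), " C", collapse = ", ")
}

temps <- structure(c(21.5, 19), class = "temperature")
toString(temps)
# "21.5 C, 19.0 C"
```

A colWidth argument, if added, could simply be forwarded as width here, so the truncation rule would live in one place.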

franknarf1 Jan 10, 2018

@MichaelChirico One point in favor of the trailing "," over a ",..." is that it saves horizontal space. Nonetheless, that seems like a good change, since most users won't guess what the "," means.

Rather than that change, I was more interested in printing a higher number of entries in place of 6 in head(x, 6), like your colWidth idea.

Re methods, I'd find an argument like formatters = list(character = function(x) toString(x), lm = function(x) x$qr$tol) easy to use (to be used for list columns provided every element matches the named class or is NULL). Not sure if that's what you meant.
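
A rough base-R sketch of how such a formatters list might be applied to a single list column. The helper format_list_col and its fallback behaviour are hypothetical; only the rule "every element matches the named class or is NULL" comes from the suggestion above.

```r
# Hypothetical helper: format one list column using a named list of
# per-class formatter functions.
format_list_col <- function(col, formatters) {
  non_null <- !vapply(col, is.null, logical(1))
  for (cls in names(formatters)) {
    # Use a formatter only if every non-NULL element inherits from its class
    matches <- vapply(col[non_null], inherits, logical(1), what = cls)
    if (any(non_null) && all(matches)) {
      out <- rep(NA_character_, length(col))
      out[non_null] <- vapply(
        col[non_null],
        function(el) as.character(formatters[[cls]](el)),
        character(1)
      )
      return(out)
    }
  }
  # Fallback when no formatter applies: show each element's class
  vapply(col, function(el) paste0("<", class(el)[1], ">"), character(1))
}

formatters <- list(character = function(x) toString(x))
col <- list(c("a", "b"), NULL, "c")

formatted <- format_list_col(col, formatters)
formatted
# "a, b" NA "c"
```

NULL entries come back as NA here; how they should actually be displayed is a design choice this sketch does not settle.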

jsams May 23, 2018

Contributor

Thought I would drop a mention of #2893 here as the two seem closely related.

franknarf1 Aug 17, 2018

(Similar to my last comment...) Having a data.table like...

library(data.table)
(DT <- data.table(id = 1:2, v = numeric_version("0.0.0")))
#   id                 v
# 1:  1 <numeric_version>
# 2:  2 <numeric_version>

I cannot really read the contents of my list column, even though there is a print method for it.

It would be nice to have a way to tell data.table how I want a list column of a certain class printed, like ...

library(magrittr)

formatters = list(numeric_version = as.character)

printDT = data.table:::shallow(DT)
left_cols = which(sapply(DT, is.list))
for (i in seq_along(formatters)){
    if (length(left_cols) == 0L) break 
    alt_cols = left_cols[ sapply(DT[, ..left_cols], inherits, names(formatters)[i]) ]    
    if (length(alt_cols)){
      printDT[, (alt_cols) := lapply(.SD, formatters[[i]]), .SDcols = alt_cols][]
      left_cols = setdiff(left_cols, alt_cols)
    }
}
print(printDT)

#    id     v
# 1:  1 0.0.0
# 2:  2 0.0.0

Could have that list passed by the user in options(datatable.print.formatters = formatters). To reduce the computational burden, I guess this would be done after filtering with nrows= and topn=.
