Setting data.table column outputs whole table to screen #109

Open
abielr opened this issue May 8, 2015 · 51 comments

8 participants
@abielr
Contributor

commented May 8, 2015

If I run the code below, the second line will cause the entire dat object to be output, whereas at an R console it wouldn't return anything. The syntax used is the special syntax for setting columns with the popular data.table package. I'm using the 1.9.5 devel version of data.table.

library(data.table)
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]
@takluyver

Member

commented May 8, 2015

withVisible(dat[, x2 := 1:10]) tells me that the result should be visible. I thought that was how R determined whether or not to print it. Clearly there's something else going on.
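For reference, a minimal base-R sketch of what withVisible() reports. It does not use data.table, and f_visible / f_invisible are made-up helper names used only for illustration:

```r
# withVisible() returns the value together with the flag R uses for auto-printing.
f_visible   <- function() 42
f_invisible <- function() invisible(42)

vis_plain     <- withVisible(f_visible())$visible      # TRUE: a plain return value is visible
vis_invisible <- withVisible(f_invisible())$visible    # FALSE: invisible() clears the flag
vis_parens    <- withVisible((f_invisible()))$visible  # TRUE: parentheses make it visible again
vis_assign    <- withVisible(x <- 1)$visible           # FALSE: assignment is invisible
```

So a visible result from withVisible() only says that the console would call the print method; it says nothing about whether that print method actually writes anything.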

@abielr

Contributor Author

commented May 9, 2015

Just a note if testing this: the same behavior you see in the notebook is also cropping up at the console in R 3.2.0, as described at Rdatatable/data.table#1122. So if you are trying to fix this, you will likely want to use R < 3.2.0 until that bug is fixed.

@abielr

Contributor Author

commented May 12, 2015

Fixing this on the IRkernel side is probably not reasonable. Under the hood, when you run a statement with the data.table := syntax, it sets a global variable equal to address(x), which is then checked inside of the print.data.table command to see if the output should be suppressed. In other words, when you run the command

dat[, x2 := 1:10]

it triggers [.data.table followed by print.data.table. print.data.table sees that it should not print this time, and then resets the global variable so that later print calls run normally. However, when IRkernel runs evaluate(), print.data.table is never called, so all you see is the regular data.table object.
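A toy model of that mechanism in base R, in case it helps. This is a sketch, not data.table's code: .state, new_toy(), set_col() and print.toy() are all hypothetical names, and an environment stands in for the by-reference table.

```r
# Toy "suppress the next print" flag, loosely imitating the idea described above.
.state <- new.env()
.state$suppress <- NULL            # holds the object whose next print is skipped

new_toy <- function(...) {
  obj <- list2env(list(...))       # an environment, so updates happen by reference
  class(obj) <- "toy"
  obj
}

set_col <- function(x, name, value) {
  assign(name, value, envir = x)   # modify in place
  .state$suppress <- x             # remember which object was just updated
  x                                # returned visibly
}

print.toy <- function(x, ...) {
  if (identical(.state$suppress, x)) {
    .state$suppress <- NULL        # reset so that the next print works normally
    return(invisible(x))
  }
  cat("<toy: ", paste(sort(ls(envir = x)), collapse = ", "), ">\n", sep = "")
  invisible(x)
}

toy <- new_toy(x1 = 1:3)
after_update <- capture.output(print(set_col(toy, "x2", 4:6)))  # print method stays silent
after_plain  <- capture.output(print(toy))                      # prints one line
```

The point is that the silence lives entirely in the print method. A front end that renders the returned object some other way never consults the flag.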

@takluyver

Member

commented May 12, 2015

Ah, R ;-)

@abielr

Contributor Author

commented May 15, 2015

Unfortunately, because [ is an R primitive function, it cannot return an invisible object, see here. This is why the developers of data.table have resorted to the workaround of using an internal global variable to alter the behavior of print.data.table.

A fix on the IRkernel side is to check the length of data[['text/plain']] inside the execution.R/handle_value() function. If the length is zero, i.e. nchar(data[['text/plain']]) == 0, then don't send back a response: at the console you would get nothing, and in general we would expect text output in the notebook to mimic the console. This still allows the notebook to work properly with functions that return a blank string or NULL, which will print properly.
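A rough sketch of the check in base R. The helper names text_repr() and has_text_output() are hypothetical, and this is not IRkernel's actual code:

```r
# Capture what print() would write and test whether it is empty.
text_repr <- function(obj) {
  paste(utils::capture.output(print(obj)), collapse = "\n")
}

has_text_output <- function(obj) nchar(text_repr(obj)) > 0

# A class whose print method deliberately writes nothing.
print.silent <- function(x, ...) invisible(x)
quiet <- structure(list(), class = "silent")

silent_result <- has_text_output(quiet)  # FALSE: nothing would appear at the console
null_result   <- has_text_output(NULL)   # TRUE: print(NULL) writes "NULL"
blank_result  <- has_text_output("")     # TRUE: print("") writes [1] ""
```

Only a print method that writes nothing at all yields an empty string, which is why NULL and blank strings are unaffected.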

On a related note, it would be desirable to have a repr option that controls the maximum number of rows to print for a generic data.frame or matrix, similar to what RStudio does. By default this should be set to something not too large. Otherwise the user who accidentally prints their 10 million element matrix to screen ends up waiting a long time for the HTML to be built and displayed.
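Something along these lines, as a sketch only: limit_rows() and its max_rows argument are hypothetical and not an existing repr option.

```r
# Keep at most `max_rows` rows by taking the head and the tail of the object.
limit_rows <- function(df, max_rows = 60L) {
  n <- nrow(df)
  if (n <= max_rows) return(df)
  keep <- max_rows %/% 2L
  rbind(utils::head(df, keep), utils::tail(df, keep))
}

big   <- data.frame(id = seq_len(1000), value = seq_len(1000) / 10)
small <- limit_rows(big)

n_small   <- nrow(small)         # 60
first_ids <- small$id[c(1, 60)]  # 1 and 1000
```

A real implementation would presumably also insert a marker row between the two halves so the reader can see that rows were dropped.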

@takluyver

Member

commented May 15, 2015

@flying-sheep : the idea of checking whether there's any text output, and suppressing all output if there isn't, sounds basically reasonable to me. Do you see any problems with that?

@flying-sheep

Contributor

commented May 16, 2015

Generally yes, but the question is what that means.

Will print per default do something unless you override it?

Because if it is overridden to not output anything, then it will usually do something unrelated, right? Like plotting or something.

@abielr

Contributor Author

commented May 16, 2015

In general I would say an overridden print statement would not do anything else, though to be frank this use in data.table is the only place I've seen it. If the intention was to print graphics, the more canonical form would be to have a plot function, so that the user is typing plot(myobj).

But regardless, if a graphics command is called inside the print statement, the graphics callback used with the output handler in evaluate will still pick it up, allowing you to send back the plot even if there is no console output. For example,

library(evaluate)

mat <- function(x) {
  class(x) <- "mat"
  x
}

print.mat <- function(x, ...) {
  plot(rnorm(10))
  return(invisible())
}

oh <- new_output_handler(
  value = function(obj) {
    print("VALUE")
    val <- capture.output(print(obj))
    # Check the length of val to see if we should send text output back
  },

  graphics = function(plotobj) {
    print("GRAPHICS") # This will always get run
  }
)

m1 <- mat(matrix(1:4, 2, 2))
evaluate("m1", output_handler = oh)
@flying-sheep

Contributor

commented May 16, 2015

true. as said: i think we should do it. this was just a side thought :)

@takluyver

Member

commented May 16, 2015

OK, @abielr, do you want to make a pull request?

@mattdowle


commented Jun 19, 2015

Very much appreciate the kind language in this thread. Yes all correct.

Have just fixed Rdatatable/data.table#1122. About to release v1.9.6 to CRAN.

Note new wording of bug fixes in https://github.com/Rdatatable/data.table/blob/master/README.md :

if (TRUE) DT[,LHS:=RHS] no longer prints, #869 and #1122. Tests added. To get this to work we've had to live with one downside: if a := is used inside a function with no DT[] before the end of the function, then the next time DT or print(DT) is typed at the prompt, nothing will be printed. A repeated DT or print(DT) will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then DT[] at the prompt is guaranteed to print. As before, adding an extra [] on the end of a := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][]. Thanks to Jureiss and Jan Gorecki for reporting.

DT[FALSE,LHS:=RHS] no longer prints either, #887. Thanks to Jureiss for reporting.

:= no longer prints in knitr for consistency with behaviour at the prompt, #505. Output of a test knit("knitr.Rmd") is now in data.table's unit tests. Thanks to Corone for the illustrated report.

We had to add a workaround in data.table for knitr. Obviously ugly and not ideal. But := by reference is so fundamental in the DT[where, select|update|do, by] general form, that it was worth this hassle, so far. Could add a similar workaround for IRkernel too if that helps - let me know.

@ericwatt


commented Dec 1, 2015

This seems to be the same or a related issue I am seeing. I found a simple reproducible example to show it.

The following data.table creation and := works correctly (though it does output to the screen).

DT1 = data.table(x=rep(c("a","b","c", "d"),each=15), 
                 y=c(1,3,NA,9), 
                 v=c(1:6,NA,NA,NA,NA,NA,NA), 
                 z=1:12)
DT1[,min:=pmin(y, v, na.rm=TRUE)]

When I make the data.table a bit larger by increasing each, I get a warning/error

DT2 = data.table(x=rep(c("a","b","c", "d"),each=18), 
                 y=c(1,3,NA,9), 
                 v=c(1:6,NA,NA,NA,NA,NA,NA), 
                 z=1:12)
DT2[,min:=pmin(y, v, na.rm=TRUE)]

Error in rbindlist(l, use.names, fill, idcol): Item 2 of list input is not a data.frame, data.table or list

If I separate the commands, the warning comes from the line DT2[,min:=pmin(y, v, na.rm=TRUE)]. However, the resulting DT2 prints to screen, and it is modified correctly with column min added. But even this is a bit strange.

print(DT2)

Gives no error, and outputs all 72 rows and 4 columns, just like in RStudio.

DT2

Gives the same error

Error in rbindlist(l, use.names, fill, idcol): Item 2 of list input is not a data.frame, data.table or list

but then outputs the same 72x4 data.table.

In my actual script, which has a much larger data.table and creates several new columns with :=, this causes several of these errors to be reported and the data.table to be output, but the end result seems to be the same as what I get in RStudio with no errors. The assignment seems to work. It does seem to run MUCH slower than in RStudio. I'm not sure if this is because it's outputting the table at each step, or because it's not assigning by reference in place.

Session info with versions is below:

sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Workstation release 6.7 (Santiago)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.9.6

loaded via a namespace (and not attached):
[1] magrittr_1.5 IRdisplay_0.3 tools_3.2.2 base64enc_0.1-3
[5] uuid_0.1-2 stringi_1.0-1 rzmq_0.7.7 IRkernel_0.5
[9] jsonlite_0.9.17 stringr_1.0.0 digest_0.6.8 chron_2.3-47
[13] repr_0.4 evaluate_0.8

jupyter --version
4.0.6

python --version
Python 2.7.10 :: Anaconda 2.4.0 (64-bit)

@takluyver

Member

commented Dec 1, 2015

I would guess that when you see the error followed by the table, the error comes from the code in repr that attempts to generate an HTML version of that table. That's failing, so it falls back to showing the plain text table.

@takluyver

Member

commented Dec 1, 2015

You can check this by doing:

repr::repr_html(DT2)
@ericwatt


commented Dec 1, 2015

You're right, repr::repr_html(DT2) gives the same error, without showing the table. Perhaps the error is because repr_html() has a size limit for the resulting table?

I also get the same error with a much taller table, perhaps in this case it's the --- causing the issue:

      x  y  v  z min
   1: a  1  1  1   1
   2: a  3  2  2   2
   3: a NA  3  3   3
   4: a  9  4  4   4
   5: a  1  5  5   1
  ---               
7196: d  9 NA  8   9
7197: d  1 NA  9   1
7198: d  3 NA 10   3
7199: d NA NA 11  NA
7200: d  9 NA 12   9
@ericwatt


commented Dec 1, 2015

head(DT2, 60) gives no error and outputs as a nicely formatted table, head(DT2, 61) gives error and outputs the table as plain text.

@flying-sheep

Contributor

commented Dec 2, 2015

why ‘---’? we insert ellipsis characters (⋮ and ⋯)!

@takluyver

Member

commented Dec 2, 2015

If it goes wrong at 61 rows, it seems likely that it's something going wrong when we truncate the table and insert ellipses.

@flying-sheep

Contributor

commented Dec 2, 2015

jup, that’s why i said that it’s strange to see “---” there

@ericwatt


commented Dec 2, 2015

The --- is inserted by data.table if the number of rows is greater than 100 (by default). With >100 rows, it will print the first 5 and the last 5, separated by ---, just like my comment above shows.

It seems like there are two issues happening. If the data.table has nrows where 60 < nrows < 101, data.table will show all of the rows, but Jupyter is having trouble rendering this to an html table and giving an error as takluyver found when suggesting I check repr::repr_html(DT2). Above 100 rows, data.table itself is trying to print a summary, and the --- in the middle of the table it prints may be causing a similar issue with repr::repr_html(DT2). With a tall table (say 7200 rows), if I do a print(DT2) there is no error, and I get:

      x  y  v  z min
   1: a  1  1  1   1
   2: a  3  2  2   2
   3: a NA  3  3   3
   4: a  9  4  4   4
   5: a  1  5  5   1
  ---               
7196: d  9 NA  8   9
7197: d  1 NA  9   1
7198: d  3 NA 10   3
7199: d NA NA 11  NA
7200: d  9 NA 12   9

Which looks exactly like it does in the console. If the command is instead DT2 I get the error I reported above, and then the same output as I just showed for print(DT2).

When I try to print a 61 row data.table, no ellipses are inserted. It just prints 61 rows but in text format, not as a table.

Of course, in my case I don't want the data.table printed at all, as I'm using a := assignment, which usually doesn't output anything to the console, as mentioned in the comments before mine.

@jankatins

Contributor

commented Mar 29, 2016

This is affected by #285:

Currently a table does not output anything. IMO this needs a fix in data.table itself to put in the same workaround as for knit_print.

@breschke


commented May 18, 2016

This problem appears to be causing performance issues for me. A simple assignment by reference dt[,newvar:=1] on a data.table of 30 million+ rows takes less than a second in my console, but hangs endlessly in a Jupyter notebook running on the same machine. I may have stumbled on a workaround: when I wrapped the assignment by reference in system.time(), i.e. system.time(dt[,newvar:=1]), to compare console and notebook behavior, the command executed as fast in the notebook as in the console.

@jankatins

Contributor

commented May 19, 2016

@breschke Could be that it prints the thing and then sends it over to the frontend (=Browser) which crashes?

Does this also work:

{
dt[,newvar:=1]
NULL
}

[The {...} makes evaluate treat this as one statement, and its value is the NULL, so there is nothing to print.]
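A base-R illustration of why wrapping helps, with an environment standing in for the table. update_in_place() is a hypothetical stand-in for the := call, and invisible() is shown as a second option that I would expect to behave similarly wherever visibility is honored:

```r
# Stand-in for an update by reference that returns the updated object visibly.
update_in_place <- function(env) {
  env$newvar <- 1
  env
}

e <- new.env()

# A braced block evaluates to its last expression, so the object is not the result.
braced <- withVisible({ update_in_place(e); NULL })

# invisible() keeps the object as the value but clears the visibility flag.
wrapped <- withVisible(invisible(update_in_place(e)))

braced_is_null  <- is.null(braced$value)  # TRUE
wrapped_visible <- wrapped$visible        # FALSE
```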

@jankatins

Contributor

commented May 19, 2016

@breschke Which versions of IRkernel, IRdisplay, repr, and data.table are you using? sessionInfo() should print these.

@jankatins

Contributor

commented May 19, 2016

There is also this: https://github.com/Rdatatable/data.table/blob/9d2d71098d849c99e6eebb0e0b539eb58d723b05/R/cedta.R#L4

Maybe repr should be added there as well?

You may try this:

assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode,"repr"), "data.table")

and then execute the assignment in a new cell?

But I still suspect that we need to get repr into this here:
https://github.com/Rdatatable/data.table/blob/d3567006b7b1d4cbb3a29ff22f8576e948e9c3e9/R/data.table.R#L35

@jankatins

Contributor

commented May 19, 2016

See also Rdatatable/data.table#933 where I just commented...

@breschke


commented May 19, 2016

@janschulz This works without hanging:

{
dt[,newvar:=1]
NULL
}

R version 3.2.2
data.table_1.9.6
IRkernel_0.5
IRdisplay_0.3
repr_0.7

Running on Ubuntu 15.10.

@jankatins

Contributor

commented May 19, 2016

Then this looks like Rdatatable/data.table#933 :-(

@breschke


commented May 19, 2016

@janschulz I attempted your other suggestion:

assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode,"repr"), "data.table")

then assign by reference in a new cell. It appears to hang endlessly (doesn't execute immediately as it would wrapped as above). I'm not familiar enough with repr to comment on whether it should be included.

Note: it doesn't kill the kernel---I can interrupt and continue the session.

@mattdowle


commented May 19, 2016

I'm almost following this one. The general root of this problematic area is explained in FAQ 2.22.

Thanks for finding and trying the assignInNamespace() test @breschke. If that didn't work, then unfortunately adding repr to the whitelist in data.table isn't going to help. That test is a manual way to add to the whitelist. IRkernel is already present in the whitelist.

Can someone debug the hanging-endlessly point and establish what is happening there? Is it transferring the entire data.table between the processes for some reason or is it hanging for another reason? Might be able to tell using htop or other monitoring tools.

@mattdowle


commented May 19, 2016

I now see @janschulz's suggestion in Rdatatable/data.table#933. Making the change now.

@jankatins

Contributor

commented May 19, 2016

Can someone debug the hanging-endlessly point and establish what is happening there? Is it transferring the entire data.table between the processes for some reason or is it hanging for another reason?

I suspect that there are multiple reasons:

I suspect we shouldn't transform it into a data.frame?

@jankatins

Contributor

commented May 19, 2016

from the FAQ:

To solve this problem, the key was to stop trying to stop the print method running after a :=. Instead, inside := we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.

What is this flag? If it is useable outside of print, then we could use that as well?

Eg:

if (is.data.table(obj)) {
   if (data.table.should.print(obj)) { # does such a function exist?
       # do our converting to the right representation...
   }
}
@mattdowle


commented May 19, 2016

@janschulz Might work. See first line of data.table:::print.data.table.
From your package you would need to prefix with ::: to get to .global, e.g.
if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print)
If that works, we could export .global for you.

Edit: if that works we should export a function for you to isolate you from data.table internals; e.g. data.table.should.print as you suggest.

@mattdowle


commented May 19, 2016

And yes, running as.data.frame on it doesn't sound right as that will copy. If you are willing to import or depend on data.table then you could setDF() on it to save the copy. But that's only necessary because IRkernel has been added to the whitelist, since you mimic a user running code at the prompt. The user code may be data.table-aware or not, but the data.table calls are coming from your package rather than the global environment. Another way is perhaps, when you eval(), to pass the global environment to eval() rather than eval()ing in your own environment, to truly mimic what happens in the global environment. I'm just guessing now and haven't looked at your code.

@jankatins

Contributor

commented May 19, 2016

Another way is perhaps when you eval(), pass the global environment to eval() rather than eval()ing in your own environment, to truly mimic what happens in the global environment. I'm just guessing now and haven't looked at your code.

That we already do: https://github.com/IRkernel/IRkernel/blob/master/R/execution.r#L268-L272 :-)

Edit: if that works we should export a function for you to isolate you from data.table internals; e.g. data.table.should.print as you suggest.

If we should take the workaround, then this is definitely preferred... We already have our share of workarounds around CRAN's "no ::: usage" policy :-)

@jankatins

Contributor

commented May 19, 2016

Just to ask: if it is a global and we would use any data.table function, this global would be reset?

E.g. what happens if a repr_text.data.table would subset the dt to print a shorter version (head and tail) and then repr_html.data.table would do the same? On the other hand, if it signals "not print", then it wouldn't be touched, as we would never alter the dt (e.g. not call a function which sets this global flag)...

Is this the right idea about this flag?

@mattdowle


commented May 19, 2016

That we already do: https://github.com/IRkernel/IRkernel/blob/master/R/execution.r#L268-L272 :-)

Great. Then maybe IRKernel shouldn't be in the whitelist after all. Perhaps that's the problem. Just to quickly test, try assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table") and then try again.

If that works then it simplifies a lot as you don't need to be data.table-aware at all.

@jankatins

Contributor

commented May 19, 2016

I'm pretty sure we need to: we are basically doing the same as knitr does with evaluate and knit_print: we evaluate code via evaluate and then print returned values (i.e. everything which is not invisible) via the repr_xxx functions.

I tried this:

library(data.table)
assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

and it printed the table...

@jankatins

Contributor

commented May 19, 2016

This is basically what our implementation would be if there were an exported function to get the flag: IRkernel/IRkernel#343 [That it errors is a bug on our side...]. You could probably implement something similar on your side for knit_print.data.table (or also take the above functions into your package :-)) instead of using the hack with the callstack in https://github.com/Rdatatable/data.table/blob/d3567006b7b1d4cbb3a29ff22f8576e948e9c3e9/R/data.table.R#L34-L39.

@mattdowle


commented May 19, 2016

I just added mimicsAutoPrint to the callstack hack:
Rdatatable/data.table@689b624
If you could fetch that, add your function name using assignInNamespace() and test please. Then if it works let me know the function name I should add: repr_print.default ?

I also added and exported shouldPrint() to expose the flag (Rdatatable/data.table@3ec2d61). It resets the flag within it so it is a read-once function. If you need it twice in your logic, store the value from the first call.
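To illustrate the read-once behaviour, here is a toy flag in base R. It is a sketch under stated assumptions, not data.table's implementation: .flag, mark_suppressed() and should_print_once() are hypothetical names.

```r
# A flag that is reset by the act of reading it.
.flag <- new.env()
.flag$suppress <- FALSE

mark_suppressed <- function() .flag$suppress <- TRUE

should_print_once <- function() {
  ok <- !.flag$suppress
  .flag$suppress <- FALSE   # reading the flag also resets it
  ok
}

mark_suppressed()
first  <- should_print_once()   # FALSE: an update just happened
second <- should_print_once()   # TRUE: the first call already reset the flag

# Store the first answer if several renderers need it.
mark_suppressed()
decision <- should_print_once()
formats  <- c("text", "html")
rendered <- if (decision) formats else character(0)
```

Asking the function once per output format would give a different answer the second time, which is why the stored value is used for both formats here.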

Does this resolve everything? Should we continue to keep IRkernel on the whitelist or remove it? Maybe we can remove knitr from the whitelist too since the comment there is "knitr's eval is passed envir=globalenv() so doesn't need to be listed here currently, but we include it in case it decides to change that."

@breschke


commented May 25, 2016

I updated to current dev versions of data.table and IRkernel and ran the following described in #343:

assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
repr_html.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_latex.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_text.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}

Now assignment by reference no longer displays output:

dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

but calling the object no longer prints (a summary of) the data.table--i.e., this does nothing:

dat

Personally, I'm fine with that, though I suspect others will dislike this behavior.

sessionInfo():

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.7

loaded via a namespace (and not attached):
 [1] R6_2.1.2        magrittr_1.5    IRdisplay_0.3   pbdZMQ_0.2-3   
 [5] tools_3.2.2     base64enc_0.1-3 uuid_0.1-2      stringi_1.0-1  
 [9] IRkernel_0.6    jsonlite_0.9.20 stringr_1.0.0   digest_0.6.9   
[13] repr_0.7        evaluate_0.9 
@jankatins

Contributor

commented May 26, 2016

Ok, I tried this:

library(data.table)
old = data.table:::mimicsAutoPrint
old
reprs = c("repr_text.data.table", "repr_latex.data.table", "repr_markdown.data.table", "repr_html.data.table", "repr_text.default")
assignInNamespace("mimicsAutoPrint", c(old, reprs), "data.table")
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

and it still prints because we do have a repr_text.data.frame which is used and prints the dat because it is also a data.frame.

I also tried to remove the repr_text.data.frame method, but then repr ran into this callstack:

[...]
[[20]]
withCallingHandlers(withVisible(value_fun(ev$value, ev$visible)), 
    warning = wHandler, error = eHandler, message = mHandler)

[[21]]
withVisible(value_fun(ev$value, ev$visible))

[[22]]
value_fun(ev$value, ev$visible)

[[23]]
value_handler(x)

[[24]]
prepare_mimebundle(obj, .self$handle_display_error)

[[25]]
repr_text(obj)

[[26]]
repr_text.data.frame(obj)

[[27]]
NextMethod()

[[28]]
repr_text.default(obj)

[[29]]
paste(utils::capture.output(print(obj)), collapse = "\n")

[[30]]
utils::capture.output(print(obj))

[[31]]
evalVis(expr)

[[32]]
withVisible(eval(expr, pf))

[[33]]
eval(expr, pf)

[[34]]
eval(expr, envir, enclos)

[[35]]
print(obj)

[[36]]
print.data.table(obj)

which seems not to match ( length(SYS) > 3L && as.character(SYS[[length(SYS)-3L]][[1L]]) %chin% mimicsAutoPrint )
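A small base-R sketch of why a fixed-depth lookup like this is fragile. All names are hypothetical and the functions only imitate the shape of the check, three frames above the function doing the checking:

```r
whitelist <- c("knit_like")

# TRUE only if the function three frames above this one is whitelisted.
called_from_whitelist <- function() {
  calls <- sys.calls()
  n <- length(calls)
  n > 3L && as.character(calls[[n - 3L]][[1L]])[1L] %in% whitelist
}

# Chain A: knit_like -> normal_print_like -> print_like -> check
print_like        <- function() called_from_whitelist()
normal_print_like <- function() print_like()
knit_like         <- function() normal_print_like()

# Chain B: one extra wrapper frame sits between the entry point and the check.
extra_wrapper <- function() normal_print_like()
repr_like     <- function() extra_wrapper()

chain_a <- knit_like()   # TRUE: "knit_like" is three frames above the check
chain_b <- repr_like()   # FALSE: three frames up is "extra_wrapper"
```

Adding "repr_like" to the whitelist would not change chain B, because the lookup never reaches that frame. That seems to be the same kind of mismatch as in the stack above, where many more frames sit between the repr entry point and print.data.table.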

So Rdatatable/data.table@689b624 seems to be not working here because the callstack is just too different than the one from knit_print :-(

So we do have to go the repr_text.data.table with shouldPrint() way.

repr_text.data.table <- function(obj, ...){
    if (!data.table::shouldPrint(obj)) {
        invisible(NULL) # in IRkernel, will prevent any other repr_xx methods from being called
    } else {
        NextMethod() # falls back to `repr_text.default`, which uses print(obj)
    }
}
# No need for repr_html/... as the only reason we have the above method 
# is to return null, which indicates to the IRkernel that nothing else should be printed.
# the shouldPrint() actually resets the flag, so it can't be used twice anyway...

This works as intended:

dat <- data.table(x1=1:10)
dat[, x2 := 1:10] # does not print
dat # prints

But now we have a different problem:

Currently we implicitly assume that each repr_* is independent of the other, so by calling shouldPrint() once in repr_text.data.table() we do not prevent printing with the other methods (e.g. a new repr_html.data.table) because the flag is reset. The actual situation is a bit different, because when repr_text returns an empty string, the rest of the mimetypes are not called in irkernel.
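The implicit behaviour could be written down roughly like this. It is a sketch with hypothetical names (build_bundle() and the renderers list), not IRkernel's actual prepare_mimebundle():

```r
# Render text first; if it is empty, skip every other format.
build_bundle <- function(obj, renderers) {
  text <- renderers[["text/plain"]](obj)
  if (is.null(text) || !nzchar(text)) {
    return(NULL)
  }
  others <- setdiff(names(renderers), "text/plain")
  bundle <- list("text/plain" = text)
  for (mime in others) bundle[[mime]] <- renderers[[mime]](obj)
  bundle
}

html_calls <- 0
renderers <- list(
  "text/plain" = function(obj) if (isTRUE(obj$suppress)) NULL else "a table",
  "text/html"  = function(obj) {
    html_calls <<- html_calls + 1   # count how often the HTML renderer runs
    "<table></table>"
  }
)

suppressed <- build_bundle(list(suppress = TRUE), renderers)   # NULL, HTML renderer not called
shown      <- build_bundle(list(suppress = FALSE), renderers)  # both formats present
```

With this ordering, only the text renderer needs to consult a read-once flag. The other renderers are simply never reached when it says no.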

So maybe we should make this explicit in the documentation of repr?

The alternative is adding a repr_should_represent method which irkernel could then use and which has a special repr_should_represent.data.table which uses the flag.

I would currently prefer the former because it's basically what we do, and it will only confuse users if they test in irkernel and it works and then (in the future) it works differently in another lib. On the other hand, for performance reasons, we should probably add a repr_get_shorter_version() so that we don't do the subsetting 4 times and do not convert such big data.tables to data.frames... @flying-sheep @takluyver ?

@mattdowle: would you be able to include the repr_text.data.table function in data.table? That way we do not need to guard against data.table being loaded and against older versions of the data.table package.

(In general, we would like to have packages exporting repr_xxx implementations for their data structures instead of having these implementations in the repr package, for this reason...)

As a bonus, you could probably replace the knit_print-specific callstack lookup (|| ( length(SYS) > 3L && as.character(SYS[[length(SYS)-3L]][[1L]]) %chin% mimicsAutoPrint )) with

knit_print.data.table <- function(x, ...) {
    if (!data.table::shouldPrint(x)) {
        invisible(NULL)
    } else {
        NextMethod() # which will fall back to your normal print, which will now see `shouldPrint() == TRUE`
    }
}

This would prevent a problem if knitr ever changed its knit_print.default implementation so that the above callstack would be different.

@mattdowle


commented May 26, 2016

Thanks all! This is great info. Ok yes I see what you mean that adding the print methods to data.table might be best. Happy to give that a go. Will do.

@breschke


commented Sep 21, 2016

Any update on this issue? Same behavior with data.table_1.9.6. If the data.table is very large, I find this causes intolerable lags in performance (hanging while trying to print).

The following prints the data.table during assignment by reference:

library(data.table)
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

but when I run this first:

assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
repr_html.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_latex.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_text.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}

printing is suppressed during assignment by reference, but:

dat

prints no output.

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.6

loaded via a namespace (and not attached):
 [1] R6_2.1.2           magrittr_1.5       IRdisplay_0.4.9000 pbdZMQ_0.2-3      
 [5] tools_3.2.2        crayon_1.3.2       uuid_0.1-2         stringi_1.0-1     
 [9] IRkernel_0.7       jsonlite_1.1       stringr_1.0.0      digest_0.6.10     
[13] chron_2.3-47       repr_0.9.9000      evaluate_0.9 
@flying-sheep

Contributor

commented Sep 21, 2016

Hi, looks like it works fine, but it needs the as-of-now unreleased shouldPrint

[image: data table]

@jeffwong-nflx


commented Nov 22, 2016

Hi, this still seems to be an issue with the new notebook feature in the recent RStudio 1.0 release. Any time a data.table is modified with :=, the output is shown inline in the notebook. I was reading this thread and saw that @mattdowle did something for knitr to avoid this behavior; can something be done with IRkernel too? The issue is very visible now that notebooks are so mainstream inside RStudio 1.0.

@takluyver

Member

commented Nov 22, 2016

Is the RStudio notebook using IRkernel? I know nothing of this.

@flying-sheep

Contributor

commented Nov 22, 2016

i don’t think so. i think it has nothing to do with us.

@abielr

Contributor Author

commented Feb 19, 2017

Are there any plans to integrate the repr_XXX.data.table functions described above into the repr package so that this issue is fixed by default for anyone running a recent version of data.table? They work, but at the moment I end up copying and pasting them into the top of every notebook.

@flying-sheep

Contributor

commented Feb 19, 2017

PRs welcome!

@flying-sheep flying-sheep transferred this issue from IRkernel/IRkernel Jan 16, 2019
