Improved show for DataFrames #995

bkamins · 2016-06-12T05:47:18Z

A proposal to solve #760.

Summary of changes:

show and showall get a new argument onechunk, which limits the number of printed chunks to 1 if splitchunks is true (with an appropriate message in the summary)
showcols gets two parameters: allcols (whether all columns of an AbstractDataFrame should be printed, or only those fitting on the screen) and values (whether a sample of values of an AbstractDataFrame should be printed)
showcompact is defined for AbstractDataFrame and GroupedDataFrame

nalimilan · 2016-06-12T09:21:35Z

Thanks. These positional arguments are really getting out of hand. We should probably get rid of splitchunks (whose name is really confusing) and tell people to use showcols instead. Then onechunk could be renamed to allcols. What do you think?

I don't think showcompact is intended for this kind of use: as the docs say, it's mainly for scalar values to provide a short representation without type information to be used inside arrays.

nalimilan · 2016-06-12T09:23:06Z

src/abstractdataframe/show.jl

 #'
 #' @returns o::Void A `nothing` value.
 #'
 #' @examples
 #'
 #' df = DataFrame(A = 1:3, B = ["x", "y", "z"])
-#' showcols(df, true)
-function showcols(io::IO, df::AbstractDataFrame) # -> Void
+#' showcols(STDOUT, df)


The example was correct, STDOUT is implicit.

I think it should stay, as the example with implicit STDOUT is given below for the definition of function showcols(df::AbstractDataFrame, allcols::Bool=false, values::Bool=true).

OK. Anyway until we turn these into real docstrings this is pretty abstract.

bkamins · 2016-06-13T22:19:28Z

@nalimilan Thx for the comments. I will correct the PR.

For the record let me add that this kind of output also should be fixed:

df = DataFrame(x=["a", "\t", "\\", "\n", "\$", "z"], y=1:6)
6×2 DataFrames.DataFrame
│ Row │ x   │ y │
├─────┼─────┼───┤
│ 1   │ "a" │ 1 │
│ 2   │ "\t"  │ 2 │
│ 3   │ "\\" │ 3 │
│ 4   │ "\n"  │ 4 │
│ 5   │ "\$" │ 5 │
│ 6   │ "z" │ 6 │

bkamins · 2016-06-14T20:55:05Z

@nalimilan I hope I have covered all your comments correctly.

I have left splitchunks internally to differentiate the behavior of show and showall (similarly to arrays).

Additionally I have changed the formula calculating the width of the string so that DataFrames render correctly with escaped strings.

quinnj · 2017-09-07T18:03:28Z

I think this would be a great fix; care to rebase?

bkamins · 2017-09-07T19:29:34Z

Sure - I thought it was rejected.

quinnj · 2017-09-07T19:31:30Z

Sorry, the package has struggled w/ maintenance over the last year or so as development moved over to DataTables.jl, but all that work has now been backported here and all development will now resume here.

nalimilan · 2017-09-07T20:03:53Z

Sorry, I think I didn't finish the review because I wanted to get a complete understanding of the design space, and I didn't find the time for that at that point.

bkamins · 2017-09-10T21:57:12Z

I have started to update the PR and have one issue to clarify.
There is a change in how DataFrames performs show between the latest release and master.

Consider running the following line in the REPL:

df = DataFrame(A = 1:4,
               B = ["x\"", "y\n", "z\$", "ABC"],
               C = Float32[1.0, 2.0, 3.0, 4.0],
               D = Symbol[:ABC,Symbol("x\""),Symbol("y\n"),Symbol("z\$")])

on master it shows:

4×4 DataFrames.DataFrame
│ Row │ A │ B   │ C   │ D   │
├─────┼───┼─────┼─────┼─────┤
│ 1   │ 1 │ x"  │ 1.0 │ ABC │
│ 2   │ 2 │ y
   │ 2.0 │ x"  │
│ 3   │ 3 │ z$  │ 3.0 │ y
  │
│ 4   │ 4 │ ABC │ 4.0 │ z$  │

and on the latest release it shows:

4×4 DataFrames.DataFrame
│ Row │ A │ B     │ C   │ D   │
├─────┼───┼───────┼─────┼─────┤
│ 1   │ 1 │ "x\""  │ 1.0 │ ABC │
│ 2   │ 2 │ "y\n"   │ 2.0 │ x"  │
│ 3   │ 3 │ "z\$"  │ 3.0 │ y
  │
│ 4   │ 4 │ "ABC" │ 4.0 │ z$  │

and if we cast df to an array we get yet another output (it could be reproduced by show of DataFrame with column names and pipes added - here I want to concentrate on how the field values are printed):

julia> convert(Matrix, df)
4×4 Array{Any,2}:
 1  "x\""  1.0  :ABC
 2  "y\n"  2.0  Symbol("x\"")
 3  "z\$"  3.0  Symbol("y\n")
 4  "ABC"  4.0  Symbol("z$")

Which is the preferred target printing style? Or maybe yet some other option?
I personally would feel comfortable with the third one (show values like show for arrays) as it would be consistent.

quinnj · 2017-09-11T03:34:31Z

I definitely think we should be consistent w/ array printing (last example). Show strings as quoted strings, as well as symbol that way. I think that's the only real solution if we want to be able to show unicode + control characters/whitespace and maintain the correct column widths.
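
To illustrate the point about column widths, here is a minimal sketch, assuming each cell is rendered with `show` as in array printing. The names `cellrepr` and `cellwidth` are hypothetical and are not functions from DataFrames; the actual width calculation in the package may differ.

```julia
# Hypothetical helpers, not the DataFrames implementation.
# Render a cell the way `show` does for array elements:
# strings come out quoted, with control characters escaped.
cellrepr(x) = sprint(show, x)

# Display width of the rendered cell, used to size the column.
cellwidth(x) = textwidth(cellrepr(x))

col = ["x\"", "y\n", "ABC"]
width = maximum(cellwidth, col)

# Every cell is padded to the same width, so the right border lines up
# even though one value contains a newline character.
for x in col
    println("│ ", rpad(cellrepr(x), width), " │")
end
```

Because the newline is printed as the two characters `\n`, a row can no longer be split across lines, and the width is computed on the same text that is printed.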

nalimilan · 2017-09-11T07:27:46Z

Agreed, the Array output looks like a good reference. Though printing quotes around strings is a bit verbose, and we could get rid of it if we printed the column eltype in a header (like tibbles in R).

bkamins · 2017-09-11T20:54:10Z

Thank you for the comments. Regarding refactoring of show I have some additional thoughts:

if we omit " in strings how do we visually distinguish "NA" string from true NA (in R this is a problem with "<NA>")
how do we handle columns of custom types (normal and Nullable - whatever they will be eventually called) - in particular when their representation is very long (should some truncation be applied); in particular current show has problem for calculation of width of such structures, e.g.:

julia>     df = DataFrame(A=[[1:25;],"sdf"])
2×1 DataFrames.DataFrame
│ Row │ A                                                                                           │
├─────┼─────────────────────────────────────────────────────────────────────────────────────────────┤
│ 1   │ [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  16, 17, 18, 19, 20, 21, 22, 23, 24, 25] │
│ 2   │ "sdf"                                                                                       │

In short - the problem is complex so I will try to give it some thought and will open a separate issue and write down my recommendation. I will strip this PR from field width calculation changes and leave only improved display features.

nalimilan · 2017-09-11T21:15:35Z

if we omit " in strings how do we visually distinguish "NA" string from true NA (in R this is a problem with "<NA>")

Yes, that would be a problem unless we add a header with the eltype of each column. But we can keep the quotes for now and discuss that possibility later.

how do we handle columns of custom types (normal and Nullable - whatever they will be eventually called) - in particular when their representation is very long (should some truncation be applied); in particular current show has problem for calculation of width of such structures, e.g.:

Some truncation should probably be applied (I think you can do that by setting a property on IOContext now). But that can also be improved later, no need to fix everything in a single PR. It's not that common to have fields like that in DataFrames anyway.

nalimilan · 2017-09-11T21:38:07Z

@bkamins I think you need to rebase on the latest master. If there are still unrelated commits, use git rebase -i master and remove them.

bkamins · 2017-09-11T21:46:33Z

Agreed - that is why in this PR I have left only changes to show global behavior and left other changes for the future.

One question regarding git: I have made a merge not a rebase and now I can see the bad consequences that all the intermediate commits got included. Is there any simple and safe way to fix it?

coveralls · 2017-09-11T21:47:45Z

Changes Unknown when pulling a58e9d6 on bkamins:newshow into JuliaData:master.

nalimilan · 2017-09-11T21:48:55Z

Hmm... I guess the easiest solution would be to start a new branch from master, cherry-pick your commits into it, and then force push to this branch using git push --force bkamins :newshow.

coveralls · 2017-09-12T08:19:22Z

Coverage increased (+1.03%) to 88.15% when pulling c59bbf9 on bkamins:newshow into 885078a on JuliaData:master.

bkamins · 2017-09-13T17:38:45Z

Just as a comment: I believe that the build failure on Julia latest is unrelated to this PR.

nalimilan · 2017-09-13T18:25:20Z

src/abstractdataframe/show.jl

@@ -315,21 +317,30 @@ function showrows(io::IO,
                  rowindices2::AbstractVector{Int},
                  maxwidths::Vector{Int},
                  splitchunks::Bool = false,
-                  rowlabel::Symbol = :Row,
+                  allcols::Bool = true,
+                  rowlabel::Symbol = Symbol("Row"),


Shouldn't change this line. Same below (twice).

nalimilan · 2017-09-13T18:27:51Z

src/abstractdataframe/show.jl

+#' @param allcols::Bool If `false` (default), only a subset of columns
+#'        fitting on the screen is printed.
+#' @param values::Bool If `true` (default), first and last value of
+#'        each column is printed.


"are". Maybe also add "the" (not a native speaker here).

nalimilan · 2017-09-13T18:28:27Z

src/abstractdataframe/show.jl

-#' showcols(df, true)
-function showcols(io::IO, df::AbstractDataFrame) # -> Void
+#' showcols(STDOUT, df)
+function showcols(io::IO, df::AbstractDataFrame, allcols::Bool = false, values::Bool = true) # -> Void


Keep rows below 92 chars (same elsewhere).

nalimilan · 2017-09-13T18:29:34Z

src/abstractdataframe/show.jl

+    nrows, ncols = size(df)
+    if values && nrows > 0
+        if nrows == 1
+            metadata[:Values] = [Symbol(sprint(showcompact, df[1, i])) for i in 1:ncols]


Should this use ourshowcompact? Why do you need Symbol?

nalimilan · 2017-09-13T19:23:08Z

src/abstractdataframe/show.jl

+#' count.
+#'
+#' @param df::AbstractDataFrame An AbstractDataFrame.
+#' @param allcols::Bool If `false` (default), only a subset of columns


Maybe just call this all, since "col" is already clear from the function's name?

This still applies, right?

nalimilan · 2017-09-13T19:24:21Z

test/show.jl


    io = IOBuffer()
    show(io, df)
    show(io, df, true)
    showall(io, df)
-    showall(io, df, true)
+    showall(io, df, false)


Could you test the actual output? You can use triple-quoted strings for that.
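
A minimal sketch of this kind of test, assuming a stand-in printer: `print_table` is a hypothetical function used here in place of `show(io, df)`, so the example runs without DataFrames.

```julia
# Hypothetical two-line printer standing in for `show(io, df)`.
function print_table(io::IO, names, values)
    println(io, join(names, " │ "))
    print(io, join(values, " │ "))
end

# Reference output written as a triple-quoted string.
# The newline right after the opening quotes is not part of the string.
refstr = """
A │ B
1 │ x"""

# `sprint` calls the function with an IOBuffer and returns what was written.
str = sprint(print_table, ["A", "B"], [1, "x"])
@assert str == refstr
```

Comparing against the full expected string means any later change to the layout fails the test instead of passing silently.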

bkamins · 2017-09-14T21:46:43Z

@nalimilan Inline comments got removed so I reply here:

changed Symbol("Row") to :Row
@param values comment string corrected
all lines are below 92 chars
I have removed Symbol in :Values formatting, but if we change the way DataFrame columns containing strings are printed it might have to be revised
allcols changed to all (but one has to remember it clashes with all function)
added testing of actual output in tests

coveralls · 2017-09-15T03:36:18Z

Coverage increased (+1.0%) to 88.08% when pulling 14c321c on bkamins:newshow into 885078a on JuliaData:master.

nalimilan

Thanks! Sorry for bothering you with tests, but that's the only way to ensure somebody doesn't break your improvements in the future.

nalimilan · 2017-09-15T07:33:47Z

src/abstractdataframe/show.jl

@@ -297,6 +297,8 @@ end
 #'        required to render each column.
 #' @param splitchunks::Bool Should the printing of the AbstractDataFrame
 #'        be done in chunks? Defaults to `false`.
+#' @param allcols::Bool Should only one chunk be printed if printing in
+#'        chunks? Defaults to `false`.


Defaults to false.

changed to true

nalimilan · 2017-09-15T07:37:57Z

src/abstractdataframe/show.jl

    if isempty(rowindices1)
+        if displaysummary
+            println(io, summary(df))
+        end
        return
    end

    rowmaxwidth = maxwidths[ncols + 1]
    chunkbounds = getchunkbounds(maxwidths, splitchunks, displaysize(io)[2])
    nchunks = length(chunkbounds) - 1


Would be clearer to do nchunks = allcols ? length(chunkbounds) - 1 : min(nchunks, 1).

changed (with a bit different code as nchunks is undefined before this line)
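
One way the suggestion could be written so that nchunks is not read before it is defined. This is only a sketch: `chunks_to_print` is a hypothetical helper, not the code that was merged, and the example boundaries are made up.

```julia
# Hypothetical helper: decide how many column chunks to print.
# `chunkbounds` holds chunk boundaries, so it describes
# `length(chunkbounds) - 1` chunks.
function chunks_to_print(chunkbounds::AbstractVector{Int}, allcols::Bool)
    available = length(chunkbounds) - 1
    # With allcols every chunk is printed; otherwise at most one.
    return allcols ? available : min(available, 1)
end

chunks_to_print([0, 3, 6, 8], true)   # all three chunks
chunks_to_print([0, 3, 6, 8], false)  # only the first chunk
chunks_to_print([0], false)           # no chunks at all
```

Using `min(available, 1)` rather than a literal 1 keeps the case with no chunks at zero.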

nalimilan · 2017-09-15T07:41:07Z

src/abstractdataframe/show.jl

-    showall(io, metadata, true, Symbol("Col #"), false)
+    nrows, ncols = size(df)
+    if values && nrows > 0
+        # type of Values column is now String; it might need to be changed


I don't think this comment is needed: tests will (or should) catch this and people will figure out what needs to be changed anyway.

nalimilan · 2017-09-15T07:42:37Z

src/abstractdataframe/show.jl

+#' count.
+#'
+#' @param df::AbstractDataFrame An AbstractDataFrame.
+#' @param allcols::Bool If `false` (default), only a subset of columns


This still applies, right?

nalimilan · 2017-09-15T07:42:53Z

src/abstractdataframe/show.jl

+        # type of Values column is now String; it might need to be changed
+        # if the way strings are printed in data frames changes
+        if nrows == 1
+            metadata[:Values] = [sprint(showcompact, df[1, i]) for i in 1:ncols]


Also, what about using ourshowcompact?

Changed - but it creates problems in corner cases (described in TODO for getmaxwidths)

nalimilan · 2017-09-15T07:45:11Z

test/show.jl

+    4×3 DataFrames.DataFrame
+    │ Row │ A │ B             │ C   │
+    ├─────┼───┼───────────────┼─────┤
+    │ 1   │ 1 │ x\"            │ 1.0 │


I suppose the fact that vertical lines are not aligned is a bug elsewhere? Then better leave a TODO somewhere to make it clear.

They are aligned when the string is printed, but " needs to be escaped in string literal which breaks alignment in the code.

Ah, of course!

Actually there is a problem - in the next line │ 2 │ 2 │ ∀ε⫺0: x+ε⫺x │ 2.0 │ which is not aligned properly, and it is a TODO to be added for the getmaxwidths function. Sorry for the confusion.

nalimilan · 2017-09-15T07:47:08Z

test/show.jl

-    show(io, df, true)
-    showall(io, df)
-    showall(io, df, true)
+    show(io, df_big)


Could you also test the output of these functions (even if that's verbose, it's fine)? Else the case when there are too many rows won't be covered. You should probably pass a custom IOContext to control the size of the display. Also better define df_big here rather than above, where it isn't used.
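
A minimal sketch of controlling the display size through IOContext. `print_rows` is a hypothetical printer that truncates by row count; it only illustrates how a `:displaysize` property reaches the printing code and is not how DataFrames decides what to show.

```julia
# Hypothetical printer: shows at most as many items as the display has rows,
# followed by a marker if some were left out.
function print_rows(io::IO, items)
    maxrows = displaysize(io)[1]
    for item in Iterators.take(items, maxrows)
        println(io, item)
    end
    length(items) > maxrows && println(io, "⋮")
    return nothing
end

# Wrap a buffer in an IOContext that reports a 3-row, 40-column display.
buf = IOBuffer()
io = IOContext(buf, :displaysize => (3, 40))

print_rows(io, 1:5)
output = String(take!(buf))
```

Keeping a reference to `buf` makes it possible to read the output back without depending on the terminal the tests run in.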

nalimilan · 2017-09-15T07:48:22Z

test/show.jl

-    df = DataFrame(A = 1:3, B = ["x", "y", "z"])
+    # In the future newline characte \n should be added to this test case
+    df = DataFrame(A = 1:4, B = ["x\"", "∀ε⫺0: x+ε⫺x", "z\$", "ABC"],
+                   C = Float32[1.0, 2.0, 3.0, 4.0])


Could you add a null value somewhere so that this is covered (unless it's done elsewhere already)?

It is already covered I believe in line:

df = DataFrame(Fish = ["Suzy", "Amir"], Mass = [1.5, null])

at the end of the file

Yes, but showcols isn't tested there. Would be worth adding a test.

added showcols test

bkamins · 2017-09-15T12:36:28Z

@nalimilan I hope I have managed to clean up everything.

nalimilan · 2017-09-15T12:49:40Z

test/show.jl

@@ -41,27 +39,181 @@ module TestShow
    refstr = """
    4×3 DataFrames.DataFrame



Just a detail, but we probably don't need an empty line? That would be more consistent with the other format.

nalimilan · 2017-09-15T12:51:31Z

test/show.jl

+    │ 24  │ 0.762276 │ 0.755415 │
+    │ 25  │ 0.339081 │ 0.649056 │"""
+
+    io = IOContext(IOBuffer(), :displaysize=>(10,40))


Maybe set the number of rows to a lower value in order to have smaller test and check what happens when not all rows can be shown in a single page? Can also be done in a later PR if you prefer.

I believe this is what I check here. I assume 10 rows and 40 columns. And you can see the difference between show and showall.
show limits the output to fit the page height and showall does not do that.
They also differ in how they handle wide data (not fitting the screen horizontally) when we set allcols to true: show does paging and showall prints the full table ignoring :displaysize (which could be useful when, e.g., we want to dump the DataFrame show result to a file).

nalimilan

Thanks, looks good to me! Maybe others have comments?

coveralls · 2017-09-15T15:20:20Z

Coverage increased (+0.9%) to 88.064% when pulling 73c3c8e on bkamins:newshow into 885078a on JuliaData:master.

coveralls · 2017-09-15T16:41:06Z

Coverage increased (+0.9%) to 88.064% when pulling b376700 on bkamins:newshow into 885078a on JuliaData:master.

cjprybol

@bkamins this looks great! Any idea what this error is from? https://ci.appveyor.com/project/nalimilan/dataframes-jl/build/1.0.392/job/0dmu2dx08fjf51ss#L133

edit: looks like an Int32/64 comparison issue

bkamins · 2017-09-19T09:19:07Z

@cjprybol fixed the issue with tests on 32 bit machine.

coveralls · 2017-09-19T19:17:50Z

Coverage increased (+0.9%) to 88.064% when pulling 3d18d7d on bkamins:newshow into 885078a on JuliaData:master.

Issue addressed

nalimilan · 2017-09-19T19:32:51Z

Thanks! Merging since Travis doesn't seem to be willing to run on Mac...

coveralls · 2017-09-19T23:02:14Z

Coverage increased (+0.9%) to 88.064% when pulling 3d18d7d on bkamins:newshow into 885078a on JuliaData:master.

nalimilan reviewed Jun 12, 2016

nalimilan mentioned this pull request Sep 5, 2016

Show davidagold/AbstractTables.jl#1

Open

bkamins closed this Oct 21, 2016

bkamins reopened this Oct 21, 2016

improved show

c59bbf9

nalimilan reviewed Sep 13, 2017

show code cleanup

14c321c

nalimilan reviewed Sep 15, 2017

additional cleanup of show

73c3c8e

nalimilan reviewed Sep 15, 2017

remove newline in showcols

b376700

nalimilan approved these changes Sep 15, 2017

cjprybol approved these changes Sep 19, 2017

cjprybol previously requested changes Sep 19, 2017

bkamins added 2 commits September 19, 2017 11:16

fix Int32 error in tests

32e5b30

fix typo in comment

3d18d7d

nalimilan merged commit e06ac96 into JuliaData:master Sep 19, 2017

bkamins deleted the newshow branch September 19, 2017 20:57

cjprybol mentioned this pull request Oct 1, 2017

Don't show entire 1M-column DataFrame by default #760

Closed
