New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved show for DataFrames #995
Conversation
Thanks. These positional arguments are really getting out of hand. We should probably get rid of I don't think |
src/abstractdataframe/show.jl
Outdated
#' | ||
#' @returns o::Void A `nothing` value. | ||
#' | ||
#' @examples | ||
#' | ||
#' df = DataFrame(A = 1:3, B = ["x", "y", "z"]) | ||
#' showcols(df, true) | ||
function showcols(io::IO, df::AbstractDataFrame) # -> Void | ||
#' showcols(STDOUT, df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example was correct, STDOUT
is implicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thould it should stay as the example with implicit STDOUT
is given below for definition of function showcols(df::AbstractDataFrame, allcols::Bool=false, values::Bool=true)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Anyway until we turn these into real docstrings this is pretty abstract.
@nalimilan Thx for the comments. I will correct the PR. For the record let me add that this kind of output also should be fixed:
|
@nalimilan I hope I have covered all your comments correctly. I have left Additionally I have changed the formula calculating the width of the string so that |
I think this would be a great fix; care to rebase? |
Sure - I thought it was rejected. |
Sorry, the package has struggled w/ maintenance over the last year or so as development moved over to DataTables.jl, but all that work has now been backported here and all development will now resume here. |
Sorry, I think I didn't finish the review because I wanted to get a complete understanding of the design space, and I didn't find the time for that at that point. |
I have started to update the PR and have one issue to clarify. Consider running the following line in REPL:
on master it shows:
and latest release it shows:
and if we cast
Which is the preferred target printing style? Or maybe yet some other option? |
I definitely think we should be consistent w/ array printing (last example). Show strings as quoted strings, as well as symbol that way. I think that's the only real solution if we want to be able to show unicode + control characters/whitespace and maintain the correct column widths. |
Agreed, the |
Thank you for the comments. Regarding refactoring of show I have some additional thoughts:
In short - the problem is complex so I will try to give it some thought and will open a separate issue and write down my recommendation. I will strip this PR from field width calculation changes and leave only improved display features. |
Yes, that would be a problem unless we add a header with the eltype of each column. But we can keep the quotes for now and discuss that possibility later.
Some truncation should probably be applied (I think you can do that by setting a property on |
@bkamins I think you need to rebase on the latest master. If there are still unrelated commits, use |
Agreed - that is why in this PR I have left only changes to One question regarding git: I have made a merge not a rebase and now I can see the bad consequences that all the intermediate commits got included. Is there any simple and safe way to fix it? |
Changes Unknown when pulling a58e9d6 on bkamins:newshow into ** on JuliaData:master**. |
Hmm... I guess the easiest solution would be to start a new branch from master, cherry-pick your commits into it, and then force push to this branch using |
Just as a comment: I believe that the build failed on Julia latest is unrelated to this PR. |
src/abstractdataframe/show.jl
Outdated
@@ -315,21 +317,30 @@ function showrows(io::IO, | |||
rowindices2::AbstractVector{Int}, | |||
maxwidths::Vector{Int}, | |||
splitchunks::Bool = false, | |||
rowlabel::Symbol = :Row, | |||
allcols::Bool = true, | |||
rowlabel::Symbol = Symbol("Row"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't change this line. Same below (twice).
src/abstractdataframe/show.jl
Outdated
#' @param allcols::Bool If `false` (default), only a subset of columns | ||
#' fitting on the screen is printed. | ||
#' @param values::Bool If `true` (default), first and last value of | ||
#' each column is printed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"are". Maybe also add "the" (not a native speaker here).
src/abstractdataframe/show.jl
Outdated
#' showcols(df, true) | ||
function showcols(io::IO, df::AbstractDataFrame) # -> Void | ||
#' showcols(STDOUT, df) | ||
function showcols(io::IO, df::AbstractDataFrame, allcols::Bool = false, values::Bool = true) # -> Void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep rows below 92 chars (same elsewhere).
src/abstractdataframe/show.jl
Outdated
nrows, ncols = size(df) | ||
if values && nrows > 0 | ||
if nrows == 1 | ||
metadata[:Values] = [Symbol(sprint(showcompact, df[1, i])) for i in 1:ncols] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use ourshowcompact
? Why do you need Symbol
?
src/abstractdataframe/show.jl
Outdated
#' count. | ||
#' | ||
#' @param df::AbstractDataFrame An AbstractDataFrame. | ||
#' @param allcols::Bool If `false` (default), only a subset of columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just call this all
, since "col" is already clear from the function's name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still applies, right?
test/show.jl
Outdated
|
||
io = IOBuffer() | ||
show(io, df) | ||
show(io, df, true) | ||
showall(io, df) | ||
showall(io, df, true) | ||
showall(io, df, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you test the actual output? You can use triple-quoted strings for that.
@nalimilan Inline comments got removed so I reply here:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Sorry for bothering you with tests, but that's the only way to ensure somebody doesn't break your improvements in the future.
src/abstractdataframe/show.jl
Outdated
@@ -297,6 +297,8 @@ end | |||
#' required to render each column. | |||
#' @param splitchunks::Bool Should the printing of the AbstractDataFrame | |||
#' be done in chunks? Defaults to `false`. | |||
#' @param allcols::Bool Should only one chunk be printed if printing in | |||
#' chunks? Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaults to false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to true
src/abstractdataframe/show.jl
Outdated
if isempty(rowindices1) | ||
if displaysummary | ||
println(io, summary(df)) | ||
end | ||
return | ||
end | ||
|
||
rowmaxwidth = maxwidths[ncols + 1] | ||
chunkbounds = getchunkbounds(maxwidths, splitchunks, displaysize(io)[2]) | ||
nchunks = length(chunkbounds) - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be clearer to do nchunks = allcols ? length(chunkbounds) - 1 : min(nchunks, 1)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed (with a bit different code as nchunks
is undefined before this line)
src/abstractdataframe/show.jl
Outdated
showall(io, metadata, true, Symbol("Col #"), false) | ||
nrows, ncols = size(df) | ||
if values && nrows > 0 | ||
# type of Values column is now String; it might need to be changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment is needed: tests will (or should) catch this and people will figure out what needs to be changed anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/abstractdataframe/show.jl
Outdated
#' count. | ||
#' | ||
#' @param df::AbstractDataFrame An AbstractDataFrame. | ||
#' @param allcols::Bool If `false` (default), only a subset of columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still applies, right?
src/abstractdataframe/show.jl
Outdated
# type of Values column is now String; it might need to be changed | ||
# if the way strings are printed in data frames changes | ||
if nrows == 1 | ||
metadata[:Values] = [sprint(showcompact, df[1, i]) for i in 1:ncols] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what about using ourshowcompact
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed - but it creates problems in corner cases (described in TODO for getmaxwidths
)
4×3 DataFrames.DataFrame | ||
│ Row │ A │ B │ C │ | ||
├─────┼───┼───────────────┼─────┤ | ||
│ 1 │ 1 │ x\" │ 1.0 │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the fact that vertical lines are not aligned is a bug elsewhere? Then better leave a TODO
somewhere to make it clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are aligned when the string is printed, but "
needs to be escaped in string literal which breaks alignment in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, of course!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually there is a problem - in the next line │ 2 │ 2 │ ∀ε⫺0: x+ε⫺x │ 2.0 │
which is not aligned properly and it is a TODO do be added for getmaxwidths
function. Sorry for confusion
test/show.jl
Outdated
show(io, df, true) | ||
showall(io, df) | ||
showall(io, df, true) | ||
show(io, df_big) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also test the output of these functions (even if that's verbose, it's fine)? Else the case when there are too many rows won't be covered. You should probably pass a custom IOContext
to control the size of the display. Also better define df_big
here rather than above, where it isn't used.
test/show.jl
Outdated
df = DataFrame(A = 1:3, B = ["x", "y", "z"]) | ||
# In the future newline characte \n should be added to this test case | ||
df = DataFrame(A = 1:4, B = ["x\"", "∀ε⫺0: x+ε⫺x", "z\$", "ABC"], | ||
C = Float32[1.0, 2.0, 3.0, 4.0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a null
value somewhere so that this is covered (unless it's done elsewhere already)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already covered I believe in line:
df = DataFrame(Fish = ["Suzy", "Amir"], Mass = [1.5, null])
at the end of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but showcols
isn't tested there. Would be worth adding a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added showcols
test
@nalimilan I hope I have managed to clean up everything. |
test/show.jl
Outdated
@@ -41,27 +39,181 @@ module TestShow | |||
refstr = """ | |||
4×3 DataFrames.DataFrame | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a detail, but we probably don't need an empty line? That would be more consistent with the other format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
│ 24 │ 0.762276 │ 0.755415 │ | ||
│ 25 │ 0.339081 │ 0.649056 │""" | ||
|
||
io = IOContext(IOBuffer(), :displaysize=>(10,40)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe set the number of rows to a lower value in order to have smaller test and check what happens when not all rows can be shown in a single page? Can also be done in a later PR if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is what I check here. I assume 10 rows and 40 columns. And you can see the difference between show
and showall
.
show
limits the output to fit page height and showall
does not do that.
They also differ in how they handle wide data (not fitting the screen vertically) and we set allcols
to true
: show
does paging and showall
prints full table ignoring :displaysize
(which could useful, when e.g. we want to dump DataFrame
show result to a file).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good to me! Maybe others have comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkamins this looks great! Any idea what this error is from? https://ci.appveyor.com/project/nalimilan/dataframes-jl/build/1.0.392/job/0dmu2dx08fjf51ss#L133
edit: looks like an Int32/64
comparison issue
@cjprybol fixed the issue with tests on 32 bit machine. |
Thanks! Merging since Travis doesn't seem to be willing to run on Mac... |
A proposal to solve #760.
Summary of changes:
show
andshowall
get a new argumentonechunk
which limits the number of printed chunks to 1 ifsplitchunks
istrue
(with appropriate message in the summary)showcols
gets two parametersallcols
(if all columns of anAbstractDataFrame
should be printed or only those fitting on the screen) andvalues
(if sample of values of anAbstractDataFrame
should be printed)showcompact
is defined forAbstractDataFrame
andGroupedDataFrame