Skip to content

Conversation

@dgkf
Copy link

@dgkf dgkf commented Jan 20, 2020

Addresses #1779

Specifically,

  • For MIME"text/plain"
    • When allcols=true and any column would cause the display to overflow the IOContext, prints "mxn DataFrame. Display too narrow: Omitted printing of $ncols columns"
    • When allcols=false and no columns are displayed, only print the header (avoiding situations where all the row numbers continue to print despite all columns being omitted)
  • For MIME"text/html"
    • This was just an off-by-one error using searchsortedfirst
    • Also short-circuit the printing if no columns get printed as to not print a table of only row indices
  • For MIME"text/latex"
    • Added same off-by-one error fix for searchsortedfirst
    • I think this is all that's needed. It definitely prevents the inclusion of the column, but I think that it still prints out columnless rows of a table. I didn't render any documents to fully test this. I'm not sure what the expected behavior would be.

Added Tests:

  • Added tests for all MIME types and parameterized variants of behavior.
  • Latex was tested against a generated table that lacked the column text, but I haven't tried to render it and would appreciate guidance on expected result.

@dgkf dgkf changed the title Clean up column omission Clean up column omission when all columns overflow IOContext width Jan 20, 2020
@dgkf dgkf changed the title Clean up column omission when all columns overflow IOContext width Clean up column omission when columns overflow IOContext width Jan 20, 2020
header = displaysummary ? summary(df) : ""

# when allcols=true and any column overflows display, print omission message
if allcols && any(colwidths[end] .+ colwidths[1:end-1] .> displaysize(io)[2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when allcols=true we should disregard displaysize and just print everything if splitcols=false. What you propose should be done only if splitcols=true (splitcols=false means that the user wants to force printing everything as it is).

@bkamins
Copy link
Member

bkamins commented Jan 20, 2020

Thank you for the PR. I have left some minor comments.

I'm not sure what the expected behavior would be.

For LaTeX I think when we do not display any column we should not print any rows. Do you think the same?

@dgkf
Copy link
Author

dgkf commented Jan 20, 2020

For LaTeX I think when we do not display any column we should not print any rows. Do you think the same?

I agree. I should have added more description about why I think this is ambiguous. Unlike the "text/plain" and "text/html" outputs, the "text/latex" output doesn't currently print any sort of message about omitted columns. Should I add that message, or should we just output nothing?

@bkamins
Copy link
Member

bkamins commented Jan 20, 2020

Should I add that message, or should we just output nothing?

I would add a message (like in HTML).

Now I think you see that even simplest PRs are usually lead to many small decisions, as we have a lot of legacy code written several years ago in the package that no one touched since 😄 (and for big PRs it is even more complex as you have to also consider the whole JuliaData ecosystem).

@dgkf
Copy link
Author

dgkf commented Jan 21, 2020

I addressed all of your notes. Specifically:

  • now uses textlength
  • removed superfluous column condition (and redid a lot of the logic for splitcols anyways)
  • columns now get omitted after the first column that overflows the display when splitcols=false
  • all columns get omitted if any column individually overflows the display when splitcols=true
  • latex output now considers io display limits, printing an omission message below the table when columns aren't displayed.

A note, the existing latex overflow is compared is against io display width, which probably isn't indicative of when an outputted table will overflow the render target. I also couldn't find a way of reasonably generating a latex table in a limited io display, so I'm not sure what situations this arises in. Regardless, I don't think this is worth trying to fix in this pull request, and probably would be more appropriately solved in PrettyTables.jl.

@bkamins
Copy link
Member

bkamins commented Jan 21, 2020

When allcols==false then splitcols value should be ignored (see the docstring of show).

So I understand the logic should be:

  1. allcols==true && splitcols==true => stop printing at first over-wide column (this is how I understand it works in your PR)
  2. allcols==true && splitcols==false => always print everything disregarding display width (this is how I understand it works in your PR) also docstring should be updated to cover this case more precisely
  3. allcols==false => ignore splitcols and print only as much as fits in a single page (this should be fixed as currently you vary the behavior depending on splitcols value, and we do not split cols at all if allcols==false)

@dgkf
Copy link
Author

dgkf commented Jan 21, 2020

Thanks @bkamins. I based this on the docstring as is, so let me know if the desired functionality should change. This is how it works currently, and to my understanding is the desired functionality.

  • allcols == true: (always prints all columns, regardless of whether it would overflow display width)
    • splitcols == true or splitcols == false: when any chunk overflows, display all output, even if it overflows the display width.
  • allcols == false: (omit column as needed)
    • splitcols == true: when any chunk overflows, omit all output.
    • splitcols == false: when any chunk (in this case only the first, which would only overflow when the first column is too wide) overflows display, omit all output.

The issue this stemmed from raised the allcols == false && splitcols == false case. Should we also omit columns when allcols == true? My thinking is that we shouldn't.

@bkamins
Copy link
Member

bkamins commented Jan 21, 2020

Well - we have an inconsistency here in the docstrings, as https://github.com/JuliaData/DataFrames.jl/blob/master/src/abstractdataframe/show.jl#L582 says something different 😢.
We should make them consistent (especially that only https://github.com/JuliaData/DataFrames.jl/blob/master/src/abstractdataframe/show.jl#L582 is shown to the user as the other docstring is added to an internal function).

So this is my understanding what is a right functionality:

  • allcols == true: (always prints all columns, regardless of whether it would overflow display width)
    • splitcols == true or splitcols == false: when any chunk overflows, display all output, even if it overflows the display width. (bkamins: AGREED)
  • allcols == false: (omit column as needed) (bkamins: Here I would ignore splitcols following https://github.com/JuliaData/DataFrames.jl/blob/master/src/abstractdataframe/show.jl#L582 description and just print as many columns as fit the display width - I do not think there is a case when the user would prefer not to have any output if some output would be possible to be displayed)
    • splitcols == true: when any chunk overflows, omit all output.
    • splitcols == false: when any chunk (in this case only the first, which would only overflow when the first column is too wide) overflows display, omit all output.

chunkoverflows = chunkwidths .> displaysize(io)[2]

if !allcols && any(chunkoverflows)
chunkedcols = chunkedcols[1:(findfirst(chunkoverflows) - 1)]
Copy link
Member

@bkamins bkamins Jan 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not this and omit the next condition?

Suggested change
chunkedcols = chunkedcols[1:(findfirst(chunkoverflows) - 1)]
chunkedcols = chunkedcols[1:min(1, findfirst(chunkoverflows) - 1)]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't cover the case where no chunk overflows the display, but because allcols == false, only the first chunk is to be displayed anyways. I've reworked the code a bit to make it a bit clearer and will include changes in an upcoming commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I have meant min not max - fixed above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I understood what you were getting at. This still misses the case that I mentioned above.

If there's an alternative to findfirst(...) with a default value if nothing is found then we can clean this up quite a bit more, but as is we need to avoid findfirst returning nothing and throwing an error before we collapse the two conditionals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no matter if you changed it and it works. To avoid producing nothing use something(findfirst(...), the_value_you_want_on_nothing)

will fit within the width of the REPL window, this function will return an
array of chunk column indices that should be included in each chunk. For
example, if printing is divided into two chunks containing columns 1 through 3
and 4 through 5 respectively, `getchunkedcols` will return [[1, 2, 3], [4, 5]].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after reviewing the code it seems that it is better to return a vector of UnitRange - this is not a big deal as this method does not have to be super efficient, but it seems cleaner to return [1:3, 4:5] in your example - for sure it will allocate less (so it would be a better style).

│ │ Int64 │
├─────┼───────┤
│ 1 │ 1 │"""
@test str == """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal if we had tests for all cases of combinations of allcols and splitcols and overwide column in different placements you implement (maybe we have it already, but I am just asking to be sure).

@bkamins
Copy link
Member

bkamins commented Jan 23, 2020

For the future - a standard thing to do for CI is to define column as e.g. DataFrame(a=Int64[1,2,3]) so that it is consistent across architectures.

@dgkf
Copy link
Author

dgkf commented Jan 23, 2020

Thanks, keep the tips coming. I got a lot to learn in the Julia space. I did take a look at your other tests using data from rand where Int64 was specified, but figured String was a pretty safe bet and the exact type isn't important for the test.

@bkamins bkamins added the breaking The proposed change is breaking. label Feb 12, 2020
@bkamins
Copy link
Member

bkamins commented Feb 12, 2020

@dgkf - do you have time to work on it to mark it for 1.0 release? If not it can wait

@bkamins
Copy link
Member

bkamins commented Feb 15, 2020

I have resolved the comments I think are resolved. Three unresolved are left - could you please have a look at them. Thank you!

@bkamins bkamins added display and removed breaking The proposed change is breaking. labels Aug 7, 2020
@bkamins
Copy link
Member

bkamins commented Aug 31, 2020

Closing as we ended up with #2403 approach. Thank you for working on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants