Make omitting columns more obvious with ellipses #2330
The current printing has frequently fooled me into thinking that my dataset is missing columns. This simple change adds a few more visual indicators that the printing is incomplete, in the place where you'd first look, and avoids the "complete"-looking `┤` character.

Before:

```
julia> DataFrame([i .+ rand(100) for i in 1:20])
100×20 DataFrame. Omitted printing of 14 columns
│ Row │ x1      │ x2      │ x3      │ x4      │ x5      │ x6      │
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ 1.25535 │ 2.23703 │ 3.81429 │ 4.68575 │ 5.67015 │ 6.87798 │
│ 2   │ 1.77179 │ 2.36252 │ 3.24647 │ 4.38293 │ 5.93563 │ 6.21075 │
│ 3   │ 1.33585 │ 2.40766 │ 3.7644  │ 4.1805  │ 5.81954 │ 6.65656 │
│ 4   │ 1.35647 │ 2.52656 │ 3.24724 │ 4.41236 │ 5.32153 │ 6.99919 │
⋮
│ 96  │ 1.607   │ 2.82297 │ 3.295   │ 4.25163 │ 5.17801 │ 6.1482  │
│ 97  │ 1.7333  │ 2.71543 │ 3.41232 │ 4.36542 │ 5.80653 │ 6.7187  │
│ 98  │ 1.72051 │ 2.01457 │ 3.95139 │ 4.89867 │ 5.08942 │ 6.79844 │
│ 99  │ 1.59108 │ 2.09934 │ 3.24291 │ 4.727   │ 5.83118 │ 6.80525 │
│ 100 │ 1.79802 │ 2.86081 │ 3.53143 │ 4.61471 │ 5.90259 │ 6.05322 │
```

After:

```
julia> DataFrame([i .+ rand(100) for i in 1:20])
100×20 DataFrame. Omitted printing of 14 columns
│ Row │ x1      │ x2      │ x3      │ x4      │ x5      │ x6      │ ⋯
│     │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼ ⋯
│ 1   │ 1.02564 │ 2.27613 │ 3.30434 │ 4.26961 │ 5.96806 │ 6.42394 │
│ 2   │ 1.8685  │ 2.56581 │ 3.54104 │ 4.2908  │ 5.72022 │ 6.42958 │
│ 3   │ 1.56869 │ 2.03671 │ 3.03344 │ 4.14143 │ 5.60765 │ 6.72854 │
│ 4   │ 1.8403  │ 2.06023 │ 3.76093 │ 4.59431 │ 5.71541 │ 6.80651 │
⋮                                                                 ⋮ ⋯
│ 96  │ 1.30477 │ 2.59928 │ 3.64041 │ 4.968   │ 5.46243 │ 6.16648 │
│ 97  │ 1.00585 │ 2.5669  │ 3.65184 │ 4.55873 │ 5.07416 │ 6.53845 │
│ 98  │ 1.98973 │ 2.17375 │ 3.62001 │ 4.72003 │ 5.33069 │ 6.44367 │
│ 99  │ 1.35354 │ 2.50724 │ 3.8579  │ 4.0234  │ 5.6262  │ 6.81527 │
│ 100 │ 1.31846 │ 2.78889 │ 3.34216 │ 4.43111 │ 5.18149 │ 6.27506 │
```
|
I see there's a bigger refactor pending in #2087. In contrast, this is a very minimal patch that only addresses the plaintext output. |
|
#2087 is not really active. |
```
         totalwidth += maxwidths[j] + 3
-        if totalwidth > availablewidth
+        # Ensure there'd also be enough space to print the " ⋯" ellipses if needed
+        if totalwidth + (j < ncols ? 2 : 0) > availablewidth
```
Can you please test the case when the width is exactly equal to the number of columns in the terminal? This is a problematic corner case on some terminals (not all: the default Windows terminal has this problem, but the new Windows Terminal, for example, does not):

The strategy that should be used is to make it 1 char narrower, so that even the terminals that do not correctly handle the newline when they reach full width display the result correctly.
Maybe the fix you propose covers it but I would prefer to be sure :).
Yes, the existing tests do a good job of covering this case and I added a very minimal extension. Printing this data frame:
3fcdf6e#diff-21ef36be0213824dd286b390e3590742R62
uses exactly 40 chars for the first 3 columns without the ellipses. Thus with a displaysize of 40, it'll only print 2 columns if the output is elided, and it'll print 3 if all columns are printed (split). I also added a test with a display size of 42 to ensure it prints 3 columns with the ellipses.
This patch, though, doesn't give us that one extra buffer character. I can easily add it here now if you'd like.
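The width check discussed in this thread can be sketched as a small standalone function. This is only an illustration, not the DataFrames.jl implementation: `columns_that_fit` and its `margin` keyword are hypothetical names, and the per-column overhead (2 spaces plus 1 `│`, with 2 spaces plus 2 `│` for the row-label column) is an assumption chosen so that three 8-character columns with a 3-character row label take exactly 40 characters, as described above. It does not model split (all-columns) printing.

```julia
# Hypothetical helper, NOT the DataFrames.jl code: count how many leading
# columns fit into `availablewidth` terminal columns.
function columns_that_fit(colwidths::Vector{Int}, rowlabelwidth::Int,
                          availablewidth::Int; margin::Int = 0)
    ncols = length(colwidths)
    # Row-label column: content + 2 spaces + 2 `│` characters (assumed layout).
    totalwidth = rowlabelwidth + 4
    nfit = 0
    for j in 1:ncols
        # Data column: content + 2 spaces + 1 `│` character (assumed layout).
        totalwidth += colwidths[j] + 3
        # Reserve 2 characters for " ⋯" unless this is the last column.
        reserve = j < ncols ? 2 : 0
        # `margin` is a hypothetical knob for the extra buffer character.
        totalwidth + reserve > availablewidth - margin && break
        nfit = j
    end
    return nfit
end

widths = fill(8, 20)                          # 20 columns, 8 characters each
columns_that_fit(widths, 3, 40)               # 2: no room left for " ⋯"
columns_that_fit(widths, 3, 42)               # 3: columns plus " ⋯" fit exactly
columns_that_fit(widths, 3, 42; margin = 1)   # 2: one character kept free
```

With `margin = 1` the output never reaches the last terminal column, which is the behaviour requested for terminals that mishandle full-width lines.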
> doesn't give us that one extra buffer character

This is what I would prefer, please (otherwise we will get the display problem shown in the screenshot above on some terminals). This is a long-standing issue, and we have not worked on improving display in DataFrames.jl for a while, but I think we should fix it if we touch this part of the code.
Done in ed3464e
We'll see if my blind attempt at adjusting the tests worked...
|
Thank you. This is a good proposal. Fixing HTML should be easy also. |
|
@ronisbr - in general do you think that PrettyTables.jl is now stable enough to think of switching to it as a backend for DataFrames.jl? (this will be a major effort to do, but I think it is worthwhile to try it) |
|
Hi @bkamins! PrettyTables.jl is stable enough to be used. I have been using it every day for quite some time without issues! However, and this is the real problem, it is not ready to crop tables the way DataFrames does today in text mode. Since PrettyTables is supposed to handle any kind of entry, including cells with multiple lines, I have not yet been able to develop a good algorithm to print, for example, the beginning and the end of the table. Thus, today, it will always show the beginning of the table. I will see in the next few days whether I can improve this, and I will let you know if I make any progress. |
|
Btw, once I manage to add this feature, count on me to help migrate DataFrames to PrettyTables if you want. EDIT: Wait a second, looking at the code, IIUC, DataFrames only does this "crop in the middle" in the vertical direction? This should be much easier to implement in PrettyTables. |
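The vertical "crop in the middle" mentioned here amounts to choosing a head and a tail range of rows and printing `⋮` between them. A minimal sketch, assuming a hypothetical helper `visible_rows` (this is neither DataFrames.jl nor PrettyTables.jl code), could look like this:

```julia
# Hypothetical helper: pick which rows to print when only `maxrows` rows fit.
# Returns (head, tail); `tail` is empty when the whole table fits.
function visible_rows(nrows::Int, maxrows::Int)
    nrows <= maxrows && return (1:nrows, 1:0)
    nhead = fld(maxrows, 2)        # rows printed above the `⋮` line
    ntail = maxrows - nhead        # rows printed below the `⋮` line
    return (1:nhead, (nrows - ntail + 1):nrows)
end

visible_rows(100, 9)   # (1:4, 96:100)
visible_rows(5, 9)     # (1:5, 1:0)
```

The 4/5 split for 9 visible rows was chosen only because it reproduces the rows shown in the PR description (1 to 4, then 96 to 100).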
|
So, I believe that it would be better to have one common package for printing Tables.jl compliant values. It would greatly reduce the development effort. Actually, what I think would be the best way to go is to allow switching display backends in DataFrames.jl. In this way we could accept PrettyTables.jl "as is" and let the user decide which backend to use. The benefit of PrettyTables.jl is that it is much faster to render wide tables (something we could work on improving). On the other hand, we provide many kwargs like In general some defaults in PrettyTables.jl probably will have to be tweaked in DataFrames.jl backend as e.g. this does not look very nice: or or the way how As for showing first and last rows - this would be of course excellent to have (per your edit - it should be doable). However, can you show me an output of multi-row data formatting? As in all tests I was doing it was just one line with |
This will be nice since it will give me time to improve PrettyTables.jl for this case! Probably we will need features that are not available yet.
Actually, you can pass
Yes, we need to tweak a little bit for those cases. PrettyTables.jl always uses the output from The second example is harder, but I think it still can be managed by the The same can be applied for

```
julia> df
1×3 DataFrame
│ Row │ a     │ b       │ c       │
│     │ Int64 │ Nothing │ Missing │
├─────┼───────┼─────────┼─────────┤
│ 1   │ 1     │         │ missing │

julia> pretty_table(df,
                    formatters = ((v,i,j)->begin
                                      if isnothing(v)
                                          ""
                                      else
                                          v
                                      end
                                  end),
                    alignment = :l,
                    show_row_number = true,
                    hlines = [:header])
│ Row │ a     │ b       │ c       │
│     │ Int64 │ Nothing │ Missing │
├─────┼───────┼─────────┼─────────┤
│ 1   │ 1     │         │ missing │
```

DataFrames also add colors to
Yes! For this case I think I have a good solution :)
In this case you need the option |
|
In the mean time, I found something that I need to add to PrettyTables.jl. DataFrames does not consider the size of sub-header to define the size of the column. Thus, if the subheader is larger, then it gets cropped. This is very fine and I will add this support :) |
This is excellent. What I wanted to say is that we have concrete names for concrete kwargs (there are almost 10 of them I think) and to make things work seamlessly we would have to make sure that they are properly mapped (so it is not only required to have the functionality but also to make sure kwarg names match). Anyway, with the "backend switching" approach this is less of a problem, as different backends can reasonably accept different kwargs.
What I mean is that usually Also you currently ignore screen width of a character when printing (copy-paste this to a terminal as a browser also might do it incorrectly 😄 - but in DataFrames.jl the table is properly formatted in the terminal):
This is what I assumed (i.e. that we would tweak the default behaviour when providing this backend)
Do you mean this behaviour (I guess yes, and I think it is needed as it is very useful to save screen space): also I just found: which is inconsistent with how Base now works. All in all - I am passing these examples not as bug reports to PrettyTables.jl but to show you what things we considered in DataFrames.jl when designing printing, and you might decide to take some of them into account in PrettyTables.jl. Thank you for your work on that package! |
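On the screen-width point raised above: the number of characters in a string and the number of terminal columns it occupies can differ, so padding by character count misaligns cells that contain wide characters. The sketch below uses Base's `textwidth`; `pad_cell` is a hypothetical helper for illustration, not code from either package.

```julia
# Hypothetical helper: right-pad a cell to `width` terminal columns,
# measuring with display width rather than character count.
pad_cell(s::AbstractString, width::Int) = s * " "^max(0, width - textwidth(s))

length("日本語")       # 3 characters
textwidth("日本語")    # 6 terminal columns (each character is double-width)

cell = pad_cell("日本語", 8)
textwidth(cell)        # 8, so the next `│` lines up with ASCII rows
length(cell)           # 5: three wide characters plus two spaces
```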
Thanks @bkamins for pointing this out in JuliaData/DataFrames.jl#2330
Nice! I agree.
Thanks for pointing this out! I corrected this and will tag a new version soon:
Yes! I really liked it because type names can be very big in Julia (like
I thought it would be better to print Booleans as
Those were very, very good suggestions, thanks!
Thanks :) I hope it will be useful for DataFrames.jl! |
I am OK with |
I totally agree. I just need to find out how can I do this since I create the strings using: |
|
PrettyTables.jl now uses compact printing by default, as suggested by @bkamins here: JuliaData/DataFrames.jl#2330 If the user wants the old behavior (no compact printing), then the option `compact_printing = false` can be passed to `pretty_table`. This works in all backends.
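For readers unfamiliar with compact printing, Base Julia exposes it through the `:compact` property of an `IOContext`. The sketch below uses only Base; that the `compact_printing` option corresponds to this mechanism is an assumption here, not something stated in the commit message.

```julia
x = 1 / 3

# Default `show` prints enough digits to round-trip a Float64.
full = sprint(show, x)                                   # "0.3333333333333333"

# With `:compact => true`, `show` prints fewer digits.
compact = sprint(show, x; context = :compact => true)    # "0.333333"
```

Shorter cell strings mean narrower columns, so more columns fit before any have to be elided.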
|
Indeed,

```
julia> df = DataFrame(:a => [ rand(4), rand(4) ] );

julia> pretty_table(df,
                    alignment = :l,
                    show_row_number = true,
                    hlines = [:header])
│ Row │ a                                        │
│     │ Array{Float64,1}                         │
├─────┼──────────────────────────────────────────┤
│ 1   │ [0.721149, 0.503844, 0.121839, 0.203209] │
│ 2   │ [0.404156, 0.0868698, 0.9746, 0.931352]  │
```

EDIT: Sorry, I did not see your last comment. I have no opinion, feel free to open the issue wherever you want :) |
```
-    ⋮
+    ⋮                                      ⋮ ⋯
     │ 24  │ 10000024 │ 10000049 │ 10000074 │
     │ 25  │ 10000025 │ 10000050 │ 10000075 │"""
```
Maybe also print an ellipsis on the last row?
|
@mbauman - do you think you would have time to try finishing this PR? Thank you! |
|
bump |
|
This PR is obsolete given we have switched to PrettyTables.jl backend and now we have: |
