-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improper use of IEC and SI prefixes for size in size_trunc()/size_trunc_len() #3666
Comments
GitHub branch: https://github.com/michael-o/mc/tree/ticket-3666 |
|
Tested on:
and
with these locales:
Unfortunately, I wasn't able to run autogen.sh on Ubuntu 14.04 LTS due to:
|
Hi michael-o,
Could you please clarify what user facing changes this patch introduces, what user facing bugs it fixes? Also, how come that the "%s byte" and "%s bytes" strings disappear?
What will happen to the "Size" column, which is 7 characters by default? Is it the plan to use pedantic Ki, Mi whatever prefixes there, "wasting" 2 out of the 7 characters, leaving only 5 for the actual number (and hence reduce by a factor of 10 the boundaries before switching to a larger unit)?
Being sloppy rather than pedantic by not using the proper "Ki", "Mi" etc. is not nice, but perhaps justifiable by limited screen real estate. Not being consistent within mc is IMO much worse.
I'm not against modifying mc and making it more technically correct, I'd just like to make sure we're going to do it consistently, and understand and accept the downsides of the changes if any.
Thanks! |
Hi egmont,
hopefully you have read the GitHub issue first? I'd be happy to reply to you:
This is the first patch in a series of patches.
Replying to egmont:
Currently, this patch fixes only the total size calculation for files and directories as well as the directory popup when it is being recursively traversed for size calculation (.e.g, /usr which is usally large).
"%s byte" and so forth has been removed because it is redundant now, properly scaled units are done in size_trunc() already, thus this msg id is not necessary. Additionally, a few translations even had duplicate words in it (German and others). (See the applied diff)
Currently, nothing will happen to "size", "bsize" and alike. This is going to be addressed in a separate ticket. I did not want to introduce a huge change with a single ticket which cannot be easily reviewed. My plan is to fix size calculation which can properly scale between 0 and 9999 <scaled unit>, consuming at most 8 chars.
I am aware of the limited size but accoring to the explanation above, size and bsize will only increase by 1 char if scaling will work as expected. Consisitently is key for me. As I already layed out. This is a separte ticket.
I fully agree and do not have an opposite opinion.
The first reason why I did not patch everything is because I do not waste time for a contribution where people might completely ignore or refuse to merge it. This has happened too often. As long as devs are at least willing to review and comment, I am happy to provide patches.
If you are uncertain, you can branch off master as 'size-display' (umbrella ticket necessary?), apply all of my patches there, see the result and merge it as a whole as soon as work has been completed.
Is that acceptable for you? |
Branch: 3666_iec_si_prefixes
Prefixes should be translatable. In the 2nd (temporary) commit of this branch I made that. All po updates are in the separate (last) commit. |
Replying to andrew_b:
Great thank you. The translation is one of the ideas I already had in mind. I will check that because at least two languages do not use the term byte but octet (fr, ro). Reference, last paragraph.
I will continue with further changes. |
|
Here is another patch for missing bits in Andrey's fixups. I have obstained to translate Bulgarian and Mongolian because I don't know wether they retain the Latin script or have Cyrillic terms. Other Cyrillic-based langs have been fixed, as well as octet (o) users. |
Replying to michael-o:
There is no need to fix *.po here. Translations are made via Transifex by native-speaking translators. |
Replying to andrew_b:
So we simply need to wait for them? |
Replying to michael-o:
Yes. |
Replying to andrew_b:
Fine, I can ignore my commit and will continue on the rest. Thanks for the clarification. |
Let's take a note about a UI change here: what used to be "12,345 bytes in 67 files" on the UI will from now on be "12,345 B in 67 files". Correct?
Especially if we save a few characters this way, could we write out the exact number if it fits? E.g. instead of "12,345,678 KiB" why not say e.g. "12,641,974,273 B"?
Since you said 9999 is the upper limit, I assume you're planning to properly continue it with a space and then the entire unit. E.g. what used to be "1234567" until now will from now on be "1234 kB" or "1205 KiB" (depending on the preferences), am I following you correctly?
How about small files, do you plan to add the " B" suffix there?
Until now, the first suffix appeared at the file size of around 10 megs (that is: "9999999" was displayed as-is, the next value became "9766K"). Except for this first change, all subsequent changes occurred around multiples of 1000 (or 1024), e.g. roughly at 1 gigabyte the display format changed from "K" to "M", and presumably roughly at 1 terabyte it changes from "M" to "G".
If you allow numbers up to 9999, you modify these boundaries to be roughly at 10 megs, 10 gigs etc. E.g. 9 megs would be something like "9000 KiB" whereas 11 megs would be "11 MiB". This I believe makes it much harder to subconsciously "feel" the file size magnitude by looking at it.
I'm wondering if maybe the right approach would be to allow numbers only up to 999 or 1023, showing a bit of decimal fraction digits. This way the suffix would directly give you the feeling about the file size's magnitude. E.g. instead of showing "1234567" or "1205 KiB", we could show "1.17 MiB". What do you think? (I'm not sure this would be the best, just wondering...)
Increase by 1 char, resulting in the filename shrinking from 16 to 15 chars at the default terminal width of 80 columns; and you still lose two digits of information. In turn, a lot of screen real estate would be wasted for those spaces as well as "i" and "B" letters.
Yup, overall, if I understand your proposal correctly, we'll lose 3 character columns per panel. One for an aestethical space, one for a pedantic "i", and one for that "B" carrying no information whatsoever. I'm not happy, even though I'm not sure what would be a better solution. At least I'd like to have some discussion/brainstorming about it.
For starter, how about leaving everything as-is now, except for fixing the upper/lowercase k/m/g letters (if they're broken at all), and adding an "i" to the header if IEC is selected? This one also obviously has downsides (around the translation of that "Size(i)" label) but let's see if this idea can be continued...
By the way, note how much my proposal in ticket #3087 got frowned upon... |
Replying to egmont:
Correct. I do also plan to create a ticket which drops the comma two simple reasons: it is not locale aware and completely confuses people which do not use en_* locales like I do here in German or Russian (which I am a native).
Of course we could, the people do not want to see huge numbers, since humans are bad in reading a lot of digits. Its like reading an IBAN with several consecutive zeroes.
Almost, yes. In the first place, I do not want to fiddle with the scaling logic as in size_trunc_len(). This is too much change in one patch. The logic mostly relies on the len input limited by the buffer which produces for the same size different outputs in different spots, like size column or info panel. This confuses me too.
Yes, it clearly denotes the unit. Suffixes w/o units aren't valid as per definition.
Personally, I completely agree with you. I do dislike the scaling logic which is implemented right now because it scales way too late and shows huge numbers like 120345 kB. This is impossible to read. But as I have said before, I want to fix the units first and then take care of the scaling logic. Though, I won't change it without your consent.
I have applied a similar approach a couple of months ago to a formatter written for Apache Maven, displaying file sizes and download speed. See nested class FileSizeFormat along with test cases.
That is correct. A sacrifice for technical correctness. Consider that proper GUIs/operating systems do use only SI prefixes for storage sizes, Gnome, OS X, etc. Microsoft is too stupid for that. Ultimately, people do not really understand why 1 GB is not 1 000 000 000 B.
In the first place, you will lose 3 chars but add proper, unambigious information. There is no 'aestethical space' because this is always mandatory between number and unit. "B" IS the unit. k, M, G aren't units. In fact, K is Kelvin. This is even more confusing.
You would lose at most 1 char, if you would apply a better scaling logic which I have proposed a few paragraphs above.
I am perfectly fine with a discussion where we can retain space efficiency and proper readability.
From my point of view, size_trunc() and size_trunc_len() can be collapsed into one method with less logic and code. Especially size_trunc_len() performs operations for two different concerns which it shouldn't. During my review, I found several other spots where no/fixed scale is applied and only base 2 is used, regardless of the settings.
I read the entire conversation and it was quite daunting. I follow your point and if I check the shortcuts with my German keyboard, I have absolute no idea what M-h means and how I can even enter some of them because the key combination is impossible without a n-rollover keyboard. |
Replying to michael-o:
[long format]
These two seem to contradict each other. I agree that reading a large unstructured number is very hard. So why make it even harder by dropping the separator? I guess it should be fixed instead: make the thousands-separator locale-dependent.
An unstructured number has IMO two advantages over the one with thousands separators: easy copy-pasting (let's say to "bc"), as well as iTerm 3 beta and gnome-terminal 3.20 showing useful information about the number on right click (gnome-terminal ticket with screenshot: https://bugzilla.gnome.org/show_bug.cgi?id=741728). Despite these two, I'd rather go for mc structuring these numbers properly though.
Another idea to explore might be to use different color/boldness/underlying for every 3 characters, e.g. "1234567890 bytes" or some other visual differentiation.
Or, similarly to the gnome-terminal feature, mc could always show the entire number as well as a Ki / Mi / Gi converted value, e.g. "1234567890 (~1.14 GiB)". (Is there enough room for this? Probably not always.)
[short format]
I have a strong guts feeling that changing the current format to having those " B", " KiB" etc. suffixes and in turn much shorter room for the actual number will be something that many people will strongly dislike, and would prefer to revert to the old display format (which is technically less correct, but shows more information on the available limited space).
At the very least I strongly recommend that whatever you'll be doing with the Size field should have a config option (or a new field type in User-defined Listing mode) to revert to the old behavior. I'd also be happy to hear others' opinion here :) |
|
Replying to egmont:
I agree with copy and paste. This is indeed an issue. Regarding your Gnome issue, GNU ls supports 'ls --si' and you are done. Your screenshot has prefixes only and misses the unit.
I would show the full byte size as well as the scaled one in the info/detail panel only.
An option is always possible and I am not rejecting it. We should though by default use the proper format and have a fallback option.
I have attached a patch (scaling.patch) for scaling for size_trunc(). Apply locally and look how it works. It makes magnitudes instantly visible. This is actually what I would ultimately propose. I haven't yet touched size_trunc_len() but will if you want to see how it might look like at the end. |
|
Replying to michael-o:
That screenshot is just one example, the point was to make it work with any command, not just ls. Unit is missing because gnome-terminal cannot possibly know what the unit of the number is, it only recognizes the number itself (it would be incorrect to assume bytes).
Note, however, how 'ls --si' also happily omits the space and the letters "i" and "B" to save space, even though (unlike mc) it has as much space as it wants to (it can wrap a file's entry into the next terminal row, no problem). I guess I'd rather see mc taking this direction too, rather than being fully pedantic. |
|
Egmont, I have created two more patches (0001-size_trunc-autoscale-size-with-proper-prefixes.patch and 0002-size_trunc_len-drastically-simplify-function-by-appl.patch) which autoscale the size up to gigabytes/gibibytes (TB and TiB can be added later). All static buffer which are too small have been increased. You only need at most 9 chars including NULL. Note that for non-Latin languges more than 9 bytes are necessary (UTF-8 multibyte). This has been added to the small buffers as well as for the size and bsize columns. I cannot tell about the big ones whether they might overflow or not. From my testing, it does not. This is before the patch:
"size"/"bsize" grow by one char only.
What is left:
If you intend to have the "classic" style, we could probably add this option:
Consider that buffers would still need to retain as big as necessary for "modern" style unless you have a better idea how to accomondate both.
WDYT? |
|
|
|
0001-size_trunc-autoscale-size-with-proper-prefixes.patch: I think we shouldn't lose the high precision in display_total_marked_size(). This is single place where such precision is shown unlike other places where scaling is applied. |
Replying to andrew_b:
Interesting point, though I do not understand why. The accumulated byte value can get really large with 9, 10 or more places, barely readable. I would expect to see the raw value *and* the scaled value in the info panel: 102 MB (102124456999 B, 199461830 blocks).
Can you explain why the precision is necessary in the total marked size? |
Replying to michael-o:
And why not? We already have it for many years. Why we should lose it? |
I second andrew_b's opinion here. There might be use cases we don't even think of. Doing arithmetics with that number outside of mc, using as a basic "checksum" to check whether the size is what's expected, etc... There's room to show that information, so let's do it. The info panel is IMO cumbersome to get to. |
Replying to egmont:
I see. So you would prefer to see "xxx MB (yyy B) in n directories"? |
michael-o: and...? I mean, really... |
andrew_b, mooffie, egmont: I suggest the following - I will make an rc this weekend, without this branch, and we will see if we merge it after the release. The skins are now committed, quite some stuff has accumulated in the NEWS file and last release was in fall. I think we should really make one now. What do you think? |
Replying to zaytsev:
Please go ahead with your RC and don't wait for me! Other annoying issues at work held me off doing a verification of the branch. |
Replying to zaytsev:
I agree. This one's a controversial change, and hasn't quite been tested in git master. Let's not commit it last minute before a release, let's stay safe instead (and commit soon after a release, if it's believed to be production ready). |
|
michael-o, are you still alive? |
Replying to zaytsev:
Partially, fighting with angina and 38° fever... |
Ping. |
No reaction from submitter... |
No reaction from submitter... |
|
|
|
|
|
|
|
Important
This issue was migrated from Trac:
michael-o
(@michael-o)egmont
(@egmontkob),zaytsev
(@zyv),mooffie
(@mooffie),info@….net
(@metux)This is a spinoff from https://github.com/MidnightCommander/mc/issues/109
Looking at lib/util.c, I see that virtually all SI prefixes are plain wrong. Additionally, man page talks about SI units, there aren't any. Only SI prefixes are incorrectly used. Bases arent 1024 and 1000, they are 2 and 10.
The best solution is to provide IEC prefixes (binary, KiB, MiB, GiB) and SI prefixes (decimal, kB, MB, GB), though file sizes (opposed to memory) are always calculated with base 10.
Patch is in preparation for this function and its callers.
Note
Original attachments:
michael-o
(@michael-o) onJul 27, 2016 at 15:34 UTC
michael-o
(@michael-o) onJul 29, 2016 at 8:16 UTC
michael-o
(@michael-o) onJul 30, 2016 at 13:31 UTC
michael-o
(@michael-o) onJul 30, 2016 at 13:38 UTC
michael-o
(@michael-o) onJul 30, 2016 at 22:02 UTC
michael-o
(@michael-o) onJul 30, 2016 at 23:15 UTC
michael-o
(@michael-o) onJul 30, 2016 at 23:15 UTC
michael-o
(@michael-o) onAug 25, 2016 at 18:00 UTC
michael-o
(@michael-o) onSep 3, 2016 at 10:56 UTC
michael-o
(@michael-o) onSep 4, 2016 at 20:16 UTC
michael-o
(@michael-o) onNov 1, 2016 at 11:59 UTC
The text was updated successfully, but these errors were encountered: