include/types: format decimal numbers with decimal factor #19117

jan--f · 2017-11-23T11:48:24Z

Until now, bytes and object counts were formatted using si_t, which used 1024 as
the factor to pretty-print large numbers. For object counts, a factor of
1000 is better.
Fixes: http://tracker.ceph.com/issues/22095

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jan--f · 2017-11-23T11:50:24Z

I also changed the following based on @chardan's suggestion:

Remove the C-style cast to float
Remove // cppcheck-suppress noExplicitConstructor and mark constructors explicit

chardan · 2017-11-25T16:20:02Z

src/include/types.h

@@ -337,15 +337,46 @@ inline ostream& operator<<(ostream& out, const prettybyte_t& b)
  return out << b.v << " bytes";
 }

+namespace {


Yay! Anonymous namespaces are generally more expressive than "static".

chardan · 2017-11-25T16:20:48Z

src/include/types.h

+      const int index, const uint64_t mult)
+  {
+    char buffer[32];
+    char u = " KMGTPE"[index];


Is the leading space intentional? (This may deserve a comment.)

The namespace opens a new block so I indented everything in it. Not correct?

Sorry, I should have been more specific: I meant the leading space in the string on line 345. Looking at it again, I see that you're using the index to select from the array, so it almost certainly is intentional.
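To illustrate why the leading space matters, here is a minimal sketch of the indexing trick. It assumes a hypothetical helper named prefix_for, which is not part of the patch. Index 0 selects the space (no prefix), and each division by the factor moves one character to the right.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the patch: returns the prefix
// character for n when scaled by `factor` (for example 1000 or 1024).
inline char prefix_for(uint64_t n, uint64_t factor) {
  assert(factor > 1);
  static const char prefixes[] = " KMGTPE";  // index 0 is ' ': no prefix
  int index = 0;
  // Stop at 'E' (index 6) so the lookup never runs past the string.
  while (n >= factor && index < 6) {
    n /= factor;
    ++index;
  }
  return prefixes[index];
}
```

Without the leading space, values below the factor would be labelled 'K' and every larger value would be off by one prefix.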

gregsfortytwo · 2017-11-28T23:11:28Z

jenkins retest this please

jan--f · 2017-12-12T14:55:52Z

Anything blocking this @gregsfortytwo @liewegas ?

liewegas · 2017-12-12T15:00:11Z

Hmm, should we get pedantic here and add the 'i' (e.g. GiB instead of GB) for the 1024 units?

liewegas · 2017-12-12T15:00:47Z

The thing that worries me about this is that it's totally unclear to the user which is being used unless they've read the source. Currently we're consistently using powers of 2 at least.

jan--f · 2017-12-18T08:40:34Z

> Hmm, should we get pedantic here and add the 'i' (e.g. GiB instead of GB) for the 1024 units?

I'd be ok with this.

> The thing that worries me about this is that it's totally unclear to the user which is being used unless they've read the source. Currently we're consistently using powers of 2 at least.

Well maybe not totally unclear. This was actually reported by a user who expected bytes to use powers of 2 but counts (like object counts) to use powers of 10. Imho this is a reasonable assumption and could in any case be a reasonable convention. I'm also happy to send an email to the lists to see what people think.

liewegas · 2017-12-18T14:29:21Z

An email to ceph-devel and ceph-users with the specific proposal would be great! Thanks

liewegas · 2018-01-11T23:40:29Z

How about we go with two new macros that entirely replace the old si_t:

dec_unit_t for SI units (k, M, etc.)
bin_unit_t for binary units (Ki, Mi, etc.)

that way it's painfully obvious from the code which the programmer is using. (Suggestions for a more compact name welcome!)
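A rough sketch of what such a split could look like, using the names proposed above. The helper format_scaled, the to_string overloads, and the exact output format (no space before the unit, one fractional digit) are assumptions made for illustration, not the code that ended up in the PR.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical shared helper (not the PR's implementation): scales v by
// `factor` and appends the matching entry of the 7-element `units` table.
inline std::string format_scaled(uint64_t v, uint64_t factor,
                                 const char* const* units) {
  uint64_t mult = 1;
  int index = 0;
  // Seven entries per table, so never advance past index 6.
  while (index < 6 && v / mult >= factor) {
    mult *= factor;
    ++index;
  }
  char buf[32];
  if (v % mult == 0) {
    // Exact multiple (including index 0, where mult == 1): print an integer.
    std::snprintf(buf, sizeof(buf), "%llu%s",
                  static_cast<unsigned long long>(v / mult), units[index]);
  } else {
    // Otherwise print one digit after the decimal point.
    std::snprintf(buf, sizeof(buf), "%.1f%s",
                  static_cast<double>(v) / static_cast<double>(mult),
                  units[index]);
  }
  return buf;
}

struct dec_unit_t { uint64_t v; };  // counts: powers of 1000
struct bin_unit_t { uint64_t v; };  // bytes: powers of 1024

inline std::string to_string(dec_unit_t d) {
  static const char* const units[] = {"", "k", "M", "G", "T", "P", "E"};
  return format_scaled(d.v, 1000, units);
}

inline std::string to_string(bin_unit_t b) {
  static const char* const units[] = {"B",   "KiB", "MiB", "GiB",
                                      "TiB", "PiB", "EiB"};
  return format_scaled(b.v, 1024, units);
}
```

With this split, 1536 bytes renders as 1.5KiB while 1500 objects renders as 1.5k, and the type name at the call site shows which base is in use.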

liewegas · 2018-01-11T23:42:12Z

We should also update src/common/strtol.cc helpers to parse Ki Mi etc suffixes (and possibly rename the functions that have "si" in the name since that's not very accurate any more)

jan--f · 2018-01-16T12:05:41Z

Sounds all good to me. Will alter PR accordingly.

jan--f · 2018-01-23T15:36:07Z

Ok I replaced si_t with bin_u_t and dec_u_t. I would still be up for better names.

Also I would propose to replace the pretty print types kb_t, prettybyte_t and pretty_si_t (all in include/types.h) with either bin_u_t or dec_u_t. Afaiu they all serve a similar purpose. There would be a slight change in the resulting messages though, as these last three never print fractions and the technically incorrect units (say 12.2KiB would be printed instead of 12kB). Not sure about the consequences for anything that consumes logs though.

jan--f · 2018-01-23T15:57:41Z

oh and I forgot about src/common/strtol.cc. Coming up...

liewegas · 2018-01-24T00:25:35Z

aside: why is 12.2KiB incorrect? Because 2/10ths of 1024 isn't a whole number of bytes?

liewegas · 2018-01-24T00:27:59Z

Sounds good! I don't think we need to worry too much about output.. anything that relies on stable formatting should be consuming the json/xml output, not anything rendered.

liewegas · 2018-01-24T02:53:01Z

src/include/types.h

 };

-inline ostream& operator<<(ostream& out, const si_t& b)
+inline ostream& operator<<(ostream& out, const dec_u_t& b)


looks like dec_u and bin_u implementations are swapped

liewegas · 2018-01-24T02:54:27Z

src/include/types.h

+  uint64_t n = b.v;
+  uint64_t mult = 1;
+  int index = 0;
+  const char* u[] = {" ", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};


hmm, is this always going to be B (bytes)? I guess so?

yes bin_u_t should only be used for bits and bytes. I haven't encountered a place where bits are printed so adding the unit seemed reasonable. Will add comments to the structs.

jan--f · 2018-01-24T09:46:23Z

> aside: why is 12.2KiB incorrect? Because 2/10ths of 1024 isn't a whole number of bytes?

Sorry that was a little unclear. The old pretty* structs always printed say 12kB (without anything past the decimal point) whereas bin_u_t will print something like 12.2KiB if applicable. But as you said...shouldn't be a problem.

jan--f · 2018-01-25T11:59:27Z

Regarding src/common/strtol.cc: I have renamed strict_sistrtoll to strict_iecstrtoll and added the parsing of Ki, Mi and so on, i.e. it now parses both kinds of unit prefix with base 2. I'm aware this is inconsistent, but a) I think users would not appreciate it if we forced them to use Ki or Mi instead of K or M, and b) I found uses of this parsing only where byte sizes were concerned.
The one exception to this is src/tools/rados/rados.cc, which defines its own parser, rados_sistrtoll (though based on src/common/strtol.cc). Here both bytes and object counts are parsed with base 2. While I think this should be fixed, it might be best to consider this a separate issue.

Also, for the pretty-print structs I'm wondering if these should be named si_u_t and iec_u_t. While this technically reflects which unit prefixes and base are used, it might lead to the mistaken use of si_u_t for bytes.
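The parsing behaviour described for strict_iecstrtoll, where K and Ki both mean 1024, can be sketched roughly as follows. The function parse_iec and its std::optional return type are hypothetical and do not mirror the real signature in src/common/strtol.cc; the sketch also assumes C++17.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical parser, not Ceph's strict_iecstrtoll: accepts decimal
// digits followed by an optional prefix (K, M, G, T, P, E) and an
// optional 'i'. Both "4K" and "4Ki" are treated as base 2.
inline std::optional<uint64_t> parse_iec(const std::string& s) {
  std::size_t pos = 0;
  uint64_t value = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    const uint64_t digit = static_cast<uint64_t>(s[pos] - '0');
    if (value > (UINT64_MAX - digit) / 10) return std::nullopt;  // overflow
    value = value * 10 + digit;
    ++pos;
  }
  if (pos == 0) return std::nullopt;  // no digits at all
  if (pos == s.size()) return value;  // plain number, no suffix

  static const std::string prefixes = "KMGTPE";
  const std::size_t idx = prefixes.find(s[pos]);
  if (idx == std::string::npos) return std::nullopt;  // unknown suffix
  ++pos;
  if (pos < s.size() && s[pos] == 'i') ++pos;     // optional IEC 'i'
  if (pos != s.size()) return std::nullopt;       // trailing garbage

  const unsigned shift = 10u * static_cast<unsigned>(idx + 1);  // 2^10 per step
  if (value > (UINT64_MAX >> shift)) return std::nullopt;       // overflow
  return value << shift;
}
```

Accepting the bare K or M keeps existing byte-size inputs working, at the cost of the inconsistency mentioned above.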

liewegas · 2018-01-25T22:10:59Z

I'm good with si_u_t and iec_u_t. For the latter, though, since it's always bytes, maybe it should just be bytes_u_t or bytes_t?

The parsing change makes sense, though we probably want a different helper that does strict SI units that we use for non-bytes-based config options?

jan--f · 2018-01-26T14:40:07Z

OK, I added strict_sistrtoll. I'll also have a look at rados_sistrtoll. This highlights that the number parsing is a bit cumbersome to use (checking err), and almost all call sites ignore the actual error message. In this context I'm also looking into simplifying this code using some C++17 features. I will propose something next week (or we treat that as a different issue).
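As one possible shape for a strict SI helper, the sketch below returns the error together with the value instead of writing it through an err out-parameter. Everything here (parse_result, fail, parse_si, the error strings) is hypothetical and is not the strict_sistrtoll added in this PR, nor the simplification the author had in mind.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: an empty error string means success.
struct parse_result {
  uint64_t value = 0;
  std::string error;
  bool ok() const { return error.empty(); }
};

// Builds a failed result; value stays 0.
inline parse_result fail(const std::string& msg) {
  parse_result r;
  r.error = msg;
  return r;
}

// Hypothetical strict SI parser: decimal digits followed by at most one
// prefix letter (K, M, G, T, P, E), each step a factor of 1000.
// IEC suffixes such as "Ki" are rejected.
inline parse_result parse_si(const std::string& s) {
  std::size_t pos = 0;
  uint64_t value = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    const uint64_t digit = static_cast<uint64_t>(s[pos] - '0');
    if (value > (UINT64_MAX - digit) / 10) return fail("value out of range");
    value = value * 10 + digit;
    ++pos;
  }
  if (pos == 0) return fail("value not specified");

  if (pos < s.size()) {
    static const std::string prefixes = "KMGTPE";
    const std::size_t idx = prefixes.find(s[pos]);
    if (idx == std::string::npos || pos + 1 != s.size()) {
      return fail("unrecognized unit suffix");
    }
    // Multiply by 1000 once per prefix step, checking for overflow.
    for (std::size_t i = 0; i <= idx; ++i) {
      if (value > UINT64_MAX / 1000) return fail("value out of range");
      value *= 1000;
    }
  }
  parse_result r;
  r.value = value;
  return r;
}
```

A caller can then test ok() and read the message only when it needs it, rather than declaring an error string at every call site.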

chardan · 2018-01-26T15:07:31Z

src/common/strtol.cc

@@ -16,6 +16,7 @@

 #include <climits>
 #include <limits>
+#include <math.h>


Prefer the C++ header <cmath>, because it will place things into the std:: namespace.
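A small illustration of the point: with the C++ header, the math functions are declared in namespace std and can be called with the std:: qualifier, whereas <math.h> only guarantees the global-namespace names. The helper power_of is hypothetical and exists only for this example.

```cpp
#include <cmath>    // C++ header: declares std::pow and std::llround
#include <cstdint>

// Hypothetical helper for illustration: factor^exp for small exponents,
// computed in floating point and rounded to the nearest integer.
inline uint64_t power_of(uint64_t factor, int exp) {
  return static_cast<uint64_t>(
      std::llround(std::pow(static_cast<double>(factor), exp)));
}
```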

chardan · 2018-01-26T15:08:08Z

src/common/strtol.cc

 {
  std::string s(str);
  if (s.empty()) {
-    *err = "strict_sistrtoll: value not specified";
+    *err = "strict_iecstrtoll: value not specified";


The name of the function no longer matches.

tchaikov · 2018-04-15T14:49:23Z

the failure is addressed by #21433

ukernel · 2018-04-19T10:04:47Z

src/common/strtol.cc

+    errStr << "The option value '" << str << "' contains invalid digits";
+    *err =  errStr.str();
+    return 0;
+  }


This code is broken for strings that contain a hex number.

ukernel · 2018-04-19T10:07:47Z

The commit "include/types: format decimal numbers with decimal factor" breaks parsing of hex number strings.
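The full check is not shown in this thread, so the following is only a sketch of how this kind of regression can arise: a digits-only validation rejects input that a base-detecting conversion would accept. Both function names are hypothetical; std::strtoll with base 0 is the standard library call.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical pre-check: true only if every character is a decimal digit.
// A string such as "0x10" fails because of the 'x'.
inline bool all_decimal_digits(const std::string& s) {
  if (s.empty()) return false;
  for (unsigned char c : s) {
    if (!std::isdigit(c)) return false;
  }
  return true;
}

// Base 0 lets strtoll detect the base from the prefix:
// "0x.." is hex, a leading "0" is octal, anything else is decimal.
inline long long parse_auto_base(const std::string& s) {
  return std::strtoll(s.c_str(), nullptr, 0);
}
```

Here "0x10" converts to 16 on its own, but a caller that runs the digits-only check first would report invalid digits before the conversion is ever attempted.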

tchaikov · 2018-04-19T11:23:43Z

#21521

dillaman · 2018-04-19T13:25:38Z

Note: this PR broke RBD test cases (and upgrade:luminous-x test cases in the luminous branch).

dillaman · 2018-04-19T13:26:51Z

... for future reference, if a PR touches something in RBD-land, it really should undergo an RBD suite run

dillaman · 2018-04-20T15:40:53Z

Addressing failing RBD test cases under PR #21564

liewegas added the common label Nov 23, 2017

chardan reviewed Nov 25, 2017

jan--f force-pushed the jan-object-counts-decimal branch from 3caf7b4 to 97fb318 Compare January 23, 2018 15:22

jan--f force-pushed the jan-object-counts-decimal branch from 97fb318 to cf96b41 Compare January 23, 2018 15:47

liewegas reviewed Jan 24, 2018

jan--f force-pushed the jan-object-counts-decimal branch from cf96b41 to eb6962a Compare January 24, 2018 20:34

jan--f force-pushed the jan-object-counts-decimal branch from 5a669ed to 2fd3850 Compare January 26, 2018 14:37

chardan reviewed Jan 26, 2018

jan--f force-pushed the jan-object-counts-decimal branch from 2fd3850 to 059b89e Compare January 26, 2018 15:23

jan--f force-pushed the jan-object-counts-decimal branch from 24f909d to cd566d0 Compare April 13, 2018 07:48

Jan Fajerski added 2 commits April 13, 2018 18:07

qa/workunits/cephtool/test.sh: fix SI unit test, add IEC unit test

61504f1

Signed-off-by: Jan Fajerski <jfajerski@suse.com>

common/options: change mon_data_size_warn type to TYPE_SIZE

f931942

As the option represents a byte count, TYPE_SIZE is appropriate and the correct IEC unit prefixes will be parsed. Signed-off-by: Jan Fajerski <jfajerski@suse.com>

jan--f force-pushed the jan-object-counts-decimal branch from cd566d0 to f931942 Compare April 13, 2018 16:08

jan--f assigned tchaikov Apr 13, 2018

jan--f added the needs-qa label Apr 13, 2018

tchaikov added the wip-kefu-testing label Apr 14, 2018

tchaikov merged commit d4186fb into ceph:master Apr 15, 2018

tchaikov self-requested a review April 15, 2018 14:50

ukernel reviewed Apr 19, 2018
