Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

include/types: format decimal numbers with decimal factor #19117

Merged
merged 7 commits into from
Apr 15, 2018

Conversation

jan--f
Copy link
Contributor

@jan--f jan--f commented Nov 23, 2017

Until now bytes and objects were formatted using si_t which used 1024 as
the factor to pretty print large numbers. For object counts a factor of
1000 is better.
Fixes: http://tracker.ceph.com/issues/22095

Signed-off-by: Jan Fajerski jfajerski@suse.com

@jan--f
Copy link
Contributor Author

jan--f commented Nov 23, 2017

I also changed the following based on @chardan suggestion:

  • Remove c-style cast to float
  • remove // cppcheck-suppress noExplicitConstructor and mark constructors explicit

@@ -337,15 +337,46 @@ inline ostream& operator<<(ostream& out, const prettybyte_t& b)
return out << b.v << " bytes";
}

namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! Anonymous namespaces are generally more expressive than "static".

const int index, const uint64_t mult)
{
char buffer[32];
char u = " KMGTPE"[index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the leading space intentional? (This may deserve a comment.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The namespace opens a new block so I indented everything in it. Not correct?

Copy link
Contributor

@chardan chardan Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have been specific-- I meant the one in the string in line 345. Looking at it again, I see that you're using the index to select from the array, so it almost certainly is.

@gregsfortytwo
Copy link
Member

jenkins retest this please

@jan--f
Copy link
Contributor Author

jan--f commented Dec 12, 2017

Anything blocking this @gregsfortytwo @liewegas ?

@liewegas
Copy link
Member

Hmm, should we get pedantic here and add the 'i' (e.g. GiB instead of GB) for the 1024 units?

@liewegas
Copy link
Member

The thing that worries me about this is that it's totally unclear to the user which is being used unless they've read the source. Currently we're consistently using powers of 2 at least.

@jan--f
Copy link
Contributor Author

jan--f commented Dec 18, 2017

Hmm, should we get pedantic here and add the 'i' (e.g. GiB instead of GB) for the 1024 units?

I'd be ok with this.

The thing that worries me about this is that it's totally unclear to the user which is being used unless they've read the source. Currently we're consistently using powers of 2 at least.

Well maybe not totally unclear. This was actually reported by a user who expected bytes to use powers of 2 but counts (like object counts) to use powers of 10. Imho this is a reasonable assumption and could in any case be a reasonable convention. I'm also happy to send an email to the lists to see what people think.

@liewegas
Copy link
Member

An email to ceph-devel and ceph-users with the specific proposal would be great! Thanks

@liewegas
Copy link
Member

How about we go with two new macros that entirely replace the old si_t:

  • dec_unit_t for SI units (k, M, etc.)
  • bin_unit_t for binary units (Ki, Mi, etc.)

that way it's painfully obvious from the code which the programmer is using. (Suggestions for a more compact name welcome!)

@liewegas
Copy link
Member

We should also update src/common/strtol.cc helpers to parse Ki Mi etc suffixes (and possibly rename the functions that have "si" in the name since that's not very accurate any more)

@jan--f
Copy link
Contributor Author

jan--f commented Jan 16, 2018

Sounds all good to me. Will alter PR accordingly.

@jan--f
Copy link
Contributor Author

jan--f commented Jan 23, 2018

Ok I replaced si_t with bin_u_t and dec_u_t. I would still be up for better names.

Also I would propose to replace the pretty print types kb_t, prettybyte_t and pretty_si_t (all in include/types.h) with either bin_u_t or dec_u_t. Afaiu they all serve a similar purpose. There would be a slight change in the resulting messages though, as these last three never print fractions and the techinically incorrect units (say 12.2KiB would be printed instead of 12kB). Not sure about the consequences for anything that consumes logs though.

@jan--f
Copy link
Contributor Author

jan--f commented Jan 23, 2018

oh and I forgot about src/common/strtol.cc. Coming up...

@liewegas
Copy link
Member

aside: why is 12.2KiB incorrect? Because 2/10ths of 1024 isn't a whole number of bytes?

@liewegas
Copy link
Member

Sounds good! I don't think we need to worry too much about output.. anything that relies on stable formatting should be consuming the json/xml output, not anything rendered.

};

inline ostream& operator<<(ostream& out, const si_t& b)
inline ostream& operator<<(ostream& out, const dec_u_t& b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like dec_u and bin_u implementations are swapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thx.

uint64_t n = b.v;
uint64_t mult = 1;
int index = 0;
const char* u[] = {" ", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, is this always going to be B (bytes)? I guess so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes bin_u_t should only be used for bits and bytes. I haven't encountered a place where bits are printed so adding the unit seemed reasonable. Will add comments to the structs.

@jan--f
Copy link
Contributor Author

jan--f commented Jan 24, 2018

aside: why is 12.2KiB incorrect? Because 2/10ths of 1024 isn't a whole number of bytes?

Sorry that was a little unclear. The old pretty* structs always printed say 12kB (without anything past the decimal point) whereas bin_u_t will print something like 12.2KiB if applicable. But as you said...shouldn't be a problem.

@jan--f
Copy link
Contributor Author

jan--f commented Jan 25, 2018

Regarding src/common/strtol.cc: I have renamed strict_sistrtoll to strict_iecstrtoll and added the parsing of Ki, Mi and so on. I.e. it parses now both unit prefixes with base 2. I'm aware this is inconsistent but a) I think users would not appreciate if we forced them to use Ki or Mi instead of K or M and b) I found only uses of this parsing when byte sizes where concerned.
The one exception to this is src/tools/rados/rados.cc which defines its own parsing (though based on src/common/strtol.cc) rados_sistrtoll. Here both bytes and object counts are parsed with base 2. While I think this should be fixed, it might be best to consider this a seperate issue.

Also for the pretty print structs I'm wondering if these should be named si_u_t and iec_u_t. While this techincally reflects which unit prefixes and base are used, it might lead to the mistaken use of si_t_u for bytes.

@liewegas
Copy link
Member

I'm good with si_u_t and iec_u_t. For the latter, though, since it's always bytes, maybe it should just be bytes_u_t or bytes_t?

The parsing change makes sense, though we probably want a different helper that does strict SI units that we use for non-bytes-based config options?

@jan--f
Copy link
Contributor Author

jan--f commented Jan 26, 2018

OK I added strict_sistrtoll. I'll also have a look at rados_sistrtoll. This kind of highlights that this number parsing is a bit cumbersome to use (checking err) and most all sites irgnore the actual error message. In this context I'm also looking into simplifying this code using some c++17 features. Will propose something next week (or we treat that as a different issue).

@@ -16,6 +16,7 @@

#include <climits>
#include <limits>
#include <math.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer C++ inclusion with , because it will place things into the std:: namespace.

{
std::string s(str);
if (s.empty()) {
*err = "strict_sistrtoll: value not specified";
*err = "strict_iecstrtoll: value not specified";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the function no longer matches.

Jan Fajerski added 2 commits April 13, 2018 18:07
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
As the option represents a byte count, TYPE_SIZE is appropriate and the
correct IEC unit prefixes will be parsed.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
@tchaikov tchaikov merged commit d4186fb into ceph:master Apr 15, 2018
@tchaikov tchaikov self-requested a review April 15, 2018 14:50
errStr << "The option value '" << str << "' contains invalid digits";
*err = errStr.str();
return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code is broken for string of hex number

@ukernel
Copy link
Contributor

ukernel commented Apr 19, 2018

commit "include/types: format decimal numbers with decimal factor" breaks parsing of hex number string

@tchaikov
Copy link
Contributor

#21521

@dillaman
Copy link

Note: this PR broke RBD test cases (and upgrade:luminous-x test cases in the luminous branch).

@dillaman
Copy link

... for future reference, if a PR touches something in RBD-land, it really should undergo an RBD suite run

@dillaman
Copy link

Addressing failing RBD test cases under PR #21564

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants