-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
integer overflow (optional) detection #75
Conversation
Codecov Report
@@ Coverage Diff @@
## master #75 +/- ##
==========================================
+ Coverage 93.96% 94.06% +0.10%
==========================================
Files 53 53
Lines 11449 11359 -90
==========================================
- Hits 10758 10685 -73
+ Misses 691 674 -17
Continue to review full report at Codecov.
|
Thanks for the PR, @fargies . It is very solid, and special thanks for providing also the benchmark plots (by the way, what did you use to make those plots?). Before we go into details, I'm curious to what kind of situation is prompting you to have to detect overflow. I guess this is needed from using rapidyaml to parse data that may overflow. Can you show me example input and the code you are using to get the data to the integers? I ask this because I'm thinking there may be a more high-level way to detect overflow without having to pay for it in the low-level code. To be clear, I really like the previous situation where the overflow silently wraps around just as a native integer does, but I do concede that there should/could be a way to detect the overflow. But maybe that way can be a separate one, ie functions different than I think the conditional macro may be acceptable as a last resort, but if both approaches can be present simultaneously then I will prefer it over the conditional macro. For example, we could maybe have an So again, can you show me the high level code where you're converting to integer? |
I've used this project : https://github.com/gaujay/jomt.git (first time I use it, quite nice and easy)
We don't use RapidYaml yet, I'm still preparing and bundling (testing with compiler in our production env as you noticed), but in the long term we plan to use it to load config and settings but also to encode payload in RPC and REST apis. We are controlling devices and heavy machinery, a simple undetected parameter change can have a very large impact and would definitely be on our back (user provided input may be incorrect but we did not detect and prevented its side-effects). I do get that it's nice to have a full-speed library that copes well with errors, still in some cases being a bit more strict and safe is preferred regarding performance. Note: if you wish to go in that direction, shouldn't all serialization errors be silently discarded (setting a default value) ? as an example this code would throw (bar doesn't exist) : int i;
ryml::parse_in_arena("foo: 1").rootref()["bar"] >> i;
Understood and agreed, we can eventually re-use Then Eventually provide a template method on If you wish to go further (maybe not in this PR), we could add some feature flags to as an example turn off/on Note: another of my concerns is that error handling callbacks are shared for the entire lib, and don't have much info about where the error came from (parent ref, parser-error vs data-extraction error) so we can't really play with this mechanism (enabling errors on parse and disabling on data-extraction is a no go on multi-threaded apps), we've decided to throw on error (abort in a server app when user sends corrupted data is not a good solution), that would be awesome improve this giving more flexibility.
Quick example, I have a device where it's max speed is driven in mm/min and stored in an uint16_t, considering it fits our driver hardware limits, user provides a large number (> 65535 still makes sense physically speaking), everything goes without any warning (current library doesn't offer a way to detect that such thing happened), only once moving the device we may notice that something really bad happened. Json (and by extension Yaml) integer format is quite flexible, so even if we were to extract all binaries in |
Quick one to address this:
You definitely can have per-tree callbacks, different from the global library-scope callbacks. Further, you can also have per-phase callbacks (ie parsing vs data-extraction), by instantiating (prior to parsing) a parser and a tree with different callbacks, and then providing the existing tree to the appropriate method in the existing parser. This second aspect I've not explored much, so if something is not working as I described, then that's definitely a rapidyaml bug which should be addressed; further, if there are indeed problems, they will be very easy to fix by simply ensuring the proper callbacks object is used in the parser code (the data-extraction logic happens in the tree/noderef and is by definition calling into the right callbacks object). HTH. |
No contest. Again, I want to make it clear that I have no problem with having overflow-detecting code; in fact I'd like to have it. It's just that I'd prefer to keep the lean So I'm envisioning something like this (for example, with template<class T> bool atoi(csubstr s, T *var); // the existing function
// ... and then either this:
template<class T> bool atoi_rc(csubstr s, T *var) // range-checked version
{
if(overflows<T>(s))
return false;
return atoi(s, var);
}
// ... or this:
template<class T> bool atoi_rc(csubstr s, T *var) // range-checked version
{
// ... your code
} The first version will have a throughput approximately half of the original So one of the above would be the first step. Then, the second step would be providing ryml::Tree t;
uint16_t a, b;
t["foo"] >> a; // silent overflow
t["foo"] >> ryml::fmt::range_checked(b); // error on overflow, explicit opt-in The explicit opt-in approach would be in keeping with the very correct philosophy that you don't need to pay for what you don't want to use. The tag type implementation would be something like this in namespace fmt {
template<class T>
struct range_checked_
{
static_assert(std::is_integral<T>::value, "range checking only for integral types");
C4_ALWAYS_INLINE range_checked_(T &val_) : val(&val_) {}
T *val;
};
template<class T>
C4_ALWAYS_INLINE range_checked_<T> range_checked(T &val)
{
return range_checked_<T>(val);
}
} // namespace fmt
template<class T>
C4_ALWAYS_INLINE bool from_chars(csubstr s, fmt::range_checked_<T> wrapper)
{
return atox_rc(s, wrapper.val);
}
// or we could use SFINAE to bypass `atox_rc()` and
// call directly `atoi_rc(), atou_rc(), atof_rc()`, etc. |
This sounds like a plan, I'm not sure about the throughput load, it'll depend on the implementation. I think a simple I'll work on that as soon as soon as I can :) |
@fargies exactly, this Just add the commit to the PR or reset the branch (whatever you choose) and I'll add minor reviews inline in the PR. |
Hello 👋
Here are some modifications to add a slightly stricter mode on integer parsing (+ some fixes and minor improvements).
gcc-4.8.hpp
amalgamate (actually started to use it)uint8_t
and comparing with only< '9'
provides a small boost (also applied on oct, similar thing could be done on hex)0xK0
would not touch input when0x0K
would)Note: in graphs version without suffix is before modification, _2 is with feature disabled, _3 is with feature enabled, complete benchmarks results are in the tar.gz archive.
benchmark.tar.gz