api: OIIO_CONTRACT_ASSERT and other hardening improvements#5006
api: OIIO_CONTRACT_ASSERT and other hardening improvements#5006lgritz merged 3 commits intoAcademySoftwareFoundation:mainfrom
Conversation
|
Any comments on this? |
jessey-git
left a comment
There was a problem hiding this comment.
That 20% sounds pretty scary actually and I don't always want to hide behind the disk-IO time monster. All CPU cycles are important because OIIO is used for more than just offline/overnight jobs. Sometimes there's an artist staring at a progress bar or doing a viewport playblast that will appreciate faster pixel processing.
I can't do so tonight but I'd want to check some other scenarios of interest to see how they're affected (if at all). It might turn out we should, in the same release as this hardening, also change some important ImageBufAlgo's to use range-fors if possible or similar if they're impacted enough.
src/include/OpenImageIO/dassert.h
Outdated
|
|
||
| // OIIO_ASSERTION_RESPONSE_DEFAULT defines the default response to failed | ||
| // contract assertions. By default, in NONE hardening mode and in release | ||
| // builds, we do nothing. In all other cases, we abort. But any translation |
There was a problem hiding this comment.
This comment says that by default we'll do nothing, but it looks like we will actually enforce. Which is correct?
There was a problem hiding this comment.
Right, I changed, now by default we enforce. I will update the comment.
| // contract assertions. By default, in NONE hardening mode and in release | ||
| // builds, we do nothing. In all other cases, we abort. But any translation | ||
| // unit (including clients of OIIO) may override this by defining | ||
| // OIIO_ASSERTION_RESPONSE_DEFAULT before including any OIIO headers. But note |
There was a problem hiding this comment.
I'd probably remove this part of the comment since attempting to set the define like this will most likely lead to a lot of surprises. It'll give a false sense of accomplishment and wouldn't be a complete workaround. Probably best to instead update the build documentation with the correct way to set this for folks building on their own.
There was a problem hiding this comment.
I'd probably remove this part of the comment since attempting to set the define like this will most likely lead to a lot of surprises.
Well, what I'm trying to account for here is that these are public headers that might be used by other projects or apps. There's "the behavior of OIIO internals, determined at build time", and there's also "how MY software behaves when I'm using OIIO's utility classes."
src/libutil/span_test.cpp
Outdated
| bench("span operator[]", [&]() { | ||
| int t = 0; | ||
| for (size_t i = 0; i < f.size(); ++i) | ||
| DoNotOptimize(t += f[i]); |
There was a problem hiding this comment.
These are testing std::array rather than our span it looks like.
There was a problem hiding this comment.
ah, will fix and re-time
The 20% is just on the raw array access of the cheapest possible thing (int/float). In the context of doing anything with those values, it will be less. If you are worried about every last cycle, you can build with it turned off. Or argue that the default should be turned off, which I can easily be convinced is reasonable. But there is an important assumption that underlies all of this that I want to re-emphasize: New versions of gcc/libstdc++ and clang/libc++ already have these checks in operator[] of std::array, std::span, std::vector, and as many other places as they can jam them, and the default is to enforce it (and I believe C++26 more or less mandates it?). Whatever complaints we have about perf are going to be found virtually everywhere, unless care is made to turn them off for the standard library as well. So I'm mostly going on the assumption that we're not doing anything worse than C++ std classes moving forward, and also that there will be pressure on compilers to make stronger inferences about those common constructs in loops and whatnot to be even more effective at optimizing them away entirely when it's clear from the loop bounds/etc that it will never violate the contract.
I'm very much in favor of finding places whose performance is impacted and refactoring them in a way that's guaranteed "safe by design" (in the way that range-for is) rather than empirically range checking for every access. It was always my intention to follow up with that as any problems are discovered. One possible approach is to change the default from 'enforce' to 'none' for now, merge this (i.e. it only does enforcement if you build it that way), and benchmark various things, fixing as I go. When we are more confident that it doesn't meaningfully hurt anything we care about, then bump the default back to 'enforce'. |
Review: we have long had two assertion macros: OIIO_ASSERT which
aborts upon failure in Debug builds and prints but continues in
Release builds, and OIIO_DASSERT which aborts in Debug builds and is
completely inactive for Relase builds.
Inspired by C++26 contracts, and increasingly available "hardening
modes" in major compilers (especially with the LLVM/clang project's
libc++), I'm introducing some new verification helpers.
New macro `OIIO_CONTRACT_ASSERT` more closely mimics C++26
contract_assert in many ways, and perhaps will simply wrap C++
contract_assert when C++26 is on our menu.
Important ways that OIIO_CONTRACT_ASSERT differs from OIIO_ASSERT and
OIIO_DASSERT in a few ways, described below.
* Keeping in line with C++ contracts, there are 4 possible responses
to a failed contract assertion: Ignore, Observe (print only),
Enforce (print and abort) and Quick-Enforce (just abort).
* By default, the contract failure response is Ignore for release
builds and Enforce for debug builds. But it's overrideable
(independent of Release/Debug, and optionally on a
per-translation-unit basis) by setting
OIIO_ASSERTION_RESPONSE_DEFAULT before any OIIO headers are
included.
* Also define hardening levels: None, Fast, Extensive, and Debug,
mimicking the levels of libc++. The idea is that maybe there will
be some CONTRACT_ASSERT checks you only want to do for certain
hardening levels.
* Macros for explicit hardening levels: OIIO_HARDENING_ASSERT_FAST(),
EXTENSIVE(), and DEBUG(), which call CONTRACT_ASSERT only when the
hardening level is what's required or stricter.
I also changed the bounds checking in operator[] of string_view, span,
and image_span to use the contract assertions. Note that this adds a
little bit of overhead, since the default is "enforce" for release
builds. I added some benchmarking that proves that the bounds check
adds only about 20% overhead to an element access for a trivial
`span<float>`.
For more complex things, or code that does more than just repeatedly
access elements with bounds checks, I expect this overhead to be
negligible. Since libc++ and upcoming C++ standards do the same for
most container types, I expect the compilers to get better and better
at eliding these checks when they can determine that it's an in-bounds
access.
Also please note that one way to avoid these extra bounds checks
entirely is to change an index-oriented loop like
span s;
for (size_t i = 0; i < s.size(); ++i)
foo(s[i]); // maybe bounds check on each iteration?
to a range based loop:
span s;
for (auto& v : s)
foo(v);
which is inherently safe and requires no in-loop checks at all.
Signed-off-by: Larry Gritz <lg@larrygritz.com>
I didn't immediately realize this when I read it originally, but just to be clear: none of the IBA functions use image_span. They all use ImageBuf::iterator, which are like range-for and know the bound without needing to check each address. Currently, we use image_span as a convenient way to encapsulate ptr + multi-dim sizes + multi-sim strides in one object. But I don't think there's anyplace where we iterate over it element by element and use operator[] every time. For performance, it's span that we should be worried about. But I fixed the tests last night and am being much more careful now, and on Mac/clang at least, I'm seeing barely any measurable penalty at all from the range check. I will confirm on Linux/gcc today and post the results and an updated PR. But unless you can spot something I've done wrong with my benchmarking methodology, I think (surprisingly) that there may not be a perf hit that we should worry about at all. |
… to IGNORE Signed-off-by: Larry Gritz <lg@larrygritz.com>
|
I have pushed an update -- changed some defaults, beefed up the benchmarks and fixed some flaws with them. I totally replaced the PR description, please reread. I include benchmark results. Now that I'm being much more careful, I'm not seeing a 20% penalty, as I originally reported, under any circumstance. I think I was measuring the wrong thing. Now I'm seeing about 1.5% hit for range checking -- just the benchmarked raw span access, not any performance of code overall in context -- on a Linux box I control, no difference on Linux (with a newer gcc) and Windows (with MSVS) that I don't fully control, no statistically discernible difference on a Mac Intel machine I control, and about 4% on a Mac ARM I do not control (I have a new Mac ARM machine arriving this week, I will re-test on that machine after it arrives). In short, I can barely (and sometimes not at all) demonstrate a slowdown when isolating this one operation with the extra check. I'm inclined to keep the default to be to do the range checking all the time, though still overrideable on people's individual builds if you really want to YOLO. |
jessey-git
left a comment
There was a problem hiding this comment.
Alright, I think I'm convinced after considering your reasoning some more and after doing some benchmarks. I also had a good read of Chromium's adventure in this space[1]. Note: To this day the V8 project that raised concerns in that PR is still not using hardened mode (if i understand their build files correctly). Note 2: It's really fun to read through the PRs of other big projects, more company's should open source :)
As for my benchmarks I looked at ImageBufAlgo::channel_sum(), channels(), and computePixelStats(). TL;DR: I have no concerns wrt performance.
Longer story: While the algo's do use the ImageBuf iterators, there's a handful of regular span lookups in the middle of processing. And I think most of them are of a form where the compiler will never be able to elide the checks since it can't prove safety. E.g. See channel_sum_ where a span lookup will occur per-pixel * per-channel. The channel index coming from a ROI will never be proven safe against the span of weights and so we'll always get hardened checks in this loop. From my testing though, it didn't seem to matter.
A general thought:
Do we need to be concerned about, one day, getting rid of our span and using std::span? In that world a few surprising things will happen once we swap it out. The biggest is that, for OIIO to remain safe, we'll be forced to use the standard library's harden mode, which will also get us hardening everywhere else. And downstream consumers will need to do the same lest they suddenly become less secure. Am I understanding things correctly?
| (void)0)) | ||
| #elif OIIO_ASSERTION_RESPONSE_DEFAULT == OIIO_ASSERTION_RESPONSE_ENFORCE | ||
| # define OIIO_CONTRACT_ASSERT_MSG(condition, message) \ | ||
| (OIIO_LIKELY(condition) ? ((void)0) \ |
There was a problem hiding this comment.
If we want to one day move to [[likely]] attributes (and we should so MSVC can get the goodness too) then those attributes do not work with ternary statements. You'll need a proper if conditional.
There was a problem hiding this comment.
Yeah, we can cross that bridge when we come to it. The good thing about using the ternary operator is that it lets use use these ASSERT style macros as if they were statements. It might take some thought about how to structure it using 'if' that won't syntactically break someplace where we've used it or would like to use it.
|
@jessey-git As usual, I really appreciate the extra diligence you're putting into considering this issue, all the way to running some benchmarks on your own. I think we should continue to scrutinize this even after merging. If we identify particular places where it demonstrably impacts performance or is provably safe already, I have no problem with changing isolated spots to use unsafe alternatives such as I think the C++ safety zeitgeist is moving rapidly to trying to make these kinds of accesses checked by default, trapping UB, and shifting the burden to the compilers to be better at identifying where it can optimize the checks away.
I think the std hardening is turning toward being on by default (the "fast" set anyway), and I think that C++26 will expect full support of contracts in std which also would enforce these access constraints. As for std::span, we have no concrete plans, but also no reason not to switch once C++20 is a reliable minimum and we're at one of those places (4.0?) where we can widely break ABI. So we'll probably stick to OIIO::span for a few years, but then maybe we'll switch (and similar for std::string_view). But yeah, I think it will depend on whether we feel the need to customize the safety checks on our side or whether the addition of C++26 contracts and library hardening give us everything we want. |
|
@jessey-git or others, do you have any objections to my merging what we have here? With, obviously, further improvements or course corrections down the road? Are we satisfied that what we have so far is not harming performance in any noticeable way? |
jessey-git
left a comment
There was a problem hiding this comment.
I was waiting to see if anyone else might chime in too. I only have a minor, non-blocking, comment below that I don't have a good solution for and can be looked at in the future if necessary.
non blocking:
The span_test cheats a little bit because the compiler (even MSVC) is able to deduce that the loop iterates from 0..N and N is also used when constructing the span. Because of this it elides the contract check entirely during codegen. If you change the test to iterate from 0..something_equal_to_N_but_not_N, then the compiler is forced to generate the assertion code. In a local test I did a cheeky "iterations / 100" which still equals 1000, the same as N, but the compiler cannot know that.
This does not seem to have any bearing on the performance of this particular loop. It's just an increase in code-size at the call site really.
My comment is mostly that it's a little sly. We have a "benchmark" that on the surface wants to prove / verify something. But because of how it sets things up it's a lot less powerful than it could be. If someone were to copy this while attempting to prove something similar in the future, they would miss that the compiler is just excluding all their code and they've not tested anything.
|
Oh, well, I definitely will fix the benchmark before merging. The whole thing is dependent on it being an essentially undetectable performance hit, and I want the benchmark to be measuring the right thing. I suspect that the reason is that the branch prediction works so well that the conditional is essentially free. |
…ds check Signed-off-by: Larry Gritz <lg@larrygritz.com>
|
@jessey-git I updated the PR. Now the length of the array that it traverses is (potentially, but not in practice) dependent on a command line argument, so the compiler can't statically infer its value. Do you think this is a more helpful benchmark now? (I still can't seem to show a non-negligible speed hit from the assertions.) |
jessey-git
left a comment
There was a problem hiding this comment.
Cool, yes, that is enough to trick MSVC (at least).
…ftwareFoundation#5006) Review: we have long had two assertion macros: OIIO_ASSERT which aborts upon failure in Debug builds and prints but continues in Release builds, and OIIO_DASSERT which aborts in Debug builds and is completely inactive for Relase builds. Inspired by C++26 contracts, and increasingly available "hardening modes" in major compilers (especially with the LLVM/clang project's libc++), I'm introducing some new verification helpers. New macro `OIIO_CONTRACT_ASSERT` more closely mimics C++26 contract_assert in many ways, and perhaps will simply wrap C++ contract_assert when C++26 is on our menu. Important ways that OIIO_CONTRACT_ASSERT differs from OIIO_ASSERT and OIIO_DASSERT: * Keeping in line with C++ contracts, there are 4 possible responses to a failed contract assertion: Ignore, Observe (print only), Enforce (print and abort) and Quick-Enforce (just abort). * Also define hardening levels: None, Fast, Extensive, and Debug, mimicking the levels of libc++. The idea is that maybe there will be some CONTRACT_ASSERT checks you only want to do for certain hardening levels. * By default, the contract failure response is Enforce, unless it's both a release build and the hardening level is set to None (in which case the response will be Ignore). But it's also overrideable optionally on a per-translation-unit basis by setting OIIO_ASSERTION_RESPONSE_DEFAULT before any OIIO headers are included (though obviously that only applies to inline functions or templates, not to any already-compiled code in the library). * Macros for explicit hardening levels: OIIO_HARDENING_ASSERT_FAST(), EXTENSIVE(), and DEBUG(), which call CONTRACT_ASSERT only when the hardening level is what's required or stricter. I also changed the bounds checking in operator[] of string_view, span, and image_span to use the contract assertions. Note that this adds a tiny bit of overhead, since the default is "enforce" for release builds (previously, using OIIO_DASSERT, it did no checks for release builds). But the benchmarks seem to idicate that the perf difference is barely measurable. I added some benchmarking that proves that the bounds check adds a minute overhead to an element access for a trivial `span<float>`, maybe even indescernable. Here are benchmarks comparing raw pointer access, std::array access, span access with the new checks, span access carefully bypassing the tests. Linux workstation, gcc-11, on my work computer: pointer operator[]: 647.8 ns (+/- 0.1ns) std::array operator[]: 647.8 ns (+/- 0.1ns) span operator[] : 657.6 ns (+/- 0.5ns) span unsafe indexing: 648.2 ns (+/- 0.2ns) span range : 648.1 ns (+/- 0.1ns) These are the most stable tests I have, with the least trial-to-trial variation, and show about a 1.5% speed hit on the bounds-checked span access itself, which I think will be truly un-measurable in the context of being interleaved with any other operations that you do with the data you pull from the span. Mac Intel, Apple Clang 17, on my (old) personal laptop: (much more variable timing, probably from MacOS scheduler quirks) pointer operator[]: 929.2 ns (+/- 6.7ns) std::array operator[]: 913.1 ns (+/- 20.6ns) span operator[] : 905.8 ns (+/- 13.3ns) span unsafe indexing: 913.9 ns (+/- 16.6ns) span range : 916.4 ns (+/- 20.3ns) You can see that here there is no obvious penalty, in fact it appears a little faster, but all within the timing uncertainty of the multiple trials, so statistically it's hard to discern any penalty. And a couple more for good measure from our CI, but note that because these are uncontrolled machines somewhere on the GitHub cloud, the timings might not be as reliable: Windows, MSVS 2022: pointer operator[]: 3716.3 ns (+/- 6.3ns) std::array operator[]: 3715.5 ns (+/- 3.4ns) span operator[] : 3715.6 ns (+/- 2.6ns) span unsafe indexing: 3712.1 ns (+/- 0.7ns) span range : 3714.2 ns (+/- 2.9ns) Linux, gcc-14, C++20: pointer operator[]: 1130.9 ns (+/- 0.2ns), 884.2 k/s std::array operator[]: 1132.0 ns (+/- 0.4ns), 883.4 k/s span operator[] : 1133.7 ns (+/- 0.4ns), 882.1 k/s span unsafe indexing: 1134.2 ns (+/- 1.6ns), 881.7 k/s span range : 1133.9 ns (+/- 0.7ns), 881.9 k/s MacOS ARM: pointer operator[]: 3456.6 ns (+/- 7.5ns) std::array operator[]: 3466.8 ns (+/- 12.2ns) span operator[] : 3610.9 ns (+/- 11.0ns) span unsafe indexing: 3607.4 ns (+/- 4.9ns) span range : 3612.4 ns (+/- 12.2ns) Windows with MSVS and Linux with newer g++ don't appear to show any penalty, and the bracketing of trial times indicates that maybe it's consistent enough to be meaningful? I can't think of anything I'm doing wrong here that would throw off the timing or disable the range checking on these tests. For MacOS ARM, the span looks like it has about a 4% penalty versus raw pointers? But OTOH, span bounds-checked vs non-checked vs range-for are all the same, so maybe the speed vs raw pointer is something else entirely? Also please note that a preferred way to avoid these extra bounds checks entirely is to change an index-oriented loop like span s; for (size_t i = 0; i < s.size(); ++i) foo(s[i]); // maybe bounds check on each iteration? to a range based loop: span s; for (auto& v : s) foo(v); which should be inherently safe and require no in-loop checks at all. --------- Signed-off-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>
Review: we have long had two assertion macros: OIIO_ASSERT which aborts upon failure in Debug builds and prints but continues in Release builds, and OIIO_DASSERT which aborts in Debug builds and is completely inactive for Relase builds.
Inspired by C++26 contracts, and increasingly available "hardening modes" in major compilers (especially with the LLVM/clang project's libc++), I'm introducing some new verification helpers.
New macro
OIIO_CONTRACT_ASSERTmore closely mimics C++26 contract_assert in many ways, and perhaps will simply wrap C++ contract_assert when C++26 is on our menu.Important ways that OIIO_CONTRACT_ASSERT differs from OIIO_ASSERT and OIIO_DASSERT:
Keeping in line with C++ contracts, there are 4 possible responses to a failed contract assertion: Ignore, Observe (print only), Enforce (print and abort) and Quick-Enforce (just abort).
Also define hardening levels: None, Fast, Extensive, and Debug, mimicking the levels of libc++. The idea is that maybe there will be some CONTRACT_ASSERT checks you only want to do for certain hardening levels.
By default, the contract failure response is Enforce, unless it's both a release build and the hardening level is set to None (in which case the response will be Ignore). But it's also overrideable optionally on a per-translation-unit basis by setting OIIO_ASSERTION_RESPONSE_DEFAULT before any OIIO headers are included (though obviously that only applies to inline functions or templates, not to any already-compiled code in the library).
Macros for explicit hardening levels: OIIO_HARDENING_ASSERT_FAST(), EXTENSIVE(), and DEBUG(), which call CONTRACT_ASSERT only when the hardening level is what's required or stricter.
I also changed the bounds checking in operator[] of string_view, span, and image_span to use the contract assertions. Note that this adds a tiny bit of overhead, since the default is "enforce" for release builds (previously, using OIIO_DASSERT, it did no checks for release builds). But the benchmarks seem to idicate that the perf difference is barely measurable.
I added some benchmarking that proves that the bounds check adds a minute overhead to an element access for a trivial
span<float>, maybe even indescernable. Here are benchmarks comparing raw pointer access, std::array access, span access with the new checks, span access carefully bypassing the tests.Linux workstation, gcc-11, on my work computer:
These are the most stable tests I have, with the least trial-to-trial variation, and show about a 1.5% speed hit on the bounds-checked span access itself, which I think will be truly un-measurable in the context of being interleaved with any other operations that you do with the data you pull from the span.
Mac Intel, Apple Clang 17, on my (old) personal laptop: (much more variable timing, probably from MacOS scheduler quirks)
You can see that here there is no obvious penalty, in fact it appears a little faster, but all within the timing uncertainty of the multiple trials, so statistically it's hard to discern any penalty.
And a couple more for good measure from our CI, but note that because these are uncontrolled machines somewhere on the GitHub cloud, the timings might not be as reliable:
Windows, MSVS 2022:
Linux, gcc-14, C++20:
MacOS ARM:
Windows with MSVS and Linux with newer g++ don't appear to show any penalty, and the bracketing of trial times indicates that maybe it's consistent enough to be meaningful? I can't think of anything I'm doing wrong here that would throw off the timing or disable the range checking on these tests.
For MacOS ARM, the span looks like it has about a 4% penalty versus raw pointers? But OTOH, span bounds-checked vs non-checked vs range-for are all the same, so maybe the speed vs raw pointer is something else entirely?
Also please note that a preferred way to avoid these extra bounds checks entirely is to change an index-oriented loop like
to a range based loop:
which should be inherently safe and require no in-loop checks at all.