Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aabb rewrite #1270

Closed
wants to merge 5 commits into from
Closed

Aabb rewrite #1270

wants to merge 5 commits into from

Conversation

hollasch
Copy link
Collaborator

Revise AABB hit function to use intervals

Increases program run by about 10%, and the code is easier to understand using interval arithmetic. A separate change to ray.origin() and ray.direction() returning const refs adds another 5% speedup.

In addition, I discovered that in many cases using fmin() and fmax() are performance bombs. Likely this is because these functions have special handling for NaNs. Instead, I switched to using ternary expressions like a < b ? a : b, which had a large performance impact. For the aabb::hit() function, this improved performance of the new interval code from 100% worse to about 10% better.

Added a new span() function that returns an interval of two doubles, regardless of their order.

Added new interval::is_empty() function that also returns true when either of the bounds is a NaN.

Added new interval::intersect() function.

Increases program run by about 10%, and the code is easier to understand
using interval arithmetic. A separate change to ray.origin() and
ray.direction() returning const refs adds another 5% speedup.

In addition, I discovered that in many cases using `fmin()` and `fmax()`
are performance bombs. Likely this is because these functions have
special handling for NaNs. Instead, I switched to using ternary
expressions like `a < b ? a : b`, which had a large performance impact.
For the `aabb::hit()` function, this improved performance of the new
interval code from 100% worse to about 10% better.

Added a new `span()` function that returns an interval of two doubles,
regardless of their order.

Added new `interval::is_empty()` function that also returns true when
either of the bounds is a NaN.

Added new `interval::intersect()` function.
- Use a consistent wrap format for long captions.
- Always have a blank line before a closing </div> tag following a
  listing, due to Markdeep behavior.
- Minor correction to one caption typo.
- Fix neglected indent for a code listing.
The `d` is more consistent with the code using `d` for direction.
- Add clarifying comment for corner cases of ray-aabb intersection.
- Fix some listings.
- Use new `span()` function.
- Update `aabb::hit()` code.
- Deprecate section "An Optimized AABB Hit Method"
- Document new `interval` functions.
@hollasch hollasch added this to the v4.0.0-alpha.2 milestone Sep 14, 2023
@hollasch hollasch self-assigned this Sep 14, 2023
@@ -462,20 +463,20 @@
How do we find the intersection between a ray and a plane? Recall that the ray is just defined by a
function that--given a parameter $t$--returns a location $\mathbf{P}(t)$:

$$ \mathbf{P}(t) = \mathbf{A} + t \mathbf{b} $$
$$ \mathbf{P}(t) = \mathbf{A} + t \mathbf{d} $$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine as it is but I wanted to point out that the "Ray-Sphere Intersection" chapter in InOneWeekend defines a ray as $\mathbf{P}(t) = \mathbf{Q} + t\mathbf{d}$ (i.e. it uses "Q" instead of "A").

I don't know if it's important to keep math symbols consistent across the books but might as well change that too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. Will do. I'd love to use $\mathbf{O}$, but of course, that's easily confused for zero, which is why I chose $\mathbf{Q}$ for the sphere equations.

Comment on lines +632 to +639
const interval& ax = axis(a);
auto ao = r.origin()[a];
auto ad = r.direction()[a];

auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
ray_t = ray_t.intersect(t_interval);

if (ray_t.is_empty())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely surprised that changing the min/max calls to ternaries helped but I think this will ultimately depend on the compiler. Both ternaries and min/max intrinsics on floats likely bottom out at similar instructions but I haven't fully analyzed them down to instruction cycle counts.

Does the new code perform better than Andrew Kensler's version? I would be surprised as the new version has more branches and divisions afaict. If we're going down the path of optimizations, perhaps the code should keep the invD optimization as it is very common.

Copy link
Collaborator Author

@hollasch hollasch Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will ultimately depend on the compiler

Not quite. The ternary expression just lets infinities and NaNs flow according to IEEE-754. The fmin/fmax C++ library functions intercept NaN values so that the finite/infinite values are returned unless both parameters are NaN. In most cases for our code, this doesn't matter, so the extra (often dramatically slower) work is unnecessary.

I remembered hitting this in geospatial calculations for our Tableau codebase as well, with the preferred solution avoiding the use of these library functions.

And yes, the new code outperforms Andrew Kensler's version.

In additional weirdness, when I make the following changes, I get a 3% slowdown:

         for (int a = 0; a < 3; a++) {
             const interval& ax = axis(a);
             auto ao = r.origin()[a];
-            auto ad = r.direction()[a];
+            auto adinv = 1.0 / r.direction()[a];

-            auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
+            auto t_interval = span((ax.min - ao) * adinv, (ax.max - ao) * adinv);
             ray_t = ray_t.intersect(t_interval);

             if (ray_t.is_empty())

the following version of the code. It works extremely well on many compilers, and I have adopted it
as my go-to method:
<div class='together'>
The new code above introduces new interval functions we need to write: `interval::is_empty()`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider rephrasing this as "The new code above relies on three new interval methods that we haven't defined: ...`.

"relies on" is a little more clear compared to "introduces". I've seen the book use both "method" and "function" interchangeably, which is fine, though method makes it a bit clearer that you're talking about instance functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased. Note that two of the new functions are class functions, and one is a standalone function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. I missed that span() is a standalone function. In that case I would just rephrase this like so:

The new code above relies on three new functions that we haven't defined: `interval::is_empty()`, `interval::intersect()`, and `span()`. 

books/RayTracingTheNextWeek.html Show resolved Hide resolved
Comment on lines 670 to 674
interval(const interval& a, const interval& b) {
// Create the interval tightly enclosing the two input intervals.
min = a.min <= b.min ? a.min : b.min;
max = a.max >= b.max ? a.max : b.max;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this same constructor gets added in the listing below following the sentence First, we'll add a new interval constructor that takes two intervals as input:. I'm not sure if it should be in this listing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@hollasch
Copy link
Collaborator Author

NOTE: This PR is currently under investigation for performance implications across more platforms.

@hollasch
Copy link
Collaborator Author

hollasch commented Sep 28, 2023

Quoting an offline thread between me and @armansito on Slack, to preserve comments here for posterity:

Arman Uguray

@Steve Hollasch Your performance observations are interesting. I was mainly going by godbolt in my musings but I guess that doesn't have much standing when faced with actual measurements. What type of CPU are you running on? The thing with the invD approach being slower is definitely counterintuitive.
Anyhow, I don't think these need to hold up landing that PR. I think the changes look good overall and the code looks clear to me (edited)

Steve Hollasch

invD slowing things down has me questioning my sanity. (edited)

Steve Hollasch

"Counterintuitive" is a mild way to put it. However, if there's one thing I've learned about perf, it's that intuition is hurts as much as helps, and the wall clock is everything. That said, as always, this is

  • on my machine
  • with my CPU
  • and my current compiler
  • and version
  • and compile flags
  • and calendar year
  • and barometric pressure

Arman Uguray

Yeah, I've had benchmarks that were influenced by the Zodiac

Arman Uguray

So, I got curious and ran a comparison of your PR branch and the current dev branch and I'm seeing a slowdown (this is just with theNextWeek). I ran multiple renders with and without compiler optimizations and the results are pretty consistent. To enable optimizations, I added a add_compile_options(-O3) line to CMakeLists.txt

  • Machine: Apple M1 Max
  • Compiler: AppleClang 14.0.3.14030022

Results from running time ./build/theNextWeek > image.ppm :

  • dev : 288.08 secs
  • aabb-rewrite: 367.04 secs
  • dev (-03): 39.40 secs
  • aabb-rewrite (-O3): 40.49 secs (edited)

Arman Uguray

I ran the renders 3 times each and they are all within 300 milliseconds of each other across runs, so fairly consistent at the granularity of seconds (edited)

Arman Uguray

The runtimes with -O3 are pretty similar either way. I also tried aabb-rewrite with the invD optimization and for me it gets slightly faster, 39.85 seconds on average (edited)

Arman Uguray

I'll try replicating this test on a different computer so we have some more results. Given the variability here I'm hesitant to remove the Andrew Kensler version from the book without testing some more.
That said your new version has very similar performance with compiler optimizations (in my case at least). I'll wait for your response to figure out how to move forward. (edited)

Arman Uguray

Anecdotally, I have a shader version of the aabb code that looks just like the Andrew Kensler version except with max/min calls and no loop because those can operate entirely on vector registers.

Arman Uguray

I was planning to have that in the GPU book and pose it as "a SIMD version"

Steve Hollasch

Figures. 😄 I'm inclined to use the inverse-d formulation, as it doesn't cost much and is just as clear.
With similar runtimes (-O3), I'm still inclined to use the interval approach as it's easier to grok.
I also want to try just the fmin/fmax changes to see how that fares on your system. I'll send you an experimental branch tomorrow or Monday.

Arman Uguray

Sounds good!

@hollasch
Copy link
Collaborator Author

On my system, replacing fmin()/fmax() with simple inline ternary functions does not significantly affect runtime for any of the three final scenes (before my ray-aabb changes).

@hollasch
Copy link
Collaborator Author

I've run the two versions' debug builds against each other on Windows. I see even worse performance than @armansito did, unfortunately: an almost 38% slowdown with the new code.

So, release compiles better or a small bit worse, debug significantly worse, and code that I subjectively feel is more clear. A conundrum.

@armansito
Copy link
Contributor

I've run the two versions' debug builds against each other on Windows. I see even worse performance than @armansito did, unfortunately: an almost 38% slowdown with the new code.

So, release compiles better or a small bit worse, debug significantly worse, and code that I subjectively feel is more clear. A conundrum.

OK, that seems to be overall consistent with my measurements. I don't feel great about regressing the performance but I think it makes sense for the book to prioritize clarity.

What do you think about changing the basic description of the function to your new interval approach to improve the clarity BUT also keeping Andrew's optimized version in place as an alternative? I don't think it hurts to provide both versions since I think both clearly explaining the concepts and presenting a straightforward optimization are equally valuable in a graphics learning resource.

@hollasch
Copy link
Collaborator Author

Yeah, that's not a bad idea. Still, I'm deep into "sunk cost fallacy" land with all the work I did — it's tempting to proceed, but may not be the wisest choice. I'm still playing with things to see if there's an out. 30% is a high price to pay for "clarity".

@hollasch
Copy link
Collaborator Author

(So gumption trapped. Trying to crawl out of this hole...) 😅

@hollasch
Copy link
Collaborator Author

hollasch commented Dec 6, 2023

Closing this one out for subsequent possible rewrite.

@hollasch hollasch closed this Dec 6, 2023
@hollasch hollasch deleted the aabb-rewrite branch March 21, 2024 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants