Aabb rewrite #1270

Closed

hollasch wants to merge 5 commits into dev from aabb-rewrite

Collaborator

hollasch commented Sep 14, 2023

Revise AABB hit function to use intervals

Increases program run speed by about 10%, and the code is easier to understand using interval arithmetic. A separate change that makes ray.origin() and ray.direction() return const refs adds another 5% speedup.

In addition, I discovered that in many cases fmin() and fmax() are performance bombs. Likely this is because these functions have special handling for NaNs. Instead, I switched to using ternary expressions like a < b ? a : b, which had a large performance impact. For the aabb::hit() function, this improved performance of the new interval code from 100% worse to about 10% better.

Added a new span() function that returns an interval of two doubles, regardless of their order.

Added new interval::is_empty() function that also returns true when either of the bounds is a NaN.

Added new interval::intersect() function.
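For readers without the diff at hand, here is a minimal sketch of what the three additions could look like, based only on the descriptions above. The member layout, the constructors, and the choice to treat a zero-width interval as empty are assumptions of this sketch, not necessarily what the PR implements.

```cpp
#include <limits>

struct interval {
    double min, max;

    // Default to the empty interval (assumed convention: min > max).
    interval()
      : min(+std::numeric_limits<double>::infinity()),
        max(-std::numeric_limits<double>::infinity()) {}

    interval(double lo, double hi) : min(lo), max(hi) {}

    // True unless min < max. The comparison is negated on purpose: every
    // comparison involving a NaN is false, so a NaN bound reports empty.
    // Treating a zero-width interval as empty is an assumption of this sketch.
    bool is_empty() const { return !(min < max); }

    // Overlap of this interval and `other`, using ternaries instead of
    // fmin()/fmax(). A NaN bound in `other` propagates into the result.
    interval intersect(const interval& other) const {
        return interval(min >= other.min ? min : other.min,
                        max <= other.max ? max : other.max);
    }
};

// Build an interval from two doubles given in either order.
inline interval span(double a, double b) {
    return a <= b ? interval(a, b) : interval(b, a);
}
```

With these definitions, `span(3.0, 1.0)` yields the interval [1, 3], and a `span()` call with a NaN argument yields an interval for which `is_empty()` is true.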

hollasch added 4 commits

September 14, 2023 15:20


          Revise AABB hit function to use intervals

6c92b6a

Increases program run speed by about 10%, and the code is easier to understand
using interval arithmetic. A separate change to ray.origin() and
ray.direction() returning const refs adds another 5% speedup.

In addition, I discovered that in many cases `fmin()` and `fmax()`
are performance bombs. Likely this is because these functions have
special handling for NaNs. Instead, I switched to using ternary
expressions like `a < b ? a : b`, which had a large performance impact.
For the `aabb::hit()` function, this improved performance of the new
interval code from 100% worse to about 10% better.

Added a new `span()` function that returns an interval of two doubles,
regardless of their order.

Added new `interval::is_empty()` function that also returns true when
either of the bounds is a NaN.

Added new `interval::intersect()` function.


          Make code listing captions consistent

d28d49d

- Use a consistent wrap format for long captions.
- Always have a blank line before a closing </div> tag following a
  listing, due to Markdeep behavior.
- Minor correction to one caption typo.
- Fix neglected indent for a code listing.


          Switch from ray = A + tb to A + td

073b0da

The `d` is more consistent with the code using `d` for direction.


          Update book for latest AABB changes

1e78335

- Add clarifying comment for corner cases of ray-aabb intersection.
- Fix some listings.
- Use new `span()` function.
- Update `aabb::hit()` code.
- Deprecate section "An Optimized AABB Hit Method"
- Document new `interval` functions.

hollasch added this to the v4.0.0-alpha.2 milestone

hollasch requested review from armansito and trevordblack

September 14, 2023 22:27

hollasch self-assigned this

armansito reviewed


books/RayTracingTheNextWeek.html Outdated

@@ -462,20 +463,20 @@
               How do we find the intersection between a ray and a plane? Recall that the ray is just defined by a
               function that--given a parameter $t$--returns a location $\mathbf{P}(t)$:
-                $$ \mathbf{P}(t) = \mathbf{A} + t \mathbf{b} $$
+                $$ \mathbf{P}(t) = \mathbf{A} + t \mathbf{d} $$

Contributor

armansito Sep 15, 2023

I think this is fine as it is but I wanted to point out that the "Ray-Sphere Intersection" chapter in InOneWeekend defines a ray as $\mathbf{P}(t) = \mathbf{Q} + t\mathbf{d}$ (i.e. it uses "Q" instead of "A").

I don't know if it's important to keep math symbols consistent across the books but might as well change that too.

Collaborator Author

hollasch Sep 15, 2023

I like it. Will do. I'd love to use $\mathbf{O}$, but of course, that's easily confused for zero, which is why I chose $\mathbf{Q}$ for the sphere equations.

books/RayTracingTheNextWeek.html

Comment on lines +632 to +639

+                              const interval& ax = axis(a);
+                              auto ao = r.origin()[a];
+                              auto ad = r.direction()[a];
+                              auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
+                              ray_t = ray_t.intersect(t_interval);
+                              if (ray_t.is_empty())

Contributor

armansito Sep 15, 2023

I'm not entirely surprised that changing the min/max calls to ternaries helped but I think this will ultimately depend on the compiler. Both ternaries and min/max intrinsics on floats likely bottom out at similar instructions but I haven't fully analyzed them down to instruction cycle counts.

Does the new code perform better than Andrew Kensler's version? I would be surprised as the new version has more branches and divisions afaict. If we're going down the path of optimizations, perhaps the code should keep the invD optimization as it is very common.

Collaborator Author

hollasch Sep 15, 2023

> I think this will ultimately depend on the compiler

Not quite. The ternary expression just lets infinities and NaNs flow according to IEEE-754. The fmin/fmax C++ library functions intercept NaN values so that the finite/infinite values are returned unless both parameters are NaN. In most cases for our code, this doesn't matter, so the extra (often dramatically slower) work is unnecessary.
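The difference in NaN handling can be illustrated with a small sketch. The helper names `ternary_min` and `ternary_max` are hypothetical and are used here only to give the ternary expressions a name.

```cpp
#include <cmath>

// Plain comparisons: no special NaN handling. Any comparison with a NaN is
// false, so the second argument is returned whenever either input is NaN.
inline double ternary_min(double a, double b) { return a < b ? a : b; }
inline double ternary_max(double a, double b) { return a > b ? a : b; }

// Library version: if exactly one argument is NaN, the other one is returned.
inline double library_min(double a, double b) { return std::fmin(a, b); }
```

With a quiet NaN as one input, `library_min(NaN, 1.0)` and `library_min(1.0, NaN)` both return 1.0. `ternary_min(NaN, 1.0)` also returns 1.0, but `ternary_min(1.0, NaN)` returns NaN, so with the ternary form the result depends on argument order.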

I remembered hitting this in geospatial calculations for our Tableau codebase as well, with the preferred solution avoiding the use of these library functions.

And yes, the new code outperforms Andrew Kensler's version.

In additional weirdness, when I make the following changes, I get a 3% slowdown:

         for (int a = 0; a < 3; a++) {
             const interval& ax = axis(a);
             auto ao = r.origin()[a];
-            auto ad = r.direction()[a];
+            auto adinv = 1.0 / r.direction()[a];

-            auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
+            auto t_interval = span((ax.min - ao) * adinv, (ax.max - ao) * adinv);
             ray_t = ray_t.intersect(t_interval);

             if (ray_t.is_empty())
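The loop fragment above can be placed in a self-contained sketch to show how the corner cases play out. The `interval`, `vec3`, `ray`, and `aabb` types below are simplified stand-ins with assumed layouts, not the book's classes, and `is_empty()` is assumed to treat a zero-width interval as empty. The sketch uses the division form and relies on IEEE-754 behavior for division by zero.

```cpp
#include <limits>

const double infinity = std::numeric_limits<double>::infinity();

struct interval {
    double min, max;
    // Negated comparison so that a NaN bound also counts as empty.
    bool is_empty() const { return !(min < max); }
    interval intersect(const interval& o) const {
        return { min >= o.min ? min : o.min, max <= o.max ? max : o.max };
    }
};

inline interval span(double a, double b) {
    return a <= b ? interval{a, b} : interval{b, a};
}

struct vec3 {
    double e[3];
    double operator[](int i) const { return e[i]; }
};

struct ray {
    vec3 orig, dir;
    const vec3& origin() const { return orig; }
    const vec3& direction() const { return dir; }
};

struct aabb {
    interval x, y, z;
    const interval& axis(int n) const { return n == 0 ? x : (n == 1 ? y : z); }

    bool hit(const ray& r, interval ray_t) const {
        for (int a = 0; a < 3; a++) {
            const interval& ax = axis(a);
            auto ao = r.origin()[a];
            auto ad = r.direction()[a];

            // ad == 0 gives infinite bounds of equal or opposite sign, or a
            // NaN when the origin lies exactly on a slab boundary (0/0).
            auto t_interval = span((ax.min - ao) / ad, (ax.max - ao) / ad);
            ray_t = ray_t.intersect(t_interval);

            if (ray_t.is_empty())
                return false;
        }
        return true;
    }
};
```

For a box spanning x in [1, 2] and y, z in [-1, 1], a ray from the origin along +x hits, while the same ray started at y = 5 misses: the y slab yields the interval [-inf, -inf], and intersecting it with the running range leaves an empty interval.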

books/RayTracingTheNextWeek.html Outdated

-              the following version of the code. It works extremely well on many compilers, and I have adopted it
-              as my go-to method:
+              <div class='together'>
+              The new code above introduces new interval functions we need to write: `interval::is_empty()`,

Contributor

armansito Sep 15, 2023

Consider rephrasing this as "The new code above relies on three new interval methods that we haven't defined: ...".

"relies on" is a little more clear compared to "introduces". I've seen the book use both "method" and "function" interchangeably, which is fine, though method makes it a bit clearer that you're talking about instance functions.

Collaborator Author

hollasch Sep 15, 2023

Rephrased. Note that two of the new functions are class functions, and one is a standalone function.

Contributor

armansito Sep 15, 2023

Ah, you're right. I missed that span() is a standalone function. In that case I would just rephrase this like so:

The new code above relies on three new functions that we haven't defined: `interval::is_empty()`, `interval::intersect()`, and `span()`.

books/RayTracingTheNextWeek.html Outdated

Comment on lines 670 to 674

+                      interval(const interval& a, const interval& b) {
+                          // Create the interval tightly enclosing the two input intervals.
+                          min = a.min <= b.min ? a.min : b.min;
+                          max = a.max >= b.max ? a.max : b.max;
+                      }

Contributor

armansito Sep 15, 2023

I think this same constructor gets added in the listing below, following the sentence "First, we'll add a new interval constructor that takes two intervals as input:". I'm not sure if it should be in this listing.

Collaborator Author

hollasch Sep 15, 2023

Fixed.


          Incorporate review feedback

c76c511

Collaborator Author

hollasch commented Sep 26, 2023

NOTE: This PR is currently under investigation for performance implications across more platforms.

hollasch mentioned this pull request

AABB hit and NaNs & Infinities #927

Closed

Collaborator Author

hollasch commented Sep 28, 2023

Quoting an offline thread between me and @armansito on Slack, to preserve comments here for posterity:

Arman Uguray

@Steve Hollasch Your performance observations are interesting. I was mainly going by godbolt in my musings but I guess that doesn't have much standing when faced with actual measurements. What type of CPU are you running on? The thing with the invD approach being slower is definitely counterintuitive.
Anyhow, I don't think these need to hold up landing that PR. I think the changes look good overall and the code looks clear to me.

Steve Hollasch

invD slowing things down has me questioning my sanity.

Steve Hollasch

"Counterintuitive" is a mild way to put it. However, if there's one thing I've learned about perf, it's that intuition hurts as much as it helps, and the wall clock is everything. That said, as always, this is

- on my machine
- with my CPU
- and my current compiler
- and version
- and compile flags
- and calendar year
- and barometric pressure

Arman Uguray

Yeah, I've had benchmarks that were influenced by the Zodiac

Arman Uguray

So, I got curious and ran a comparison of your PR branch and the current dev branch and I'm seeing a slowdown (this is just with theNextWeek). I ran multiple renders with and without compiler optimizations and the results are pretty consistent. To enable optimizations, I added an `add_compile_options(-O3)` line to CMakeLists.txt.

- Machine: Apple M1 Max
- Compiler: AppleClang 14.0.3.14030022

Results from running `time ./build/theNextWeek > image.ppm`:

- dev: 288.08 secs
- aabb-rewrite: 367.04 secs
- dev (-O3): 39.40 secs
- aabb-rewrite (-O3): 40.49 secs

Arman Uguray

I ran the renders 3 times each and they are all within 300 milliseconds of each other across runs, so fairly consistent at the granularity of seconds.
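The numbers above come from timing the whole program with `time`. For timing a section of code in-process, the same repeat-and-compare approach could be sketched as follows. The helpers `time_runs` and `spread` are hypothetical names for this sketch and are not part of the repository.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Run `work` `runs` times and return each wall-clock duration in seconds.
inline std::vector<double> time_runs(const std::function<void()>& work, int runs) {
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (int i = 0; i < runs; i++) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Largest sample minus smallest sample: a crude measure of run-to-run noise.
inline double spread(const std::vector<double>& samples) {
    if (samples.empty())
        return 0.0;
    auto mm = std::minmax_element(samples.begin(), samples.end());
    return *mm.second - *mm.first;
}
```

A difference between two builds is only meaningful when it is clearly larger than the spread within each build's own runs.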

Arman Uguray

The runtimes with -O3 are pretty similar either way. I also tried aabb-rewrite with the invD optimization and for me it gets slightly faster, 39.85 seconds on average.

Arman Uguray

I'll try replicating this test on a different computer so we have some more results. Given the variability here I'm hesitant to remove the Andrew Kensler version from the book without testing some more.
That said, your new version has very similar performance with compiler optimizations (in my case at least). I'll wait for your response to figure out how to move forward.

Arman Uguray

Anecdotally, I have a shader version of the aabb code that looks just like the Andrew Kensler version except with max/min calls and no loop because those can operate entirely on vector registers.

Arman Uguray

I was planning to have that in the GPU book and pose it as "a SIMD version"

Steve Hollasch

Figures. 😄 I'm inclined to use the inverse-d formulation, as it doesn't cost much and is just as clear.
With similar runtimes (-O3), I'm still inclined to use the interval approach as it's easier to grok.
I also want to try just the fmin/fmax changes to see how that fares on your system. I'll send you an experimental branch tomorrow or Monday.

Arman Uguray

Sounds good!

Collaborator Author

hollasch commented Sep 28, 2023

On my system, replacing fmin()/fmax() with simple inline ternary functions does not significantly affect runtime for any of the three final scenes (before my ray-aabb changes).

Collaborator Author

hollasch commented Sep 28, 2023

I've run the two versions' debug builds against each other on Windows. I see even worse performance than @armansito did, unfortunately: an almost 38% slowdown with the new code.

So, release compiles better or a small bit worse, debug significantly worse, and code that I subjectively feel is more clear. A conundrum.

Contributor

armansito commented Sep 29, 2023

> I've run the two versions' debug builds against each other on Windows. I see even worse performance than @armansito did, unfortunately: an almost 38% slowdown with the new code.
>
> So, release compiles better or a small bit worse, debug significantly worse, and code that I subjectively feel is more clear. A conundrum.

OK, that seems to be overall consistent with my measurements. I don't feel great about regressing the performance but I think it makes sense for the book to prioritize clarity.

What do you think about changing the basic description of the function to your new interval approach to improve the clarity BUT also keeping Andrew's optimized version in place as an alternative? I don't think it hurts to provide both versions since I think both clearly explaining the concepts and presenting a straightforward optimization are equally valuable in a graphics learning resource.

Collaborator Author

hollasch commented Sep 29, 2023

Yeah, that's not a bad idea. Still, I'm deep into "sunk cost fallacy" land with all the work I did — it's tempting to proceed, but may not be the wisest choice. I'm still playing with things to see if there's an out. 30% is a high price to pay for "clarity".

Collaborator Author

hollasch commented Oct 28, 2023

(So gumption trapped. Trying to crawl out of this hole...) 😅

hollasch added the status: in progress label

Collaborator Author

hollasch commented Dec 6, 2023

Closing this one out for subsequent possible rewrite.

hollasch closed this

hollasch mentioned this pull request

Interval-based AABB update / Attempt #2 #1422

Closed

hollasch deleted the aabb-rewrite branch

March 21, 2024 21:59
