Some performance improvements and more idiomatic code in some places #1
Thanks for the PR! It looks pretty solid. I am currently on holiday, but I will merge it as soon as I have time to read through it in detail 😁 BTW, I linked it in the comments of the Reddit post.
Thank you, feel free to look at it whenever you get to it, and have a relaxing vacation!
x: 0.,
y: 0.,
z: 0.,
macro_rules! impl_binop {
I really like this implementation! However, explaining macros to beginners would be slightly too much.
Hmm, I thought it might be a nice "hey, we have macros in Rust" introduction, but I understand if you think it adds too much complexity...
I mention it and the relevant changes in the updated text 😁
And honestly, Rust macros are a potent beast, much more powerful than C ones imo
Yeah, I think they fit in very well with Rust, since they restrict you a bit more than C macros do, but most of the things you can't do are a bad idea anyway, and in that sense they go very well with the whole pattern-matching theme.
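For readers who have not seen the PR diff, here is a minimal sketch of what a macro like `impl_binop` might look like. The macro body, its parameters, and the `f64` fields are assumptions for illustration, not the code from this PR; only the names `Vec3` and `impl_binop` appear in the diff context.

```rust
use std::ops::{Add, Mul, Sub};

// Assumed layout: three f64 components (the PR may use a different float type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec3 {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

// Hypothetical macro body: generates a component-wise `Vec3 op Vec3` impl
// from an operator trait, its method name, and the operator token.
macro_rules! impl_binop {
    ($op_trait:ident, $method:ident, $op:tt) => {
        impl $op_trait for Vec3 {
            type Output = Vec3;

            fn $method(self, rhs: Vec3) -> Vec3 {
                Vec3 {
                    x: self.x $op rhs.x,
                    y: self.y $op rhs.y,
                    z: self.z $op rhs.z,
                }
            }
        }
    };
}

// One line per operator instead of three near-identical impl blocks.
impl_binop!(Add, add, +);
impl_binop!(Sub, sub, -);
impl_binop!(Mul, mul, *);
```

With this in place, `a + b`, `a - b`, and `a * b` all work component-wise on two `Vec3` values. The pattern on the left of `=>` is matched against the macro's arguments, which is the pattern-matching flavor mentioned above.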
}
}

impl fmt::Display for Vec3 {
#[inline]
Why remove the inline specifier?
Ah, I forgot about that.
None of these #[inline]s actually do anything.
They only tell the compiler to inline between separate crates when compiling with LTO. We neither compile with LTO, nor do we use these functions in separate crates. I think we should either replace them with #[inline(always)] where benchmarking shows it makes sense, or remove them and let LLVM figure out what to inline.
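To make the three options concrete, here is a small sketch of the attribute forms being discussed. The method names are hypothetical and not taken from the PR. As far as the Rust Reference describes them, both attribute forms are hints to the compiler rather than guarantees, so whether any of them helps is something only a benchmark can show.

```rust
// Assumed layout, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec3 {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

impl Vec3 {
    // No attribute: the optimizer decides on its own whether to inline.
    pub fn dot(self, rhs: Vec3) -> f64 {
        self.x * rhs.x + self.y * rhs.y + self.z * rhs.z
    }

    // `#[inline]`: suggests that inlining this function is worthwhile.
    #[inline]
    pub fn length_squared(self) -> f64 {
        self.dot(self)
    }

    // `#[inline(always)]`: a stronger suggestion, still not a guarantee.
    #[inline(always)]
    pub fn length(self) -> f64 {
        self.length_squared().sqrt()
    }
}
```

All three variants compute the same results; the attributes only influence code generation, which is why timing the renderer with and without them is the sensible way to choose.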
That is probably something that can also confuse newcomers from C++.
Merging. Many thanks for the time and effort.
Great, thanks! I think it might make sense to revert the merge for now though, address your comments in this PR, and then re-merge instead of making a new PR for those changes...
Revert done. The
Great, I'll look at benchmarking them when I get home in approx. 2h
Okay, so I have looked into this and it seems like it's not as easy as I thought... Baseline speed is around 47s for 50 samples for me. It looks like
After tuning the Cargo.toml a bit:
[profile.release]
codegen-units = 1
lto = true
the baseline stays the same at 47s, but the version without the
Looks good to me. In a more complicated rendering engine it would be beneficial to see if these
Introduction
Hi! I really liked your blog post and took a look at the code after reading it.
These are some ideas I had for making the code more idiomatic / faster.
I'd be happy to have a discussion about the things I changed and why I think they make sense or why you don't think they do.
Benchmarks
Here are some times on my machine (8/16 cores/threads) with SAMPLES_PER_PIXEL=50:
f290e44 - current master:
033ad93 - pre-parallelization:
8e19696 - post-parallelization: