Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some performance improvements and more idiomatic code in some places #1

Merged
merged 5 commits into from Mar 9, 2021

Conversation

soruh
Copy link
Contributor

@soruh soruh commented Feb 28, 2021

Introduction

Hi! I really liked your blog post and took a look at the code after reading it.
These are some ideas I had for making the code more idiomatic / faster.

I'd be happy to have a discussion about the things I changed and why I think they make sense or why you don't think they do.

Benchmarks

Here are some times on my machine (8/16 cores/threads) with SAMPLES_PER_PIXEL=50:

f290e44 - current master:

...
Scanlines remaining 3
Scanlines remaining 2
Scanlines remaining 1
Scanlines remaining 0
Done
cargo run --release > /dev/null  289.71s user 0.34s system 99% cpu 4:50.22 total

033ad93 - pre-parallelization:

...
Scanlines remaining 3
Scanlines remaining 2
Scanlines remaining 1
Scanlines remaining 0
Done
cargo run --release > /dev/null  273.98s user 0.41s system 99% cpu 4:34.75 total

8e19696 - post-parallelization:

Calculated 960000/960000 pixels (100.0%)
Writing pixel 960000/960000 (100.0%)
Done
cargo run --release > /dev/null  730.92s user 0.77s system 1563% cpu 46.795 total

@soruh soruh changed the title Some performance improvments and more idiomatic code in some places Some performance improvements and more idiomatic code in some places Feb 28, 2021
@JDuchniewicz
Copy link
Owner

Thanks for the PR! It looks pretty solid. Currently I am on holidays but I will merge it as soon as I have time to read into it in details 😁

BTW. Linked it on the reddit post in comments.

@soruh
Copy link
Contributor Author

soruh commented Mar 1, 2021

Thank you, feel free to look at it when ever you get to it and have a relaxing vacation!

x: 0.,
y: 0.,
z: 0.,
macro_rules! impl_binop {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this implementation! However, explaining the macros for the beginner would be slightly too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I though it might be a nice "hey, we have macros in rust" introduction, but I understand if you think it adds too much complexity...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mention it and the relevant changes in the updated text 😁

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And honestly, Rust macros are a potent beast, much more powerful than C ones imo

Copy link
Contributor Author

@soruh soruh Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think they fit in very well with rust, since they restrict you a bit more (than C), but most of the thinks you can't do are a bad idea anyways and in that they go very well with the whole pattern matching theme

}
}

impl fmt::Display for Vec3 {
#[inline]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the inline specifier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot about that.
None of these #[inline]s actually do anything.
They only tell the compiler to inline bewteen separate crates when compiling with LTO. We neither compiler with LTO, nor do we use these function in separate crates. I think we should either replace them with #[inline(always)] where benchmarking shows it makes sense or remove them and let LLVM figure out what to inline.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is probably something that can also confuse newcomers from C++.

@JDuchniewicz
Copy link
Owner

Merging. Many thanks for the time and effort.

@JDuchniewicz JDuchniewicz merged commit 1801f84 into JDuchniewicz:master Mar 9, 2021
@soruh
Copy link
Contributor Author

soruh commented Mar 9, 2021

Great, thanks! I think it might make sense revert the merge for now though, address your comments in this PR and then remerging instead of making a new PR for those changes...

@JDuchniewicz
Copy link
Owner

Revert done. The inline specifiers were the only thing I think.

@soruh
Copy link
Contributor Author

soruh commented Mar 9, 2021

Great, I'll look at benchmarking them when I get home in approx. 2h

@soruh
Copy link
Contributor Author

soruh commented Mar 9, 2021

Okay, so i have looked into this and seems like it's not as easy as I though...
(Note that the following benchmarks are very unscientific and the random start scene might have a significant impact. I ran all of these a few times an the times looked relatively stable)

Baseline speed is around 47s for 50 Samples for me.
turning all #[inline] annotations into #[inline(always)] makes the code slightly slower at 48s
and removing all #[inline] annotations actually slows us down even more to 50s.

It looks like #[inline] also inlines across code-gen units. Since code optimized for speed is usually compiled with only
one codegen unit that shouldn't be relevant for us though:

After tuning the Cargo.toml a bit:

[profile.release]
codegen-units = 1
lto = true

the baseline stays the same at 47s, but the version without the #[inline] annotations goes down to 45s

@JDuchniewicz
Copy link
Owner

Looks good to me. In a more complicated rendering engine it would be beneficial to see if these #[inline]s can speedup the execution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants