refactor: use local loop variable in copyFromFrameBuffer #782
Can Xdr::write throw an exception and therefore bypass the adjustment to writePtr and readPtr at the end of the function? (Or do we even care in the case of an exception?)
Otherwise, LGTM.
@lgritz Not that I know of. None of the inner function calls can throw exceptions. I specifically looked for this when I made the patch, but didn't find any early-return condition or exception path in this function. When compiled, this function turns into what is essentially a memcpy. If an exception were possible, the function would remain much slower than an equivalent memcpy, and given how often it is called and how much work the inner loop does, that would be a really bad design from a performance point of view.
This change allows the compiler to keep the loop variable (readPtr) in a register and therefore avoid cache misses in what is essentially a more general memcpy. By analysing the assembly generated by gcc 6.3.1, gcc 4.8.5 and clang 5.0, I found that these compilers treat the mutable reference as something that has to be written back to memory in every iteration. The performance regression appeared when upgrading the compilers for Foundry's Nuke. Our original build of OpenEXR 2.2.0, built with GCC 4.1.2, did not exhibit this behaviour. The change yielded a significant speed-up in Nuke's writing speed. Signed-off-by: Gyula Gubacsi <gyula.gubacsi@foundry.com>
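For readers without the diff at hand, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual OpenEXR code: `writeFloat`, `copyViaReference` and `copyViaLocal` are hypothetical names, and `writeFloat` only stands in for a per-element writer such as Xdr::write. A likely reason for the write-back is that stores through a `char*` may alias the referenced pointer itself, so the compiler cannot assume it is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a per-element writer such as Xdr::write:
// copies the bytes of one float and advances the output pointer.
inline void writeFloat(char*& out, float value)
{
    std::memcpy(out, &value, sizeof(float));
    out += sizeof(float);
}

// Before: the loop advances the caller's pointers through the references,
// so the compiler may reload and store them on every iteration.
void copyViaReference(char*& writePtr, const char*& readPtr, std::size_t count)
{
    assert(writePtr != nullptr && readPtr != nullptr);
    for (std::size_t i = 0; i < count; ++i)
    {
        float value;
        std::memcpy(&value, readPtr, sizeof(float));
        writeFloat(writePtr, value);
        readPtr += sizeof(float);
    }
}

// After: the loop works on local copies that can live in registers,
// and the caller's pointers are updated once at the end.
void copyViaLocal(char*& writePtr, const char*& readPtr, std::size_t count)
{
    assert(writePtr != nullptr && readPtr != nullptr);
    char* out = writePtr;
    const char* in = readPtr;
    for (std::size_t i = 0; i < count; ++i)
    {
        float value;
        std::memcpy(&value, in, sizeof(float));
        writeFloat(out, value);
        in += sizeof(float);
    }
    writePtr = out; // single write-back after the loop
    readPtr = in;
}
```

Both variants leave the caller's pointers in the same final position when the loop completes normally. The local-variable version only differs if the loop exits early, for example through an exception, in which case the write-back at the end is skipped.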
In that case, LGTM.
This looks good to me, too, thanks for the optimization. Xdr::write simply copies the bytes, so no exceptions. There's an explicit throw on line 1539, but that's the case where the pointer is not advanced. Can you submit a CLA? That's required before we can merge. Thanks again.
lgtm as well
@cary-ilm I need to find out who Foundry's CLA manager is, but hopefully I can get it sorted soon.
Any progress with the CLA?
It was a long process, but it's finally done! @cary-ilm Sorry for the delay, but I guess it's ready to go now.
Thanks! A new release should be out shortly.