Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: use local loop variable in copyFromFrameBuffer #782

Merged
merged 3 commits into from
Aug 5, 2020

Conversation

FnGyula
Copy link
Contributor

@FnGyula FnGyula commented Jul 14, 2020

This change allows the compiler to keep the loop variable (readPtr) in a
register and therefore avoid cache miss in what is essentially a more general
memcpy.

By analysing the assembly generated by both gcc 6.3.1, gcc 4.8.5 and clang 5.0
I found that these compilers interpret the mutable reference such that it
has to be written back into memory in every iteration.

The performance regression was when upgrading the compilers for Foundry's
Nuke. Our original build of OpenEXR 2.2.0 built with GCC 4.1.2 did not exhibit
this behaviour. It yielded significant speed up in Nuke's writing speed.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 14, 2020

CLA Check
The committers are authorized under a signed CLA.

Copy link
Contributor

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can Xdr::write throw an exception and therefore bypass the adjustment to writePtr and readPtr at the end of the function? (Or do we even care in the case of an exception?

Otherwise, LGTM.

@FnGyula
Copy link
Contributor Author

FnGyula commented Jul 14, 2020

@lgritz no that I know of. None of the inner function calls can throw exceptions. I did specifically look for this when I made this patch, but didn't come up any early return condition for this function or exceptions. When compiled, this function turns into an essential memcpy, and if there's the possibility of an exception, the function remains a lot slower than an equivalent memcpy, which, given how often this function is called and how much it has to lift in the inner loop, it would be a really bad design from a performance point of view.

This change allows the compiler to keep the loop variable (readPtr) in a
register and therefore avoid cache miss in what is essentially a more general
memcpy.

By analysing the assembly generated by both gcc 6.3.1, gcc 4.8.5 and clang 5.0
I found that these compilers interpret the mutable reference such that it
has to be written back into memory in every iteration.

The performance regression  was when upgrading the compilers for Foundry's
Nuke. Our original build of OpenEXR 2.2.0 built with GCC 4.1.2 did not exhibit
this behaviour. It yielded significant speed up in Nuke's writing speed.

Signed-off-by: Gyula Gubacsi <gyula.gubacsi@foundry.com>
@lgritz
Copy link
Contributor

lgritz commented Jul 14, 2020

In that case, LGTM.

@cary-ilm
Copy link
Member

This looks good to me, too, thanks for the optimization. Xdr::write simply copies the bytes, so no exceptions. There's an explicit throw on line 1539, but that's the case where the pointer is not advanced.

Can you submit a CLA? That's required before we can merge. Thanks again.

Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm as well

@FnGyula
Copy link
Contributor Author

FnGyula commented Jul 14, 2020

@cary-ilm need to dig out who's Foundry's CLA manager, but hopefully I can get sorted soon.

@cary-ilm
Copy link
Member

Any progress with the CLA?

@FnGyula
Copy link
Contributor Author

FnGyula commented Aug 5, 2020

Any progress with the CLA?

It was a long process, but finally done! @cary-ilm Sorry for the delay, but I guess now it's ready to go.

@cary-ilm
Copy link
Member

cary-ilm commented Aug 5, 2020

Thanks! A new release should be out shortly.

@cary-ilm cary-ilm merged commit 62f8ee6 into AcademySoftwareFoundation:master Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants