Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize PointerWrap. #892

Merged
merged 1 commit into from Aug 28, 2014
Merged

Optimize PointerWrap. #892

merged 1 commit into from Aug 28, 2014

Conversation

comex
Copy link
Contributor

@comex comex commented Aug 28, 2014

PointerWrap currently checks its mode for every individual byte of everything it 'does', including all of RAM. Make it not do that.

Decreases total Wii state save time (not counting compression) from
~570ms to ~18ms.

The compiler can't remove this check because of potential aliasing; this
might be fixable (e.g. by making mode const), but there is no reason to
have the code work in such a braindead way in the first place.

  • DoVoid now uses memcpy.
  • DoArray now uses DoVoid on the whole rather than Doing each element
    (would fail for an array of STL structures, but we don't have any of
    those).
  • Do also now uses memcpy (not DoVoid, in order to ensure each type gets
    its own implementation, which for small types then becomes a simple
    load/store in any modern compiler).

@comex
Copy link
Contributor Author

comex commented Aug 28, 2014

Incidentally, I wrote a hacky patch to make ptr and mode const:

  • On top of this change, it makes negligible difference. Because it's hacky (the issue is with aborting), I don't want to apply it.
  • Just out of curiosity, without this change, it only cuts the save time in half or so on my system - for whatever reason, clang can't optimize to memcpy, even with double restrict on the member.

break;
}

*ptr += sizeof(T);

This comment was marked as off-topic.

@degasus
Copy link
Member

degasus commented Aug 28, 2014

@comex "restrict" doesn't work for inlined function. It only has an effect if the outer buffer is marked as restrict.

@delroth
Copy link
Member

delroth commented Aug 28, 2014

20x boost on save state time is pretty awesome. Nice find.

As degasus said, this makes Do very similar to DoVoid. Any way to merge these two?

@shuffle2
Copy link
Contributor

@comex out of curiosity, does clang optimize std::copy here better? (there should be no cast to void*?)

@comex
Copy link
Contributor Author

comex commented Aug 28, 2014

shuffle2, not sure what you mean. Like other compilers, clang can already optimize memcpy of a small value into a regular load/store, and for larger structs or arrays memcpy should be the fastest way. My comment about clang was about whether it could optimize the byte-by-byte switch into memcpy given appropriate decorators.

…erything it 'does', including all of RAM. Make it not do that.

Decreases total Wii state save time (not counting compression) from
~570ms to ~18ms.

The compiler can't remove this check because of potential aliasing; this
might be fixable (e.g. by making mode const), but there is no reason to
have the code work in such a braindead way in the first place.

- DoVoid now uses memcpy.
- DoArray now uses DoVoid on the whole rather than Doing each element
(would fail for an array of STL structures, but we don't have any of
those).
- Do also now uses DoVoid.  (In the previous version, it replicated
DoVoid's code in order to ensure each type gets its own implementation,
which for small types then becomes a simple load/store in any modern
compiler.  Now DoVoid is __forceinline, which addresses that issue and
shouldn't make a big difference otherwise - perhaps a few extra copies
of the code inlined into DoArray or whatever.)
@comex
Copy link
Contributor Author

comex commented Aug 28, 2014

@delroth / @degasus I changed Do to call DoVoid, see the new commit message. I think this adds a kilobyte to the binary size (not sure - it doesn't seem very deterministic), but whatever.

@delroth
Copy link
Member

delroth commented Aug 28, 2014

LGTM.

Feel free to merge when the Buildbot is done building.

@shuffle2
Copy link
Contributor

@comex ah, i misread your comment. nevermind.

comex added a commit that referenced this pull request Aug 28, 2014
@comex comex merged commit 683191b into dolphin-emu:master Aug 28, 2014
@comex comex deleted the oh-the-abstraction branch August 28, 2014 21:28
@degasus
Copy link
Member

degasus commented Aug 28, 2014

How good are compiler at inlining the DoVoid function which is implemented after the usage?

@delroth
Copy link
Member

delroth commented Aug 28, 2014

Are you assuming that C++ compilers are single-pass?

On Fri, Aug 29, 2014 at 12:00 AM, Markus Wick notifications@github.com
wrote:

How good are compiler at inlining the DoVoid function which is implemented
after the usage?


Reply to this email directly or view it on GitHub
#892 (comment).

Pierre "delroth" Bourdon delroth@gmail.com
Software Engineer @ Zürich, Switzerland
http://code.delroth.net/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
4 participants