Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crc32c: Add ppc64le fast zero optimized assembly. #15100

Merged
merged 1 commit into from Jun 9, 2017

Conversation

Projects
None yet
4 participants
@kestrels
Copy link
Contributor

kestrels commented May 16, 2017

Allow faster calculation of crc32c when a NULL
buffer is passed.

Signed-off-by: Andrew Solomon asolomon@us.ibm.com

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented May 16, 2017

@smatzek

@tchaikov @branch-predictor Notifying you of this pull request, since you've previously been reviewing drops in this area of the code. Thanks!

@branch-predictor
Copy link
Member

branch-predictor left a comment

Unfortunately I don't have any PPC hardware at hand, so I can't test it myself, and I have no former experience with PPC either, so I can't be particularly elaborate in your case, sorry. Still, a few general nits.

crc = crc32_vpmsum(crc, buf2, len);
free(buf2);
/* Handle the NULL buffer case. */
#ifdef REFLECT

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor May 16, 2017

Member

Where is that defined?

This comment has been minimized.

Copy link
@kestrels

kestrels May 18, 2017

Author Contributor

The top of src/common/crc32c_ppc_constants.h.

int i;
unsigned int result = 0;
for (i=0; i<32; i++) {
if (num & (1 << i))

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor May 16, 2017

Member

Since it is called twice per each ceph_crc32c_ppc, you may want to use more sophisticated implementation (one you have few lines below is already better).

This comment has been minimized.

Copy link
@kestrels

kestrels May 18, 2017

Author Contributor

Thanks very much for this comment. This is something I had overlooked and it made a considerable difference in performance.

@kestrels kestrels force-pushed the kestrels:wip-crc32c-fastzero2 branch from 4f3162d to 982954e May 18, 2017

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented May 18, 2017

@branch-predictor
@smatzek
@daxtens
@tchaikov

Changing the reflect32 used in the ppc64le fastzero path was a big help! In order to avoid any issues with licensing I simply used a better version of this function that was already included in the Ceph codebase (see the _reverse_bits function in src/common/hobject.h).

I had previously wanted to measure against @aclamk 's pull request #11966 so I setup a git repo to test the fastzero code in isolation. I linked this in the other pull request but I'll link here again for convenience.

Link to the test program:
https://github.com/kestrels/crc32c-fastzero

My conclusions are posted in the repo itself, here:
https://github.com/kestrels/crc32c-fastzero/blob/master/output-data/conclusions.txt

Now that reflect32 is faster, I had to retest and update the conclusions at the above link. My new conclusion is that on ppc64le the optimized assembly fastzero algorithm is faster than @aclamk 's fastzero algorithm in #11966. On x86 architecture @aclamk 's fastzero algorithm is faster than the optimized x86 assembly.

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented May 18, 2017

Please don't approve / merge this yet, still looking at something.

@tchaikov tchaikov changed the title crc32c: Add ppc64le fast zero optimized assembly. [DNM] crc32c: Add ppc64le fast zero optimized assembly. May 18, 2017

@tchaikov

This comment has been minimized.

Copy link
Contributor

tchaikov commented May 18, 2017

@kestrels ack, i prefixed the title of this PR with [DNM] .

@kestrels kestrels force-pushed the kestrels:wip-crc32c-fastzero2 branch from 982954e to 7cf9cb9 May 19, 2017

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented May 19, 2017

@tchaikov Sorry about that. You can remove the [DNM] prefix now. On my end the re-testing looks fine, so it is ready for review.

@kestrels kestrels changed the title [DNM] crc32c: Add ppc64le fast zero optimized assembly. crc32c: Add ppc64le fast zero optimized assembly. May 19, 2017

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented May 19, 2017

Oh right, I have permissions to change it myself, so I removed the DNM prefix.

return vt[0];
}

static unsigned int reflect32(unsigned int v) {

This comment has been minimized.

Copy link
@branch-predictor

branch-predictor May 25, 2017

Member

Same function is defined in src/common/hobject.h. I suggest that you move it from there into some common module and use that instead of copying it here.

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented May 30, 2017

Sorry for the delay in getting this updated. @branch-predictor had made a review comment about common code between src/common/crc32c_ppc.c and src/common/hobject.h. I also wanted to update to the latest code to integrate with #11966 but then I ran into some problems compiling on ppc64le architecture with latest master.

The issue is described here:
http://tracker.ceph.com/issues/20109

Once I've tested that one I'll make the pull request, then resume this drop.

crc32c: Add ppc64le fast zero optimized assembly.
Allow faster calculation of crc32c when a NULL
buffer is passed.

Signed-off-by: Andrew Solomon <asolomon@us.ibm.com>

@kestrels kestrels force-pushed the kestrels:wip-crc32c-fastzero2 branch from 7cf9cb9 to 50d781a Jun 1, 2017

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented Jun 1, 2017

@branch-predictor I've tried to address your comment about hobject.h. Please let me know if this doesn't match what you had in mind.

The newest version of this pull request also incorporates #11966. We don't want to use that code on ppc64le since the optimized assembly is faster.

@branch-predictor

This comment has been minimized.

Copy link
Member

branch-predictor commented Jun 1, 2017

@kestrels just skimmed through it, looks right. Though you could omit the extra reverse.(cc|h) and use include/intarith.h for that (there's already a bunch of bit related functions there).

@liewegas

This comment has been minimized.

Copy link
Member

liewegas commented Jun 1, 2017

retest this please

@kestrels

This comment has been minimized.

Copy link
Contributor Author

kestrels commented Jun 8, 2017

Not sure if this was ever officially approved by @branch-predictor. Does that need to happen before it can be retested in the CI environment?

I'd prefer not to move these functions from reverse.cc into a .h file, just because those functions are already being called from both C++ and C files, and making them inline functions (or C++ class related functions that are called from C code) just adds to that complexity.

Please let me know if that is ok. If not and you're waiting on me to take the next action by rewriting some of this please let me know.

@branch-predictor

This comment has been minimized.

Copy link
Member

branch-predictor commented Jun 8, 2017

@kestrels My only objection is adding extra .cc/.h for just two functions, it's not a showstopper or some kind of criminal offense. Besides, I have no way to do a live test of your code, so please don't consider me as some kind of official Ceph authority. ;)

@liewegas liewegas added the needs-qa label Jun 8, 2017

@branch-predictor

This comment has been minimized.

Copy link
Member

branch-predictor commented Jun 8, 2017

LGTM with an exception that I don't have suitable hardware to measure performance of this code by myself.

Reviewed-By: Piotr Dałek <piotr.dalek@corp.ovh.com>

@liewegas liewegas merged commit f718755 into ceph:master Jun 9, 2017

3 checks passed

Signed-off-by all commits in this PR are signed
Details
Unmodifed Submodules submodules for project are unmodified
Details
default Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.