-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C solution 2: Don't over-allocate #406
Conversation
Tagging @danielspaangberg and allowing time so they can review and respond to this PR on their original solution, if they want to do so. |
Thanks @tjol for noticing and @rbergen for notifying me. You're right, I allocate 8 times as much as needed, but I think your solution needs to be changed slightly. For example for 1'000'000 integers and a 64 bit storage your code would allocate 7812 uint64_t, but they only hold 499968 bits. So I think your solution is good, but there should be (maxints+16*sizeof(TYPE)-1)/16/sizeof(TYPE). |
You're right, of course. |
@danielspaangberg No, you've done an excellent job by responding so quickly and being so eagle-eyed. Thank you! |
When I run the latest commit locally in docker, this is what I get:
|
Well that's interesting... I didn't get any segmentation faults running locally, but I do in docker. Curious. |
Ouch. Will look into what is wrong. I got a segfault running locally when linking with electric fence. |
I have a bug with a loop test terminating a bunch of loops with |
@danielspaangberg Again, impressively eagle-eyed! My advice would be to enjoy your vacation while it lasts. If this ends up taking a week to get fixed, that's just life. |
@danielspaangberg I've come to the same conclusion – it looks like that's where the problem is. For For the
with
While this works, I do not expect that it's an ideal solution. If you agree, I'm happy to commit it – otherwise I'll wait for you to do the job properly in a week or two. Enjoy the rest of your holiday! |
Great and thanks! It all looks OK to me, but since it is hard for me to test right now, I can do a proper check next week. The tricky thing with the parallel version masking of the bits, is making sure no two threads touch the same integer. And it is hard to test, since the results will only get garbled sometimes. So I'll read through that part of the code again later. |
I took the freedom to learn from this PR and the original code of |
Thanks @fvbakel ! I will look at this as soon as I have checked the other bugs. I looked at your code, but cannot understand how that allocates memory faster? The code is simpler for sure, but the performance should not be that different? |
@danielspaangberg I was surprised by the difference in performance too. I have to check if I can really measure a difference when I run multiple times. I found that on my under powered hardware there can sometimes be a bit of variation in performance. |
So I have read through the code and tested a bit - I can see no problems with your changes @tjol - they solve the issues for the parallel code as well. Thanks for all the changes and fixes!! |
@fvbakel - I checked the bug with the odd integers as you mention. I can add a simple fix for this in this solution as well (basically just maxints+=maxints%2 in create sieve will solve it), and will do soon. Thanks for noticing. I also looked at your segmented solution, which is clever. I did some experiments with a segmented sieve, but could not make it run faster for only one million ints. I have looked more at the code around the calloc, and do not understand how it can be much different in speed, if I am not missing anything essential, it is just the code to compute how many TYPE should be allocated, which should now in the code by @tjol just compile to an add and and a shift? I just checked the compiler output and that's basically what's there (a couple of moves too). |
@danielspaangberg Thanks for the feedback. I think the difference in performance of the memory allocation can be considered a glitch in my testing. It is indeed just the calculation of the amount. On Alpine, the segmented solution is slower because alpine has a bad implementation of |
Nice! Good to know - I was just staring at the code for a while. I tested your code on debian under wsl2 and your code ran almost twice as fast as the 1of2. I think I tried my segmented code to be too generic. It is clever to do the first pass just for the first word. |
@danielspaangberg Thank you, they only remaining huddle is to get that PR merged. |
Do I understand correctly that we're all happy for this PR to be reviewed and merged in its current form? (In additional to @fvbakel’s new and improved version) |
This PR (your, @tjol, changes) at least I am completely happy with!! The additions @fvbakel has done regarding the segmented version, is in a separate PR, right? The changes suggested by @fvbakel for the arguments of calloc, I do not see as necessary in this PR and no further changes are required. But, there might be additional changes needed, depending on how we see what the maxint argument means. If we give an odd integer as maxint, and that integer corresponds to a prime, should that be counted? For instance 97 is prime, but is currently not counted by this implementation. Do we count primes below maxint or below and including maxint? |
@danielspaangberg, @tjol (and others), what you decide concerning this PR is entirely up to you. I just want to note that we have not "standardized" this from a maintainer's perspective. An important reason for that is that we focus on a sieve size of an even number, which means either approach works. For now, I'll just wait for a comment that says that I can indeed review this PR towards merging it. |
Thanks @rbergen - really appreciate good management! |
Description
C solution 2 was allocating 8x the space required.
Contributing requirements
drag-race
as the target branch.