C solution 2: Don't over-allocate #406

tjol · 2021-07-13T18:07:46Z

Description

C solution 2 was allocating 8x the space required.

Contributing requirements

I read the contribution guidelines in CONTRIBUTING.md.
I placed my solution in the correct solution folder.
I added a README.md with the right badge(s).
I added a Dockerfile that builds and runs my solution.
I selected drag-race as the target branch.
All code herein is licensed compatible with BSD-3.

rbergen · 2021-07-13T18:15:47Z

Tagging @danielspaangberg and allowing time so they can review and respond to this PR on their original solution, if they want to do so.

danielspaangberg · 2021-07-13T19:20:58Z

Thanks @tjol for noticing and @rbergen for notifying me. You're right, I allocate 8 times as much as needed, but I think your solution needs to be changed slightly. For example for 1'000'000 integers and a 64 bit storage your code would allocate 7812 uint64_t, but they only hold 499968 bits. So I think your solution is good, but there should be (maxints+16*sizeof(TYPE)-1)/16/sizeof(TYPE).

tjol · 2021-07-13T19:27:34Z

You're right, of course.

danielspaangberg · 2021-07-13T19:38:59Z

@tjol That looks good to me! @rbergen - is any action required by me at this point?

rbergen · 2021-07-13T19:41:12Z

@danielspaangberg No, you've done an excellent job by responding so quickly and being so eagle-eyed. Thank you!

rbergen · 2021-07-13T19:55:22Z

When I run the latest commit locally in docker, this is what I get:

danielspaangberg_1of2;6728;5.000034;1;algorithm=base,faithful=yes,bits=1
danielspaangberg_8of30;8816;5.000249;1;algorithm=wheel,faithful=yes,bits=1
Segmentation fault
danielspaangberg_48of210;9936;5.000007;1;algorithm=wheel,faithful=yes,bits=1
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault
danielspaangberg_1of2_epar;13078;5.007350;8;algorithm=base,faithful=yes,bits=1
Segmentation fault
Segmentation fault
Segmentation fault
Segmentation fault

tjol · 2021-07-13T20:07:08Z

Well that's interesting...

I didn't get any segmentation faults running locally, but I do in docker. Curious.

danielspaangberg · 2021-07-13T20:07:41Z

Ouch. Will look into what is wrong. I got a segfault running locally when linking with electric fence.

danielspaangberg · 2021-07-13T20:28:22Z

I have a bug with a loop test terminating a bunch of loops with
for (unsigned int i=(factor*factor)>>1U; i<=maxintsh; )
which should be
for (unsigned int i=(factor*factor)>>1U; i<maxintsh; )
and with 32 bit ints it actually allocates exactly the number of integers required, so going off by one here triggers a segfault.
In some of the codes this was fixed already, why the segmentation fault does not always happen.
Problem - I am on vacation and cannot fix this with the computer I am currently at (not skilled enough with the git command line unfortunately). I can update the code when I get back home in a week.

rbergen · 2021-07-13T20:35:53Z

@danielspaangberg Again, impressively eagle-eyed! My advice would be to enjoy your vacation while it lasts. If this ends up taking a week to get fixed, that's just life.

tjol · 2021-07-13T20:43:04Z

@danielspaangberg I've come to the same conclusion – it looks like that's where the problem is. For *_epar and *_only_write_read_bits, changing the <= to a < fixes the problem. However, for *_par, this ends up giving an invalid result. This appears to be due to the way you're splitting up the work between threads, which I do not fully understand.

For the *_par solutions, I get the correct result by leaving the <= intact and replacing

 if (ithread==numthreads-1)
      mylastbit=lastbit;

with

 if (ithread==numthreads-1)
      mylastbit=lastbit-1;

While this works, I do not expect that it's an ideal solution. If you agree, I'm happy to commit it – otherwise I'll wait for you to do the job properly in a week or two.

Enjoy the rest of your holiday!

danielspaangberg · 2021-07-13T20:53:10Z

@danielspaangberg I've come to the same conclusion – it looks like that's where the problem is. For *_epar and *_only_write_read_bits, changing the < to a <= fixes the problem. However, for *_par, this ends up giving an invalid result. This appears to be due to the way you're splitting up the work between threads, which I do not fully understand.

For the *_par solutions, I get the correct result by leaving the <= intact and replacing
 if (ithread==numthreads-1)
      mylastbit=lastbit;
with
 if (ithread==numthreads-1)
      mylastbit=lastbit-1;
While this works, I do not expect that it's an ideal solution. If you agree, I'm happy to commit it – otherwise I'll wait for you to do the job properly in a week or two.

Enjoy the rest of your holiday!

Great and thanks! It all looks OK to me, but since it is hard for me to test right now, I can do a proper check next week. The tricky thing with the parallel version masking of the bits, is making sure no two threads touch the same integer. And it is hard to test, since the results will only get garbled sometimes. So I'll read through that part of the code again later.

fvbakel · 2021-07-19T14:48:21Z

I took the freedom to learn from this PR and the original code of sieve_1of2.c and build my own idea on top of that. Doing so I found a faster way to allocate the memory than what is described above. See my comments and code in PR #422. Next to that I found a bug in sieve_1of2.c, if you set the maxints to 97, 97 itself is not counted and the number of reported primes is off by one. I solved that to in the solution of PR #422. Maybe you can find a good way to solve that bug in this PR too.

danielspaangberg · 2021-07-20T12:37:50Z

I took the freedom to learn from this PR and the original code of sieve_1of2.c and build my own idea on top of that. Doing so I found a faster way to allocate the memory than what is described above. See my comments and code in PR #422. Next to that I found a bug in sieve_1of2.c, if you set the maxints to 97, 97 itself is not counted and the number of reported primes is off by one. I solved that to in the solution of PR #422. Maybe you can find a good way to solve that bug in this PR too.

Thanks @fvbakel ! I will look at this as soon as I have checked the other bugs. I looked at your code, but cannot understand how that allocates memory faster? The code is simpler for sure, but the performance should not be that different?

fvbakel · 2021-07-20T12:47:37Z

@danielspaangberg I was surprised by the difference in performance too. I have to check if I can really measure a difference when I run multiple times. I found that on my under powered hardware there can sometimes be a bit of variation in performance.

danielspaangberg · 2021-07-20T14:00:35Z

@danielspaangberg I've come to the same conclusion – it looks like that's where the problem is. For *_epar and *_only_write_read_bits, changing the < to a <= fixes the problem. However, for *_par, this ends up giving an invalid result. This appears to be due to the way you're splitting up the work between threads, which I do not fully understand.
For the *_par solutions, I get the correct result by leaving the <= intact and replacing
 if (ithread==numthreads-1)
      mylastbit=lastbit;
with
 if (ithread==numthreads-1)
      mylastbit=lastbit-1;
While this works, I do not expect that it's an ideal solution. If you agree, I'm happy to commit it – otherwise I'll wait for you to do the job properly in a week or two.
Enjoy the rest of your holiday!
Great and thanks! It all looks OK to me, but since it is hard for me to test right now, I can do a proper check next week. The tricky thing with the parallel version masking of the bits, is making sure no two threads touch the same integer. And it is hard to test, since the results will only get garbled sometimes. So I'll read through that part of the code again later.

So I have read through the code and tested a bit - I can see no problems with your changes @tjol - they solve the issues for the parallel code as well. Thanks for all the changes and fixes!!

danielspaangberg · 2021-07-20T14:13:07Z

@danielspaangberg I was surprised by the difference in performance too. I have to check if I can really measure a difference when I run multiple times. I found that on my under powered hardware there can sometimes be a bit of variation in performance.

@fvbakel - I checked the bug with the odd integers as you mention. I can add a simple fix for this in this solution as well (basically just maxints+=maxints%2 in create sieve will solve it), and will do soon. Thanks for noticing. I also looked at your segmented solution, which is clever. I did some experiments with a segmented sieve, but could not make it run faster for only one million ints. I have looked more at the code around the calloc, and do not understand how it can be much different in speed, if I am not missing anything essential, it is just the code to compute how many TYPE should be allocated, which should now in the code by @tjol just compile to an add and and a shift? I just checked the compiler output and that's basically what's there (a couple of moves too).

fvbakel · 2021-07-20T14:19:20Z

@danielspaangberg Thanks for the feedback. I think the difference in performance of the memory allocation can be considered a glitch in my testing. It is indeed just the calculation of the amount. On Alpine, the segmented solution is slower because alpine has a bad implementation of memcpy.

danielspaangberg · 2021-07-20T14:23:16Z

@danielspaangberg Thanks for the feedback. I think the difference in performance of the memory allocation can be considered a glitch in my testing. It is indeed just the calculation of the amount. On Alpine, the segmented solution is slower because alpine has a bad implementation of memcpy.

Nice! Good to know - I was just staring at the code for a while. I tested your code on debian under wsl2 and your code ran almost twice as fast as the 1of2. I think I tried my segmented code to be too generic. It is clever to do the first pass just for the first word.

fvbakel · 2021-07-20T14:48:41Z

@danielspaangberg Thank you, they only remaining huddle is to get that PR merged.

tjol · 2021-07-20T15:23:38Z

Do I understand correctly that we're all happy for this PR to be reviewed and merged in its current form? (In additional to @fvbakel’s new and improved version)

danielspaangberg · 2021-07-20T15:36:05Z

Do I understand correctly that we're all happy for this PR to be reviewed and merged in its current form? (In additional to @fvbakel’s new and improved version)

This PR (your, @tjol, changes) at least I am completely happy with!! The additions @fvbakel has done regarding the segmented version, is in a separate PR, right? The changes suggested by @fvbakel for the arguments of calloc, I do not see as necessary in this PR and no further changes are required. But, there might be additional changes needed, depending on how we see what the maxint argument means. If we give an odd integer as maxint, and that integer corresponds to a prime, should that be counted? For instance 97 is prime, but is currently not counted by this implementation. Do we count primes below maxint or below and including maxint?

rbergen · 2021-07-20T15:45:34Z

Do we count primes below maxint or below and including maxint?

@danielspaangberg, @tjol (and others), what you decide concerning this PR is entirely up to you. I just want to note that we have not "standardized" this from a maintainer's perspective. An important reason for that is that we focus on a sieve size of an even number, which means either approach works.

For now, I'll just wait for a comment that says that I can indeed review this PR towards merging it.

danielspaangberg · 2021-07-20T15:52:49Z

@rbergen - @tjol marked it as ready for review and I am also happy with this. I have tested with electric fence outside of docker, and with docker as well. So it should be fine for reviewing and merging.

danielspaangberg · 2021-07-20T16:59:02Z

Thanks @rbergen - really appreciate good management!

C solution 2: Don't over-allocate

6267f73

Actually allocate the correct number of bytes this time

bfdd751

tjol marked this pull request as draft July 13, 2021 20:52

Fix parallel versions (possibly temporary)

4c9f12a

fvbakel mentioned this pull request Jul 15, 2021

Added implementation in C with segmented algorithm #422

Merged

6 tasks

tjol marked this pull request as ready for review July 20, 2021 15:23

rbergen merged commit f97cf59 into PlummersSoftwareLLC:drag-race Jul 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C solution 2: Don't over-allocate #406

C solution 2: Don't over-allocate #406

tjol commented Jul 13, 2021

rbergen commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021

tjol commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021

rbergen commented Jul 13, 2021

rbergen commented Jul 13, 2021

tjol commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021 •

edited

Loading

rbergen commented Jul 13, 2021 •

edited

Loading

tjol commented Jul 13, 2021 •

edited

Loading

danielspaangberg commented Jul 13, 2021

fvbakel commented Jul 19, 2021

danielspaangberg commented Jul 20, 2021

fvbakel commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

fvbakel commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

fvbakel commented Jul 20, 2021

tjol commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

rbergen commented Jul 20, 2021 •

edited

Loading

danielspaangberg commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

C solution 2: Don't over-allocate #406

C solution 2: Don't over-allocate #406

Conversation

tjol commented Jul 13, 2021

Description

Contributing requirements

rbergen commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021

tjol commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021

rbergen commented Jul 13, 2021

rbergen commented Jul 13, 2021

tjol commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021

danielspaangberg commented Jul 13, 2021 • edited Loading

rbergen commented Jul 13, 2021 • edited Loading

tjol commented Jul 13, 2021 • edited Loading

danielspaangberg commented Jul 13, 2021

fvbakel commented Jul 19, 2021

danielspaangberg commented Jul 20, 2021

fvbakel commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

fvbakel commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

fvbakel commented Jul 20, 2021

tjol commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

rbergen commented Jul 20, 2021 • edited Loading

danielspaangberg commented Jul 20, 2021

danielspaangberg commented Jul 20, 2021

danielspaangberg commented Jul 13, 2021 •

edited

Loading

rbergen commented Jul 13, 2021 •

edited

Loading

tjol commented Jul 13, 2021 •

edited

Loading

rbergen commented Jul 20, 2021 •

edited

Loading