Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve BitSet #3

Open
badamczewski opened this issue Oct 6, 2020 · 7 comments
Open

Improve BitSet #3

badamczewski opened this issue Oct 6, 2020 · 7 comments

Comments

@badamczewski
Copy link

Bit sets currently use 32bit words, this is not optimal since CPUs use 64-bit words and 64bit cache lines.

 public class BitArreintjeFastInnerMapArray
    {
        //Internal since it's used by some other classes (because this is so awesome)
        internal long[] _innerData;

        public BitArreintjeFastInnerMapArray(int height)
        {
            _innerData = new long[height / 64 + 1];
        }
....
....
....
@devedse
Copy link
Owner

devedse commented Oct 7, 2020

Cool stuff, I'll implement that and see what the performance benefits are for me. Thanks!!!!! 😄

I'd like to give you some reasoning why I made certain decisions and have your thoughts on it:

  1. I tried a try / catch so avoid having the while (stackje.Count == 0). I thought maybe if we wouldn't have this 'if check' every loop, the code would run faster. But according to your comment it would not :).
  2. I've mostly removed branching everywhere I could. Instead of doing if (direction == Left) { x = x - 2 } I now do some cool calculation stuff to do this. The only point where I still have branching is where I check if a specific point is a valid direction:
bool validLeft = cur.X - 2 > 0 && !map[cur.X - 2, cur.Y];
bool validRight = cur.X + 2 < width && !map[cur.X + 2, cur.Y];
bool validUp = cur.Y - 2 > 0 && !map[cur.X, cur.Y - 2];
bool validDown = cur.Y + 2 < height && !map[cur.X, cur.Y + 2];

I was wondering if you have any ideas on how to make this faster as well?

This is still the latest fastest version:
https://github.com/devedse/DeveMazeGeneratorCore/blob/master/DeveMazeGeneratorCore/Generators/AlgorithmBacktrack2Deluxe2.cs

One thing I found really cool as well is the way I use generics now to ensure the action.Invoke will be omitted by the compiler if Actions is not implemented. I learned that from the other guy in this issue: #2

@devedse
Copy link
Owner

devedse commented Oct 7, 2020

  1. Another question I had was that I noticed quite some performance difference when re-ordering C# statements. See the comparison between these 2:

1 (A bit slower it seems):

                    var chosenDirection = random.Next(targetCount);
                    int countertje = 0;

                    bool actuallyGoingLeft = validLeft && chosenDirection == countertje++;
                    bool actuallyGoingRight = validRight && chosenDirection == countertje++;
                    bool actuallyGoingUp = validUp && chosenDirection == countertje++;
                    bool actuallyGoingDown = validDown && chosenDirection == countertje;

                    int actuallyGoingLeftByte = Unsafe.As<bool, int>(ref actuallyGoingLeft);
                    int actuallyGoingRightByte = Unsafe.As<bool, int>(ref actuallyGoingRight);
                    int actuallyGoingUpByte = Unsafe.As<bool, int>(ref actuallyGoingUp);
                    int actuallyGoingDownByte = Unsafe.As<bool, int>(ref actuallyGoingDown);

                    var nextX = cur.X + actuallyGoingLeftByte * -2 + actuallyGoingRightByte * 2;
                    var nextY = cur.Y + actuallyGoingUpByte * -2 + actuallyGoingDownByte * 2;

                    var nextXInBetween = cur.X - actuallyGoingLeftByte + actuallyGoingRightByte;
                    var nextYInBetween = cur.Y - actuallyGoingUpByte + actuallyGoingDownByte;

                    stackje.Push(new MazePoint(nextX, nextY));
                    map[nextXInBetween, nextYInBetween] = true;
                    map[nextX, nextY] = true;

                    pixelChangedCallback.Invoke(nextXInBetween, nextYInBetween, currentStep, totSteps);
                    pixelChangedCallback.Invoke(nextX, nextY, currentStep, totSteps);

2 (A bit faster it seems):

                    var chosenDirection = random.Next(targetCount);
                    int countertje = 0;

                    bool actuallyGoingLeft = validLeft & chosenDirection == countertje;
                    int actuallyGoingLeftByte = Unsafe.As<bool, int>(ref actuallyGoingLeft);
                    countertje += validLeftByte;

                    bool actuallyGoingRight = validRight & chosenDirection == countertje;
                    int actuallyGoingRightByte = Unsafe.As<bool, int>(ref actuallyGoingRight);
                    countertje += validRightByte;

                    bool actuallyGoingUp = validUp & chosenDirection == countertje;
                    int actuallyGoingUpByte = Unsafe.As<bool, int>(ref actuallyGoingUp);
                    countertje += validUpByte;

                    bool actuallyGoingDown = validDown & chosenDirection == countertje;
                    int actuallyGoingDownByte = Unsafe.As<bool, int>(ref actuallyGoingDown);

                    var nextX = cur.X + actuallyGoingLeftByte * -2 + actuallyGoingRightByte * 2;
                    var nextY = cur.Y + actuallyGoingUpByte * -2 + actuallyGoingDownByte * 2;

                    var nextXInBetween = cur.X - actuallyGoingLeftByte + actuallyGoingRightByte;
                    var nextYInBetween = cur.Y - actuallyGoingUpByte + actuallyGoingDownByte;

                    stackje.Push(new MazePoint(nextX, nextY));
                    map[nextXInBetween, nextYInBetween] = true;
                    map[nextX, nextY] = true;

                    pixelChangedCallback.Invoke(nextXInBetween, nextYInBetween, currentStep, totSteps);
                    pixelChangedCallback.Invoke(nextX, nextY, currentStep, totSteps);

@devedse
Copy link
Owner

devedse commented Oct 7, 2020

I did just implement your idea to switch int to long and wow the difference is impressive:
image

@badamczewski
Copy link
Author

The picture shows the new performance but what was the old result? how many % faster? On my local environment, I got 50% performance boost.

Regarding things going a bit faster when not grouped. Normally it's a good practice to group the same of instructions but it's not the case here since JIT emits multiple different instructons to handle the conditions etc. So in the second case what I assume happens is that registers are beeing reused.

Locals like: actuallyGoingLeft etc. Are beeing either pushed to registers or pushed to the stack but since they are a temp computation the register/stack can be reused. If you group them you will have many locals and you will most likeley run out of registers and the compiler will be forced to push and out of the stack.

@badamczewski
Copy link
Author

The next big one should be optimizing memory access for the bit array and the two dimensional map array. Another big one is SIMD instructions but they work the best if you can access your memory and load up the vectors as fast as possible.

@devedse
Copy link
Owner

devedse commented Oct 7, 2020

I'm running into some trouble though that the mazes generated aren't correct anymore (see top and botom):

SmallTestGeneratedMazeNoPathAlgorithmBacktrack2Deluxe2

Might be something that I've done wrong though

@devedse
Copy link
Owner

devedse commented Oct 7, 2020

I just found the issue, apparently I had to change 1 to 1L to make them long's.
When I did that though the performance went back to what it was. So around ~3.28 seconds instead of the 1.7 that we had initially :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants