
Optimization #24

Open
ghost opened this issue May 8, 2020 · 39 comments

Comments

@ghost

ghost commented May 8, 2020

I changed the starting points and limited the number of kangaroos. I am currently working on the CPU version and got the following results:

CPU i3, 534.2K j/s, 64-bit key, 100 tests - mean runtime ~ 3 hours

I should explain how the tests were conducted so that there are no questions about it. Each time a key was found, a new random 64-bit key was generated and the search algorithm was run again.

Now I'm busy porting the code to OpenCL; this is a big and complicated job for me.

I think the key-finding time can still be improved by calculating the correct jump size for each kangaroo.

Any ideas?

@CatfishCrypt

How did you do this:
"Each time a key was found, a new random 64-bit key was created and the search algorithm was run again"
The random part, more specifically?

@ghost
Author

ghost commented May 8, 2020

the random part more specifically?

  1. Generate a random number in the range 2^64:2^65
  2. Create a secp256k1 point from the generated number
  3. Run the kangaroo search
  4. After the key is found and verified, repeat

I did this to understand how much faster the algorithm works with my changes. The standard algorithm completes on average in 5 hours.

@CatfishCrypt

Interesting... do you have to manually load each new key and then run Kangaroo, or do you have this automated somehow?

@ghost
Author

ghost commented May 9, 2020

I have automated this. Everything is very simple there: I added a loop and removed the command-line arguments.

@CatfishCrypt

Simple for someone who knows their way around programming :)
Do you feel you are finding keys faster this way?

@JeanLucPons
Owner

Hi,

Please be more precise:
Did you translate (+2^64) the start range of the Tame or the Wild?
Are your 100 keys the same or uniformly distributed?
To compute an average it is important to have keys uniformly distributed in the range.

You can speed up the search by spreading the wild over [-N/8, N/8], or even [-N/16, N/16]:
see the CreateHerd() function in Kangaroo.cpp.
This improves the overlap between tame and wild; by compressing the wild you can achieve up to 1.25sqrt(N), but this has side effects when the key to search is near the border of the range, and it also increases collisions between wilds...

@CatfishCrypt

Jean Luc,

Using the Div8 in CreateHerd?

@JeanLucPons
Owner

Yes, replace the code at Kangaroo.cpp:574-579

with:

    if((j + firstType) % 2 == TAME) {
      // Tame in [0..N]
      d[j].Rand(rangePower);
    } else {
      // Wild in [-N/8..N/8]
      d[j].Rand(rangePower-2);
      d[j].ModSubK1order(&rangeWidthDiv8);
    }

You win a lot for keys around the center, but keys on the border will take longer to solve; on average you still win.

@CatfishCrypt

Can it be put at lines 581:586? Lines 574:579 are grayed out; that's the symmetry code.

@CatfishCrypt

Current code:

#ifdef USE_SYMMETRY

		// Tame in [0..N/2]
		d[j].Rand(rangePower - 1);
		if ((j + firstType) % 2 == WILD) {
			// Wild in [-N/4..N/4]
			d[j].ModSubK1order(&rangeWidthDiv4);
		}

#else

		// Tame in [0..N]
		d[j].Rand(rangePower);
		if ((j + firstType) % 2 == WILD) {
			// Wild in [-N/2..N/2]
			d[j].ModSubK1order(&rangeWidthDiv2);
		}

#endif

@JeanLucPons
Owner

JeanLucPons commented May 9, 2020

It is indeed in the second part, 574:579; USE_SYMMETRY is not defined.
If in your code it is at 581:586, you may have added lines of code?

  // Choose random starting distance
  if(lock) LOCK(ghMutex);

  for(uint64_t j = 0; j<nbKangaroo; j++) {

#ifdef USE_SYMMETRY

    // Tame in [0..N/2]
    d[j].Rand(rangePower - 1);
    if((j+ firstType) % 2 == WILD) {
      // Wild in [-N/4..N/4]
      d[j].ModSubK1order(&rangeWidthDiv4);
    }

#else

    if((j + firstType) % 2 == TAME) {
      // Tame in [0..N]
      d[j].Rand(rangePower);
    } else {
      // Wild in [-N/8..N/8]
      d[j].Rand(rangePower-2);
      d[j].ModSubK1order(&rangeWidthDiv8);
    }

#endif

    pk.push_back(d[j]);

  }

@ghost
Author

ghost commented May 9, 2020

Do you feel you are finding keys faster this way?

Yes

@ghost
Author

ghost commented May 9, 2020

Hi @JeanLucPons

Are your 100 keys the same or uniformly distributed?
Did you translate (+2^64) the start range of the Tame or the Wild?
You can speed up the search by spreading the wild over [-N/8, N/8], or even [-N/16, N/16]
This improves the overlap between tame and wild; by compressing the wild you can achieve up to 1.25sqrt(N), but this has side effects when the key to search is near the border of the range, and it also increases collisions between wilds...

I generated the keys randomly; duplicates were excluded. I also saved the keys to a file and ran the test again to make sure there were no duplicate keys and the results were correct.
The results show a significant increase in key-finding speed.

I also tested keys in smaller ranges (2^32 - 2^63); in all ranges the key-finding speed increases.

I use the four-kangaroo method. The problem is that the more kangaroos you use, the slower a key is found; the optimum is 4 kangaroos, 2T and 2W. You can find research on this.
Now, using 4 kangaroos, we can parallelize their work by creating groups (herds) in which 4 kangaroos work.
For maximum speed, all herds should work on the same key, but with different initial parameters.
I'm working on the parallelization right now.

@CatfishCrypt

@AndrewBrz

when you say 4 kangaroos, do you mean 4 herds of kangaroos? I've read something to that effect: T1, T2, W1, W2. How are you implementing that?

@ghost
Author

ghost commented May 9, 2020

@CatfishCrypt
4 kangaroos is 1 herd: T1, T2, W1 and W2

@CatfishCrypt

Wow... that doesn't seem right, mathematically speaking. A huge space, with only 4 kangaroos versus thousands or millions.

@JeanLucPons
Owner

Ok

Run tests on a 40- or 48-bit range with a large number of trials to get a good average estimate. Note that the error is proportional to 1/sqrt(trials). Rather than a time comparison, give the ratio to sqrt(N): the standard method is 2.08sqrt(N); for 4 kangaroos (non-parallel) it is around 1.78sqrt(N).

Also take into consideration that the GPU handles a large number of kangaroos: some users with several GPUs use 2^24 or more kangaroos with a large distinguished-bit number on a large range.
GPU performance depends heavily on access to global memory, so the fewer global-memory accesses you have, the better.

I will soon work on a distributed version (client/server) where the server will handle DP and collision checks.

@ghost
Author

ghost commented May 9, 2020

Wow... that doesn't seem right, mathematically speaking. A huge space, with only 4 kangaroos versus thousands or millions.

You can create many kangaroo herds, but they should all work independently of each other; you could try adding communication between them (I still have no idea how to do this without sacrificing speed). There should be 4 kangaroos in one herd. This is mathematically correct.

@ghost
Author

ghost commented May 9, 2020

Run tests on a 40- or 48-bit range with a large number of trials to get a good average estimate. Note that the error is proportional to 1/sqrt(trials). Rather than a time comparison, give the ratio to sqrt(N): the standard method is 2.08sqrt(N); for 4 kangaroos (non-parallel) it is around 1.78sqrt(N).

I will do all the tests when I finish the OpenCL version.

@CatfishCrypt

@JeanLucPons

The server version is exactly what I was thinking! That would be awesome... and I believe it would reduce solution time by Eleventy Gabillion sqrt(N) :)

@JeanLucPons
Owner

Run tests on a 40- or 48-bit range with a large number of trials to get a good average estimate. Note that the error is proportional to 1/sqrt(trials). Rather than a time comparison, give the ratio to sqrt(N): the standard method is 2.08sqrt(N); for 4 kangaroos (non-parallel) it is around 1.78sqrt(N).

I will do all the tests when I finish the OpenCL version.

Are you writing the OpenCL version for the standard method or for the 4-kangaroo method?

@JeanLucPons
Owner

JeanLucPons commented May 9, 2020

I tested the Wild in [-N/8..N/8]: 1000 trials, 40-bit search.
2.139 is the estimate of the expected average, taking into account the DP overhead.

Original code:
-t 2 -d 5 in40_1000.txt
[999] 2^21.926 Dead:5 Avg:2^21.147 DeadAvg:1.4 (2.214sqrt(N) 2.139sqrt(N))

With the modification above:
-t 2 -d 5 in40_1000.txt
[999] 2^20.585 Dead:3 Avg:2^20.939 DeadAvg:2.6 (1.917sqrt(N) 2.139sqrt(N))

Gain: 15%.
2 times more dead kangaroos due to the compression of the wild.
This trick works only for the parallel version with enough kangaroos.

@ghost
Author

ghost commented May 9, 2020

Are you writing the OpenCL version for the standard method or for the 4-kangaroo method?

Standard, plus the three-kangaroo and four-kangaroo methods. There are no big changes in the code, so you can add and remove kangaroos for tests.

@ghost
Author

ghost commented May 9, 2020

Gain: 15%.
2 times more dead kangaroos due to the compression of the wild.

How many kangaroos were used, T and W?

@ghost
Author

ghost commented May 9, 2020

@JeanLucPons
Change this line:
bool Point::equals(Point &p) { return x.IsEqual(&p.x) && y.IsEqual(&p.y) && z.IsEqual(&p.z); }
to
bool Point::equals(Point &p) { return x.IsEqual(&p.x); }

This change checks only the X coordinate. Another optimization is to compute only the X coordinate, which speeds up the calculations (there is published research on this that you can find and read).
There are many more optimizations for jumps and starting points.

@CatfishCrypt

What changes have you experimented with for starting and jump points? I'm more interested in how you changed your start points: do you change them to start somewhere other than around the center?

@JeanLucPons
Owner

Thanks Andrew for your work ;)

  • There were 1024T and 1024W.
  • Concerning equals(), I prefer to keep it as it is: if you hit a symmetric point (even without using symmetry), the check against keyToSearch and keyToSearchNeg will fail and a point can be missed. If you want to make such an optimization, do it locally using Int::IsEqual(). For instance, in the hashTable this optimization is done locally and only a part of x is tested.
  • Concerning jumps, take care to keep compatibility with work files. Of course, depending on the method used they probably won't be compatible, but it would be great if 2 users using the same method could share work files. There is a version field that can be used for this. I noticed that I missed the version check in 1.4, F.ck!

@kpot87

kpot87 commented May 10, 2020

@AndrewBrz if you make AMD support, please share it with others

@DvR4

DvR4 commented May 13, 2020

if you make AMD support, please share it with others

I don't think he will share it. He has not responded since your proposal.

@JeanLucPons
Owner

That's too bad. I hope he will share his work.

@ghost
Author

ghost commented May 14, 2020

When I finish working on the algorithm, I will share the code.

@dem10

dem10 commented May 14, 2020

When I finish working on the algorithm, I will share the code.

That's great. I'm glad you took the opportunity to make a tool for red (AMD) cards.

@kpot87

kpot87 commented Jun 1, 2020

@AndrewBrz hi! Any news about AMD support? Thanks

@ghost
Author

ghost commented Jun 1, 2020

@kpot87 hi!
Yes, I am testing and fixing bugs. The OpenCL kernel shows excellent results; I implemented it in the official Bitcoin library. But there are still bugs that need to be fixed.
I don't have much free time to finish it quickly.

These are the results of the kernel testing; they indicate that the kernel is optimized as much as possible:
[screenshot: kernel test results]

@JeanLucPons
Owner

Hi Andrew,
Great news.
Keep up the good work !

@RB61

RB61 commented Jun 3, 2020

I changed the starting points and limited the number of kangaroos. I am currently working on the CPU version and got the following results:

CPU i3, 534.2K j/s, 64-bit key, 100 tests - mean runtime ~ 3 hours

I should explain how the tests were conducted so that there are no questions about it. Each time a key was found, a new random 64-bit key was generated and the search algorithm was run again.

Now I'm busy porting the code to OpenCL; this is a big and complicated job for me.

I think the key-finding time can still be improved by calculating the correct jump size for each kangaroo.

Any ideas?

Hi... any progress on the OpenCL version?
The framework seems to be ready (brichard19/BitCrack), and the necessary basic functions are ready (clMath/Secp256k1.cl).

@ghost
Author

ghost commented Jun 3, 2020

Hi @RB61,
I wanted to use the BitCrack OpenCL code, but after testing I decided against it. Their code is not stable; my code shows better results. I'm testing and making some changes right now. Some work remains.

This is the BitCrack OpenCL kernel:
[screenshot: BitCrack OpenCL kernel results]

@RB61

RB61 commented Jun 8, 2020

Hi @RB61,
I wanted to use the BitCrack OpenCL code, but after testing I decided against it. Their code is not stable; my code shows better results. I'm testing and making some changes right now. Some work remains.

This is the BitCrack OpenCL kernel:
[screenshot: BitCrack OpenCL kernel results]

The OpenCL code has a huge bug that brichard19 has not fixed. The CUDA version works fine.
See here:
brichard19/BitCrack#223

@RB61

RB61 commented Jun 21, 2020

Hi @AndrewBrz ... any progress?


6 participants