Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

secp256k1_ecmult_gen_context_build allocates 90KB on the stack #251

Closed
tdaede opened this issue May 15, 2015 · 10 comments
Closed

secp256k1_ecmult_gen_context_build allocates 90KB on the stack #251

tdaede opened this issue May 15, 2015 · 10 comments

Comments

@tdaede
Copy link
Contributor

tdaede commented May 15, 2015

secp256k1_ge_t prec[1024];
secp256k1_gej_t precj[1024]; /* Jacobian versions of prec. */

Microcontrollers have expensive SRAM and cheap Flash. These should be turned into precomputed tables, at least as an option.

@DavidEGrayson
Copy link

There is already a discussion happening in issue #189 about how to make this library work well on microcontrollers. I suspect it would be best to just have a precomputed context that is neither created nor destroyed, and gets stored in flash, but I haven't looked into it.

@laanwj
Copy link
Member

laanwj commented May 16, 2015

Precomputing the tables and storing them in the executable could also help in the libbitcoinconsensus case, where we don't really want a dynamically allocated context.

@gmaxwell
Copy link
Contributor

Precomputing verify would be very undesirable-- it's 1.3MB with current settings (which are optimal for performance, when enough memory is available).

Libbitcoinconsensus really should be really using a context in order to avoid the constant dynamic memory allocations all through its internals which it currently has.

@laanwj
Copy link
Member

laanwj commented May 16, 2015

The context brings a lot of issues for users, and risks to use more memory in total.
Every client of secp256k1 makes its own context, all lugging along 1.3MB of the same read-only data.
Say a program uses both secp256k1 directly and through libbitcoin_consensus. This means (at least) two contexts. Passing an existing secp256k1 context to libbitcoin_consensus would leak implementation details.
Also it is not clear whether the libbitcoin_consensus contexts may be shared between threads. If not, this is 1.3MB per thread. If they can, this needs to be documented well as it is bound to cause confusion.
Hence I much prefer stateless, context-less functions.

@sipa
Copy link
Contributor

sipa commented May 16, 2015 via email

@tdaede
Copy link
Contributor Author

tdaede commented May 18, 2015

There are actually two problems here:
One is that signing context initialization currently uses the size of the context, plus over 90K of temporary data. In my case, the temporary data causes me to exceed my RAM budget.
The second is that, in some use cases (including mine), I have memory-mapped Flash that can store the contexts instead. This totally avoids the problem of context initialization.

I am not sure it is worth trying to reduce the temporary RAM usage, because the devices where that is the problem generally also have memory-mapped Flash.

@DavidEGrayson
Copy link

The STM32F205RE microcontroller on the TREZOR has 512 KB of flash and 128 KB of RAM, and it is capable of doing ECDSA signing. So hopefully a signing-capable context can be made to fit within that amount of flash. Of course, the smaller we can make it, the better.

I imagine you would run a utility on your computer that uses libsecp256k1 to generate a (randomized?) context for signing, and then it would save it in the form of C code which can be compiled by the cross-compiler for the microcontroller.

@sipa
Copy link
Contributor

sipa commented May 19, 2015 via email

@gmaxwell
Copy link
Contributor

DavidEGrayson: the current signing context is 65536 bytes, and it can be trivially adjusted to whatever size makes sense. The size grows exponentially for linear increases in performance-- I've thought it would be good to support a mode that uses less but since realizing that all the current mid sized microcontrollers have memory mapped flash it hardly seems worth carrying around extra support code for smaller sizes.

Tdaede suggested on IRC that he was thinking about making a tool that uses the current precomp code to write out a header, and having a switch to use the prefabricated header; which I said sounded good at least for the signing context.

@gmaxwell
Copy link
Contributor

Fixed by #254

real-or-random added a commit that referenced this issue Sep 5, 2019
dcb2e3b variable signing precompute table (djb)

Pull request description:

  This pull request gives an option to reduce the precomputed table size for the signing context (`ctx`) by setting `#define ECMULT_GEN_PREC_BITS [N_BITS]`.

  Motivation: Per #251 and #254, the static table can be reduced to 64kB. However, this is still too big for some of my embedded applications. Setting `#define ECMULT_GEN_PREC_BITS 2` produces a 32kB table at a tradeoff of about 75% of the signing speed. Not defining this value will default to the existing implementation of 4 bits. Statistics:

  ```
  ECMULT_GEN_PREC_BITS = 1
  Precomputed table size: 32kB
  ./bench_sign
  ecdsa_sign: min 195us / avg 200us / max 212us

  ECMULT_GEN_PREC_BITS = 2
  Precomputed table size: 32kB
  ./bench_sign
  ecdsa_sign: min 119us / avg 126us / max 134us

  ECMULT_GEN_PREC_BITS = 4 (default)
  Precomputed table size: 64kB
  ./bench_sign
  ecdsa_sign: min 83.5us / avg 89.6us / max 95.3us

  ECMULT_GEN_PREC_BITS = 8
  Precomputed table size: 512kB
  ./bench_sign
  ecdsa_sign: min 96.4us / avg 99.4us / max 104us
  ```

  Only values of 2 and 4 make sense. 8 bits causes a larger table size with no increase in speed. 1 bit runs, actually, but does not reduce table size and is slower than 2 bits.

ACKs for top commit:
  real-or-random:
    ACK dcb2e3b verified that all changes to the previous ACKed 1d26b27 were due to the rebase
  jonasnick:
    ACK dcb2e3b read the code and tested various configurations with valgrind

Tree-SHA512: ed6f68ca23ffdc4b59d51525336b34b25521233537edbc74d32dfb3eafd8196419be17f01cbf10bd8d87ce745ce143085abc6034727f742163f7e5f13f26f56e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants