Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Streaming SHA256 ops. #817

Closed
wants to merge 1 commit into from

Conversation

roconnor-blockstream
Copy link
Contributor

@roconnor-blockstream roconnor-blockstream commented Feb 14, 2020

My proposal is

  • OP_SHA256INITIALIZE to pop a bytestring and push SHA256 context creating by adding the bytestring to the initial SHA256 context.
  • OP_SHA256UPDATE to pop a SHA256 context and bytestring, and push an updated context by adding the bytestring to the data stream being hashed.
  • OP_SHA256FINALIZE to pop a SHA256 context and bytestring, and push a SHA256 hash value after adding the bytestring and completing the padding.

I propose defining the SHA256-context as a bytestring that is a 40 to 103 bytes in length where

  • The first 32-bytes is the SHA256 midstate
  • The next 8-bytes is the (little-endian) unsigned integer value containing the number of bits of data processed so far. This value will always be a multiple of 8.
  • The next 0 to 63 bytes is the store of data processed but not compressed. The length of this "field" is X/8 % 64 (equiv. (X % 512) / 8) where X is the integer value of the previous field.

Any invalid SHA256-context will cause the operations that process the context to fail.
If any input would cause the SHA256 bit counter to overflow, the operation will fail.

I could easily be convinced to use bytes instead of bits in the second field. Bits is nicely aligned with how SHA-256 works, and the size is perfectly aligned. OTOH Bytes is what the SHA256 internal state uses in practice and would be marginally faster to processes.

Motivation: There already exists a OP_SH256 opcode that performs computes SHA256 computations. However because the MAX_SCRIPT_ELEMENT_SIZE is 520 bytes, that is the maximum message size that can OP_SHA256 can operate on. These proposed operations allow us to circumvent the MAX_SCRIPT_ELEMENT_SIZE limit on message size; however these opcode might be subject to abuse (e.g. SHA-256 with freestart).

src/crypto/sha256.cpp Outdated Show resolved Hide resolved
src/crypto/sha256.cpp Show resolved Hide resolved
src/script/interpreter.cpp Outdated Show resolved Hide resolved
src/crypto/sha256.cpp Outdated Show resolved Hide resolved
@dgpv
Copy link
Contributor

dgpv commented Feb 14, 2020

re OP_SHA256_UPDATE_AND_FINALIZE: Is finalize useful without a preceding write ? In a linear script without loops it seems that it is almost always will be preceded by some write, and if not, OP_0 OP_SHA256_UPDATE_AND_FINALIZE should be an equivalent

@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Feb 14, 2020

I've updated the proposal so that FINALIZE also does an update, just like INIT.

@dgpv
Copy link
Contributor

dgpv commented Feb 14, 2020

code looks good to me, utACK fb6840a

src/script/script.cpp Outdated Show resolved Hide resolved
@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Feb 25, 2020

I know it is bikeshedding, but I kinda want rename SHA256_WRITE (back) to SHA256_UPDATE.

@instagibbs
Copy link
Collaborator

instagibbs commented Feb 25, 2020

(just saw this) I presume this is some sort of simplicity groundwork?

@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Feb 25, 2020

(just saw this) I presume this is some sort of simplicity groundwork?

Nope. Taproot is effectively a Simplicity prerequisite, and I'd somewhat like these opcodes added along with taproot to make Covenants and Oracles a little bit easier by not having to contend with the 520 (or whatever) stack item limit.

@instagibbs
Copy link
Collaborator

instagibbs commented Feb 25, 2020

Oh I misread the top post, nevermind.


bool CSHA256::SafeWrite(const unsigned char* data, size_t len) {
const uint64_t SHA256_MAX = 0x1FFFFFFFFFFFFFFF; // SHA256's maximum allowed message length in bytes.
if (SHA256_MAX < bytes || SHA256_MAX - bytes < len) return false;
Copy link
Contributor

@dgpv dgpv Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for SHA256_MAX - MAX_SCRIPT_ELEMENT_SIZE in Load() would allow to get rid of three ifs (if !ctx.SafeWrite(...) -> ctx.Write(...)), because Write() will be always safe in regard to this check: the size of data on the stack cannot be bigger than MAX_SCRIPT_ELEMENT_SIZE. Why you think that doing this in Write() is better ? Note that in OP_SHA256_INITIALIZE it is even impossible to fail this check, because the CTX is new and can only increase up to MAX_SCRIPT_ELEMENT_SIZE. I think an if check is due in Load() and here this condition can just be checked with assert to be safe.

Copy link
Contributor

@dgpv dgpv Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because you want to allow for UPDATE with user-supplied midstate to be able to go up to SHA256_MAX exactly ?

Copy link
Contributor

@dgpv dgpv Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cold make Load() take the length and then check SHA_MAX - bytes < len inside Load().
Although this makes Load/Save api less generic. But if you want it to be more generic so it can be used in other places, you probably want that check inside Load() anyway, because in general you cannot prevent anyone to use just Write() instead of SafeWrite() after Load()

Copy link
Contributor Author

@roconnor-blockstream roconnor-blockstream Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I could write Load to return a int64_t that returns the number of bytes used so far (or alternatively the number of remaining bytes available to be processed), and return -1 for failure.

Is this really all worthwhile just to avoid one if statement in SHA256INITALIZE?

Copy link
Contributor Author

@roconnor-blockstream roconnor-blockstream Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we also get rid of SafeWrite which is kinda nice to get rid of.

Copy link
Contributor

@dgpv dgpv Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all three if statements that are in place for SafeWrite now are not needed if Write is always safe, if the check SHA_MAX - bytes < len is done inside Load() - at least in regard to manipulating the state from the script. The Load() can just fail if the subsequent Write() will be invalid. But my opinion is not strong on this - this is mostly bikeshedding. But I would like to add that check inside Load() would follow 'fail early' principle, which I favour.

Copy link
Contributor Author

@roconnor-blockstream roconnor-blockstream Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having Load take in an argument of the number of bytes I intend to write later is such an unusual API, that I am very hesitant to implement it that way.

Copy link
Contributor

@dgpv dgpv Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just rename Load to Prepare then :-)

Copy link
Contributor

@dgpv dgpv Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, my opinion on this is not strong. Speaking of returning the number of bytes still available to process from Load and then checking it might be more messy than SafeWrite approach, because you will need to have that check in two places (in UPDATE and FINALIZE)

uint64_t bits = ReadLE64(&vch[32]);
size_t buf_size = (bits >> 3) % 64;

if ((bits & 0x07) != 0 || vch.size() != 40 + buf_size) return false;
Copy link
Contributor

@dgpv dgpv Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment to state that bits & 0x07 check is to prevent malleability ?

Copy link
Contributor Author

@roconnor-blockstream roconnor-blockstream Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit more than that: the SHA256 class is simply incapable of processing sub-byte data as it stands. But I can write a comment to that effect.

Copy link
Contributor

@dgpv dgpv Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean you could just just do bits &= ~0x07 and not do the check, but this would introduce non-determinism, and the comment could tell the reader why it is not a good idea

Copy link
Contributor Author

@roconnor-blockstream roconnor-blockstream Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had hoped it goes without saying that one shouldn't mangle an invalidly serialized data blob into a valid one.

Copy link
Contributor

@dgpv dgpv Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validity of serialized data depends on the definition of the validity, but it is not explicitly specified (the serialization code in Save() is only one possible implementation of implicit specification). But, on the other hand, you can view the (bits & 0x07) != 0 as the statement of the de-facto specification "low-order bits should be zero", and in this light, no extra comment seems necessary.

src/script/script.h Outdated Show resolved Hide resolved
@JeremyRubin
Copy link
Contributor

JeremyRubin commented Jun 5, 2020

See some discussion of the concept here: https://lists.linuxfoundation.org/pipermail/lightning-dev/2019-October/002218.html There was some discussion on if it should be allowed to pass in midstates or not.

I think a lot of scripts can benefit from a batch update style API where you specify how many things to take off the stack.

@dgpv
Copy link
Contributor

dgpv commented Jun 5, 2020

One data point re the linked 2019 discussion is that I encountered the limitation of OP_CAT in practice, where it limited the number of possible outputs (520 byte stack item limit) and thus one option of a complex covenant was not possible. Some of the outputs were blinded, and thus much larger than normal outputs, and I could fit two blinded outputs and two non-blinded outputs plus a fee output in a transaction, but third blinded output was too much. With streaming SHA256 operations, this limitation does not exist.

@dgpv
Copy link
Contributor

dgpv commented Jun 5, 2020

This is of course specific to Elements where blinded outputs are possible (which are larger than normal outputs). Without blinding, the 520 bytes limit might not be a problem for practical purposes.

@JeremyRubin
Copy link
Contributor

JeremyRubin commented Jun 6, 2020

It's true for checktemplateverify that it imposes a limitation on how many outputs/inputs you can add dynamically, so it's true beyond elements.

@dgpv
Copy link
Contributor

dgpv commented Jun 6, 2020

This particular limitation has no connection to CTV, because fixing outputs hash via CHECKSIGFROMSTACK does not impose any limits by itself, as outputs hash that is checked comes from just a normal sighash for segwit. This is about dynamically generating this outputs hash within the script from the fixed part (that enforces the covenant) and dynamic part (explicit fees and non-restricted outputs). CAT is used for that. Full outputs data need to be constructed on the stack before doing SHA256 on it, and this is where the 520 byte stack item datasize limit becomes the factor.

@JeremyRubin
Copy link
Contributor

JeremyRubin commented Jun 6, 2020

@dgpv that's exactly what I'm saying; it imposes a limit if you're dynamically adding data to a CTV template using op_cat. Hence sha256stream being preferred there too

@gwillen
Copy link
Contributor

gwillen commented Aug 5, 2020

This is kind of ancient I guess, but: Rather than have "update" plus "initialize-and-update", just have "update" and "finalize" treat an empty stack element as an empty context. Then you can ditch initialize entirely.

@roconnor-blockstream
Copy link
Contributor Author

roconnor-blockstream commented Aug 5, 2020

gwillen, if I understand your comment, that would entail requiring an extra OP_0 opcode to create the empty context in all typical uses of this functionality.

@apoelstra
Copy link
Member

apoelstra commented Sep 5, 2021

Closing, superceded by #1020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants