Add 64bit interpreter review #752

Merged
198 changes: 198 additions & 0 deletions _posts/2024-03-20-#29221.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
---
layout: pr
date: 2024-03-20
title: "Implement 64 bit arithmetic op codes in the Script interpreter"
pr: 29221
authors: [Christewart]
components: ["consensus"]
host: christewart
status: upcoming
commit:
---

Many future extensions of the bitcoin protocol - such as [OP_TLUV](https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2021-September/019419.html) - want to create smart contracts based on the amount of satoshis in a bitcoin output.

Unfortunately, satoshi values can require up to 51 bits, but Script arithmetic only operates on 32-bit values.

This means we cannot safely do math on Satoshi values in the interpreter without 64bit arithmetic!

This PR introduces 64bit arithmetic op codes and a new (to the interpreter) number encoding.
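
The bit-width claim above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the 21,000,000 BTC supply cap and 100,000,000 satoshis per bitcoin; the names `kCoin`, `kMaxMoney`, `bits_needed` and `fits_in_int32` are illustrative and do not come from the PR.

```cpp
#include <cstdint>
#include <limits>

// Illustrative constants: 100 million satoshis per BTC, 21 million BTC cap.
constexpr std::int64_t kCoin = 100000000;
constexpr std::int64_t kMaxMoney = 21000000 * kCoin;

// Number of bits needed to represent a non-negative value.
constexpr int bits_needed(std::uint64_t v) {
    int bits = 0;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

// True if the value fits in a 32-bit signed integer.
constexpr bool fits_in_int32(std::int64_t v) {
    return v >= std::numeric_limits<std::int32_t>::min() &&
           v <= std::numeric_limits<std::int32_t>::max();
}
```

Calling `bits_needed(kMaxMoney)` gives 51, and `fits_in_int32(kMaxMoney)` is false, which is the gap the new op codes are meant to close.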

## How arithmetic works currently in Script

Bitcoin has an embedded programming language called Script. Script has op codes such as `OP_ADD` and `OP_SUB`
that pop 2 elements off the stack, perform the arithmetic operation, and push the
resulting value back onto the stack. For instance, consider the following Script, which uses the existing op codes.

### Example of how arithmetic currently works
```
OP_1 OP_2 OP_ADD OP_3 OP_EQUAL
```

```
[Stack: ] --(OP_1)--> [Stack: 1] --(OP_2)--> [Stack: 1, 2] --(OP_ADD)--> [Stack: 3] --(OP_3)--> [Stack: 3, 3] --(OP_EQUAL)--> [Stack: true]
```

Explanation:

1. The initial state of the stack is empty.
2. `OP_1` pushes the number 1 onto the stack.
3. `OP_2` pushes the number 2 onto the stack.
4. `OP_ADD` pops the top two elements (2 and 1), adds them, and pushes the result (3) onto the stack.
5. `OP_3` pushes the number 3 onto the stack.
6. `OP_EQUAL` pops the top two elements (3 and 3), compares them, and pushes `true` onto the stack if they are equal or `false` otherwise.

In this case, since the values are equal, the final stack contains a single `true` value and the Script succeeds.
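
The stack trace above can be reproduced with a toy evaluator. This is only a sketch: it models stack elements as integers (real Script stack elements are byte vectors), handles just the five op codes used in the example, and the function name `run_toy_script` is hypothetical.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: evaluates a list of op code names against an integer stack.
std::vector<std::int64_t> run_toy_script(const std::vector<std::string>& ops) {
    std::vector<std::int64_t> stack;
    // Pops the top element, failing if the stack is empty.
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        std::int64_t top = stack.back();
        stack.pop_back();
        return top;
    };
    for (const std::string& op : ops) {
        if (op == "OP_1") {
            stack.push_back(1);
        } else if (op == "OP_2") {
            stack.push_back(2);
        } else if (op == "OP_3") {
            stack.push_back(3);
        } else if (op == "OP_ADD") {
            std::int64_t b = pop();
            std::int64_t a = pop();
            stack.push_back(a + b);
        } else if (op == "OP_EQUAL") {
            std::int64_t b = pop();
            std::int64_t a = pop();
            stack.push_back(a == b ? 1 : 0);  // 1 stands in for "true"
        } else {
            throw std::runtime_error("unknown op: " + op);
        }
    }
    return stack;
}
```

Running it on `OP_1 OP_2 OP_ADD OP_3 OP_EQUAL` leaves a single truthy element on the stack, matching the last step of the trace.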


Simple enough. Now let's create a Script with larger number values that would not be possible without this PR.

In this example, we are going to assume we are doing math on 1,000 BTC. In satoshis, this number is 100,000,000,000.
Encoded as [CScriptNum](https://github.com/bitcoin/bitcoin/blob/1105aa46dd1008c556b8c435f1efcb9be09a1644/src/script/script.h#L225) the hex representation for 1,000 BTC is `0x00e8764817`
```
0x00e8764817 0x00e8764817 OP_ADD 0x00d0ed902e OP_EQUAL
```

```
[Stack: ] --(0x00e8764817)--> [Stack: 0x00e8764817] --(0x00e8764817)--> [Stack: 0x00e8764817, 0x00e8764817] --(OP_ADD)--> [Stack: OP_ADD ERROR]
```

Explanation:

1. The initial state of the stack is empty.
2. `0x00e8764817` pushes the hexadecimal value 0x00e8764817 onto the stack.
3. `0x00e8764817` pushes another instance of the same value onto the stack.
4. `OP_ADD` consumes the two top stack elements and FAILS with an [overflow exception](https://github.com/bitcoin/bitcoin/blob/1105aa46dd1008c556b8c435f1efcb9be09a1644/src/script/script.h#L248)

This [version fails because OP_ADD can only consume 4 byte inputs](https://github.com/bitcoin/bitcoin/blob/1105aa46dd1008c556b8c435f1efcb9be09a1644/src/script/interpreter.cpp#L961).
Even worse, this does not give the Script programmer the ability to handle the exception thrown by CScriptNum.
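
To see why 1,000 BTC does not fit, here is a sketch of a `CScriptNum`-style encoder: minimal-length, little-endian, with the sign carried in the high bit of the last byte. It is written from scratch for illustration and is not the Bitcoin Core code; `encode_scriptnum`, `fits_numeric_opcode` and `kDefaultMaxNumSize` are hypothetical names, and the 4-byte limit is the one described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes a value as minimal little-endian sign-magnitude bytes.
std::vector<std::uint8_t> encode_scriptnum(std::int64_t value) {
    std::vector<std::uint8_t> out;
    if (value == 0) return out;  // zero is the empty byte vector

    const bool negative = value < 0;
    // Take the magnitude in unsigned arithmetic to avoid overflow on INT64_MIN.
    std::uint64_t magnitude = negative ? ~static_cast<std::uint64_t>(value) + 1
                                       : static_cast<std::uint64_t>(value);
    while (magnitude != 0) {
        out.push_back(static_cast<std::uint8_t>(magnitude & 0xff));
        magnitude >>= 8;
    }
    // The high bit of the last byte is the sign bit. If the magnitude already
    // uses it, append an extra byte; otherwise set it for negative values.
    if (out.back() & 0x80) {
        out.push_back(negative ? 0x80 : 0x00);
    } else if (negative) {
        out.back() |= 0x80;
    }
    return out;
}

// Assumed input limit for the existing arithmetic op codes, in bytes.
constexpr std::size_t kDefaultMaxNumSize = 4;

// True if the encoded value is short enough to be an arithmetic input.
bool fits_numeric_opcode(std::int64_t value) {
    return encode_scriptnum(value).size() <= kDefaultMaxNumSize;
}
```

Encoding 100,000,000,000 yields the five bytes `00 e8 76 48 17`, one byte over the limit, so `fits_numeric_opcode` returns false for it.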

## How arithmetic works with #29221

Three key differences exist in how 64-bit opcodes function compared to their previous counterparts:

1. **Enhanced Precision**: They support 64 bits of precision, so values that exceed the 4-byte `CScriptNum` input limit, such as large satoshi amounts, can be used in arithmetic.

2. **Error Handling Capability**: These opcodes provide error handling by pushing either true or false onto the stack, depending on whether the operation succeeds or fails.

3. **Standardized Encoding**: They utilize a consistent fixed-length 8-byte number encoding format, aligning with conventions elsewhere in the Bitcoin codebase, such as in `CTxOut::nValue`.

As an illustration of the third difference, consider the encoding of 1,000 BTC. It would now be represented in the same format as seen on a block explorer (`0x00e8764817000000`) rather than `0x00e8764817` which is the CScriptNum encoding.
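
A fixed-length encoding like this is straightforward to sketch. The helpers below are illustrative only (the names `encode_le64` and `decode_le64` are not taken from the PR) and assume the value is stored as a two's complement 64-bit integer in little-endian byte order.

```cpp
#include <array>
#include <cstdint>

// Encodes a 64-bit value as exactly 8 little-endian bytes.
std::array<std::uint8_t, 8> encode_le64(std::int64_t value) {
    const std::uint64_t bits = static_cast<std::uint64_t>(value);
    std::array<std::uint8_t, 8> out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<std::uint8_t>((bits >> (8 * i)) & 0xff);
    }
    return out;
}

// Decodes 8 little-endian bytes back into a 64-bit value.
std::int64_t decode_le64(const std::array<std::uint8_t, 8>& bytes) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        bits |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    // Reinterprets the bit pattern as signed (two's complement assumed).
    return static_cast<std::int64_t>(bits);
}
```

With these helpers, 100,000,000,000 encodes to the bytes `00 e8 76 48 17 00 00 00`, and decoding those bytes returns the original value.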

### Example: Adding 1,000 BTC together with OP_ADD64

Here's the same example from above, using `OP_ADD64` rather than `OP_ADD` and the new little-endian encoding format rather than `CScriptNum`:
```
0x00e8764817000000 0x00e8764817000000 OP_ADD64 OP_DROP 0x00d0ed902e000000 OP_EQUAL
```

```
[Stack: ] --(0x00e8764817000000)--> [Stack: 0x00e8764817000000]
--(0x00e8764817000000)--> [Stack: 0x00e8764817000000, 0x00e8764817000000]
--(OP_ADD64)--> [Stack: 0x00d0ed902e000000, true]
--(OP_DROP)--> [Stack: 0x00d0ed902e000000]
--(0x00d0ed902e000000)--> [Stack: 0x00d0ed902e000000, 0x00d0ed902e000000]
--(OP_EQUAL)--> [Stack: true]
```

Explanation:

1. The initial state of the stack is empty.
2. `0x00e8764817000000` pushes the hexadecimal value `0x00e8764817000000` onto the stack (representing 100,000,000,000 satoshis).
3. Another instance of `0x00e8764817000000` is pushed onto the stack.
4. `OP_ADD64` pops the top two elements (`0x00e8764817000000` and `0x00e8764817000000`) and adds them. The result of the addition, `0x00d0ed902e000000` (representing 200,000,000,000 satoshis), is pushed onto the stack first, followed by `true`, indicating that the arithmetic executed correctly.
5. `OP_DROP` drops the `true` that `OP_ADD64` pushed onto the stack to indicate the arithmetic operation was successful.
6. `0x00d0ed902e000000` pushes the hexadecimal value `0x00d0ed902e000000` onto the stack (representing 200,000,000,000 satoshis).
7. `OP_EQUAL` compares the two top stack values, both `0x00d0ed902e000000`, and pushes `true` onto the stack.
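
The success flag in step 4 corresponds to an overflow check on the addition. Below is a minimal sketch of such a check for signed 64-bit operands. The function name `checked_add64` is hypothetical, and the sketch only models the result and the success indicator; what the interpreter leaves on the stack when the operation fails is not modeled here.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns a + b, or std::nullopt if the sum would not fit in int64_t.
// The range test runs before the addition, so the addition itself never overflows.
std::optional<std::int64_t> checked_add64(std::int64_t a, std::int64_t b) {
    constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
    constexpr std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
    if ((b > 0 && a > kMax - b) || (b < 0 && a < kMin - b)) {
        return std::nullopt;  // corresponds to the "false" indicator
    }
    return a + b;  // corresponds to the result followed by "true"
}
```

For the example above, `checked_add64(100000000000, 100000000000)` holds 200,000,000,000, while adding 1 to the maximum `int64_t` value yields an empty optional.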

## Design questions

### Signed vs unsigned arithmetic

Much of the implementation uses code from the [Elements blockchain](https://github.com/ElementsProject/elements/). Elements implemented its new arithmetic opcodes on fixed-size 64-bit signed integers.
Do we have a use case for signed math rather than unsigned math? The satoshi example would work with unsigned math (outputs can't have a negative value), even though sats are encoded
as `int64_t` in the bitcoin protocol. Note that signed integer overflow is [undefined behavior in C++](https://en.cppreference.com/w/cpp/language/ub), so a signed implementation must check for overflow before performing the operation.
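
For comparison, here is how an unsigned variant could detect overflow. Unsigned arithmetic in C++ wraps modulo 2^64, which is well defined, so the check can be made after the addition. This is a sketch of the alternative being asked about, not code from the PR or from Elements; `checked_add_u64` is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>

// Returns a + b, or std::nullopt if the sum exceeds the uint64_t range.
std::optional<std::uint64_t> checked_add_u64(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t sum = a + b;  // wraps modulo 2^64 on overflow
    if (sum < a) {
        return std::nullopt;  // wraparound happened
    }
    return sum;
}
```

The trade-off is that an unsigned design cannot represent negative intermediate results, for example when subtracting a larger amount from a smaller one.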

### Existing opcode interop

What is the best way to interop with existing op codes such as `OP_WITHIN`, `OP_SIZE`, `OP_CHECKSIGADD`, etc? They may be explicitly or implicitly converted:

#### Explicit conversion op codes

Elements and, as a byproduct, this PR implement explicit casting op codes. They are `OP_SCRIPTNUMTOLE64`, `OP_LE64TOSCRIPTNUM`, and `OP_LE32TOLE64`.

This means a Script programmer must explicitly cast the stack top before passing it to an existing opcode. For instance, extending our example above:
```
0x00e8764817000000 0x00e8764817000000 OP_ADD64 OP_DROP OP_LE64TOSCRIPTNUM OP_SIZE OP_8 OP_EQUALVERIFY OP_SCRIPTNUMTOLE64 0x00d0ed902e000000 OP_EQUAL
```

#### Implicit conversion opcodes

You could redefine opcodes such as `OP_WITHIN`, `OP_SIZE`, `OP_CHECKSIGADD` to be context dependent on the `SigVersion`. Let's look at a potential implementation for `OP_SIZE`:

```c++
case OP_SIZE:
{
    // (in -- in size)
    if (stack.size() < 1)
        return set_error(serror, SCRIPT_ERR_INVALID_STACK_OPERATION);

    if (sigversion == SigVersion::BASE || sigversion == SigVersion::WITNESS_V0 || sigversion == SigVersion::TAPROOT || sigversion == SigVersion::TAPSCRIPT) {
        // This is for backwards compatibility: we always want to use the old numbering
        // system for already deployed versions of the bitcoin protocol.
        CScriptNum bn(stacktop(-1).size());
        stack.push_back(bn.getvch());
    } else {
        // All future soft forks assume 64-bit math.
        // Don't push variable length encodings onto
        // the stack when we are using SigVersion::TAPSCRIPT_64BIT.
        int64_t result = stacktop(-1).size();
        push8_le(stack, result);
    }
}
```

The key here is the `else` clause which assumes that every `SigVersion` that is NOT specified in the `if` clause uses 64bit signed integer fixed length numbers.
This removes the need for conversion/casting op codes and makes the developer experience much nicer, IMO.


### Encoding debate

There is an ongoing debate along 2 dimensions:

1. Whether fixed-size encodings will encumber us for features introduced in future soft forks (such as 256-bit scalar arithmetic).
2. Whether moving away from `CScriptNum` will be too disruptive to the ecosystem and force everyone to update their tooling.

I'm not going to go into further detail about this debate, as it's been written about at length on [delving bitcoin](https://delvingbitcoin.org/t/64-bit-arithmetic-soft-fork/397?u=chris_stewart_5).

## Questions

1. Did you review the PR? [Concept ACK, approach ACK, tested ACK, or NACK](https://github.com/bitcoin/bitcoin/blob/master/CONTRIBUTING.md#peer-review)? What was your review approach?

2. What does the CScriptNum [`nMaxNumSize`](https://github.com/bitcoin/bitcoin/blob/015ac13dcc964a31ef06dfdb565f88f901607f0e/src/script/script.h#L245) parameter do?

3. Why was the [`fRequireMinimal`](https://github.com/bitcoin/bitcoin/blob/015ac13dcc964a31ef06dfdb565f88f901607f0e/src/script/script.h#L244) flag introduced to `CScriptNum`?

4. Is #29221 malleability safe? Why?

5. What 2 opcodes accept 5 byte numeric inputs?

6. The Script in the `Explicit conversion op codes` section will not work. Can you guess why? Hint: it has something to do with `OP_LE64TOSCRIPTNUM`.

7. Is the `OP_SIZE` implementation safe for future soft forks? Hint: look at the control flow.

8. What should we do with the old opcodes (`OP_ADD`, `OP_SUB`)?

<!-- TODO: After a meeting, uncomment and add meeting log between the irc tags
## Meeting Log

### Meeting 1

{% irc %}
-->
<!-- TODO: For additional meetings, add the logs to the same irc block. This ensures line numbers keep increasing, avoiding hyperlink conflicts for identical line numbers across meetings.

### Meeting 2

-->
{% endirc %}