Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stratum v2 Template Provider (take 3) #29432

Draft
wants to merge 40 commits into
base: master
Choose a base branch
from
Draft

Stratum v2 Template Provider (take 3) #29432

wants to merge 40 commits into from

Conversation

Sjors
Copy link
Member

@Sjors Sjors commented Feb 14, 2024

Based on on @Fi3's master...Fi3:bitcoin:PatchTemplates which is based on @ccdle12's #27854. I rebased it and re-wrote the commit history. Compared to #28983 it introduces EllSwift in the handshake and fixes various bugs. I used that opportunity to change the branch name, which makes testing against SRI slightly easier. There's no conceptual discussion on #28983 so it can be ignored by reviewers.

See docs/stratum-v2.md for a brief description of Stratum v2 and the role of Bitcoin Core in that system..

What to test and review?

I'll make separate pull requests for parts that are ready for detailed review.

See the testing guide for various ways to test this PR. This branch is actively used by (testnet) pools, so it should be ready for high level review.

Related useful PRs

Related useful issues

Implementation notes

There's roughly three layers:

  1. Noise encryption Stratum v2 Noise Protocol #29346
  2. Messages and transport layer
  3. The Template Provider
  • the ci: commits (Support self-hosted Cirrus workers on forks #29274) are there to facilitate PR's against this branch, but they are not blocking for Stratum v2
  • the commits that move transport.h and some other stuff from node to common are not blocking. But in the longer run I'd like to see process separation between the node and the template provider.
  • I will occasionally add commits to undo bug fixes, in order to stay compatible with the SRI main branch. Those will get dropped over time and can be ignored.

Contributing

If you want to help out with any of the issues below, please open a PR to my fork. I will then squash your commits into my own where needed.

Things left todo

Spec

  • modify spec to use ProvideMissingTransactions? (followup?)
  • pick a good default for default_coinbase_tx_additional_output_size (see getblocktemplate RPC)

Networking

  • add -sv2bind and -sv2allowip
  • optional -sv2cert
  • drop Sv2TemplateProvider::SendBuf, reuse p2p socket handling if possible
  • limit number of connected clients
  • maybe limit (number of) coinbase_output_max_additional_size
  • TMP / TODO comments at the top of sv2_messages.h

Testing

  • expand sv2_template_provider_tests
  • add transport fuzzer
  • add template provider fuzzer

Template generation and updating

  • group templates with the same coinbase_tx_additional_output_size
  • don't generate templates when no client is connected

Misc

Potential followups

  • implement Noise protocol and mock client in Python, add functional tests (based on test/sv2_template_provider_tests.cpp)
  • use process separation, e.g. a bitcoin-tp binary, see multiprocess.md
  • make template updates push based, on top of Cluster Mempool, see docs/stratum-v2.md (for new blocks it's already push based)
  • push empty template for the next block (downstream can ignore or use, Implement a clever way to create and manage future jobs  stratum-mining/stratum#715)
    • send prevhash for this template as soon as any new block arrives
  • push optimistic template for the next block
    • send prevhash if and only if our template won (i.e. we got a SubmitSolution message)

@DrahtBot
Copy link
Contributor

DrahtBot commented Feb 14, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Approach NACK stickies-v

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30315 (Stratum v2 Transport by Sjors)
  • #30214 (refactor: Improve assumeutxo state representation by ryanofsky)
  • #30203 (Enhance signet chain configuration in bitcoin.conf by BrandonOdiwuor)
  • #30200 (Introduce Mining interface by Sjors)
  • #30141 (kernel: De-globalize validation caches by TheCharlatan)
  • #30130 (contrib/signet/miner: increase miner search space by edilmedeiros)
  • #30051 (crypto, refactor: add new KeyPair class by josibake)
  • #29876 (build: add -Wundef by fanquake)
  • #29838 (Feature: Use different datadirs for different signets by BrandonOdiwuor)
  • #29775 (Testnet4 including PoW difficulty adjustment fix by fjahr)
  • #29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)
  • #29686 (Update manpage descriptions by willcl-ark)
  • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
  • #28843 ([refactor] Remove BlockAssembler m_mempool member by TheCharlatan)
  • #28417 (contrib/signet/miner updates by ajtowns)
  • #26697 (logging: use bitset for categories by LarryRuane)
  • #10102 (Multiprocess bitcoin by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/21562393655

{
bool started = m_tp->Start(Sv2TemplateProviderOptions { .port = 18447 });
if (! started) return false;
// Avoid "Connection refused" on CI:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The template provider tests are quite brittle because they use a real socket.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being I just added handling for MSG_MORE (on e.g. macOS sequential messages are sent separately while on Linux they're combined). I also made the timeouts a bit longer.

Hopefully that does the trick. This can be revisited closer to the time when the Template Provider is ready for its own PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on a fix in Sjors#34

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should probably look into using StaticContentsSock

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasild any thoughts on how to make mock Socks that can be used to play messages in two directions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! See the first two commits in #26812:

bee6bdf test: put the generic parts from StaticContentsSock into a separate class
f42e4f3 test: add a mocked Sock that allows inspecting what has been Send() to it

and then how to use that in the last commit of the same PR:

8b10990 test: add unit tests exercising full call chain of CConnman and PeerManager

With those it is possible to send/receive raw bytes to/from the (mocked) socket, or NetMsgs, e.g.:

pipes->recv.PushNetMsg(NetMsgType::GETBLOCKS, block_locator, hash_stop);

ss << TX_WITH_WITNESS(tx);
tx_size = ss.size();
}

Copy link
Member Author

@Sjors Sjors Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TSAN is tripping up somewhere around here. The last thing it logs is - Connect 2 transactions:. It doesn't get to - Verify ... txins:. I wonder if this is related to mock time, which I'm testing in Sjors#34

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maflcko shouldn't TSan on CI output something useful about why it crashed? I currently only says "error 2": https://cirrus-ci.com/task/5124733717446656?logs=ci#L3531

When running this locally on Ubuntu with clang 16.0.6 I get a WARNING: ThreadSanitizer: data race and significantly more details (still a bit cryptic, but hopefully enough to figure out what's happening).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess the unit tests don't capture the tsan output?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they should. At least back when I tested #27667

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe good to re-check this when/after the cmake migration is done?

@Sjors
Copy link
Member Author

Sjors commented Feb 15, 2024

Added m_tp_mutex to Sv2TemplateProvider.

@Sjors
Copy link
Member Author

Sjors commented Feb 15, 2024

Bumping macOS to 14 on the CI does not help (tried in Sjors#35). I also can't reproduce this failure on my own Intel macOS machines, not on 13.6.4 and not on 14.2.1. A Sock mock is probably the most robust solution, but it'd be nice to find another workaround.

This extra delay seems to do the trick for now: Sjors@c8d10af

Another option to consider is using the functional test framework instead, since these are not really unit tests. However that involves implementing the sv2 noise protocol in Python and a bunch of other work to export transport functions to the functional test framework. If anyone feels up to that challenge, let me know...

fjahr and others added 19 commits June 21, 2024 18:54
Co-authored-by: Christopher Coverdale <chris.coverdale24@gmail.com>
Co-Authored-By: Christopher Coverdale <chris.coverdale24@gmail.com>
The Template Provider does stuff like this:

DataStream ss (sv2_net_msg.m_msg);
try {
            ss >> setup_conn;
        } catch (const std::exception& e) {
...

Such error checking should be moved inside Sv2Message.
The template provider will listen for a Job Declarator client.
It can establish a connection and detect various protocol errors.

Co-Authored-By: Christopher Coverdale <chris.coverdale24@gmail.com>
Co-Authored-By: Fi3
Stratum v2 requires pools to communicate the maximum bytes they intend to add to the coinbase outputs, as well as the maximum number of sigops.

This commit adds them to BlockAssembler::Options, so they can be set later.
These defaults are needed by the Mining interface in the next commit.
Co-Authored-By: Christopher Coverdale <chris.coverdale24@gmail.com>
Similar to the previous commit, however the behavior isn't opt-in
with NO_BRANCH, because Github CI lacks custom variables.
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
@Sjors
Copy link
Member Author

Sjors commented Jun 21, 2024

Rebased after #30314 and #30202. Incorporated the (light) transport refactor from #30315 to make sure that didn't break anything.

@stickies-v
Copy link
Contributor

stickies-v commented Jun 21, 2024

I don't think our RPC is suitable for this, because it's inherently poll based. We want to push out new block templates as soon as something better is found.

What about a combination of extending the zmq and rest interface and/or using longpollid, as suggested e.g. here? It seems like that would already improve things significantly, without having to add the encryption and transport layer?

It needs to do this because solutions for custom block templates have to be submitted by the node that created it. The pool will try, but it might fail if it's missing transactions.

I'm not sure I understand this. This PR is not an sv2 discussion, so if my understanding is wrong we don't need to talk about it here, but are you saying the pool will approve a proposed block template without having/keeping all the transactions? That seems undesired?

Regardless, I'm not sure the TP necessarily needs to hold on to block templates when sv2tpd caches all block template transactions and add any missing ones back to core through RPC before submitting a block? Perhaps the existing interfaces aren't sufficient currently, but then again - perhaps we could improve them instead of wholesale implementing sv2 logic?

I guess you're saying it should be separate software altogether

Yes, a separate project and a separate process would be my strong preference. Again, I do think it's important Bitcoin Core supports SV2 well. But if the bitcoind interfaces are lacking for sv2tpd to be a separate project, I think we should first exhaust the possibilities of improving the interfaces before just incorporating (part of) the downstream project into Bitcoin Core.

As a practical matter I'm not sure who would be implementing that

Wouldn't it be more straightforward to implement this as a separate sv2tpd daemon versus incorporating it into Core, giving more freedom in terms of language, dependencies, review, release process, ...?

@sipa
Copy link
Member

sipa commented Jun 21, 2024

But if the bitcoind interfaces are lacking for sv2tpd to be a separate project, I think we should first exhaust the possibilities of improving the interfaces before just incorporating (part of) the downstream project into Bitcoin Core.

FWIW, my view is that indeed the current Bitcoin Core interfaces are inadequate for implementing template providers, and while it's probably possible to make a number of improvements to existing ones to improve the situation, the result will either be far from ideal still (RPC is not binary, ZMQ is only one-directional, ...), or quite different from anything we have now. And if we're going to make a substantially different interface for the purpose of template providing, it probably makes sense to just pick Sv2 to be that interface as it's designed exactly for that purpose. The encryption aspect of Sv2 isn't necessary for that, but if all sv2tpd would do is be an Sv2 encrypting proxy, it might be logistically just simpler to have Bitcoin Core also do the encryption aspect.

So this doesn't exclude the possibility that future block template providing protocols be implemented in external proxy servers, which then talk still talk Sv2 to Bitcoin Core, if that is still appropriate.

@Sjors
Copy link
Member Author

Sjors commented Jun 21, 2024

I'm not sure I understand this. This PR is not an sv2 discussion, so if my understanding is wrong we don't need to talk about it here, but are you saying the pool will approve a proposed block template without having/keeping all the transactions? That seems undesired?

The Job Declaration Protocol has a param async_mining_allowed which, if set, allows miners to immediately submit work even before the template has been approved. Afaik SRI currently just hardcodes this to true for simplicity (cc @Fi3).

Approval takes time, because we have to send the pool any transactions it's missing, and it needs to check those (SRI currently only checks that all transactions are present, no additional checks - I suspect that's a can of worms on its own, since there may be conflicts with the pool node mempool). During this time you would either have to mine on a pool template, or solo mine.

Regardless of the merits of async mining, it's more robust for the miner to broadcast a found block themselves (this is done via the SubmitSolution message).

Example of this actually going wrong (in testing): #912

I'm not sure the TP necessarily needs to hold on to block templates when sv2tpd caches all block template transactions and add any missing ones back to core through RPC before submitting a block?

Maybe. But we also need to hold on to everything that was in the mempool at the time (an aspect I haven't thoroughly tested yet). From what I've seen in SRI, the Job Declarator Server (pool side) implements some sort of pseudo-mempool for that purpose.

Wouldn't it be more straightforward to implement this as a separate sv2tpd daemon versus incorporating it into Core

I'm not sure if it is. The two most practical approaches are improving this PR vs. expanding SRI. Building something from scratch is probably more work. Using a language other than Rust of c++ means re-writing the Noise implementation and other reusable code. The Bitcoin Core codebase has some very useful building blocks, e.g. the Transport class, BIP324 implementation (which we use parts of), well tested TCP socket handling code (unfortunately very tightly coupled with p2p, but refactoring that can be orthogonally useful), process separation in the pipeline (so it can crash without taking the node down - though so can a separate tool), good CI, and deterministic builds (so I can easily and relatively safely ship this PR as its own binary).

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/26528227399

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

ryanofsky added a commit that referenced this pull request Jun 24, 2024
a9716c5 rpc: call IsInitialBlockDownload via miner interface (Sjors Provoost)
dda0b08 rpc: minize getTipHash() calls in gbt (Sjors Provoost)
7b4d324 rpc: call processNewBlock via miner interface (Sjors Provoost)
9e22835 rpc: getTransactionsUpdated via miner interface (Sjors Provoost)
64ebb0f Always pass options to BlockAssembler constructor (Sjors Provoost)
4bf2e36 rpc: call CreateNewBlock via miner interface (Sjors Provoost)
404b01c rpc: getblocktemplate getTipHash() via Miner interface (Sjors Provoost)
d8a3496 rpc: call TestBlockValidity via miner interface (Sjors Provoost)
8ecb681 Introduce Mining interface (Sjors Provoost)

Pull request description:

  Introduce a `Mining` interface for the `getblocktemplate`, `generateblock` and other mining RPCs to use now, and for Stratum v2 to use later.

  Suggested here: #29346 (comment)

  The selection of methods added to the interface is mostly based on what the Template Provider in #29432 uses. It could be expanded further so that `rpc/mining.cpp` no longer needs `EnsureMemPool` and `EnsureChainman`.

  This PR should be a pure refactor.

ACKs for top commit:
  tdb3:
    re ACK a9716c5
  itornaza:
    Code review and std-tests ACK a9716c5
  ryanofsky:
    Code review ACK a9716c5 with one minor suggestion in case you update. Only changes since last review were other small changes to the interface.

Tree-SHA512: cf97f87d6e9ed89da3835a0730da3b24a7b14c8605ea221149103a5915e79598cf082a95f2bc88e33f1c450e3d4aad88aed1163a29195acca88bcace055af724
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants