Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relay own transactions only via short-lived Tor or I2P connections #27509

Closed
wants to merge 17 commits into from

Conversation

vasild
Copy link
Contributor

@vasild vasild commented Apr 21, 2023

To improve privacy, relay locally submitted transactions (from the wallet or via RPC) to the P2P network only via Tor or I2P short-lived connections.

  • Introduce a new connection type for relaying sensitive data (our own transactions) with the following properties:

    • started whenever there are local unbroadcast transactions to be sent
    • only opened to Tor or I2P peers
    • opened regardless of max connections limits
    • after handshake is completed one local transaction is pushed to the peer and the connection is closed
    • ignore all incoming messages after handshake is completed
  • Relay locally submitted transactions (from the wallet or via RPC) using this new mechanism, to a few Tor or I2P peers.

  • Hide those transactions from GETDATA and MEMPOOL requests, as if we don't have them.

  • Once we get an INV from somebody, request the transaction with GETDATA, as if we didn't have it before.

  • After we receive the full transaction as a TX message, in reply to our GETDATA request, only then consider the transaction has propagated through the network and remove it from the unbroadcast list.

  • INV it to others as a result of the TX message we get, like what we do with transactions that are not ours.

The messages exchange should look like this:

tx-sender >--- connect -------> tx-recipient
tx-sender >--- VERSION -------> tx-recipient
tx-sender <--- VERSION -------< tx-recipient
tx-sender <--- WTXIDRELAY ----< tx-recipient (maybe) 
tx-sender <--- SENDADDRV2 ----< tx-recipient (maybe) 
tx-sender <--- SENDTXRCNCL ---< tx-recipient (maybe) 
tx-sender <--- VERACK --------< tx-recipient
tx-sender >--- VERACK --------> tx-recipient
tx-sender >--- TX ------------> tx-recipient
tx-sender >--- PING ----------> tx-recipient
tx-sender <--- PONG ----------< tx-recipient
tx-sender disconnects

Whenever a new local transaction is received (from the wallet or RPC), the node will send it to 5 (SENSITIVE_RELAY_NUM_BROADCAST_PER_TX) recipients right away. If after 10-15 mins we still have not heard anything about the transaction from the network, then it will be sent to 5 more peers (see PeerManagerImpl::ReattemptInitialBroadcast()).

A few considerations:

  • The short-lived sensitive relay connections are very cheap and fast wrt network traffic. It is expected that some of those peers will blackhole the transaction. Just one honest/proper peer is enough for successful propagation.
  • The peers that receive the transaction could deduce that this is initial transaction broadcast from the transaction originator. This is ok, they can't identify the sender.

TODO:

  • Disable if -connect is used or Tor and I2P are not reachable.
  • Use I2P transient connections even if listening on I2P.
  • Do something with the user-agent string (could reveal identity if it has been personalized) and make sure no identifying information in the VERSION message. Done: took the suggestion below.
  • Improve the condition when we consider the transaction seen by the network: right now a single INV from somebody suffices (this is still an improvement over master which is ok right after pushing the tx to a peer). We could wait for more than one INV, from peers that we have selected (outbound) and that are specifically not the ones we broadcasted to (via short-lived connection). Edit: One INV should be enough because after that we broadcast the transaction to everybody we are connected to (like if it is not ours).
  • Idea: start with one connection per transaction (not 5 as it is now), optimistically assuming it will be sufficient. (numbers are examples) After 1 minute, if still not received from the network, try 2 connections, after a few minutes try more. If still unsuccessful after 10 minutes, then fall back to the old mechanism.

This is supposed to address:
#3828
#14692
#19042
#24557
#25450

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 21, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK dergoegge, kristapsk, MaxHillebrand, mzumsande, ajtowns, RandyMcMillan, brunoerg, amitiuttarwar, 0xB10C, naumenkogs
Approach NACK maflcko
Stale ACK pinheadmz

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28222 (Use shared_ptr for CNode inside CConnman by willcl-ark)
  • #28170 (p2p: adaptive connections services flags by furszy)
  • #28107 (util: Type-safe transaction identifiers by dergoegge)
  • #28043 (fuzz: Test headers pre-sync through p2p interface by dergoegge)
  • #27837 (net: introduce block tracker to retry to download blocks after failure by furszy)
  • #27826 (validation: log which peer sent us a header by Sjors)
  • #27675 (p2p: Drop m_recently_announced_invs bloom filter by ajtowns)
  • #27539 (init: Fixes for file descriptor accounting by Empact)
  • #27407 (net, refactor: Privatise CNode send queue by dergoegge)
  • #27213 (p2p: Diversify automatic outbound connections with respect to networks by amitiuttarwar)
  • #27114 (p2p: Allow whitelisting outgoing connections by brunoerg)
  • #26697 (logging: use bitset for categories by LarryRuane)
  • #26621 (refactor: Continue moving application data from CNode to Peer by dergoegge)
  • #25832 (tracing: network connection tracepoints by 0xB10C)
  • #25572 (refactor: Introduce EvictionManager and use it for the inbound eviction logic by dergoegge)
  • #25284 (net: Use serialization parameters for CAddress serialization by MarcoFalke)
  • #25268 (refactor: Introduce EvictionManager by dergoegge)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@vasild vasild marked this pull request as draft April 21, 2023 13:46
@dergoegge
Copy link
Member

dergoegge commented Apr 21, 2023

Concept ACK - cool!

Not sure about "only" relaying to Tor and I2P but seems useful as an option.

after handshake is completed one local transaction is pushed to the peer and the connection is closed

#4432 comes to mind

@kristapsk
Copy link
Contributor

Concept ACK

1 similar comment
@MaxHillebrand
Copy link

Concept ACK

@maflcko
Copy link
Member

maflcko commented Apr 21, 2023

Looking at the current implementation, I am not sure how useful this is going to be, if the transaction is sent along with the potentially uniquely identifying UA comment, or other fingerprints in the version message. Maybe it makes sense to define a static version message, similar to how the tor browser replaces headers with constants?

@sipa
Copy link
Member

sipa commented Apr 21, 2023

@MarcoFalke Good idea! Perhaps it's worth mimicking bitcoin-submittx (https://github.com/laanwj/bitcoin-submittx/blob/master/bitcoin-submittx#L28) ?

@kristapsk
Copy link
Contributor

Looking at the current implementation, I am not sure how useful this is going to be, if the transaction is sent along with the potentially uniquely identifying UA comment, or other fingerprints in the version message. Maybe it makes sense to define a static version message, similar to how the tor browser replaces headers with constants?

Or it could randomize it, from the list of different version strings.

@dergoegge
Copy link
Member

dergoegge commented Apr 21, 2023

One shot connections + immediate tx broadcast without inv,getdata exchange leaks that the tx is likely being broadcast for the first time (easy to censor the tx as the receiver).

Perhaps it is worth combining this with #27213 and instead of one-shot connections we broadcast only to existing selected connections on these privacy networks. Our own transactions would then still benefit from being broadcast along side other transactions.

@mzumsande
Copy link
Contributor

mzumsande commented Apr 21, 2023

Concept ACK. Not sure if this should be the default or an optional mode (in some cases, a faster and more reliable transmission may be more important than privacy).

It might also be helpful to combine this with some heuristics that the tx percolated through the network:
For example, require that 2 of your regular full outbound connections (that you didn't send it to ) have inv'ed the tx back to you - otherwise try again after some time.

@vasild
Copy link
Contributor Author

vasild commented Apr 22, 2023

potentially uniquely identifying UA comment, or other fingerprints in the version message

added to the TODO in the OP

One shot connections + immediate tx broadcast without inv,getdata exchange leaks that the tx is likely being broadcast for the first time (easy to censor the tx as the receiver).

Yes. I was thinking its probably ok to broadcast to a few peers, not just one. And if they blackhole it, then the code will retry. We can retry more aggressively if it keeps being blackholed. I think it is ok to assume that there are some honest nodes in our addrman.

Perhaps it is worth combining this with #27213 and instead of one-shot connections we broadcast only to existing selected connections on these privacy networks.

There are two aspects of this:

  1. Linking a bitcoin transaction to IP address/geolocation. Broadcasting to long-lived Tor/I2P connections solves this (modulo other issues like linking Tor nodes with IP addresses).
  2. Linking one bitcoin transaction with another bitcoin transaction that otherwise are unrelated. To solve/mitigate this we need to send each transaction over its individual connection and that connection to not be linkable to other connections.

Not sure if this should be the default or an optional mode

Right, also assuming there will be bugs/unexpected things - maybe have an option to enable/disable this. Hardest would be to introduce it without an option to disable it. Softest approach would be: 1. introduce under an option with default off, 2. in next version flip the default to on (if it seems to be working well)

It might also be helpful to combine this with some heuristics that the tx percolated through the network:
For example, require that 2 of your regular full outbound connections (that you didn't send it to ) have inv'ed the tx back to you - otherwise try again after some time.

I actually changed the behavior to do something like that in 512077b net_processing: relay local transactions via the SENSITIVE_RELAY mechanism in this PR: see where RemoveUnbroadcastTx() was called before / is called now. Maybe that deserves a separate commit. Before we would be ok as long as we pushed the tx to the peer. Now we wait until we receive an INV about that tx from somebody. There is some room for improvement - added in the TODO in the OP.

@mzumsande
Copy link
Contributor

mzumsande commented Apr 22, 2023

I think that in the suggested approach we might reveal information by not inv'ing our own tx to others.
If a spy is connected to each node, we might send the tx to them first currently (master), but with the PR, if we are now the only node that never inv's the tx to them at all (provided that all other nodes do that eventually with the flooding mechanism) that would equally identify us as the source.

@vasild
Copy link
Contributor Author

vasild commented Apr 23, 2023

@mzumsande, right, ideally, after the initial broadcast via the short-lived connection, the node should behave as if it does not know the tx:

  • before it gets INV for that tx, exclude it from GETDATA and MEMPOOL replies
  • if it gets INV for that tx, request the entire tx via GETDATA
  • broadcast to others

@ajtowns
Copy link
Contributor

ajtowns commented Apr 26, 2023

Concept ACK.

Might be good to have some way of being able to configure where to connect to for sensitive relay, defaulting to tor/i2p, but so that you could perhaps enable specific ipv4/ipv6 addresses as well, particularly for testing purposes.

@amitiuttarwar
Copy link
Contributor

amitiuttarwar commented Apr 27, 2023

concept ACK

a high level question: the privacy benefit comes from the guarantees of using the altnets - the lifecycle of the connection is expected to reveal that the tx is an initial broadcast. if a spy is able to reliably fingerprint a node between networks, this mechanism could potentially degrade privacy. could it be worthwhile to mitigate this by occasionally broadcast non-wallet txs via the same short-lived connections, or broadcast a couple additional txs over the same connection? obv some big tradeoffs, just want to consider the possibility. even if we are able to ensure no leaks now, it is difficult to guarantee this over time of developing the project.

@RandyMcMillan
Copy link
Contributor

Concept ACK

@RandyMcMillan
Copy link
Contributor

Disagree with:

opened regardless of max connections limits

The connection should be established after another tor/i2p connection is dropped - this way the connection appears to be just another connection - nothing special about it.

@RandyMcMillan
Copy link
Contributor

It may also be useful to randomize the duration of the connection?

Additionally - padding the connection with dummy txs during its lifecycle may help obfuscate the real payload?

@brunoerg
Copy link
Contributor

Concept ACK

This applies only if sensitive relay is used.

Before this change, we would consider a local transaction relayed
successfully to the network right after we push it to a peer (when
`PushMessage()` returns, we still have not even sent it to the peer
yet).

Instead wait for somebody to send us `INV` about that transaction,
then retrieve it with `GETDATA` (as if we don't have it already) and
only after receiving it, consider it relayed. This is a better
indication that the peer we sent it to has successfully received it,
not blackholed it and the transaction has propagated to the network.
This is used in both cases - TCP server (accept) and TCP client (connect).
The message "Connected & Listening address:port" is confusing.

Print both ends of the TCP connection.
…tion

If requested, make the SOCKS5 Python proxy redirect connections to a set
of given destinations. Actually act as a real proxy, connecting the
client to a destination, except that the destination is not what the
client asked for.

This would enable us to "connect" to Tor addresses from the functional
tests.
@vasild
Copy link
Contributor Author

vasild commented Aug 4, 2023

6e6f83b0f7...36074c093b: rebase, fix a bug where we sent the transaction without the witness, and extend the test to cover that

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 6, 2023

🐙 This pull request conflicts with the target branch and needs rebase.

@maflcko
Copy link
Member

maflcko commented Aug 31, 2023

@vasild Re your question on IRC about the wallet when not adding the tx to the mempool: Not counting non-mempool change is a pre-existing issue with the wallet, and I think should be fixed unrelated to this pull request. As I said previously, this can be observed with current wallet settings, see #27509 (comment). So I think all you'd need to do here is add yet another setting to the wallet (disabled by default) to enable this feature.

Conceptually, it seems easier to fix, review and test the local wallet issue about counting change, than it is to fix the remote p2p behavior of withholding or supplying transactions that are in or out of the mempool.

@pinheadmz
Copy link
Member

pinheadmz commented Sep 12, 2023

I'm noticing this in the log when sending a tx:

2023-09-12 10:12:40 2023-09-12T14:12:40Z [miner] Submitting wtx 86a83ee7c540e833b1132969f13ce770adae8f80e50bba6038eaec6ff1185657 to mempool for relay

but we are not actually submitting anything to the mempool, right ...?

Or yes we are but not to the part of the mempool where we relay to peers over clear net? (i.e. is that what "m_unbroadcast_txids" is?)

@vasild
Copy link
Contributor Author

vasild commented Sep 13, 2023

@pinheadmz that log is from before this PR and I did not modify it. It is true that right now on master and on this PR the transaction is added to the mempool and also, in addition to m_unbroadcast_txids. Thus I think the message is ok.

However, given the feedback above and on IRC, I will change this to not add the transaction to the mempool. I am just figuring out how to exactly handle it. Then this message "Submitting ... to mempool for relay" would be inaccurate. It is a good opportunity to discuss this f2f.

@@ -600,6 +600,7 @@ void SetupServerArgs(ArgsManager& argsman)
OptionsCategory::NODE_RELAY);
argsman.AddArg("-minrelaytxfee=<amt>", strprintf("Fees (in %s/kvB) smaller than this are considered zero fee for relaying, mining and transaction creation (default: %s)",
CURRENCY_UNIT, FormatMoney(DEFAULT_MIN_RELAY_TX_FEE)), ArgsManager::ALLOW_ANY, OptionsCategory::NODE_RELAY);
argsman.AddArg("-sensitiverelayowntx", strprintf("Relay locally submitted transactions via short lived connections to Tor or I2P for improved privacy. There are two aspects of this - linking transaction-origin/Bitcoin-owner with IP-address/geolocation and linking seemingly unrelated transactions to the same owner/origin (this applies even if the node is running in tor-only mode) (default: %u)", DEFAULT_SENSITIVE_RELAY_OWN_TX), ArgsManager::ALLOW_ANY, OptionsCategory::NODE_RELAY);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe -walletsensitiverelay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used "own" instead of "wallet" because local transactions could originate also from the RPC and they may have nothing to do with the wallet, there may not be a wallet at all. Come to think about this, maybe "private" is better word than "sensitive" and "broadcast" is better than "relay". So, maybe I should change this to something like -privatebroadcast...

@amitiuttarwar
Copy link
Contributor

we discussed this proposal in person, and I wanted to share some of my current thoughts here:

  • I agree with the alternate approach to keep the transaction out of the mempool. it's very difficult to ensure that there aren't any information leaks, and even if we are able to achieve that with the current code, it is unrealistic to be able to maintain that over time
  • the idea of starting via RPC submission only seems like a good way to incrementally develop this feature
  • in the current implementation, we schedule opening 5 connections per transaction, but then there's randomness around which one of the transactions each connection decides to broadcast. this could lead to a lot of variability if there are multiple transactions queued around the same time, which doesn't seem desirable. It makes more sense to me for there to be a direct mapping

@vasild

I will change this to not add the transaction to the mempool. I am just figuring out how to exactly handle it.

my recommendation is to make this PR a draft until you have a new proposal and are ready for further review. happy to take a look when the new approach is figured out :)

@fanquake fanquake marked this pull request as draft October 2, 2023 09:20
@naumenkogs
Copy link
Member

Concept ACK. Looking for the next actions w.r.t not adding to the mempool.

@DrahtBot
Copy link
Contributor

There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@vasild
Copy link
Contributor Author

vasild commented Jan 3, 2024

This is relevant and I am rewriting it as per the discussions above.

@vasild
Copy link
Contributor Author

vasild commented Feb 9, 2024

I opened a new PR at #29415 to continue with this.

in the current implementation, we schedule opening 5 connections per transaction, but then there's randomness around which one of the transactions each connection decides to broadcast. this could lead to a lot of variability if there are multiple transactions queued around the same time, which doesn't seem desirable. It makes more sense to me for there to be a direct mapping

@amitiuttarwar, I was thinking about this, but realized that once connman has opened N private broadcast connections they are identical from peerman's point of view. They don't have to be tied to transactions like "connection1 for txA", "connection2 for txB". Since connection1 and connection2 are identical, it may as well be "connection2 for txA" and "connection1 for txB".

To avoid the variability, e.g. one transaction to be broadcast more times than another, I introduced a priority for the transactions based on how many times one has been broadcast and how recently. Then peerman picks the tx with a higher priority when it starts processing a private broadcast connection.

I don't feel strongly about this, but it seems to be working well and avoids communication between peerman and connman like:

  • peerman to connman: open 1 connection for txA
  • connman to peerman: this connection is for txA

Also, if a transaction is quickly received back from the network and removed from the broadcast list, then the remaining connections created due to it will be used for broadcasting others instead.

@vasild vasild closed this Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet