
Connect P2p with TxPool #590

Merged: 48 commits merged into master on Oct 4, 2022

Conversation

ControlCplusControlV (Contributor)

Connects P2P with TxPool, which should allow transactions inserted into the txpool of one node to be broadcast to the others. Done in collaboration with @bvrooman, but the original branch had gone stale, so I re-implemented it over master because the merge conflicts and git history had grown too messy.

Closes #477
Closes #478

@ControlCplusControlV ControlCplusControlV marked this pull request as draft September 4, 2022 05:11
ControlCplusControlV (Contributor, Author) commented Sep 21, 2022

It seems a lot of tests are failing now. Do you know why yet, @bvrooman? I am seeing a lot of timeouts.

@bvrooman bvrooman self-assigned this Sep 22, 2022
bvrooman (Contributor) commented Sep 23, 2022

It appears that the difficulty we are encountering is caused by the different configurations created by combinations of feature flags, specifically the "p2p" and "relayer" flags, which enable or disable the P2P service and Relayer service respectively. When we disable services and channels based on flags, we must also be cognizant of how the senders and receivers of each channel are affected.

I'm mapping out the channels in a draw.io doc here. This map is not complete, and it may not be completely accurate thus far.

From this map, we can observe that certain channels will have a different set of senders/receivers depending on which services are enabled. When running tests with all features enabled, including P2P and Relayer, tests pass. However, without P2P, some senders are removed from the system and some signals are never passed to their receivers, which causes some channels to wait indefinitely and leads to timeouts.

We need to make sure that tests pass for all combinations of the matrix:

| P2P        | Relayer    |
|------------|------------|
| ✗ Disabled | ✗ Disabled |
| ✓ Enabled  | ✗ Disabled |
| ✗ Disabled | ✓ Enabled  |
| ✓ Enabled  | ✓ Enabled  |

Right now, the code is written as if all services are enabled. Tests fail when P2P is disabled (e.g. no-default-features).

It is clear from reading the code and mapping out the connections that using channels at this volume introduces a lot of complexity and potential for error.
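For illustration only (the struct and field names below are hypothetical placeholders, not the actual fuel-core types), a cfg-gated context like the following makes the failure mode concrete: which senders exist depends on the feature combination, so with p2p disabled a receiver can be left waiting on a channel that nothing will ever write to.

```rust
use tokio::sync::{broadcast, mpsc};

// Hypothetical sketch: which senders exist depends on the enabled features.
pub struct TxPoolContext {
    // Always present: the txpool listens for transactions gossiped by peers,
    // but the only live sender for this channel is held by the p2p service.
    // With the "p2p" feature disabled, the sending half is dropped right away,
    // so recv() only ever reports a closed channel and never yields a
    // transaction; tests that wait for gossip then time out.
    pub incoming_tx_receiver: broadcast::Receiver<String>,

    // Only compiled in when p2p is enabled: requests from the txpool out to
    // the network service.
    #[cfg(feature = "p2p")]
    pub network_sender: mpsc::Sender<String>,
}
```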

ControlCplusControlV (Contributor, Author):

I'll try to reconfigure things so we can get all feature combos working, although realistically we can just disable tx gossiping when both aren't enabled. Maybe we could do something like wrapping gossip under a meta-feature?

Voxelot (Member) commented Sep 23, 2022

FYI - I don't have permission to open that draw.io link

I think we should try to find the simplest way to enable this for now without coming up with a perfect solution to the channels issue, since we have a pending initiative to redesign our service arch and reduce our reliance on channels where possible.

bvrooman (Contributor) commented Sep 23, 2022

I propose a few options:

  • Enable the incoming_tx_receiver.recv() branch of the txpool service context only when p2p is enabled (otherwise, the recv will return an error). However, attributes like #[cfg(feature = "p2p")] are not valid syntax once nested inside the tokio::select! macro (see tokio-rs/tokio#3974).
  • Box the incoming_tx_sender to store it on the heap and leak the box to prevent RAII deletion. This keeps the sender alive, ensuring the receiver always has a corresponding sender. Of course, the cost is the memory leak.
  • Alternatively, we can just make p2p a default feature and worry about enabling/disabling p2p once we go through the rearchitecting of the services.

I am pushing a commit to implement the second option (a rough sketch of the idea follows below). This option is quick and dirty, but it will unblock the tests.
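A minimal, self-contained sketch of that box-and-leak idea (not the fuel-core code itself; the payload type and channel capacity are placeholders): leaking a boxed clone of the sender keeps the channel open, so the receiver never sees a closed channel even when the p2p side is compiled out.

```rust
use std::time::Duration;
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
struct TransactionBroadcast;

#[tokio::main]
async fn main() {
    let (incoming_tx_sender, mut incoming_tx_receiver) =
        broadcast::channel::<TransactionBroadcast>(16);

    // Leak a boxed clone of the sender so it is never dropped. The receiver
    // then never observes RecvError::Closed, even if the real sender (the p2p
    // service) does not exist in this build. The cost is a one-off memory leak.
    let _leaked = Box::leak(Box::new(incoming_tx_sender.clone()));
    drop(incoming_tx_sender);

    // recv() now waits for a message instead of erroring out immediately.
    tokio::select! {
        msg = incoming_tx_receiver.recv() => println!("got {:?}", msg),
        _ = tokio::time::sleep(Duration::from_millis(100)) => {
            println!("no gossip yet; channel is still open");
        }
    }
}
```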

let node_two = FuelService::new_node(node_config).await.unwrap();
let client_two = FuelClient::from(node_two.bound_address);

tokio::time::sleep(Duration::new(3, 0)).await;
Review comment (Contributor):

TODO: We can replace the sleep calls with polling for peers once the functionality is enabled. This will be tracked in a separate issue here: #649.
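As a hedged sketch of what that polling could look like (the peer-count query does not exist yet, so the poll_until helper and the check closure here are illustrative, not an existing API), a generic poll-with-timeout helper would replace the fixed sleep:

```rust
use std::future::Future;
use std::time::Duration;
use tokio::time::{sleep, timeout};

// Hypothetical helper: repeatedly evaluate `check` until it returns true or
// the overall deadline elapses. Returns true if the condition was met in time.
async fn poll_until<F, Fut>(deadline: Duration, interval: Duration, mut check: F) -> bool
where
    F: FnMut() -> Fut,
    Fut: Future<Output = bool>,
{
    timeout(deadline, async {
        loop {
            if check().await {
                return;
            }
            sleep(interval).await;
        }
    })
    .await
    .is_ok()
}

// A test could then wait for peer discovery instead of sleeping for 3 seconds:
// assert!(poll_until(Duration::from_secs(10), Duration::from_millis(100), || async {
//     /* query the node for connected peers once that API exists (#649) */
//     true
// }).await);
```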

let tx = client_one.transaction(&result.0.to_string()).await.unwrap();
assert!(tx.is_some());

tokio::time::sleep(Duration::new(3, 0)).await;
bvrooman (Contributor), Sep 29, 2022:

TODO: Same as above; we will replace sleep with polling once that is available.

xgreenx (Collaborator) left a comment:

Brief review

tx: tx.clone(),
status: TxStatus::Submitted,
});
let _ = network_sender
Review comment (Collaborator):

The insert method below does the same thing except for this part. Can we move the common logic into a separate function and pass a closure that performs the broadcast in the p2p case?

Review comment (Member):

Maybe rather than complicating with a closure, this method could just call the one below, iterate over the Vec<Result<Arc<Transaction>>> return type and then trigger a broadcast for each successful one.

bvrooman (Contributor), Sep 30, 2022:

> Maybe rather than complicating with a closure, this method could just call the one below, iterate over the Vec<Result<Arc<Transaction>>> return type and then trigger a broadcast for each successful one.

We originally proposed this, but we changed it in favour of a more "performant" approach at the cost of extra code. I think this approach is the simplest and cleanest, and in this case I prefer the cleaner approach over the performant one.

The drawback is:

  • Double the number of iterations

But the benefits are:

  • No extra method or code duplication
  • Easier to read and reason about code

I'll opt for that instead and push a commit for that change.
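A rough, self-contained sketch of that shape (stand-in types; the enum variant name and signatures are assumptions, not the actual fuel-core API): insert_with_broadcast delegates to insert and then gossips only the transactions that were accepted, which is where the "double the number of iterations" drawback comes from.

```rust
use std::sync::Arc;
use tokio::sync::mpsc;

// Illustrative stand-ins for the real fuel-core types.
#[derive(Debug)]
struct Transaction(u64);

#[derive(Debug)]
enum P2pRequestEvent {
    // Variant name assumed for illustration.
    BroadcastNewTransaction { transaction: Arc<Transaction> },
}

// Stand-in for TxPool::insert: one Result per submitted transaction.
async fn insert(txs: Vec<Arc<Transaction>>) -> Vec<Result<Arc<Transaction>, String>> {
    txs.into_iter()
        .map(|tx| if tx.0 % 2 == 0 { Ok(tx) } else { Err("rejected".into()) })
        .collect()
}

// Delegate to `insert`, then do a second pass that broadcasts the successes.
async fn insert_with_broadcast(
    network_sender: &mpsc::Sender<P2pRequestEvent>,
    txs: Vec<Arc<Transaction>>,
) -> Vec<Result<Arc<Transaction>, String>> {
    let res = insert(txs).await;
    for ret in &res {
        if let Ok(tx) = ret {
            let _ = network_sender
                .send(P2pRequestEvent::BroadcastNewTransaction {
                    transaction: Arc::clone(tx),
                })
                .await;
        }
    }
    res
}
```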

Review comment (Contributor, Author):

We should probably at least leave a note, though, in case future txpool optimizations need to revisit this as an area to improve.

Review comment (Contributor):

Draft PR based on this branch: #670

match new_transaction.unwrap() {
    TransactionBroadcast::NewTransaction(tx) => {
        let txs = vec![Arc::new(tx)];
        TxPool::insert(txpool, db.as_ref().as_ref(), tx_status_sender, txs).await
Review comment (Collaborator):

You must also use insert_with_broadcast if p2p is enabled.

Also, it means that this case is not tested. Do we plan to test it in this PR or in a follow-up?

Voxelot (Member), Sep 30, 2022:

This shouldn't be done unless we change the current gossip logic to only broadcast to immediate peers (depth 1). Even with depth of 1, we risk bouncing broadcasts back and forth in an infinite loop. Right now, any gossiped messages will be propagated to the whole network. We have other P2P tasks to investigate how to prevent nodes from re-gossiping invalid data.

Review comment (Contributor):

Yes, it may not be clear from a cold read, but this branch is executed when new transactions are received from gossip. In this case, we do not want to echo the same transaction back to the network with a broadcast.

Comment on lines 168 to 170
pub incoming_tx_receiver: broadcast::Receiver<TransactionBroadcast>,
#[cfg(feature = "p2p")]
pub network_sender: mpsc::Sender<P2pRequestEvent>,
Review comment (Collaborator):

Maybe we can already introduce the traits we discussed on the call in this PR?

You can represent P2P as Gossiper and TransactionProvider traits. That would aggregate incoming_tx_receiver and network_sender into one field, making the code cleaner. It seems easy to do because TxPool already knows nothing about P2P (based on the imports in Cargo.toml).

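A hedged sketch of what those traits might look like (the trait names come from the comment above; the method signatures and stand-in types are assumptions, not the actual fuel-core design):

```rust
use std::sync::Arc;

#[derive(Debug)]
struct Transaction(u64);

/// Outbound side: the txpool asks for a transaction to be gossiped without
/// knowing whether the implementation is the P2P service or a no-op.
trait Gossiper {
    fn broadcast(&self, tx: Arc<Transaction>);
}

/// Inbound side: a source of transactions arriving from peers, replacing the
/// raw incoming_tx_receiver channel field.
trait TransactionProvider {
    fn try_next_transaction(&mut self) -> Option<Arc<Transaction>>;
}

/// With p2p disabled, the txpool is handed trivial implementations instead of
/// cfg-gated channel fields.
struct NoopGossiper;

impl Gossiper for NoopGossiper {
    fn broadcast(&self, _tx: Arc<Transaction>) {}
}
```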

Voxelot (Member), Sep 30, 2022:

I'd prefer to move forward with the current PR, since it has been in progress for a long time and we should checkpoint this progress; we can do the channel/trait conversion in subsequent PRs.

}
for (ret, tx) in res.iter().zip(txs.into_iter()) {
    match ret {
        Ok(removed) => {
Review comment (Collaborator):

In the current code, we check that transactions were in the transactions pool, but we don't check that transactions have been committed in the block.

With the introduction of P2P gossiping, it seems we need to do this check. Otherwise, we will gossip about already-committed transactions.

Voxelot (Member), Sep 30, 2022:

I think txpool cleanup should be deferred to the block importer/sync task. Without that in place, it doesn't make as much sense to implement this yet.

fuel-txpool/src/service.rs (outdated review thread, resolved)
* cleaned up feature flags

* fmt

* fix leak:

* fmt again
Voxelot (Member) left a review:

LGTM! :shipit:

The service setup w/ channels looks painful, but luckily we have a plan to address this soon.

The PR in draft can come after this one.

@ControlCplusControlV ControlCplusControlV merged commit c6ea9dd into master Oct 4, 2022
@ControlCplusControlV ControlCplusControlV deleted the controlc/p2p_tx branch October 4, 2022 03:26