An initial explanation of the issue and idea, based on notes from @supaiku0. This will most likely need refinement as implementation progresses.
A few months ago, @supaiku0 worked on a proof of concept:
https://github.com/ArkEcosystem/core/tree/wip/core-transaction-pool/worker
It is nowhere near production-ready, but works in principle. Currently, the `/transactions` POST endpoint can easily be abused to cause high load on nodes, because it validates all transactions on the main thread, which causes CPU spikes. Since the signature check is quite heavy, this can even be triggered by broadcasting invalid transactions targeted at specific nodes. This is why node operators are advised to secure their `core-api` access, or to disable it completely if it runs in front of a forger.

The p2p endpoint already received a workaround by giving the main thread room to breathe: https://github.com/ArkEcosystem/core/pull/2848/files
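The "room to breathe" idea can be sketched as follows (a minimal illustration, not the actual code from the PR; the function and chunk size are hypothetical): validation is split into chunks and the event loop is yielded between chunks, so heavy signature checks do not starve other work on the main thread.

```typescript
// Hypothetical sketch: yield to the event loop between validation chunks.
type Transaction = { id: string };

const CHUNK_SIZE = 25; // assumed batch size

async function validateWithBreathingRoom(
    transactions: Transaction[],
    validate: (tx: Transaction) => boolean,
): Promise<Transaction[]> {
    const valid: Transaction[] = [];
    for (let i = 0; i < transactions.length; i += CHUNK_SIZE) {
        for (const tx of transactions.slice(i, i + CHUNK_SIZE)) {
            if (validate(tx)) {
                valid.push(tx);
            }
        }
        // let pending I/O and timers run before starting the next chunk
        await new Promise((resolve) => setImmediate(resolve));
    }
    return valid;
}
```

This only interleaves the work with other requests; the CPU cost itself stays on the main thread, which is why it remains a workaround rather than a fix.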
So the problem only manifests when using the `core-api` endpoint. Ideally, however, this workaround is replaced with a more generic solution that also covers `core-api`. This is where the pool worker comes in.

The flow right now is:
POST /transactions -> create new `Processor` instance -> validate -> addTransactionsToPool -> return response (accepted, ignored, excess, error)
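The flow above could be summarized like this (a simplified sketch; `Processor` exists in core, but the method shapes and result fields here are assumptions):

```typescript
// Simplified sketch of the current synchronous flow (method shapes assumed).
type PoolResult = {
    accepted: string[];
    ignored: string[];
    excess: string[];
    errors: Record<string, string>;
};

class Processor {
    public validate(transactions: any[]): any[] {
        // heavy signature checks happen here, on the main thread
        return transactions.filter((tx) => tx.signatureValid === true);
    }
}

function postTransactions(transactions: any[], addToPool: (txs: any[]) => void): PoolResult {
    const processor = new Processor();
    const valid = processor.validate(transactions);
    addToPool(valid);
    const validIds = new Set(valid.map((tx) => tx.id));
    return {
        accepted: valid.map((tx) => tx.id),
        ignored: [],
        excess: [],
        errors: Object.fromEntries(
            transactions
                .filter((tx) => !validIds.has(tx.id))
                .map((tx) => [tx.id, "invalid signature"]),
        ),
    };
}
```

The response is only produced once every transaction in the request has been fully validated, which is exactly what ties up the main thread.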
With the pool worker it would change to something like this:
POST /transactions -> enqueue transactions, which creates a job -> return response (`ticketId`, e.g. a sequentially increasing number)

A `ticketId` represents a job that is either in the queue (about to be sent to the worker), being processed (somewhere in the worker), or done (returned from the worker).

End users/clients can query the status by hitting an endpoint of a peer they broadcasted to, using the ticket id, which would return a response resembling what they currently get from the endpoint.
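A possible shape for such a ticket queue (all names here are hypothetical, not a committed API):

```typescript
// Hypothetical ticket queue: POST /transactions enqueues a job and returns
// a sequentially increasing ticketId; clients poll a status endpoint later.
type JobStatus = "queued" | "processing" | "done";

interface Job {
    ticketId: number;
    status: JobStatus;
    transactions: any[];
    result?: { accepted: string[]; invalid: string[] };
}

class TicketQueue {
    private nextTicketId = 1;
    private jobs = new Map<number, Job>();

    // called by the POST /transactions handler
    public enqueue(transactions: any[]): number {
        const ticketId = this.nextTicketId++;
        this.jobs.set(ticketId, { ticketId, status: "queued", transactions });
        return ticketId;
    }

    // called when the worker picks a job up
    public markProcessing(ticketId: number): void {
        const job = this.jobs.get(ticketId);
        if (job) job.status = "processing";
    }

    // called when the worker reports a finished job
    public complete(ticketId: number, result: Job["result"]): void {
        const job = this.jobs.get(ticketId);
        if (job) {
            job.status = "done";
            job.result = result;
        }
    }

    // called by the status endpoint, using the ticketId from the POST response
    public status(ticketId: number): Job | undefined {
        return this.jobs.get(ticketId);
    }
}
```

A real implementation would also need to evict finished jobs after some retention period, otherwise the map grows without bound.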
^ This is a significant change API-wise and breaks all kinds of client software, which is why the worker has been postponed to 3.0.
Queued jobs are then pushed to the worker, which does all the heavy lifting and, once done, reports back to the main thread, which in turn adds the valid transactions to the transaction pool.
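In Node.js this hand-off could be built on `worker_threads`, roughly like this (a sketch under that assumption; the inline worker source and its stub validation stand in for a real worker module):

```typescript
// Hypothetical use of node:worker_threads to move validation off the
// main thread. The eval'd source below is a stand-in worker module.
import { Worker } from "worker_threads";

const workerSource = `
const { parentPort } = require("worker_threads");
parentPort.on("message", (job) => {
    // heavy signature checks would happen here; this stub accepts
    // every transaction with a non-empty id
    const accepted = job.transactions.filter((tx) => tx.id !== "").map((tx) => tx.id);
    parentPort.postMessage({ ticketId: job.ticketId, accepted });
});
`;

function startPoolWorker(
    onResult: (result: { ticketId: number; accepted: string[] }) => void,
): Worker {
    const worker = new Worker(workerSource, { eval: true });
    worker.on("message", onResult); // worker reports back to the main thread
    return worker;
}
```

The main thread would call `worker.postMessage({ ticketId, transactions })` for each queued job and move the accepted transactions into the pool when the result message arrives.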
A nice benefit of this approach is that it also makes the frequency at which a node rebroadcasts transactions to other peers more deterministic. Right now, nodes rebroadcast whenever they are done validating the current batch of transactions (i.e. 1 request -> 1 broadcast). A worker, on the other hand, could report its finished jobs only every 100 ms, say, so a well-behaved node would rebroadcast at most 10 times per second. This greatly reduces the snowball effect that can currently be observed when the network is flooded with many transactions. Also, the rate limit on the endpoint can then be properly calibrated.
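The throttled rebroadcast could be sketched like this (names and the 100 ms interval are assumptions from the description above, not settled values):

```typescript
// Hypothetical throttled rebroadcast: finished transactions accumulate
// and are flushed on a fixed interval, capping rebroadcasts at
// 1000 / FLUSH_INTERVAL_MS per second instead of one per request.
const FLUSH_INTERVAL_MS = 100; // assumed interval -> at most 10 broadcasts/s

class RebroadcastBuffer {
    private pending: string[] = [];
    private timer?: ReturnType<typeof setInterval>;

    constructor(private readonly broadcast: (txIds: string[]) => void) {}

    public start(): void {
        this.timer = setInterval(() => this.flush(), FLUSH_INTERVAL_MS);
    }

    public stop(): void {
        if (this.timer) clearInterval(this.timer);
        this.flush(); // drain anything still pending
    }

    // called whenever the worker reports a finished job
    public add(txIds: string[]): void {
        this.pending.push(...txIds);
    }

    private flush(): void {
        if (this.pending.length === 0) return;
        this.broadcast(this.pending);
        this.pending = [];
    }
}
```

Because empty flushes are skipped, an idle node broadcasts nothing, while a flooded node is capped at the interval rate, which is what makes the rate limit calibratable.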