-
Notifications
You must be signed in to change notification settings - Fork 3.8k
implement basic scheduler #145
implement basic scheduler #145
Conversation
add basic tests for scheduler
NB: we do not have the facilities to create/process these yet but they can be scheduled scheduling is generated/signed agnostic though the block format still segregates them chain_controllers _db now has indices for generated transactions. tests using all signed transactions for now. closes EOSIO#139
… the producer plugin
changed the output of the schedulers to be just pending_transactions as those are tagged unions of pointers making them small and easily copyable without needing the additional indirection implemented basic test cases to stochastically verify that order of transaction delivery does not affect schedulability
I haven't followed the conversations on scheduling that closely, so I probably need to be filled in. It looks like the overall strategy used here is to collect pending transactions in a list the same way as always, and then when it comes time to produce a block, switch that list over several cycles/threads. Is that correct? That strategy doesn't make sense to me, but perhaps it's because I don't know what all the requirements of scheduling are... As I see it right now, transactions roll in from the network one at a time, but we are saving them up and scheduling them all at once at the end? Why not build up the schedule one trx at a time as they come in so that when it comes time to produce a block (crunch time, when we're on a deadline), we already have them all scheduled because all that work was done earlier? So is there some necessary information we don't know until we have a full list of (hopefully-valid) transactions we'd like to schedule in the block? So what are the requirements for a valid schedule? I know we can't put messages with the same code account in different threads of the same cycle... What other requirements are there? |
@bytemaster and I talked about pipe-lining scheduling so that we can push the cost of that entirely out of the quantum a producer has to produce a new block once it finally has knowledge of the preceding block. It is an open area of discussion. However, without some commitment scheme that provides some guarantees that you are not pre-scheduling transactions another producer will commit to a block before your next quantum, pre-scheduling has limited benefits. Best case scenario, you have built a pre-schedule that you must drop some transactions from once you finally learn the preceding block's contents and the resulting schedule is still performant. Worst case, the transactions have "high-degree" scopes and the resulting schedule after there removal is unnecessarily synchronous without them. I think more than likely, a set of well behaving producers will only have to schedule transactions broadcast during the previous producer-quantum and/or generated by the previous block. The discovered transactions may come in at the very end of the previous quantum and the block certainly will. In this case, pre-scheduling is largely busy-work as a majority of transactions will end up in blocks prior to the block a producer would get to generate. |
What are the possible conflicts between transactions? I know of a couple:
Are there any others? |
as of now, conflicts only arise from these cases (though sheepishly I didnt realize that the code account in the messages may not be referenced in the scope array. if that is the case, I need to respect it in scheduling and I currently do not) Pre-scheduling shouldn't result in an invalid schedule when transactions are dropped just potentially a poor schedule. Each additional transaction potentially adds restrictions to the scheduler, removing a transaction may reduce the restrictions a scheduler has. Naively, the fewer restrictions the more parallel a schedule can be. Having too many "phantom" restrictions from removed transactions has the same effect on the schedule efficiency for now benefit downstream (eg a retired transaction). |
One thing I think is worth bearing in mind is that "scheduling" does not matter to any node at all, except to the node scheduled to produce block N, between the time that node has received block N-1 and the time it produces block N. Outside that very small window, scheduling is irrelevant and probably shouldn't be done at all, unless we make it so cheap that it's basically free. At all other times, the only thing any node cares about is whether the transaction appears valid enough to propagate it through the P2P network. During that scheduling window, transactions appear one at a time, except for possibly some pending transactions we had that block N-1 didn't include (these should be rare, so it's probably not worth optimizing for them). So we probably do want to optimize for that incoming trickle of transactions. Hmm, it's just occurred to me that notifying accounts is hairy... When a message handler notifies account A, to deliver that notification we've got to evaluate a handler with A as the code account... but that makes scheduling pretty much impossible because the scheduler doesn't know that A is going to be notified, so it can't ensure that A isn't already code account in a different thread.... or did I make a mistake somewhere? |
Its my understanding after we had the big scheduling talk a few weeks back that all affected accounts would be listed explicitly and a transaction would fail if it violated this constraint. I would think this would include any notify actions that are delivered synchronously, those accounts must be in the transactions list of scopes.
During the quantum for block N, there is a lot to do that is post-scheduling such as actually executing the transactions in parallel and pruning any that fail during actual block dispatch (for whatever reason) and transmitting the block. I don't know that we can afford to wait for trickles. This may be a case where we have to trade transaction latency for throughput? I'm open to a JIT-scheduler but I do think that will limit the efficiency of scheduling and as a result decrease efficiency for downstream block learners (both partial and full). That said, If every Producer is doing "one-time-batch" scheduling, then after validating block N-1 we should have a list of pending transactions ready to schedule which roughly resembles those that arrived since the time Producer(N-1) performed its scheduling. The time between these scheduling batches should be roughly the time between blocks. The latency expectation for a transaction would be [bt, 2bt] instead of [0, bt] where "bt" is the time between blocks. Likewise a scheduler has full knowledge of the transaction set which gives us the best shot at efficiency. |
I have been reading the discussion and the "code account" should not be relevant to scheduling, only the scopes. The scopes define which areas of memory can be locked. The same code can execute in parallel in different scopes. |
libraries/chain/block_schedule.cpp
Outdated
|
||
uint cycle; | ||
uint thread; | ||
pending_transaction const *transaction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistically, the * belongs to the type not the variable. Also, since transaction cannot be null it should be a const & and no pointers should be used at all.
libraries/chain/block_schedule.cpp
Outdated
} | ||
|
||
template<typename CONTAINER> | ||
auto initialize_pointer_vector(CONTAINER const &c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stylistic, const CONTAINER& c, once again the '&' is part of the type, not the variable.
libraries/chain/block_schedule.cpp
Outdated
size_t current_size; | ||
size_t const max_size; | ||
|
||
bool should_skip(pending_transaction const *t) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pointer should be reference.
libraries/chain/block_schedule.cpp
Outdated
} | ||
|
||
block_schedule block_schedule::by_threading_conflicts( | ||
vector<pending_transaction> const &transactions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const type& transactions
libraries/chain/block_schedule.cpp
Outdated
|
||
block_schedule block_schedule::by_threading_conflicts( | ||
vector<pending_transaction> const &transactions, | ||
const global_property_object &properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& should be by type not variable name.
libraries/chain/chain_controller.cpp
Outdated
@@ -460,7 +501,7 @@ void chain_controller::_apply_block(const signed_block& next_block) | |||
#warning TODO: Process generated transaction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to github issue and link to it and remove compile time warning
*/ | ||
struct block_schedule | ||
{ | ||
typedef block_schedule (*factory)(vector<pending_transaction> const &, const global_property_object&); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const T&
shuffled_functor(NEXT &&_next) : next(_next){}; | ||
|
||
block_schedule operator()(vector<pending_transaction> const &transactions, const global_property_object& properties) { | ||
std::random_device rd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This source of entropy is much slower and more robust than necessary for fuzzing. Also, I am not sure why we need to introduce randomness into our scheduling process. Not only will this make reproducing issues trickier it generates non-deterministic behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fuzzing and testing concerns should be kept outside the chain library in testing files anyways, no?
tests/common/expect.hpp
Outdated
|
||
|
||
template<typename PRED, typename EVAL, typename T> | ||
auto expect(EVAL &&eval, T expected, char const * const msg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding standard, template types should be Eval, Pred. ALL CAPS are reserved for preprocessor macros.
Also '&&' belongs next to type not variable name.
const char* const not char const * const
lossy_functor(NEXT &&_next) : next(_next){}; | ||
|
||
block_schedule operator()(vector<pending_transaction> const &transactions, const global_property_object& properties) { | ||
std::random_device rd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more random fuzzing... ?
tests/tests/block_schedule_tests.cpp
Outdated
@@ -31,23 +31,73 @@ using namespace eos; | |||
using namespace chain; | |||
|
|||
/* | |||
* templated meta schedulers for fuzzing | |||
*/ | |||
template<typename NEXT> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEXT template argument looks like MACRO when. Should be updated to conform to style standards. "Next" for a template argument.
I am still looking for an explanation for fuzzing / randomness. |
@bytemaster in the updates all fuzzing code has been moved out of the library code and in to the test cases. There are 2 fuzzed test cases at the moment. While the current scheduler is very basic, i wanted to set a precedent for future scheduling modes that tests would include a basic stochastic verification that the order in which transactions are included does not affect the ability of the scheduler to run correctly and to completion. This doesn't make use of the random dropping scheduler as scheduling at this level is not concerned by that. However, there is the need for future testing that shows dropped/re-ordered real transactions do not negatively affect a producers ability to schedule and process a block. We can remove the dropping fuzzer until it is needed. As to why this was originally in non-test code, the idea was that contract devs would want to test how their contracts behave when faced with non-ideal network conditions. In particular making sure there is no way to get undesired behavior out of transaction ordering or drops that are part of any distributed system may be useful to a contract developer. To that end, i had added the ability to trivially fire up a sandbox with a fuzzing scheduler to create these scenarios semi-automatically on a local test network. |
during the last merge of features from master, there was a non-trivial merge I would like to point out: The changes to master dropped transactions from the pending set if they failed during scheduling. In order to maintain this with the schedulable set no longer being just |
bump cmake version
# This is the 1st commit message: various improvements # This is the commit message #2: new hash # This is the commit message #3: fix for script path # This is the commit message #4: fixes # This is the commit message #5: fixes # This is the commit message #6: fixes # This is the commit message #7: fixes # This is the commit message #8: fixes # This is the commit message #9: fixes # This is the commit message #10: fixes # This is the commit message #11: fixes # This is the commit message #12: fixes # This is the commit message #13: fixes # This is the commit message #14: fixes # This is the commit message #15: fixes # This is the commit message #16: fixes # This is the commit message #17: fixes # This is the commit message #18: fixes # This is the commit message #19: fixes # This is the commit message #20: fixes # This is the commit message #21: fixes # This is the commit message #22: fixes # This is the commit message #23: fixes # This is the commit message #24: fixes # This is the commit message #25: fixes # This is the commit message #26: testing # This is the commit message #27: testing # This is the commit message #28: testing # This is the commit message #29: testing # This is the commit message #30: testing # This is the commit message #31: testing # This is the commit message #32: testing # This is the commit message #33: testing # This is the commit message #34: testing # This is the commit message #35: testing # This is the commit message #36: testing # This is the commit message #37: testing # This is the commit message #38: testing # This is the commit message #39: testing # This is the commit message #40: testing # This is the commit message #41: testing # This is the commit message #42: testing # This is the commit message #43: testing # This is the commit message #44: fixes # This is the commit message #45: fixes # This is the commit message #46: fixes # This is the commit message #47: fixes # This is the commit message #48: fixes # This is the commit message #49: fixes # This is the commit message #50: fixes # This is the commit message #51: fixes # This is the commit message #52: fixes # This is the commit message #53: fixes # This is the commit message #54: fixes # This is the commit message #55: fixes # This is the commit message #56: fixes # This is the commit message #57: fixes # This is the commit message #58: fixes # This is the commit message #59: fixes # This is the commit message #60: fixes # This is the commit message #61: fixes # This is the commit message #62: fixes # This is the commit message #63: fixes # This is the commit message #64: fixes # This is the commit message #65: fixes # This is the commit message #66: fixes # This is the commit message #67: fixes # This is the commit message #68: fixes # This is the commit message #69: fixes # This is the commit message #70: fixes # This is the commit message #71: fixes # This is the commit message #72: fixes # This is the commit message #73: fixes # This is the commit message #74: fixes # This is the commit message #75: fixes # This is the commit message #76: fixes # This is the commit message #77: fixes # This is the commit message #78: fixes # This is the commit message #79: more testing # This is the commit message #80: testing # This is the commit message #81: fixes # This is the commit message #82: fixes # This is the commit message #83: fixes # This is the commit message #84: fixes # This is the commit message #85: fixes # This is the commit message #86: fixes # This is the commit message #87: fixes # This is the commit message #88: fixes # This is the commit message #89: fixes # This is the commit message #90: fixes # This is the commit message #91: fixes # This is the commit message #92: fixes # This is the commit message #93: propagate-environment for buildkite-agent # This is the commit message #94: propagate-environment for buildkite-agent # This is the commit message #95: propagate-environment for buildkite-agent # This is the commit message #96: propagate-environment for buildkite-agent # This is the commit message #97: fixes # This is the commit message #98: fixes # This is the commit message #99: fixes # This is the commit message #100: fixes # This is the commit message #101: fixes # This is the commit message #102: fixes # This is the commit message #103: fixes # This is the commit message #104: fixes # This is the commit message #105: fixes # This is the commit message #106: fixes # This is the commit message #107: fixes # This is the commit message #108: fixes # This is the commit message #109: fixes # This is the commit message #110: fixes # This is the commit message #111: fixes # This is the commit message #112: fixes # This is the commit message #113: fixes # This is the commit message #114: fixes # This is the commit message #115: fixes # This is the commit message #116: fixes # This is the commit message #117: fixes # This is the commit message #118: fixes # This is the commit message #119: fixes # This is the commit message #120: fixes # This is the commit message #121: fixes # This is the commit message #122: fixes # This is the commit message #123: fixes # This is the commit message #124: fixes # This is the commit message #125: fixes # This is the commit message #126: fixes # This is the commit message #127: fixes # This is the commit message #128: fixes # This is the commit message #129: fixes # This is the commit message #130: fixes # This is the commit message #131: fixes # This is the commit message #132: fixes # This is the commit message #133: fixes # This is the commit message #134: fixes # This is the commit message #135: fixes # This is the commit message #136: fixes # This is the commit message #137: fixes # This is the commit message #138: fixes # This is the commit message #139: fixes # This is the commit message #140: fixes # This is the commit message #141: fixes # This is the commit message #142: fixes # This is the commit message #143: fixes # This is the commit message #144: fixes # This is the commit message #145: fixes # This is the commit message #146: fixes # This is the commit message #147: fixes # This is the commit message #148: fixes # This is the commit message #149: fixes # This is the commit message #150: fixes # This is the commit message #151: fixes # This is the commit message #152: fixes # This is the commit message #153: testing # This is the commit message #154: fixes # This is the commit message #155: fixes # This is the commit message #156: fixes # This is the commit message #157: fixes # This is the commit message #158: fixes # This is the commit message #159: fixes # This is the commit message #160: fixes # This is the commit message #161: fixes # This is the commit message #162: fixes # This is the commit message #163: fixes # This is the commit message #164: fixes # This is the commit message #165: fixes # This is the commit message #166: fixes # This is the commit message #167: fixes # This is the commit message #168: fixes # This is the commit message #169: fixes # This is the commit message #170: fixes # This is the commit message #171: fixes # This is the commit message #172: fixes # This is the commit message #173: fixes # This is the commit message #174: fixes # This is the commit message #175: fixes # This is the commit message #176: fixes # This is the commit message #177: fixes # This is the commit message #178: fixes # This is the commit message #179: fixes # This is the commit message #180: fixes # This is the commit message #181: fixes # This is the commit message #182: fixes # This is the commit message #183: fixes # This is the commit message #184: fixes # This is the commit message #185: fixes # This is the commit message #186: fixes
This PR addresses the features of #127
A transaction scheduler is now its own concept extracted from block generation. Producers can select a different scheduling algorithm for any block/test/etc.
A few basic schedulers are provided as are some meta-schedulers which can be used to compose fuzzy test cases. As new requirements emerge the set of schedulers and test cases should grow.
In addition, basic plumbing for scheduling transactions generated as an output of processing other transactions is included however, we currently have no capacity for generating/processing that type of transaction. Those will come as part of a later feature pull.