Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

implement basic scheduler #145

Merged

Conversation

wanderingbort
Copy link
Contributor

This PR addresses the features of #127

A transaction scheduler is now its own concept extracted from block generation. Producers can select a different scheduling algorithm for any block/test/etc.

A few basic schedulers are provided as are some meta-schedulers which can be used to compose fuzzy test cases. As new requirements emerge the set of schedulers and test cases should grow.

In addition, basic plumbing for scheduling transactions generated as an output of processing other transactions is included however, we currently have no capacity for generating/processing that type of transaction. Those will come as part of a later feature pull.

Bart Wyatt added 4 commits August 8, 2017 10:21
add basic tests for scheduler
NB: we do not have the facilities to create/process these yet but they can be scheduled
scheduling is generated/signed agnostic though the block format still segregates them
chain_controllers _db now has indices for generated transactions.
tests using all signed transactions for now.

closes EOSIO#139
changed the output of the schedulers to be just pending_transactions as those are tagged unions of pointers making them small and easily copyable without needing the additional indirection
implemented basic test cases to stochastically verify that order of transaction delivery does not affect schedulability
@nathanielhourt
Copy link
Contributor

I haven't followed the conversations on scheduling that closely, so I probably need to be filled in. It looks like the overall strategy used here is to collect pending transactions in a list the same way as always, and then when it comes time to produce a block, switch that list over several cycles/threads. Is that correct?

That strategy doesn't make sense to me, but perhaps it's because I don't know what all the requirements of scheduling are... As I see it right now, transactions roll in from the network one at a time, but we are saving them up and scheduling them all at once at the end? Why not build up the schedule one trx at a time as they come in so that when it comes time to produce a block (crunch time, when we're on a deadline), we already have them all scheduled because all that work was done earlier?

So is there some necessary information we don't know until we have a full list of (hopefully-valid) transactions we'd like to schedule in the block? So what are the requirements for a valid schedule? I know we can't put messages with the same code account in different threads of the same cycle... What other requirements are there?

@wanderingbort
Copy link
Contributor Author

@bytemaster and I talked about pipe-lining scheduling so that we can push the cost of that entirely out of the quantum a producer has to produce a new block once it finally has knowledge of the preceding block. It is an open area of discussion.

However, without some commitment scheme that provides some guarantees that you are not pre-scheduling transactions another producer will commit to a block before your next quantum, pre-scheduling has limited benefits. Best case scenario, you have built a pre-schedule that you must drop some transactions from once you finally learn the preceding block's contents and the resulting schedule is still performant. Worst case, the transactions have "high-degree" scopes and the resulting schedule after there removal is unnecessarily synchronous without them.

I think more than likely, a set of well behaving producers will only have to schedule transactions broadcast during the previous producer-quantum and/or generated by the previous block. The discovered transactions may come in at the very end of the previous quantum and the block certainly will. In this case, pre-scheduling is largely busy-work as a majority of transactions will end up in blocks prior to the block a producer would get to generate.

@nathanielhourt
Copy link
Contributor

What are the possible conflicts between transactions? I know of a couple:

  • Trxs A and B cannot be in separate threads of the same cycle if A and B both contain a message with code account Q
  • Trxs A and B cannot be in separate threads of the same cycle if A contains a message with code account Q and B contains a message with account Q in scope

Are there any others?

@wanderingbort
Copy link
Contributor Author

as of now, conflicts only arise from these cases (though sheepishly I didnt realize that the code account in the messages may not be referenced in the scope array. if that is the case, I need to respect it in scheduling and I currently do not)

Pre-scheduling shouldn't result in an invalid schedule when transactions are dropped just potentially a poor schedule. Each additional transaction potentially adds restrictions to the scheduler, removing a transaction may reduce the restrictions a scheduler has. Naively, the fewer restrictions the more parallel a schedule can be. Having too many "phantom" restrictions from removed transactions has the same effect on the schedule efficiency for now benefit downstream (eg a retired transaction).

@nathanielhourt
Copy link
Contributor

nathanielhourt commented Aug 8, 2017

One thing I think is worth bearing in mind is that "scheduling" does not matter to any node at all, except to the node scheduled to produce block N, between the time that node has received block N-1 and the time it produces block N. Outside that very small window, scheduling is irrelevant and probably shouldn't be done at all, unless we make it so cheap that it's basically free. At all other times, the only thing any node cares about is whether the transaction appears valid enough to propagate it through the P2P network.

During that scheduling window, transactions appear one at a time, except for possibly some pending transactions we had that block N-1 didn't include (these should be rare, so it's probably not worth optimizing for them). So we probably do want to optimize for that incoming trickle of transactions.

Hmm, it's just occurred to me that notifying accounts is hairy... When a message handler notifies account A, to deliver that notification we've got to evaluate a handler with A as the code account... but that makes scheduling pretty much impossible because the scheduler doesn't know that A is going to be notified, so it can't ensure that A isn't already code account in a different thread.... or did I make a mistake somewhere?

@wanderingbort
Copy link
Contributor Author

or did I make a mistake somewhere?

Its my understanding after we had the big scheduling talk a few weeks back that all affected accounts would be listed explicitly and a transaction would fail if it violated this constraint. I would think this would include any notify actions that are delivered synchronously, those accounts must be in the transactions list of scopes.

During that scheduling window, transactions appear one at a time, except for possibly some pending transactions we had that block N-1 didn't include

During the quantum for block N, there is a lot to do that is post-scheduling such as actually executing the transactions in parallel and pruning any that fail during actual block dispatch (for whatever reason) and transmitting the block. I don't know that we can afford to wait for trickles. This may be a case where we have to trade transaction latency for throughput? I'm open to a JIT-scheduler but I do think that will limit the efficiency of scheduling and as a result decrease efficiency for downstream block learners (both partial and full).

That said, If every Producer is doing "one-time-batch" scheduling, then after validating block N-1 we should have a list of pending transactions ready to schedule which roughly resembles those that arrived since the time Producer(N-1) performed its scheduling. The time between these scheduling batches should be roughly the time between blocks. The latency expectation for a transaction would be [bt, 2bt] instead of [0, bt] where "bt" is the time between blocks. Likewise a scheduler has full knowledge of the transaction set which gives us the best shot at efficiency.

@bytemaster
Copy link
Contributor

I have been reading the discussion and the "code account" should not be relevant to scheduling, only the scopes. The scopes define which areas of memory can be locked. The same code can execute in parallel in different scopes.


uint cycle;
uint thread;
pending_transaction const *transaction;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stylistically, the * belongs to the type not the variable. Also, since transaction cannot be null it should be a const & and no pointers should be used at all.

}

template<typename CONTAINER>
auto initialize_pointer_vector(CONTAINER const &c) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stylistic, const CONTAINER& c, once again the '&' is part of the type, not the variable.

size_t current_size;
size_t const max_size;

bool should_skip(pending_transaction const *t) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointer should be reference.

}

block_schedule block_schedule::by_threading_conflicts(
vector<pending_transaction> const &transactions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const type& transactions


block_schedule block_schedule::by_threading_conflicts(
vector<pending_transaction> const &transactions,
const global_property_object &properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

& should be by type not variable name.

@@ -460,7 +501,7 @@ void chain_controller::_apply_block(const signed_block& next_block)
#warning TODO: Process generated transaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to github issue and link to it and remove compile time warning

*/
struct block_schedule
{
typedef block_schedule (*factory)(vector<pending_transaction> const &, const global_property_object&);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const T&

shuffled_functor(NEXT &&_next) : next(_next){};

block_schedule operator()(vector<pending_transaction> const &transactions, const global_property_object& properties) {
std::random_device rd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This source of entropy is much slower and more robust than necessary for fuzzing. Also, I am not sure why we need to introduce randomness into our scheduling process. Not only will this make reproducing issues trickier it generates non-deterministic behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fuzzing and testing concerns should be kept outside the chain library in testing files anyways, no?



template<typename PRED, typename EVAL, typename T>
auto expect(EVAL &&eval, T expected, char const * const msg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding standard, template types should be Eval, Pred. ALL CAPS are reserved for preprocessor macros.

Also '&&' belongs next to type not variable name.

const char* const not char const * const

lossy_functor(NEXT &&_next) : next(_next){};

block_schedule operator()(vector<pending_transaction> const &transactions, const global_property_object& properties) {
std::random_device rd;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more random fuzzing... ?

@@ -31,23 +31,73 @@ using namespace eos;
using namespace chain;

/*
* templated meta schedulers for fuzzing
*/
template<typename NEXT>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NEXT template argument looks like MACRO when. Should be updated to conform to style standards. "Next" for a template argument.

@bytemaster
Copy link
Contributor

I am still looking for an explanation for fuzzing / randomness.

@wanderingbort
Copy link
Contributor Author

@bytemaster in the updates all fuzzing code has been moved out of the library code and in to the test cases. There are 2 fuzzed test cases at the moment. While the current scheduler is very basic, i wanted to set a precedent for future scheduling modes that tests would include a basic stochastic verification that the order in which transactions are included does not affect the ability of the scheduler to run correctly and to completion.

This doesn't make use of the random dropping scheduler as scheduling at this level is not concerned by that. However, there is the need for future testing that shows dropped/re-ordered real transactions do not negatively affect a producers ability to schedule and process a block. We can remove the dropping fuzzer until it is needed.

As to why this was originally in non-test code, the idea was that contract devs would want to test how their contracts behave when faced with non-ideal network conditions. In particular making sure there is no way to get undesired behavior out of transaction ordering or drops that are part of any distributed system may be useful to a contract developer. To that end, i had added the ability to trivially fire up a sandbox with a fuzzing scheduler to create these scenarios semi-automatically on a local test network.

@wanderingbort
Copy link
Contributor Author

during the last merge of features from master, there was a non-trivial merge I would like to point out:

https://github.com/wanderingbort/eos/blob/77a521859ec40c2a15a41098815603b87b4bbf84/libraries/chain/chain_controller.cpp#L403-L412

The changes to master dropped transactions from the pending set if they failed during scheduling. In order to maintain this with the schedulable set no longer being just SignedTransaction s and with maintaining proper separation between scheduler input and caller data structures, I opted for a record-then-erase-later solution.

@bytemaster bytemaster merged commit 54b2387 into EOSIO:master Aug 25, 2017
ljrprocc pushed a commit to bithacks-tech/myeosio that referenced this pull request Jul 4, 2018
taokayan pushed a commit to taokayan/eos that referenced this pull request May 15, 2019
NorseGaud pushed a commit that referenced this pull request Jul 30, 2019
# This is the 1st commit message:

various improvements

# This is the commit message #2:

new hash

# This is the commit message #3:

fix for script path

# This is the commit message #4:

fixes

# This is the commit message #5:

fixes

# This is the commit message #6:

fixes

# This is the commit message #7:

fixes

# This is the commit message #8:

fixes

# This is the commit message #9:

fixes

# This is the commit message #10:

fixes

# This is the commit message #11:

fixes

# This is the commit message #12:

fixes

# This is the commit message #13:

fixes

# This is the commit message #14:

fixes

# This is the commit message #15:

fixes

# This is the commit message #16:

fixes

# This is the commit message #17:

fixes

# This is the commit message #18:

fixes

# This is the commit message #19:

fixes

# This is the commit message #20:

fixes

# This is the commit message #21:

fixes

# This is the commit message #22:

fixes

# This is the commit message #23:

fixes

# This is the commit message #24:

fixes

# This is the commit message #25:

fixes

# This is the commit message #26:

testing

# This is the commit message #27:

testing

# This is the commit message #28:

testing

# This is the commit message #29:

testing

# This is the commit message #30:

testing

# This is the commit message #31:

testing

# This is the commit message #32:

testing

# This is the commit message #33:

testing

# This is the commit message #34:

testing

# This is the commit message #35:

testing

# This is the commit message #36:

testing

# This is the commit message #37:

testing

# This is the commit message #38:

testing

# This is the commit message #39:

testing

# This is the commit message #40:

testing

# This is the commit message #41:

testing

# This is the commit message #42:

testing

# This is the commit message #43:

testing

# This is the commit message #44:

fixes

# This is the commit message #45:

fixes

# This is the commit message #46:

fixes

# This is the commit message #47:

fixes

# This is the commit message #48:

fixes

# This is the commit message #49:

fixes

# This is the commit message #50:

fixes

# This is the commit message #51:

fixes

# This is the commit message #52:

fixes

# This is the commit message #53:

fixes

# This is the commit message #54:

fixes

# This is the commit message #55:

fixes

# This is the commit message #56:

fixes

# This is the commit message #57:

fixes

# This is the commit message #58:

fixes

# This is the commit message #59:

fixes

# This is the commit message #60:

fixes

# This is the commit message #61:

fixes

# This is the commit message #62:

fixes

# This is the commit message #63:

fixes

# This is the commit message #64:

fixes

# This is the commit message #65:

fixes

# This is the commit message #66:

fixes

# This is the commit message #67:

fixes

# This is the commit message #68:

fixes

# This is the commit message #69:

fixes

# This is the commit message #70:

fixes

# This is the commit message #71:

fixes

# This is the commit message #72:

fixes

# This is the commit message #73:

fixes

# This is the commit message #74:

fixes

# This is the commit message #75:

fixes

# This is the commit message #76:

fixes

# This is the commit message #77:

fixes

# This is the commit message #78:

fixes

# This is the commit message #79:

more testing

# This is the commit message #80:

testing

# This is the commit message #81:

fixes

# This is the commit message #82:

fixes

# This is the commit message #83:

fixes

# This is the commit message #84:

fixes

# This is the commit message #85:

fixes

# This is the commit message #86:

fixes

# This is the commit message #87:

fixes

# This is the commit message #88:

fixes

# This is the commit message #89:

fixes

# This is the commit message #90:

fixes

# This is the commit message #91:

fixes

# This is the commit message #92:

fixes

# This is the commit message #93:

propagate-environment for buildkite-agent

# This is the commit message #94:

propagate-environment for buildkite-agent

# This is the commit message #95:

propagate-environment for buildkite-agent

# This is the commit message #96:

propagate-environment for buildkite-agent

# This is the commit message #97:

fixes

# This is the commit message #98:

fixes

# This is the commit message #99:

fixes

# This is the commit message #100:

fixes

# This is the commit message #101:

fixes

# This is the commit message #102:

fixes

# This is the commit message #103:

fixes

# This is the commit message #104:

fixes

# This is the commit message #105:

fixes

# This is the commit message #106:

fixes

# This is the commit message #107:

fixes

# This is the commit message #108:

fixes

# This is the commit message #109:

fixes

# This is the commit message #110:

fixes

# This is the commit message #111:

fixes

# This is the commit message #112:

fixes

# This is the commit message #113:

fixes

# This is the commit message #114:

fixes

# This is the commit message #115:

fixes

# This is the commit message #116:

fixes

# This is the commit message #117:

fixes

# This is the commit message #118:

fixes

# This is the commit message #119:

fixes

# This is the commit message #120:

fixes

# This is the commit message #121:

fixes

# This is the commit message #122:

fixes

# This is the commit message #123:

fixes

# This is the commit message #124:

fixes

# This is the commit message #125:

fixes

# This is the commit message #126:

fixes

# This is the commit message #127:

fixes

# This is the commit message #128:

fixes

# This is the commit message #129:

fixes

# This is the commit message #130:

fixes

# This is the commit message #131:

fixes

# This is the commit message #132:

fixes

# This is the commit message #133:

fixes

# This is the commit message #134:

fixes

# This is the commit message #135:

fixes

# This is the commit message #136:

fixes

# This is the commit message #137:

fixes

# This is the commit message #138:

fixes

# This is the commit message #139:

fixes

# This is the commit message #140:

fixes

# This is the commit message #141:

fixes

# This is the commit message #142:

fixes

# This is the commit message #143:

fixes

# This is the commit message #144:

fixes

# This is the commit message #145:

fixes

# This is the commit message #146:

fixes

# This is the commit message #147:

fixes

# This is the commit message #148:

fixes

# This is the commit message #149:

fixes

# This is the commit message #150:

fixes

# This is the commit message #151:

fixes

# This is the commit message #152:

fixes

# This is the commit message #153:

testing

# This is the commit message #154:

fixes

# This is the commit message #155:

fixes

# This is the commit message #156:

fixes

# This is the commit message #157:

fixes

# This is the commit message #158:

fixes

# This is the commit message #159:

fixes

# This is the commit message #160:

fixes

# This is the commit message #161:

fixes

# This is the commit message #162:

fixes

# This is the commit message #163:

fixes

# This is the commit message #164:

fixes

# This is the commit message #165:

fixes

# This is the commit message #166:

fixes

# This is the commit message #167:

fixes

# This is the commit message #168:

fixes

# This is the commit message #169:

fixes

# This is the commit message #170:

fixes

# This is the commit message #171:

fixes

# This is the commit message #172:

fixes

# This is the commit message #173:

fixes

# This is the commit message #174:

fixes

# This is the commit message #175:

fixes

# This is the commit message #176:

fixes

# This is the commit message #177:

fixes

# This is the commit message #178:

fixes

# This is the commit message #179:

fixes

# This is the commit message #180:

fixes

# This is the commit message #181:

fixes

# This is the commit message #182:

fixes

# This is the commit message #183:

fixes

# This is the commit message #184:

fixes

# This is the commit message #185:

fixes

# This is the commit message #186:

fixes
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants