Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add eos-vm-oc-enable auto mode #1322

Merged
merged 25 commits into from Jun 29, 2023
Merged

Add eos-vm-oc-enable auto mode #1322

merged 25 commits into from Jun 29, 2023

Conversation

heifner
Copy link
Member

@heifner heifner commented Jun 20, 2023

Short Description

In order to enhance the performance of EOS and EOS EVM, we are introducing the use of OC (Optimizing Compiler) for the EOS VM. This initiative aims to improve the overall efficiency and scalability of the platform.

What

The primary goal of this initiative is to leverage OC for the EOS VM by implementing an Auto OC Mode as the default setting. This new mode, eos-vm-oc-enable=auto, will enhance performance. OC will automatically be used based on different contexts, such as building blocks, applying blocks, executing transactions from HTTP or P2P, and various transaction types. OC will also be used for all contracts on eosio.* accounts. This includes: eosio, eosio.token, eosio.ibc, and eosio.evm.

Why

By utilizing OC, we can significantly boost the performance of EOS and EOS EVM. With the Auto OC Mode as the default, the system will automatically apply OC when appropriate, optimizing the execution of transactions and improving overall throughput. The introduction of OC for specific contexts, such as building blocks and speculative transaction executions, further enhances the efficiency of the platform.

Auto OC Mode as default

eos-vm-oc-enable=auto is the new default. Scripts that use --eos-vm-oc-enable will need to be updated to use --eos-vm-oc-enable all to enable OC for all contexts.

  • eos-vm-oc-enable=true => eos-vm-oc-enable=all (eos-vm-oc-enable=true continues to work and is mapped to all)
  • eos-vm-oc-enable=false => eos-vm-oc-enable=none (eos-vm-oc-enable=false continues to work and is mapped to none)
  • --eos-vm-oc-enable will not now work, replace with --eos-vm-oc-enable all

Baseline / OC selection

In Auto OC Mode, leverage EOS VM OC, as follows:

Context OC?
Building block baseline, but OC if account is eosio.*
Applying block OC, or
if producer: baseline, but OC if account is eosio.*
Speculatively executing trx from HTTP baseline, but OC if account is eosio.*
Speculatively executing trx from p2p baseline, but OC if account is eosio.* 💡
Compute transaction baseline, but OC if account is eosio.*
Read only transaction OC

💡 Although enabling OC for P2P would enable quicker validation of speculative trxs as they traverse the network, this PR does not add OC for P2P to help prevent overload on BP nodes. A separate option to enable OC for P2P when in auto is being considered as a follow on enhancement; see #1333.

Resolves #1251

/**
* @class wasm_interface_collection manages the active wasm_interface to use for execution.
*/
class wasm_interface_collection {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no functionality change here, just a refactor out of controller.

@@ -355,6 +355,9 @@ namespace eosio { namespace chain {
void replace_producer_keys( const public_key_type& key );
void replace_account_keys( name account, name permission, const public_key_type& key );

void set_producer_node(bool is_producer_node);
bool is_producer_node()const;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of sad about this change. Until now, the core of leap had no need to know if it was a producer or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I guess depending how much of a purist you want to be here, maybe you could make controller take a std::function<bool(apply_context)> tier_up_with_oc, either as a ctor parameter or a config or something. That way chain_plugin can decide how to set this up based on the various configs. And libtester just always passes in a return false; for example, I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that seems a bit too far.

* "eosio.any" -> "eosio"
* "eosio" -> "eosio"
*/
constexpr name prefix() const {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation copied from CDT

v = boost::any(wasm_interface::vm_oc_enable::oc_auto);
} else if (s == "all" || s == "true" || s == "on") {
v = boost::any(wasm_interface::vm_oc_enable::oc_all);
} else if (s == "none" || s == "false" || s == "off") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other previously valid choices were 1/0 and yes/no

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

…ime to match genesis used in other tests. This allows for setcode in slow ci/cd env.
@heifner heifner marked this pull request as ready for review June 22, 2023 13:24
@heifner heifner added the OCI Work exclusive to OCI team label Jun 22, 2023
libraries/chain/apply_context.cpp Outdated Show resolved Hide resolved
libraries/chain/controller.cpp Outdated Show resolved Hide resolved
libraries/chain/include/eosio/chain/name.hpp Show resolved Hide resolved
unittests/misc_tests.cpp Show resolved Hide resolved
throw validation_error(validation_error::invalid_option_value);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have fewer options in the future, we could add a deprecation note that only auto, all, and none will be allowed, all others could be deprecated in future releases.

std::unordered_map<std::thread::id, std::unique_ptr<wasm_interface>> threaded_wasmifs; // one for each read-only thread, used by eos-vm and eos-vm-jit
};

} // eosio::chain
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this refactoring. It hides those details from the controller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests for auto, all, and none options to make sure OC is enabled/disabled as configured?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas on how? Unfortunately, oc is not setup to be used in unittests because the destructor of tester -> controller destroys oc and it can't be restarted because it forks the process on start. Seems like it would require a major refactor to allow oc to be used repeatably in unittests.

We could add some logging and look for a particular log in some integration tests. Let me see what I can do for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's surprising to hear -- it was definitely originally designed so that multiple controllers using OC can exist simultaneously: we have unittests that do that.

But, that was for the non-tier-up usage of OC. So maybe there was a shortcoming of the original impl in this area. Or maybe we've regressed some how.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration test added.

@heifner heifner linked an issue Jun 22, 2023 that may be closed by this pull request
…me to be found in logs on slow ci/cd machine
EOS_ASSERT(!is_on_main_thread(), misc_exception, "init_thread_local_data called on the main thread");
if (is_eos_vm_oc_enabled()) {
// EOSVMOC needs further initialization of its thread local data
wasmif.init_thread_local_data();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lost an #ifdef EOSIO_EOS_VM_OC_RUNTIME_ENABLED on these couple lines during the refactor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if most of the #ifdef EOSIO_EOS_VM_OC_RUNTIME_ENABLED could be removed to simplify differences. For now, I'll just revert back to having them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored #ifdefs

@@ -158,6 +162,10 @@ namespace eosio { namespace chain {
return it->module;
}

bool should_always_oc_tierup()const {
return wasm_runtime_time == wasm_interface::vm_type::eos_vm_oc || eosvmoc_tierup == wasm_interface::vm_oc_enable::oc_all;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think wasm_runtime_time == wasm_interface::vm_type::eos_vm_oc should be in here. wasm_runtime_time == wasm_interface::vm_type::eos_vm_oc is meant to be "OC in non-tier up mode" and all the code back here

if(my->eosvmoc && (my->should_always_oc_tierup() || context.should_use_eos_vm_oc())) {

should be skipped entirely in non-tier up mode. It's not clear what the ramifications are for having both a synchronous and asynchronous code_cache at the same time (which is what I think might be happening in this case.. not sure.. regardless it def wasn't originally intended to operate this way)

I think the line above should be something more on the order of

if(my->eosvmoc &&
  (my->eosvmoc_tierup == wasm_interface::vm_oc_enable::oc_all || context.should_use_eos_vm_oc())) { 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I missed that eos_vm_oc didn't flow throw here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a non issue from a functional standpoint -- I remember now that my->eosvmoc isn't created unless in oc_auto or oc_all mode,

if(eosvmoc_tierup != wasm_interface::vm_oc_enable::oc_none) {
EOS_ASSERT(vm != wasm_interface::vm_type::eos_vm_oc, wasm_exception, "You can't use EOS VM OC as the base runtime when tier up is activated");
eosvmoc.emplace(data_dir, eosvmoc_config, d);
}

so nothing wonky would have happened since my->eosvmoc && would have always failed out. But yeah def better now with the tweak.

@@ -405,6 +405,8 @@ namespace eosio { namespace testing {
cfg.eosvmoc_config.cache_size = 1024*1024*8;

for(int i = 0; i < boost::unit_test::framework::master_test_suite().argc; ++i) {
// don't use auto tier up for tests, since the point is to test diff vms
cfg.eosvmoc_tierup = chain::wasm_interface::vm_oc_enable::oc_none;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is inside the for() loop instead of just with the cfg. items a few lines above.

v = boost::any(wasm_interface::vm_oc_enable::oc_auto);
} else if (s == "all" || s == "true" || s == "on" || s == "yes" || s == "1") {
v = boost::any(wasm_interface::vm_oc_enable::oc_all);
} else if (s == "none" || s == "false" || s == "off" || s == "no" || s == "0") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, it appears boost was doing a case insensitive compare; so False, oN, etc was also accepted 😈 depending on how accommodating you want to be...

("eos-vm-oc-enable", bpo::bool_switch(), "Enable EOS VM OC tier-up runtime")
("eos-vm-oc-enable", bpo::value<chain::wasm_interface::vm_oc_enable>()->default_value(chain::wasm_interface::vm_oc_enable::oc_auto),
"Enable EOS VM OC tier-up runtime ('auto', 'all', 'none').\n"
"'auto' - EOS VM OC tier-up is enabled for eosio.* accounts and read-only trxs.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Lin already touched on it some, but I do feel like we ought to try and find a short concise way to sneak in that it's also enabled for applying blocks unless a BP. That aspect is completely missing in this description even without the extra "non BP rule".

Maybe something like

'auto' - EOS VM OC tier-up is enabled for eosio.* accounts, read-only trxs, and applying blocks (excluding on producers).

We already have plenty more wordy config items in read-mode, peer-log-format, etc.. so I feel like we can spend 5 words.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops somehow I was on an older revision when I made that comment. I see now that we did add commentary around applying blocks. Still not sure penny pinching 3 words and this avoiding documenting the producer exclusion is worth it... but what we have in the latest commit is fine to me since at least it includes block application

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of:
'auto' - EOS VM OC tier-up is enabled for eosio.* accounts, read-only trxs, and except on producers applying blocks.. Putting it as a paratheses at the end made it seem like producer excluded all cases.

@@ -0,0 +1,2 @@
#define BOOST_TEST_MODULE state_history_plugin
#include <boost/test/included/unit_test.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're using the header-only variant now, you should remove linking to the library from here

target_link_libraries(test_state_history state_history_plugin eosio_testing eosio_chain_wrap Boost::unit_test_framework)

@heifner heifner requested a review from spoonincode June 28, 2023 16:49
Copy link
Member

@spoonincode spoonincode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@@ -105,6 +105,8 @@ namespace eosio { namespace chain {
once_is_enough = true;
}
if(cd) {
if (!context.is_applying_block())
tlog("${a} speculatively executing ${h} with eos vm oc", ("a", context.get_receiver())("h", code_hash));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this meant to be left in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tlog is trace log, but no, that was for testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes, it is needed for the read_only_trx_test.py test. I'll add a comment about that.

@@ -306,6 +306,12 @@ code_cache_base::code_cache_base(const std::filesystem::path data_dir, const eos
}

void code_cache_base::set_on_disk_region_dirty(bool dirty) {
// tests can remove directory before destructor is called
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels pretty weird to me.. it makes it sound like there are cases where fc::temp_dir is dtor'ed before the controller is dtor'ed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll fix the tests and remove this.

@heifner heifner merged commit 91d192b into main Jun 29, 2023
21 checks passed
@heifner heifner deleted the GH-1251-auto-oc branch June 29, 2023 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCI Work exclusive to OCI team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EOS VM OC: Implement new Auto OC Mode
3 participants