
Fix issue #125: sign block with old key if signing key updated in same block #1203

Merged
merged 19 commits into develop from 125-signing-key-check on Aug 31, 2018

@abitmore
Member

abitmore commented Jul 29, 2018

PR for #125.

This PR also contains a fix for the block size check and a fix for removing items from the fork DB.

Things to be done:

  • test case for loading an object_database from disk that contains signing keys not in the genesis
  • test case for witness_update_operation
  • test case for witness_create_operation
  • test case for the witness plugin update: I have done some manual tests; it takes too much effort to create automated test cases for this.

abitmore added some commits Jul 29, 2018

Test case to reproduce the #125 block-producing issue
When the signing key is updated in the same block, block production will fail.

@abitmore abitmore added this to New -Awaiting Core Team Evaluation in Project Backlog via automation Jul 29, 2018

@abitmore abitmore removed this from New -Awaiting Core Team Evaluation in Project Backlog Jul 29, 2018

@abitmore abitmore added this to In progress in Feature release (201810) via automation Jul 29, 2018

@abitmore

Member

abitmore commented Jul 29, 2018

Travis complains:

2795751ms th_a object_database.cpp:106 open ] Opening object database from /tmp/graphene-tmp/e16c-550d-aba8-ade6 ...
chain_test: /home/travis/build/bitshares/bitshares-core/libraries/db/include/graphene/db/simple_index.hpp:67: const graphene::db::object& graphene::db::simple_index::insert(graphene::db::object&&) [with T = graphene::chain::global_property_object]: Assertion `!_objects[instance]' failed.
Running 214 test cases...
unknown location(0): fatal error in "change_signing_key": signal: SIGABRT (application abort requested)
/home/travis/build/bitshares/bitshares-core/tests/common/database_fixture.cpp(271): last checkpoint

Related code:

// close the database, flush all data to disk
db.close();
// reopen database, all data should be unchanged
db.open(data_dir.path(), make_genesis, "TEST");

Did it fail to reopen the database file after it was closed? Any idea how to fix this?

@pmconrad

pmconrad commented Jul 31, 2018

The database is not meant to be reopened after being closed. See also the discussion in #436, #687, and #689.

@abitmore

Member

abitmore commented Jul 31, 2018

Destroyed the db object and created a new one; Travis CI no longer complains.
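A minimal sketch of that change in the test (illustrative only, not the exact fixture code), with each open/close cycle getting its own database object:

{
   graphene::chain::database db1;
   db1.open( data_dir.path(), make_genesis, "TEST" );
   // ... update the signing key, generate blocks, etc. ...
   db1.close();                                         // flush all data to disk
}                                                       // db1 is destroyed here
{
   graphene::chain::database db2;                       // brand-new object
   db2.open( data_dir.path(), make_genesis, "TEST" );   // reloads the data written above
   // ... verify that the reloaded state (e.g. the witness's signing key) is as expected ...
}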

abitmore added some commits Aug 18, 2018

Initialize the witness key cache at the correct time
Moved the `init_witness_key_cache()` call from `plugin_startup()` to `plugin_initialize()`.
private:
void _apply_block( const signed_block& next_block );
processed_transaction _apply_transaction( const signed_transaction& trx );
void _cancel_bids_and_revive_mpa( const asset_object& bitasset, const asset_bitasset_data_object& bad );
/// For tracking signing keys of specified witnesses, only update when applied a block or popped a block
flat_map< witness_id_type, optional<public_key_type> > _witness_key_cache;

@pmconrad

pmconrad Aug 19, 2018

IMO it is a bad idea to introduce block-dependent state in database that is not covered by object_db. This state must be maintained manually, which is fragile. I think the problem at hand can be solved in a simpler way.

@@ -250,7 +253,7 @@ block_production_condition::block_production_condition_enum witness_plugin::mayb
}
fc::time_point_sec scheduled_time = db.get_slot_time( slot );
- graphene::chain::public_key_type scheduled_key = scheduled_witness( db ).signing_key;
+ graphene::chain::public_key_type scheduled_key = *db.find_witness_key_from_cache( scheduled_witness ); // should be valid

@pmconrad

pmconrad Aug 19, 2018

I think instead of using the cache, witness_plugin could simply connect to the database::applied_block signal and use that to retrieve the latest keys of the witness(es) it is interested in.
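A rough sketch of that suggestion (member names such as _witnesses and _witness_key_cache are illustrative, not the final PR code):

void witness_plugin::plugin_initialize( const boost::program_options::variables_map& options )
{
   // ... existing option handling ...
   chain::database& d = database();
   d.applied_block.connect( [this, &d]( const chain::signed_block& )
   {
      // copy the latest signing keys of the tracked witnesses from chain state
      for( const auto& wid : _witnesses )
         _witness_key_cache[wid] = wid(d).signing_key;
   } );
}

This way the plugin's copy of the keys only ever reflects the state right after a block has been applied.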

@abitmore

abitmore Aug 19, 2018

Member

Essentially you mean to move the cache into the witness plugin. Because there is no signal when a block is popped, in order to deal with chain reorganization the cache should live in (extended) chain state. Then, to make it work, we would need either to replay the whole chain to enable the plugin, or to add code that refreshes the cache (copies data from the main chain state to the extended part) on startup. And in case there is a reorganization at the startup head block, we would need to make sure the cache doesn't get emptied.

So far I don't see how that is simpler.

@pmconrad

pmconrad Aug 19, 2018

The witness plugin is only interested in the current head block. Blocks are only popped when switching forks, which means there will be block pushes immediately after the pop, so the cache will automatically be updated in that case.
This is slightly different from the cache in the database itself - that one should always be up to date, even after popping blocks.

Overall, I think even if the resulting code is not significantly simpler, the containment in the witness plugin is a big plus.

@abitmore

abitmore Aug 19, 2018

Member

I agree that ideally it's better to put the logic in the witness plugin, because block generation is actually the witness plugin's business.

Blocks are only popped when switching forks, which means there will be block pushes immediately after the pop, which means the cache will automatically be updated in that case.

One scenario to consider: if we put the cache in the witness plugin and code the plugin to track applied operations only, then after a block is popped the cache may become inconsistent if the popped block contains a witness_update_operation; and when a new block is pushed that does not contain the same witness_update_operation, the cache would still be inconsistent. To solve this, we would need additional code to detect that a chain reorganization has occurred and refresh the cache accordingly, if not simply refresh the whole cache on every block. However, when refreshing the cache we need to fetch data from the main chain state, and since signalling is a multi-threaded thing, the main state could have changed by the time we fetch the data, which means the cache would not be 100% reliable.

By putting the cache in the database class we can guarantee its consistency. Perhaps there will still be edge cases, for example when the witness plugin tries to generate a block in the middle of the database pushing a block.

Another thing is that we don't have unit tests for the witness plugin, so adding more code there means more work and more risk.

@pmconrad

pmconrad Aug 20, 2018

The witness plugin typically needs to keep track of the keys of only one witness. Refreshing this after each pushed block should have minimal cost, and only for the node in question.

@abitmore

abitmore Aug 22, 2018

Member

I still don't like refreshing from the plugin by querying chain state directly, because the chain state may already have become dirty by the time we refresh. For example, consider this scenario:

  • there was a witness_update_operation in _pending_transactions
  • a block is pushed to the database without that operation
  • that operation is pushed to the database soon after the block finishes pushing
  • the witness plugin receives the "applied_block" signal
  • the witness plugin tries to refresh its local cache, but gets the updated key (dirty data).

That said, ideally we need to keep a clean state, or a subset of it, in database for plugins or other modules to use. This PR tries to do that; IMHO we can merge it now, then refactor it into a more generic module later when necessary.

@pmconrad

pmconrad Aug 23, 2018

I think the applied_block signal is processed synchronously, right after the block has been applied. There shouldn't be any txs pushed in between.
https://github.com/bitshares/bitshares-core/blob/2.0.180612/libraries/chain/db_block.cpp#L550

@abitmore

abitmore Aug 23, 2018

Member

Oh, right, I forgot that. The state should be clean when processing the applied_block event.

By the way, why do we need to sleep here and there in the test cases? IIRC chain_test will sometimes segfault without sleeping. E.g.:

fc::usleep(fc::milliseconds(200)); // sleep a while to execute callback in another thread

@pmconrad

pmconrad Aug 27, 2018

I think it's required in some places where API callbacks happen; these are asynchronous.
It shouldn't be necessary in market_rounding_tests, IMO.

if( !(skip & skip_witness_signature) )
   FC_ASSERT( witness_obj.signing_key == block_signing_private_key.get_public_key() );
{
   auto signing_key = find_witness_key_from_cache( witness_id );

@pmconrad

pmconrad Aug 19, 2018

Instead of using the cache, simply move the existing FC_ASSERT down a couple of lines, after _pending_tx_session.reset();

@abitmore

abitmore Aug 19, 2018

Member

It's cheaper to check here than after calling _pending_tx_session.reset();.

@pmconrad

pmconrad Aug 19, 2018

This will only hurt the node that is trying to produce without proper keys.

@abitmore

abitmore Aug 19, 2018

Member

Even for nodes with proper keys it's still slightly cheaper, because the cache would be much smaller than the object database.

@pmconrad

pmconrad Aug 27, 2018

For nodes with keys it may be cheaper, but the global cache is a penalty for all nodes.

@abitmore abitmore self-assigned this Aug 27, 2018

@abitmore

Member

abitmore commented Aug 29, 2018

Moved the signing key cache to witness plugin as recommended by @pmconrad.

Merge branch 'develop' into 125-signing-key-check
Resolved conflicts:
  libraries/chain/db_block.cpp

@abitmore abitmore removed their assignment Aug 29, 2018

abitmore added some commits Aug 29, 2018

Estimate block size with a 21-bit transactions.size
which means we assume it will take no more than 3 bytes after serialization. 21 bits allows at most 2^21-1 = 2,097,151 transactions in a block, which is practically safe, although theoretically the size is an unsigned_int that can be up to 56 bits.
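A hedged sketch of the estimation idea (names like pending_block_header, pending_transactions, and maximum_block_size are illustrative, not the exact PR code):

uint64_t block_size = fc::raw::pack_size( pending_block_header ) + 3; // reserve 3 bytes for the transactions.size varint
for( const processed_transaction& tx : pending_transactions )
{
   uint64_t tx_size = fc::raw::pack_size( tx );
   if( block_size + tx_size > maximum_block_size )
      continue;                       // this transaction would overflow the block, skip it
   block_size += tx_size;
   pending_block.transactions.push_back( tx );
}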
@pmconrad

Looks good, thanks! (And much simpler than before, IMO.)

The PR contains several unrelated changes. That's OK, IMO - no need to create separate PRs for minor code improvements.
We need to keep this balanced though; perhaps not quite as many unrelated things in the future.

{
// Note: if this check failed (which won't happen in normal situations),
// we would have temporarily broken the invariant that
// _pending_tx_session is the result of applying _pending_tx.

@pmconrad

pmconrad Aug 30, 2018

At this point _pending_tx_session has been cleared, so the comment makes no sense.

@abitmore

abitmore Aug 30, 2018

Member

Yes, _pending_tx_session has been cleared, but _pending_tx hasn't, so they're inconsistent. That's why I wrote the comment.

@pmconrad

pmconrad Aug 30, 2018

Thanks. Never thought of this as an invariant, but I guess it's ok.

@@ -532,7 +538,17 @@ void database::_apply_block( const signed_block& next_block )
uint32_t skip = get_node_properties().skip_flags;
_applied_ops.clear();
FC_ASSERT( (skip & skip_merkle_check) || next_block.transaction_merkle_root == next_block.calculate_merkle_root(), "", ("next_block.transaction_merkle_root",next_block.transaction_merkle_root)("calc",next_block.calculate_merkle_root())("next_block",next_block)("id",next_block.id()) );
if( !(skip & skip_block_size_check) )

@pmconrad

pmconrad Aug 30, 2018

Should at least soft-fork protect this.

// should fail to push if it's too large
if( fc::raw::pack_size(maybe_large_block) > gpo.parameters.maximum_block_size )
{
BOOST_CHECK_THROW( db.push_block(maybe_large_block), fc::exception );

@pmconrad

pmconrad Aug 30, 2018

The test should ensure that this is triggered a couple of times. Perhaps increment a counter each time and BOOST_CHECK_EQUAL the counter after the loop ends. (Not sure how often it should trigger...)

@abitmore

abitmore Aug 30, 2018

Member

Added a check to make sure it has been triggered at least once; I think that's enough. I didn't use BOOST_CHECK_EQUAL because it would be a pain when trying to tweak the loop in the future.
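A minimal sketch of the resulting pattern (the loop bound num_iterations and the helper build_candidate_block are hypothetical, not the actual test code):

uint32_t rejected = 0;
for( uint32_t i = 0; i < num_iterations; ++i )
{
   signed_block maybe_large_block = build_candidate_block( i );  // hypothetical helper that grows the block each round
   if( fc::raw::pack_size( maybe_large_block ) > gpo.parameters.maximum_block_size )
   {
      // should fail to push if it's too large
      BOOST_CHECK_THROW( db.push_block( maybe_large_block ), fc::exception );
      ++rejected;
   }
   else
      db.push_block( maybe_large_block );
}
BOOST_CHECK_GT( rejected, 0u );   // the size check must have been exercised at least once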

Feature release (201810) automation moved this from In progress to In Testing Aug 30, 2018

@pmconrad

pmconrad commented Aug 30, 2018

Ok, I won't insist on the soft fork.

@abitmore abitmore merged commit 8189b45 into develop Aug 31, 2018

3 checks passed

ci/dockercloud: Your tests passed in Docker Cloud
continuous-integration/travis-ci/pr: The Travis CI build passed
continuous-integration/travis-ci/push: The Travis CI build passed

Feature release (201810) automation moved this from In Testing to Done Aug 31, 2018

@abitmore abitmore deleted the 125-signing-key-check branch Aug 31, 2018
