Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch refactor #1122

Closed

Conversation

oxarbitrage
Copy link
Member

@oxarbitrage oxarbitrage commented Jul 6, 2018

The pull is work in progress for #1103

Plugin was refactored and cleaned making it easier to understand to others and easier to maintain, elasticsearch basic auth was added, new field operation_id_num for easy filtering was added, node will be stopped at any insert failure avoiding operation gap, etc.

Logs were removed as they take too much space and the proper way to test is take snapshots and compare with operations inserted.

By now, tests were made up to block 10 million getting the exact same results in elastic and in RAM. A test in current block number will be done, the bitshares-core team is in the progress to rent a machine for this as a full traditional node is needed.

More commits will be added into this pull request before any merge specially in the es_objects plugin side, not cleaned yet. Also new commits are expected to address all the issues in #1103

elog("Error sending data to database");
//return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either remove the commented-out return statement or add curly braces around these two lines

if(!my->update_account_histories(b))
{
edump(("Error populating ES database, exiting to avoid history gaps."));
throw;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to throw graphene::chain::plugin_exception if you want to re-apply the block.

}

void elasticsearch_plugin_impl::prepareBulk(const account_transaction_history_id_type& ath_id,
const std::string& bulk_line )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bulk_line is a member of the plugin_impl, no need to pass it in as a parameter

Copy link
Contributor

@pmconrad pmconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't test this because I don't have an ES cluster ready. Error handling looks good, except for the throw - see my comment. Since this is probably hard to add to the unit test suite, I'd suggest you run a replay, then shut down your ES while the node is still replaying, then start ES up again and see if the replay goes through.

Generally, I think the code still leaves a lot of room for optimization, but it looks much cleaner now. Good job.

My impression of the es_objects plugin is that it contains a lot of boilerplate code that can probably be written much more tersely using templates. It seems the map<object_id_type,*_struct> maps are always added to but never cleared - this doesn't look like a good idea IMO.

@@ -152,9 +127,65 @@ std::vector<std::string> createBulk(std::string index_name, std::string data, st

bulk.push_back("{ \"index\" : { \"_index\" : \""+index_name+"\", \"_type\" : \"data\" "+create_string+" } }");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better don't create JSON strings like this. Create a variant_object instead and let fc::json serialize it.

@pmconrad
Copy link
Contributor

You are copying a lot of stuff around, might be worthwhile to avoid this wherever possible.
Use move operator when you insert bulk_line into bulk_lines for example.

@abitmore abitmore mentioned this pull request Jul 27, 2018
@oxarbitrage
Copy link
Member Author

oxarbitrage commented Jul 27, 2018

Can't test this because I don't have an ES cluster ready. Error handling looks good, except for the throw - see my comment. Since this is probably hard to add to the unit test suite, I'd suggest you run a replay, then shut down your ES while the node is still replaying, then start ES up again and see if the replay goes through.

Great suggestion. Had been doing some tests with this, now changed to plugin_exception here: 0fe975c

The good thing about this is that when doing a replay and ES gets down the plugin will keep trying until it gets up again. Doing some testing in the first index created by the plugin(bitshares-2015-10) when doing all straight the index end up with 161,004 records. By stopping the ES service and restarting several times the index ends up with the exact same number of ops.

Only difference is it seems it send some duplicates but this gets ignored by the database as the indexing is done by operation ID.

Will keep working in the other suggestions, very appreciated.

std::string trx_id = "";
if(!b.transactions.empty() && oho->trx_in_block < b.transactions.size()) {
if(!b.transactions.empty() && oho->trx_in_block < b.transactions.size())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!b.transactions.empty() && is not needed.

const account_statistics_object& elasticsearch_plugin_impl::getStatsObject(const account_id_type account_id)
{
graphene::chain::database& db = database();
const auto &stats_obj = account_id(db).statistics(db);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase to latest develop branch, use database::get_account_stats_by_owner() for better performance.

fc::mutable_variant_object bulk_header;
bulk_header["_index"] = index_name;
bulk_header["_type"] = "data";
bulk_header["_id"] = fc::to_string(ath_id.space_id) + "." + fc::to_string(ath_id.type_id) + "." + ath_id.instance;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use something like fc::to_string(ath) or even directly use ath?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i don't like this line either. if i do fc::to_string(ath.id) it will add double " to the final string. if i do ath_id directly the compiler complains that cant convert it. looking for alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks. Current code is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants