Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch saving strategy that does not require unique indexes #15627

Conversation

Ladas
Copy link
Contributor

@Ladas Ladas commented Jul 21, 2017

Batch saving strategy that does not require unique indexes. Using batched SQL we are able to achieve huge saving time improvement.

@Ladas
Copy link
Contributor Author

Ladas commented Jul 21, 2017

@miq-bot assign @agrare
@miq-bot add_label enhancement

Add default_batch strategy allowing to do batched SQL while
not being concurrent safe.
Add batch_extra_attributes parameter
Rename default_batch to batch saver strategy
Extract unique_index_columns and on_conflict_update as these
are only 2 parts that differ between concurrent safe batch strategy
using unique indexes and basic batch strategy.
Add batch_extra_attributes to the list of all_attribute_keys, which
will allow us to save attributes that were not explicitely defined,
but were created by model side effect.
Add basic batch strategy, this strategy does not require unique
indexes. But it is not concurrent safe.
Do not duplicate deleted records
Always call Returning so we are able to ouptu :created_records
@Ladas Ladas force-pushed the batch_saving_strategy_that_does_not_require_unique_indexes branch from 94c93dd to f59cfcb Compare July 24, 2017 12:57
def initialize(model_class: nil, manager_ref: nil, association: nil, parent: nil, strategy: nil, saved: nil,
custom_save_block: nil, delete_method: nil, data_index: nil, data: nil, dependency_attributes: nil,
attributes_blacklist: nil, attributes_whitelist: nil, complete: nil, update_only: nil,
check_changed: nil, custom_manager_uuid: nil, custom_db_finder: nil, arel: nil, builder_params: {},
inventory_object_attributes: nil, unique_index_columns: nil, name: nil, saver_strategy: nil,
parent_inventory_collections: nil, manager_uuids: [], all_manager_uuids: nil, targeted_arel: nil,
targeted: nil, manager_ref_allowed_nil: nil, secondary_refs: {}, use_ar_object: nil,
custom_reconnect_block: nil)
custom_reconnect_block: nil, batch_extra_attributes: [])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there are a ridiculous number of parameters to this initialize method. Sandy Metz recommends no more than 4. Feels like this is a bad pattern that can be refactored in future PRs. /cc @agrare

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chessbyte indeed, we are at the point where this should be a final list, that should cover all the crazy corner cases we have. :-)

So next will be to refactor these, part will got to the Settings, rest should be divided into more objects we will pass here(also consuming settings). There are already several areas we can group, e..g Saver, DatabaseLoader, AttributesBuilder, RecordsMatcher, etc.

@cben
Copy link
Contributor

cben commented Jul 24, 2017 via email

Use primary_key for batch update matching, because it's simpler
and much faster for processing.
@miq-bot
Copy link
Member

miq-bot commented Jul 24, 2017

Checked commits Ladas/manageiq@72e9c38~...3127e3d with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
4 files checked, 2 offenses detected

app/models/manager_refresh/save_collection/saver/sql_helper.rb

@Ladas
Copy link
Contributor Author

Ladas commented Jul 24, 2017

@cben yes, #15627 (comment)

then the interface would look something like:

saver = ManagerRefresh::Saver.new(:strategy => batch, :check_changed => true, :delete_method => nil) 
InventoryCollection.new(:parent => ems, :association => vms, :manager_ref => [:ems_ref], :saver => saver)

# then extract another group ...

I've already did a quick walk-through with @agrare , it will get few more iterations to get the grouping and the naming right. I should have time now, since we are near feature complete solution. :-)

Also, I am still thinking about the right place to pass Settings, since that is tied to EMS, so maybe we will be passing ManagerRefresh::Settings.new(ems) and we will extract the right section there. Also I am thinking what everything should be in the settings and what belongs to Persistor. (so what needs to be dynamically changeable)

# @param batch_extra_attributes [Array] Array of symbols marking which extra attributes we want to store into the
# db. These extra attributes might be a product of :use_ar_object assignment and we need to specify them
# manually, if we want to use a batch saving strategy and we have models that populate attributes as a side
# effect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ladas do you have an example of needing this currently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for vm, when we change :raw_power_state, other attributes are set, so we need to have

:batch_extra_attributes => [:power_state, :state_changed_on, :previous_state],

hopefully this can be autodiscovered later, while passing the same specs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks

end

def on_conflict_update
false
Copy link
Member

@agrare agrare Jul 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we do on_conflict_update with this strategy?

Scratch that, needs unique indexes https://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT

Copy link
Member

@agrare agrare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks good to me, I think we all know the parameter list is out of control and needs refactoring and we have plans for doing so.

@agrare agrare merged commit 182b16b into ManageIQ:master Jul 25, 2017
@agrare agrare added this to the Sprint 66 Ending Aug 7, 2017 milestone Jul 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants