[feature] Add bulk_operations invalidation limit #31
Conversation
Pull Request Branch: donaldong/feature_add_bulk_operat97f3f74 Pull Request Link: #31
Force-pushed from a8598a2 to 5525c7a
update
Force-pushed from 5525c7a to df61e80
Codecov Report

@@                  Coverage Diff                   @@
##   base/donaldong/feature_add_bulk_o76d8ee5    #31      +/-   ##
==================================================================
- Coverage   97.05%   96.97%   -0.09%
==================================================================
  Files          32       32
  Lines        1901     1916      +15
==================================================================
+ Hits         1845     1858      +13
- Misses         56       58       +2

Continue to review full report at Codecov.
Force-pushed from 2b8173f to d1f5d54
add a test case
Force-pushed from df61e80 to 7772396
# Accumulate one OR clause per record's lookup conditions
or_chain = or_chain.or(model_class.where(conditions))

# Count the affected rows without memoization, then compare against the limit
record_count = RedisMemo.without_memo { or_chain.count }
if record_count > bulk_operations_invalidation_limit
I wonder if it'd be better to just look at the total # of records being imported, to avoid this extra query / extra computation to iterate through all the records.
> extra computation to iterate through all the records.

Well, this will send a `SELECT count(*)` query. By sending this query we can avoid the expensive payload transfer and ActiveRecord object instantiation.

> I wonder if it'd be better to just look at the total # of records being imported

I don't think this would do. For example, if we use `on_duplicate_key_update: [:visibility]` and we are also memoizing `visibility`, we could still get a lot of records back from the database.
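To illustrate the trade-off (a minimal sketch; the `Site` model and the `visibility` values are hypothetical, while `RedisMemo.without_memo` is the call from the diff above):

```ruby
# A sketch: accumulate one OR clause per set of updated column values.
or_chain = Site.where(visibility: 1).or(Site.where(visibility: 2))

# Sends a single `SELECT COUNT(*)` -- no row payload is transferred and
# no ActiveRecord objects are instantiated.
record_count = RedisMemo.without_memo { or_chain.count }

# By contrast, materializing the relation would transfer every matching
# row and instantiate a model object for each one.
# records = or_chain.to_a
```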
Actually, now that I think about it, we could query by `uniq_by` instead of those columns to update! That should be more efficient, and we wouldn't need to worry about the size limit since the `uniq_by` set is smaller than or equal to the record set we're importing.
nvm -- we cannot really query by `uniq_by`, because the `uniq_by` value might or might not exist -- it could be something filled in by the database. So yeah, I think querying by the `columns_to_update` is still the best we can do.
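A concrete illustration of that failure mode (a sketch assuming the unique key is a database-assigned primary key; `Site` is hypothetical):

```ruby
# The unique key here is the primary key, which the database assigns on
# insert -- a brand-new record doesn't have it yet:
record = Site.new(visibility: 1)
record.id  # => nil until the INSERT runs

# Building invalidation conditions from unique_by would then produce
# `WHERE id IS NULL`, which doesn't identify the imported rows:
conditions = { id: record.id }  # => { id: nil }
```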
> the best we can do

Not really the best we can do, but I think it's good enough for now. Let's save more optimizations for later.
conditions = {}
unique_by.each do |column|
  # Look up each imported record by its unique_by column values
  conditions[column] = record.send(column)

# Only select the updated columns that are actually memoized
columns_to_select = columns_to_update & RedisMemo::MemoizeQuery
Can we add unit tests / modify existing ones to check the logic we originally missed here? E.g., if we're importing `Site.import(records, on_duplicate_key_update: [:a, :b])` but we only memoize the column `:a`, we shouldn't be querying / invalidating `:b`.
Good idea! Will add this test case.
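A sketch of what that spec might look like (RSpec; the `Site` model, its columns, and the `memo_version_for` helper are illustrative assumptions, not the project's actual test setup):

```ruby
describe 'bulk import invalidation' do
  it 'does not invalidate columns that are not memoized' do
    # Assume only :a is memoized on Site; :b is not.
    records = [Site.new(a: 1, b: 1), Site.new(a: 2, b: 2)]

    # Even though :b is listed in on_duplicate_key_update, invalidation
    # should be driven by the memoized column :a alone.
    expect {
      Site.import(records, on_duplicate_key_update: [:a, :b])
    }.to_not change { memo_version_for(Site, b: 1) } # hypothetical helper
  end
end
```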
modify test case
Force-pushed from 7772396 to 7687c3a
Force-pushed from 7687c3a to 2c80ca1
### Features
- Support Rails 6 versions
- Support memoizing a method conditionally (#28)
- Add Redis connection pool (#29)
- Add bulk_operations invalidation limit (#31)
- Add an env var to skip memoize_table_column (#30)

### Bug Fixes
- Avoid fetching too many records for bulk_operations invalidation (#31)
Summary
This adds a limit to the number of records we would select from the database. Previously I thought we were selecting by the `uniq_by` columns -- I was confused and wrong. We're actually selecting by the `columns_to_update`, so the number of records can be very large.
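For intuition (a sketch with a hypothetical model and numbers): the `uniq_by` values identify at most the rows being imported, while a `columns_to_update` value can be shared by a large fraction of the table.

```ruby
# Import 10 records that all set visibility to 1 ...
records = 10.times.map { |i| Site.new(name: "site-#{i}", visibility: 1) }
Site.import(records, on_duplicate_key_update: [:visibility])

# ... yet selecting the affected rows by the updated column can match far
# more than the 10 imported records -- hence the new invalidation limit.
Site.where(visibility: 1).count  # could be millions of pre-existing rows
```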
This PR:
Test Plan