
[feature] Add bulk_operations invalidation limit #31

Merged
merged 1 commit into from
Feb 11, 2021

Conversation

donaldong
Contributor

@donaldong donaldong commented Feb 11, 2021

Summary

This adds a limit to the number of records we would select from the database. Previously I thought we were selecting by the uniq_by columns -- I was confused and wrong. We're actually selecting by columns_to_update, so the number of records can be very large.

This PR:

  • Sets such a limit. If we would load too many records, simply invalidate everything
  • We only care about columns_to_update if they overlap with the columns we're memoizing
  • Adds monitoring
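The decision described by the bullets above can be sketched in plain Ruby. This is an illustrative sketch only -- the constant and method names below are hypothetical, not the gem's actual API:

```ruby
# Illustrative sketch of the invalidation strategy (names are hypothetical).
BULK_OPERATIONS_INVALIDATION_LIMIT = 10_000 # example value, not the gem's default

def invalidation_plan(record_count, columns_to_update, memoized_columns)
  # Only columns that are both being updated and memoized matter.
  relevant = columns_to_update & memoized_columns
  return :noop if relevant.empty?

  # If selecting the affected records would be too expensive,
  # fall back to invalidating everything for the table.
  if record_count > BULK_OPERATIONS_INVALIDATION_LIMIT
    :invalidate_all
  else
    :invalidate_records
  end
end

invalidation_plan(50, [:a, :b], [:a])  # => :invalidate_records
invalidation_plan(50_000, [:a], [:a])  # => :invalidate_all
invalidation_plan(50, [:b], [:a])      # => :noop
```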

Test Plan

  • CI

@donaldong donaldong changed the base branch from main to base/donaldong/feature_add_bulk_o76d8ee5 February 11, 2021 00:40
donaldong added a commit that referenced this pull request Feb 11, 2021
Pull Request Branch: donaldong/feature_add_bulk_operat97f3f74
Pull Request Link: #31
@donaldong donaldong force-pushed the donaldong/feature_add_bulk_operat97f3f74 branch from a8598a2 to 5525c7a Compare February 11, 2021 00:40
@donaldong
Contributor Author

update

@donaldong donaldong force-pushed the donaldong/feature_add_bulk_operat97f3f74 branch from 5525c7a to df61e80 Compare February 11, 2021 00:46
@codecov-io

codecov-io commented Feb 11, 2021

Codecov Report

Merging #31 (df61e80) into base/donaldong/feature_add_bulk_o76d8ee5 (2b8173f) will decrease coverage by 0.08%.
The diff coverage is 94.87%.

Impacted file tree graph

@@                             Coverage Diff                              @@
##           base/donaldong/feature_add_bulk_o76d8ee5      #31      +/-   ##
============================================================================
- Coverage                                     97.05%   96.97%   -0.09%     
============================================================================
  Files                                            32       32              
  Lines                                          1901     1916      +15     
============================================================================
+ Hits                                           1845     1858      +13     
- Misses                                           56       58       +2     
Impacted Files Coverage Δ
lib/redis_memo/memoize_query/invalidation.rb 93.82% <93.54%> (-1.77%) ⬇️
lib/redis_memo/memoize_query.rb 98.52% <100.00%> (+0.02%) ⬆️
lib/redis_memo/options.rb 85.71% <100.00%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b8173f...df61e80. Read the comment docs.

@donaldong donaldong force-pushed the base/donaldong/feature_add_bulk_o76d8ee5 branch from 2b8173f to d1f5d54 Compare February 11, 2021 01:12
@donaldong
Contributor Author

add a test case

@donaldong donaldong force-pushed the donaldong/feature_add_bulk_operat97f3f74 branch from df61e80 to 7772396 Compare February 11, 2021 01:12
or_chain = or_chain.or(model_class.where(conditions))

record_count = RedisMemo.without_memo { or_chain.count }
if record_count > bulk_operations_invalidation_limit
Collaborator

I wonder if it'd be better to just look at the total # of records being imported, to avoid this extra query / extra computation to iterate through all the records.

Contributor Author

extra computation to iterate through all the records.

Well this will send a SELECT count(*) query.

By sending this query we can avoid the expensive payload transfer and ActiveRecord record instantiation.

I wonder if it'd be better to just look at the total # of records being imported

I don't think this would work. For example, if we use on_duplicate_key_update: [:visibility], and we are also memoizing visibility, we could still get a lot of records back from the database.

Contributor Author

Actually, now that I think about it, we could query by uniq_by instead of the columns to update! That should be more efficient, and we wouldn't need to worry about the size limit, since the uniq_by set is smaller than or equal to the set of records we're importing.

Contributor Author

nvm -- we can't really query by uniq_by only, because the uniq_by value may or may not exist on the record -- it could be a value filled in by the database. So yeah, I think querying by the columns_to_update is still the best we can do.
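A tiny sketch of why uniq_by alone isn't reliable (hypothetical import payload; :id stands in for a database-assigned uniq_by column):

```ruby
# Hypothetical import payload: :id is the uniq_by column, but new rows
# don't have it yet -- the database assigns it during the import.
records = [
  { id: nil, visibility: "public"  }, # new row, id filled in by the DB
  { id: 42,  visibility: "private" }, # existing row
]

# Querying by uniq_by alone would silently miss the new row:
queryable = records.select { |r| r[:id] }
queryable.size # => 1, out of 2 imported records
```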

Contributor Author

the best we can do

not really the best we can do, but I think it's good enough for now. Let's save further optimizations for later

conditions = {}
unique_by.each do |column|
conditions[column] = record.send(column)
columns_to_select = columns_to_update & RedisMemo::MemoizeQuery
Collaborator

Can we add unit tests / modify existing ones to check the logic we originally missed here?

E.g. if we're importing Site.import(records, on_duplicate_key_update: [:a, :b]), but we only memoize the column :a, we shouldn't be querying / invalidating :b
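That check could look something like this plain-Ruby sketch (the helper name is hypothetical; the real test would exercise Site.import through the gem's memoization setup):

```ruby
# Hypothetical helper mirroring the overlap logic under test.
def columns_to_invalidate(columns_to_update, memoized_columns)
  columns_to_update & memoized_columns
end

# Importing with on_duplicate_key_update: [:a, :b] while only :a is
# memoized should not query or invalidate :b.
columns_to_invalidate([:a, :b], [:a]) # => [:a]
```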

Contributor Author

Good idea! Will add this test case.

@donaldong
Contributor Author

modify test case

@donaldong donaldong force-pushed the donaldong/feature_add_bulk_operat97f3f74 branch from 7772396 to 7687c3a Compare February 11, 2021 18:57
@donaldong donaldong changed the base branch from base/donaldong/feature_add_bulk_o76d8ee5 to main February 11, 2021 19:02
@donaldong donaldong force-pushed the donaldong/feature_add_bulk_operat97f3f74 branch from 7687c3a to 2c80ca1 Compare February 11, 2021 19:02
@donaldong donaldong merged commit 27da298 into main Feb 11, 2021
@donaldong donaldong deleted the donaldong/feature_add_bulk_operat97f3f74 branch February 11, 2021 19:06
@donaldong donaldong mentioned this pull request Feb 11, 2021
donaldong added a commit that referenced this pull request Feb 11, 2021
### Features
- Support Rails 6 versions
- Support memoizing a method conditionally (#28)
- Add Redis connection pool (#29)
- Add bulk_operations invalidation limit (#31)
- Add an env var to skip memoize_table_column (#30)

### Bug Fixes
- Avoid fetching too many records for bulk_operations invalidation (#31)