Update aggregate user repopulate to use the past two weeks #2069
Conversation
most_recent_indexed_aggregate_block = 0
session.execute("TRUNCATE TABLE {}".format(table_name))
elif latest_indexed_block_num % REFRESH_COUNTER == 0:
I don't like this random behavior anymore. I'm going to change this to use a last-completed timestamp saved in redis, then update based on that.
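The redis-timestamp gate described in that comment could look roughly like the sketch below. The key name, interval, and function name are assumptions for illustration, not the PR's actual values:

```python
import time

# Hypothetical key and interval -- not the PR's actual values.
AGGREGATE_USER_LAST_REFRESH_KEY = "aggregate_user:last_refresh"
REFRESH_INTERVAL_SECONDS = 60 * 60 * 24  # roughly once a day


def should_refresh(redis_client, now=None):
    """Return True when enough time has passed since the last completed refresh."""
    now = now if now is not None else time.time()
    last = redis_client.get(AGGREGATE_USER_LAST_REFRESH_KEY)
    if last is None:
        # No recorded refresh yet: do a full refresh.
        return True
    return now - float(last) >= REFRESH_INTERVAL_SECONDS
```

Unlike the modulo check, this makes the refresh cadence explicit and independent of which block numbers happen to be indexed.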
One question about a potential bug, but otherwise looks good!
most_recent_indexed_aggregate_block = 0
session.execute("TRUNCATE TABLE {}".format(table_name))
redis.set(
We should move these redis.set calls after the session.execute; otherwise I think this could cause some inconsistency. For example, a condition-like variable could get set in redis but the worker gets restarted before completing, so the db isn't actually updated. We should move it next to the other redis.set after the execute.
Yep, moved it.
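The ordering agreed on in that exchange can be sketched as follows. The `session`, `redis_client`, and key name here are stand-ins for illustration, not the PR's actual code:

```python
import time

# Assumed key name for the checkpoint -- illustration only.
LAST_REFRESH_KEY = "aggregate_user:last_refresh"


def refresh_aggregate_user(session, redis_client, table_name="aggregate_user"):
    # 1. Do the database work first.
    session.execute("TRUNCATE TABLE {}".format(table_name))
    session.commit()
    # 2. Only then checkpoint in redis. If the worker dies before this line,
    #    the next run sees a stale timestamp and simply refreshes again --
    #    a harmless re-run instead of a silently skipped refresh.
    redis_client.set(LAST_REFRESH_KEY, time.time())
```

Writing the checkpoint last gives at-least-once semantics: the worst case after a crash is repeated work, never a redis state that claims work the database never completed.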
Description
Running a full refresh with a truncate blocks other queries on aggregate_user. Since several other objects depend on aggregate_user, we cannot simply drop and replace that table; we would have to drop and recreate those dependencies too.
This change reduces the refresh frequency to about once a day and refreshes only user data from the past two weeks, reducing both the frequency and the duration of potential blocking.
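A windowed refresh along those lines might be built like the sketch below. The query shape, table, and column names are assumptions, not the PR's actual schema:

```python
import datetime

# Assumed window size from the description: the past two weeks.
REFRESH_WINDOW = datetime.timedelta(weeks=2)


def windowed_refresh_params(now=None):
    """Compute the start of the refresh window for a partial refresh.

    Only aggregate_user rows for users active since this cutoff would be
    recomputed, instead of truncating and rebuilding the whole table.
    """
    now = now if now is not None else datetime.datetime.utcnow()
    window_start = now - REFRESH_WINDOW
    # Hypothetical parameterized query -- table/column names are placeholders.
    query = (
        "DELETE FROM aggregate_user WHERE user_id IN "
        "(SELECT user_id FROM user_activity WHERE created_at >= :window_start)"
    )
    return query, {"window_start": window_start}
```

Because only a two-week slice of rows is touched, any locking is both rarer (daily) and shorter than a full truncate-and-rebuild.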
Tests
Ran locally against a prod snapshot.
Verified updates in each state: refreshing the past two weeks, full recreation when the cache is empty, and regular updates.
How will this change be monitored?