New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add timeout to index plays lock #912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the lock times out, shouldn't we try and cancel the db writes to the plays table?
Good callout, I passed the lock through and checked if it's owned before trying to write. I tested this by placing wait calls to ensure it wasn't owned to check that the bool was false. |
@jowlee so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. this makes a lot of sense
@dmanjunath you're correct -- we run every minute. On the whole, things should be fast enough, but perhaps during downtime or an outage or something like that, the lock never got released, so this would make us fall at most 10 minutes behind.
I'm going to open a separate PR to add a health check for this indexing
Going to merge this & deploy to staging so we can start testing with it! |
Trello Card Link
na
Description
The job to index plays on discovery provider occasionally gets stuck because the lock cannot be acquired.
This change adds a timeout of 10 minutes to the redis lock for plays so that it will only be stuck for 10 minutes at most.
NOTE: if the job were to be run twice, meaning it for some reason takes over 10 minutes to make the queries to update the plays table, there would be an inconsistency in the table for the number of plays.
Something interesting to note: I queried DP 1 for the message
Failed to acquire update_play_count_lock
and it seems to show up for about 4 min every 3 hrs. I think this is b/c DP restarts in kuberentes? There are logs for the amount of time the job takes, but there was a bug w/ how it was logged, so I can't get the avg, max time the indexing plays takes.Services
Discovery Provider
Does it touch a critical flow like Discovery indexing, Creator Node track upload, Creator Node gateway, or Creator Node file system?
Delete an option.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide repro instructions & any configuration.
Include log analysis if applicable.
I ran it locally and added time.sleep to the indexing job to force it to take a while to check that the lock was released.