Add timeout to index plays lock #912

Merged
merged 2 commits into master from jowlee-timeout-plays on Oct 9, 2020

Conversation

Contributor

@jowlee commented Oct 9, 2020

Trello Card Link

N/A

Description

The job to index plays on the discovery provider occasionally gets stuck because the lock cannot be acquired.
This change adds a 10-minute timeout to the Redis lock for plays, so indexing will be stuck for 10 minutes at most.
NOTE: if the job were to run twice concurrently, meaning a run for some reason takes over 10 minutes to make the queries that update the plays table, the play counts in the table could become inconsistent.
Something interesting to note: I queried DP 1 for the message Failed to acquire update_play_count_lock and it seems to show up for about 4 minutes every 3 hours. I think this is because the DP restarts in Kubernetes? There are logs for how long the job takes, but there was a bug with how it was logged, so I can't get the average or max time that indexing plays takes.

[Screenshot: Screen Shot 2020-10-09 at 2 14 54 PM]
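For reference, a minimal sketch of the locking pattern described above, assuming the discovery provider's redis-py client; the lock name matches the log message quoted above, but the surrounding task structure and the `update_plays_table` helper are illustrative placeholders rather than the actual indexing code:

```python
from redis import Redis

redis_client = Redis()

def update_plays_table():
    # Hypothetical stand-in for the queries that update the plays table.
    pass

def index_plays():
    # timeout=600 puts a 10-minute TTL on the lock, so a hung or crashed run
    # can block subsequent runs for at most 10 minutes.
    update_lock = redis_client.lock("update_play_count_lock", timeout=600)
    have_lock = update_lock.acquire(blocking=False)
    if not have_lock:
        print("Failed to acquire update_play_count_lock")
        return
    try:
        update_plays_table()
    finally:
        # Only release if this run still owns the lock; after the TTL Redis
        # may have expired it (or another run may now hold it).
        if update_lock.owned():
            update_lock.release()
```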

Services

Discovery Provider

Does it touch a critical flow like Discovery indexing, Creator Node track upload, Creator Node gateway, or Creator Node file system?

  • ✅ Nope

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide repro instructions & any configuration.
Include log analysis if applicable.

I ran it locally and added a time.sleep to the indexing job to force it to take a while, then checked that the lock was released.
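Roughly, the local test described above could look like the following sketch (the short TTL and sleep here are purely for illustration; the real change uses a 10-minute timeout):

```python
import time
from redis import Redis

redis_client = Redis()

# Acquire the lock with a short TTL, then sleep past it to simulate a slow
# indexing run and confirm Redis released the lock on its own.
update_lock = redis_client.lock("update_play_count_lock", timeout=5)
assert update_lock.acquire(blocking=False)
time.sleep(10)
# Once the TTL has passed, the lock key is gone and a new run can acquire it.
assert redis_client.get("update_play_count_lock") is None
assert not update_lock.owned()
```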

Member

@raymondjacobson left a comment

If the lock times out, shouldn't we try and cancel the db writes to the plays table?

Contributor Author

@jowlee commented Oct 9, 2020

If the lock times out, shouldn't we try and cancel the db writes to the plays table?

Good callout, I passed the lock through and checked if it's owned before trying to write. I tested this by placing wait calls so the lock would expire and confirming that the ownership check returned false.
There's also an extend method on the lock to extend the timeout while writing, but I wasn't sure it was necessary, since the next job could still start the moment after the lock was checked for ownership but before the write happened.
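A rough sketch of the ownership check being described, assuming redis-py's `Lock.owned()`; the function names and the `session`/`play_counts` arguments are hypothetical, not the actual indexing code:

```python
def write_play_counts(session, play_counts):
    # Hypothetical stand-in for the db writes to the plays table.
    pass

def save_play_counts(session, update_lock, play_counts):
    # If the 10-minute TTL has already expired, another run may now hold the
    # lock, so skip the write rather than risk double-counting plays.
    if not update_lock.owned():
        return
    # redis-py also exposes update_lock.extend(additional_time) to push the
    # TTL out before a long write, per the comment above, though it doesn't
    # close the window between this ownership check and the write itself.
    write_play_counts(session, play_counts)
```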

Contributor

@dmanjunath commented Oct 9, 2020

@jowlee so blocking_timeout is how long it spends trying to acquire the lock, and timeout is the TTL on the lock itself. So before, we could end up in a situation where we never released the lock if we got into a bad state; now it expires after 10 minutes regardless of whether the task finishes or not? Also, this gets run once a minute, right?
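For reference, a small sketch of the distinction being drawn here, assuming redis-py's lock API (the blocking_timeout value is illustrative; only the 10-minute timeout comes from this change):

```python
from redis import Redis

redis_client = Redis()

lock = redis_client.lock(
    "update_play_count_lock",
    blocking_timeout=25,  # how long acquire() will wait trying to take the lock
    timeout=600,          # TTL on the lock key itself; Redis expires it after 10 minutes
)
```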

Member

@raymondjacobson left a comment

Nice, this makes a lot of sense.

@dmanjunath you're correct -- we run every minute. On the whole, things should be fast enough, but during downtime or an outage the lock may never have gotten released, so this change means we fall at most 10 minutes behind.

I'm going to open a separate PR to add a health check for this indexing.

@raymondjacobson
Member

Going to merge this & deploy to staging so we can start testing with it!

@raymondjacobson merged commit d8f5a52 into master on Oct 9, 2020
@raymondjacobson deleted the jowlee-timeout-plays branch on October 9, 2020 at 22:59