Skip to content

Retry 20 times instead of 10#58

Merged
bbuchalter merged 1 commit into
masterfrom
even_more_retries
Oct 11, 2018
Merged

Retry 20 times instead of 10#58
bbuchalter merged 1 commit into
masterfrom
even_more_retries

Conversation

@bbuchalter

Copy link
Copy Markdown

No description provided.

@bbuchalter bbuchalter requested review from insom and sroysen October 11, 2018 19:38
@bbuchalter bbuchalter merged commit e08b5db into master Oct 11, 2018
@bbuchalter bbuchalter deleted the even_more_retries branch October 11, 2018 19:48
@shopify-shipit shopify-shipit Bot temporarily deployed to production October 11, 2018 20:23 Inactive
@sroysen

sroysen commented Oct 11, 2018

Copy link
Copy Markdown

Sorry I am late to the party.
Increasing this value to 20 is fine for me.

But I want to change something else, probably more radically: https://github.com/Shopify/lhm/blob/master/lib/lhm/invoker.rb#L18

I propose to change that value to a positive one: 10 seconds instead of -2.

With a -2 there (lock wait timeout for the lhm session two second below anybody else), it means that in the case of a queue of clients waiting to grab the lock, the lhm client is going to be the one giving up first.
Which is why we had so many issues completing this lhm/ptosc on collects (and it will likely happen again if we had to run another one on that table): https://github.com/Shopify/datastores/issues/2803

We first decided to add this delta after an incident in 2014, where we were still using the default lock wait timeout for the version of mysql we were using at that time (5.5): One year.

Looking at that from another perspective, between the LHM client getting the lock, instead of regular web/job client, I'd rather have the LHM succeeding: A single web or job worker timing out is no big deal.
It's not like if we are risking all shopify to lock down by increasing this value: That was the case in 2013/2014, where clients waited to acquire the lock forever, not like now, for ten seconds.

bbuchalter pushed a commit that referenced this pull request Oct 11, 2018
Per #58 (comment)

With a -2 there (lock wait timeout for the lhm session two second below anybody else), it means that in the case of a queue of clients waiting to grab the lock, the lhm client is going to be the one giving up first.
Which is why we had so many issues completing this lhm/ptosc on collects (and it will likely happen again if we had to run another one on that table): Shopify/datastores#2803

We first decided to add this delta after an incident in 2014, where we were still using the default lock wait timeout for the version of mysql we were using at that time (5.5): One year.

Looking at that from another perspective, between the LHM client getting the lock, instead of regular web/job client, I'd rather have the LHM succeeding: A single web or job worker timing out is no big deal.
It's not like if we are risking all shopify to lock down by increasing this value: That was the case in 2013/2014, where clients waited to acquire the lock forever, not like now, for ten seconds.
bbuchalter pushed a commit that referenced this pull request Oct 11, 2018
Per #58 (comment)

With a -2 there (lock wait timeout for the lhm session two second below anybody else), it means that in the case of a queue of clients waiting to grab the lock, the lhm client is going to be the one giving up first.
Which is why we had so many issues completing this lhm/ptosc on collects (and it will likely happen again if we had to run another one on that table): Shopify/datastores#2803

We first decided to add this delta after an incident in 2014, where we were still using the default lock wait timeout for the version of mysql we were using at that time (5.5): One year.

Looking at that from another perspective, between the LHM client getting the lock, instead of regular web/job client, I'd rather have the LHM succeeding: A single web or job worker timing out is no big deal.
It's not like if we are risking all shopify to lock down by increasing this value: That was the case in 2013/2014, where clients waited to acquire the lock forever, not like now, for ten seconds.
bbuchalter pushed a commit that referenced this pull request Oct 11, 2018
Per #58 (comment)

With a -2 there (lock wait timeout for the lhm session two second below anybody else), it means that in the case of a queue of clients waiting to grab the lock, the lhm client is going to be the one giving up first.
Which is why we had so many issues completing this lhm/ptosc on collects (and it will likely happen again if we had to run another one on that table): Shopify/datastores#2803

We first decided to add this delta after an incident in 2014, where we were still using the default lock wait timeout for the version of mysql we were using at that time (5.5): One year.

Looking at that from another perspective, between the LHM client getting the lock, instead of regular web/job client, I'd rather have the LHM succeeding: A single web or job worker timing out is no big deal.
It's not like if we are risking all shopify to lock down by increasing this value: That was the case in 2013/2014, where clients waited to acquire the lock forever, not like now, for ten seconds.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants