Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-18011 improve lock handling for mysql 5.7.5+ #13854

Merged
merged 1 commit into from Apr 1, 2019

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Mar 18, 2019

Overview

Use mysql locks properly for sites on mysql or Maria DB versions that support them (mysql 5.7.5+ and MariaDB 10.0.2+)

https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_get-lock
https://mariadb.com/kb/en/library/get_lock/

Before

Multiple are not supported - only one lock is respected at a time - giving us limited ability to prevent resource contention

After

Multiple mysql locks are supported on mysql 5.7.5+ and MariaDB 10.0.2+

Currently if a process uses multiple locks - e.g to start running mailing jobs and in the process to lock a cache the lock on the cache will be ignored & contention will still occur. With this both locks are respected

Technical Details

I don't personally think this has risk - what it does is bypasses a hack that causes us to NOT ACTUALLY GET A LOCK if one has already been reserved & instead GET THE LOCK. This works on mysql versions that support it

UPDATE - this is currently opt in via a define -

define('CIVICRM_SUPPORT_MULTIPLE_LOCKS', TRUE);

Comments

5.7.5+ supports multiple mysql threads in a process, provided civicrm.settings.php
has the line

define('CIVICRM_SUPPORT_MULTIPLE_LOCKS', TRUE);

. We can be non-hacky once that is in play

@civibot
Copy link

civibot bot commented Mar 18, 2019

(Standard links)

@civibot civibot bot added the master label Mar 18, 2019
@eileenmcnaughton
Copy link
Contributor Author

@mattwire you should try adding this & see if it helps with your deadlocks - assuming recent civi version

@eileenmcnaughton
Copy link
Contributor Author

test this please

1 similar comment
@eileenmcnaughton
Copy link
Contributor Author

test this please

@mattwire
Copy link
Contributor

@eileenmcnaughton Thankyou I will test. Though I had a chat with @mfb on mattermost and we were wondering whether it might be better to rewrite (ie. copy) the civicrm lock class to do basically the same implemention as drupal (which uses a table to store the locks). Then we could have proper locking independent of database version.

@eileenmcnaughton
Copy link
Contributor Author

@mattwire yeah I saw that - I don't actually know that I see an advantage but if going that way the task is not to replace it but to make it swappable as something that implements LockInterface & follows the patterns that our cache interfaces to do to be swappable. I'm also not sure whether the alternate lock mechanism would be in core or not at the moment

@mattwire
Copy link
Contributor

I guess it depends on timescales for dropping support for older versions of mysql which is probably a long way off given about half the civi installs seem to be on 5.5 or equivalent (https://stats.civicrm.org/json/active-sites-server-mysql.json). The two sites that have had really major issues have both been on mysql 5.5 (though latest civi) where this patch wouldn't have helped but an alternative locking mechanism probably would have done.

@eileenmcnaughton
Copy link
Contributor Author

@mattwire true - but they have the option of upgrading & that's a business decision they can make.

@mfb
Copy link
Contributor

mfb commented Mar 19, 2019

Hrm It seems more like CiviCRM's "business decision" whether or not to fix bugs. Generally I lean towards fixing bugs as it helps out all the folks on older versions, but I could certainly see a case being made to drop support for older MySQL versions.

@eileenmcnaughton
Copy link
Contributor Author

Well CiviCRM CT & volunteers make decisions about which of the many things they could fix to put it's time into. In general I believe review of bug fixes & PRs in general is the main priority after fixing regressions but I'm increasingly doubting that we should be attempting to do that because it seems unachievable and I'd rather try to define the goals as something we can achieve

@eileenmcnaughton eileenmcnaughton added this to Main Review to-do-list in review board Mar 19, 2019
@mattwire
Copy link
Contributor

Couple of thoughts overnight:

  1. Just need someone to confirm they've tested this on a suitable system (the systems that had issues for me were on mysql 5.5 where it won't help).
  2. Is there a similar version for perconadb that we should be checking?
    3) I was going to say there should be a separate function to check the actual support but you've already done that in supportsMultipleLocks()

@eileenmcnaughton
Copy link
Contributor Author

Re percona - I think we can add that as it occurs

@mattwire
Copy link
Contributor

@eileenmcnaughton Can you just add a comment for now about percona in supportsMultipleLocks():
@fixme consider adding support for percona (https://www.percona.com/doc/percona-server/5.6/scalability/multiple_user_level_locks.html#multiple-user-level-locks)

@mattwire
Copy link
Contributor

@bhahumanists I think you are using this patch - are you able to confirm that nothing has broken?

Then, with the fixme comment about perconadb, we can get it merged.

5.7.5+ supports multiple mysql threads in a process. We can be non-hacky once that is in play
@eileenmcnaughton
Copy link
Contributor Author

@mattwire I updated the comment I also

  1. tested it in php storm - you can run tests in 2 tabs simultaneously & test that the same lock is not acquired but a different one it
  2. added an opt in define - @totten didn't merge this before because he wanted more robust production testing & suggested an opt in as a way to handle that.

@totten
Copy link
Member

totten commented Mar 29, 2019

Yeah, I think QA on this is difficult. (Double-whammy: environment-dependent issue; complicated runtime scenarios which are hard to simulate).

There is some test-coverage that can be relevant to lock behavior (e.g. the JobProcessMailingTest does multithreaded mail delivery with various permutations of config options; and IIRC the correctness of those options depends on lock behavior). We'd probably get a signal if the patch totally boinked the default lock behavior - but maybe there's some important scenarios not covered.

IMHO, given the QA challenges, the standard of care is to keep the old lock behavior while letting some folks phase-in more experimental mechanisms. After we have some reports from the bleeding-edge, then we can rejigger the default policy.

we were wondering whether it might be better to rewrite (ie. copy) the civicrm lock class to do basically the same implemention as drupal (which uses a table to store the locks). Then we could have proper locking independent of database version.

That's an excellent idea. Make a table. Make a new class. That was exactly the aim in setting up LockInterface. After creating the class, you can enable it via civicrm.settings.php:

define('CIVICRM_DATA_LOCK', 'myFactoryFunction');
define('CIVICRM_WORK_LOCK', 'myFactoryFunction');
define('CIVICRM_CACHE_LOCK', 'myFactoryFunction');
function myFactoryFunction($name) { return new My\Spiffy\Lock(...$name...); }

Is that better? I dunno. Three years ago (when I last poked into this), I started working on a SQL driver -- and then got side-tracked in trying to find the best way to do the SQL. (To select for update nowait or not to select for update nowait, that is the question.)

The reality is that there's a whole bunch of options, and I don't think any is best. It's an ambiguous question. There are just trade-offs (performance, correctness, difficulty, compatibility). (Witness: Symfony Lock: Reliability and the 30 million red flags) As long as we meet the standard of care, it's OK to merge any of the approaches.

If you're keen to find the best despite the ambiguity, then it's probably gonna take a couple hours work to find an answer, e.g.

  • Write a list or table with 5-10 paths (different backends; different styles of queries; different PHP libraries which we could wrap or port to) and the trade-offs (performance / compatibility / difficulty / refactoring risks / etc). Vet with a colleague to see if the list+trade offs seem right.
  • Make a small survey (eg a google form) to rate each option
  • Send to prod mtce team
  • Pick whichever rates highest. Use that as the starting point for work.

@totten
Copy link
Member

totten commented Mar 29, 2019

Oh, to be clear, I think this PR meets the standard of care. It may or may not be the best; but IMHO the bar is easy to meet because it's presented an experimental opt-in.

@eileenmcnaughton
Copy link
Contributor Author

@totten shall we merge this then since I think it takes us forward & we have languished here....

@seamuslee001
Copy link
Contributor

I think this has enough agreement around it that it can be merged, also the unit tests are passing and there are some there that suggest that if anything had broken it would show, also as this is opt in i think it is a safe patch to go in

@seamuslee001 seamuslee001 merged commit 9a642ff into civicrm:master Apr 1, 2019
review board automation moved this from Main Review to-do-list to done Apr 1, 2019
@eileenmcnaughton eileenmcnaughton deleted the lock branch April 1, 2019 20:30
@homotechsual
Copy link
Contributor

Docs issue on this PR could use some love: civicrm/civicrm-sysadmin-guide/issues/158

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants