Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-21109 Add test & consolidate code controlling cache clearing, specific improvement on cli imports #10943

Merged
merged 4 commits into from Sep 29, 2017

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Sep 5, 2017

Overview

Add test, & consolidate code controlling cache clearance via the api. An alternate approach to #10905 by @MegaphoneJon. This also achieves Jon's original goal of less frequent cache clearing during cli imports

Before

CLI imports clear group cache once per line. No test

After

CLI imports clear group cache once. Test exists.

This also has the impact ok making cache clearing more consistent & overall slightly less as the
$config->doNotReset is respected by acl cache & group membership changes now too

Technical Details

CRM-21109 references a specific area of concern to reduce clearing of caches when running a CLI import & @MegaphoneJon has created #10905 to suggest that we set $config->doNotReset when looking into this.

My initial feeling was that we should allow 'cache instructions' to be passed in via api calls - ie. if you are doing a long running process then it should be possible to quash cache-clearing somewhat during this. As it turns out looking at the code it may also make sense to request more aggressive cache clearing in some (as yet undreamt) circumstances.

Comments

Looking at the code I have a number of notes - mostly due to refreshing my memory.

  • There are 3 caches that are potentially cleared when editing a contact:

    • group_contact_cache
    • acl_contact_cache (& acl_cache which is less important)
    • prevnext_cache
  • The prevnext_cache never seems to be a performance concern

- The group contact cache goes through a number of checks before clearing

  1. is $config->doNotReset set (this is not centralised so there may be bypasses). Import uses this.
  2. is the civicrm setting 'smart_group_cache_refresh_mode' set to 'opportunistic' (the default).
  3. it then checks a php static variable that makes sure the cache is not cleared more than once in a process (I wasn't clear about this - in the case of an import it means that if they are cleared at the start they probably won't be at the beginning
  4. it then checks for a mysql lock from another process. This is something that is not 100% reliable at the moment but could be made reliable for 5.7.5+ / MariaDB 10.0.2 + fairly trivially. The lock is released once the cache is cleared
    -5) after all this it will clear the cache, removing entries based on the timestamp of the group & the smartGroupCacheTimeout . There is an opportunity for optimisation in this query.

Based on the above I suspect it is more likely that if there is a WMF problem with the cache clearing it is the query running slow rather than it being too frequent

Opportunistic

To alter that setting & not clear the smart group in-process it is necessary to set the setting- potentially in civicrm.settings.php with the following:

global $civicrm_setting;
$civicrm_setting['CiviCRM Preferences']['smart_group_cache_refresh_mode'] = 'deterministic'; 

It is also necessary to set up a cron to clear the caches outside of processes (calling api job.group_cache_flush). If the smart group cache time out is set to 0 TRUNCATE will be used in the cron (rather then DELETE FROM). This is faster for many sites, but not WMF

optimisation

Acl_contact_cache

  • the only check done before clearing this is the doNotClearCache setting

This problem

Most of the above is by way of background. Specifically to this PR is the question

Should we allow the api to pass in parameters regarding how aggressively caches should be cleared, and if so what format. ie. do we want granular control over the 3 caches or simply a boolean whether to clear or not.

I realise there was some push back on cache management via the api in discussions about v4 api - but my sense is that was more about whether the api should do caching, rather than whether it should be able to tune-down cache clearing.

@totten @colemanw @xurizaemon @MegaphoneJon @ejegg ping


@xurizaemon
Copy link
Member

I think it's backwards that we write cache entries on contact.create rather than on read operations, and that if we reversed that we wouldn't need to tackle this sort of thing. But that's a bigger story than this.

For the above, I'd feel more comfortable if it said API calls could defer cache clearing until after a set of operations was completed - rather than just switch it off, which feels more likely to land in an inconsistent state. If the API has any command over cache at all (🤢) then perhaps it could do so in a way that has less side-effects, eg "pause cache updates; write 100 contacts; unpause".

@eileenmcnaughton
Copy link
Contributor Author

eileenmcnaughton commented Sep 5, 2017

OK based on further discussion with @xurizaemon here is my proposal

  1. Introduce a new setting
    ['cache_refresh_mode'] = 'deterministic|opportunistic';
    This would mean all 3 mentioned caches. If we want to later add a more nuanced option for the other 2 we can but the existing name is not generic enough for all 3

  2. Add a new function (maybe an api - e.g civicrm_api3('System', 'cache', array('on_hold' =>1 );
    which would set the setting above using a global - meaning it would only last for the session

  3. consolidate the handling of all three caches to respect the setting

  4. scripts using the above would need to self-clear the caches at the end -ie
    civicrm_api3('System', 'flush', array( );

@eileenmcnaughton
Copy link
Contributor Author

If we don't think an api for turning the cache status on & off is OK we could just have it as a function - but then we are saying 'hey external script people, call this, but it's not really supported'

@eileenmcnaughton
Copy link
Contributor Author

@MegaphoneJon I'd like to propose this as an alternate fix to your PR. It does SOME of what I have suggested above - specifically

  1. wrapping the doNotResetCache into a new function
    CRM_Core_Config::isPermitCacheFlushMode()
  • which allows for it to be changed in a follow up
  1. wrapping checks for doNotResetCache into
    CRM_Core_Config::isPermitCacheFlushMode()
  2. I moved the checks for isPermitCacheFlushMode into clearContactCaches - this has the impact of causing doNotResetCache to apply to all 3 caches
  3. I have started calling clearContactCaches instead of separate calls to flush caches. This has the possibly positive effect of respecting the doNotResetCache option in those scenarios where it currently isn't (addToGroup/RemoveToGroup). Overall it standardises & my feeling is that there is no case for different levels of cache flushing for addToGroup/RemoveToGroup vs editing contacts.
  4. it adds those functions into the cli class.

I am inclined to add a setting 'cache_refresh_mode' & use that rather than doNotResetCache - but think it makes sense as a follow up. This PR as it sits is IMHO cleanup + fix rather than a real change

@totten
Copy link
Member

totten commented Sep 6, 2017

Random reactions -- no particular order:

  • Naming convention -- in https://docs.civicrm.org/dev/en/latest/api/options/ , there's only one other multi-word option (options.match-mandatory). It uses -, so then options.do-not-reset-cache would match better.
  • Positive vs Negative -- Personally, I find it more natural to use more affirmative wording (eg cache-reset => true, cache-reset => false) rather than negative (eg do-not-reset-cache => false, do-not-reset-cache => true).
  1. Introduce a new setting ['cache_refresh_mode'] = 'deterministic|opportunistic';
    This would mean all 3 mentioned caches. If we want to later add a more nuanced option for the other 2 we can but the existing name is not generic enough for all 3

  2. Add a new function (maybe an api - e.g civicrm_api3('System', 'cache', array('on_hold' =>1 );
    which would set the setting above using a global - meaning it would only last for the session

  3. consolidate the handling of all three caches to respect the setting

I like the idea of consolidating on a setting. The System.cache API call doesn't appeal as much, but let me suggest SettingsStack and SettingsStackTest. This idea came up while writing unit-tests -- basically, you can temporarily override any setting. The service should already be pretty workable for PHP-based consumers.

For non-PHP consumers... how would you map into APIv3? Perhaps options.settings? eg

civicrm_api3('Contact', 'create', array(
  'options' => array('settings' => array('cache_refresh_mode' => 'deterministic')),
  'first_name' => '...',
  ...,
));

I like this from a flexibility+consistency perspective. However, you'd need some kind of permissioning -- not every API consumer should be trusted with access to every setting. But where to do draw the line...

@eileenmcnaughton
Copy link
Contributor Author

eileenmcnaughton commented Sep 6, 2017

At this stage I have done

consolidate the handling of all three caches to respect the setting

and it seems there is enough agreement about

Introduce a new setting ['cache_refresh_mode'] = 'deterministic|opportunistic';
This would mean all 3 mentioned caches. If we want to later add a more nuanced option for the other 2 we can but the existing name is not generic enough for all 3

for me to do that.

I'm inclined to punt the api function for now. Note two existing related api functions exist

  • job.cleanup
  • job.group_cache_flush

Copy link
Contributor

@MegaphoneJon MegaphoneJon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eileenmcnaughton setPermitCacheFlushMode is being set TRUE when I think it should be FALSE and vice versa - EXCEPT in the new instance (class.cli.php), which looks right. However, with the docblock missing, it's hard to say. Could you please look those calls over and ensure they're correct?

/**
* Is the system permitted to flush caches at the moment.
*
* This can be turned off by the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two docblocks are incomplete.

@eileenmcnaughton
Copy link
Contributor Author

@MegaphoneJon you are right - I changed from double negative somewhere along the way to positive & didn't correct it properly. Jenkins also picked it up.

Hopefully fixed now - Jenkins is passing opinion as we speak

@colemanw
Copy link
Member

colemanw commented Sep 7, 2017

Womp womp, 5 failures :(

Copy link
Contributor

@MegaphoneJon MegaphoneJon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the changes @eileenmcnaughton made; the remaining issues are in the test suite. They should be fixed, but then I'd say this should merge-on-pass.

public function testCreateIndividualNoCacheClear() {

$contact = $this->callAPISuccess('contact', 'create', $this->_params);
$groupID = $this->groupCreate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't getting cleaned up in tearDown() which causes the next test to use this method to fail because the group already exists.

@eileenmcnaughton eileenmcnaughton changed the title CRM-21109 Add test for controlling cache clearing CRM-21109 Add test & consolidate code controlling cache clearing, specific improvement on cli imports Sep 8, 2017
@eileenmcnaughton
Copy link
Contributor Author

@colemanw @MegaphoneJon got there - last changes were to caching & tests because I made CRM_Contact_BAO_GroupContactCache::remove() protected & had to adjust tests to that.

If anyone has any enthusiasm to review a follow up about half of the remove() function can go now as it is no longer being called without $groupID being passed in anymore (yay)

@eileenmcnaughton
Copy link
Contributor Author

test this please

@@ -678,7 +588,8 @@ public static function load(&$group, $force = FALSE) {
AND civicrm_group_contact.group_id = $groupID ";

$groupIDs = array($groupID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can now be moved down a few lines (to above self::updateCacheTime($groupIDs, $processed);), which I think would make it easier to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MegaphoneJon thanks for taking a look!. I fixed the issue you identified & also tried to make the array vs int clearer in a few places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem! If you push that commit into this PR I'll take a look.


CRM_Core_DAO::executeQuery($query, $params);
if ($refresh) {
CRM_Core_DAO::executeQuery($refresh, $params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irrelevant since we're deleting it - but this query will always be overwritten by the $update query. Oops.

@@ -217,7 +217,7 @@ public static function loadAll($groupIDs = NULL, $limit = 0) {
*/
public static function add($groupID) {
// first delete the current cache
self::remove($groupID);
self::clearGroupContactCache($groupID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing the test failure, because $groupID is (incorrectly) an array here. remove() said it accepted an int as the first argument, but could also accept an array. To fix this, I think you need to iterate through the array at line 208 so that add() is only passed an int.

Almost all the code in the function was to support a path no longer used in core, or seamingly anywhere
outside CiviHR.
@eileenmcnaughton
Copy link
Contributor Author

unrelated fail - @MegaphoneJon fixes are squashed into that last patch

@eileenmcnaughton
Copy link
Contributor Author

test this please

@eileenmcnaughton
Copy link
Contributor Author

@monishdeb @JoeMurray we should get this merged before you do any further work on groups!

@eileenmcnaughton
Copy link
Contributor Author

(I think it has been fairly thoroughly reviewed by @MegaphoneJon but it seems stalled - not quite sure why)

@MegaphoneJon
Copy link
Contributor

My last issue was a "merge on pass" change, so it's fine with me if this gets merged. FWIW though - the rebase before approval makes it hard to review just the changes that were requested. I'm not sure if there's an optimal workflow here though.

@eileenmcnaughton
Copy link
Contributor Author

test this please

@eileenmcnaughton
Copy link
Contributor Author

test fail was unrelated

@eileenmcnaughton
Copy link
Contributor Author

@monishdeb @colemanw @totten @yashodha Jon has reviewed this & I altered the pr to incorporate Tim's suggestion - merger needed

@monishdeb
Copy link
Member

I did some thorough testing in my local and my feedback aligns with what others have commented above. Unit tests are covered, test build passes. Merging ..

@eileenmcnaughton
Copy link
Contributor Author

Thanks @monishdeb @MegaphoneJon - I think this is a big improvement in readability - turns out a lot of code wasn't doing much!

@monishdeb
Copy link
Member

monishdeb commented Sep 29, 2017

Similarly loadAll(...) needs treatment

@eileenmcnaughton
Copy link
Contributor Author

loadAll is barely ever called ... and that is too often

@monishdeb
Copy link
Member

monishdeb commented Sep 29, 2017

Hmm agree, called only on contact summary page >> on clicking smart group pane under 'Group' tab. This is a good improvement and hopefully will resolve the issues we faced during group-contact-cache-clear. Thanks @eileenmcnaughton for your work :)

@eileenmcnaughton
Copy link
Contributor Author

@monishdeb you can actually disable people's ability to expand that tab on sites with lots of smart groups - there is a setting - pretty sure @seamuslee001 uses it

eileenmcnaughton pushed a commit to eileenmcnaughton/civicrm-core that referenced this pull request Feb 1, 2018
…p cache rebuild

This patch is merged upstream & I am submitting it &
civicrm#10943

to our codebase.

Doing this
a) syncs us with upstream in the area we are working on
b) adds some code consolidation making it easier for us to tinker with the impact of
cache tables.

Note that this change switches from one sql call to many when clearing groups from
the group_contact_cache table. I'm thinking this is probably not a problem for us as we have
not that many groups (& hopefully we can further reduce them).

Also the apparent COMMIT is a bit of a a myth - CRM_Core_Transaction does weird complex
transaction nesting stuff...

Change-Id: I3d6505b04af1134884f660d71316afb2490221d1
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Feb 1, 2018
This is a combination of the commits in
civicrm#10943

Bringing this in does not make any actual change for us. However, it makes it easier for us
to play with the following options

1) setting cache mode to FALSE on a per process basis.
Currently there are statics to try to ensure that caches are flushed
only once per process, but without this the add to group & remove from
group functions bypass them. From a script
CRM_Core_Config::setPermitCacheFlushMode(FALSE);
would prevent cache flushing in the session for group_contact_cache & acl caches until the
script ends or it is retoggled. Also acl caches were not previously subject to the statics.
2)We would consider flushing caches by cron rather than during script runtime. There
is a setting for that & we could extend to cover acl cache. Flushing by cron
permits (optionally) the use of TRUNCATE - which appears to be faster for everyone in the world but us :-)

Change-Id: Ib790939421d3b7ce5d2e258154a8015b8652234f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants