Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRM-21260: CiviMail compose UI very slow to initialize, CRM-21316: Refactor getRecipients fn #11142

Merged
merged 3 commits into from
Jan 18, 2018

Conversation

monishdeb
Copy link
Member

@monishdeb monishdeb commented Oct 16, 2017

Overview

This PR is intended to reduce slowness of CiviMail screen during initialization and drafting, by addressing various issues:

Before

  1. AngularJs use Mailing.create API to fetch recipients in an inefficient way, which are:
    a) Create unnecessary mailing Job later used by getRecipients(...) to fetch recipients
    b) Cause rollback on DB changes made after creating dummy mailing, it's jobs and mailing_recipients records. See here
    c) getRecipients(...) is an expensive function as in underlying code it executes some queries needlessly and also dependent on mailing job (only used in temp table name) which is unnecessary.
  2. Reduce AJAX calls while CiviMail initialization and drafting. Some of the REST API are triggered needlessly and causes an issue like:
    REST API call for Mailing.getlist({"params":{"id":{"IN":[]}},"extra":["is_hidden"]})
    Response: {"is_error":1,"error_message":"invalid criteria for IN"}

After

  1. getMailRecipients(...) is created after simplifying the code of self::getRecipients(..) and these are the changes made:
    a. Takes saved mailing object as the parameter later used mostly to build filters.
    b. This function ONLY fetch and store mailing recipients, doesn't handle SMS mode
    c. Use fewer JOINs in underlying queries.
    d. Simplifies code to fetch contacts from selected mailing smart group(s).
    e. Doesn't execute queries needlessly, e.g.
    i) if SG (smart groups) are selected then ONLY queries responsible to fetch contacts from SG are executed
    ii) If no 'Include' groups are selected then simply return as this also means no groups are selected to fetch recipients.
    f. Refreshes contact cache of all smart groups, at once.
    g. Kept both functions for now to reduce any unintended regressions
  2. This function is later called via mailing.create API using a special parameter
    {get_recipients: TRUE} and removed force_rollback and removed dummy mailing, job create
  3. Reduce AJAX calls (In progress.. for now fixed the error mentioned in 2nd issue)

Technical Details

Marked with [WIP] as optimisation needed and reduce more AJAX calls to improve performance.


@monishdeb monishdeb added the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Oct 16, 2017
@monishdeb monishdeb changed the title CRM-21260: CiviMail compose UI very slow to initialize, CRM-21316: Si… CRM-21260: CiviMail compose UI very slow to initialize, CRM-21316: Simplify getRecipients fn Oct 16, 2017
@monishdeb monishdeb force-pushed the CRM-21260 branch 4 times, most recently from ed4c6f8 to 0033ace Compare October 16, 2017 07:38
* @param bool $storeRecipients
* @param bool $dedupeEmail
* @param null $mode
* This function is created after simplifying the code of self::getRecipients(..) and these are the changes made
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would only put permanent documentation in the PR, and put descriptions of changes / improvements that are relevant while reviewing into comments on the PR. That way, the repo only has comments that continue to be relevant. So, for example, the PR should contain:
This function provides the recipients of a CiviMail mailing.
It provides a simplified implementation of an earlier function, now removed, self::getRecipients(..).
NB: The Smart Group contact cache is refreshed immediately by this call for all included and excluded smart groups in the mailing.

In comments on the PR, put in 1 - 5b.

Now, a question: do we really want this code to override the smart cache timeout on groups and force their refresh? Wouldn't it be better to rely on the cache than to force it to be rebuilt? We could provide an additional button for those rare cases where someone is concerned about forcing a rebuild, and willing to accept the possibly long rebuild time. Recall that this can take 5 - 10 minutes for some large complex hierarchical smart groups.

Copy link
Member Author

@monishdeb monishdeb Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we are not forcing to rebuilt group contact cache because CRM_Contact_BAO_GroupContactCache::load($groupDAO) respects the cache and refresh time, and on basis of which it will decide to rebuild contact cache for smart group identified by $groupDAO which is how it's been done in other places.

$job = CRM_Mailing_BAO_MailingJob::getTableName();
$mg = CRM_Mailing_DAO_MailingGroup::getTableName();
$eq = CRM_Mailing_Event_DAO_Queue::getTableName();
// there is no need to proceed further as no mailing group is selected to include recipients,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change 'as' to 'if', or put this comment inside the if
// there is no need to proceed further if no mailing group is selected to include recipients,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do


$groupDAO = CRM_Core_DAO::executeQuery($sql);
while ($groupDAO->fetch()) {
if ($groupDAO->cache_date == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you are not rebuilding the cache for all groups, just for ones where the cache is empty/expired? If so, please clarify the DocBlock for the function.

Copy link
Member Author

@monishdeb monishdeb Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if cache is empty as per earlier code in getRecipients(...), I haven't changed this logic. Will mention it in doc block

}

// Create a temp table for contact exclusion.
$excludeTempTablename = "excluded_recipients_mailing_temp";
Copy link
Contributor

@JoeMurray JoeMurray Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause errors if two people are sending at the same time. Could you make unique name using a UUID or something?

Copy link
Member Author

@monishdeb monishdeb Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will use "excluded_recipients_mailing_temp" . substr(sha1(rand()), 0, 4)


// Create a temp table for contact exclusion.
$excludeTempTablename = "excluded_recipients_mailing_temp";
$includedTempTablename = "included_recipients_mailing_temp";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make unique eg by using UUID

);
if (!empty($recipientsGroup['Exclude'])) {
$sql = sprintf("
INSERT INTO %s (contact_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you estimate how much memory these two temp tables might take based on number of groups and size of db (eg for 600k records, with a group involving 10 groups and total of 1.8M members), then post into dev channel for feedback on if this is problem? There are currently around 20 places in codebase where temp tables using ENGINE=HEAP are created.

INSERT INTO %s (contact_id)
SELECT DISTINCT gc.contact_id
FROM civicrm_group_contact gc
WHERE gc.status = 'Added' AND gc.id IN (%s)", $excludeTempTablename, implode(',', $recipientsGroup['Exclude']));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the names of Groups sanitized?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the query-builder can handle INSERT...SELECT... with validation/escaping/interpolation:

CRM_Utils_SQL_Select::from('civicrm_group_contact')
  ->select('DISTINCT gc.contact_id')
  ->where('gc.status = "Added" AND gc.id IN (#groups)')
  ->param('#groups', $recipientsGroup['Exclude'])
  ->insertInto($excludeTempTablename, array('contact_id'))
  ->execute();

$storeRecipients = FALSE,
$dedupeEmail = FALSE,
$mode = NULL) {
public static function getMailRecipients($mailingObj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you put into function signature DocBlock an explanation of what $mailingObj is, eg the current mailing object (as opposed to a former excluded mailing object)?

FROM civicrm_mailing_event_queue meq
INNER JOIN civicrm_mailing_job mj ON meq.job_id = mj.id
INNER JOIN civicrm_mailing_group mg ON mj.mailing_id = mg.entity_id AND mg.entity_table = 'civicrm_mailing'
WHERE mg.mailing_id = %d AND mg.group_type = 'Exclude'", $excludeTempTablename, $mailingID);
Copy link
Contributor

@JoeMurray JoeMurray Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably confused, but isn't $mailingID the ID for the current mailing, so the comment on L211 doesn't make sense to me. If $mailingID is the current mailing, then it seems to imply that the Mailing Event Queue has been built, which could be used to retrieve the recipients.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right, why refactoring, I overlooked this query and simply included it from here without any thought. I will remove this query as it is no longer necessary here because we are building recipients new without any job or event queue.

$mailing_id = NULL,
$storeRecipients = FALSE,
$dedupeEmail = FALSE,
$mode = NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is sending via SMS currently using this function? Wouldn't the performance improvements be equally useful for SMS recipients?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now to keep it simple I have created this new fn, only to fetch mail recipients, and for civiSMS it will use the old getRecipients() fn. Also this new fn is only called via civiMail screen, so once I am done with the refactoring will figure out a way to extend support for phone recipients.

// Get all the group contacts we want to include.
$mailingGroup->query(
"CREATE TEMPORARY TABLE $includedTempTablename
(email_id int, contact_id int primary key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to follow the convention of always putting primary key first to avoid confusing people, as you rely on contact_id being primary below when using a REPLACE query:
(contact_id int primary key, email_id int)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok .. will swap.

$groupBy = $groupJoin = '';
$orderBy = "i.contact_id, i.email_id";
if ($mailingObj->dedupe_email) {
$orderBy = "MIN(i.contact_id), MIN(i.email_id)";
Copy link
Contributor

@JoeMurray JoeMurray Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using DISTINCT in the SELECT clause, a synonym for DISTINCTROW, is more efficient than doing this order by / group by as a method of eliminating duplicates. You'll still need the group by, but you can remove the effort to sort/order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old cruft code :( will change

@monishdeb monishdeb changed the title CRM-21260: CiviMail compose UI very slow to initialize, CRM-21316: Simplify getRecipients fn CRM-21260: CiviMail compose UI very slow to initialize, CRM-21316: Refactor getRecipients fn Oct 18, 2017
@monishdeb monishdeb force-pushed the CRM-21260 branch 2 times, most recently from d1a9119 to 7375f51 Compare October 18, 2017 16:26
@monishdeb
Copy link
Member Author

Jenkins test this please

1 similar comment
@seamuslee001
Copy link
Contributor

Jenkins test this please

@monishdeb monishdeb removed the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Oct 24, 2017
@davejenx
Copy link
Contributor

@monishdeb I have tested this patch on the site that I mentioned in CRM-21260. I used Chrome, using Inspector -> Network to check the time elapsed until the UI finishes loading, including the WYSIWYG: in my testing this is always the last thing to load, immediately after the recipient count changes from "(Estimating)" to "No recipients".

Previous testing for CRM-21260 showed that the number of message templates makes a big difference. So I tested with & without the patch with different numbers of message templates enabled (all vs just the reserved ones). 3 trials in each condition.

Results

Testing 4.7.26RC-201710090252

Message templates: 31:
41s average (38s, 40s, 44s)
Total transferred: 150KB XHR / 1.3MB All
XHR requests: 22, biggest: 106KB builder?an=angular-modules.json
Biggest API response: 18KB (Mailing.getfields)

Message templates: 1212:
117s average (115s, 110s, 127s)
Total transferred: 150KB XHR / 1.4MB All
XHR requests: 22, biggest: 106KB builder?an=angular-modules.json
Biggest API response: 18KB (Mailing.getfields)

Testing 4.7.26RC-201710090252 + PR 11142

Message templates: 31:
38s average (38s, 35s, 40s)
Total transferred: 144KB XHR / 1.3MB All
XHR requests: 10, biggest: 106KB builder?an=angular-modules.json
Biggest API response: 18KB (Mailing.getfields)

Message templates: 1212:
115s average (114s, 122s, 110s)
Total transferred: 144KB XHR / 1.4MB All
XHR requests: 10, biggest: 106KB builder?an=angular-modules.json
Biggest API response: 18KB (Mailing.getfields)

Results Summary

The patch reduced the average initialization time by 2-3 seconds and the number of AJAX requests by 12.
The number of enabled message templates makes a much bigger difference to the initialization time.
Even with the patch and with only 31 message templates enabled, the load time of 38s seems very slow.
The great majority of the load time, as described in CRM-21260, seems to be client-side: during most of the initialization time the server is idle, no AJAX requests are in progress and the browser is using 100% CPU. With the patch in place, the biggest delay occurs between the end of the Mailing.getfields request and the start of the 2nd (of 3) Mailing.create POST.

@davejenx
Copy link
Contributor

@monishdeb See #11267 for @nbrettell's work on loading message templates on demand. Getting excellent improvements in load time here.

@eileenmcnaughton
Copy link
Contributor

test this please

@eileenmcnaughton
Copy link
Contributor

@davejenx @monishdeb @mattwire @seamuslee001 (cc @totten since could affect mosaico)

A lot of time seems to have gone into testing & discussing this & my take is that it is possibly mergeable on test - pass (which needs to be confirmed) & that it has possibly stalled due to lack of clarity about status

@seamuslee001
Copy link
Contributor

@monishdeb can you rebase this please

@monishdeb
Copy link
Member Author

@mlutfy
Copy link
Member

mlutfy commented Dec 19, 2017

Code seems good to me. Big patch, but a lot of cleanup. However, I did not test.

@mattwire @davejenx Can you have one last look & confirm it's working with master?

Is anyone using this patch in production?

@lcdservices
Copy link
Contributor

@mlutfy we have implemented the patch for NYSS and are planning to roll to production this week.

@JKingsnorth
Copy link
Contributor

@lcdservices Has this been running OK in production?

@lcdservices
Copy link
Contributor

@JKingsnorth yes, we've been running in a high volume multi-site environment for a couple weeks without issue

@JKingsnorth
Copy link
Contributor

Great, is there any further testing or review required to merge this?

@mlutfy
Copy link
Member

mlutfy commented Jan 18, 2018

Merging based on the comments/reviews/testing. Thanks everyone!

@mlutfy mlutfy merged commit ad853d8 into civicrm:master Jan 18, 2018
@monishdeb monishdeb deleted the CRM-21260 branch January 18, 2018 17:00
@monishdeb
Copy link
Member Author

Thank you everyone for your feedback and effort in getting this PR merged :)

@davejenx
Copy link
Contributor

@monishdeb Thanks for all your work!

@eileenmcnaughton
Copy link
Contributor

well done all

if (count($includeSmartGroupIDs)) {
CRM_Utils_SQL_Select::from($contact)
->select("$contact.id as contact_id, $email.id as email_id")
->join($email, " INNER JOIN $email ON $email.contact_id = $contact.id ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @monishdeb I've just spotted something - I think these changes will break trying to include a smart group in an SMS type mailing? There is no if statement, so this will insert email IDs into the 'phone ID' column in the temporary table?

I haven't proved this definitively yet, working on setting up my localhost to test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @JKingsnorth for notifying this break. I am going to a create a separate PR for this fix.

@JKingsnorth
Copy link
Contributor

Unfortunately I haven't been able to test this on dmaster, because every time I try to create a new SMS mailing I hit a DB error. I'm not sure whether that's related to this PR or not yet.

However, from reading that part of the code, I think that this PR will introduce a problem in smart groups being included in SMS mailings.

@@ -1788,7 +1609,7 @@ public static function create(&$params, $ids = array()) {

// Create parent job if not yet created.
// Condition on the existence of a scheduled date.
if (!empty($params['scheduled_date']) && $params['scheduled_date'] != 'null' && empty($params['_skip_evil_bao_auto_schedule_'])) {
if (!empty($params['scheduled_date']) && $params['scheduled_date'] != 'null') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened #11653 to e-instate this check - I assume is was removed on the assumption that it was not being used but it's actually accessible via the api. We should merge that to 4.7.31

@mlutfy mlutfy added this to the 4.7.31 milestone Feb 9, 2018
totten added a commit to totten/civicrm-core that referenced this pull request Mar 30, 2018
In `CRM_Mailing_BAO_Mailing::create` and `Mailing.create` API, there is a
(*ahem*) special behavior where setting the `scheduled_date` will
immediately trigger scheduling.  One shouldn't submit `scheduled_date` for a
preview.

This was not symptomatic before civicrm#11142/v4.7.31 because all preview
operations were wrapped in a transaction and rolled back.  But now previews
are allowed to have side-effects, so we need some other means to prevent.

This copies the workaround from `crmMailingMgr.save()` and applies
it to `crmMailingMgr.preview()` (etal).
jmcclelland pushed a commit to jmcclelland/civicrm-core that referenced this pull request Apr 10, 2018
In `CRM_Mailing_BAO_Mailing::create` and `Mailing.create` API, there is a
(*ahem*) special behavior where setting the `scheduled_date` will
immediately trigger scheduling.  One shouldn't submit `scheduled_date` for a
preview.

This was not symptomatic before civicrm#11142/v4.7.31 because all preview
operations were wrapped in a transaction and rolled back.  But now previews
are allowed to have side-effects, so we need some other means to prevent.

This copies the workaround from `crmMailingMgr.save()` and applies
it to `crmMailingMgr.preview()` (etal).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.