Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dupe improve custom data handling #17060

Merged
merged 1 commit into from
Apr 13, 2020
Merged

Conversation

eileenmcnaughton
Copy link
Contributor

Overview

Reduce code complexity & locking queries when merging contacts

Before

When merging contacts and transferring custom data 3 locking queries run with 2 of them cancelling each other out

  1. INSERT row - the purpose being to explicitly cause the failure of
  2. UPDATE IGNORE
  3. DELETE row

After

None of the above are called - the custom data transfer is already done elsewhere in the process.

Note this does mean a change in behaviour from 'custom data inconsistently deleted from deleted contact' to 'custom data not deleted' - see technical details for more

Technical Details

The current custom data handling code does the following

  1. For normal single rows it first inserts a row. This has the impact of rendering the
    update that follows meaningless (this was an intentional change). It then deletes the row.
    Hence the upshot is simply that it deletes the row. A separate process transfers the custom
    data for the row. In other words we are engaging in 3 queries with a fairly high chance of
    causing deadlocks in order to just delete the row.

  2. For single rows where the entity reference refers to the merged contact the row is
    updated to refer to the merged contact (without the insert this works) and a further unnecessary delete follows

  3. For custom groups supporting multiple rows the rows are updated to have the new entity id. An unnecessary delete follows.

This change only affects the first of these. I would like to, in a future PR, change UPDATE IGNORE to just UPDATE &
remove the unnecessary delete - with more testing.

Note that this does include a slight change of behaviour. Currently if ANY fields in a custom group
are transferred from one contact to another during merge the row is deleted (with all the custom fields in it).
However, if no fields in a set are deleted then the row is not deleted.

This felt like it was a bit short on consistency. If has a potential advantage from a DB size point of view (any
deleting is better than none) but it also increases the number of locking queries in a process that is fairly
prone to cause DB locks. Based on these considerations I didn't think it worth re-adding code complexity to
retain inconsistent deletion.

A note on tests - I pre-added a bunch of tests into _api3_ContactTest to cover the 3 scenarios above.

Comments

@pfigel @lcdservices could you try to wrap you heads around this - it follows on from analysis that @lcdservices did in gitlab which I'll try to find & link now

@civibot
Copy link

civibot bot commented Apr 12, 2020

(Standard links)

@civibot civibot bot added the master label Apr 12, 2020
@@ -2471,6 +2471,7 @@ public static function appendCustomContactReferenceFields(&$cidRefs) {
$fields = civicrm_api3('CustomField', 'get', [
'return' => ['column_name', 'custom_group_id.table_name'],
'data_type' => 'ContactReference',
'options' => ['limit' => 0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, nasty APIv3 gotcha. Good catch.

The  current  custom data handling code does the  following

1) For normal single rows it first inserts a row. This has the impact of rendering the
update that follows meaningless (this was an intentional change). It then deletes the row.
Hence the upshot is simply that it deletes the row. A separate process transfers the custom
data for the row. In other words we are engaging in 3 queries with a fairly high chance of
causing deadlocks in order to just delete the row.

2) For single rows where the entity reference refers to the merged contact the row is
updated to refer to the merged contact (without the insert this works) and a further unnecessary delete follows

3) For custom groups supporting multiple rows the rows are updated to have the new entity id. An unnecessary delete follows.

This change only affects the first of these. I would like to, in a future PR, change UPDATE IGNORE to just UPDATE &
remove the unnecessary delete - with more testing.

Note that this does include a slight change of behaviour. Currently if ANY fields in a custom group
are transferred from one contact to another during merge the row is deleted (with all the custom fields in it).
However, if no fields in a set are deleted then the row is not deleted.

This felt like it was a bit short on consistency. If has a potential advantage from a DB size point of view (any
deleting is better than none) but it also increases the number of locking queries in a process that is fairly
prone to cause DB locks. Based on these considerations I didn't think it worth re-adding code complexity to
retain inconsistent deletion.

A note on tests - I pre-added a bunch of tests into _api3_ContactTest to cover the 3 scenarios above.
@eileenmcnaughton
Copy link
Contributor Author

Test failure was related - it was an edge-case test wisely added by @pfigel - should pass now

Copy link
Contributor

@pfigel pfigel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice improvement. Overall, merging is still quite convoluted, so it's hard to reason about, but based on r-run, existing test coverage and improved code quality, this should work. 👍

  • General standards
    • (r-explain) PASS
    • (r-user) PASS: Practically no user-visible impact for merge functionality as such, other than the change in behaviour for deleted entities (which is very edge-casey).
    • (r-doc) PASS
    • (r-run) PASS: I focused on two things during r-run - no issues found:
      • General merge behaviour - do (custom) fields still get moved, do multi-row custom fields work, are contact references updated to point to the "surviving" contact
      • Some light integration testing with XDedupe
  • Developer standards
    • (r-tech) PASS: No real contract changes. Minor change in behaviour for an edge-case that was already weird. Likely no impact on extensions.
    • (r-code) PASS
    • (r-maint) PASS
    • (r-test) PASS: Existing test coverage was fairly good & was improved in advance of this.

@colemanw colemanw merged commit 8ff77d8 into civicrm:master Apr 13, 2020
@eileenmcnaughton eileenmcnaughton deleted the dupefix branch April 14, 2020 01:26
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Apr 14, 2020
This turns out to have been a missing piece from
civicrm#17060 as revealed from
civicrm#17072
eileenmcnaughton added a commit to eileenmcnaughton/civicrm-core that referenced this pull request Apr 14, 2020
This turns out to have been a missing piece from
civicrm#17060 as revealed from
civicrm#17072
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants