Adds meta flag 'skip_diff' to enable skipping of diff operations #1045

matthewhegarty · 2019-11-30T16:16:05Z

Problem

When importing a large number of rows with this patch applied, disabling the diff operation improves import time by ~30%.

Solution

This PR adds a meta attribute, skip_diff, to the Resource, which gives the option of disabling diffing operations. The default value is False, so existing behaviour is unchanged unless the flag is explicitly set to True.
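As a rough illustration of what the flag changes, the sketch below models a per-row import with a dependency-free stand-in. `ToyResource`, `FastResource`, the `import_row` signature and the dict-based rows are assumptions made for illustration, not the django-import-export implementation; only the `skip_diff` name and its default of False come from this PR.

```python
import copy


class ToyResource:
    """Hypothetical stand-in for a Resource; not the library's code."""

    skip_diff = False  # default keeps the existing behaviour

    def import_row(self, row, instance):
        original = None
        diff = None
        if not self.skip_diff:
            # Snapshot the instance so it can be compared after the update.
            original = copy.deepcopy(instance)
        instance.update(row)
        if not self.skip_diff:
            # Record (old, new) pairs for every field whose value changed.
            diff = {
                key: (original.get(key), instance[key])
                for key in instance
                if original.get(key) != instance[key]
            }
        return {"instance": instance, "original": original, "diff": diff}


class FastResource(ToyResource):
    skip_diff = True  # opt in: no copy and no diff


default_result = ToyResource().import_row({"name": "new"}, {"id": 1, "name": "old"})
fast_result = FastResource().import_row({"name": "new"}, {"id": 1, "name": "old"})
```

In this model the row is still applied to the instance when the flag is set; only the snapshot and the comparison are skipped, which is where the saving would come from.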

Acceptance Criteria

There is a new TestCase in test_resources.py which uses patches to verify that diff and copy functions are not called.

The flag is documented in resources.py.
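The testing approach described above can be sketched with `unittest.mock`. This is a minimal, self-contained model rather than the TestCase added in this PR: `ToyResource`, `copy_instance` and `make_diff` are hypothetical names standing in for the real copy and diff functions.

```python
import copy
import unittest
from unittest import mock


class ToyResource:
    """Hypothetical stand-in for a Resource; not the library's code."""

    skip_diff = False

    def copy_instance(self, instance):
        return copy.deepcopy(instance)

    def make_diff(self, original, instance):
        return {k: (original.get(k), v) for k, v in instance.items() if original.get(k) != v}

    def import_row(self, row, instance):
        original = None if self.skip_diff else self.copy_instance(instance)
        instance.update(row)
        return None if self.skip_diff else self.make_diff(original, instance)


class SkipDiffTest(unittest.TestCase):
    def run_import(self, skip_diff):
        resource = ToyResource()
        resource.skip_diff = skip_diff
        # Replace both helpers with mocks so that calls can be counted.
        with mock.patch.object(ToyResource, "copy_instance") as copy_mock, \
                mock.patch.object(ToyResource, "make_diff") as diff_mock:
            resource.import_row({"name": "new"}, {"name": "old"})
        return copy_mock, diff_mock

    def test_skip_diff_true_skips_copy_and_diff(self):
        copy_mock, diff_mock = self.run_import(skip_diff=True)
        copy_mock.assert_not_called()
        diff_mock.assert_not_called()

    def test_skip_diff_false_copies_and_diffs(self):
        copy_mock, diff_mock = self.run_import(skip_diff=False)
        copy_mock.assert_called_once()
        diff_mock.assert_called_once()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SkipDiffTest)
result = unittest.TextTestRunner().run(suite)
```

The second test is included so that the first cannot pass trivially: it shows the same mocks do record calls when the flag is left at its default.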

coveralls · 2019-12-01T11:34:33Z

Coverage increased (+0.1%) to 96.243% when pulling dd342f5 on matthewhegarty:skip-diff into 7c58899 on django-import-export:master.

manelclos · 2019-12-05T20:44:55Z

import_export/resources.py

@@ -520,7 +535,7 @@ def import_row(self, row, instance_loader, using_transactions=True, dry_run=Fals
                    # validate_instance(), where they can be combined with model
                    # instance validation errors if necessary
                    import_validation_errors = e.update_error_dict(import_validation_errors)
-                if self.skip_row(instance, original):
+                if not skip_diff and self.skip_row(instance, original):


@matthewhegarty I don't understand why you are adding not skip_diff here, can you please explain?

matthewhegarty · 2019-12-06

This is an optimisation: at this point, if skip_diff is True, the original variable is not set.

If we allow skip_row() to run then (assuming skip_unchanged is True) it will iterate over all model fields, and throw AttributeError for each when it tries to read a value from a None reference.

I could potentially move the if not skip_diff check to inside skip_row() which might be a bit cleaner.

manelclos · 2019-12-06

I now see what you mean, thanks.

So the problem is that setting skip_unchanged to True is not compatible with setting skip_diff to True: skip_unchanged needs original in order to diff against it, and that is exactly the work you asked to skip with skip_diff.

I'd say we should check for this situation early, for example in the __init__ method, and fail with a descriptive message when both options are set to True.

It would be good to know what you and others think about this solution.
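The early check suggested here could look roughly like the sketch below. It is hypothetical: `ToyResource`, `BothFlags` and the error message are assumptions for illustration, and the thread goes on to settle on a check inside skip_row() instead.

```python
class ToyResource:
    """Hypothetical stand-in for a Resource; not the library's code."""

    skip_unchanged = False
    skip_diff = False

    def __init__(self):
        # Fail fast: skip_unchanged compares each row with its original,
        # which is never kept when skip_diff is set.
        if self.skip_unchanged and self.skip_diff:
            raise ValueError(
                "skip_unchanged and skip_diff cannot both be True: "
                "skip_unchanged needs the original instance, "
                "which skip_diff does not keep."
            )


class BothFlags(ToyResource):
    skip_unchanged = True
    skip_diff = True


try:
    BothFlags()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The trade-off against the skip_row() approach is strictness: this variant refuses the combination outright, while a guard in skip_row() lets the import run and never skips a row.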

matthewhegarty · 2019-12-06

I can add an __init__ check if required.

However, since users have to opt in to skip_unchanged, I think it is ok to add a check in skip_row():

if not self._meta.skip_unchanged or self._meta.skip_diff:
    return False

(I'll then remove the other check)

I will also document in skip_row() why this check is present and why the two flags are not compatible.

The other option is to always set original, but I don't like this because it will remove some of the performance improvement.

Thanks for reviewing!
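A minimal sketch of the skip_row() guard proposed above, using a dependency-free stand-in. The guard condition is the one quoted in the comment; `ToyResource`, its constructor arguments, the `fields` tuple and the field-by-field comparison are assumptions for illustration and not the library's code.

```python
from types import SimpleNamespace


class ToyResource:
    """Hypothetical stand-in for a Resource; not the library's code."""

    fields = ("name", "price")

    def __init__(self, skip_unchanged=False, skip_diff=False):
        self._meta = SimpleNamespace(
            skip_unchanged=skip_unchanged, skip_diff=skip_diff
        )

    def skip_row(self, instance, original):
        # With skip_diff set, `original` is never populated, so there is
        # nothing to compare against and the row is never skipped.
        if not self._meta.skip_unchanged or self._meta.skip_diff:
            return False
        return all(
            getattr(instance, name) == getattr(original, name)
            for name in self.fields
        )


row = SimpleNamespace(name="widget", price=10)
unchanged = SimpleNamespace(name="widget", price=10)

skipped = ToyResource(skip_unchanged=True).skip_row(row, unchanged)
both_flags = ToyResource(skip_unchanged=True, skip_diff=True).skip_row(row, None)
defaults = ToyResource().skip_row(row, unchanged)
```

With both flags set, the guard returns before any attribute is read from the missing original, so the AttributeError described earlier in the thread does not arise in this model.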

manelclos · 2019-12-06

I think your proposal for skip_row() is a good solution: developers who set both variables to True should be able to understand the behaviour and identify the solution.

Let's wait for others to comment before you make any modifications.

Thanks for the PR, I think it is a good first step into optimising import speed.

matthewhegarty · 2019-12-27

I've updated the PR as discussed and added an extra test.

tests/core/tests/test_resources.py

matthewhegarty · 2020-04-22T20:41:02Z

@manelclos You kindly reviewed this at the end of last year. Please can we get a decision on merging to master?

andrewgy8

LGTM. I see no issue with merging this

matthewhegarty · 2020-04-23T11:06:06Z

Thanks @andrewgy8 and @manelclos

andrewgy8 · 2020-04-23T11:08:43Z

Thank you!

matthewhegarty force-pushed the skip-diff branch from 75e024d to 68cd4dd Compare December 1, 2019 11:21

manelclos reviewed Dec 5, 2019

matthewhegarty force-pushed the skip-diff branch from 68cd4dd to 416c367 Compare December 6, 2019 10:20

matthewhegarty force-pushed the skip-diff branch from 716338c to f6a0da8 Compare December 27, 2019 17:29

matthewhegarty added 3 commits April 22, 2020 20:31

adds meta attribute flag 'skip_diff'

0d04929

reformatted code to match Django style guidelines

36f9ca3

added explicit check for skip_diff inside skip_row()

dd342f5

matthewhegarty force-pushed the skip-diff branch from f6a0da8 to dd342f5 Compare April 22, 2020 19:32

andrewgy8 approved these changes Apr 23, 2020

andrewgy8 changed the title ~~Adds meta flag 'skip_diff' to enable skipping of the diff operations~~ Adds meta flag 'skip_diff' to enable skipping of diff operations Apr 23, 2020

andrewgy8 merged commit 24c99ef into django-import-export:master Apr 23, 2020

matthewhegarty deleted the skip-diff branch April 23, 2020 11:07

ZuluPro pushed a commit to ZuluPro/django-import-export that referenced this pull request Dec 23, 2020

Adds meta flag 'skip_diff' to enable skipping of diff operations (dja…

e68ad2d

…ngo-import-export#1045)

matthewhegarty commented Nov 30, 2019

coveralls commented Dec 1, 2019 •

edited

manelclos Dec 5, 2019

matthewhegarty Dec 6, 2019 •

edited

manelclos Dec 6, 2019

matthewhegarty Dec 6, 2019

manelclos Dec 6, 2019

matthewhegarty Dec 27, 2019

matthewhegarty commented Apr 22, 2020

andrewgy8 left a comment

matthewhegarty commented Apr 23, 2020

andrewgy8 commented Apr 23, 2020
