Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds meta flag 'skip_diff' to enable skipping of diff operations #1045

Merged
merged 3 commits into from Apr 23, 2020

Conversation

matthewhegarty
Copy link
Contributor

Problem

When importing a large number of rows (using this patch), disabling the diff operation results in ~30% improvement in import time.

Solution

This PR involves adding a meta attribute to the Resource, which gives the option of disabling diffing operations. The default value is False meaning existing behaviour is unchanged unless the flag is explicitly set to True.

Acceptance Criteria

There is a new TestCase in test_resources.py which uses patches to verify that diff and copy functions are not called.

The flag is documented in resources.py.

@coveralls
Copy link

coveralls commented Dec 1, 2019

Coverage Status

Coverage increased (+0.1%) to 96.243% when pulling dd342f5 on matthewhegarty:skip-diff into 7c58899 on django-import-export:master.

@@ -520,7 +535,7 @@ def import_row(self, row, instance_loader, using_transactions=True, dry_run=Fals
# validate_instance(), where they can be combined with model
# instance validation errors if necessary
import_validation_errors = e.update_error_dict(import_validation_errors)
if self.skip_row(instance, original):
if not skip_diff and self.skip_row(instance, original):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewhegarty I don't understand why you are adding not skip_diff here, can you please explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an optimisation because at this point (if skip_diff is True), the original var is not set.

If we allow skip_row() to run then (assuming skip_unchanged is True) it will iterate over all model fields, and throw AttributeError for each when it tries to read a value from a None reference.

I could potentially move the if not skip_diff check to inside skip_row() which might be a bit cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now see what you mean, thanks.

So the problem is that setting skip_unchanged to True is not compatible with settings skip_diff to True because we need original to diff it, which is not what you want because you asked to skip diffs operations.

I'd say that we should check about this situation early, like in the __init__ method and fail with a descriptive message when both options are set to True.

It would be good to know what you and others think about this solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add an __init__ check if required.

However, I think since users have to opt in to skip_unchanged, I think it is ok to add a check in skip_row():

        if not self._meta.skip_unchanged or self._meta.skip_diff:
            return False

(I'll then remove the other check)

I will also document in skip_row() why this check is present and why the two flags are not compatible.

The other option is to always set original, but I don't like this because it will remove some of the performance improvement.

Thanks for reviewing!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your proposal for skip_row() is a good solution, developers using both variables set to True should be able to understand the behaviour and identify the solution.

Let's wait for others to comment before you do any modification about this.

Thanks for the PR, I think it is a good first step into optimising import speed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR as discussed and added an extra test.

tests/core/tests/test_resources.py Show resolved Hide resolved
tests/core/tests/test_resources.py Show resolved Hide resolved
tests/core/tests/test_resources.py Show resolved Hide resolved
@matthewhegarty
Copy link
Contributor Author

@manelclos You kindly reviewed this at the end of last year. Please can we get a decision on merging to master?

Copy link
Member

@andrewgy8 andrewgy8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I see no issue with merging this

@andrewgy8 andrewgy8 changed the title Adds meta flag 'skip_diff' to enable skipping of the diff operations Adds meta flag 'skip_diff' to enable skipping of diff operations Apr 23, 2020
@andrewgy8 andrewgy8 merged commit 24c99ef into django-import-export:master Apr 23, 2020
@matthewhegarty
Copy link
Contributor Author

Thanks @andrewgy8 and @manelclos

@matthewhegarty matthewhegarty deleted the skip-diff branch April 23, 2020 11:07
@andrewgy8
Copy link
Member

Thank you!

ZuluPro pushed a commit to ZuluPro/django-import-export that referenced this pull request Dec 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants