New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds meta flag 'skip_diff' to enable skipping of diff operations #1045
Adds meta flag 'skip_diff' to enable skipping of diff operations #1045
Conversation
75e024d
to
68cd4dd
Compare
import_export/resources.py
Outdated
@@ -520,7 +535,7 @@ def import_row(self, row, instance_loader, using_transactions=True, dry_run=Fals | |||
# validate_instance(), where they can be combined with model | |||
# instance validation errors if necessary | |||
import_validation_errors = e.update_error_dict(import_validation_errors) | |||
if self.skip_row(instance, original): | |||
if not skip_diff and self.skip_row(instance, original): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matthewhegarty I don't understand why you are adding not skip_diff
here, can you please explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an optimisation because at this point (if skip_diff
is True), the original
var is not set.
If we allow skip_row()
to run then (assuming skip_unchanged
is True) it will iterate over all model fields, and throw AttributeError
for each when it tries to read a value from a None
reference.
I could potentially move the if not skip_diff
check to inside skip_row()
which might be a bit cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now see what you mean, thanks.
So the problem is that setting skip_unchanged
to True
is not compatible with settings skip_diff
to True
because we need original
to diff it, which is not what you want because you asked to skip diffs operations.
I'd say that we should check about this situation early, like in the __init__
method and fail with a descriptive message when both options are set to True
.
It would be good to know what you and others think about this solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add an __init__
check if required.
However, I think since users have to opt in to skip_unchanged
, I think it is ok to add a check in skip_row()
:
if not self._meta.skip_unchanged or self._meta.skip_diff:
return False
(I'll then remove the other check)
I will also document in skip_row()
why this check is present and why the two flags are not compatible.
The other option is to always set original
, but I don't like this because it will remove some of the performance improvement.
Thanks for reviewing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your proposal for skip_row()
is a good solution, developers using both variables set to True
should be able to understand the behaviour and identify the solution.
Let's wait for others to comment before you do any modification about this.
Thanks for the PR, I think it is a good first step into optimising import speed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the PR as discussed and added an extra test.
68cd4dd
to
416c367
Compare
716338c
to
f6a0da8
Compare
f6a0da8
to
dd342f5
Compare
@manelclos You kindly reviewed this at the end of last year. Please can we get a decision on merging to master? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I see no issue with merging this
Thanks @andrewgy8 and @manelclos |
Thank you! |
Problem
When importing a large number of rows (using this patch), disabling the diff operation results in ~30% improvement in import time.
Solution
This PR involves adding a meta attribute to the Resource, which gives the option of disabling diffing operations. The default value is
False
meaning existing behaviour is unchanged unless the flag is explicitly set toTrue
.Acceptance Criteria
There is a new TestCase in
test_resources.py
which uses patches to verify thatdiff
andcopy
functions are not called.The flag is documented in
resources.py
.