Document best practice for concurrent writes (#1668)
* added documentation and helper method for concurrent writes

* documentation update
matthewhegarty committed Nov 3, 2023
1 parent 55e42cc commit dc354ef
Showing 5 changed files with 56 additions and 4 deletions.
38 changes: 38 additions & 0 deletions docs/advanced_usage.rst
@@ -645,6 +645,44 @@ To hook in the import-export workflow, you can connect to ``post_import``,
# model is the actual model instance which was exported
pass

.. _concurrent-writes:

Concurrent writes
=================

Specific consideration is required if your application allows concurrent writes to data during imports.

For example, consider this scenario:

#. An import process is run to import new books identified by title.
#. :meth:`~import_export.resources.Resource.get_or_init_instance` is called and finds that there is no existing
   book with this title, so the import process will create it as a new record.
#. At that exact moment, another process inserts a book with the same title.
#. As the row import process completes, :meth:`~import_export.resources.Resource.save` is called and an error is
   raised because the book already exists in the database.

By default, import-export does not prevent this situation from occurring, so you need to consider which processes
might modify shared tables during imports, and how you can mitigate the risks. If your database enforces integrity,
you may get errors raised; if not, you may get duplicate data.
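The race in the scenario above can be made concrete with a plain-Python stand-in (this is illustrative only, not
import-export code; ``books``, ``get_or_init`` and ``save`` are hypothetical stand-ins for a table with a unique
title constraint and the corresponding resource hooks):

```python
books = {}  # stand-in for a book table with a unique constraint on title


def get_or_init(title):
    # Step 2: the importer sees no existing row and prepares a new instance.
    existing = books.get(title)
    if existing is not None:
        return existing
    return {"title": title, "is_new": True}


def save(instance):
    # Step 4: a plain insert fails if the row appeared in the meantime.
    if instance.get("is_new") and instance["title"] in books:
        raise RuntimeError("IntegrityError: duplicate title")
    books[instance["title"]] = instance


row = get_or_init("Ulysses")             # importer: no such book yet
books["Ulysses"] = {"title": "Ulysses"}  # step 3: concurrent process inserts it
try:
    save(row)                            # step 4: the insert now fails
except RuntimeError as exc:
    print(exc)                           # → IntegrityError: duplicate title
```

The window between the existence check and the save is where a concurrent writer can slip in.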

Potential solutions are:

* Use one of the :doc:`import workflow<import_workflow>` methods to lock a table during import if the database supports
it.

* This should only be done in exceptional cases because there will be a performance impact.
* You will need to release the lock both in normal workflow and if there are errors.

* Override :meth:`~import_export.resources.Resource.do_instance_save` to perform an
  `update_or_create() <https://docs.djangoproject.com/en/stable/ref/models/querysets/#update_or_create>`_.
  This can ensure that data integrity is maintained if there is concurrent access.

* Modify working practices so that there is no risk of concurrent writes. For example, you could schedule imports to
only run at night.
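The lock/release discipline described in the first solution above can be sketched in plain Python, with a
``threading.Lock`` standing in for a database table lock (in Django you would issue the backend's lock statement,
e.g. ``LOCK TABLE``, inside a transaction; ``run_import`` is a hypothetical name):

```python
import threading

table_lock = threading.Lock()  # stand-in for a database table lock


def run_import(rows, fail=False):
    table_lock.acquire()
    try:
        if fail:
            raise RuntimeError("import failed")
        return list(rows)  # stand-in for the per-row import work
    finally:
        # Released on both the normal path and the error path.
        table_lock.release()


print(run_import([1, 2, 3]))    # → [1, 2, 3]
try:
    run_import([], fail=True)
except RuntimeError:
    pass
print(table_lock.locked())      # → False: released despite the error
```

The ``try``/``finally`` is the essential part: without it, an import error would leave the table locked.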

This issue may be more prevalent if using :doc:`bulk imports<bulk_import>`. This is because instances are held in
memory for longer before being written in bulk, so there is a greater risk of another process modifying an instance
before it has been persisted.
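The ``do_instance_save()`` override can likewise be sketched without Django (``DB``, ``Book`` and ``BookResource``
are hypothetical stand-ins; in a real project you would subclass ``import_export.resources.ModelResource`` and call
``Book.objects.update_or_create(title=..., defaults=...)``):

```python
DB = {}  # stand-in for the book table, unique on title


class Book:
    def __init__(self, title, price):
        self.title = title
        self.price = price

    def save(self):
        # Mimics a unique constraint: a plain save() fails on duplicates.
        if self.title in DB:
            raise RuntimeError("IntegrityError: duplicate title")
        DB[self.title] = self


class Resource:
    def save_instance(self, instance):
        self.do_instance_save(instance)

    def do_instance_save(self, instance):
        instance.save()


class BookResource(Resource):
    def do_instance_save(self, instance):
        # Upsert keyed on title: tolerant of a row created concurrently
        # between get_or_init_instance() and the save.
        DB[instance.title] = instance


DB["Ulysses"] = Book("Ulysses", 10)  # inserted by a concurrent process
BookResource().save_instance(Book("Ulysses", 12))
print(DB["Ulysses"].price)           # → 12: upsert succeeded, no error
```

Because all saves funnel through the single ``do_instance_save()`` hook, one override changes insert semantics for
the whole import.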

.. _admin-integration:

9 changes: 6 additions & 3 deletions docs/bulk_import.rst
@@ -35,9 +35,12 @@ Caveats
the database for each row. If this is an issue then create a subclass which caches ``get_queryset()`` results rather
than reading for each invocation.

* If there is the potential for concurrent writes to a table during a bulk operation, then you need to consider the
potential impact of this. Refer to :ref:`concurrent-writes` for more information.

For more information, please read the Django documentation on
-`bulk_create() <https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create>`_ and
-`bulk_update() <https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-update>`_.
+`bulk_create() <https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-create>`_ and
+`bulk_update() <https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update>`_.

.. _performance_tuning:

Performance tuning
@@ -62,4 +65,4 @@ Consider the following if you need to improve the performance of imports.
Testing
=======

Scripts are provided to enable testing and benchmarking of bulk imports. See :ref:`testing:Bulk testing`.
1 change: 1 addition & 0 deletions docs/changelog.rst
@@ -17,6 +17,7 @@ Please refer to :doc:`release notes<release_notes>`.
- Clarified ``skip_diff`` documentation (#1655)
- Standardised interface of :meth:`~import_export.widgets.Widget.render` (#1657)
- Improved documentation relating to validation on import (#1665)
- Added :meth:`~import_export.resources.Resource.do_instance_save` helper method (#1668)
- Refactored test_admin_integration: split into smaller test modules (#1662)
- Refactored test_resources: split into smaller test modules (#1672)

12 changes: 11 additions & 1 deletion import_export/resources.py
@@ -553,9 +553,19 @@ def save_instance(self, instance, is_create, row, **kwargs):
            # we don't have transactions and we want to do a dry_run
            pass
        else:
-            instance.save()
+            self.do_instance_save(instance)
        self.after_save_instance(instance, row, **kwargs)

    def do_instance_save(self, instance):
        """
        A method specifically to provide a single overridable hook for the instance
        save operation.
        For example, this can be overridden to implement update_or_create().

        :param instance: The model instance to be saved.
        """
        instance.save()

    def before_save_instance(self, instance, row, **kwargs):
        r"""
        Override to add additional logic. Does nothing by default.
