Document best practice for concurrent writes (#1668)
* added documentation and helper method for concurrent writes

* documentation update
matthewhegarty committed Nov 3, 2023
1 parent 55e42cc commit dc354ef
Showing 5 changed files with 56 additions and 4 deletions.
38 changes: 38 additions & 0 deletions docs/advanced_usage.rst
@@ -645,6 +645,44 @@ To hook in the import-export workflow, you can connect to ``post_import``,
# model is the actual model instance which was exported
pass

.. _concurrent-writes:

Concurrent writes
=================

Specific consideration is required if your application allows concurrent writes to data during imports.

For example, consider this scenario:

#. An import process is run to import new books identified by title.
#. :meth:`~import_export.resources.Resource.get_or_init_instance` is called and finds that there is no existing
   book with this title, so the import process will create it as a new record.
#. At that exact moment, another process inserts a book with the same title.
#. As the row import process completes, :meth:`~import_export.resources.Resource.save` is called and an error is
   raised because the book already exists in the database.

By default, import-export does not prevent this situation from occurring, so you need to consider which processes
might modify shared tables during imports, and how you can mitigate the risks. If your database enforces integrity,
you may get errors raised; if not, you may get duplicate data.
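The race in the scenario above can be made concrete with a plain-Python stand-in (this is illustrative only, not
import-export code; ``books``, ``get_or_init`` and ``save`` are hypothetical stand-ins for a table with a unique
title constraint and the corresponding resource hooks):

```python
books = {}  # stand-in for a book table with a unique constraint on title


def get_or_init(title):
    # Step 2: the importer sees no existing row and prepares a new instance.
    existing = books.get(title)
    if existing is not None:
        return existing
    return {"title": title, "is_new": True}


def save(instance):
    # Step 4: a plain insert fails if the row appeared in the meantime.
    if instance.get("is_new") and instance["title"] in books:
        raise RuntimeError("IntegrityError: duplicate title")
    books[instance["title"]] = instance


row = get_or_init("Ulysses")             # importer: no such book yet
books["Ulysses"] = {"title": "Ulysses"}  # step 3: concurrent process inserts it
try:
    save(row)                            # step 4: the insert now fails
except RuntimeError as exc:
    print(exc)                           # → IntegrityError: duplicate title
```

The window between the existence check and the save is where a concurrent writer can slip in.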

Potential solutions are:

* Use one of the :doc:`import workflow<import_workflow>` methods to lock a table during import if the database supports
it.

* This should only be done in exceptional cases because there will be a performance impact.
* You will need to release the lock both in normal workflow and if there are errors.

* Override :meth:`~import_export.resources.Resource.do_instance_save` to perform an
  `update_or_create() <https://docs.djangoproject.com/en/stable/ref/models/querysets/#update_or_create>`_.
  This can ensure that data integrity is maintained if there is concurrent access.

* Modify working practices so that there is no risk of concurrent writes. For example, you could schedule imports to
only run at night.
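The lock/release discipline described in the first solution above can be sketched in plain Python, with a
``threading.Lock`` standing in for a database table lock (in Django you would issue the backend's lock statement,
e.g. ``LOCK TABLE``, inside a transaction; ``run_import`` is a hypothetical name):

```python
import threading

table_lock = threading.Lock()  # stand-in for a database table lock


def run_import(rows, fail=False):
    table_lock.acquire()
    try:
        if fail:
            raise RuntimeError("import failed")
        return list(rows)  # stand-in for the per-row import work
    finally:
        # Released on both the normal path and the error path.
        table_lock.release()


print(run_import([1, 2, 3]))    # → [1, 2, 3]
try:
    run_import([], fail=True)
except RuntimeError:
    pass
print(table_lock.locked())      # → False: released despite the error
```

The ``try``/``finally`` is the essential part: without it, an import error would leave the table locked.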

This issue may be more prevalent if using :doc:`bulk imports<bulk_import>`. This is because instances are held in
memory for longer before being written in bulk, so there is a greater risk of another process modifying an instance
before it has been persisted.
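The ``do_instance_save()`` override can likewise be sketched without Django (``DB``, ``Book`` and ``BookResource``
are hypothetical stand-ins; in a real project you would subclass ``import_export.resources.ModelResource`` and call
``Book.objects.update_or_create(title=..., defaults=...)``):

```python
DB = {}  # stand-in for the book table, unique on title


class Book:
    def __init__(self, title, price):
        self.title = title
        self.price = price

    def save(self):
        # Mimics a unique constraint: a plain save() fails on duplicates.
        if self.title in DB:
            raise RuntimeError("IntegrityError: duplicate title")
        DB[self.title] = self


class Resource:
    def save_instance(self, instance):
        self.do_instance_save(instance)

    def do_instance_save(self, instance):
        instance.save()


class BookResource(Resource):
    def do_instance_save(self, instance):
        # Upsert keyed on title: tolerant of a row created concurrently
        # between get_or_init_instance() and the save.
        DB[instance.title] = instance


DB["Ulysses"] = Book("Ulysses", 10)  # inserted by a concurrent process
BookResource().save_instance(Book("Ulysses", 12))
print(DB["Ulysses"].price)           # → 12: upsert succeeded, no error
```

Because all saves funnel through the single ``do_instance_save()`` hook, one override changes insert semantics for
the whole import.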

.. _admin-integration:

9 changes: 6 additions & 3 deletions docs/bulk_import.rst
@@ -35,9 +35,12 @@ Caveats
the database for each row. If this is an issue then create a subclass which caches ``get_queryset()`` results rather
than reading for each invocation.

* If there is the potential for concurrent writes to a table during a bulk operation, then you need to consider the
potential impact of this. Refer to :ref:`concurrent-writes` for more information.

For more information, please read the Django documentation on
-`bulk_create() <https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create>`_ and
-`bulk_update() <https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-update>`_.
+`bulk_create() <https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-create>`_ and
+`bulk_update() <https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update>`_.

.. _performance_tuning:

Performance tuning
@@ -62,4 +65,4 @@ Consider the following if you need to improve the performance of imports.
Testing
=======

Scripts are provided to enable testing and benchmarking of bulk imports. See :ref:`testing:Bulk testing`.
1 change: 1 addition & 0 deletions docs/changelog.rst
@@ -17,6 +17,7 @@ Please refer to :doc:`release notes<release_notes>`.
- Clarified ``skip_diff`` documentation (#1655)
- Standardised interface of :meth:`~import_export.widgets.Widget.render` (#1657)
- Improved documentation relating to validation on import (#1665)
- Added :meth:`~import_export.resources.Resource.do_instance_save` helper method (#1668)
- Refactored test_admin_integration: split into smaller test modules (#1662)
- Refactored test_resources: split into smaller test modules (#1672)

12 changes: 11 additions & 1 deletion import_export/resources.py
@@ -553,9 +553,19 @@ def save_instance(self, instance, is_create, row, **kwargs):
            # we don't have transactions and we want to do a dry_run
            pass
        else:
-            instance.save()
+            self.do_instance_save(instance)
        self.after_save_instance(instance, row, **kwargs)

    def do_instance_save(self, instance):
        """
        A method specifically to provide a single overridable hook for the instance
        save operation.
        For example, this can be overridden to implement update_or_create().

        :param instance: The model instance to be saved.
        """
        instance.save()

    def before_save_instance(self, instance, row, **kwargs):
        r"""
        Override to add additional logic. Does nothing by default.
