Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix escaping to support escape of XLSX formulae #1568

Merged
56 changes: 56 additions & 0 deletions docs/advanced_usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -743,3 +743,59 @@ return books for the publisher::

class Meta:
model = Book

.. _admin_security:

Security
--------

Enabling the Admin interface means that you should consider the security implications. Some or all of the following
points may be relevant:

Is there potential for untrusted imports?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* What is the source of your import file?

* Is this coming from an external source where the data could be untrusted?

* Could source data potentially contain malicious content such as script directives or Excel formulae?

* Even if data comes from a trusted source, is there any content such as HTML which could cause issues when rendered
in a web page?

What is the potential risk for exported data?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* If there is malicious content in stored data, what is the risk of exporting this data?

* Could untrusted input be executed within a spreadsheet?

* Are spreadsheets sent to other parties who could inadvertently execute malicious content?

* Could data be exported to other formats, such as CSV, TSV or ODS, and then opened using Excel?

* Could any exported data be rendered in HTML? For example, csv is exported and then loaded into another
web application. In this case, untrusted input could contain malicious code such as active script content.

Mitigating security risks
^^^^^^^^^^^^^^^^^^^^^^^^^

By default, import-export does not sanitize or process imported data. Malicious content, such as script directives,
can be imported into the database, and can be exported without any modification.

You can optionally configure import-export to sanitize data on export. There are two settings which enable this:

#. :ref:`IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT`
#. :ref:`IMPORT_EXPORT_ESCAPE_FORMULAE_ON_EXPORT`

.. warning::

Enabling these settings only sanitizes data exported using the Admin Interface.
If exporting data :ref:`programmatically<exporting_data>`, then you will need to apply your own sanitization.

You should in all cases review `Django security documentation <https://docs.djangoproject.com/en/dev/topics/security/>`_
before deploying a live Admin interface instance.

Please refer to `SECURITY.md <https://github.com/django-import-export/django-import-export/blob/main/SECURITY.md>`_ for
details on how to escalate security issues.
14 changes: 12 additions & 2 deletions docs/changelog.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,15 @@ Changelog
3.2.0 (unreleased)
------------------

- Escape formulae on export to XLSX (#1568)

- This includes deprecation of :ref:`IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT`.

Refer to :ref:`installation` for alternatives.

- :meth:`import_export.formats.TablibFormat.export()`: ``escape_output`` flag now deprecated in favour of
``escape_html`` and ``escape_formulae``.

- Refactor methods so that ``args`` are declared correctly (#1566)

- This includes deprecations to be aware of if you have overridden :meth:`~import_export.resources.Resource.export`
Expand All @@ -15,11 +24,12 @@ Changelog
passed as a named parameter.

- Updated ``setup.py`` (#1564)
- Added ``SECURITY.md``
- updated FAQ to include workaround for `RelatedObjectDoesNotExist` exception (#1562)
- Added ``SECURITY.md`` (#1563)
- Updated FAQ to include workaround for `RelatedObjectDoesNotExist` exception (#1562)
- Prevent error comparing m2m field of the new objects (#1560)
- Add documentation for passing data from admin form to Resource (#1555)
- Added new translations to Spanish and Spanish (Argentina) (#1552)
- Pass kwargs to import_set function (#1448)

3.1.0 (2023-02-21)
------------------
Expand Down
7 changes: 7 additions & 0 deletions docs/getting_started.rst
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,8 @@ that have their column ``delete`` set to ``1``::
model = Book


.. _exporting_data:

Exporting data
==============

Expand All @@ -141,3 +143,8 @@ we can export books::
id,name,author,author_email,imported,published,price,categories
2,Some book,1,,0,2012-12-05,8.85,1

.. warning::

Data exported programmatically is not sanitized for malicious content.
You will need to understand the implications of this and handle accordingly.
See :ref:`admin_security`.
13 changes: 12 additions & 1 deletion docs/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -121,13 +121,24 @@ does not support transactions), then validation errors will still be presented t
but valid rows will have imported.

``IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If set to ``True``, export output will be sanitized. By default this is set to ``False``.

Note: currently this only works for ``HTML`` output, and only for exports done via the admin UI.

This setting is deprecated and will be replaced by ``IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT``.

``IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If set to ``True``, strings will be HTML escaped. By default this is ``False``.

``IMPORT_EXPORT_ESCAPE_FORMULAE_ON_EXPORT``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If set to ``True``, strings will be sanitized by removing any leading '=' character. This is to prevent execution of
Excel formulae. By default this is ``False``.

.. _exampleapp:

Expand Down
5 changes: 4 additions & 1 deletion import_export/admin.py
Original file line number Diff line number Diff line change
Expand Up @@ -679,7 +679,10 @@ def get_export_data(self, file_format, queryset, *args, **kwargs):
raise PermissionDenied

data = self.get_data_for_export(request, queryset, *args, **kwargs)
export_data = file_format.export_data(data, escape_output=self.should_escape_output)
export_data = file_format.export_data(data,
escape_output=self.should_escape_output,
escape_html=self.should_escape_html,
escape_formulae=self.should_escape_formulae)
encoding = kwargs.get("encoding")
if not file_format.is_binary() and encoding:
export_data = export_data.encode(encoding)
Expand Down
37 changes: 30 additions & 7 deletions import_export/formats/base_formats.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import html
import warnings

import tablib
from tablib.formats import registry
Expand Down Expand Up @@ -84,7 +85,13 @@ def get_title(self):
def create_dataset(self, in_stream, **kwargs):
return tablib.import_set(in_stream, format=self.get_title())

def export_data(self, dataset, escape_output=False, **kwargs):
def export_data(self, dataset, **kwargs):
# remove the deprecated `escape_output` param if present
kwargs.pop("escape_output", None)
if kwargs.pop("escape_html", None):
self._escape_html(dataset)
if kwargs.pop("escape_formulae", None):
self._escape_formulae(dataset)
matthewhegarty marked this conversation as resolved.
Show resolved Hide resolved
return dataset.export(self.get_title(), **kwargs)

def get_extension(self):
Expand All @@ -99,6 +106,21 @@ def can_import(self):
def can_export(self):
return hasattr(self.get_format(), 'export_set')

def _escape_html(self, dataset):
for _ in dataset:
row = dataset.lpop()
row = [html.escape(str(cell)) for cell in row]
matthewhegarty marked this conversation as resolved.
Show resolved Hide resolved
dataset.append(row)

def _escape_formulae(self, dataset):
def _do_escape(s):
return s.replace("=", "", 1) if s.startswith("=") else s

for _ in dataset:
row = dataset.lpop()
row = [_do_escape(str(cell)) for cell in row]
dataset.append(row)


class TextFormat(TablibFormat):

Expand Down Expand Up @@ -149,11 +171,13 @@ class HTML(TextFormat):

def export_data(self, dataset, escape_output=False, **kwargs):
if escape_output:
for _ in dataset:
row = dataset.lpop()
row = [html.escape(str(cell)) for cell in row]
dataset.append(row)
return dataset.export(self.get_title(), **kwargs)
warnings.warn(
"escape_output flag now deprecated - "
"this will be removed in a future release",
DeprecationWarning,
)
super()._escape_html(dataset)
return super().export_data(dataset, **kwargs)


class XLS(TablibFormat):
Expand Down Expand Up @@ -202,7 +226,6 @@ def create_dataset(self, in_stream):
dataset.append(row_values)
return dataset


#: These are the default formats for import and export. Whether they can be
#: used or not is depending on their implementation in the tablib library.
DEFAULT_FORMATS = [fmt for fmt in (
Expand Down
25 changes: 25 additions & 0 deletions import_export/mixins.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import logging
import warnings

from django.conf import settings
Expand All @@ -10,6 +11,8 @@
from .resources import modelresource_factory
from .signals import post_export

logger = logging.getLogger(__name__)


class BaseImportExportMixin:
formats = base_formats.DEFAULT_FORMATS
Expand Down Expand Up @@ -91,11 +94,33 @@ def choose_import_resource_class(self, form):
class BaseExportMixin(BaseImportExportMixin):
model = None
escape_exported_data = False
escape_html = False
escape_formulae = False

@property
def should_escape_output(self):
if hasattr(settings, 'IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT'):
warnings.warn(
"IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT will be deprecated in a future release. "
"Refer to docs for new attributes.",
DeprecationWarning,
)
return getattr(settings, 'IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT', self.escape_exported_data)

@property
def should_escape_html(self):
v = getattr(settings, 'IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT', self.escape_html)
if v is True:
logger.debug('IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT is enabled')
return v

@property
def should_escape_formulae(self):
v = getattr(settings, 'IMPORT_EXPORT_ESCAPE_FORMULAE_ON_EXPORT', self.escape_formulae)
if v is True:
logger.debug('IMPORT_EXPORT_ESCAPE_FORMULAE_ON_EXPORT is enabled')
return v

def get_export_formats(self):
"""
Returns available export formats.
Expand Down
2 changes: 1 addition & 1 deletion requirements/base.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
Django>=3.2
tablib[html,ods,xls,xlsx,yaml]>=3.2.1
tablib[html,ods,xls,xlsx,yaml]>=3.4.0
diff-match-patch
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
install_requires = [
'diff-match-patch',
'Django>=3.2',
'tablib[html,ods,xls,xlsx,yaml]>=3.2.1',
'tablib[html,ods,xls,xlsx,yaml]>=3.4.0',
]


Expand Down
54 changes: 54 additions & 0 deletions tests/core/tests/test_admin_integration.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import os.path
import warnings
from datetime import datetime
from io import BytesIO
from unittest import mock
from unittest.mock import MagicMock, patch

Expand All @@ -16,6 +17,7 @@
from django.test.testcases import TestCase, TransactionTestCase
from django.test.utils import override_settings
from django.utils.translation import gettext_lazy as _
from openpyxl.reader.excel import load_workbook
from tablib import Dataset

from import_export import formats
Expand Down Expand Up @@ -433,6 +435,58 @@ def test_returns_xlsx_export(self):
self.assertEqual(response['Content-Type'],
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')

@override_settings(IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT=True)
@patch("import_export.mixins.logger")
def test_export_escape_html(self, mock_logger):
Book.objects.create(id=1, name="=SUM(1+1)")
Book.objects.create(id=2, name="<script>alert(1)</script>")
response = self.client.get('/admin/core/book/export/')
self.assertEqual(response.status_code, 200)

xlsx_index = self._get_input_format_index("xlsx")
data = {'file_format': str(xlsx_index)}
response = self.client.post('/admin/core/book/export/', data)
self.assertEqual(response.status_code, 200)
content = response.content
wb = load_workbook(filename=BytesIO(content))
self.assertEqual('&lt;script&gt;alert(1)&lt;/script&gt;', wb.active['B2'].value)
self.assertEqual('=SUM(1+1)', wb.active['B3'].value)

mock_logger.debug.assert_called_once_with("IMPORT_EXPORT_ESCAPE_HTML_ON_EXPORT is enabled")

@override_settings(IMPORT_EXPORT_ESCAPE_FORMULAE_ON_EXPORT=True)
@patch("import_export.mixins.logger")
def test_export_escape_formulae(self, mock_logger):
Book.objects.create(id=1, name="=SUM(1+1)")
Book.objects.create(id=2, name="<script>alert(1)</script>")
response = self.client.get('/admin/core/book/export/')
self.assertEqual(response.status_code, 200)

xlsx_index = self._get_input_format_index("xlsx")
data = {'file_format': str(xlsx_index)}
response = self.client.post('/admin/core/book/export/', data)
self.assertEqual(response.status_code, 200)
content = response.content
wb = load_workbook(filename=BytesIO(content))
self.assertEqual('<script>alert(1)</script>', wb.active['B2'].value)
self.assertEqual('SUM(1+1)', wb.active['B3'].value)

mock_logger.debug.assert_called_once_with("IMPORT_EXPORT_ESCAPE_FORMULAE_ON_EXPORT is enabled")

@override_settings(IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT=True)
def test_export_escape_deprecation_warning(self):
response = self.client.get('/admin/core/book/export/')
self.assertEqual(response.status_code, 200)

xlsx_index = self._get_input_format_index("xlsx")
data = {'file_format': str(xlsx_index)}
with self.assertWarnsRegex(
DeprecationWarning,
r"IMPORT_EXPORT_ESCAPE_OUTPUT_ON_EXPORT will be deprecated in a future release. "
r"Refer to docs for new attributes."
):
self.client.post('/admin/core/book/export/', data)

def test_import_export_buttons_visible_without_add_permission(self):
# issue 38 - Export button not visible when no add permission
original = BookAdmin.has_add_permission
Expand Down
25 changes: 23 additions & 2 deletions tests/core/tests/test_base_formats.py
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,28 @@ def setUp(self):
self.dataset.append(('2', 'evil_user', '<script>alert("I want to steal your credit card data")</script>'))

def test_export_data_escape(self):
res = self.format.export_data(self.dataset, escape_output=True)
# deprecated
# this test can be removed in a future release because the param
# is replaced by `escape_html`
with self.assertWarnsRegex(
DeprecationWarning,
"escape_output flag now deprecated - this will be removed in a future release"
):
res = self.format.export_data(self.dataset, escape_output=True)
self.assertIn(
(
"<tr><td>1</td>\n"
"<td>good_user</td>\n"
"<td>John Doe</td></tr>\n"
"<tr><td>2</td>\n"
"<td>evil_user</td>\n"
"<td>&lt;script&gt;alert(&quot;I want to steal your credit card data&quot;)&lt;/script&gt;</td></tr>\n"
),
res
)

def test_export_html_escape(self):
res = self.format.export_data(self.dataset, escape_html=True)
self.assertIn(
(
"<tr><td>1</td>\n"
Expand All @@ -241,4 +262,4 @@ def test_export_data_no_escape(self):
"<td><script>alert(\"I want to steal your credit card data\")</script></td></tr>\n"
),
res
)
)