[1.11.x] Fixed CVE-2019-14233 -- Prevented excessive HTMLParser recursion in strip_tags() when handling incomplete HTML entities.

Thanks to Guido Vranken for initial report.
apollo13 authored and carltongibson committed Jul 15, 2019
@@ -169,8 +169,8 @@ def strip_tags(value):
     value = force_text(value)
     while '<' in value and '>' in value:
         new_value = _strip_once(value)
-        if len(new_value) >= len(value):
-            # _strip_once was not able to detect more tags or length increased
+        if len(new_value) >= len(value) or value.count('<') == new_value.count('<'):
+            # _strip_once wasn't able to detect more tags, or line length increased.
             # due to
             # (affects Python 2 < 2.7.7 and Python 3 < 3.3.5)
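To see why the extra `value.count('<')` condition matters, the retry loop can be modelled on its own. This is an illustrative sketch only: `strip_loop` and `drop_one_ampersand` are hypothetical names, and the stub does not reproduce the real `HTMLParser` behaviour behind CVE-2019-14233. It merely mimics a pass that keeps shortening the string without ever removing a `<`, which the old guard treats as progress.

```python
from typing import Callable, Tuple


def strip_loop(value: str, strip_once: Callable[[str], str],
               patched: bool) -> Tuple[str, int]:
    """Run the strip_tags() retry loop and report how many passes it made."""
    calls = 0
    while '<' in value and '>' in value:
        new_value = strip_once(value)
        calls += 1
        # Old guard: stop only when a pass fails to shorten the string.
        no_shrink = len(new_value) >= len(value)
        # Patched guard: also stop when a pass removed no '<' at all.
        no_tag_progress = patched and value.count('<') == new_value.count('<')
        if no_shrink or no_tag_progress:
            break
        value = new_value
    return value, calls


def drop_one_ampersand(value: str) -> str:
    # Hypothetical stand-in for a pathological _strip_once(): each pass
    # trims a single '&' and never removes a '<'.
    return value.replace('&', '', 1)


payload = '<' + '&' * 100 + '>'
print(strip_loop(payload, drop_one_ampersand, patched=False)[1])  # 101
print(strip_loop(payload, drop_one_ampersand, patched=True)[1])   # 1
```

In this model the old guard makes one pass per `&`, and each pass rescans the whole string, so total work grows roughly quadratically with input length; the patched guard stops after the first pass that strips no tag.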
@@ -19,3 +19,20 @@ filters, which were thus vulnerable.
 The regular expressions used by ``Truncator`` have been simplified in order to
 avoid potential backtracking issues. As a consequence, trailing punctuation may
 now at times be included in the truncated output.
+
+CVE-2019-14233: Denial-of-service possibility in ``strip_tags()``
+=================================================================
+
+Due to the behavior of the underlying ``HTMLParser``,
+:func:`django.utils.html.strip_tags` would be extremely slow to evaluate
+certain inputs containing large sequences of nested incomplete HTML entities.
+The ``strip_tags()`` method is used to implement the corresponding
+:tfilter:`striptags` template filter, which was thus also vulnerable.
+
+``strip_tags()`` now avoids recursive calls to ``HTMLParser`` when progress
+removing tags, but necessarily incomplete HTML entities, stops being made.
+
+Remember that absolutely NO guarantee is provided about the results of
+``strip_tags()`` being HTML safe. So NEVER mark safe the result of a
+``strip_tags()`` call without escaping it first, for example with
@@ -86,6 +86,8 @@ def test_strip_tags(self):
             # caused infinite loop on Pythons not patched with
             ('&gotcha&#;<>', '&gotcha&#;<>'),
+            ('><!' + ('&' * 16000) + 'D', '><!' + ('&' * 16000) + 'D'),
+            ('X<<<<br>br>br>br>X', 'XX'),
         )
         for value, output in items:
             self.check_output(f, value, output)
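The new regression inputs can be exercised against a standalone approximation built only on the standard library's `html.parser.HTMLParser`. This is a sketch, not Django's code: `_TextCollector` and `strip_once` are hypothetical names, `str()` stands in for `force_text()`, and the exact result for malformed input depends on the Python version's `HTMLParser`, so only the `'XX'` case is shown with an expected output.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text and entity references; tags, comments and declarations are dropped."""

    def __init__(self):
        # Keep entities as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.fed = []

    def handle_data(self, data):
        self.fed.append(data)

    def handle_entityref(self, name):
        self.fed.append('&%s;' % name)

    def handle_charref(self, name):
        self.fed.append('&#%s;' % name)

    def get_data(self):
        return ''.join(self.fed)


def strip_once(value):
    """Run a single HTMLParser pass and return the text it collected."""
    parser = _TextCollector()
    parser.feed(value)
    parser.close()
    return parser.get_data()


def strip_tags(value):
    value = str(value)
    while '<' in value and '>' in value:
        new_value = strip_once(value)
        # Patched guard: stop if the pass did not shorten the string or did
        # not remove any '<', i.e. no further tags are being stripped.
        if len(new_value) >= len(value) or value.count('<') == new_value.count('<'):
            break
        value = new_value
    return value


print(strip_tags('X<<<<br>br>br>br>X'))  # XX
```

Passing `'><!' + ('&' * 16000) + 'D'` through this sketch always terminates, because every pass that does not end the loop strictly shortens the string; the exact string returned for it varies between `HTMLParser` releases.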

