Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] html_text: Do not test the specific exception #1149

Merged
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 5 additions & 9 deletions html_text/tests/test_extractor.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
# © 2016 Grupo ESOC Ingeniería de Servicios, S.L.U. - Jairo Llopis
# License AGPL-3.0 or later (http://www.gnu.org/licenses/agpl.html).

from lxml import etree
from openerp.tests.common import TransactionCase
from openerp.tools import mute_logger


class ExtractorCase(TransactionCase):
Expand Down Expand Up @@ -40,20 +40,16 @@ def test_excerpts(self):
self.text_from_html(html, 7, ellipsis=""),
u"I'm a title I'm a paragraph ¡Pues")

@mute_logger("openerp.addons.html_text.models.ir_fields_converter")
def test_empty_html(self):
"""Empty HTML handled correctly."""
self.assertEqual(self.text_from_html(""), "")
with self.assertRaises(etree.XMLSyntaxError):
with self.assertRaises(Exception):
self.text_from_html("", fail=True)

@mute_logger("openerp.addons.html_text.models.ir_fields_converter")
def test_false_html(self):
"""``False`` HTML handled correctly."""
self.assertEqual(self.text_from_html(False), "")
with self.assertRaises(TypeError):
with self.assertRaises(Exception):
self.text_from_html(False, fail=True)

def test_bad_html(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why removing this test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because on newer LXML versions, that code is considered "good" HTML (remember that those rules are less strict than XML ones) and thus it doesn't raise an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't better to provide a bad HTML?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess yes, but not sure on what is now considered bad exactly... I opted for this quicker solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we are reducing test scope. For sure multiple unclosed tags will be considered illegal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it seems now it eats anything:

>>> html.fromstring("<<bad></p><a=neneeñ/>").xpath("//text()")
['<']
>>> html.fromstring("<<bad></p><a=neneeñ<ñ>/>").xpath("//text()")
['<', '/>']
>>> html.fromstring("<<bad></p><a=neneeñ<ñ>/>{}&al").xpath("//text()")
['<', '/>{}&al']
>>> html.fromstring("<<bad></p><a=neneeñ<ñ>/>{}&al;").xpath("//text()")
['<', '/>{}&al;']
>>> html.fromstring("<<bad></p><a=neneeñ<ñ>/>{}&x00185;").xpath("//text()")
['<', '/>{}&x00185;']
>>> html.fromstring("<<bad></p><a=neneeñ<ñ>/>{}&nbsp;").xpath("//text()")
['<', '/>{}\xa0']

It's a good thing, since HTML parsing should never raise exceptions, but then yes, we have to reduce the test scope now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's accept that

"""Bad HTML handled correctly."""
self.assertEqual(self.text_from_html("<<bad>"), "")
with self.assertRaises(etree.ParserError):
self.text_from_html("<<bad>", fail=True)