Newest html5lib 0.999999999 breaks rendering #334

Closed
OktarinTentakel opened this Issue Jul 15, 2016 · 9 comments

OktarinTentakel commented Jul 15, 2016

Tested on OS X according to install instructions with Python 2.7 and 3.4

Every rendering exits with:

Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/wsgiref/handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/weasyprint/navigator.py", line 143, in app
    return make_response(render_template(url))
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/weasyprint/navigator.py", line 65, in render_template
    html = HTML(url)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/weasyprint/__init__.py", line 92, in __init__
    namespaceHTMLElements=False)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/html5parser.py", line 35, in parse
    return p.parse(doc, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/html5parser.py", line 235, in parse
    self._parse(stream, False, None, *args, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/html5parser.py", line 85, in _parse
    self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/_tokenizer.py", line 36, in __init__
    self.stream = HTMLInputStream(stream, **kwargs)
  File "/Users/aw-sebastianschlapkohl/own-projects/weasyprint-test/venv/lib/python2.7/site-packages/html5lib/_inputstream.py", line 151, in HTMLInputStream
    return HTMLBinaryInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'encoding'

Going back two versions (seven 9s after the dot oO) fixes this.
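
For readers skimming the traceback, the failure pattern is that `encoding` is forwarded untouched through `**kwargs` from the caller down to a constructor that no longer accepts it. Below is a minimal sketch of that pattern. The names `BinaryStream`, `make_stream`, `parse` and `try_parse` are hypothetical stand-ins, not html5lib's actual code.

```python
class BinaryStream:
    """Stand-in for a stream class whose constructor no longer takes `encoding`."""

    def __init__(self, source, override_encoding=None, transport_encoding=None):
        self.source = source
        self.override_encoding = override_encoding
        self.transport_encoding = transport_encoding


def make_stream(source, **kwargs):
    # Keyword arguments are forwarded blindly, as in the traceback above.
    return BinaryStream(source, **kwargs)


def parse(source, **kwargs):
    # Another forwarding layer; nothing here validates the keywords.
    return make_stream(source, **kwargs)


def try_parse(source, **kwargs):
    """Return the TypeError message instead of raising, for demonstration."""
    try:
        parse(source, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `try_parse(b"<p>hi", encoding="utf-8")` returns a message about an unexpected keyword argument `'encoding'`, while `try_parse(b"<p>hi", override_encoding="utf-8")` returns `None`. The error only surfaces at the innermost call, which is why the traceback ends deep inside the library rather than in the calling code.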

SimonSapin Jul 15, 2016

Member

This might be

https://github.com/html5lib/html5lib-python/blob/master/CHANGES.rst#user-content-09999999910b9

Replace the charset keyword argument on parse and related methods with a set of keyword arguments: override_encoding, transport_encoding, same_origin_parent_encoding, likely_encoding, and default_encoding.

Although that says charset and the traceback says encoding.

Anyway, I think WeasyPrint should be updated for the newer html5lib and the requirements in setup.py changed accordingly. CC @liZe

@liZe liZe closed this in f1019b8 Jul 15, 2016

liZe Jul 15, 2016

Member

The argument was encoding, and we now have to pick one from (override | transport | same_origin_parent | likely | default)_encoding. We should read the documentation, but, well, there's no documentation, of course. So, it's default_encoding now, and if it breaks anything in WeasyPrint, we'll randomly change that 😉.

@liZe liZe added the bug label Jul 15, 2016

@liZe liZe referenced this issue in Kozea/Flask-WeasyPrint Jul 15, 2016

Closed

__init__() got an unexpected keyword argument 'encoding' #12

@liZe liZe added the crash label Jul 15, 2016

liZe Jul 15, 2016

Member

It doesn't work, because it crashes when input is unicode (we can't blame them for that).

SimonSapin Jul 15, 2016

Member

The terminology is based on https://html.spec.whatwg.org/multipage/#determining-the-character-encoding

  • override_encoding is probably appropriate for the encoding parameter of WeasyPrint’s HTML class.
  • transport_encoding is the one from HTTP Content-Type. (The encoding key in "URL fetchers" return values.)
  • same_origin_parent_encoding refers to “parent” in the sense that an <iframe> element can introduce a child HTML document. This can be ignored since we don’t implement <iframe>, or the cross-origin policy.
  • likely_encoding, the spec says “if the user agent has information on the likely encoding for this page”. Safe to declare we don’t.
  • default_encoding, the spec suggests UTF-8 in "controlled environment", but I don’t think WeasyPrint can make that kind of assumption about how it’s used. Then “In other environments, the default encoding is typically dependent on the user's locale” which is kinda terrible since it can lead to “works on my machine” sites broken for users with a different locale. But that’s what many browsers do today. I’ve heard of ideas/experiments to use the site’s top-level-domain instead but I don’t know where that’s at. Let’s not bother with any of this.

So I think we should use override_encoding and transport_encoding instead of default_encoding.

@gsnedders How does this sound?
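
To make the roles of the five keywords above more concrete, here is a simplified sketch of a "first hint that is set wins" lookup, in the order the bullets list them. It is an illustration only. The real algorithm in the spec also sniffs a byte order mark and prescans `<meta>` tags, both of which are left out here. The function name `pick_encoding` and the `windows-1252` last-resort value are assumptions made for this example.

```python
def pick_encoding(override=None, transport=None, same_origin_parent=None,
                  likely=None, default=None, fallback="windows-1252"):
    """Return (encoding, hint_name) for the first hint that is not None.

    Simplified: BOM sniffing and <meta> prescanning are not modelled.
    """
    candidates = [
        ("override", override),                      # explicit caller choice
        ("transport", transport),                    # e.g. HTTP Content-Type
        ("same_origin_parent", same_origin_parent),  # embedding document
        ("likely", likely),                          # user agent's guess
        ("default", default),                        # configured default
    ]
    for name, value in candidates:
        if value is not None:
            return value, name
    # Assumed last resort when no hint is given at all.
    return fallback, "fallback"
```

For example, `pick_encoding(override="utf-8", transport="latin-1")` picks the override, and `pick_encoding(transport="latin-1", default="utf-8")` picks the transport value. That is why a user-supplied encoding maps naturally to the override hint rather than to the default.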

liZe added a commit that referenced this issue Jul 15, 2016

liZe Jul 15, 2016

Member

I've used override_encoding for our encoding parameter and transport_encoding for our protocol encoding (given by the URL fetcher). Tests pass, but I'm not really sure…
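
A rough sketch of the mapping described in this thread follows. It is not the actual commit. The helper name `build_parse_kwargs` and its parameter names are hypothetical, and the sketch uses Python 3's `str` for already-decoded input. It also skips the encoding hints entirely for text input, since an earlier comment notes that passing them with unicode input crashes.

```python
def build_parse_kwargs(source, encoding=None, protocol_encoding=None):
    """Build encoding keyword arguments for an html5lib-style parse call.

    `encoding` is the value supplied by the caller; `protocol_encoding`
    is the one reported by the URL fetcher. Both names are illustrative.
    """
    kwargs = {}
    if isinstance(source, str):
        # Already-decoded text: pass no encoding hints at all.
        return kwargs
    if encoding is not None:
        kwargs["override_encoding"] = encoding
    if protocol_encoding is not None:
        kwargs["transport_encoding"] = protocol_encoding
    return kwargs
```

The result would then be splatted into the parser call, for example `html5lib.parse(source, **build_parse_kwargs(source, encoding, protocol_encoding))`, assuming an html5lib version that accepts these keywords (0.999999999 or later, per the changelog entry quoted above).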

gsnedders Jul 15, 2016

@SimonSapin that sounds right

And yes, @liZe, docs are one of the two big things blocking 1.0 now…

liZe Jul 15, 2016

Member

@gsnedders No offense, that's a problem for other projects too 😉. Thanks for your hard work!

@opoudjis opoudjis referenced this issue in t3nsor/quora-backup Sep 12, 2016

Closed

converter.py crashes #11

@dustMason dustMason referenced this issue in aquavitae/docker-weasyprint Oct 15, 2016

Closed

Build failing? #1

jsonn pushed a commit to jsonn/pkgsrc that referenced this issue Jan 15, 2017

kleink
Update py-weasyprint to 0.34.
Version 0.34
------------

Released on 2016-12-21.

Bug fixes:

* `#398 <Kozea/WeasyPrint#398>`_:
  Honor the presentational_hints option for PDFs.
* `#399 <Kozea/WeasyPrint#399>`_:
  Avoid CairoSVG-2.0.0rc* on Python 2.
* `#396 <Kozea/WeasyPrint#396>`_:
  Correctly close files open by mkstemp.
* `#403 <Kozea/WeasyPrint#403>`_:
  Cast the number of columns into int.
* Fix multi-page multi-columns and add related tests.


Version 0.33
------------

Released on 2016-11-28.

New features:

* `#393 <Kozea/WeasyPrint#393>`_:
  Add tests on MacOS.
* `#370 <Kozea/WeasyPrint#370>`_:
  Enable @font-face on MacOS.

Bug fixes:

* `#389 <Kozea/WeasyPrint#389>`_:
  Always update resume_at when splitting lines.
* `#394 <Kozea/WeasyPrint#394>`_:
  Don't build universal wheels.
* `#388 <Kozea/WeasyPrint#388>`_:
  Fix logic when finishing block formatting context.


Version 0.32
------------

Released on 2016-11-17.

New features:

* `#28 <Kozea/WeasyPrint#28>`_:
  Support @font-face on Linux.
* Support CSS fonts level 3 almost entirely, including OpenType features.
* `#253 <Kozea/WeasyPrint#253>`_:
  Support presentational hints (optional).
* Support break-after, break-before and break-inside for pages and columns.
* `#384 <Kozea/WeasyPrint#384>`_:
  Major performance boost.

Bug fixes:

* `#368 <Kozea/WeasyPrint#368>`_:
  Respect white-space for shrink-to-fit.
* `#382 <Kozea/WeasyPrint#382>`_:
  Fix the preferred width for column groups.
* Handle relative boxes in column-layout boxes.

Documentation:

* Add more and more documentation about Windows installation.
* `#355 <Kozea/WeasyPrint#355>`_:
  Add fonts requirements for tests.


Version 0.31
------------

Released on 2016-08-28.

New features:

* `#124 <Kozea/WeasyPrint#124>`_:
  Add MIME sniffing for images.
* `#60 <Kozea/WeasyPrint#60>`_:
  CSS Multi-column Layout.
* `#197 <Kozea/WeasyPrint#197>`_:
  Add hyphens at line breaks activated by a soft hyphen.

Bug fixes:

* `#132 <Kozea/WeasyPrint#132>`_:
  Fix Python 3 compatibility on Windows.

Documentation:

* `#329 <Kozea/WeasyPrint#329>`_:
  Add documentation about installation on Windows.


Version 0.30
------------

Released on 2016-07-18.

WeasyPrint now depends on html5lib-0.999999999.

Bug fixes:

* Fix Acid2
* `#325 <Kozea/WeasyPrint#325>`_:
  Cutting lines is broken in page margin boxes.
* `#334 <Kozea/WeasyPrint#334>`_:
  Newest html5lib 0.999999999 breaks rendering.


Version 0.29
------------

Released on 2016-06-17.

Bug fixes:

* `#263 <Kozea/WeasyPrint#263>`_:
  Don't crash with floats with percents in positions.
* `#323 <Kozea/WeasyPrint#323>`_:
  Fix CairoSVG 2.0 pre-release dependency in Python 2.x.


Version 0.28
------------

Released on 2016-05-16.

Bug fixes:

* `#189 <Kozea/WeasyPrint#189>`_:
  ``white-space: nowrap`` still wraps on hyphens
* `#305 <Kozea/WeasyPrint#305>`_:
  Fix crashes on some tables
* Don't crash when transform matrix isn't invertible
* Don't crash when rendering ratio-only SVG images
* Fix margins and borders on some tables


Version 0.27
------------

Released on 2016-04-08.

New features:

* `#295 <Kozea/WeasyPrint#295>`_:
  Support the 'rem' unit.
* `#299 <Kozea/WeasyPrint#299>`_:
  Enhance the support of SVG images.

Bug fixes:

* `#307 <Kozea/WeasyPrint#307>`_:
  Fix the layout of cells larger than their tables.

Documentation:

* The website is now on GitHub Pages, the documentation is on Read the Docs.
* `#297 <Kozea/WeasyPrint#297>`_:
  Rewrite the CSS chapter of the documentation.

mwangikinuthia Jan 9, 2018

so what should i do if this error arises?

liZe Jan 9, 2018

Member

so what should i do if this error arises?

Use the latest versions of WeasyPrint and html5lib.
