Add MIME sniffing for images. #124
Comments
Any chance of this being solved at some point? I'm running into the #103 issue and I have A TON of images with wrong
Edit: As a suggestion, and ignoring how WeasyPrint works internally, maybe a 'quick' fix could be to raise an exception when this happens so I can handle it. As of today, no exception is raised at all and the PDF is still generated, but without the problematic images.
Current implementation:
According to the sniffing "spec":
Without implementing the "spec", here's what I've done:
It's closer to what's done by browsers and should be OK for "the real life cases". Problems left:
@alejoar I suppose that your problem is non-PNG files served with a PNG
Yes, that's what browsers do too. Adding an option to raise an error when an image is not found is possible, but relying on the logs is probably easier and lets users filter the warnings that matter to them.
I’ve thought before of adding a "strict mode" flag that makes any error fatal (including fetching and decoding images but also stylesheets) but never got around to it. With web browsers there is typically a human looking at the screen, and if something goes wrong it’s better to show them as much as possible rather than just this: They can hit the "Refresh" button to try again. WeasyPrint however is more typically used in automated systems that may be unattended, where it’s better for errors not to pass silently, so that we can realize more easily they’re happening.
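For anyone who needs something like the "strict mode" idea today, here is a minimal sketch built only on the standard `logging` module. It assumes the library reports problems as warnings on a named logger; the logger name passed in, `StrictModeError`, `RaiseOnWarning` and `enable_strict_mode` are all hypothetical names for illustration, not part of WeasyPrint.

```python
import logging


class StrictModeError(RuntimeError):
    """Hypothetical error raised when a warning is logged in strict mode."""


class RaiseOnWarning(logging.Handler):
    """Logging handler that turns WARNING-or-worse records into exceptions."""

    def __init__(self):
        # Only records at WARNING level or above reach emit().
        super().__init__(level=logging.WARNING)

    def emit(self, record):
        # Exceptions raised here propagate to the code that logged the record.
        raise StrictModeError(record.getMessage())


def enable_strict_mode(logger_name):
    """Attach the raising handler to the given logger and return it.

    The logger name is an assumption: pass whatever name the library
    actually logs under.
    """
    handler = RaiseOnWarning()
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

With the handler attached, a warning such as a failed image load would abort the rendering call instead of passing silently. Removing the handler with `removeHandler` restores the lenient behaviour.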
@liZe this was exactly our situation: JPEG files served with a PNG Content-Type. I ended up finding the error in our code and managed to solve it, and as for the images already on our server, I wrote a simple script to go over all of them and fix the Content-Type.
What I actually did was download each image, convert everything to JPEG (there was a mix of JPEG and PNG, although everything was served as PNG) and set the proper Content-Type. We use Azure storage, which let me easily mess up the Content-Type months ago and not realize it until now, when I needed to generate PDFs automatically. The script took about 16 hours to finish, but it wasn't as bad as I first thought.
Even though this fixed my problem, I really appreciate your effort, so I just tried installing from your commit and tested your fix: it works flawlessly! Here's the detail of my test just so you know: Test HTML:
Finally: Results in the following PDF: https://wt3002.blob.core.windows.net/static/uploads_dev/weasytest.pdf
Just what was expected.
@SimonSapin the strict mode you propose is exactly what I had in mind when I was thinking of potential solutions to the issue. Thank you both for the help.
@alejoar
Version 0.34
------------

Released on 2016-12-21.

Bug fixes:

* `#398 <https://github.com/Kozea/WeasyPrint/issues/398>`_: Honor the presentational_hints option for PDFs.
* `#399 <https://github.com/Kozea/WeasyPrint/pull/399>`_: Avoid CairoSVG-2.0.0rc* on Python 2.
* `#396 <https://github.com/Kozea/WeasyPrint/issues/396>`_: Correctly close files open by mkstemp.
* `#403 <https://github.com/Kozea/WeasyPrint/issues/403>`_: Cast the number of columns into int.
* Fix multi-page multi-columns and add related tests.


Version 0.33
------------

Released on 2016-11-28.

New features:

* `#393 <https://github.com/Kozea/WeasyPrint/issues/393>`_: Add tests on MacOS.
* `#370 <https://github.com/Kozea/WeasyPrint/issues/370>`_: Enable @font-face on MacOS.

Bug fixes:

* `#389 <https://github.com/Kozea/WeasyPrint/issues/389>`_: Always update resume_at when splitting lines.
* `#394 <https://github.com/Kozea/WeasyPrint/issues/394>`_: Don't build universal wheels.
* `#388 <https://github.com/Kozea/WeasyPrint/issues/388>`_: Fix logic when finishing block formatting context.


Version 0.32
------------

Released on 2016-11-17.

New features:

* `#28 <https://github.com/Kozea/WeasyPrint/issues/28>`_: Support @font-face on Linux.
* Support CSS fonts level 3 almost entirely, including OpenType features.
* `#253 <https://github.com/Kozea/WeasyPrint/issues/253>`_: Support presentational hints (optional).
* Support break-after, break-before and break-inside for pages and columns.
* `#384 <https://github.com/Kozea/WeasyPrint/issues/384>`_: Major performance boost.

Bug fixes:

* `#368 <https://github.com/Kozea/WeasyPrint/issues/368>`_: Respect white-space for shrink-to-fit.
* `#382 <https://github.com/Kozea/WeasyPrint/issues/382>`_: Fix the preferred width for column groups.
* Handle relative boxes in column-layout boxes.

Documentation:

* Add more and more documentation about Windows installation.
* `#355 <https://github.com/Kozea/WeasyPrint/issues/355>`_: Add fonts requirements for tests.


Version 0.31
------------

Released on 2016-08-28.

New features:

* `#124 <https://github.com/Kozea/WeasyPrint/issues/124>`_: Add MIME sniffing for images.
* `#60 <https://github.com/Kozea/WeasyPrint/issues/60>`_: CSS Multi-column Layout.
* `#197 <https://github.com/Kozea/WeasyPrint/pull/197>`_: Add hyphens at line breaks activated by a soft hyphen.

Bug fixes:

* `#132 <https://github.com/Kozea/WeasyPrint/pull/132>`_: Fix Python 3 compatibility on Windows.

Documentation:

* `#329 <https://github.com/Kozea/WeasyPrint/issues/329>`_: Add documentation about installation on Windows.


Version 0.30
------------

Released on 2016-07-18.

WeasyPrint now depends on html5lib-0.999999999.

Bug fixes:

* Fix Acid2
* `#325 <https://github.com/Kozea/WeasyPrint/issues/325>`_: Cutting lines is broken in page margin boxes.
* `#334 <https://github.com/Kozea/WeasyPrint/issues/334>`_: Newest html5lib 0.999999999 breaks rendering.


Version 0.29
------------

Released on 2016-06-17.

Bug fixes:

* `#263 <https://github.com/Kozea/WeasyPrint/pull/263>`_: Don't crash with floats with percents in positions.
* `#323 <https://github.com/Kozea/WeasyPrint/pull/323>`_: Fix CairoSVG 2.0 pre-release dependency in Python 2.x.


Version 0.28
------------

Released on 2016-05-16.

Bug fixes:

* `#189 <https://github.com/Kozea/WeasyPrint/issues/189>`_: ``white-space: nowrap`` still wraps on hyphens
* `#305 <https://github.com/Kozea/WeasyPrint/issues/305>`_: Fix crashes on some tables
* Don't crash when transform matrix isn't invertible
* Don't crash when rendering ratio-only SVG images
* Fix margins and borders on some tables


Version 0.27
------------

Released on 2016-04-08.

New features:

* `#295 <https://github.com/Kozea/WeasyPrint/pull/295>`_: Support the 'rem' unit.
* `#299 <https://github.com/Kozea/WeasyPrint/pull/299>`_: Enhance the support of SVG images.

Bug fixes:

* `#307 <https://github.com/Kozea/WeasyPrint/issues/307>`_: Fix the layout of cells larger than their tables.

Documentation:

* The website is now on GitHub Pages, the documentation is on Read the Docs.
* `#297 <https://github.com/Kozea/WeasyPrint/issues/297>`_: Rewrite the CSS chapter of the documentation.
The web has lots of broken legacy content served with incorrect MIME types in the HTTP Content-Type header. Web browsers "sniff" the actual type of images by looking at the first few bytes: http://mimesniff.spec.whatwg.org/
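To illustrate what "looking at the first few bytes" means, here is a rough sketch that matches a handful of well-known image signatures. It covers only a subset of formats and is not a full implementation of the WHATWG algorithm; `sniff_image_type` and `_SIGNATURES` are hypothetical names, not WeasyPrint code.

```python
from typing import Optional

# Leading byte signatures for a few common raster formats (a subset only).
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"BM", "image/bmp"),
]


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess a MIME type from the leading bytes, or return None."""
    for signature, mime_type in _SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # WebP: "RIFF", four bytes of length, then "WEBPVP".
    if data[:4] == b"RIFF" and data[8:14] == b"WEBPVP":
        return "image/webp"
    return None
```

A caller could compare the result with the declared Content-Type and prefer the sniffed value when the two disagree, for example a JPEG body served as `image/png`.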
WeasyPrint currently trusts the header for PNG and SVG, and gives everything to GDK-PixBuf which does some form of sniffing which might not be the same as in the WHATWG spec.
Re-implementing sniffing in WeasyPrint is probably overkill, but we could at least give GDK-PixBuf a try when explicit PNG or SVG decoding fails.
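The fallback idea could look roughly like the sketch below: try the decoder picked from the Content-Type header first, and hand the bytes to a generic decoder if that fails. Everything here is hypothetical (`decode_with_fallback`, the decoder callables, the mapping); `generic_decoder` merely stands in for whatever GDK-PixBuf loading function is actually used.

```python
from typing import Callable, Dict, Optional

# A decoder takes raw bytes and returns some image object, or raises.
Decoder = Callable[[bytes], object]


def decode_with_fallback(
    data: bytes,
    declared_type: Optional[str],
    explicit_decoders: Dict[str, Decoder],
    generic_decoder: Decoder,
) -> object:
    """Decode with the header-selected decoder, else the generic one."""
    decoder = explicit_decoders.get(declared_type) if declared_type else None
    if decoder is not None:
        try:
            return decoder(data)
        except Exception:
            # The header was probably wrong; let the generic decoder,
            # which does its own sniffing, have a try.
            pass
    # If this also fails, the error propagates to the caller.
    return generic_decoder(data)
```

This keeps the fast path for correctly labelled PNG and SVG files and only pays for a second decoding attempt when the first one fails.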