Add MIME sniffing for images. #124
The web has lots of broken legacy content served with incorrect MIME types in the HTTP
WeasyPrint currently trusts the header for PNG and SVG, and hands everything else to GDK-PixBuf, which does some form of sniffing that might not match the WHATWG spec.
Re-implementing sniffing in WeasyPrint is probably overkill, but we could at least give GDK-PixBuf a try when explicit PNG or SVG decoding fails.
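To make the proposed fallback concrete, here is a minimal sketch of the control flow only. It is not WeasyPrint's code: `decode_with_fallback`, `ImageLoadError`, and the decoder callables are hypothetical names introduced for illustration, and the sketch assumes each decoder signals failure by raising `ValueError`.

```python
class ImageLoadError(ValueError):
    """Raised when no decoder can read the image data (hypothetical name)."""


def decode_with_fallback(data, declared_mime, specific_decoders, generic_decoder):
    """Try the decoder for the declared MIME type, then a generic one.

    `specific_decoders` maps a MIME type to a callable, and `generic_decoder`
    is a callable that does its own format detection. Each callable takes
    bytes and either returns a decoded image or raises ValueError.
    """
    decoder = specific_decoders.get(declared_mime)
    if decoder is not None:
        try:
            return decoder(data)
        except ValueError:
            # The header was probably wrong: fall through to the generic decoder.
            pass
    try:
        return generic_decoder(data)
    except ValueError as exc:
        raise ImageLoadError(
            f"could not decode image declared as {declared_mime!r}"
        ) from exc
```

With this shape, a JPEG served as `image/png` fails the explicit PNG decoder, then gets a second chance with the generic decoder instead of being dropped.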
Any chance of this being solved at some point?
I'm running into the #103 issue and I have A TON of images with wrong
Edit: As a suggestion, and ignoring how WeasyPrint works internally, maybe a 'quick' fix could be to raise an exception when this happens so I can handle it. As of today, no exception is raised at all and the PDF is still generated, but without the problematic images.
According to the sniffing "spec":
Without implementing the "spec", here's what I've done:
It's closer to what's done by browsers and should be OK for "the real life cases". Problems left:
@alejoar I suppose that your problem is non-PNG files served with a PNG
Yes, that's what browsers do too. Adding an option to raise an error when an image is not found is possible, but relying on the logs is probably easier and lets users filter the warnings that matter to them.
added a commit
Aug 26, 2016
I’ve thought before of adding a "strict mode" flag that makes any error fatal (including fetching and decoding images but also stylesheets) but never got around to it.
With web browsers there is typically a human looking at the screen, and if something goes wrong it’s better to show them as much as possible rather than just this:
They can hit the "Refresh" button to try again.
WeasyPrint however is more typically used in automated systems that may be unattended, where it’s better for errors not to pass silently, so that we can realize more easily they’re happening.
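Until such a flag exists, something close to a strict mode can be approximated from the outside with the standard `logging` module. The sketch below is only an illustration: `strict_rendering` and `RenderingWarningsError` are hypothetical names, and it assumes that failures are reported as warnings on a logger named `weasyprint`. It collects the warnings logged while the block runs and raises once the block has finished.

```python
import logging
from contextlib import contextmanager


class RenderingWarningsError(RuntimeError):
    """Raised when warnings were logged during a strict rendering."""

    def __init__(self, records):
        self.records = records
        messages = "; ".join(record.getMessage() for record in records)
        super().__init__(f"{len(records)} warning(s) during rendering: {messages}")


class _CollectingHandler(logging.Handler):
    """Logging handler that only stores the records it receives."""

    def __init__(self, level):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


@contextmanager
def strict_rendering(logger_name="weasyprint", level=logging.WARNING):
    """Turn warnings logged inside the `with` block into one exception."""
    logger = logging.getLogger(logger_name)
    handler = _CollectingHandler(level)
    logger.addHandler(handler)
    try:
        yield handler.records
    finally:
        logger.removeHandler(handler)
    # Only reached when the block itself did not raise.
    if handler.records:
        raise RenderingWarningsError(handler.records)
```

An unattended job would wrap its rendering call in `with strict_rendering(): ...` and treat `RenderingWarningsError` as a failed run. Raising after the block rather than from inside the handler avoids depending on how the library handles exceptions internally.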
referenced this issue
Aug 26, 2016
@liZe this was exactly our situation: JPEG files served with a PNG Content-Type. I ended up finding the error in our code and fixing it. For the images already on our server, I wrote a simple script to go over all of them and fix the Content-Type. What I actually did was download each image, convert everything to JPEG (there was a mix of JPEG and PNG, although everything was served as PNG), and set the proper Content-Type. We use Azure storage, which let me easily mess up the Content-Type months ago without noticing until now, when I needed to generate PDFs automatically. The script took about 16 hours to finish, but it wasn't as bad as I first thought.
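For anyone auditing a similar store, here is a rough sketch of how mislabelled files could be spotted from their first bytes. It is not the script described above (which re-encoded everything to JPEG). `detect_image_mime` and `find_mislabelled` are hypothetical helpers, and only the well-known PNG, JPEG, and GIF signatures are checked.

```python
# Leading byte signatures of a few common image formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def detect_image_mime(data):
    """Return the MIME type suggested by the leading bytes, or None."""
    for signature, mime in SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def find_mislabelled(images):
    """Yield (name, declared, detected) for entries whose header looks wrong.

    `images` is an iterable of (name, declared_mime, first_bytes) tuples.
    How those are fetched from storage is left out of this sketch.
    """
    for name, declared, head in images:
        detected = detect_image_mime(head)
        if detected is not None and detected != declared:
            yield name, declared, detected
```

Only the first few bytes of each file are needed, so an audit like this does not have to download whole images.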
Even though this fixed my problem, I really appreciate your effort so I just tried installing from your commit and tested your fix: Works flawlessly!
Here's the detail of my test just so you know:
Results in the following PDF: https://wt3002.blob.core.windows.net/static/uploads_dev/weasytest.pdf
Just what was expected.
@SimonSapin the strict mode you propose is exactly what I had in mind when I was thinking about potential solutions to the issue.
Thank you both for the help.