Links in generated PDF #1
Comments
Hi, We would very much like to have that feature ourselves, but implementing it is not obvious at best. PDF documents generated by WeasyPrint currently come straight from cairo. Hyperlinks are on the cairo roadmap, but unfortunately it looks like they have been there for a few years already, without much progress. An alternative could be to post-process the PDFs to add the hyperlinks. I guess this would involve patching PyPDF heavily (currently it looks like it does not know much about the content of pages) and some familiarity with the PDF spec. (I spend plenty of time with the CSS specs already...) So, currently there are two ideas on how to make this happen, but neither of them is easy. Any suggestion or patch is welcome ;)
Hi Simon, I've looked into the post-parsing idea but without success. PyPDF has a Reportlab supports hyperlinks with the I think that you are right in that much effort would be required in order to add this functionality to WeasyPrint. So far I've only looked at pure Python implementations; maybe there are other solutions available that can be implemented with subprocess hooks. I will let you know if I stumble upon something that can be used for this in the interim period until cairo adds this functionality. References:
Thanks for sharing your research. Extracting text from a PDF would not get us very far: it skips all of the layout/formatting. If you want the unformatted text, getting it from WeasyPrint’s boxes without going through PDF is probably easier anyway. Switching entirely from cairo+Pango to ReportLab would be possible while keeping most of the code. But it is still a huge change with many implications. I have not looked at the pros and cons. Your second link hints that links in the PDF format are clickable rectangles. It would be "easy" for WeasyPrint to get a list of (rectangle, URL) pairs. After that we "just" need to find a way to add them in the PDFs. Ideally this should really happen in cairo. So if anyone feels like reading the PDF spec and writing C code ... :)
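To make the "(rectangle, URL) pairs" idea concrete, here is a minimal sketch of how one such pair could be written as a PDF link annotation dictionary. This is not WeasyPrint code: the helper names (`pdf_number`, `pdf_string`, `link_annotation`) and the sample rectangle are made up for illustration, and the URL is assumed to be plain ASCII. The dictionary keys themselves (`/Subtype /Link`, `/Rect`, `/A` with a `/URI` action) come from the PDF spec.

```python
def pdf_number(value):
    """Format a coordinate without exponent notation or trailing zeros."""
    return f"{value:.2f}".rstrip("0").rstrip(".")


def pdf_string(text):
    """Write text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def link_annotation(rect, url):
    """Build the annotation dictionary for one clickable rectangle.

    rect is (x1, y1, x2, y2) in PDF points, with the origin at the
    bottom-left corner of the page. url is assumed to be ASCII.
    """
    coords = " ".join(pdf_number(v) for v in rect)
    return (
        "<< /Type /Annot /Subtype /Link "
        f"/Rect [{coords}] "
        "/Border [0 0 0] "  # no visible border around the rectangle
        f"/A << /Type /Action /S /URI /URI {pdf_string(url)} >> >>"
    )


# Hypothetical input: one link covering a line of text near the top of a page.
links = [((72, 700, 200, 714), "http://weasyprint.org/")]
for rect, url in links:
    print(link_annotation(rect, url))
```

Each dictionary would still have to be stored as an object in the file and listed in the `/Annots` array of its page, which is the hard part discussed in this thread.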
You can try the "links" branch, which adds internal and external links. It's an experimental feature (broken in some cases, such as links hidden behind boxes or CSS transforms), but it seems to work quite well now. Comments are welcome!
Wow! This is great news! 🍰
I just released WeasyPrint 0.9 (yet to be announced on the mailing list) with support for PDF hyperlinks and bookmarks. Check out the demo: http://weasyprint.org/samples/CSS21-intro.pdf In the end we bit the bullet and read the PDF spec. 0.9 parses the PDF files produced by cairo and uses the incremental update mechanism to add metadata. The parser makes a lot of assumptions based on cairo’s output. It is not suitable for reading any PDF in the wild. This is all in the In the end it was not even that hard, it just took time to get familiar with the PDF spec. (And some courage/procrastination to dive in.)
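For readers unfamiliar with the incremental update mechanism mentioned above: the original bytes are left untouched, and new or replacement objects are appended together with a new cross-reference section whose trailer points back to the previous one through `/Prev`. The following is a rough sketch of that idea only, not the code shipped in WeasyPrint 0.9. The function name and its parameters are invented for illustration, and it assumes a PDF with a classic (non-stream) xref table whose catalog reference and object count the caller already knows.

```python
import re


def append_incremental_update(pdf, objects, root_ref, size):
    """Return pdf with an incremental-update section appended.

    pdf      -- bytes of an existing PDF using a classic xref table
    objects  -- {object_number: body_bytes}, new or replacement objects
    root_ref -- catalog reference from the original trailer, e.g. b"1 0 R"
    size     -- new /Size: highest object number in the file plus one
    """
    # The last "startxref" holds the offset of the previous xref section.
    prev = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # Append each object and remember the byte offset where it starts.
    offsets = {}
    for number, body in sorted(objects.items()):
        offsets[number] = len(out)
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    # New xref section: one single-entry subsection per object.
    # Every entry must be exactly 20 bytes long, end-of-line included.
    xref_offset = len(out)
    out += b"xref\n"
    for number, offset in sorted(offsets.items()):
        out += b"%d 1\n%010d 00000 n \n" % (number, offset)

    # The trailer chains to the previous xref section through /Prev.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root_ref, prev)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Adding links this way would mean appending the annotation objects and a replacement copy of each affected page object (same object number, plus an `/Annots` entry). A reader follows the `/Prev` chain and uses the most recent definition of each object.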
@SimonSapin and @liZe Sorry for lagging behind on this. I just installed version 👍 It looks and works beautifully! Thanks for all your hard work and for keeping this project open source!
I'm using WeasyPrint 47. My result doesn't include clickable links. Is the hyperlink feature still available?
Demo: selector.pdf
Yes, it is, but you need at least Cairo 1.15.4. If you can't install a recent version of Cairo, you can use WeasyPrint 0.42.3 and get links (but miss the features added since).
@liZe Thanks for the very fast response. I installed pycairo-1.18.1 but WeasyPrint still didn't generate clickable hyperlinks. Anyway, the WeasyPrint 0.42.3 workaround you mentioned works well. I chose that version because I only need simple features. Thank you so much :)
You need a recent version of Cairo, not PyCairo (WeasyPrint doesn't use PyCairo). But if 0.42.3 works for you, no problem 😉.
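Since the version that matters is the one of the Cairo C library, here is a small sketch for checking it against the 1.15.4 minimum quoted above. Cairo reports its version as a single integer encoded as major*10000 + minor*100 + micro. The helper names are made up for this example; I believe the cairocffi binding exposes that integer as `cairocffi.cairo_version()`, but treat that as an assumption and check your installed binding.

```python
MINIMUM_CAIRO = (1, 15, 4)  # minimum for links, as stated in this thread


def decode_cairo_version(encoded):
    """Split cairo's integer version (major*10000 + minor*100 + micro)."""
    major, rest = divmod(encoded, 10000)
    minor, micro = divmod(rest, 100)
    return (major, minor, micro)


def supports_links(encoded):
    """True if the encoded Cairo version is at least MINIMUM_CAIRO."""
    return decode_cairo_version(encoded) >= MINIMUM_CAIRO


# 11406 encodes Cairo 1.14.6 and 11504 encodes Cairo 1.15.4.
print(decode_cairo_version(11406), supports_links(11406))  # (1, 14, 6) False
print(decode_cairo_version(11504), supports_links(11504))  # (1, 15, 4) True
```

The PyCairo package version (for example 1.18.1) says nothing about which Cairo library is installed, so it should not be fed into this check.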
@liZe Oh, the information about Cairo was confusing for me, but now it's clear. libcairo2-1.14.6 is installed on my system. Installing libcairo2-1.15.4+ is probably more complicated than using WeasyPrint-0.42.3, so I decided to keep using WeasyPrint-0.42.3 as you recommended. Thank you for clarifying!
Hi!
I really find the cleanliness of the generated PDFs astounding! Well done!
Is there a way to generate PDFs with links, either referring to other places in the document or externally (http://...)? If not, is that something that's on the roadmap?