Links in generated PDF #1

waawal · 2012-04-16T15:28:02Z

Hi!

I find the cleanliness of the generated PDFs astounding! Well done!

Is there a way to generate PDFs with links, either referring to other places in the document or to external URLs (http://...)?
If not, is that something that's on the roadmap?


SimonSapin · 2012-04-17T13:09:15Z

Hi,

We would very much like to have that feature ourselves, but implementing it is not obvious, to say the least.

PDF documents generated by WeasyPrint currently come straight from cairo. Hyperlinks are on the cairo roadmap, but unfortunately it looks like it has been there for a few years already, without much progress.

An alternative could be to post-process the PDFs to add the hyperlinks. I guess this would involve patching PyPDF heavily (currently it looks like it does not know much about the content of pages) and some familiarity with the PDF spec. (I spend plenty of time with the CSS specs already...)

So, currently there are two ideas on how to make this happen, but neither of them is easy. Any suggestion or patch is welcome ;)

waawal · 2012-04-26T08:39:54Z

Hi Simon,
thanks for your response.

I've looked into the post-processing idea, but without success. PyPDF has an extractText method, but I haven't been able to get it to work on cairo-generated PDFs.

ReportLab supports hyperlinks with the <link href=""></link> tag, but since I haven't found a sane way to parse the cairo PDFs yet, it's really of no use.

I think you are right that adding this functionality to WeasyPrint would require a lot of effort. So far I've only looked at pure-Python implementations; maybe there are other solutions that could be driven through subprocess hooks.

I will let you know if I stumble upon something that can be used for this in the interim period until cairo adds this functionality.

References:

SimonSapin · 2012-04-26T08:59:11Z

Thanks for sharing your research.

Extracting text from a PDF would not get us very far: it discards all of the layout and formatting. If you want the unformatted text, getting it from WeasyPrint’s boxes without going through PDF is probably easier anyway.

Switching entirely from cairo+Pango to ReportLab would be possible while keeping most of the code. But it is still a huge change with many implications. I have not looked at the pros and cons.
Alternatively, ReportLab could post-process cairo’s PDF. But according to this, only the paid version can "Reuse existing PDFs".

Your second link hints that links in the PDF format are clickable rectangles. It would be "easy" for WeasyPrint to get a list of (rectangle, URL) pairs. After that we "just" need to find a way to add them in the PDFs. Ideally this should really happen in cairo. So if anyone feels like reading the PDF spec and writing C code ... :)
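
To make the "(rectangle, URL) pairs" idea concrete, here is a minimal sketch of what one such pair could turn into on the PDF side. It is not WeasyPrint code: the function name `link_annotation` is hypothetical, and it assumes the rectangle is already expressed in PDF points with a CSS-style top-left origin, so it only has to be flipped vertically for PDF's bottom-left origin.

```python
def link_annotation(rect, url, page_height):
    """Return the text of a PDF link annotation dictionary for one URL.

    rect is (x, y, width, height) with the origin at the top-left corner
    of the page and y growing downwards (assumed to be in PDF points).
    """
    x, y, width, height = rect
    # PDF user space starts at the bottom-left corner with y growing
    # upwards, so the vertical coordinates have to be flipped.
    left, right = x, x + width
    bottom = page_height - (y + height)
    top = page_height - y
    # Backslashes and parentheses are special inside PDF literal strings.
    escaped = url.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        "<< /Type /Annot /Subtype /Link "
        f"/Rect [{left:.2f} {bottom:.2f} {right:.2f} {top:.2f}] "
        "/Border [0 0 0] "
        f"/A << /Type /Action /S /URI /URI ({escaped}) >> >>"
    )


if __name__ == "__main__":
    # A 200x20pt box, 100pt below the top of an A4-sized page (842pt tall).
    print(link_annotation((72, 100, 200, 20), "http://weasyprint.org/", 842))
```

The dictionary still has to be written into the file as an object and listed in the page's /Annots array, which is the hard part discussed in this thread.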

liZe · 2012-05-14T23:23:25Z

You can try the "links" branch, which adds internal and external links. It is an experimental feature (broken in some cases, such as links hidden behind boxes or CSS transforms), but it seems to work quite well now. Comments are welcome!

waawal · 2012-05-18T14:17:22Z

Wow! This is great news!
I will for sure try it out asap, just need to get an environment I can install WeasyPrint in.

🍰

SimonSapin · 2012-06-04T18:05:31Z

I just released WeasyPrint 0.9 (yet to be announced on the mailing-list) with support for PDF hyperlinks and bookmarks. Check out the demo: http://weasyprint.org/samples/CSS21-intro.pdf

In the end we bit the bullet and read the PDF spec. 0.9 parses the PDF files produced by cairo and uses the incremental update mechanism to add metadata. The parser makes a lot of assumptions based on cairo’s output, so it is not suitable for reading arbitrary PDFs found in the wild. This is all in the weasyprint/pdf.py file.

In the end it was not even that hard, it just took time to get familiar with the PDF spec. (And some courage/procrastination to dive in.)
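
For readers unfamiliar with incremental updates, the following is a rough sketch of the general mechanism, not the code in weasyprint/pdf.py. The original bytes are left untouched; new objects, a new cross-reference section and a trailer whose /Prev entry points at the previous cross-reference section are appended at the end. The function name and its parameters are hypothetical, and the sketch assumes a classic `xref` table (no cross-reference streams) and generation number 0 for every object.

```python
import re


def append_incremental_update(pdf, new_objects, root_ref, prev_size):
    """Append objects to a PDF without rewriting its existing bytes.

    pdf         -- the original file content (bytes)
    new_objects -- {object number: object body as bytes}
    root_ref    -- reference to the catalog, e.g. b"1 0 R"
    prev_size   -- /Size value from the original trailer
    """
    # The last startxref keyword gives the offset of the current xref section.
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", pdf)
    if not matches:
        raise ValueError("no startxref found")
    prev_xref = int(matches[-1])

    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # Write each object and remember the byte offset where it starts.
    offsets = {}
    for number, body in sorted(new_objects.items()):
        offsets[number] = len(out)
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # New xref section: one subsection per object, 20 bytes per entry.
    xref_offset = len(out)
    out += b"xref\n"
    for number, offset in sorted(offsets.items()):
        out += b"%d 1\n" % number
        out += b"%010d %05d n \n" % (offset, 0)

    # /Size is one more than the highest object number in the whole file.
    size = max(prev_size, max(new_objects) + 1)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (
        size, root_ref, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Reusing an existing object number in `new_objects` replaces that object for readers of the updated file, which is how a page object could gain an /Annots entry without the original bytes being touched.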

waawal · 2012-07-02T09:49:51Z

@SimonSapin and @liZe

Sorry for lagging behind on this. I just installed version 0.10 on a clean Ubuntu 12.04 and tested it on http://www.w3.org/TR/CSS21/intro.html and on some Sphinx-generated HTML output.

👍 It looks and works beautifully!

Thanks for all your hard work and for keeping this project open source!

sublee · 2019-06-08T18:42:47Z

I'm using WeasyPrint 47. My result doesn't include clickable links. Is the hyperlink feature still available?

$ pip install weasyprint==47
$ weasyprint https://www.w3.org/TR/CSS21/selector.html selector.pdf

Demo: selector.pdf

liZe · 2019-06-08T19:19:47Z

Yes, it is, but you need at least Cairo 1.15.4. If you can't install a recent version of Cairo, you can use WeasyPrint 0.42.3 and get links (but miss the features added since).

sublee · 2019-06-08T19:40:23Z

@liZe Thanks for the very fast response.

I installed pycairo-1.18.1 but WeasyPrint still didn't generate clickable hyperlinks. Anyway, the WeasyPrint 0.42.3 workaround you mentioned works well. I chose that version because I only need simple features.

Thank you so much :)

liZe · 2019-06-08T19:42:08Z

> I installed pycairo-1.18.1 but WeasyPrint still didn't generate clickable hyperlinks.

You need a recent version of Cairo, not PyCairo (WeasyPrint doesn't use PyCairo). But if 0.42.3 works for you, no problem 😉.

sublee · 2019-06-09T10:14:49Z

@liZe Oh, I was confused about Cairo versus PyCairo, but now it's clear. My system has libcairo2-1.14.6 installed. Installing libcairo2-1.15.4+ is probably more complicated than using WeasyPrint-0.42.3, so I decided to keep using WeasyPrint-0.42.3 as you recommended. Thank you for clarifying!
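
Since the confusion here was about which Cairo version counts, a small sketch of how the check could be done from Python may help. The 1.15.4 minimum is the one quoted above; the helper names are hypothetical, and the last part assumes that cairocffi is the binding in use and that its `cairo_version_string()` reports the version of the C library (not of any Python package).

```python
MIN_CAIRO_FOR_LINKS = (1, 15, 4)  # minimum version quoted in this thread


def parse_version(text):
    """Turn a version string such as '1.14.6' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def cairo_supports_links(version_text):
    """Tell whether the given Cairo library version meets the minimum."""
    return parse_version(version_text) >= MIN_CAIRO_FOR_LINKS


if __name__ == "__main__":
    try:
        # Assumption: cairocffi is installed and can load the C library.
        import cairocffi
    except (ImportError, OSError):
        print("cairocffi or the cairo library could not be loaded")
    else:
        version = cairocffi.cairo_version_string()
        status = "recent enough" if cairo_supports_links(version) else "too old"
        print(f"cairo {version}: {status} for links")
```

With the libcairo2-1.14.6 mentioned above, `cairo_supports_links("1.14.6")` is false, which matches the missing links regardless of the pycairo version installed.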

SimonSapin closed this as completed Jun 4, 2012

