Improve PDF export #1350

timobrembeck · 2022-04-11T14:24:34Z

Motivation

At the moment, the PDF export feels a bit rudimentary and buggy.

Proposed Solution

~~Add cover page with city & integreat logos~~ polish table of contents - on hold until requirements are clarified
Visualize the hierarchy in the table of contents
Fix font of headlines (should be raleway)
Fix font of mixed alphabets (e.g. links to Arabic pages in German content) - some kind of fallback-font
~~Fix images & icons in content~~ - Rejected: content issue
Fix line breaks:

Content	PDF

Alternatives

Additional Context

osmers · 2022-04-19T08:34:08Z

The Malte PDF should have the Malte Logo in the bottom right hand of the PDF, right?

timobrembeck · 2022-04-19T08:38:22Z

The Malte PDF should have the Malte Logo in the bottom right hand of the PDF, right?

At the moment, we don't have the possibility to set individual logos other than the Integreat logo, but it's planned (see issue #1093).
Did the PDF export in WordPress have a Malte logo?

osmers · 2022-04-19T08:41:27Z

I don't remember whether it had a Malte Logo - I just found the Integreat Logo in the Malte PDF confusing...especially for people who only know Malte...
Does the individual logo issue include Kommunen logos? Bcs for Integreat it would be great if they could appear additionally to the Integreat Logo in the top right hand corner (or bottom left hand maybe?)

timobrembeck · 2022-04-19T22:09:47Z

Regarding the Malte Logo I opened PR #1364 which handles all logos in the CMS, so as soon as this is merged, there will the Malte logo in the PDF footer.
And the individual region logos are already shown in the top right corner, they just have to be set in the region form.

timobrembeck · 2022-04-24T10:02:58Z

@osmers the PR is now merged and released, so the Malte CMS is now styled in the Malte design which includes the logo in the footer of exported PDFs.

ulliholtgrave · 2022-04-27T09:58:38Z

@timoludwig I guess the problems with the images and icons are related to our implementations, but our file migration. It seems a few static files are also linking to the old TUM-Server, which are not redirected on the test system. With new or updated images this is no longer an issue.

BUT we seem to have an issue with the handling of malformed HTML. As you can see in the picture some items are missing because there are some unnecessary HTML entities in the content which are only visible in the source code.
See:

Meldebescheinigung, An- und Abmelden des Wohnsitzes
Aufenthaltstitel und Niederlassungserlaubnis

Verlängerung des Ankunftsnachweises


Aufenthaltsgestattung ausstellen und verlängern (Asyl)


Beantragung der Arbeitserlaubnis


Aufenthaltserlaubnis und Ausnahmegenehmigungen für Reisen

@svenseeberg @osmers As these are pure content issues: Should we handle them somehow in the implementation or is this something our users are supposed to fix by editing the content?

timobrembeck · 2022-04-27T10:12:37Z

@timoludwig I guess the problems with the images and icons are related to our implementations, but our file migration. It seems a few static files are also linking to the old TUM-Server, which are not redirected on the test system. With new or updated images this is no longer an issue.

What do you mean by files linking to the old TUM-server? We also have this problem on Malte which never was on the servers of TUM afaik, so I don't think this has anything to do with it. What you probably mean are the old wordpress-styled image paths with /wp-content/uploads/, but this is totally expected (since we don't want to run a search/replace on the whole content but rather solve this by apache redirects which support the legacy urls). Our PDF library throws an error because it thinks that /wp-content/uploads/ is not in the static directory /static/ and it thinks that something suspicious is going on, so we probably have to implement a whitelist or something in the link_callback() function.

@svenseeberg @osmers As these are pure content issues: Should we handle them somehow in the implementation or is this something our users are supposed to fix by editing the content?

In my opinion, this should be fixed in the content (and proper <ul>/<li> elements should be used insted of literal * characters).

ulliholtgrave · 2022-04-27T10:50:20Z

Okay, I see. I found some Sources linking to the "vkcmar..." URL that weren't working, but right: If the /wp-content/uploads URLs aren't working there seems to be another issue.

@timoludwig Regarding the other issue. Unfortunately Markdown doesn't support raw HTML, but as you can see in the Screenshot we also have this issue with the div inside the list items that are causing troubles. The * are another troubling problem 🙈

timobrembeck · 2022-04-27T11:26:25Z

@ulliholtgrave you can insert html code into markdown with three backticks:

<li>
    <div>test</div>
</li>

Since div tags inside li seem to be valid html, we should open an issue for this in xhtml2pdf.
Edit: Maybe this is the same problem as xhtml2pdf/xhtml2pdf#145

ulliholtgrave · 2022-04-27T11:45:07Z

Okay, I see, but this Issue is from 2013 🙈 We might want to fill a new one :D

timobrembeck · 2022-04-27T11:48:49Z

Okay, I see, but this Issue is from 2013 see_no_evil We might want to fill a new one :D

Yes, or just update the existing issue with a very detailed explanation how to reproduce the problem and what the expected output would be, probably the original issue was not tackled because it's too vague.

ulliholtgrave · 2022-04-29T11:47:33Z

I created a new Issue for it in the upstream repository (see xhtml2pdf/xhtml2pdf#611)

ulliholtgrave · 2022-10-11T18:09:53Z

At least for the issue with the line breaks I see 3 possible solutions:

Reproduce the handling of
s inside
from TinyMCE or our app and bring this handling to the upstream repo (see Using <div> elements in lists breaks bullet points xhtml2pdf/xhtml2pdf#611)
Reproduce the handling of
s inside
from TinyMCE or our app and apply this filter before we pass our content to the PDF generation function.
Try to sanitize our content on a database level. I am aware of the fact that this is valid HTML, but this could be a quickfix for the issue and we could use our clean_content-Method to prevent this problem from occurring (see

integreat-cms/integreat_cms/cms/forms/custom_content_model_form.py

Line 106 in a10d0b8

for monospaced in content.iter("pre", "code"):

)

charludo · 2022-10-19T08:35:17Z

* [ ]  ~Add cover page with city & integreat logos~ polish table of contents

Is this still current? ToC looks fine, what changes are intended?

* [ ]  Fix font of mixed alphabets (e.g. links to Arabic pages in German content) - some kind of fallback-font

Is this still current? Latin characters in arabic text appear to have the font we also use in latin PDFs:

* [ ]  warning **High prio:** Fix images & icons in the content as well as line breaks:

Unable to reproduce. Would be great to get an example of a page where this is happening.

I created a new Issue for it in the upstream repository (see xhtml2pdf/xhtml2pdf#611)
Should be fixed. Also created a PR upstream.

timobrembeck · 2022-10-19T09:16:07Z

Is this still current? ToC looks fine, what changes are intended?

No, I think the service team does not yet know what they want here 😉

Is this still current? Latin characters in arabic text appear to have the font we also use in latin PDFs:

I think it's the other way round: Arabic letters in e.g. German pages are not working, e.g.:

Editor	PDF

Unable to reproduce. Would be great to get an example of a page where this is happening.

I took the screenshot from here:
https://malteapp.de/schwerin/de/umgebung/beratungsangebote/migrationsberatung
And the resulting PDF is:
https://cms.malteapp.de/pdf/62a5949595/Malte%20-%20Deutsch%20-%20Migrationsberatung.pdf

Could also be an infrastructure problem with some broken media links or similar, didn't have time yet to look into it.
Edit: Probably this is due to the /wp-content/uploads/ media links which work on web because we created an Apache redirect for it, but it does not work in the PDF library, because it thinks the files are not in the static directory.

Should be fixed. Also created a PR upstream.

Awesome, thanks a lot! 🚀

charludo · 2022-10-19T12:02:48Z

Is something keeping us from just using DejaVu for LTR as well as RTL content?

Either Open Sans is providing empty characters to the arabic unicode range, or xhtml2pdf does not support fallback fonts for unknown unicode characters. Since it definitely does not support specifying a unicode-range inside @font-face, using a font supporting arabic out of the box seems the easiest solution.

charludo · 2022-10-19T13:55:14Z

Probably this is due to the /wp-content/uploads/ media links which work on web because we created an Apache redirect for it, but it does not work in the PDF library, because it thinks the files are not in the static directory.

Yes, looks like it. Is there some location mapping that works for all of these files? Don't think I have access to the Apache config. Alternatively I can leave this issue for someone who has access.

charludo · 2022-10-19T14:40:58Z

~~Yes, looks like it. Is there some location mapping that works for all of these files? Don't think I have access to the Apache config. Alternatively I can leave this issue for someone who has access.~~

Something appears to be wrong with the icons themselves, at least in regards to xhtml2pdf.

Try manually downloading https://cms.malteapp.de/schwerin/wp-content/uploads/sites/124/2020/07/phone.png, then uploading it to the CMS Media Library and insert it into a page. Same issue occurs.

Open the file in an image editor, reexport, add to Media Library again, insert into page, bug is gone.

Unless we want to use PIL or something to open and reexport every single image being passed to link_callback(), I think this issue should be fixed in content, not by us.

timobrembeck · 2022-10-19T21:27:47Z

Something appears to be wrong with the icons themselves, at least in regards to xhtml2pdf.

Try manually downloading https://cms.malteapp.de/schwerin/wp-content/uploads/sites/124/2020/07/phone.png, then uploading it to the CMS Media Library and insert it into a page. Same issue occurs.

Open the file in an image editor, reexport, add to Media Library again, insert into page, bug is gone.

Unless we want to use PIL or something to open and reexport every single image being passed to link_callback(), I think this issue should be fixed in content, not by us.

Ohh, interesting, thanks for figuring that out. Ok, then sorry for bothering you, yes this should be fixed in the content.

Is something keeping us from just using DejaVu for LTR as well as RTL content?

Either Open Sans is providing empty characters to the arabic unicode range, or xhtml2pdf does not support fallback fonts for unknown unicode characters. Since it definitely does not support specifying a unicode-range inside @font-face, using a font supporting arabic out of the box seems the easiest solution.

In an ideal world, the PDF looks exactly like the content in the apps, which use Raleway for headings and open sans for the content (see Fonts for Alphabets). However, it's more important that text is actually readable, so if using a different font is the only option, I'd go with it.
xhtml2pdf itself is very pessimistic about custom fonts:

Right now it seems like right-to-left support isn’t working while using a default font-family like p { font-family: Times-Roman }. We’re working on fixing this. However, it works by using the @font-face tag in the CSS definition and defining a custom font. Therefore you need the specified font file. “MarkaziText” for example seems to work. Other fonts might work as well but haven’t been tested.

— https://xhtml2pdf.readthedocs.io/en/latest/reference.html#arabic-hebrew-persian-etc-fonts-support

timobrembeck added the 🔨 enhancement This improves an existing feature label Apr 11, 2022

timobrembeck added this to the Version 1.1 milestone Apr 11, 2022

This was referenced Apr 13, 2022

Fix PDF export #1357

Merged

Release 2022.4.1 #1363

Merged

timobrembeck mentioned this issue Apr 30, 2022

Fix legacy media urls in PDF export #1375

Merged

timobrembeck modified the milestones: Version 1.1, Version 1.2 May 9, 2022

timobrembeck mentioned this issue May 9, 2022

Release 2022.5.0 #1399

Merged

ulliholtgrave added the 😱 effort: high Big change, which requires >12h label May 16, 2022

svenseeberg added the ‼️ prio: high Needs to be resolved ASAP. label Jun 7, 2022

svenseeberg modified the milestones: 22Q2, 22Q3 Jun 7, 2022

timobrembeck mentioned this issue Jun 8, 2022

Meta: 📄 PDF #1533

Open

33 tasks

timobrembeck modified the milestones: 22Q3, 22Q4 Oct 2, 2022

charludo self-assigned this Oct 15, 2022

charludo mentioned this issue Oct 19, 2022

Fix ListItem styling in PDF exports #1781

Closed

charludo linked a pull request Oct 19, 2022 that will close this issue

Fix ListItem styling in PDF exports #1781

Closed

charludo mentioned this issue Oct 20, 2022

Various small PDF export improvements #1784

Merged

charludo closed this as completed in #1784 Oct 21, 2022

ulliholtgrave mentioned this issue Oct 26, 2022

Release 2022.10.2 #1809

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve PDF export #1350

Improve PDF export #1350

timobrembeck commented Apr 11, 2022 •

edited

osmers commented Apr 19, 2022

timobrembeck commented Apr 19, 2022

osmers commented Apr 19, 2022

timobrembeck commented Apr 19, 2022

timobrembeck commented Apr 24, 2022

ulliholtgrave commented Apr 27, 2022 •

edited

timobrembeck commented Apr 27, 2022

ulliholtgrave commented Apr 27, 2022

timobrembeck commented Apr 27, 2022 •

edited

ulliholtgrave commented Apr 27, 2022

timobrembeck commented Apr 27, 2022

ulliholtgrave commented Apr 29, 2022

ulliholtgrave commented Oct 11, 2022

charludo commented Oct 19, 2022

timobrembeck commented Oct 19, 2022 •

edited

charludo commented Oct 19, 2022

charludo commented Oct 19, 2022

charludo commented Oct 19, 2022 •

edited

timobrembeck commented Oct 19, 2022

Improve PDF export #1350

Improve PDF export #1350

Comments

timobrembeck commented Apr 11, 2022 • edited

Motivation

Proposed Solution

Alternatives

Additional Context

osmers commented Apr 19, 2022

timobrembeck commented Apr 19, 2022

osmers commented Apr 19, 2022

timobrembeck commented Apr 19, 2022

timobrembeck commented Apr 24, 2022

ulliholtgrave commented Apr 27, 2022 • edited

timobrembeck commented Apr 27, 2022

ulliholtgrave commented Apr 27, 2022

timobrembeck commented Apr 27, 2022 • edited

ulliholtgrave commented Apr 27, 2022

timobrembeck commented Apr 27, 2022

ulliholtgrave commented Apr 29, 2022

ulliholtgrave commented Oct 11, 2022

charludo commented Oct 19, 2022

timobrembeck commented Oct 19, 2022 • edited

charludo commented Oct 19, 2022

charludo commented Oct 19, 2022

charludo commented Oct 19, 2022 • edited

timobrembeck commented Oct 19, 2022

timobrembeck commented Apr 11, 2022 •

edited

ulliholtgrave commented Apr 27, 2022 •

edited

timobrembeck commented Apr 27, 2022 •

edited

timobrembeck commented Oct 19, 2022 •

edited

charludo commented Oct 19, 2022 •

edited