Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve PDF export #1350

Closed
5 of 6 tasks
Tracked by #1533
timobrembeck opened this issue Apr 11, 2022 · 19 comments · Fixed by #1784
Closed
5 of 6 tasks
Tracked by #1533

Improve PDF export #1350

timobrembeck opened this issue Apr 11, 2022 · 19 comments · Fixed by #1784
Assignees
Labels
‼️ prio: high Needs to be resolved ASAP. 🔨 enhancement This improves an existing feature 😱 effort: high Big change, which requires >12h
Milestone

Comments

@timobrembeck
Copy link
Member

timobrembeck commented Apr 11, 2022

Motivation

At the moment, the PDF export feels a bit rudimentary and buggy.

Proposed Solution

  • Add cover page with city & integreat logos polish table of contents - on hold until requirements are clarified
  • Visualize the hierarchy in the table of contents
  • Fix font of headlines (should be raleway)
  • Fix font of mixed alphabets (e.g. links to Arabic pages in German content) - some kind of fallback-font
  • Fix images & icons in content - Rejected: content issue
  • Fix line breaks:
Content PDF
pdf-source pdf-pdf

Alternatives

Additional Context

@timobrembeck timobrembeck added the 🔨 enhancement This improves an existing feature label Apr 11, 2022
@timobrembeck timobrembeck added this to the Version 1.1 milestone Apr 11, 2022
This was referenced Apr 13, 2022
@osmers
Copy link

osmers commented Apr 19, 2022

The Malte PDF should have the Malte Logo in the bottom right hand of the PDF, right?

@timobrembeck
Copy link
Member Author

The Malte PDF should have the Malte Logo in the bottom right hand of the PDF, right?

At the moment, we don't have the possibility to set individual logos other than the Integreat logo, but it's planned (see issue #1093).
Did the PDF export in WordPress have a Malte logo?

@osmers
Copy link

osmers commented Apr 19, 2022

I don't remember whether it had a Malte Logo - I just found the Integreat Logo in the Malte PDF confusing...especially for people who only know Malte...
Does the individual logo issue include Kommunen logos? Bcs for Integreat it would be great if they could appear additionally to the Integreat Logo in the top right hand corner (or bottom left hand maybe?)

@timobrembeck
Copy link
Member Author

Regarding the Malte Logo I opened PR #1364 which handles all logos in the CMS, so as soon as this is merged, there will the Malte logo in the PDF footer.
And the individual region logos are already shown in the top right corner, they just have to be set in the region form.

@timobrembeck
Copy link
Member Author

@osmers the PR is now merged and released, so the Malte CMS is now styled in the Malte design which includes the logo in the footer of exported PDFs.

@ulliholtgrave
Copy link
Member

ulliholtgrave commented Apr 27, 2022

grafik
@timoludwig I guess the problems with the images and icons are related to our implementations, but our file migration. It seems a few static files are also linking to the old TUM-Server, which are not redirected on the test system. With new or updated images this is no longer an issue.

BUT we seem to have an issue with the handling of malformed HTML. As you can see in the picture some items are missing because there are some unnecessary HTML entities in the content which are only visible in the source code.
See:

  • Meldebescheinigung, An- und Abmelden des Wohnsitzes
  • Aufenthaltstitel und Niederlassungserlaubnis
  • Verlängerung des Ankunftsnachweises
  • Aufenthaltsgestattung ausstellen und verlängern (Asyl)
  • Beantragung der Arbeitserlaubnis
  • Aufenthaltserlaubnis und Ausnahmegenehmigungen für Reisen
  • grafik

    @svenseeberg @osmers As these are pure content issues: Should we handle them somehow in the implementation or is this something our users are supposed to fix by editing the content?

    @timobrembeck
    Copy link
    Member Author

    @timoludwig I guess the problems with the images and icons are related to our implementations, but our file migration. It seems a few static files are also linking to the old TUM-Server, which are not redirected on the test system. With new or updated images this is no longer an issue.

    What do you mean by files linking to the old TUM-server? We also have this problem on Malte which never was on the servers of TUM afaik, so I don't think this has anything to do with it. What you probably mean are the old wordpress-styled image paths with /wp-content/uploads/, but this is totally expected (since we don't want to run a search/replace on the whole content but rather solve this by apache redirects which support the legacy urls). Our PDF library throws an error because it thinks that /wp-content/uploads/ is not in the static directory /static/ and it thinks that something suspicious is going on, so we probably have to implement a whitelist or something in the link_callback() function.

    @svenseeberg @osmers As these are pure content issues: Should we handle them somehow in the implementation or is this something our users are supposed to fix by editing the content?

    In my opinion, this should be fixed in the content (and proper <ul>/<li> elements should be used insted of literal * characters).

    @ulliholtgrave
    Copy link
    Member

    Okay, I see. I found some Sources linking to the "vkcmar..." URL that weren't working, but right: If the /wp-content/uploads URLs aren't working there seems to be another issue.

    @timoludwig Regarding the other issue. Unfortunately Markdown doesn't support raw HTML, but as you can see in the Screenshot we also have this issue with the div inside the list items that are causing troubles. The * are another troubling problem 🙈
    grafik

    @timobrembeck
    Copy link
    Member Author

    timobrembeck commented Apr 27, 2022

    @ulliholtgrave you can insert html code into markdown with three backticks:

    <li>
        <div>test</div>
    </li>
    

    Since div tags inside li seem to be valid html, we should open an issue for this in xhtml2pdf.
    Edit: Maybe this is the same problem as xhtml2pdf/xhtml2pdf#145

    @ulliholtgrave
    Copy link
    Member

    Okay, I see, but this Issue is from 2013 🙈 We might want to fill a new one :D

    @timobrembeck
    Copy link
    Member Author

    Okay, I see, but this Issue is from 2013 see_no_evil We might want to fill a new one :D

    Yes, or just update the existing issue with a very detailed explanation how to reproduce the problem and what the expected output would be, probably the original issue was not tackled because it's too vague.

    @ulliholtgrave
    Copy link
    Member

    I created a new Issue for it in the upstream repository (see xhtml2pdf/xhtml2pdf#611)

    @timobrembeck timobrembeck modified the milestones: Version 1.1, Version 1.2 May 9, 2022
    @ulliholtgrave ulliholtgrave added the 😱 effort: high Big change, which requires >12h label May 16, 2022
    @svenseeberg svenseeberg added the ‼️ prio: high Needs to be resolved ASAP. label Jun 7, 2022
    @svenseeberg svenseeberg modified the milestones: 22Q2, 22Q3 Jun 7, 2022
    @timobrembeck timobrembeck mentioned this issue Jun 8, 2022
    33 tasks
    @timobrembeck timobrembeck modified the milestones: 22Q3, 22Q4 Oct 2, 2022
    @ulliholtgrave
    Copy link
    Member

    At least for the issue with the line breaks I see 3 possible solutions:

    @charludo charludo linked a pull request Oct 19, 2022 that will close this issue
    @charludo
    Copy link
    Contributor

    * [ ]  ~Add cover page with city & integreat logos~ polish table of contents
    

    Is this still current? ToC looks fine, what changes are intended?

    * [ ]  Fix font of mixed alphabets (e.g. links to Arabic pages in German content) - some kind of fallback-font
    

    Is this still current? Latin characters in arabic text appear to have the font we also use in latin PDFs:
    image

    * [ ]  warning **High prio:** Fix images & icons in the content as well as line breaks:
    

    Unable to reproduce. Would be great to get an example of a page where this is happening.

    I created a new Issue for it in the upstream repository (see xhtml2pdf/xhtml2pdf#611)
    Should be fixed. Also created a PR upstream.

    @timobrembeck
    Copy link
    Member Author

    timobrembeck commented Oct 19, 2022

    Is this still current? ToC looks fine, what changes are intended?

    No, I think the service team does not yet know what they want here 😉

    Is this still current? Latin characters in arabic text appear to have the font we also use in latin PDFs:

    I think it's the other way round: Arabic letters in e.g. German pages are not working, e.g.:

    Editor PDF
    Screenshot 2022-10-19 at 11-10-49 Integreat Redaktionssystem Screenshot 2022-10-19 at 11-11-00 Integreat - Deutsch - Willkommen pdf

    Unable to reproduce. Would be great to get an example of a page where this is happening.

    I took the screenshot from here:
    https://malteapp.de/schwerin/de/umgebung/beratungsangebote/migrationsberatung
    And the resulting PDF is:
    https://cms.malteapp.de/pdf/62a5949595/Malte%20-%20Deutsch%20-%20Migrationsberatung.pdf

    Could also be an infrastructure problem with some broken media links or similar, didn't have time yet to look into it.
    Edit: Probably this is due to the /wp-content/uploads/ media links which work on web because we created an Apache redirect for it, but it does not work in the PDF library, because it thinks the files are not in the static directory.

    Should be fixed. Also created a PR upstream.

    Awesome, thanks a lot! 🚀

    @charludo
    Copy link
    Contributor

    Is something keeping us from just using DejaVu for LTR as well as RTL content?

    Either Open Sans is providing empty characters to the arabic unicode range, or xhtml2pdf does not support fallback fonts for unknown unicode characters. Since it definitely does not support specifying a unicode-range inside @font-face, using a font supporting arabic out of the box seems the easiest solution.

    @charludo
    Copy link
    Contributor

    Probably this is due to the /wp-content/uploads/ media links which work on web because we created an Apache redirect for it, but it does not work in the PDF library, because it thinks the files are not in the static directory.

    Yes, looks like it. Is there some location mapping that works for all of these files? Don't think I have access to the Apache config. Alternatively I can leave this issue for someone who has access.

    @charludo
    Copy link
    Contributor

    charludo commented Oct 19, 2022

    Yes, looks like it. Is there some location mapping that works for all of these files? Don't think I have access to the Apache config. Alternatively I can leave this issue for someone who has access.

    Something appears to be wrong with the icons themselves, at least in regards to xhtml2pdf.

    Try manually downloading https://cms.malteapp.de/schwerin/wp-content/uploads/sites/124/2020/07/phone.png, then uploading it to the CMS Media Library and insert it into a page. Same issue occurs.

    Open the file in an image editor, reexport, add to Media Library again, insert into page, bug is gone.

    Unless we want to use PIL or something to open and reexport every single image being passed to link_callback(), I think this issue should be fixed in content, not by us.

    @timobrembeck
    Copy link
    Member Author

    Something appears to be wrong with the icons themselves, at least in regards to xhtml2pdf.

    Try manually downloading https://cms.malteapp.de/schwerin/wp-content/uploads/sites/124/2020/07/phone.png, then uploading it to the CMS Media Library and insert it into a page. Same issue occurs.

    Open the file in an image editor, reexport, add to Media Library again, insert into page, bug is gone.

    Unless we want to use PIL or something to open and reexport every single image being passed to link_callback(), I think this issue should be fixed in content, not by us.

    Ohh, interesting, thanks for figuring that out. Ok, then sorry for bothering you, yes this should be fixed in the content.

    Is something keeping us from just using DejaVu for LTR as well as RTL content?

    Either Open Sans is providing empty characters to the arabic unicode range, or xhtml2pdf does not support fallback fonts for unknown unicode characters. Since it definitely does not support specifying a unicode-range inside @font-face, using a font supporting arabic out of the box seems the easiest solution.

    In an ideal world, the PDF looks exactly like the content in the apps, which use Raleway for headings and open sans for the content (see Fonts for Alphabets). However, it's more important that text is actually readable, so if using a different font is the only option, I'd go with it.
    xhtml2pdf itself is very pessimistic about custom fonts:

    Right now it seems like right-to-left support isn’t working while using a default font-family like p { font-family: Times-Roman }. We’re working on fixing this. However, it works by using the @font-face tag in the CSS definition and defining a custom font. Therefore you need the specified font file. “MarkaziText” for example seems to work. Other fonts might work as well but haven’t been tested.

    https://xhtml2pdf.readthedocs.io/en/latest/reference.html#arabic-hebrew-persian-etc-fonts-support

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    ‼️ prio: high Needs to be resolved ASAP. 🔨 enhancement This improves an existing feature 😱 effort: high Big change, which requires >12h
    Projects
    None yet
    5 participants