Create PDF/A documents #132

Closed
hochleitner opened this issue Nov 4, 2021 · 12 comments · Fixed by #160
Labels: enhancement, help wanted

Comments

@hochleitner
Member

New efforts on archiving and publishing theses (throughout the whole FH OÖ) have brought up the requirement that archived theses should be in PDF/A format to prevent modification. We should therefore update the PDF generation process to make this possible with the template.

I suggest adding an additional package parameter since it is probably not always desired to create a PDF/A.

It has not been decided which PDF/A standard will be required (there are PDF/A-1 to PDF/A-4 with different conformance levels such as a, b, and u). We should probably check which ones are feasible and which ones aren't (the accessible versions, e.g., PDF/A-1a, will be problematic with LaTeX as far as I can tell). There might still be potential to influence the final decision if we bring up valid technical requirements/limitations.

@hochleitner hochleitner self-assigned this Nov 4, 2021
@hochleitner
Member Author

@rru-hgb, if you have updates or input on the requirements, please let us know.

@hochleitner hochleitner added enhancement Issue or PR proposing an enhancement of a current features. help wanted Issue or PR where help from outside is wanted. labels Nov 5, 2021
@imagingbook
Collaborator

imagingbook commented Feb 19, 2023

@hochleitner
I tried an initial setup for creating PDF/A files in a new branch pdf-a (commit daddd53), for document HgbThesisTutorialEN. It uses the pdfx package and creates PDF/A-2b files. Most things went smoothly; surprisingly, all figures ran through without problems. The exceptions were:

  • The hyperref setup needs to be changed slightly. I deactivated warnings that are unavoidable.
  • The \euro character has a problem with dimensions. I deactivated it (needs to be fixed).
  • The included fragebogen.pdf was not compliant. I replaced it with a fixed version.
  • I set the PDF minor output version to 7 to stop some warnings with included figures.

One not so elegant aspect is that the PDF metadata need to be contained in a separate file (main.xmpdata), which is created in the preamble of main.tex (before \begin{document}). The associated entries currently cannot be filled in automatically from the author/title definitions, since these are defined later. Can this be changed?
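
For readers unfamiliar with pdfx, here is a minimal sketch of the setup described above. It is not the code from the pdf-a branch: the title, author, and language values are placeholders, and the metadata file is written with the standard filecontents* environment so that it exists before pdfx is loaded.

```latex
% Sketch only: placeholder metadata, not the values used in the pdf-a branch.
% For main.tex, \jobname.xmpdata expands to main.xmpdata.
\begin{filecontents*}[overwrite]{\jobname.xmpdata}
  \Title{Example Thesis Title}
  \Author{Jane Doe}
  \Language{en-US}
\end{filecontents*}
\documentclass{article}
\usepackage[a-2b]{pdfx} % request PDF/A-2b; pdfx reads \jobname.xmpdata
\begin{document}
Hello, PDF/A.
\end{document}
```

Because the metadata file has to exist when pdfx is loaded, the entries cannot simply reuse \title and \author definitions that appear later in the document, which is the limitation mentioned above.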

Some relevant links:

One should also look into hyperxmp as an alternative (see https://tex.stackexchange.com/questions/150221/pdfx-package-leads-to-non-working-hyperref-links).

@hochleitner hochleitner added this to the Release 2024 milestone Feb 20, 2023
@hochleitner
Member Author

I'll have a look at it asap. I've read up on the topic as well, and it seems a whole bunch of things are involved. Metadata, color intents, oof.

I've added a new Release 2024 milestone, where I'll add all the relevant issues for next year's release. I think a 1-year release cycle with a target of the end of February might work well. If necessary, we can always add a fall release.

@imagingbook
Collaborator

Great. In the meantime I checked hyperxmp -- unfortunately, it does not work as expected and I have no idea how to fix it.
However, both hyperxmp and pdfx may be obsolete anyway, because some recent additions to LaTeX itself make PDF/A creation a lot easier. I am currently testing this, looks good so far ...

@imagingbook
Collaborator

imagingbook commented Feb 21, 2023

Just pushed a new and IMO much better variant of PDF/A generation in branch pdf-a-l3. It is based on the forthcoming LaTeX kernel functions for PDF management and extremely simple to use. The only caveat is that Overleaf has no recent version of the pdfmanagement-testphase package (version 0.95s or higher is needed) and thus is not compliant yet. But this will change and I think this is the right path to go.

  • I had to fix some minor issues with hyperref (what else?) but all in all the setup works as before.
  • The eurosym Euro symbol is corrupted (font metric error), replaced the package by marvosym.
  • There is a new package hgbpdfa.sty with only a few lines now, but I thought it would be easier to maintain if anything else (color profiles etc.) needs to be added later.
  • hgbpdfa.sty needs to be loaded before the \begin{document} command, i.e., before anything is written to the PDF! It is thus not possible to use a document option. Instead, users must comment out a single line if they do not want PDF/A compliance.
  • Generally, I think we should make PDF/A the default in all documents. I see no reason why not to use it.
  • I added a section in the tutorial (EN only) under "Printing", including hints for validating PDFs. Also added stuff to the manual.
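
As a rough illustration of the kernel-based approach, the following sketch requests PDF/A-2b through \DocumentMetadata. This is not the content of hgbpdfa.sty, which is not shown in this thread. It assumes a LaTeX release from June 2022 or later together with an installed pdfmanagement-testphase package; the title and author are placeholders.

```latex
% Sketch only; this is not the content of hgbpdfa.sty.
% Assumption: \DocumentMetadata is available (LaTeX 2022-06 or later)
% and must appear before \documentclass.
\DocumentMetadata{
  pdfstandard = A-2b, % request PDF/A-2b conformance
  pdfversion  = 1.7,
  lang        = en-US
}
\documentclass{article}
\usepackage{hyperref}
\hypersetup{pdftitle={Example Thesis Title}, pdfauthor={Jane Doe}}
\begin{document}
Hello, PDF/A.
\end{document}
```

The resulting file should still be checked with a PDF/A validator, since included graphics or fonts can break compliance independently of this setup.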

@hochleitner Pls. look at it carefully. If adopted, what else remains to do:

  • Copy all style/class files from TutorialEN to dev.
  • Update the German tutorial, including new screenshot.
  • Add PDF/A to other documents.

Here is a useful link: https://ctan.org/tex-archive/macros/latex/contrib/pdfmanagement-testphase

@imagingbook
Collaborator

imagingbook commented Feb 23, 2023

The remaining points are completed (translated to TutorialDE, files copied to dev).
All documents (except the article) are now set up to produce PDF/A.
Note: The report-based docs had to be modified (author/title definitions moved before \begin{document} to avoid hyperref errors). Perhaps this should be looked into again.
I made a full rebuild and validated all PDFs. I also checked with Overleaf (it currently throws a warning and no PDF/A is created).

TODO: mention PDF/A in README, add link to online validator.

@imagingbook
Collaborator

imagingbook commented Feb 23, 2023

I fixed files hgbarticle.cls and hgbreport.cls to allow author/title declaration after \begin{document}, by adding the hypersetup to the maketitle hook. All affected documents were reverted. Also, I moved \RequirePackage[utf8]{inputenc} to the top of all documents.
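
The idea of deferring the metadata setup until \maketitle can be sketched as follows. The actual change in hgbarticle.cls and hgbreport.cls may look different; this version assumes the generic command hook cmd/maketitle/before (available in LaTeX since the 2021-06 release) and the standard internal macros \@title and \@author.

```latex
% Sketch only; the actual change in hgbarticle.cls / hgbreport.cls may differ.
\documentclass{article}
\usepackage{hyperref}
\makeatletter
% Assumption: the generic hook runs just before \maketitle, at which point
% \@title and \@author have been set by \title and \author.
\AddToHook{cmd/maketitle/before}{%
  \hypersetup{pdftitle={\@title}, pdfauthor={\@author}}%
}
\makeatother
\begin{document}
\title{Example Thesis Title}
\author{Jane Doe}
\maketitle
Body text.
\end{document}
```

With this arrangement, the author and title declarations can stay after \begin{document}, as long as they precede \maketitle.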

Removed remaining occurrences of \citenobr.

Added a short section on PDF/A and links in README.md.

Plus another full rebuild. Everything looks good now, pls. check the PR.

@imagingbook
Collaborator

@hochleitner
We should do a repo cleanup soon! It is currently about 650 MB (IMO too big for cloning). I just tried it: the size can be reduced to 54 MB by removing old PDF and ZIP files.

@hochleitner
Member Author

Okay, it took me ages to finally check it all out - sorry for the huge delay.

Here are my thoughts:

  • First of all, thanks for all the experiments; that's quite some stuff to read.
  • I agree; we should go with the L3 features. It is the most promising, simple, and future-proof version.
  • The pdfmanagement-testphase issue with Overleaf is problematic, but I think we could add a recent version to our latex-foreign folder for now so that projects on Overleaf have a current version. This package changes quite often (we're at 0.95x now), but having a minimum version present would solve the issue on Overleaf. A new TeX Live release will only reach Overleaf in late summer, and having the main branch not working on Overleaf until then is something we most definitely don't want.
  • Yes, we should make PDF/A the default and not even give people the option to choose. I see no obvious downside except that people might have to fiddle with included graphics. But if we provide an easy way to turn this off, people will turn it off to make it easy. We can give some tips in the wiki on how to deal with included files.
  • The eurosym issue is funny, considering that we used to have marvosym (I just removed traces of it while reworking the tutorial documents), and now we're back again. 😬

I wonder how much effort it is to reach PDF/A-2a. Proper accessibility is something we need to tackle sooner or later, and maybe the L3 functions will improve this workflow too.

@imagingbook
Collaborator

In the latest commit (b6d93dc) I tested the idea of making local copies of the recent pdfmanagement-testphase files inside the project directory. It requires the following 9 files:

pdfmanagement-testphase/pdfmanagement-testphase.sty
pdfmanagement-testphase/pdfmanagement-testphase.ltx
pdfmanagement-testphase/l3backend-testphase-pdftex.def
pdfmanagement-testphase/pdfmanagement-firstaid.sty
pdfmanagement-testphase/l3ref-tmp.sty
tagpdf/tagpdf-base.sty
l3experimental/l3bitset/l3bitset.sty
l3backend/l3backend-pdftex.def
latex-lab/documentmetadata-support.ltx

This works locally as expected. I then uploaded a zipped version to Overleaf, which immediately complained that some of the files cannot be parsed(!). The resulting output is PDF/A (at least Acrobat Reader says so, not tested for actual compliance) but contains garbage before the document starts.

In summary, I do not think this is a viable option. The project directory is messed up and it still does not work. We should wait for Overleaf to update their LaTeX environment.

@hochleitner
Member Author

So, I deliberately waited this long to add something to the issue of PDF/A creation (cough, cough), but at least there is good news. Overleaf now includes pdfmanagement-testphase 0.95x, so the PDF/A creation runs through fine for me.
I uploaded the HgbThesisTutorialDE folder from the #160 PR, producing a valid PDF/A-2b without errors. Both Acrobat and online PDF/A validators confirm it.

So, it seems we should be able to merge #160 and make PDF/A generation a default in the main branch. Could you please test this again yourself just to make sure I did not overlook something? It is a significant change after all.

@hochleitner hochleitner linked a pull request Oct 25, 2023 that will close this issue
@imagingbook
Collaborator

Looks good! I tested the PDF/A setup with the most recent MikTeX update and also on Overleaf. I suppose we can merge the pdf-a-l3 branch ...
