Accents, DIN 91379, non-Latin scripts

Volker Kunert edited this page Nov 3, 2023 · 19 revisions

Correct positioning of accents

To process text containing letters composed of multiple Unicode characters, e.g. letters with combining accents, it is necessary to compute the correct position of each glyph and to encode these positions in the resulting PDF file.
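To illustrate why positioning is needed, the following minimal sketch uses only the JDK class java.text.Normalizer. It shows that some accented letters can be normalized to a single precomposed character, while others always remain a base letter plus a combining accent, so the renderer has to place the accent itself. The class name CombiningSequenceDemo and the helper nfcCodePointCount are hypothetical names chosen for this illustration and are not part of OpenPDF.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of OpenPDF.
class CombiningSequenceDemo {

    /** Returns the number of Unicode code points in the NFC-normalized form of s. */
    public static int nfcCodePointCount(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length());
    }

    public static void main(String[] args) {
        // "e" + COMBINING ACUTE ACCENT: Unicode has a precomposed character (U+00E9).
        String eAcute = "e\u0301";
        // "M" + COMBINING CIRCUMFLEX ACCENT: Unicode has no precomposed character.
        String mCircumflex = "M\u0302";

        System.out.println(nfcCodePointCount(eAcute));      // prints 1
        System.out.println(nfcCodePointCount(mCircumflex)); // prints 2
    }
}
```

For the second string, no single font glyph can be looked up from one code point, so the accent has to be positioned relative to the base letter using the positioning information of the font.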

OpenPDF can process such texts starting with release 1.3.24.

Internally, OpenPDF uses the built-in Java2D routines for glyph layout, reordering and substitution. For Java 9 and newer, these routines rely on the HarfBuzz shaping library.

DIN 91379

We tested this approach with the letters defined in "DIN 91379: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" and in its predecessor, DIN SPEC 91379. The standard describes a subset of Unicode consisting mainly of Latin letters and diacritic signs. It will be mandatory for the data exchange of the German administration with citizens and businesses from November 2024.

Non-Latin scripts

Processing text in other languages and scripts with this approach should be possible. You are invited to try it and share the results.

Usage

1. Step: Enable advanced glyph layout using kerning and ligatures

import com.lowagie.text.pdf.LayoutProcessor;
...
    LayoutProcessor.enableKernLiga();

Java's Bidi class (java.text.Bidi) is used to determine the text direction of each chunk of text.

Optionally, a fixed text direction can be set for the whole text:

    LayoutProcessor.enableKernLiga(java.awt.Font.LAYOUT_RIGHT_TO_LEFT);
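The direction detection mentioned in this step can be explored with java.text.Bidi directly. The following sketch is an illustration of that JDK class only; how OpenPDF calls it internally is not described on this page, so the flag used here (DIRECTION_DEFAULT_LEFT_TO_RIGHT) is an assumption, and the class name BidiDirectionDemo and the helper direction are hypothetical.

```java
import java.text.Bidi;

// Hypothetical demo class, not part of OpenPDF.
class BidiDirectionDemo {

    /** Classifies one chunk of text as "LTR", "RTL" or "MIXED" using java.text.Bidi. */
    public static String direction(String chunk) {
        // Assumption: the base direction is taken from the first strongly
        // directional character, falling back to left-to-right if there is none.
        Bidi bidi = new Bidi(chunk, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return "LTR";
        }
        if (bidi.isRightToLeft()) {
            return "RTL";
        }
        return "MIXED";
    }

    public static void main(String[] args) {
        String latin = "Hello";
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // Hebrew letters only
        String both = "abc " + hebrew;               // Latin followed by Hebrew

        System.out.println(direction(latin));  // prints LTR
        System.out.println(direction(hebrew)); // prints RTL
        System.out.println(direction(both));   // prints MIXED
    }
}
```

A chunk classified as mixed is a case where a fixed direction passed to enableKernLiga may give a different result than per-chunk detection, so it is worth checking the output for such text.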

2. Step: Provide an OpenType font

Provide an OpenType font that contains the necessary characters and positioning information, e.g. one of the Google Noto fonts.

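    float fontSize = 12.0f; // example size in points, choose as needed
    String fontFileName = "com/lowagie/examples/fonts/NotoSans-Regular.ttf";
    // Register the font file under the alias "sans", then request it with Identity-H encoding
    FontFactory.register(fontFileName, "sans");
    Font font = FontFactory.getFont("sans", BaseFont.IDENTITY_H, fontSize);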

3. Step: Further processing

Process the document or form as usual.

Examples

Producing a document

GlyphLayoutDocumentDinSpec91379.java

Processing a form

GlyphLayoutFormDinSpec91379.java

References

  1. DIN 91379 (English Wikipedia)
  2. DIN 91379 (German Wikipedia)
  3. DIN 91379 Characters and Sequences (GitHub)
  4. String.Latin+ 1.2 (extended and commented version of DIN SPEC 91379 in German, free download)
  5. DIN SPEC 91379: Characters in Unicode for the electronic processing of names and data exchange in Europe (free download after registration)
  6. DIN 91379:2022-08: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM (access chargeable)
  7. Decision of IT Planungsrat 2022/51 (in German)
  8. Noto Latin, Greek, Cyrillic fonts, see Google, GitHub
  9. HarfBuzz text shaping library