Accents, DIN 91379, non-Latin scripts

Volker Kunert edited this page Aug 12, 2021 · 19 revisions

Correct positioning of accents

To process text containing letters composed of multiple Unicode characters, e.g. a base letter followed by combining accents, it is necessary to compute the correct position of each glyph and encode these positions in the resulting PDF file.
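As a small illustration of why positioning is needed, the following sketch uses only the JDK's `java.text.Normalizer` to show that some accented letters cannot be reduced to a single precomposed character. The class name `CombiningSequences` and the two sample letters are chosen for illustration and do not come from OpenPDF.

```java
import java.text.Normalizer;

class CombiningSequences {

    /** Returns the number of Unicode code points in s. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Applies canonical composition (NFC) to s. */
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "A" + COMBINING DIAERESIS has a precomposed form (U+00C4).
        String aDiaeresis = "A\u0308";
        // "J" + COMBINING CARON has no precomposed form in Unicode.
        String jCaron = "J\u030C";

        System.out.println(codePoints(compose(aDiaeresis))); // 1
        System.out.println(codePoints(compose(jCaron)));     // 2
    }
}
```

For the second sequence, the accent stays a separate character even after normalization, so the renderer has to place it over the base letter itself.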

OpenPDF can process such text as of release 1.3.24.

Internally, OpenPDF uses the built-in Java2D routines for glyph layout, reordering and substitution. On Java 9 and newer, these routines rely on the HarfBuzz shaping library.
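The following sketch shows the kind of Java2D call involved, using only JDK classes. It is not OpenPDF's actual code: the class name `GlyphLayoutSketch`, the logical font `SansSerif`, and the sample text are assumptions for illustration, and the glyph codes and positions it prints depend on the fonts installed on the machine.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.geom.Point2D;

class GlyphLayoutSketch {

    /** Lays out text with Java2D and returns the positioned glyphs. */
    static GlyphVector layout(Font font, String text) {
        // Anti-aliased, fractional metrics, no extra transform.
        FontRenderContext frc = new FontRenderContext(null, true, true);
        char[] chars = text.toCharArray();
        return font.layoutGlyphVector(
                frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
    }

    public static void main(String[] args) {
        // Logical font; an application would normally load a specific font file.
        Font font = new Font(Font.SANS_SERIF, Font.PLAIN, 12);
        GlyphVector glyphs = layout(font, "J\u030C");

        // Print each glyph code together with its computed position.
        for (int i = 0; i < glyphs.getNumGlyphs(); i++) {
            Point2D p = glyphs.getGlyphPosition(i);
            System.out.printf("glyph %d: code=%d x=%.2f y=%.2f%n",
                    i, glyphs.getGlyphCode(i), p.getX(), p.getY());
        }
    }
}
```

The positions returned by the glyph vector are the kind of information that has to end up in the PDF so that the accent is drawn in the right place.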

DIN 91379

We tested this approach with letters conforming to "DIN 91379: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" (and its predecessor, DIN SPEC 91379), which describes a subset of Unicode consisting mainly of Latin letters and diacritics. From November 2024, this standard will be mandatory for data exchange between the German administration and citizens and businesses.

Non-Latin scripts

Processing text in other languages and scripts with this approach should be possible. You are invited to try it and share the results.

Usage

1. Step: Enable advanced glyph layout

import com.lowagie.text.pdf.LayoutProcessor;
...
    LayoutProcessor.enable();

Optionally, a different text direction can be specified:

    LayoutProcessor.enable(java.awt.Font.LAYOUT_RIGHT_TO_LEFT);

2. Step: Provide a TrueType/OpenType font

Provide a TrueType/OpenType font containing the necessary characters and positioning information, e.g. a font from the Google Noto fonts.

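    // Register the font file under the alias "sans".
    String fontFileName = "com/lowagie/examples/fonts/NotoSans-Regular.ttf";
    FontFactory.register(fontFileName, "sans");
    // IDENTITY_H selects Unicode encoding; fontSize is a float size in points.
    Font font = FontFactory.getFont("sans", BaseFont.IDENTITY_H, fontSize);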

3. Step: Further processing

Process the document or form as usual.

Examples

Producing a document

GlyphLayoutDocumentDinSpec91379.java

Processing a form

GlyphLayoutFormDinSpec91379.java

References

  1. String.Latin+ 1.2 (extended and commented version of DIN SPEC 91379 in German)
  2. DIN SPEC 91379: Characters in Unicode for the electronic processing of names and data exchange in Europe (free download after registration)
  3. DIN 91379: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM (Draft can be read after registration)
  4. DIN 91379:2021-07 - Draft: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM
  5. Decision of IT Planungsrat 2019/53 (in German)
  6. Noto fonts, see Google, GitHub
  7. HarfBuzz text shaping library