Accents, DIN 91379, non Latin scripts

Volker Kunert edited this page Dec 27, 2023 · 19 revisions

Correct positioning of accents

To process text containing letters that are composed of multiple Unicode code points, e.g. a base letter followed by combining accents, the correct position of each glyph must be computed and these positions must be encoded in the resulting PDF file.
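As a small illustration of why positioning is needed, the following sketch uses only the Java standard library to show that an accented letter may be a single code point or a sequence of a base letter plus combining marks. The class and method names are made up for this example and are not part of OpenPDF.

```java
import java.text.Normalizer;

// Hypothetical demo class; not part of OpenPDF.
class CombiningSequenceDemo {

    /** Returns the number of Unicode code points in the string. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Canonical decomposition (NFD): base letters followed by combining marks. */
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Canonical composition (NFC): uses precomposed characters where Unicode defines them. */
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** Lists each code point as U+XXXX together with its Unicode name. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp ->
                sb.append(String.format("U+%04X %s%n", cp, Character.getName(cp))));
        return sb.toString();
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9"; // e with acute as one code point
        String sequence = "m\u0302";   // m + combining circumflex, no precomposed form exists

        System.out.println(codePointCount(precomposed));            // 1
        System.out.println(codePointCount(decompose(precomposed))); // 2
        System.out.println(codePointCount(compose(sequence)));      // 2, stays a sequence
        System.out.print(describe(sequence));
    }
}
```

A letter such as the second one can only be rendered correctly if the accent glyph is placed relative to the base glyph, which is the job of the glyph layout step described on this page.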

OpenPDF can process such texts starting with release 1.3.24.

Internally, OpenPDF uses the built-in Java2D routines for glyph layout, reordering, and substitution. On Java 9 and newer, these routines rely on the HarfBuzz shaping library.

DIN 91379

We tested this approach with letters conforming to "DIN 91379: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" (and its predecessor DIN SPEC 91379), which describes a subset of Unicode consisting mainly of Latin letters and diacritic signs. This standard will be mandatory for the data exchange of the German administration with citizens and businesses from November 2024.

Non-Latin scripts

Processing text in other languages and scripts with this approach should be possible. You are invited to try it and share the results.

Usage

1. Step: Enable advanced glyph layout using kerning and ligatures (since 1.3.31)

import com.lowagie.text.pdf.LayoutProcessor;
...
    LayoutProcessor.enableKernLiga(); // since 1.3.31

Versions before 1.3.31 do not support kerning and ligatures. With these versions, enable glyph layout as follows:

    LayoutProcessor.enable(); // before 1.3.31

2. Step: Provide an OpenType font

Provide an OpenType font containing the necessary characters and positioning information, e.g. a font from the Google Noto fonts.

import com.lowagie.text.Font;
import com.lowagie.text.FontFactory;
import com.lowagie.text.pdf.BaseFont;
...
    float fontSize = 12.0f;
    String fontDir = "com/lowagie/examples/fonts/";
    FontFactory.register(fontDir + "noto/NotoSans-Regular.ttf", "notoSans");
    Font notoSans = FontFactory.getFont("notoSans", BaseFont.IDENTITY_H, true, fontSize);

Java's Bidi class is used to deduce the text direction for each chunk of text. Optionally, you can specify the text direction per font.

    FontFactory.register(fontDir + "noto/NotoSansArabic-Regular.ttf", "notoSansArabic");
    Font notoSansArabic = FontFactory.getFont("notoSansArabic", BaseFont.IDENTITY_H, true, fontSize);
    LayoutProcessor.setRunDirectionRtl(notoSansArabic);
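To give an idea of what deducing the text direction means, here is a minimal sketch that calls java.text.Bidi directly. It does not show how OpenPDF invokes Bidi internally. The class name, the method names, and the choice of DIRECTION_DEFAULT_LEFT_TO_RIGHT as the fallback are assumptions made for this illustration.

```java
import java.text.Bidi;

// Hypothetical demo class; not part of OpenPDF.
class BidiDirectionDemo {

    /** Analyses the text; falls back to left-to-right if it has no strong directional character. */
    private static Bidi analyse(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
    }

    /** True if the base direction deduced from the text is right-to-left. */
    static boolean isBaseRtl(String text) {
        return !analyse(text).baseIsLeftToRight();
    }

    /** True if the text contains both left-to-right and right-to-left runs. */
    static boolean isMixed(String text) {
        return analyse(text).isMixed();
    }

    /** Number of directional runs found in the text. */
    static int runCount(String text) {
        return analyse(text).getRunCount();
    }

    public static void main(String[] args) {
        String latin = "OpenPDF";
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // an Arabic word
        String mixed = latin + " " + arabic;

        System.out.println(isBaseRtl(latin));  // false
        System.out.println(isBaseRtl(arabic)); // true
        System.out.println(isMixed(mixed));    // true
        System.out.println(runCount(mixed));
    }
}
```

Setting the direction per font, as shown above, can be useful when the automatic deduction does not give the wanted result for a chunk of text.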

3. Step: Further processing

Process the document or form as usual.

Examples

Producing a document

GlyphLayoutDocumentDin91379.java

Processing a form

GlyphLayoutFormDin91379.java

Producing a document with bidirectional text

GlyphLayoutDocumentBidi.java

Specify direction per font

GlyphLayoutDocumentBidiPerFont.java

Specify kerning and ligatures per font

GlyphLayoutDocumentKernLigaPerFont.java

References

  1. DIN 91379 (English Wikipedia)
  2. DIN 91379 (German Wikipedia)
  3. DIN 91379 Characters and Sequences (GitHub)
  4. String.Latin+ 1.2 (extended and commented version of DIN SPEC 91379 in German, free download)
  5. DIN SPEC 91379: Characters in Unicode for the electronic processing of names and data exchange in Europe (free download after registration)
  6. DIN 91379:2022-08: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM (access chargeable)
  7. Decision of IT Planungsrat 2022/51 (in German)
  8. Noto Latin, Greek, Cyrillic fonts, see Google, GitHub
  9. HarfBuzz text shaping library