Skip to content
Implementation of the Unicode Standards Annex #9's bidirectional text algorithm
Common Lisp
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
docs
BidiBrackets.txt
BidiCharacterTest.txt
BidiMirroring.txt
BidiTest.txt
DerivedBidiClass.txt
LICENSE
README.mess
compile.lisp
database.lisp
documentation.lisp
isolating-run-sequence.lisp
package.lisp
staple.ext.lisp
status-stack.lisp
test.lisp
uax-9-test.asd
uax-9.asd
uax-9.lisp

README.mess

## About UAX-9
This is an implementation of the "Unicode Standards Annex #9"(https://www.unicod.org/reports/tr9/)'s bidirectional text algorithm. It provides a convenient way to handle text bidirectionality.

Bidirectional text occurs when text of different directionality is mixed. For instance, if arabic text, which is typically right-to-left, intersperses roman numerals, which is typically left-to-right, then the roman numerals need to be rendered in reversed order to produce the correct display order. 

The Unicode Bidirectional algorithm implemented by this library handles the reordering of such text into a canonical order that can then be used by text rendering engines to produce correctly laid out text.

Note that this algorithm does not analyse line breaking. You must provide the appropriate line breaking opportunities yourself, see "UAX-14"(https://shinmera.github.io/uax-14/). The algorithm will also not handle paragraph breaks, but instead expects you to deliver properly segmented strings for analysis.

## How To
The system will compile binary database files on first load. Should anything go wrong during this process, a note is produced on load. If you would like to prevent this automated loading, push ``uax-9-no-load`` to ``*features*`` before loading. You can then manually load the database files when convenient through ``load-databases``.

Once loaded, you can compute the line breaking levels of a string with the ``levels`` function. To use this information and produce a reordering index vector, pass its result to the ``reorder`` function. Note that when iterating through these characters, the level of the character needs to be taken into consideration, as some characters need to be mirrored when right-to-left oriented. You can detect right-to-left levels by testing whether they are odd. You can then retrieve the potentially mirrored variant of the character through ``mirror-at``.

Alternatively you can also iterate through the string directly in the correct character order (including mirroring) using ``do-in-order``. Also note that some characters will require manual mirroring in the rendering engine as no equivalent mirrored characters exist in Unicode.

## External Files
The following files were retrieved from external resources, last accessed on 4.9.2019.

- ``BidiBrackets.txt`` https://www.unicode.org/Public/UCD/latest/ucd/BidiBrackets.txt
- ``BidiCharacterTest.txt`` https://www.unicode.org/Public/UCD/latest/ucd/BidiCharacterTest.txt
- ``BidiMirroring.txt`` https://www.unicode.org/Public/UCD/latest/ucd/BidiMirroring.txt
- ``BidiTest.txt`` https://www.unicode.org/Public/UCD/latest/ucd/BidiTest.txt
- ``DerivedBidiClass.txt`` https://www.unicode.org/Public/UCD/latest/ucd/DerivedBidiClass.txt

At the time, Unicode 12.1 was considered the latest version.
You can’t perform that action at this time.