Skip to content

Latest commit

 

History

History
110 lines (70 loc) · 3.11 KB

charmap.rst

File metadata and controls

110 lines (70 loc) · 3.11 KB

camel_tools.utils.charmap

Contains the CharMapper class for mapping characters in a Unicode string to other strings.

Classes

camel_tools.utils.charmap.CharMapper

camel_tools.utils.charmap.InvalidCharMapKeyError

camel_tools.utils.charmap.BuiltinCharMapNotFoundError

JSON File Structure

JSON files to be used with CharMapper should have the following format:

{
    "default": "",

    "charMap": {
        "a": "z",
        "b-g": "",
        "x-z": null
    }
}

The root object in the file should be a dictionary with two keys: 'default' and 'charMap'. These correspond to and follow the same restrictions as the respective input parameters to the CharMapper constructor (with null in the JSON file corresponding to None in Python).

Built-in mappings

Below is a listing of built-in mappings:

Arabic Transliteration

  • ar2bw Transliterates Arabic text to Buckwalter scheme.
  • ar2safebw Transliterates Arabic text to Safe Buckwalter scheme.
  • ar2xmlbw Transliterates Arabic text to XML Buckwalter scheme.
  • ar2hsb Transliterates Arabic text to Habash-Soudi-Buckwalter scheme.
  • bw2ar Transliterates Buckwalter scheme text to Arabic.
  • bw2safebw Transliterates Buckwalter scheme text to Safe Buckwalter scheme.
  • bw2xmlbw Transliterates Buckwalter scheme text to XML Buckwalter scheme.
  • bw2hsb Transliterates Buckwalter scheme text to Habash-Soudi-Buckwalter scheme.
  • safebw2ar Transliterates Safe Buckwalter scheme text to Arabic.
  • safebw2bw Transliterates Safe Buckwalter scheme text to Buckwalter scheme.
  • safebw2xmlbw Transliterates Safe Buckwalter scheme text to XML Buckwalter scheme.
  • safebw2hsb Transliterates Safe Buckwalter scheme text to Habash-Soudi-Buckwalter scheme.
  • xmlbw2ar Transliterates XML Buckwalter Scheme text to Arabic.
  • xmlbw2bw Transliterates XML Buckwalter Scheme text to Buckwalter scheme.
  • xmlbw2safebw Transliterates XML Buckwalter Scheme text to Safe Buckwalter scheme.
  • xmlbw2hsb Transliterates XML Buckwalter Scheme text to Habash-Soudi-Buckwalter scheme.
  • hsb2ar Transliterates Habash-Soudi-Buckwalter scheme text to Arabic.
  • hsb2bw Transliterates Habash-Soudi-Buckwalter scheme text to Buckwalter scheme.
  • hsb2safebw Transliterates Habash-Soudi-Buckwalter scheme text to Safe Buckwalter scheme.
  • hsb2xmlbw Transliterates Habash-Soudi-Buckwalter scheme text to XML Buckwalter scheme.

See ../../reference/encoding_schemes for more information on Arabic encoding schemes.

Utility

  • arclean Cleans Arabic text by

    • Deleting characters that are not in Arabic, ASCII, or Latin-1.
    • Converting all spacing characters to an ASCII space character.
    • Converting Indic digits into Arabic digits.
    • Converting extended Arabic letters into basic Arabic letters.
    • Converting 1-char presentation froms into simple basic forms.