# [Text Processing Services](https://docs.python.org/3/library/text.html)
## [unicodedata — Unicode Database](https://docs.python.org/3/library/unicodedata.html)

This module provides access to the Unicode Character Database (UCD), which defines character properties for all Unicode characters. The data in this database is compiled from UCD version 12.1.0.

The module uses the same names and symbols as defined by Unicode Standard Annex #44, [Unicode Character Database](https://www.unicode.org/reports/tr44/). It defines the following functions:

In [3]:
import traceback as tb
import unicodedata

**_unicodedata.lookup(name)_**

Look up character by name. If a character with the given name is found, return the corresponding character. If not found, KeyError is raised.

In [8]:
try:
    print(unicodedata.lookup('LEFT CURLY BRACKET'))
except KeyError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

{
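The cell above only exercises the success path. As a small sketch of the `KeyError` path, the helper below returns a fallback value instead of raising; `safe_lookup` and its `fallback` parameter are hypothetical names made up for this illustration and are not part of the module.

```python
import unicodedata

def safe_lookup(name, fallback=None):
    """Hypothetical helper: return the character for `name`, or `fallback`."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # lookup() raises KeyError when no character has this name
        return fallback

print(safe_lookup('LEFT CURLY BRACKET'))
print(safe_lookup('NO SUCH CHARACTER NAME'))
```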


**_unicodedata.name(chr[, default])_**

Returns the name assigned to the character *chr* as a string. If no name is defined, *default* is returned, or, if not given, *ValueError* is raised.

In [14]:
try:
    print(unicodedata.name('\\'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

REVERSE SOLIDUS
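The optional *default* argument can be illustrated with a character that has no name. The sketch below uses the line feed control character; the placeholder string `'<unnamed>'` is an arbitrary choice for this example.

```python
import unicodedata

# Control characters such as the line feed have no name in the UCD,
# so name() raises ValueError unless a default is supplied.
newline_name = unicodedata.name('\n', '<unnamed>')
print(newline_name)

try:
    unicodedata.name('\n')
except ValueError as e:
    print(repr(e))
```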


**_unicodedata.decimal(chr[, default])_**

Returns the decimal value assigned to the character *chr* as integer. If no such value is defined, *default* is returned, or, if not given, *ValueError* is raised.

In [24]:
try:
    print(unicodedata.decimal('3'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

3


In [22]:
try:
    print(unicodedata.decimal('a'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

ValueError('not a decimal')


  File "<ipython-input-22-ac850855e1c4>", line 2, in <module>
    print(unicodedata.decimal('a'))


**_unicodedata.digit(chr[, default])_**

Returns the digit value assigned to the character *chr* as integer. If no such value is defined, *default* is returned, or, if not given, *ValueError* is raised.

In [20]:
try:
    print(unicodedata.digit('9'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

9


In [18]:
try:
    print(unicodedata.digit('a'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

ValueError('not a digit')


  File "<ipython-input-18-b6768cae64ec>", line 2, in <module>
    print(unicodedata.digit('a'))


**_unicodedata.numeric(chr[, default])_**

Returns the numeric value assigned to the character *chr* as float. If no such value is defined, *default* is returned, or, if not given, *ValueError* is raised.

In [25]:
try:
    print(unicodedata.numeric('9'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

9.0


In [27]:
try:
    print(unicodedata.numeric('a'))
except ValueError as e:
    print(repr(e))
    tb.print_tb(e.__traceback__)

ValueError('not a numeric character')


  File "<ipython-input-27-2be3e71a76d9>", line 2, in <module>
    print(unicodedata.numeric('a'))
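The three functions `decimal`, `digit` and `numeric` can disagree on the same character, which the ASCII examples above do not show. The sketch below compares them on a superscript and a fraction, passing `None` as the *default* so that nothing raises; `numeric_views` is a hypothetical helper name used only here.

```python
import unicodedata

def numeric_views(ch):
    """Return (decimal, digit, numeric) for ch, using None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

# '7', SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in ('7', '\u00b2', '\u00bd'):
    print(repr(ch), numeric_views(ch))
```

In this sketch the superscript two has a digit value and a numeric value but no decimal value, while the fraction has only a numeric value.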


**_unicodedata.category(chr)_**

Returns the general category assigned to the character *chr* as string.

In [28]:
unicodedata.category('A')  # 'L'etter, 'u'ppercase

'Lu'

In [30]:
unicodedata.category('a')

'Ll'

In [35]:
unicodedata.category('2')

'Nd'

In [32]:
unicodedata.category('?')

'Po'
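One common use of the general category is filtering text by character class. As a minimal sketch, the hypothetical helper `strip_punctuation` below drops every character whose category starts with `'P'` (the punctuation categories, such as the `'Po'` shown above).

```python
import unicodedata

def strip_punctuation(text):
    """Hypothetical helper: remove characters in any 'P*' general category."""
    return ''.join(
        ch for ch in text
        if not unicodedata.category(ch).startswith('P')
    )

print(strip_punctuation('Hello, world!'))
```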

**_unicodedata.bidirectional(chr)_**

Returns the bidirectional class assigned to the character *chr* as string. If no such value is defined, an empty string is returned.

In [36]:
unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber

'AN'

In [41]:
'\u0660'

'٠'

**_unicodedata.combining(chr)_**

Returns the canonical combining class assigned to the character *chr* as integer. Returns 0 if no combining class is defined.

In [43]:
unicodedata.combining('A')

0
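A non-zero result is easier to see with an actual combining mark. The sketch below queries the base letter `e` followed by U+0301 COMBINING ACUTE ACCENT; the variable names are illustrative only.

```python
import unicodedata

acute = '\u0301'  # COMBINING ACUTE ACCENT
# One entry per code point: the base letter, then the combining mark
classes = [unicodedata.combining(ch) for ch in 'e' + acute]
print(classes)
```

The base letter reports class 0, while the accent reports a non-zero canonical combining class.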

**_unicodedata.east_asian_width(chr)_**

Returns the East Asian width assigned to the character *chr* as string.
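As a rough sketch of how this property might be used, the hypothetical helper `display_width` below estimates how many terminal columns a string occupies by counting Wide (`'W'`) and Fullwidth (`'F'`) characters as two columns. It is a simplification for illustration: it ignores combining marks, zero-width characters and the Ambiguous (`'A'`) class.

```python
import unicodedata

def display_width(text):
    """Hypothetical helper: 2 columns for Wide/Fullwidth characters, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
        for ch in text
    )

print(unicodedata.east_asian_width('A'))       # a narrow ASCII letter
print(unicodedata.east_asian_width('\u4e2d'))  # a CJK ideograph
print(display_width('abc'), display_width('\u4e2d\u6587'))
```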

**_unicodedata.mirrored(chr)_**

Returns the mirrored property assigned to the character *chr* as integer. Returns 1 if the character has been identified as a “mirrored” character in bidirectional text, 0 otherwise.
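A short illustration, assuming nothing beyond the function itself: a parenthesis is mirrored in right-to-left text, while an ordinary letter is not.

```python
import unicodedata

paren_mirrored = unicodedata.mirrored('(')   # brackets flip in RTL text
letter_mirrored = unicodedata.mirrored('A')  # letters do not
print(paren_mirrored, letter_mirrored)
```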

**_unicodedata.decomposition(chr)_**

Returns the character decomposition mapping assigned to the character *chr* as string. An empty string is returned in case no such mapping is defined.

In [48]:
'\ufb01'

'ﬁ'

In [47]:
unicodedata.decomposition('\ufb01')

'<compat> 0066 0069'

In [49]:
'\u0066' + '\u0069'

'fi'

**_unicodedata.normalize(form, unistr)_**

Return the normal form *form* for the Unicode string *unistr*. Valid values for *form* are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.

In [50]:
s = '\ufb01' # A single character
unicodedata.normalize('NFKC', s)

'fi'
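The example above uses a compatibility form. To illustrate the canonical forms as well, the sketch below round-trips an accented letter through NFD (decomposed) and NFC (composed); the variable names are illustrative only.

```python
import unicodedata

composed = '\u00e9'  # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = unicodedata.normalize('NFD', composed)    # 'e' + combining accent
recomposed = unicodedata.normalize('NFC', decomposed)  # back to one code point

print(len(composed), len(decomposed))
print(composed == decomposed, composed == recomposed)
```

The two strings render identically but compare unequal until they are brought to the same normal form, which is why normalizing before comparison is often useful.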

**_unicodedata.is_normalized(form, unistr)_**

Return whether the Unicode string *unistr* is in the normal form *form*. Valid values for *form* are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.

In [52]:
unicodedata.is_normalized('NFKD', s)

False

In [53]:
unicodedata.is_normalized('NFKD', unicodedata.normalize('NFKC', s))

True
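One possible use of this check is to skip normalization when a string is already in the target form. The sketch below assumes Python 3.8 or later, where `is_normalized` is available; `to_nfc` is a hypothetical helper name.

```python
import unicodedata

def to_nfc(text):
    """Hypothetical helper: normalize to NFC only when needed."""
    if unicodedata.is_normalized('NFC', text):
        return text  # already in NFC, return unchanged
    return unicodedata.normalize('NFC', text)

print(to_nfc('abc') == 'abc')
print(len(to_nfc('e\u0301')))
```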

In addition, the module exposes the following constant:

**_unicodedata.unidata_version_**

The version of the Unicode database used in this module.

In [54]:
unicodedata.unidata_version

'12.1.0'

**_unicodedata.ucd_3_2_0_**

This is an object that has the same methods as the entire module, but uses the Unicode database version 3.2 instead, for applications that require this specific version of the Unicode database (such as IDNA).
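As a brief sketch of the difference, the cell below queries the same character through the module and through `ucd_3_2_0`. It uses U+1F600 GRINNING FACE, which entered Unicode well after version 3.2, and passes `None` as the *default* so the older database does not raise.

```python
import unicodedata

old = unicodedata.ucd_3_2_0
print(old.unidata_version)

grinning = '\U0001F600'  # GRINNING FACE, not present in Unicode 3.2
current_name = unicodedata.name(grinning, None)
old_name = old.name(grinning, None)  # unassigned in 3.2, so the default is returned
print(current_name, old_name)
```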