# Numerals in Unicode

I want to see a table of all characters in Unicode that have an integer value.

To do this, I parsed the [Unicode Character Database][UCD] and wrote my own over-engineered classes to represent the data. I did not use Python's built-in [`unicodedata`][unicodedata] module, because its data is fixed to the Unicode version bundled with the interpreter and I wanted data from the latest standard. I also wanted script data, which is not present in the `unicodedata` module.

[UCD]: https://www.unicode.org/reports/tr44/
[unicodedata]: https://docs.python.org/3/library/unicodedata.html
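For comparison, here is a minimal sketch of what the built-in module does offer. It is not used anywhere in this notebook; the helper name `describe` is made up for illustration, and the reported Unicode version depends on the Python release you run it with.

```python
import unicodedata


def describe(ch):
    """Return (name, general category, numeric value) for one character."""
    # unicodedata.numeric always returns a float, even for whole numbers.
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.numeric(ch)


# Unicode version bundled with this interpreter (varies by Python release).
print(unicodedata.unidata_version)

# ARABIC-INDIC DIGIT FIVE: name, category, and value are available,
# but the module has no function that returns the character's script.
print(describe("\u0665"))
```

The name, General_Category, and numeric value are all there, but the script would still have to come from `Scripts.txt` in the UCD.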

In [1]:
from codepoint import Codepoint

def desired_numeral(cp):
    """Keep numbers (General_Category N*) that have a numeric type and no decomposition."""
    # NOTE: there will be mismatches if the General_Category is obtained from unicodedata
    # and not the downloaded UnicodeData.txt file.
    return cp.general_category[0] == "N" and bool(cp.numeric_type) and cp.decomposition == ""

columns = ["Codepoint", "Character", "Name", "GC", "Script", "Bidi", "Type", "Value"]
# One row per codepoint that passes the filter and whose numeric value is an integer.
records = [
    (str(cp), cp.character, cp.name, cp.general_category, cp.script,
     cp.bidirectional_class, cp.numeric_type, value)
    for cp in Codepoint.iterate_all_codepoints()
    if desired_numeral(cp) and isinstance((value := cp.numeric_value), int)
]
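If you do not have the `codepoint` module, the same filter can be roughly approximated with the standard library alone. This is only a sketch, not the code behind the table: `approx_integer_numerals` is a hypothetical helper, the results depend on the Unicode version bundled with your Python, and the script column is missing because `unicodedata` cannot supply it.

```python
import unicodedata


def approx_integer_numerals():
    """Yield (codepoint, character, name, category, value) using only the stdlib."""
    for n in range(0x110000):
        ch = chr(n)
        category = unicodedata.category(ch)
        # Same idea as desired_numeral: a number category and no decomposition mapping.
        if category[0] != "N" or unicodedata.decomposition(ch) != "":
            continue
        # numeric() returns a float; the default None marks "no numeric value".
        value = unicodedata.numeric(ch, None)
        if value is not None and value.is_integer():
            yield (f"U+{n:04X}", ch, unicodedata.name(ch, ""), category, int(value))


rows = list(approx_integer_numerals())
print(len(rows))  # count depends on the bundled Unicode version
print(rows[0])
```

Fractions such as `½` and compatibility characters such as `²` or the Roman numeral `Ⅷ` are skipped, because they either have a non-integer value or carry a decomposition mapping.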

In [2]:
import pandas as pd

df = pd.DataFrame.from_records(records, columns=columns, index=["Codepoint"])

## Table

Here are _all_ of the numerals in Unicode with an assigned integer value, excluding characters that have a decomposition mapping (such as compatibility forms).

In [3]:
pd.set_option('display.max_rows', None)
df

| Codepoint | Character | Name | GC | Script | Bidi | Type | Value |
|---|---|---|---|---|---|---|---|
| U+0030 | 0 | DIGIT ZERO | Nd | Common | EN | Decimal | 0 |
| U+0031 | 1 | DIGIT ONE | Nd | Common | EN | Decimal | 1 |
| U+0032 | 2 | DIGIT TWO | Nd | Common | EN | Decimal | 2 |
| U+0033 | 3 | DIGIT THREE | Nd | Common | EN | Decimal | 3 |
| U+0034 | 4 | DIGIT FOUR | Nd | Common | EN | Decimal | 4 |
| U+0035 | 5 | DIGIT FIVE | Nd | Common | EN | Decimal | 5 |
| U+0036 | 6 | DIGIT SIX | Nd | Common | EN | Decimal | 6 |
| U+0037 | 7 | DIGIT SEVEN | Nd | Common | EN | Decimal | 7 |
| U+0038 | 8 | DIGIT EIGHT | Nd | Common | EN | Decimal | 8 |
| U+0039 | 9 | DIGIT NINE | Nd | Common | EN | Decimal | 9 |


## See also

 - [Numerals in Unicode (Wikipedia)](https://en.wikipedia.org/wiki/Numerals_in_Unicode)
 - [Unicode Character Database (data files)](https://www.unicode.org/Public/14.0.0/ucd/)