Query the Unicode database from the commandline

This project is considered experimental.

uni queries the Unicode database from the commandline.

There are three commands: identify prints Unicode information about a string, search searches for codepoints, and print prints specific codepoints or groups of codepoints (such as a general category or block).

Install it with go get arp242.net/uni, which puts the binary at ~/go/bin/uni (with the default GOPATH).

Integrations

There is a dmenu example script at dmenu-uni. This can also be used with rofi or similar programs.

You can add a :UnicodeName command to Vim with:

command! UnicodeName echo
        \ system('uni -q i',
        \      [strcharpart(strpart(getline('.'), col('.') - 1), 0, 1)]
        \ )[:-2]

Usage

Identify a character:

$ uni identify €
     cpoint  dec    utf-8       html       name
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN

Or an entire string. i is a shortcut for identify:

$ uni i h€łłø
     cpoint  dec    utf-8       html       name
'h'  U+0068  104    68          &#x68;     LATIN SMALL LETTER H
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
'ł'  U+0142  322    c5 82       &lstrok;   LATIN SMALL LETTER L WITH STROKE
'ł'  U+0142  322    c5 82       &lstrok;   LATIN SMALL LETTER L WITH STROKE
'ø'  U+00F8  248    c3 b8       &oslash;   LATIN SMALL LETTER O WITH STROKE
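
The cpoint, dec and utf-8 columns can be derived from a string with Go's standard library alone. The following is a minimal sketch of that derivation, not uni's actual implementation; the helper names utf8Hex and describe are made up for this example, and the html and name columns are left out because they need lookup tables.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// utf8Hex returns the UTF-8 encoding of r as space-separated hex bytes.
// (Hypothetical helper for illustration.)
func utf8Hex(r rune) string {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	parts := make([]string, n)
	for i, b := range buf[:n] {
		parts[i] = fmt.Sprintf("%02x", b)
	}
	return strings.Join(parts, " ")
}

// describe formats the character, codepoint, decimal value and UTF-8 bytes
// of a single rune. (Hypothetical helper for illustration.)
func describe(r rune) string {
	return fmt.Sprintf("%q  %U  %-6d %s", r, r, r, utf8Hex(r))
}

func main() {
	// Ranging over a string yields runes (codepoints), not bytes.
	for _, r := range "h€łłø" {
		fmt.Println(describe(r))
	}
}
```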

Identify the character at a byte offset in a file (useful for editor integration):

$ uni i 'README.markdown:#0'
      cpoint  dec    utf-8       html       name
 '`'  U+0060  96     60          &grave;    GRAVE ACCENT

Or a range from a file:

$ uni i 'README.markdown:#0-4'
     cpoint  dec    utf-8       html       name
'`'  U+0060  96     60          &grave;    GRAVE ACCENT
'u'  U+0075  117    75          &#x75;     LATIN SMALL LETTER U
'n'  U+006E  110    6e          &#x6e;     LATIN SMALL LETTER N
'i'  U+0069  105    69          &#x69;     LATIN SMALL LETTER I
'`'  U+0060  96     60          &grave;    GRAVE ACCENT

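Note that these are byte offsets, not character offsets. A single offset that lands on a multi-byte character reads only one of its bytes, which is not valid UTF-8 on its own; use a range that covers all of the character's bytes: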

$ uni i 'README.markdown:#130'
uni: WARNING: input string is not valid UTF-8
     cpoint  dec    utf-8       html       name
'�'  U+FFFD  65533  ef bf bd    &#xfffd;   REPLACEMENT CHARACTER

$ uni i 'README.markdown:#130-132'
     cpoint  dec    utf-8       html       name
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
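
The difference between byte and character offsets can be reproduced in a few lines of Go. This is a sketch under the assumption that a range such as #130-132 is inclusive on both ends, as the output above suggests; decodeRange is a hypothetical helper and the sample data stands in for a file's contents.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeRange returns the bytes data[start..end] (inclusive) as a string and
// reports whether they form valid UTF-8. (Hypothetical helper.)
func decodeRange(data []byte, start, end int) (string, bool) {
	chunk := data[start : end+1]
	return string(chunk), utf8.Valid(chunk)
}

func main() {
	// 'a' is byte 0, '€' occupies bytes 1-3, 'b' is byte 4.
	data := []byte("a€b")

	// A single offset inside '€' covers only its first byte.
	_, ok := decodeRange(data, 1, 1)
	fmt.Println("offset 1 valid:", ok)

	// A range covering all three bytes decodes cleanly.
	s, ok := decodeRange(data, 1, 3)
	fmt.Println("offset 1-3:", s, "valid:", ok)
}
```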

Search by name:

$ uni search euro
     cpoint  dec    utf-8       html       name
'₠'  U+20A0  8352   e2 82 a0    &#x20a0;   EURO-CURRENCY SIGN (Currency_Symbol)
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN (Currency_Symbol)
'𐡷'  U+10877 67703  f0 90 a1 b7 &#x10877;  PALMYRENE LEFT-POINTING FLEURON (Other_Symbol)
'𐡸'  U+10878 67704  f0 90 a1 b8 &#x10878;  PALMYRENE RIGHT-POINTING FLEURON (Other_Symbol)
'𐫱'  U+10AF1 68337  f0 90 ab b1 &#x10af1;  MANICHAEAN PUNCTUATION FLEURON (Other_Punctuation)
'🌍' U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA (Other_Symbol)
'🏤' U+1F3E4 127972 f0 9f 8f a4 &#x1f3e4;  EUROPEAN POST OFFICE (Other_Symbol)
'🏰' U+1F3F0 127984 f0 9f 8f b0 &#x1f3f0;  EUROPEAN CASTLE (Other_Symbol)
'💶' U+1F4B6 128182 f0 9f 92 b6 &#x1f4b6;  BANKNOTE WITH EURO SIGN (Other_Symbol)

The s command is a shortcut for search. Multiple words are matched individually, in any order:

$ uni s earth globe
     cpoint  dec    utf-8       html       name
'🌍' U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA (Other_Symbol)
'🌎' U+1F30E 127758 f0 9f 8c 8e &#x1f30e;  EARTH GLOBE AMERICAS (Other_Symbol)
'🌏' U+1F30F 127759 f0 9f 8c 8f &#x1f30f;  EARTH GLOBE ASIA-AUSTRALIA (Other_Symbol)

$ uni s globe earth
      cpoint  dec    utf-8       html       name
'🌍'  U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🌎'  U+1F30E 127758 f0 9f 8c 8e &#x1f30e;  EARTH GLOBE AMERICAS
'🌏'  U+1F30F 127759 f0 9f 8c 8f &#x1f30f;  EARTH GLOBE ASIA-AUSTRALIA

Use standard shell quoting to match several words as a single phrase:

$ uni s rightwards black arrow
     cpoint  dec    utf-8       html       name
'➡'  U+27A1  10145  e2 9e a1    &#x27a1;   BLACK RIGHTWARDS ARROW
'➤'  U+27A4  10148  e2 9e a4    &#x27a4;   BLACK RIGHTWARDS ARROWHEAD
'➥'  U+27A5  10149  e2 9e a5    &#x27a5;   HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
[..]

$ uni s 'rightwards black arrow'
     cpoint  dec    utf-8       html       name
'⮕'  U+2B95  11157  e2 ae 95    &#x2b95;   RIGHTWARDS BLACK ARROW
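
The effect of quoting comes from the shell: unquoted words arrive as separate arguments, a quoted phrase arrives as one. A rough sketch of the matching, assuming a case-insensitive substring test per argument (uni's real matching may differ; matchAll is a hypothetical name):

```go
package main

import (
	"fmt"
	"strings"
)

// matchAll reports whether name contains every term as a substring,
// ignoring case and the order of the terms. (Hypothetical helper.)
func matchAll(name string, terms []string) bool {
	name = strings.ToUpper(name)
	for _, term := range terms {
		if !strings.Contains(name, strings.ToUpper(term)) {
			return false
		}
	}
	return true
}

func main() {
	names := []string{"BLACK RIGHTWARDS ARROW", "RIGHTWARDS BLACK ARROW"}

	unquoted := []string{"rightwards", "black", "arrow"} // uni s rightwards black arrow
	quoted := []string{"rightwards black arrow"}         // uni s 'rightwards black arrow'

	for _, n := range names {
		fmt.Printf("%-24s unquoted=%v quoted=%v\n",
			n, matchAll(n, unquoted), matchAll(n, quoted))
	}
}
```

With separate terms both names match; with the single quoted phrase only RIGHTWARDS BLACK ARROW does.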

The print command (shortcut p) can be used to print specific codepoints or groups of codepoints:

$ uni print U+2042
     cpoint  dec    utf-8       html       name
'⁂'  U+2042  8258   e2 81 82    &#x2042;   ASTERISM (Other_Punctuation)

Print a general category:

$ uni p Po
     cpoint  dec    utf-8       html       name
'!'  U+0021  33     21          &excl;     EXCLAMATION MARK (Other_Punctuation)
'"'  U+0022  34     22          &quot;     QUOTATION MARK (Other_Punctuation)
'#'  U+0023  35     23          &num;      NUMBER SIGN (Other_Punctuation)
[..]
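
Go's standard library ships its own general category tables, so a similar listing can be sketched without uni's data. This is only an illustration: firstInCategory is a hypothetical helper, and the Unicode version bundled with your Go release may differ from the one uni was generated from.

```go
package main

import (
	"fmt"
	"unicode"
)

// firstInCategory returns the first n codepoints that belong to the general
// category with the given abbreviation (for example "Po"), or nil if the
// abbreviation is unknown. (Hypothetical helper.)
func firstInCategory(abbr string, n int) []rune {
	table, ok := unicode.Categories[abbr]
	if !ok {
		return nil
	}
	var out []rune
	for r := rune(0); r <= unicode.MaxRune && len(out) < n; r++ {
		if unicode.Is(table, r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	for _, r := range firstInCategory("Po", 3) {
		fmt.Printf("%q  %U  %d\n", r, r, r)
	}
}
```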

Print blocks:

$ uni p arrows 'box drawing'
     cpoint  dec    utf-8       html       name
'←'  U+2190  8592   e2 86 90    &larr;     LEFTWARDS ARROW (Math_Symbol)
'↑'  U+2191  8593   e2 86 91    &uarr;     UPWARDS ARROW (Math_Symbol)
'→'  U+2192  8594   e2 86 92    &rarr;     RIGHTWARDS ARROW (Math_Symbol)
'↓'  U+2193  8595   e2 86 93    &darr;     DOWNWARDS ARROW (Math_Symbol)
[..]
'─'  U+2500  9472   e2 94 80    &boxh;     BOX DRAWINGS LIGHT HORIZONTAL (Other_Symbol)
'━'  U+2501  9473   e2 94 81    &#x2501;   BOX DRAWINGS HEAVY HORIZONTAL (Other_Symbol)
'│'  U+2502  9474   e2 94 82    &boxv;     BOX DRAWINGS LIGHT VERTICAL (Other_Symbol)
'┃'  U+2503  9475   e2 94 83    &#x2503;   BOX DRAWINGS HEAVY VERTICAL (Other_Symbol)
[..]