Query the Unicode database from the commandline

`uni` prints Unicode information about strings.

Install it with `go get arp242.net/uni`.

Identify a character:

```
$ uni identify €
     cpoint  dec    utf-8      html       name
'€'  U+20AC  8364   0xe282ac   &euro;     EURO SIGN
```
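
Each column can be derived from the rune itself. The following is a minimal sketch using only Go's standard library, not uni's own code in `uni.go`; the function name `describe` is hypothetical. It always emits a numeric character reference for the html column, while uni may print a named entity where one exists.

```go
package main

import "fmt"

// describe returns the cpoint, dec, utf-8 and html columns for one rune.
// The html value is always a numeric character reference; uni may print a
// named entity instead where one exists.
func describe(r rune) (cpoint, dec, utf8Hex, html string) {
	cpoint = fmt.Sprintf("%U", r)           // "U+" plus at least four uppercase hex digits
	dec = fmt.Sprintf("%d", r)              // a rune is an int32 holding the code point
	utf8Hex = fmt.Sprintf("% x", string(r)) // string(r) is the UTF-8 encoding; "% x" spaces the bytes
	html = fmt.Sprintf("&#x%x;", r)         // hexadecimal numeric character reference
	return cpoint, dec, utf8Hex, html
}

func main() {
	fmt.Println(describe('€')) // U+20AC 8364 e2 82 ac &#x20ac;
}
```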

Or an entire string. `i` is a shortcut for `identify`:

```
$ uni i h€łłø
     cpoint  dec    utf-8       html       name
'h'  U+0068  104    68          &#x68;     LATIN SMALL LETTER H
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
'ł'  U+0142  322    c5 82       &lstrok;   LATIN SMALL LETTER L WITH STROKE
'ł'  U+0142  322    c5 82       &lstrok;   LATIN SMALL LETTER L WITH STROKE
'ø'  U+00F8  248    c3 b8       &oslash;   LATIN SMALL LETTER O WITH STROKE
```

Identify the character at a byte offset in a file (useful for editor integration):

```
$ uni i 'README.markdown:#0'
     cpoint  dec    utf-8       html       name
'`'  U+0060  96     60          &grave;    GRAVE ACCENT
```

Or a byte range from a file (both ends inclusive):

```
$ uni i 'README.markdown:#0-4'
     cpoint  dec    utf-8       html       name
'`'  U+0060  96     60          &grave;    GRAVE ACCENT
'u'  U+0075  117    75          &#x75;     LATIN SMALL LETTER U
'n'  U+006E  110    6e          &#x6e;     LATIN SMALL LETTER N
'i'  U+0069  105    69          &#x69;     LATIN SMALL LETTER I
'`'  U+0060  96     60          &grave;    GRAVE ACCENT
```
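
The `path:#start` and `path:#start-end` arguments can be handled by splitting on the last `:#` and reading the selected bytes. This is a sketch under assumptions: the accepted syntax is inferred from the examples above (with `#0-4` yielding five characters, so the end is inclusive), and `parseSpec` and `readSpan` are hypothetical names, not uni's API. It needs Go 1.18 or later for `strings.Cut`.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSpec splits an argument such as "README.markdown:#0-4" into a path
// and an inclusive byte range; "path:#N" is treated as the range N..N.
// The exact syntax uni accepts is assumed from the README examples.
func parseSpec(spec string) (path string, start, end int, err error) {
	i := strings.LastIndex(spec, ":#")
	if i < 0 {
		return "", 0, 0, errors.New("missing \":#\" offset")
	}
	path = spec[:i]
	lo, hi, isRange := strings.Cut(spec[i+2:], "-")
	if start, err = strconv.Atoi(lo); err != nil {
		return "", 0, 0, err
	}
	end = start
	if isRange {
		if end, err = strconv.Atoi(hi); err != nil {
			return "", 0, 0, err
		}
	}
	if start < 0 || end < start {
		return "", 0, 0, fmt.Errorf("invalid byte range %d-%d", start, end)
	}
	return path, start, end, nil
}

// readSpan returns bytes start through end (inclusive) of the file at path.
func readSpan(path string, start, end int) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	if end >= len(data) {
		return nil, fmt.Errorf("offset %d is past the end of %s (%d bytes)", end, path, len(data))
	}
	return data[start : end+1], nil
}

func main() {
	path, start, end, err := parseSpec("README.markdown:#0-4")
	if err == nil {
		var b []byte
		if b, err = readSpan(path, start, end); err == nil {
			fmt.Printf("%q\n", b)
		}
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```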

Note that these are byte offsets, not character offsets. Byte 130 on its own is only the first byte of the three-byte `€`, so it is not valid UTF-8:

```
$ uni i 'README.markdown:#130'
uni: WARNING: input string is not valid UTF-8
     cpoint  dec    utf-8       html       name
'�'  U+FFFD  65533  ef bf bd    &#xfffd;   REPLACEMENT CHARACTER
```

```
$ uni i 'README.markdown:#130-132'
     cpoint  dec    utf-8       html       name
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
```

Search character names:

```
$ uni search euro
     cpoint  dec    utf-8       html       name
'₠'  U+20A0  8352   e2 82 a0    &#x20a0;   EURO-CURRENCY SIGN
'€'  U+20AC  8364   e2 82 ac    &euro;     EURO SIGN
'𐡷'  U+10877 67703  f0 90 a1 b7 &#x10877;  PALMYRENE LEFT-POINTING FLEURON
'𐡸'  U+10878 67704  f0 90 a1 b8 &#x10878;  PALMYRENE RIGHT-POINTING FLEURON
'𐫱'  U+10AF1 68337  f0 90 ab b1 &#x10af1;  MANICHAEAN PUNCTUATION FLEURON
'🌍'  U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🏤'  U+1F3E4 127972 f0 9f 8f a4 &#x1f3e4;  EUROPEAN POST OFFICE
'🏰'  U+1F3F0 127984 f0 9f 8f b0 &#x1f3f0;  EUROPEAN CASTLE
'💶'  U+1F4B6 128182 f0 9f 92 b6 &#x1f4b6;  BANKNOTE WITH EURO SIGN
```
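
The results suggest a case-insensitive substring match on the character name: FLEURON matches because it contains the letters E-U-R-O. The following is a sketch of that behaviour over a tiny hypothetical table (`names`, mostly copied from the output above, plus one entry that should not match); uni itself searches the full Unicode database in `unidata.go`, whose structure is not assumed here. Results are sorted by code point because Go map iteration order is random.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// names is a tiny hypothetical stand-in for the Unicode database.
var names = map[rune]string{
	0x0041:  "LATIN CAPITAL LETTER A",
	0x20A0:  "EURO-CURRENCY SIGN",
	0x20AC:  "EURO SIGN",
	0x10877: "PALMYRENE LEFT-POINTING FLEURON",
	0x1F3E4: "EUROPEAN POST OFFICE",
}

// search returns, in code point order, every entry whose name contains
// term, ignoring case.
func search(term string) []rune {
	term = strings.ToUpper(term)
	var found []rune
	for r, name := range names {
		if strings.Contains(name, term) {
			found = append(found, r)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(found, func(i, j int) bool { return found[i] < found[j] })
	return found
}

func main() {
	for _, r := range search("euro") {
		fmt.Printf("%U  %s\n", r, names[r])
	}
}
```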

The `s` command is a shortcut for `search`. Multiple words are matched individually: every word must appear in the name, in any order:

```
$ uni s earth globe
      cpoint  dec    utf-8       html       name
'🌍'  U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🌎'  U+1F30E 127758 f0 9f 8c 8e &#x1f30e;  EARTH GLOBE AMERICAS
'🌏'  U+1F30F 127759 f0 9f 8c 8f &#x1f30f;  EARTH GLOBE ASIA-AUSTRALIA
```

```
$ uni s globe earth
      cpoint  dec    utf-8       html       name
'🌍'  U+1F30D 127757 f0 9f 8c 8d &#x1f30d;  EARTH GLOBE EUROPE-AFRICA
'🌎'  U+1F30E 127758 f0 9f 8c 8e &#x1f30e;  EARTH GLOBE AMERICAS
'🌏'  U+1F30F 127759 f0 9f 8c 8f &#x1f30f;  EARTH GLOBE ASIA-AUSTRALIA
```
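
Order independence follows if each word is checked separately and all of them must match. A sketch of that rule, assuming case-insensitive substring matching per word; `matchAll` is a hypothetical name, not a function from uni.

```go
package main

import (
	"fmt"
	"strings"
)

// matchAll reports whether every search term occurs somewhere in name,
// regardless of order. Each term is a case-insensitive substring match.
func matchAll(name string, terms []string) bool {
	name = strings.ToUpper(name)
	for _, t := range terms {
		if !strings.Contains(name, strings.ToUpper(t)) {
			return false
		}
	}
	return true
}

func main() {
	name := "EARTH GLOBE AMERICAS"
	fmt.Println(matchAll(name, []string{"earth", "globe"})) // true
	fmt.Println(matchAll(name, []string{"globe", "earth"})) // true
	fmt.Println(matchAll(name, []string{"earth", "moon"}))  // false
}
```

Requiring every word (rather than any word) is what keeps `uni s earth globe` down to the three globes instead of every name containing EARTH or GLOBE.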

Use shell quoting to match a phrase literally instead of as separate words:

```
$ uni s rightwards black arrow
     cpoint  dec    utf-8       html       name
'➡'  U+27A1  10145  e2 9e a1    &#x27a1;   BLACK RIGHTWARDS ARROW
'➤'  U+27A4  10148  e2 9e a4    &#x27a4;   BLACK RIGHTWARDS ARROWHEAD
'➥'  U+27A5  10149  e2 9e a5    &#x27a5;   HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
[..]
```

```
$ uni s 'rightwards black arrow'
     cpoint  dec    utf-8       html       name
'⮕'  U+2B95  11157  e2 ae 95    &#x2b95;   RIGHTWARDS BLACK ARROW
```
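
Quoting works because the shell, not uni, decides how the words are split: unquoted, the program receives three arguments; quoted, it receives one argument containing spaces. The sketch below assumes each command-line argument is treated as one search term (consistent with the output above, but not confirmed from uni's source) and simulates both argument lists; `matchAll` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchAll reports whether every term is a case-insensitive substring of
// name. A term may contain spaces, in which case it must match as a phrase.
func matchAll(name string, terms []string) bool {
	name = strings.ToUpper(name)
	for _, t := range terms {
		if !strings.Contains(name, strings.ToUpper(t)) {
			return false
		}
	}
	return true
}

func main() {
	// uni s rightwards black arrow: the shell passes three arguments.
	words := []string{"rightwards", "black", "arrow"}
	// uni s 'rightwards black arrow': the shell passes a single argument.
	phrase := []string{"rightwards black arrow"}

	fmt.Println(matchAll("BLACK RIGHTWARDS ARROW", words))  // true
	fmt.Println(matchAll("BLACK RIGHTWARDS ARROW", phrase)) // false
	fmt.Println(matchAll("RIGHTWARDS BLACK ARROW", phrase)) // true
}
```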