Meaning of the first column #1

mathiasbynens · 2017-03-17T22:53:01Z

The README says:

The command analyses the input and
prints then three columns: the raw byte count of the first codepoint in this
row […]

However, from the examples, and from the actual output, this doesn’t seem accurate.

Boldewyn · 2017-03-18T20:38:39Z

You're right, there was an offset error for multibyte characters. Fixed in v1.1.1. Apart from that, it should work as advertised. E.g., the example from the README:

  echo -n 'ABCDEFGHIJKLMNOP' | unidump -n 4
        0    0041 0042 0043 0044    ABCD
        4    0045 0046 0047 0048    EFGH
        8    0049 004A 004B 004C    IJKL
       12    004D 004E 004F 0050    MNOP

The first column is the zero-based byte offset in the UTF-8 string: 0 before A, 4 before E, etc. Did I inadvertently formulate that ambiguously in the usage message?

(By the way: I haven't forgotten the Unicode v10 update...)

mathiasbynens · 2017-03-21T06:13:03Z

I meant that “the raw byte count of the first codepoint [sic]” makes it sound like it counts the byte length of the first code point on that row. Instead, it seems to display the index of the first code point on that row.
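To make the distinction concrete, here is a minimal Python sketch of the "byte offset of the first code point on each row" reading. It is not unidump's actual implementation; `row_offsets` and its parameters are hypothetical names chosen for illustration, and the input is assumed to be an already decoded string.

```python
def row_offsets(text, per_row):
    """Yield (byte_offset, code_points) for each row of `per_row` code points.

    Hypothetical helper, not part of unidump. `byte_offset` is the
    zero-based position of the row's first code point in the UTF-8
    encoding of `text`, not the byte length of that code point.
    """
    offset = 0
    for start in range(0, len(text), per_row):
        row = text[start:start + per_row]
        yield offset, ["%04X" % ord(ch) for ch in row]
        # Advance by the number of UTF-8 bytes this row occupies.
        offset += len(row.encode("utf-8"))


# ASCII input: byte offsets and code point indices coincide.
for offset, code_points in row_offsets("ABCDEFGHIJKLMNOP", 4):
    print(offset, " ".join(code_points))

# Multibyte input: U+00E4 and U+00E9 each take two bytes in UTF-8,
# so the offsets no longer match the code point indices.
for offset, code_points in row_offsets("\u00e4bcd\u00e9", 2):
    print(offset, " ".join(code_points))
```

With the ASCII input the offsets are 0, 4, 8, 12, matching the README example. With the second input and two code points per row they are 0, 3, 5, whereas the code point indices would be 0, 2, 4. Multibyte input is the only case where the two readings of the first column give different numbers.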

Boldewyn · 2017-03-21T07:44:03Z

Good point, thanks! I'll rephrase it.

Boldewyn closed this as completed in f4e6ca1 Mar 18, 2017
