
Meaning of the first column #1

Closed
mathiasbynens opened this issue Mar 17, 2017 · 3 comments

Comments

@mathiasbynens

The README says:

The command analyses the input and
prints then three columns: the raw byte count of the first codepoint in this
row […]

However, from the examples, and from the actual output, this doesn’t seem accurate.

@Boldewyn
Contributor

You're right, there was an offset error for multibyte characters. Fixed in v1.1.1. Apart from that, it should work as advertised. E.g., the example from the README:

  echo -n 'ABCDEFGHIJKLMNOP' | unidump -n 4
        0    0041 0042 0043 0044    ABCD
        4    0045 0046 0047 0048    EFGH
        8    0049 004A 004B 004C    IJKL
       12    004D 004E 004F 0050    MNOP

The zero-based byte offset in the UTF-8 string is 0 before A, 4 before E, and so on. Did I inadvertently formulate that ambiguously in the usage message?
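For illustration, here is a minimal Python sketch of how such a first column can be derived, assuming rows of a fixed number of code points and a UTF-8 input. It is not unidump's actual implementation; `dump_rows` is a hypothetical helper, and the printed layout only approximates the output shown above.

```python
def dump_rows(text: str, per_row: int = 4):
    """Hypothetical helper: split `text` into rows of `per_row` code points and
    record, for each row, the zero-based UTF-8 byte offset of its first code point."""
    rows = []
    offset = 0  # running byte offset into the UTF-8 encoding of `text`
    for start in range(0, len(text), per_row):
        chunk = text[start:start + per_row]  # Python 3 str indexes by code point
        rows.append((offset, [f"{ord(c):04X}" for c in chunk], chunk))
        # Advance by the encoded size of the row, not its code point count,
        # so multibyte characters shift the following offsets correctly.
        offset += len(chunk.encode("utf-8"))
    return rows

for off, cps, chars in dump_rows("ABCDEFGHIJKLMNOP", 4):
    print(f"{off:>10}    {' '.join(cps)}    {chars}")
```

For pure ASCII input the byte offset and the code point index coincide (0, 4, 8, 12); with multibyte characters they diverge, which is where an off-by-some error in the offset column would show up.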

(By the way: I haven't forgotten the Unicode v10 update...)

@mathiasbynens
Author

I meant that “the raw byte count of the first codepoint [sic]” makes it sound like it counts the byte length of the first code point on that row. Instead, it seems to display the index of the first code point on that row.
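To make the two readings concrete, the following Python sketch (a hypothetical helper, not part of unidump) computes both candidates for each row: the byte offset of the row's first code point, and the byte length of that code point. A string mixing 1-, 2-, 3- and 4-byte UTF-8 characters shows how different they are.

```python
def first_column_candidates(text: str, per_row: int):
    """Hypothetical helper: for each row of `per_row` code points, return
    (byte offset of the row's first code point, byte length of that code point)."""
    result = []
    offset = 0  # zero-based byte offset into the UTF-8 encoding of `text`
    for start in range(0, len(text), per_row):
        chunk = text[start:start + per_row]
        first_len = len(chunk[0].encode("utf-8"))  # "byte count of the first code point"
        result.append((offset, first_len))
        offset += len(chunk.encode("utf-8"))       # bytes consumed by this row
    return result

# 'a' (1 byte), 'é' (2 bytes), '€' (3 bytes), U+1F600 (4 bytes)
print(first_column_candidates("a\u00e9\u20ac\U0001F600", 1))
# [(0, 1), (1, 2), (3, 3), (6, 4)]
```

Only the first element of each pair matches what the README example prints as the first column.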

@Boldewyn
Contributor

Good point, thanks! I'll rephrase it.
