A high-performance Go package for measuring the monospace display width of strings, UTF-8 bytes, and runes.
go get github.com/clipperhouse/displaywidth
package main

import (
	"fmt"

	"github.com/clipperhouse/displaywidth"
)

func main() {
	width := displaywidth.String("Hello, 世界!")
	fmt.Println(width)

	width = displaywidth.Bytes([]byte("🌍"))
	fmt.Println(width)

	width = displaywidth.Rune('🌍')
	fmt.Println(width)
}
You can specify East Asian Width and Strict Emoji Neutral settings. If
unspecified, the default is EastAsianWidth: false, StrictEmojiNeutral: true.
options := displaywidth.Options{
	EastAsianWidth:     true,
	StrictEmojiNeutral: false,
}

width := options.String("Hello, 世界!")
fmt.Println(width)
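To illustrate what the East Asian Width setting changes, here is a minimal sketch that assumes the usual UAX #11 convention: characters classified as Ambiguous take one column in a narrow context and two in an East Asian context. The `widthOptions` type, its methods, and the three-entry `ambiguous` table are hypothetical and only demonstrate the semantics; they are not this package's API or implementation, and the sketch decodes runes for simplicity.

```go
package main

import "fmt"

// widthOptions is a hypothetical stand-in used only for this sketch.
type widthOptions struct {
	EastAsianWidth bool
}

// ambiguous lists a few runes that UAX #11 classifies as Ambiguous (A).
// A real table is generated from the Unicode data files.
var ambiguous = map[rune]bool{'§': true, '°': true, '±': true}

// runeWidth returns 2 for an ambiguous rune when EastAsianWidth is set,
// and 1 for everything else in this toy model.
func (o widthOptions) runeWidth(r rune) int {
	if o.EastAsianWidth && ambiguous[r] {
		return 2
	}
	return 1
}

// stringWidth sums runeWidth over the runes of s.
func (o widthOptions) stringWidth(s string) int {
	total := 0
	for _, r := range s {
		total += o.runeWidth(r)
	}
	return total
}

func main() {
	s := "10°C ±1"
	fmt.Println(widthOptions{EastAsianWidth: false}.stringWidth(s)) // 7 in this toy model
	fmt.Println(widthOptions{EastAsianWidth: true}.stringWidth(s))  // 9 in this toy model
}
```

The same string measures wider under the East Asian setting only because it contains ambiguous characters; plain ASCII text is unaffected.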
This package implements the Unicode East Asian Width standard (UAX #11) and is
intended to be compatible with go-runewidth. It operates on bytes without
decoding runes for better performance.
Part of my motivation is the insight that width lookups can work directly on UTF-8 bytes, which avoids the cost of decoding each rune.
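As a rough illustration of the byte-oriented idea, the sketch below finds UTF-8 sequence boundaries from the lead byte and looks widths up by the raw bytes, never converting a sequence to a rune. The names `seqLen`, `wide`, and `byteWidth` are hypothetical, and the three-entry map stands in for whatever compact lookup structure a real implementation would generate; this is not the package's actual code.

```go
package main

import "fmt"

// seqLen reports how many bytes the UTF-8 sequence starting with lead
// byte b should occupy, judged from the lead byte alone. Bytes that
// cannot start a multi-byte sequence are treated as length 1.
func seqLen(b byte) int {
	switch {
	case b&0xE0 == 0xC0:
		return 2
	case b&0xF0 == 0xE0:
		return 3
	case b&0xF8 == 0xF0:
		return 4
	}
	return 1
}

// wide is a tiny hypothetical table keyed by raw UTF-8 bytes.
var wide = map[string]bool{"世": true, "界": true, "🌍": true}

// byteWidth sums display widths by slicing s on sequence boundaries.
func byteWidth(s string) int {
	total := 0
	for i := 0; i < len(s); {
		n := seqLen(s[i])
		if i+n > len(s) {
			n = len(s) - i // truncated sequence at end of input
		}
		switch {
		case n == 1 && (s[i] < 0x20 || s[i] == 0x7F):
			// ASCII control characters occupy no columns.
		case wide[s[i:i+n]]:
			total += 2
		default:
			total++
		}
		i += n
	}
	return total
}

func main() {
	fmt.Println(byteWidth("Hello, 世界!")) // 12 with this toy table
	fmt.Println(byteWidth("🌍"))          // 2 with this toy table
}
```

Indexing the map with a substring avoids both rune decoding and allocation, which is the property the sketch is meant to show.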
go test -bench=. -benchmem
goos: darwin
goarch: arm64
pkg: github.com/clipperhouse/displaywidth
cpu: Apple M2
BenchmarkStringDefault/displaywidth-8 10537 ns/op 160.10 MB/s 0 B/op 0 allocs/op
BenchmarkStringDefault/go-runewidth-8 14162 ns/op 119.12 MB/s 0 B/op 0 allocs/op
BenchmarkString_EAW/displaywidth-8 10776 ns/op 156.55 MB/s 0 B/op 0 allocs/op
BenchmarkString_EAW/go-runewidth-8 23987 ns/op 70.33 MB/s 0 B/op 0 allocs/op
BenchmarkString_StrictEmoji/displaywidth-8 10892 ns/op 154.88 MB/s 0 B/op 0 allocs/op
BenchmarkString_StrictEmoji/go-runewidth-8 14552 ns/op 115.93 MB/s 0 B/op 0 allocs/op
BenchmarkString_ASCII/displaywidth-8 1116 ns/op 114.72 MB/s 0 B/op 0 allocs/op
BenchmarkString_ASCII/go-runewidth-8 1178 ns/op 108.67 MB/s 0 B/op 0 allocs/op
BenchmarkString_Unicode/displaywidth-8 896.9 ns/op 148.29 MB/s 0 B/op 0 allocs/op
BenchmarkString_Unicode/go-runewidth-8 1434 ns/op 92.72 MB/s 0 B/op 0 allocs/op
BenchmarkStringWidth_Emoji/displaywidth-8 3033 ns/op 238.74 MB/s 0 B/op 0 allocs/op
BenchmarkStringWidth_Emoji/go-runewidth-8 4841 ns/op 149.56 MB/s 0 B/op 0 allocs/op
BenchmarkString_Mixed/displaywidth-8 4064 ns/op 124.74 MB/s 0 B/op 0 allocs/op
BenchmarkString_Mixed/go-runewidth-8 4696 ns/op 107.97 MB/s 0 B/op 0 allocs/op
BenchmarkString_ControlChars/displaywidth-8 320.6 ns/op 102.93 MB/s 0 B/op 0 allocs/op
BenchmarkString_ControlChars/go-runewidth-8 373.8 ns/op 88.28 MB/s 0 B/op 0 allocs/op
BenchmarkRuneDefault/displaywidth-8 335.5 ns/op 411.35 MB/s 0 B/op 0 allocs/op
BenchmarkRuneDefault/go-runewidth-8 681.2 ns/op 202.58 MB/s 0 B/op 0 allocs/op
BenchmarkRuneWidth_EAW/displaywidth-8 146.7 ns/op 374.80 MB/s 0 B/op 0 allocs/op
BenchmarkRuneWidth_EAW/go-runewidth-8 495.6 ns/op 110.98 MB/s 0 B/op 0 allocs/op
BenchmarkRuneWidth_ASCII/displaywidth-8 63.00 ns/op 460.33 MB/s 0 B/op 0 allocs/op
BenchmarkRuneWidth_ASCII/go-runewidth-8 68.90 ns/op 420.91 MB/s 0 B/op 0 allocs/op
I use a similar technique in this grapheme cluster library.
displaywidth will mostly give the same outputs as go-runewidth, but there are some differences:
- Unicode category Mn (Nonspacing Mark): displaywidth will return width 0, go-runewidth may return width 1 for some runes.
- Unicode category Cf (Format): displaywidth will return width 0, go-runewidth may return width 1 for some runes.
- Unicode category Mc (Spacing Mark): displaywidth will return width 1, go-runewidth may return width 0 for some runes.
- Unicode category Cs (Surrogate): displaywidth will return width 0, go-runewidth may return width 1 for some runes. Surrogates are not valid UTF-8; some packages may turn them into the replacement character (U+FFFD).
- Unicode category Zl (Line separator): displaywidth will return width 0, go-runewidth may return width 1.
- Unicode category Zp (Paragraph separator): displaywidth will return width 0, go-runewidth may return width 1.
- Unicode Noncharacters (U+FFFE and U+FFFF): displaywidth will return width 0, go-runewidth may return width 1.
See TestCompatibility for more details.
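To check whether a given rune falls into one of the categories listed above, the standard library's `unicode` range tables are enough. The helper below is a small sketch; the `category` function name and the choice of sample runes are mine, and it reports only the classes named in the list rather than a full general category.

```go
package main

import (
	"fmt"
	"unicode"
)

// category names the first matching class from the differences list,
// or "other" if the rune belongs to none of them.
func category(r rune) string {
	if r == 0xFFFE || r == 0xFFFF {
		return "noncharacter"
	}
	classes := []struct {
		name  string
		table *unicode.RangeTable
	}{
		{"Mn", unicode.Mn},
		{"Cf", unicode.Cf},
		{"Mc", unicode.Mc},
		{"Cs", unicode.Cs},
		{"Zl", unicode.Zl},
		{"Zp", unicode.Zp},
	}
	for _, c := range classes {
		if unicode.Is(c.table, r) {
			return c.name
		}
	}
	return "other"
}

func main() {
	// Combining acute accent, zero width joiner, Devanagari visarga,
	// a surrogate, line and paragraph separators, a noncharacter, and 'A'.
	samples := []rune{0x0301, 0x200D, 0x0903, 0xD800, 0x2028, 0x2029, 0xFFFE, 'A'}
	for _, r := range samples {
		fmt.Printf("U+%04X %s\n", r, category(r))
	}
}
```

If two libraries disagree about the width of a string, running its runes through a helper like this is a quick way to see whether one of the listed categories is involved.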