-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate length of multibyte characters better #24
Conversation
If we length check on just the string when there's some foreign characters in there it can often come back as a length you wouldn't expect. This doesn't make much sense to me for validation purposes, you always want the length of a string to mean x characters, not bytes. See my Go Playground: https://play.golang.org/p/GNDIG2Tzx4y I would expect the behaviour of the runed string, not the latter when validating.
Codecov Report
|
validate.go
Outdated
@@ -439,14 +439,16 @@ func (v *Validator) HexColor(key, value string, message ...string) { | |||
func (v *Validator) Len(key, value string, min, max int, message ...string) { | |||
msg := getMessage(message, "") | |||
|
|||
runeVal := []rune(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small thing: if you're creating an intermediate variable then you might as well use length := len([]rune(value))
? That's only what it's used for.
There is also utf8.RuneCountInString()
, which seems a bit faster:
Original:
BenchmarkLenASCII-4 300000000 4.04 ns/op
BenchmarkLenMB-4 300000000 4.04 ns/op
Original PR:
BenchmarkLenASCII-4 30000000 48.1 ns/op
BenchmarkLenMB-4 20000000 66.3 ns/op
RuneCountInString:
BenchmarkLenASCII-4 50000000 26.1 ns/op
BenchmarkLenMB-4 50000000 32.7 ns/op
Not that it really matters all that much in this context, but don't see a reason not to use it?
Full patch with benchmark:
diff --git i/validate.go w/validate.go
index 64bb805..ec9846d 100644
--- i/validate.go
+++ w/validate.go
@@ -52,6 +52,7 @@ import (
"strconv"
"strings"
"time"
+ "unicode/utf8"
"github.com/teamwork/mailaddress"
)
@@ -439,16 +440,16 @@ func (v *Validator) HexColor(key, value string, message ...string) {
func (v *Validator) Len(key, value string, min, max int, message ...string) {
msg := getMessage(message, "")
- runeVal := []rune(value)
+ l := utf8.RuneCountInString(value)
switch {
- case len(runeVal) < min:
+ case l < min:
if msg != "" {
v.Append(key, msg)
} else {
v.Append(key, fmt.Sprintf(MessageLenLonger, min))
}
- case max > 0 && len(runeVal) > max:
+ case max > 0 && l > max:
if msg != "" {
v.Append(key, msg)
} else {
diff --git i/validate_test.go w/validate_test.go
index 476c2be..1319d69 100644
--- i/validate_test.go
+++ w/validate_test.go
@@ -810,3 +810,17 @@ func TestErrorOrNil(t *testing.T) {
})
}
}
+
+func BenchmarkLenASCII(b *testing.B) {
+ v := New()
+ for n := 0; n < b.N; n++ {
+ v.Len("x", "Hello, world, it's a sentence!", 0, 40)
+ }
+}
+
+func BenchmarkLenMB(b *testing.B) {
+ v := New()
+ for n := 0; n < b.N; n++ {
+ v.Len("x", "H€llo, world, it's a sentence¡", 0, 40)
+ }
+}
@@ -439,14 +439,16 @@ func (v *Validator) HexColor(key, value string, message ...string) { | |||
func (v *Validator) Len(key, value string, min, max int, message ...string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify the comment that it counts characters, and not byte size? There are use cases for both, so seems to me that it's good to clarify in docs. e.g.
// Len sets the minimum and maximum length for a string.
//
// A maximum of 0 indicates there is no upper limit. The length is counted in
// characters (runes), not bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
If we length check on just the string when there's some foreign characters in there it can often come back as a length you wouldn't expect. This doesn't make much sense to me for validation purposes, you always want the length of a string to mean x characters, not bytes.
See my Go Playground: https://play.golang.org/p/GNDIG2Tzx4y
I would expect the behaviour of the runed string, not the latter when validating.