Use mb_strlen instead of strlen in Validator::lengthBetween() #270

Closed
Leechael opened this Issue Jan 15, 2012 · 14 comments

Comments

Projects
None yet
7 participants

In case counting multi bytes characters, we can not get actually character number, we get how many bytes in string in fact.

Current what we need it works fine with multi bytes character is override this validator by following lines:


Validator::add('lengthBetween', function($value, $format, $options) {
     $options += array('min' => 1, 'max' => 255, 'encoding' => 'utf-8');
     $length = mb_strlen($value, $options['encoding']);
     return ($length >= $options['min'] && $length <= $options['max']);
});
Member

daschl commented Jan 15, 2012

This means that Lithium users need to have the mbstring extension installed and I don't think we want to force user to install it.

@nateabele @davidpersson ?

Contributor

masom commented Jan 15, 2012

I think this should be a configuration value, disabled by default.

Member

daschl commented Jan 15, 2012

Well, we could actually check if the method exists or not, but this kind oh behavior should be consistent in the whole framework them!

Owner

davidpersson commented Jan 15, 2012

Although it has no external dependencies the docs say:

mbstring is a non-default extension. This means it is not enabled by default. You must explicitly enable the module with the configure option.

However I yet didn't see any package/bundle/installation without it being enabled. I'd vote for using mb_strlen as the result seems to be more what I'd expect in general.

Contributor

greut commented Jan 15, 2012

+1 for mb_strlen, good frameworks enforce good practices. And unicode everywhere is one of them.

Contributor

m4rcsch commented Jan 16, 2012

+1 too
why?: you have to enable mb_strlen anyway. Because php_ca is using it (if you run the php_ca tests)

Owner

nateabele commented Jan 16, 2012

@davidpersson Is there an ext/intl option for this?

Owner

davidpersson commented Jan 16, 2012

There is no ext/intl equivalent for mb_strlen(). Transliteration/transforms doesn't work for this. As far as I can see using (iconv_strlen($string, 'UTF-8') ?: $string) is one possible solution - in case we want to rely on ext/iconv That extension is enabled by default. Also found a few different takes on creating a userland function browsing through the comments. But I'm pretty sure they don't cover all cases.

Owner

davidpersson commented Jan 16, 2012

Well iconv_strlen() has some side-effects as it returns false when $string contains invalid characters. In the context of a validation function this downside could be turned around and used beneficially as it also validates the "correctness" of the string. However this may not be what one expects running that validator.

Owner

davidpersson commented Jan 16, 2012

Further research reveals that Symfony2 uses a 3 folded fallback chain which I find interesting mainly because it reveals I was slightly wrong in my first comment. ext/intl seems to provide a solution via grapheme_strlen() but only for UTF-8.

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Validator/Constraints/MaxLengthValidator.php#L45-51

ZF2 conditionally runs mb_strlen(), falling back to normal strlen().

Owner

davidpersson commented Jan 19, 2012

I'm working on a fix.

@ghost ghost assigned davidpersson Jan 19, 2012

Contributor

m4rcsch commented Jan 19, 2012

maybe we should use that fix in lithium_qa too?

davidpersson added a commit to UnionOfRAD/framework that referenced this issue Jan 28, 2012

Owner

davidpersson commented Jan 28, 2012

@dgalien As this is merged into master I'll make it available in _qa, too.

Contributor

m4rcsch commented Jan 28, 2012

great!

davidpersson added a commit to UnionOfRAD/framework that referenced this issue Sep 15, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment