word_wrap in Email.php (library) does not handle multibyte characters well #1280

Open
williamli opened this Issue Apr 23, 2012 · 5 comments

word_wrap in Email.php (library) does not handle multibyte characters well.
It makes long lines of plain-text Chinese characters look garbled, because it cuts multibyte characters in the middle.
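
A minimal reproduction of the symptom, for illustration only. It calls PHP's built-in wordwrap() directly rather than the library's word_wrap method, so whether Email.php fails through exactly this path is an assumption. The sample string and the width of 4 are made up for the demo, the file is assumed to be saved as UTF-8, and mb_check_encoding() needs the mbstring extension.

```php
<?php
// Eight Chinese characters; each one occupies three bytes in UTF-8.
$text = '你好世界你好世界';

// wordwrap() counts bytes, not characters. With the cut flag set and no
// spaces to break on, every forced break lands after four bytes, which
// is inside a three-byte character.
$lines = explode("\n", wordwrap($text, 4, "\n", true));

foreach ($lines as $i => $line) {
    // A line that starts or ends with part of a character is not valid UTF-8.
    $state = mb_check_encoding($line, 'UTF-8') ? 'valid' : 'broken';
    echo $i, ': ', strlen($line), ' bytes, ', $state, " UTF-8\n";
}
```

Any mail client that decodes those lines as UTF-8 will show replacement characters where the split happened.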

philsturgeon Apr 26, 2012

Contributor

Could you try something out?

Go here http://php.net/manual/en/function.wordwrap.php and check out the contributed function iconv_wordwrap by "mail at dasprids dot de".

Put the same Chinese characters through that function, then let us know how it goes.

williamli Apr 26, 2012

Sure. I will give it a try this weekend. Hope it is not too late.

narfbg removed the "Looking Into It" label Nov 4, 2014

narfbg Nov 4, 2014

Contributor

Some improvements have been made, but it's still not flawless ... I'm not sure it can ever be. If anybody has feedback - please comment.

greenwizard88 Feb 25, 2015

So let me preface this by saying that my C knowledge is (almost) nonexistent.

PHP's wordwrap function uses strncmp, which operates on chars (1 byte each), as opposed to wide-character functions such as wcsncmp... so the C code behind the wordwrap function sees each multi-byte character as several separate characters (a Chinese character is typically 3 bytes in UTF-8).

This can be fixed in 1 of 2 ways:

  1. Rewrite the word_wrap function so that it does not rely on PHP's wordwrap() and does its own wrapping instead. This will be slower, but should work if we use mb_strlen() for calculations instead of strlen().

  2. Add an mb_wordwrap() function to PHP, and then use that instead.

I don't have time to check which of them works properly, but the word_wrap() function in the Email class could possibly be replaced by one of the functions here: http://stackoverflow.com/questions/3825226/multi-byte-safe-wordwrap-function-for-utf-8
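
Option 1 could look roughly like the sketch below. This is only an illustration: mb_word_wrap_sketch() is a hypothetical name, not something that exists in CodeIgniter or PHP, and it is not taken from the Stack Overflow answers. It assumes UTF-8 input, the mbstring extension, and PHP 7 or later. It measures and cuts with mb_strlen() and mb_substr(), so breaks always fall between characters.

```php
<?php
// Hypothetical helper (assumption: not part of CodeIgniter or PHP core).
// Wraps $text so that no line exceeds $width characters, not bytes.
function mb_word_wrap_sketch(string $text, int $width = 76, string $break = "\n", string $encoding = 'UTF-8'): string
{
    if ($width < 1) {
        throw new InvalidArgumentException('Width must be at least 1.');
    }

    $wrapped = [];
    // Handle each existing line separately so original newlines survive.
    foreach (explode("\n", $text) as $paragraph) {
        $line = '';
        foreach (explode(' ', $paragraph) as $word) {
            // A word longer than the limit is cut on character boundaries.
            while (mb_strlen($word, $encoding) > $width) {
                if ($line !== '') {
                    $wrapped[] = $line;
                    $line = '';
                }
                $wrapped[] = mb_substr($word, 0, $width, $encoding);
                $word = mb_substr($word, $width, null, $encoding);
            }

            // Append the word if it still fits, otherwise start a new line.
            $candidate = ($line === '') ? $word : $line . ' ' . $word;
            if (mb_strlen($candidate, $encoding) <= $width) {
                $line = $candidate;
            } else {
                $wrapped[] = $line;
                $line = $word;
            }
        }
        $wrapped[] = $line;
    }

    return implode($break, $wrapped);
}

// Usage: eight Chinese characters wrapped at three characters per line.
echo mb_word_wrap_sketch('你好世界你好世界', 3), "\n";
```

The sketch counts code points, so it still ignores display width (CJK characters are usually rendered double-width) and combining sequences, and it drops leading spaces on a line. That may be part of why a flawless version is hard to reach.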

drivingmenuts Feb 25, 2015

Adding a multibyte wordwrap function to PHP is probably a bit beyond the scope of this project.

However, there are several examples of multibyte wordwrap functions at http://stackoverflow.com/questions/3825226/multi-byte-safe-wordwrap-function-for-utf-8

Any of those could be adapted, extended, or co-opted.
