Issues with encoding: core_detect_unicode() should not decide unicode simply because it's not ASCII #154
Comments
Hi, why did you re-open this issue? Does the issue still persist? anton |
Yes, the code to detect the encoding was moved to fn_core.php, but the detection is still done with mb_detect_encoding, which doesn't know about the GSM charset. So all messages that are not plain ASCII are detected as UTF-8. |
Can you give us a real example? Please emphasize examples, not explanations: in what scenario does this issue cause a problem, what kind of problem exactly, etc. Thanks, |
OK. When I send a message such as "stàs de guàrdia," created from playSMS without selecting the 'send message in unicode' option, it shouldn't be detected as UTF-8 and encoded accordingly, because all of its characters are valid GSM. In the previous version "stàs de guàrdia," was converted into "73 74 7f 73 20 64 65 20 67 75 7f 72 64 69 61 2c", whereas "Parapapà" now gets converted into "00 50 00 61 00 72 00 61 00 70 00 61 00 70 00 e0", which are the UCS-2 (UTF-16) encoded characters. |
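The two hex dumps above can be reproduced with a short sketch. This is only an illustration in Python, not playSMS code: the names `GSM_BASIC`, `to_gsm_septets`, `to_ucs2` and `hex_dump` are made up for this example, and the table is the GSM 03.38 default alphabet as listed in the unicode.org mapping file. Septets are shown one per byte (unpacked), which is how the first dump appears to be written.

```python
# GSM 03.38 default alphabet: the string index of each character is its septet value.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


def to_gsm_septets(text: str) -> bytes:
    """Map each character to its GSM septet, one byte per septet (unpacked).

    Raises ValueError if a character is not in the basic table
    (the extension table is deliberately left out of this sketch).
    """
    return bytes(GSM_BASIC.index(ch) for ch in text)


def to_ucs2(text: str) -> bytes:
    """Encode as big-endian UTF-16, which equals UCS-2 for BMP characters."""
    return text.encode("utf-16-be")


print(hex_dump(to_gsm_septets("stàs de guàrdia,")))
print(hex_dump(to_ucs2("Parapapà")))
```

In the GSM table "à" is the single septet 0x7f, while in UCS-2 every character takes two bytes, so the same text takes roughly twice the space once it is flagged as unicode.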
Hi, I've tested this on my server, using Kannel and a GSM modem, and that message was received as is. Also, neither the Kannel log nor the playSMS log shows it being changed to UCS-2. This was in Kannel's smsbox.log: This was my playsms.log: anton |
OK. Sending through Kannel may not be a problem, because there's a secondary check inside the Kannel fn.php. I see the problem: core_detect_unicode() should not decide on unicode simply because the message is not ASCII. |
Yep, that's the point, a more complex validation has to be prepared... |
Some code to check this: https://gist.github.com/aseques/8040175 (a merge request is in progress). |
I am away until 27.12.2013. I will not be in the office until 27.12.2013. Kind regards. Note: This is an automatic reply to your message "Re: This is the only notification you will receive while
After the patch in pull request #158 |
The function core_detect_unicode in fn_core.php checks whether a message uses characters that are not part of ASCII. However, SMS allows a special charset called the GSM character set (see http://www.clockworksms.com/blog/the-gsm-character-set/ or http://unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT), which includes some extra characters, such as several accented vowels (à, è, é, ì, ò, ù) and others. Since playSMS uses mb_detect_encoding, it detects them as UTF-8 and therefore encodes the messages in double byte when there is no need for it.
I have been looking for solutions myself but couldn't find much so far. This link has some more information: http://stackoverflow.com/questions/27599/reliable-sms-unicode-gsm-encoding-in-php
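A check along these lines could look like the sketch below. It is written in Python only to illustrate the logic and is not the playSMS implementation or the proposed patch. The names `needs_unicode` and `gsm_septet_count` are hypothetical, and the character tables are taken from the GSM 03.38 mapping linked above. The idea is to flag a message as unicode only when it contains a character outside the GSM alphabet, instead of whenever it is not ASCII.

```python
# GSM 03.38 default alphabet (index 0x1B is the escape septet, not a printable character).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters reached through the escape septet; each one costs two septets.
GSM_EXTENSION = "\f^{}\\[~]|€"

# Everything a GSM-encoded message may contain (the bare escape is excluded).
GSM_ALLOWED = (set(GSM_BASIC) - {"\x1b"}) | set(GSM_EXTENSION)


def needs_unicode(text: str) -> bool:
    """Return True only if some character cannot be expressed in GSM 03.38."""
    return any(ch not in GSM_ALLOWED for ch in text)


def gsm_septet_count(text: str) -> int:
    """Count the septets a GSM-encodable text occupies (extension characters count twice)."""
    if needs_unicode(text):
        raise ValueError("text contains characters outside the GSM alphabet")
    return sum(2 if ch in GSM_EXTENSION else 1 for ch in text)


for sample in ("stàs de guàrdia,", "Parapapà", "€5 [ok]", "привет"):
    print(repr(sample), "unicode" if needs_unicode(sample) else "gsm")
```

The distinction matters for message length: a single SMS carries up to 160 septets in the GSM alphabet but only 70 characters in UCS-2, so a false unicode detection can split a message that would otherwise fit in one part.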