Issues with encoding: core_detect_unicode() should not decide unicode simply because its not ASCII #154

Closed
aseques opened this issue Dec 12, 2013 · 10 comments

@aseques
Contributor

aseques commented Dec 12, 2013

The function core_detect_unicode() in fn_core.php checks whether a message uses characters that are not part of ASCII. However, SMS allows a special charset called the GSM character set (see http://www.clockworksms.com/blog/the-gsm-character-set/ or http://unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT), which includes some extra characters such as several accented vowels. Since playSMS uses mb_detect_encoding, it detects messages containing them as UTF-8 and therefore encodes them as double-byte, when there is no need for it.

I have been looking for solutions to this myself but couldn't find much so far. This link has some more information: http://stackoverflow.com/questions/27599/reliable-sms-unicode-gsm-encoding-in-php
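To illustrate the kind of check being asked for, here is a minimal sketch in Python (not playSMS's PHP code). It assumes the GSM 03.38 default alphabet and its extension table as published in the mapping file linked above; the names `GSM_BASIC`, `GSM_EXTENDED` and `needs_unicode` are made up for this example.

```python
# Sketch only: these names are illustrative, not playSMS functions.

# GSM 03.38 default alphabet, indexed by its 7-bit code (0x00..0x7F).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Extension table characters (sent as the escape code 0x1B plus one more septet).
GSM_EXTENDED = "^{}\\[~]|€\f"

# 0x1B is the escape code itself, not a printable character.
_GSM_CHARS = (set(GSM_BASIC) - {"\x1b"}) | set(GSM_EXTENDED)


def needs_unicode(text: str) -> bool:
    """Return True only if some character cannot be expressed in GSM 03.38."""
    return any(ch not in _GSM_CHARS for ch in text)


print(needs_unicode("stàs de guàrdia,"))  # False: every character is GSM
print(needs_unicode("Привет"))            # True: Cyrillic is not in GSM
```

The point of the sketch is that the test is "is every character in the GSM set?" and not "is every character ASCII?".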

@aseques aseques closed this as completed Dec 12, 2013
@aseques aseques reopened this Dec 12, 2013
@antonraharja
Member

Hi,

Why did you reopen this issue? Does the issue still persist?

anton

@aseques
Contributor Author

aseques commented Dec 13, 2013

Yes. The code to detect the encoding was moved to fn_core.php, but the detection is still done with mb_detect_encoding, which doesn't know about the GSM charset. So all messages that are not plain ASCII are detected as UTF-8.

@antonraharja
Member

Can you give us a real example? Please focus on examples, not explanation.

In what scenario does this issue cause a problem, what kind of problem exactly, etc.?

thanks,
anton

@aseques
Contributor Author

aseques commented Dec 13, 2013

OK. When I send a message such as "stàs de guàrdia," created from playSMS without selecting the 'send message in unicode' option, it shouldn't be detected as UTF-8 and encoded as such, because all of its characters are valid GSM. In the previous version, "stàs de guàrdia," was converted into "73 74 7f 73 20 64 65 20 67 75 7f 72 64 69 61 2c" (à is 7f).
In the current version it is converted to a double-byte message instead. For example:

Parapapà gets converted into "00 50 00 61 00 72 00 61 00 70 00 61 00 70 00 e0", which is the UCS-2 (UTF-16BE) encoding of the characters.
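The two byte sequences quoted above can be reproduced with a short Python sketch. It is only an illustration: `GSM_OVERRIDES` and `gsm_codes` are hypothetical names, and the override table holds just the one character these examples need, relying on the fact that the letters a-z, space and comma have the same code in GSM as in ASCII.

```python
def hex_bytes(data: bytes) -> str:
    """Format bytes as space-separated lowercase hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


# Assumption: only the GSM code that differs from ASCII in these examples.
GSM_OVERRIDES = {"à": 0x7F}


def gsm_codes(text: str) -> bytes:
    """One GSM code per character (unpacked septets); not a full encoder."""
    return bytes(GSM_OVERRIDES.get(ch, ord(ch)) for ch in text)


gsm = hex_bytes(gsm_codes("stàs de guàrdia,"))
ucs2 = hex_bytes("Parapapà".encode("utf-16-be"))

print(gsm)   # 73 74 7f 73 20 64 65 20 67 75 7f 72 64 69 61 2c
print(ucs2)  # 00 50 00 61 00 72 00 61 00 70 00 61 00 70 00 e0
```

The GSM form uses one code per character, while the UCS-2 form uses two bytes per character, which roughly halves the number of characters that fit in one SMS.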

@antonraharja
Member

Hi,

I've tested this on my server, using Kannel and a GSM modem.
[screenshot: selection_016]

The message was received as is. Also, neither the Kannel log nor the playSMS log shows that it was changed to UCS-2.

this was in kannel, smsbox.log:
INFO: sendsms sender:xxx:xxx (127.0.0.1) to: msg: stàs de guàrdia Parapapà

this was my playsms.log:
2013-12-13 18:22:12 PID52aaede484408 - L3 kannel outgoing # unicode autodetected
2013-12-13 18:22:12 PID52aaede484408 - L3 kannel outgoing # http://localhost:13131/cgi-bin/sendsms?username=xxx&password=xxx&from=xxx&to=xxx&dlr-mask=31&dlr-url=http%3A%2F%2Flocalhost%2Findex.php%3Fapp%3Dcall%26cat%3Dgateway%26plugin%3Dkannel%26access%3Ddlr%26type%3D%25d%26smslog_id%3D1003205%26uid%3D2&charset=UTF-8&coding=2&account=xxx&text=st%C3%A0s+de+gu%C3%A0rdia+Parapap%C3%A0+%40xxx&smsc=gsm1
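
For readers who want to see what that request actually asked Kannel to do, the relevant query parameters can be decoded with a few lines of Python. This is only a sketch: the query string is a shortened copy of the one in the log line above, and the reading of `coding=2` as "UCS-2" is based on my recollection of the Kannel sendsms documentation, so treat it as an assumption.

```python
from urllib.parse import parse_qs

# Shortened copy of the query string from the playsms.log line above.
query = (
    "charset=UTF-8&coding=2"
    "&text=st%C3%A0s+de+gu%C3%A0rdia+Parapap%C3%A0+%40xxx"
)

params = parse_qs(query)
charset = params["charset"][0]
coding = params["coding"][0]
text = params["text"][0]

print(charset)  # UTF-8
print(coding)   # 2 (assumed to mean UCS-2 in Kannel's sendsms interface)
print(text)     # stàs de guàrdia Parapapà @xxx
```

So the text is transported to Kannel as UTF-8, and the request itself carries the `coding=2` flag together with the "unicode autodetected" log entry.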

anton

@antonraharja
Member

OK. Sending through Kannel may not be a problem, because there is a secondary check inside the Kannel fn.php.

I see the problem: core_detect_unicode() should not decide on Unicode simply because the text is not ASCII.

@aseques
Contributor Author

aseques commented Dec 16, 2013

Yep, that's the point, a more complex validation has to be prepared...

@aseques
Contributor Author

aseques commented Dec 19, 2013

Some code to check this: https://gist.github.com/aseques/8040175. A merge request is in progress.

@astrakid
Contributor

I am away until 27.12.2013.

I will be out of the office until 27.12.2013.
In urgent cases, please contact my colleagues:
Telephony: nss.tapi@gfi.ihk.de (+49 231 9746 456)
Fastviewer: fastviewer@gfi.ihk.de (+49 231 9746 457)

Kind regards
Andre Gronwald

Note: This is an automatic reply to your message "Re:
[playSMS] Issues with encoding: core_detect_unicode() should not decide
unicode simply because its not ASCII (#154)" sent on 12/19/2013 3:56:20
PM.

This is the only notification you will receive while
this person is away.

@aseques
Contributor Author

aseques commented Dec 19, 2013

After the patch in pull request #158,
Parapapà is converted into "00 50 00 61 00 72 00 61 00 70 00 61 00 70 00 7f"
instead of "00 50 00 61 00 72 00 61 00 70 00 61 00 70 00 e0", which is better (because 7f is the GSM code for à), but it is still being sent as UTF, with two bytes per character.
Any tips?
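
A small Python sketch of why the patched output is still off: if a receiver treats those bytes as UCS-2, the final 00 7f is the DEL control character and not à. The variable names are made up for this illustration, and dropping the zero high bytes is shown only to make the one-code-per-character form visible, not as a proposed fix for playSMS.

```python
# Bytes produced after the patch, copied from the comment above.
patched = bytes.fromhex("00 50 00 61 00 72 00 61 00 70 00 61 00 70 00 7f")

# Interpreted as UCS-2 / UTF-16BE, the last character is U+007F (DEL), not "à".
as_ucs2 = patched.decode("utf-16-be")
print(repr(as_ucs2))  # 'Parapap\x7f'

# The same GSM codes with one value per character (every second byte).
one_per_char = patched[1::2]
print(one_per_char.hex())  # 506172617061707f
```

In other words, the GSM codes only make sense when the message is also flagged and sent as a 7-bit GSM message; mixing GSM codes with a double-byte layout gives the wrong character on either interpretation.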
