UTF-8 Encoding Arabic Support #40
Comments
Can you please provide me with an example RouterOS config so I can reproduce the problem? |
@AmrSubZero I'm looking into this. It should be possible to add a
Let me know please, I'd like to fix this. |
First, I have created a sample configuration that works on RouterOS v5.25, with Arabic characters in the comment column for "ip/hotspot/users" and "ip bindings", so you can connect, fetch and display the "comment" column for those entries and test with them. In the configuration the user is admin and there is no password. I connect via the API to 10.0.0.1 as admin with no password, as mentioned above; if there's a connection problem, please let me know. Second, the correct character encoding is "cp1256". I'm truly glad that you're ready to help. Waiting to hear from you, thanks. |
@AmrSubZero I think I have a fix for this. If you want to test the fix, please pull the current master branch from GitHub, then build and try that version. As an experiment, I've changed the internal encoding for transmission between the mikrotik-java API and RouterOS to UTF-8, and it seems to solve the problem. I'll release version 3.0.3 shortly to make this change available. Thanks to @boen_robot - your comments above pointed me in the right direction, which led to this StackOverflow article http://stackoverflow.com/questions/5729806/encode-string-to-utf-8
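For readers following along, here is a minimal sketch of what "using UTF-8 for transmission" can look like in plain Java. It is not the mikrotik-java source; the class and method names (`Utf8WireSketch`, `toWire`, `fromWire`) are made up for illustration. The point is only that the charset is named explicitly on both the encode and the decode side instead of relying on the platform default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of mikrotik-java: shows explicit UTF-8
// conversion between Java strings and the bytes that go over a socket.
public class Utf8WireSketch {

    // Encode text into the bytes that would be written to the network.
    static byte[] toWire(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes read from the network back into a Java string.
    static String fromWire(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Arabic "marhaba", written with Unicode escapes so the source
        // file compiles the same way under any compiler encoding.
        String comment = "\u0645\u0631\u062D\u0628\u0627";
        byte[] wire = toWire(comment);

        // Each of these Arabic letters takes two bytes in UTF-8.
        System.out.println(comment.length() + " chars -> " + wire.length + " bytes");
        System.out.println("round trip ok: " + fromWire(wire).equals(comment));
    }
}
```

Note that the byte count differs from the character count, so any length prefix written before the text has to be computed from the encoded bytes, not from `String.length()`.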
|
This works for me. Can you please see if it works for you as well? Example Java code that creates users with Arabic, Japanese and Cyrillic text in comments:
private static final String JAPANESE = "事報ハヤ久送とゅ歳用ト候新放すルドう二5春園ユヲロ然納レ部悲と被状クヘ芸一ーあぽだ野健い産隊ず";
private static final String CYRILLIC = "Лорем ипсум долор сит амет, легере елояуентиам хис ид. Елигенди нолуиссе вих ут. Нихил";
private static final String ARABIC = "تجهيز والمانيا تم قام. وحتّى المتاخمة ما وقد. أسر أمدها تكبّد عل. فقد بسبب ترتيب استدعى أم, مما مع غرّة، لأداء. الشتاء، عسكرياً";
private void test() throws MikrotikApiException {
    // Create one hotspot user per script, with the sample text as its comment.
    con.execute("/ip/hotspot/user/add name=userJ comment='" + JAPANESE + "'");
    con.execute("/ip/hotspot/user/add name=userC comment='" + CYRILLIC + "'");
    con.execute("/ip/hotspot/user/add name=userA comment='" + ARABIC + "'");
    // Read the users back and print each name with its comment.
    for (Map<String, String> res : con.execute("/ip/hotspot/user/print return name,comment")) {
        System.out.printf("%s : %s\n", res.get("name"), res.get("comment"));
    }
}
_Output from that code_
_Screen shot of ssh login to Mikrotik_ |
I've updated the Maven repository in my Android project to : cleaned and rebuilt the project, and tried the following examples :
Results : and :
Results : Which correctly displayed the userA Arabic string in the app, but not the other users. So we have two problems here. First problem: when I send a command to add a user with a comment, the comment is not stored as proper Arabic; in Winbox it is displayed like : This is not the Arabic string that I passed in the add user command. I think sending Arabic from the API to MikroTik should not use "UTF-8" encoding, or maybe it needs a specific encoding which I don't know, to be displayed correctly in Winbox. Second problem: in that case I would need to set all the user comments to that weird character encoding to display them correctly in the app, but then I would not be able to read the Arabic in Winbox, only in the app, which is a problem, not a feature. Sending characters from the API to MikroTik should be done with a specific encoding. If there is anything you don't understand, just let me know. |
On my setup the API and ssh command line show the same information, but Winbox and the web interface don't display them correctly; they look like in your pictures (I even enabled Arabic in my Chrome settings to see if the web interface would display correctly). So I have a question. If you ssh into your mikrotik and do a |
I've experimented some more and modified the API code to explicitly convert from cp1256 before sending data over the network and to convert to cp1256 when receiving data from the network (and before passing it to the user) and it makes no difference at all. I tried the same with iso-8859-6, no difference. The Arabic text set using the API looks correct from the router (ssh) command line, and it is read back correctly. But Winbox and the web interface do not show correctly. I suspect that Winbox and the web interface do not encode and decode the Arabic characters. I set Chrome's encoding settings (under View/Encoding) to "Arabic (Windows-1256)" and it does not make any difference either. I tried with "Arabic (iso-8859-6)" as well, no difference. This leads me to believe that it is already wrong by the time it reaches the browser. I fixed the encoding problem in the API in 3.0.3 and at this point the API will send non-English characters correctly and read them correctly. Other software is not working and I can't fix that. I think you're going to have to decide on a workaround for this problem. The options I see are:
|
I understand that you want to fix this, and I appreciate your help. But let's discuss it. When I asked (this question), boen_robot told me that WinBox displays the bytes using the Windows ANSI charset, which is different for every locale. As far as I know, boen_robot is the owner of the Pear2/RouterOS PHP API for MikroTik. He has a configurable charset that works correctly for each user's configuration: as I mentioned in this issue, with the setCharset() function the user chooses the REMOTE and LOCAL charsets, so that both writing to MikroTik and reading from MikroTik use the correct charset.
The cool thing is that I tested his API using setCharset() and it works like a charm! Writing to MikroTik from the API displayed the Arabic correctly in WinBox : Reading from MikroTik through the API also displayed the Arabic correctly on the web : boen_robot has recommended that you create a configurable charset to fix this problem : Can that be achieved? Is it possible to fix this with a configurable charset setCharset() method? I'd be happy if that can be achieved. I have a huge amount of data in my RouterOS, the only way I can distinguish one item from another is the comment column, and it would be much easier to read it in Arabic. Please let me know of any step you take! Thanks a lot. |
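To make the request concrete, here is a rough sketch of what a configurable remote charset could look like in Java. This is an assumption about one possible design, not code from mikrotik-java or from the PHP client: `RemoteCharsetSketch`, `encode` and `decode` are hypothetical names. Java strings are already Unicode internally, so only the REMOTE side of the pair needs to be configurable here. It also assumes the JRE ships the `windows-1256` charset, which full JDKs normally do.

```java
import java.nio.charset.Charset;

// Hypothetical wrapper, not part of mikrotik-java: converts between Java
// strings and router bytes using a charset chosen by the caller.
public class RemoteCharsetSketch {

    private final Charset remote;

    // remoteCharsetName is whatever the router-side tools expect,
    // for example "windows-1256" for an Arabic Windows locale.
    public RemoteCharsetSketch(String remoteCharsetName) {
        this.remote = Charset.forName(remoteCharsetName);
    }

    // Text to bytes, in the remote charset, before writing to the socket.
    public byte[] encode(String text) {
        return text.getBytes(remote);
    }

    // Bytes read from the socket back to text, using the same charset.
    public String decode(byte[] raw) {
        return new String(raw, remote);
    }

    public static void main(String[] args) {
        // Arabic "marhaba" as Unicode escapes.
        String comment = "\u0645\u0631\u062D\u0628\u0627";

        RemoteCharsetSketch cp1256 = new RemoteCharsetSketch("windows-1256");
        RemoteCharsetSketch utf8 = new RemoteCharsetSketch("UTF-8");

        // The same text produces different byte sequences per charset,
        // which is why both ends have to agree on one.
        System.out.println("windows-1256 bytes: " + cp1256.encode(comment).length);
        System.out.println("UTF-8 bytes: " + utf8.encode(comment).length);
        System.out.println("round trip ok: "
                + cp1256.decode(cp1256.encode(comment)).equals(comment));
    }
}
```

Whether this would make WinBox show the text correctly depends on which charset WinBox uses to display the stored bytes on a given machine, so treat the charset name as something to experiment with.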
I tried manipulating the charset and it does not solve it. Maybe I'm doing it wrong, but the library is open source and you can easily implement a |
That is bad news, but anyway, thanks Mr Gideon for your help. |
Hi, I'm using Tesseract 4 with VS 2017. I used it with English characters first, and now I've started to include Arabic as well. The thing is, I get weird characters even when I change eng.traineddata to ara.traineddata. I found out that these are the characters you get when the UTF-8 code is treated as hex code. I think the problem is with UTF-8, or maybe my Visual Studio can't recognize Arabic letters. |
Hi
You replied to a thread concerning the support for Arabic characters in an API used to program Mikrotik routers. I cannot help you with problems with Visual Studio.
Gideon
… On 02 May 2018, at 10:54, AbdelsalamHaa ***@***.***> wrote:
Hi, I'm using Tesseract 4 with VS 2017. I used it with English characters first, and now I've started to include Arabic as well. The thing is, I get weird characters even when I change eng.traineddata to ara.traineddata.
I found out that these are the characters you get when the UTF-8 code is treated as hex code.
<https://user-images.githubusercontent.com/35866217/39512903-1cb50738-4e25-11e8-8d0b-eba47c163d8b.png>
This is the image I want to recognize.
The result is
<https://user-images.githubusercontent.com/35866217/39513059-9a6a2424-4e25-11e8-9bcd-2eb56d3437b6.png>
I converted the Arabic letters to UTF-8 code using this website:
https://r12a.github.io/app-conversion/
When I take this code and convert it, as hex code, to characters, I get the same characters that Tesseract showed me the first time.
"عبدالسلام مدي عبدالعزيز"
I think the problem is with UTF-8, or maybe my Visual Studio can't recognize Arabic letters.
|
@GideonLeGrange okay thanks for your reply |
How can I use this code via class.php.api? //Here's where we specify the charset pair. |
I wrote and maintain a Java API. I cannot help you with a PHP API that somebody else wrote. Ask the author of that class. |
As far as I know, the API sends and receives the raw bytes.
But when I try to display Arabic words, they show up like "����", which seems to be "ISO-8859-1" or maybe "windows-1252", I don't know.
Is there an option that lets me change the API charset encoding to UTF-8, like the Pear2/RouterOS PHP API? (see the link).
Or maybe a way to convert the (diamond question marks) to "UTF-8" in Java? If so, please post an example!
I really need to display Arabic words correctly, thanks.
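On the "convert the diamond question marks" idea, a small sketch in plain Java may help show what is and is not recoverable. It assumes, as an example only, that the router stored the comment as windows-1256 bytes; `MojibakeSketch` and `redecode` are hypothetical names, not part of any MikroTik library. If the bytes were decoded with a single-byte charset such as ISO-8859-1, no information is lost and the text can be re-decoded. If they were decoded as UTF-8 and turned into U+FFFD (the diamond question mark), the original bytes are gone and nothing can be converted back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of repairing text decoded with the wrong charset.
public class MojibakeSketch {

    // Undo a wrong decode: turn the garbled string back into the original
    // bytes using the charset that was wrongly applied, then decode those
    // bytes with the charset they were really written in.
    static String redecode(String garbled, Charset wronglyUsed, Charset actual) {
        return new String(garbled.getBytes(wronglyUsed), actual);
    }

    public static void main(String[] args) {
        // Assumption for this example: the router holds windows-1256 bytes.
        Charset cp1256 = Charset.forName("windows-1256");
        String original = "\u0645\u0631\u062D\u0628\u0627"; // Arabic "marhaba"
        byte[] routerBytes = original.getBytes(cp1256);

        // Case 1: bytes decoded as ISO-8859-1. Every byte maps to exactly one
        // character, so the garbled string still carries all the information.
        String garbled = new String(routerBytes, StandardCharsets.ISO_8859_1);
        String repaired = redecode(garbled, StandardCharsets.ISO_8859_1, cp1256);
        System.out.println("repaired equals original: " + repaired.equals(original));

        // Case 2: bytes decoded as UTF-8. Invalid sequences are replaced by
        // U+FFFD, which destroys the original byte values.
        String lossy = new String(routerBytes, StandardCharsets.UTF_8);
        System.out.println("contains U+FFFD: " + lossy.contains("\uFFFD"));
    }
}
```

In practice this means the fix has to happen where the bytes are first turned into a string, by decoding them with the right charset, rather than by converting the already-garbled text afterwards.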