Char encoding on Windows 7 #48
Comments
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Hi Batte, yes, the problem is that efsw converts everything to ANSI on Windows, which is not the best solution. I'll change this to convert everything to UTF-8, but you still won't see the ♫ unless you change the command line's default code page, since it uses ANSI by default. You can change the code page to UTF-8 with "chcp 65001", but you will also need to change the default font to one that supports these characters (like Lucida Console).
Original comment by Batte HUCHAI (Bitbucket: bhuchai). Hello, I understand, but the default encoding of QtCreator's command line page (the IDE I use) does not seem to be ANSI, which is why I'm able to see special chars like ♫. About your new encoding choice (UTF-8), are you sure that's the correct encoding for Windows? I thought it was UTF-16... Maybe you're right, I'm actually not sure! Thanks, B.
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Yes, Windows encourages the use of UTF-16 as the default encoding, but it's not a requirement, since it supports any Unicode encoding. I can't use UTF-16 because I'm using std::string to keep it simple, and the other OSes use UTF-8, so the correct approach is to always use the same encoding. Nothing prevents you from converting the strings to any other encoding, and you can use the String class used internally by efsw ( efsw::String::fromUtf8( filename ).toWideString() ). Regards,
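To make the conversion mentioned above concrete, here is a minimal, self-contained sketch of decoding a UTF-8 `std::string` into UTF-16 code units. The function name `utf8ToUtf16` is hypothetical and is not part of efsw; it only illustrates the kind of conversion an application would do on a UTF-8 filename, and it assumes the input is mostly well-formed (it rejects truncated sequences and bad continuation bytes, but does not check for overlong forms).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not an efsw API): decode UTF-8 bytes into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { throw std::invalid_argument("invalid UTF-8 lead byte"); }

        if (i + extra >= in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary plane: encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, the UTF-8 bytes of "test♫.xlsx" (where ♫ is `E2 99 AB`) decode to ten UTF-16 code units, with ♫ as the single unit `0x266B`.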
Original comment by Batte HUCHAI (Bitbucket: bhuchai). Hello, all UTF-8 chars seem to be sent correctly by EFSW. But I have some problems with other files (files whose names originate from Mac OS X: created directly on a Mac and displayed correctly by Finder and Windows Explorer). One of them : On Windows, if you edit its filename, copy it, and paste it into a plain-text editor like Notepad, you'll see some specific chars! And these chars are not sent correctly by EFSW... Do you know why? I hope I'm understandable... Do not hesitate to tell me if I'm not. B.
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Sorry, but I'm not sure what you are trying to say.
Original comment by Batte HUCHAI (Bitbucket: bhuchai). OK, sorry for not being clear. It's simple. As with "test♫.xlsx", I tried to put the file below into the watched folder : Result? As with the "test?.xlsx" produced by EFSW, I received a wrong filename for this new file. Do you know why? For information: after some investigation, I understood that the file came from Mac OS X and had some strange chars in its filename (you can see them by copying the filename into Notepad or another plain-text editor). B.
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). This is what I explained in the previous messages. This is not an efsw problem: you need to use a command line that supports UTF-8 character encoding, with a font that also supports it. Or you can convert the output to an encoding that the command line interprets correctly.
Original comment by Batte HUCHAI (Bitbucket: bhuchai). Hello, I also have a char encoding problem on Mac OS. Indeed, if I put a folder "lolélalé" into a watched folder, I get "lole'lale'"... To help, here is a part of my code :
You can see that I correctly take the output with UTF-8 encoding... Thanks for your help. B.
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). You have the same problem as on Windows: your locale is not correctly set in the Terminal. I've tested with the default terminal locale ( en_US.UTF-8 ) and everything works just fine. It also works in the application output from QtCreator. Your code looks fine, so I don't think there's anything wrong there.
Original comment by Batte HUCHAI (Bitbucket: bhuchai). Hello, I understand what you're saying, but I think my problem is something else. Indeed, my Qt project is a client which talks to a web service. I have looked at the decimal value of each char of the file name sent by EFSW, and they don't seem to be the UTF-8 decimal values of the "t" and "é" chars. Do you know what I mean?
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Yes, it's clear what you are describing. I'll compare the string hash produced on OS X and Windows. If something is different, it means that efsw is doing something wrong; otherwise it should be something in your application. Let me see and I'll tell you. Thanks
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). OK, I ran the tests and everything looks fine. The string hashes are the same, and the binary data is exactly the same. I still think that this is not an efsw issue. If you can reproduce it with a simple example that I can test here, I'll take a look at it. But please, nothing with Qt or client/server, since it has nothing to do with the library. Regards
Original comment by Batte HUCHAI (Bitbucket: bhuchai). Hello, I'm sorry, but I still have some problems with EFSW encoding. I have printed in hexadecimal the string that EFSW gives after an event occurred.
As you can see, all characters are encoded in Unicode UTF-8...
... EXCEPT "é" and "è" :
But normally, the UTF-8 code of "é" is : 0xc3a9. This difference is the reason why my C++ program (in Qt) doesn't correctly understand the word "pépè.png"... Do you observe the same thing? Thanks for your help. B.
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Sorry, but I tested again and I'm getting the correct UTF-8 codes (I tested with mingw and VS too).
Original comment by Batte HUCHAI (Bitbucket: bhuchai). I'm going to test with efsw-test. Can you just give me the hexadecimal output of a file "pépè.png" detected by EFSW? Something like this :
Original comment by Batte HUCHAI (Bitbucket: bhuchai). OK, thank you. Do you know why the "é" and "è" chars are encoded on 14 bytes instead of 2 like the others?
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). No, that's not the encoding. I printed the data as you asked, converting every char to unsigned int ( printf("%02x", (unsigned int) *s++); ); that's why you see those extra ffffff.
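The extra `ffffff` digits described above are what sign extension produces when `char` is signed: a byte such as `0xc3` is a negative value, and converting it straight to `unsigned int` yields `0xffffffc3`. The sketch below contrasts the two casts. Both helper names are hypothetical, and the "naive" variant casts through `signed char` explicitly (and assumes a 32-bit `unsigned int`) so that the effect is reproduced regardless of whether plain `char` is signed on the platform.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: reproduces the sign-extended dump.
// Bytes >= 0x80 become negative and widen to ffffffXX (with 32-bit unsigned int).
std::string hexSignExtended(const std::string& s) {
    std::string out;
    char buf[9];  // up to 8 hex digits plus the terminating NUL
    for (char c : s) {
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned int>(static_cast<signed char>(c)));
        out += buf;
    }
    return out;
}

// Hypothetical helper: the usual fix is to go through unsigned char first,
// so every byte prints as exactly two hex digits.
std::string hexBytes(const std::string& s) {
    std::string out;
    char buf[3];  // 2 hex digits plus the terminating NUL
    for (char c : s) {
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned int>(static_cast<unsigned char>(c)));
        out += buf;
    }
    return out;
}
```

With the UTF-8 bytes of "pé" (`70 c3 a9`), `hexBytes` gives `70c3a9`, while `hexSignExtended` gives `70ffffffc3ffffffa9`. The string contents are identical in both cases; only the dump differs.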
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Oh no, now I see your previous post: you used QString::fromString. So I don't know. Still, if you want, make a minimal example of this failing and I'll debug it (use Qt4 if you want, because I think that's where the problem is).
Original comment by Batte HUCHAI (Bitbucket: bhuchai). Hello, it's really, really strange. As you advised, I have changed the "test" sources of the EFSW project : src/test/efsw-test.cpp :
As you can see, I've just added the "print_hex()" function. There are no worries about Qt; indeed, I use your makefile to compile the test program. After compiling and executing, I get :
So exactly the same... I really need your help. You'll find the EFSW project I use here : https://mega.co.nz/#!IQ0EDZZB!BAR8vwK8cnDWo05hpIJ_BhOkXgg0CaFNr0zsEPDMWYU With these sources, what result do you get? Do you have any other idea? Thanks a lot in advance, B.
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). Wait... your project file is from OS X, and I was testing on Windows... so... your problems are now on OS X?
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). I'm getting the correct code:
What I'm thinking is that your OS X file system is using a different encoding for file names. (If you're on Mavericks and Python crashes running this, fix it with the instructions from here: http://stackoverflow.com/questions/19569143/python3-segmentation-fault-on-osx-mavericks ). It must be something similar to these problems: I'm a little too busy to look for a fix right now. I'll need you to help me with this, or just wait a bit until I get some time to read about it. I don't even own a Mac, so it's not that easy for me to look into this. Regards,
Original comment by Batte HUCHAI (Bitbucket: bhuchai).
:( Indeed, this link seems to be interesting :
Original comment by Martín Lucas Golini (Bitbucket: SpartanJ, GitHub: SpartanJ). I think the problem comes from the file system encoding. Please run a test converting the filename string from NFD to NFC; here's a function that I got from Stack Overflow:
It seems to be a very common problem, but I'm not sure whether we are dealing with this or with something else. Regards,
Original comment by Batte HUCHAI (Bitbucket: bhuchai). It works. Thanks a lot for this last point.
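The function posted in the thread is not preserved here, so the following is only an illustrative sketch of the NFD versus NFC difference, not the code that was actually used. In NFD, "é" is stored as the letter "e" followed by the combining acute accent U+0301 (UTF-8 bytes `65 cc 81`), whereas NFC uses the single code point U+00E9 (`c3 a9`). Both helper names below are hypothetical: `hasCombiningMarks` detects decomposed text, and `composeAcuteGrave` is a toy that recomposes only lowercase a, e, i, o, u with a grave or acute accent. A real application should use a full normalizer (for example ICU, or `CFStringNormalize` on OS X) instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the UTF-8 string contains a code point from the
// Combining Diacritical Marks block (U+0300..U+036F), which hints at NFD text.
bool hasCombiningMarks(const std::string& s) {
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        const unsigned char a = static_cast<unsigned char>(s[i]);
        const unsigned char b = static_cast<unsigned char>(s[i + 1]);
        // U+0300..U+033F encode as CC 80..CC BF, U+0340..U+036F as CD 80..CD AF.
        if ((a == 0xCC && b >= 0x80 && b <= 0xBF) ||
            (a == 0xCD && b >= 0x80 && b <= 0xAF)) {
            return true;
        }
    }
    return false;
}

// Hypothetical toy composer: handles only a/e/i/o/u + U+0300 (grave) or
// U+0301 (acute). Everything else is copied through unchanged.
std::string composeAcuteGrave(const std::string& s) {
    const std::string bases = "aeiou";
    // Second UTF-8 byte of the precomposed letter (the first is always 0xC3).
    const unsigned char grave[] = {0xA0, 0xA8, 0xAC, 0xB2, 0xB9};  // à è ì ò ù
    const unsigned char acute[] = {0xA1, 0xA9, 0xAD, 0xB3, 0xBA};  // á é í ó ú
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const std::size_t idx = bases.find(s[i]);
        if (idx != std::string::npos && i + 2 < s.size() &&
            static_cast<unsigned char>(s[i + 1]) == 0xCC) {
            const unsigned char mark = static_cast<unsigned char>(s[i + 2]);
            if (mark == 0x80 || mark == 0x81) {
                out += static_cast<char>(0xC3);
                out += static_cast<char>(mark == 0x80 ? grave[idx] : acute[idx]);
                i += 2;  // skip the two bytes of the combining mark
                continue;
            }
        }
        out += s[i];
    }
    return out;
}
```

Applied to a decomposed "pépè.png" (`70 65 cc 81 70 65 cc 80 2e 70 6e 67`), `composeAcuteGrave` returns the bytes `70 c3 a9 70 c3 a8 2e 70 6e 67`, which is the form containing the `0xc3a9` sequence expected earlier in the thread.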
Original report by Batte HUCHAI (Bitbucket: bhuchai).
Hello,
I have a char encoding problem on Windows 7. When I use Windows Explorer to create a file whose name contains a special char (a non-ASCII or non-"basic" char) like "♫" (for example), the EFSW FileWatcher gives my application the same filename but with a "?" char instead of "♫" :
For example, when I create the file "test♫.xlsx", I get :
It may be silly to have a filename with a "♫", but it's just an example. The same thing happens with all special chars which are accepted by Windows Explorer but are "modified" by EFSW...
For your information, it seems to work on Mac OS X 10.7.5.
Any idea?