Issue 14474 - Internally use UTF-8 strings for Windows #4602
What if someone wrote that file by hand, not using dub? Couldn't it already be in ACP encoding? Maybe we should use a file BOM.
You're already writing D code in UTF-8, so you would use the same editor and also save this file in UTF-8. I don't think there is any good reason for it to be in the ANSI code page, especially because that varies by Windows locale and IMO would cause more problems.
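The BOM suggestion above can be sketched as a small check on the response-file buffer. This is only an illustration: `utf8_bom_length` is a hypothetical helper, not something from dmd or this PR, and it assumes the file has already been read into memory as `buffer`/`len`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of leading bytes taken by a UTF-8 byte order mark:
 * 3 if the buffer starts with EF BB BF, 0 otherwise.
 * Hypothetical helper for illustration; not part of dmd. */
size_t utf8_bom_length(const char *buffer, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };

    assert(buffer != NULL || len == 0);
    if (len >= sizeof bom && memcmp(buffer, bom, sizeof bom) == 0)
        return sizeof bom;
    return 0;
}
```

A caller could skip the returned number of bytes and treat the rest as UTF-8. A file without a BOM would still need a policy decision (UTF-8 or ACP), which is the question debated above.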
@@ -134,6 +135,15 @@ bool response_expand(size_t *pargc, const char ***pargv)

     buffer = (char *)f.buffer;
     bufend = buffer + f.len;
+#if _WIN32
Need comment explaining what is happening here.
Could use a test case.
You can get this in a shell script like this, we also run them on Windows.
Created a test for this, but it will fail if the Windows default locale (ANSI code page) can't support that charset. It compiles and works fine with both MSVC and DMC on Windows. No idea about Linux. It also looks like ddmd needs some extra library for linking, but I'm not sure how magicport works.
Whenever interacting with external sources (e.g. WinAPI functions), encode/decode to/from Wide strings.
Fix Issue 14474
What's the error?
When linking
Looks like it can't find druntime properly.
@@ -15,6 +15,7 @@
 #include <assert.h>
 #include <limits.h>
 #include <string.h>
+#include <locale.h>
dear gawd, please, no locale.h! It's a giant bug.
If we're going to support Unicode in dmd source, we should do it right, not as a quick hack. I suggest doing it the same way Phobos does for D: everything is UTF-8, and conversions to/from UTF-16 are done at the point where the Windows 'W' functions are called.
I.e. we need to do this right or not bother, and we know how to do it right. Phobos does it right.
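To make the "UTF-8 everywhere, convert at the 'W' call" idea concrete, here is a minimal portable sketch of the UTF-8 to UTF-16 step. It is not the Phobos or dmd implementation: `utf8_to_utf16` is a hypothetical helper, and on Windows the same conversion is normally done with `MultiByteToWideChar(CP_UTF8, ...)` right before calling a function such as `FindFirstFileW`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a NUL-terminated UTF-8 string into UTF-16 code units.
 * Returns the number of code units written (terminator excluded), or
 * (size_t)-1 on malformed input or if dst is too small.
 * Hypothetical helper for illustration. Simplification: overlong
 * encodings are not rejected. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s)
    {
        uint32_t cp;
        int extra; /* number of continuation bytes still expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1; /* stray continuation or invalid lead byte */
        s++;

        for (; extra > 0; extra--, s++)
        {
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1; /* truncated or broken sequence */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1; /* not a valid Unicode scalar value */

        if (cp < 0x10000)
        {
            if (n + 1 >= dst_cap) return (size_t)-1; /* keep room for the 0 */
            dst[n++] = (uint16_t)cp;
        }
        else
        {
            /* Outside the BMP: emit a surrogate pair. */
            if (n + 2 >= dst_cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= dst_cap) return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

With this shape, the rest of the compiler only ever sees `char` UTF-8 strings, and the wide buffer exists just for the duration of the OS call.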
That was written for the first implementation; since then I've implemented it much better. Also, this is about Unicode paths when specifying source filenames, not about their content. Please review the PR changeset, but there are still some issues; I'll work on them a bit later.
The code looks good, but it should be moved to
What's the status of this now that we've switched from C++ to D? Does declaring
Not sure about dmc, but msc should be fine. If we want utf8 command line arguments we should use D main instead IMO. Most of the file i/o should already be fixed by the long-filename-fixes (but assumes ANSI), but sources of trouble for non-ANSI characters are sensible console output and invocation of the linker (MS link understands response files with BOM, but optlink is limited in that regard).
OK, so speculatively closing then. @davispuh if you think this is still a problem, please rebase and reopen.
The original issues are still present: several places use the Windows ANSI API instead of the Wide API. As well as Some while ago I actually rebased this and rewrote some parts from C to D as it seemed easier. This way I changed This is how that patch looks atm
I suspect this is due to Linux now requiring "-fPIC" for builds. I ran into this problem too. See #7420 and #7427. I currently apply #7420, and then I can build and run all but 2 or 3 tests. My understanding is that @wilzbach is working on adding hardened Linux "machines" to our CIs (#7579), but I don't clearly understand what the roadmap is.
@@ -710,7 +710,7 @@ char *file_8dot3name(const char *filename)
     char *buf;
     int i;

-    h = FindFirstFile(filename,&fileinfo);
+    h = FindFirstFileA(filename,&fileinfo);
*A versions are basically deprecated junk on Windows. Do not use unless absolutely needed.
P.S.: NT kernel is UTF-16 throughout.
This is a quick hack for Issue 14474.
Decode UTF-8 to a Windows wide string, then encode it to the default Windows ANSI code page.
Ideally the wide WinAPI functions would be used, so no encoding to the ANSI code page would be needed, but that's quite a lot of work. Note that not all UTF-8 characters can be encoded in the ANSI code page, so some paths/names can't be accessed with the ANSI functions.
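The limitation noted in this commit message can be illustrated with a small portable sketch. It rests on a loud assumption: the "ANSI code page" is modeled here as ISO-8859-1, where only code points 0 to 0xFF are representable. Real Windows code pages differ and depend on the system locale. `utf16_to_single_byte` is a hypothetical helper; on Windows, `WideCharToMultiByte` with `CP_ACP` reports the same kind of loss through its `lpUsedDefaultChar` out-parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes NUL-terminated UTF-16 into a single-byte code page.
 * ASSUMPTION: the code page is modeled as ISO-8859-1, so a code unit is
 * representable only if it is <= 0xFF. Anything else is replaced by '?'
 * (each half of a surrogate pair becomes its own '?').
 * Returns 1 only if the whole string fit and nothing was replaced,
 * 0 otherwise. Hypothetical helper for illustration. */
int utf16_to_single_byte(const uint16_t *src, char *dst, size_t dst_cap)
{
    int lossless = 1;
    size_t n = 0;

    assert(src != NULL && dst != NULL && dst_cap > 0);
    for (; *src != 0 && n + 1 < dst_cap; src++)
    {
        if (*src <= 0xFF)
            dst[n++] = (char)(unsigned char)*src;
        else
        {
            dst[n++] = '?'; /* not representable in this code page */
            lossless = 0;
        }
    }
    dst[n] = '\0';
    return lossless && *src == 0;
}
```

A file name such as `ā.d` (U+0101) comes out as `?.d` under this model, so the ANSI path no longer names the file on disk. That is the failure mode the wide API avoids.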