Commit
interpret pluginName as UTF8 rather than pure ASCII
toUnicode just widens each byte to a 16-bit value and reinterprets the result as UTF-16, regardless of the original encoding.
This works for pure ASCII, but is not well defined for other encodings.
We should prefer UTF-8 by default for all string transfer between the image and the VM.
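
To make the problem concrete, here is a minimal sketch of byte widening. `widen_bytes` is a hypothetical stand-in written for this illustration; it is not the VM's actual toUnicode and only models the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of byte widening: every input byte becomes one
   16-bit unit, whatever encoding the input really uses.
   Returns the number of units written, including the terminator. */
size_t widen_bytes(const char *src, uint16_t *dst)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    do {
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended. */
        dst[i] = (unsigned char)src[i];
    } while (src[i++] != '\0');
    return i;
}
```

For pure ASCII such as "foo" the result is valid UTF-16. For "é", which UTF-8 encodes as the two bytes 0xC3 0xA9, widening produces the two units 0x00C3 0x00A9 instead of the single unit 0x00E9, so the name no longer matches.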
nicolas-cellier-aka-nice committed Dec 30, 2018
1 parent 627bc5e commit 98fc85d
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion platforms/win32/vm/sqWin32ExternalPrims.c
@@ -41,7 +41,10 @@ void *ioLoadModule(char *pluginName)
TCHAR *name;

#ifdef UNICODE
- name = toUnicode(pluginName);
+ int len = MultiByteToWideChar(CP_UTF8, 0, pluginName, -1, NULL, 0);
+ if (len <= 0) return 0; /* invalid UTF8 ? */
+ name = alloca(len);

nicolas-cellier-aka-nice Jan 1, 2019

Author Contributor

Err!!! It should be alloca(len*sizeof(WCHAR))

krono Jan 1, 2019

Member

Also, are we missing the terminating '\0'? (See: 82d1c33#diff-ea77aa3cf89ad27f0980e8a8b857fb1cR562)

+ if (MultiByteToWideChar(CP_UTF8, 0, pluginName, -1, name, len) == 0) return 0;
#else
name = pluginName;
#endif

2 comments on commit 98fc85d

@nicolas-cellier-aka-nice
Contributor Author

Hmm, no, the terminating NUL is copied when we pass -1 for the original string length.
I had to re-read the docs a few times before I convinced myself. This part is clear:

MultiByteToWideChar does not null-terminate an output string if the input string length is explicitly specified without a terminating null character. To null-terminate an output string for this function, the application should pass in -1 or explicitly count the terminating null character for the input string.

Then the result, less so:

Returns the number of characters written to the buffer indicated by lpWideCharStr if successful. If the function succeeds and cchWideChar is 0, the return value is the required size, in characters, for the buffer indicated by lpWideCharStr.

It's the required size of the buffer, in characters, not the number of characters in the string (wcslen), so I understand it as including the terminating NUL whenever we ask for it (with -1 or with strlen()+1).

To be sure:

#include <stdio.h>
#include <windows.h>

int main(void) {
    const char *foo = "foo";
    /* cchWideChar == 0: only query the required buffer size, in characters */
    int len = MultiByteToWideChar(CP_UTF8,0,foo,-1,NULL,0);
    printf("len=%d\n",len);
    return 0;
}

Then

$ i686-w64-mingw32-gcc test.c
$ ./a.exe
len=4
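
The two points settled in this thread (the returned count includes the terminator when it was requested, and the buffer must be sized in bytes as count times element size) can also be sketched portably. The helpers below, `utf8_to_utf16` and `utf8_to_utf16_dup`, are hypothetical names invented for this illustration. They only mimic the count-then-convert calling pattern and are not the Windows API; the decoder is deliberately minimal and does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper. Converts NUL-terminated UTF-8 to UTF-16.
   With cap == 0 it only counts; otherwise it writes at most cap units.
   Returns the number of 16-bit units INCLUDING the terminator, or 0 on
   malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    for (;;) {
        uint32_t cp;
        int extra;

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return 0;                          /* invalid lead byte */
        s++;
        for (; extra > 0; extra--, s++) {
            if ((*s & 0xC0) != 0x80) return 0;  /* bad continuation byte */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;

        int is_nul = (cp == 0);
        size_t units = (cp >= 0x10000) ? 2 : 1;
        if (cap != 0) {
            if (n + units > cap) return 0;      /* buffer too small */
            if (units == 2) {                   /* encode a surrogate pair */
                cp -= 0x10000;
                dst[n]     = (uint16_t)(0xD800 | (cp >> 10));
                dst[n + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            } else {
                dst[n] = (uint16_t)cp;
            }
        }
        n += units;
        if (is_nul) return n;                   /* terminator is counted */
    }
}

/* Two passes: count first, then allocate count * sizeof(unit) BYTES. */
uint16_t *utf8_to_utf16_dup(const char *src, size_t *out_len)
{
    assert(src != NULL);
    size_t len = utf8_to_utf16(src, NULL, 0);
    if (len == 0) return NULL;
    uint16_t *buf = malloc(len * sizeof *buf);  /* not malloc(len) */
    if (buf == NULL) return NULL;
    if (utf8_to_utf16(src, buf, len) == 0) { free(buf); return NULL; }
    if (out_len != NULL) *out_len = len;
    return buf;
}
```

Calling `utf8_to_utf16_dup("foo", &len)` sets `len` to 4, matching the count printed by the test program, and the returned buffer already ends with a zero unit, so no extra terminator has to be appended. The caller releases the buffer with `free`.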

@krono
Member

krono commented on 98fc85d Jan 1, 2019

Good!