-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Win64 unicode #332
Win64 unicode #332
Commits on Dec 30, 2018
-
Handle both an ASCII (UTF8) and a WIDE (UTF16) version of image/vm na…
…me/path Note: the Microsoft windows API mostly uses the W version (for enabling internationalized image name/path) the image uses UTF8 encoded bytes string for communication with the VM (this is best for compatibility with Unix/Mac) The idea here is that the implementation maintains both versions of the UTF8 and UTF16 path/name The appropriate macro returning a TCHAR * are also provided. This is in order to support the generic version using TCHAR, which are normally used to ease transition to UNICODE. Note about string length: No effort has been made so far to support long path names for image and VM. The path is limited to MAX_PATH in UTF16. UTF8 can eventually consume more characters than UTF16 (not necessarily more bytes). Thus, the ASCII version has been made longer (MAX_PATH_UTF8) in order to avoid an even more restrictive limit.
Configuration menu - View commit details
-
Copy full SHA for ea3d51e - Browse repository at this point
Copy the full SHA ea3d51eView commit details -
DropPlugin: modernize OpenFile/_lwrite/_lclose API
OK, OK, they are compatible with 16 bits windows ;) But these are not recommended any more especially if we want to go toward UNICODE paths everywhere.
Configuration menu - View commit details
-
Copy full SHA for 8bdd0b6 - Browse repository at this point
Copy the full SHA 8bdd0b6View commit details -
Drop plugin: let tempPathName be Wide
This is for the generated temporary file $$squeak$$.bmp The path to temporary directory could be non ASCII, so let's be more robust to UNICODE.
Configuration menu - View commit details
-
Copy full SHA for cb32e41 - Browse repository at this point
Copy the full SHA cb32e41View commit details -
Configuration menu - View commit details
-
Copy full SHA for 85e0883 - Browse repository at this point
Copy the full SHA 85e0883View commit details -
Setting fp flags _MCW_PC and _MCW_IC is not supported on ARM nor X64
See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2?view=vs-2017 Setting these flags triggers an assertion failure when compiled with MSVC in debug mode Assertion failure (mask&~(_MCW_DN|_MCW_EM|_MCW_RC))==0
Configuration menu - View commit details
-
Copy full SHA for f689145 - Browse repository at this point
Copy the full SHA f689145View commit details -
The pageSize and pageMask are too short on WIN64
Due to this, roundUpToPage is truncating addresses > 2^32 With MSVC, minAddress is > 2^32, and it then takes ages to generate a VirtualAlloc > minAddress (several minutes).
Configuration menu - View commit details
-
Copy full SHA for 49eff31 - Browse repository at this point
Copy the full SHA 49eff31View commit details -
Use the eventually true UNICODE imageNameT if -DUNICODE
using `toUnicode` does not do the right thing: it promotes each UTF8 byte code to short... ... which can hardly work beyond ASCII.
Configuration menu - View commit details
-
Copy full SHA for 627bc5e - Browse repository at this point
Copy the full SHA 627bc5eView commit details -
interpret pluginName as UTF8 rather than pure ASCII
toUnicode is just expanding bytes to short, and re-interpret them UTF16, not regarding original encoding. This works well for pure ASCII, but is not really defined for other encodings. We should prefer using UTF8 by default for all the image-VM string transfer.
4Configuration menu - View commit details
-
Copy full SHA for 98fc85d - Browse repository at this point
Copy the full SHA 98fc85dView commit details -
refactor sqMessageBox to void using toUnicode
Why to get rid of toUnicode fromUnicode fromSqueak fromSqueak2? - we'd rather use UTF8 everywhere. - and it's the last usage remaining! Now the clients must use a TEXT() macro also for the format. In the future, we could eventually translate the VM messages... Get rid of wsprintf variant which has no character limit and might be prone to buffer overrun. An alternative for supporting both ASCII and WIDE is the _tcs* family in <tchar.h> See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/vsprintf-vsprintf-l-vswprintf-vswprintf-l-vswprintf-l?view=vs-2017 In order to avoid buffer overrun, prefer a vsnprintf variant Note: I did also update the prototype in sqWin32SpurAlloc, but not the contents. It is dead code, and only there for testing purposes. We'd better remove it!
3Configuration menu - View commit details
-
Copy full SHA for 4f6f191 - Browse repository at this point
Copy the full SHA 4f6f191View commit details -
Discard now unused toUnicode fromUnicode fromSqueak fromSqueak2 lstrrchr
For UNICODE compatibility, - every String coming from image to the VM should better be interpreted UTF8, and converted to wide String via MultiByteToWideChar() - every String going to the image from the VM should better be converted from Wide string to UTF8 via WideCharToMultiByte() See: https://docs.microsoft.com/en-us/windows/desktop/api/stringapiset/nf-stringapiset-multibytetowidechar https://docs.microsoft.com/en-us/windows/desktop/api/stringapiset/nf-stringapiset-widechartomultibyte Side note: there is also a _tcsrrchr in <tchar.h> at least since visual studio 2015 See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strrchr-wcsrchr-mbsrchr-mbsrchr-l?view=vs-2017 <tchar.h> is also supported by mingw so if ever we need lstrrchr again, we'll use that.
1Configuration menu - View commit details
-
Copy full SHA for 545ec0a - Browse repository at this point
Copy the full SHA 545ec0aView commit details -
#if 0 ? YAGNI ! we now use builtin _WIN32 _WIN64 anyway
Reminder: even in WIN64, _WIN32 is defined, so the comment was a bit misleading anyway.
Configuration menu - View commit details
-
Copy full SHA for 2024d43 - Browse repository at this point
Copy the full SHA 2024d43View commit details -
Configuration menu - View commit details
-
Copy full SHA for e1e83f7 - Browse repository at this point
Copy the full SHA e1e83f7View commit details -
Configuration menu - View commit details
-
Copy full SHA for 8638522 - Browse repository at this point
Copy the full SHA 8638522View commit details -
Platform specific knowledge: WIN32_FILE and STD_FILE are mutually exc…
…lusive options Therefore, when we define `WIN32_FILE_SUPPORT` we must also define `NO_STD_FILE_SUPPORT` Until now, this knowledge was in build.win*/common/Makefile.* (and a legacy MSVC project in plaftorms/win32/misc/Squeak.dsp - on pourrait réduire la voilure !!!) Since it is easy to forget the second define, lets extend the knowledge in the sqPlatformSpecific.h
Configuration menu - View commit details
-
Copy full SHA for 1893512 - Browse repository at this point
Copy the full SHA 1893512View commit details -
We can't compare a TCHAR*windowClassName with a char*buf
Even if buf were re-interpreted as WCHAR* when -DUNICODE, strcmp wouldn't do the right thing (it will stop at first ASCII because high 8 bits will be zero!) We thus use the TCHAR*dedicated `_tcscmp`. If ever we want to switch to UTF16 (WCHAR*) windowClassName, then it will be `wcscmp`.
Configuration menu - View commit details
-
Copy full SHA for f250415 - Browse repository at this point
Copy the full SHA f250415View commit details
Commits on Dec 31, 2018
-
Tu quoque NewspeakVM, frustra TEXT macro...
That's the limit of using compiler warnings: we only focus on the sections we compile... BTW, ifdef NewspeakVM, OK, but what do Pharo people think about it?
Configuration menu - View commit details
-
Copy full SHA for 84a8d17 - Browse repository at this point
Copy the full SHA 84a8d17View commit details -
gai_strerror returns a TCHAR*, we cannot simply fprintf it...
Choose the UNICODE variant, because error messages are presumably localized an may use non ASCII characters The alternative would be to use `_ftprintf(stderr,TEXT("%s"),gai_strerror(gaiError))` and let -DUNICODE decide...
Configuration menu - View commit details
-
Copy full SHA for 7d3264e - Browse repository at this point
Copy the full SHA 7d3264eView commit details -
Let lookup account for NULL terminating a WCHAR*
The low level `RegQueryValueEx` deals with a char*, but here char* just means some un-interpreted bytes, not a string! If we compile with `-DUNICODE` the un-interpreted bytes will contain a WCHAR* And if we want to properly NULL terminate this WCHAR*, then we need 2 null bytes, not 1!
Configuration menu - View commit details
-
Copy full SHA for 6ff9625 - Browse repository at this point
Copy the full SHA 6ff9625View commit details -
Revert to ASCII only version of DnsInfo
Rationale: there's no urge in providing localized UNICODE info... That's IP addresses, etc... Eventually, server names could be UNICODE but this seems to be a real mess! The short term goal is to enable compilation with -DUNICODE For longer term, we'll see later.
Configuration menu - View commit details
-
Copy full SHA for 32840ac - Browse repository at this point
Copy the full SHA 32840acView commit details -
iconPath is char*, LoadImage expects a TCHAR*
We now interpret iconPath as UTF-8 encoded We convert it to WideChar and call the W version. TODO: for now, do not deal with UNC long filenames...
Configuration menu - View commit details
-
Copy full SHA for 475d84c - Browse repository at this point
Copy the full SHA 475d84cView commit details -
Configuration menu - View commit details
-
Copy full SHA for ef245b6 - Browse repository at this point
Copy the full SHA ef245b6View commit details -
DPRINTF must take a TCHAR*fmt because wvsprintf does!
though, vfprintf does not, so reconcile by using _vftprintf from <tchar.h>
Configuration menu - View commit details
-
Copy full SHA for 07ff6a6 - Browse repository at this point
Copy the full SHA 07ff6a6View commit details -
NOTIFYICONDATA.szTip maybe a WCHAR* if -DUNICODE
so let's use appropriate TCHAR* functions/macros
Configuration menu - View commit details
-
Copy full SHA for dfe4d09 - Browse repository at this point
Copy the full SHA dfe4d09View commit details -
_DISPLAY_DEVICE.DeviceString may be a WCHAR* if -DUNICODE, we cannot …
…simply sprintf We have to test #ifdef UNICODE, and if so, convert to UTF8
Configuration menu - View commit details
-
Copy full SHA for db33158 - Browse repository at this point
Copy the full SHA db33158View commit details -
VM_VERSION_TEXT is a TCHAR*, we cannot simply fprintf
But there is no need for UNICODE in VM_VERSION_TEXT Revert to plain ASCII and rename it VM_VERSION_VERBOSE
Configuration menu - View commit details
-
Copy full SHA for 46bd992 - Browse repository at this point
Copy the full SHA 46bd992View commit details -
iniName, manufacturer and model may be WCHAR* if -DUNICODE
Let iniName be WCHAR unconditionally. Get manufacturer and model into a UTF16 buffer, then convert them to UTF8 while at it, protect buffer overrun strcat - > wcsNcat
Configuration menu - View commit details
-
Copy full SHA for c5f207c - Browse repository at this point
Copy the full SHA c5f207cView commit details -
Information queried in Registry can be WideChar if -DUNICODE
We want to generate an UTF8 report (do we really?). So provide a `RegLookupUTF8String`, so as to make all the queries with W variant, then convert all information to UTF8 While at it, use snprintf instead of sprintf and strncat instead of strcat
Configuration menu - View commit details
-
Copy full SHA for d9b3927 - Browse repository at this point
Copy the full SHA d9b3927View commit details -
Make stderrName and stdoutName be WCHAR*
There was a mixture of TCHAR* and char* which could not work with -DUNICODE Who knows, the TempPath might be localized, so go UNICODE...
Configuration menu - View commit details
-
Copy full SHA for 8b14fbf - Browse repository at this point
Copy the full SHA 8b14fbfView commit details -
Retract support for Windows95 (no you don't dream it's nearly 2019!)
I'm very sorry to dilapidate all this historical knowledge, but frankly YAGNI! Even if we are a museum, we can't expose all our art in permanent collections! Note that minimal version is already set to XP (via WINVER:=-D_WIN32_WINNT=0x0501 -DWINVER=0x0501 in Makefile.tools) So we're keeping this stuf for nothing, and now it gets in our way to UNICODE.
Configuration menu - View commit details
-
Copy full SHA for 643a5e3 - Browse repository at this point
Copy the full SHA 643a5e3View commit details -
Handle UTF16 logName and serviceName
These are the last two obstacles that generate compiler warnings with -DUNICODE No compiler warning does not mean that UNICODE is OK and ready to go. It means that we have at least eliminated all the trivial TCHAR*/char*/WCHAR* mismatch (and there were a bunch of them!!!) Also the code is like a battlefield with lot of different ad hoc recipes, that would deserve more uniform approach (refactoring) But small steps! To reach the current stage, we have to overhaul printCommandLine() It can't answer a TCHAR* when sqMain expects a char*argv[]... So, like what is done in WinMain thru getCommandLineW, we first produce a Wide command line, then convert to UTF8. There is currently no provision for conversion failure: I don't know how to report and exit when the VM is ran as a service. Note that command line parsing is then working the other way around: some of the arguments are converted back to UTF16. I call this style tricotage coding: une maille à l'endroit, une maille à l'envers. Maybe a deeper change will be to WCHAR* all the way down, but let's differ this decision. Small steps! Note that another source of problems is RegQueryValueExW. When we query a WCHAR* string, it is not necessarily NULL terminated. But every answer is handled as raw un-interpretd-bytes (char *), whose byte length is answered in dwSize. That's not the WCHAR character length! So buffer[dwSize] = 0 is not the right thing: - if buffer is declared WVCHAR*, this would write the NULL well beyond the real string length (and eventually overrun the buffer); - if buffer is declared char*, this would not guaranty a terminating NULL WCHAR (we need 2 zero bytes to make a NULL WCHAR). Usage of this function will require a careful review, the price of low level API... Also note that we seem to compile with flag -DNO_SERVICE right now. Why? I have not enough knowledge in this area and lack tests/examples. In that conditions, it's not easy to test the modifications! Any help will be greatly appreciated. The good part is that It won't break current VM usage... You know what I think of dead-code, but half alive Frankenstein code scares me as well ;)
Configuration menu - View commit details
-
Copy full SHA for 1f61637 - Browse repository at this point
Copy the full SHA 1f61637View commit details
Commits on Jan 1, 2019
-
Configuration menu - View commit details
-
Copy full SHA for e5cd4fb - Browse repository at this point
Copy the full SHA e5cd4fbView commit details -
Fix my own confusion about wcsncat
wcsncat does not care about the write limit! (buffer overrun). It only specifies the maximum number of characters to read from source... This way, we pay one more Shlemiel the painter run, and write hyper convoluted code. Nice!
Configuration menu - View commit details
-
Copy full SHA for 4ded318 - Browse repository at this point
Copy the full SHA 4ded318View commit details -
Fix: Shlemiel the painter needs to paint strncat too
Like wcsncat, the ugly and correct usage is strncat( dest, src, sizeof(dest) - 1 - strlen(dest) );
Configuration menu - View commit details
-
Copy full SHA for af19ed7 - Browse repository at this point
Copy the full SHA af19ed7View commit details -
Fix a TCHAR*crashInfo trying to mix a char*msg
Here, i don't want to redefine error() to take a TCHAR* So the simplest alternative is to switch to MessageBoxA But vmLogDirA may contain UNICODE, and UTF8 encoded character may be mangled in the MessageBox. The right way is to switch to W variant unconditionnally, and interpret msg as UTF8...
Configuration menu - View commit details
-
Copy full SHA for bf3840c - Browse repository at this point
Copy the full SHA bf3840cView commit details
Commits on Jan 2, 2019
-
Fix another potential Buffer overrun in sqWin32MIDI.c
The joy of 0-based indices... [skip travis]
Configuration menu - View commit details
-
Copy full SHA for 252e2a8 - Browse repository at this point
Copy the full SHA 252e2a8View commit details -
fixup: skip only 18 WCHAR if keyName begins with \\registry\\machine
Note that keyName was declared char * in the original code. But now it is WCHAR*, so keyName+18 already does the right thing (skip 18 char) while modified code would now skip 36! Note that original code was using memcpy for this case of overlapping memory which is BAD and eventually UB. [skip travis]
Configuration menu - View commit details
-
Copy full SHA for 3e51616 - Browse repository at this point
Copy the full SHA 3e51616View commit details