Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Win64 unicode #332

Merged
merged 36 commits into from
Jan 2, 2019
Merged

Win64 unicode #332

merged 36 commits into from
Jan 2, 2019

Commits on Dec 30, 2018

  1. Handle both an ASCII (UTF8) and a WIDE (UTF16) version of image/vm na…

    …me/path
    
    Note:
    the Microsoft windows API mostly uses the W version (for enabling internationalized image name/path)
    the image uses UTF8 encoded bytes string for communication with the VM (this is best for compatibility with Unix/Mac)
    The idea here is that the implementation maintains both versions of the UTF8 and UTF16 path/name
    
    The appropriate macro returning a TCHAR * are also provided.
    This is in order to support the generic version using TCHAR, which are normally used to ease transition to UNICODE.
    
    Note about string length:
    No effort has been made so far to support long path names for image and VM.
    The path is limited to MAX_PATH in UTF16.
    UTF8 can eventually consume more characters than UTF16 (not necessarily more bytes).
    Thus, the ASCII version has been made longer (MAX_PATH_UTF8) in order to avoid an even more restrictive limit.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    ea3d51e View commit details
    Browse the repository at this point in the history
  2. DropPlugin: modernize OpenFile/_lwrite/_lclose API

    OK, OK, they are compatible with 16 bits windows ;)
    But these are not recommended any more especially if we want to go toward UNICODE paths everywhere.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    8bdd0b6 View commit details
    Browse the repository at this point in the history
  3. Drop plugin: let tempPathName be Wide

    This is for the generated temporary file $$squeak$$.bmp
    The path to temporary directory could be non ASCII, so let's be more robust to UNICODE.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    cb32e41 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    85e0883 View commit details
    Browse the repository at this point in the history
  5. Setting fp flags _MCW_PC and _MCW_IC is not supported on ARM nor X64

    See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/control87-controlfp-control87-2?view=vs-2017
    Setting these flags triggers an assertion failure when compiled with MSVC in debug mode
    
        Assertion failure (mask&~(_MCW_DN|_MCW_EM|_MCW_RC))==0
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    f689145 View commit details
    Browse the repository at this point in the history
  6. The pageSize and pageMask are too short on WIN64

    Due to this, roundUpToPage is truncating addresses > 2^32
    With MSVC, minAddress is > 2^32, and it then takes ages to generate a VirtualAlloc > minAddress (several minutes).
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    49eff31 View commit details
    Browse the repository at this point in the history
  7. Use the eventually true UNICODE imageNameT if -DUNICODE

    using `toUnicode` does not do the right thing: it promotes each UTF8 byte code to short...
    ... which can hardly work beyond ASCII.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    627bc5e View commit details
    Browse the repository at this point in the history
  8. interpret pluginName as UTF8 rather than pure ASCII

    toUnicode is just expanding bytes to short, and re-interpret them UTF16, not regarding original encoding.
    This works well for pure ASCII, but is not really defined for other encodings.
    We should prefer using UTF8 by default for all the image-VM string transfer.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    4 Configuration menu
    Copy the full SHA
    98fc85d View commit details
    Browse the repository at this point in the history
  9. refactor sqMessageBox to void using toUnicode

    Why to get rid of toUnicode fromUnicode fromSqueak fromSqueak2?
    
    - we'd rather use UTF8 everywhere.
    - and it's the last usage remaining!
    
    Now the clients must use a TEXT() macro also for the format. In the future, we could eventually translate the VM messages...
    
    Get rid of wsprintf variant which has no character limit and might be prone to buffer overrun.
    An alternative for supporting both ASCII and WIDE is the _tcs* family in <tchar.h>
    See https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/vsprintf-vsprintf-l-vswprintf-vswprintf-l-vswprintf-l?view=vs-2017
    In order to avoid buffer overrun, prefer a vsnprintf variant
    
    Note: I did also update the prototype in sqWin32SpurAlloc, but not the contents.
    It is dead code, and only there for testing purposes.
    We'd better remove it!
    nicolas-cellier-aka-nice committed Dec 30, 2018
    3 Configuration menu
    Copy the full SHA
    4f6f191 View commit details
    Browse the repository at this point in the history
  10. Discard now unused toUnicode fromUnicode fromSqueak fromSqueak2 lstrrchr

    For UNICODE compatibility,
    
    - every String coming from image to the VM should better be interpreted UTF8, and converted to wide String via MultiByteToWideChar()
    - every String going to the image from the VM should better be converted from Wide string to UTF8 via WideCharToMultiByte()
    
    See:
    https://docs.microsoft.com/en-us/windows/desktop/api/stringapiset/nf-stringapiset-multibytetowidechar
    https://docs.microsoft.com/en-us/windows/desktop/api/stringapiset/nf-stringapiset-widechartomultibyte
    
    Side note: there is also a _tcsrrchr in <tchar.h> at least since visual studio 2015
    See
    https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strrchr-wcsrchr-mbsrchr-mbsrchr-l?view=vs-2017
    <tchar.h> is also supported by mingw so if ever we need lstrrchr again, we'll use that.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    1 Configuration menu
    Copy the full SHA
    545ec0a View commit details
    Browse the repository at this point in the history
  11. #if 0 ? YAGNI ! we now use builtin _WIN32 _WIN64 anyway

    Reminder: even in WIN64, _WIN32 is defined, so the comment was a bit misleading anyway.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    2024d43 View commit details
    Browse the repository at this point in the history
  12. Configuration menu
    Copy the full SHA
    e1e83f7 View commit details
    Browse the repository at this point in the history
  13. Configuration menu
    Copy the full SHA
    8638522 View commit details
    Browse the repository at this point in the history
  14. Platform specific knowledge: WIN32_FILE and STD_FILE are mutually exc…

    …lusive options
    
    Therefore, when we define `WIN32_FILE_SUPPORT` we must also define `NO_STD_FILE_SUPPORT`
    Until now, this knowledge was in build.win*/common/Makefile.*
    (and a legacy MSVC project in plaftorms/win32/misc/Squeak.dsp - on pourrait réduire la voilure !!!)
    Since it is easy to forget the second define, lets extend the knowledge in the sqPlatformSpecific.h
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    1893512 View commit details
    Browse the repository at this point in the history
  15. We can't compare a TCHAR*windowClassName with a char*buf

    Even if buf were re-interpreted as WCHAR* when -DUNICODE,
    strcmp wouldn't do the right thing (it will stop at first ASCII because high 8 bits will be zero!)
    We thus use the TCHAR*dedicated `_tcscmp`.
    If ever we want to switch to UTF16 (WCHAR*) windowClassName, then it will be `wcscmp`.
    nicolas-cellier-aka-nice committed Dec 30, 2018
    Configuration menu
    Copy the full SHA
    f250415 View commit details
    Browse the repository at this point in the history

Commits on Dec 31, 2018

  1. Tu quoque NewspeakVM, frustra TEXT macro...

    That's the limit of using compiler warnings: we only focus on the sections we compile...
    BTW, ifdef NewspeakVM, OK, but what do Pharo people think about it?
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    84a8d17 View commit details
    Browse the repository at this point in the history
  2. gai_strerror returns a TCHAR*, we cannot simply fprintf it...

    Choose the UNICODE variant, because error messages are presumably localized an may use non ASCII characters
    The alternative would be to use `_ftprintf(stderr,TEXT("%s"),gai_strerror(gaiError))` and let -DUNICODE decide...
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    7d3264e View commit details
    Browse the repository at this point in the history
  3. Let lookup account for NULL terminating a WCHAR*

    The low level `RegQueryValueEx` deals with a char*, but here char* just means some un-interpreted bytes, not a string!
    If we compile with `-DUNICODE` the un-interpreted bytes will contain a WCHAR*
    And if we want to properly NULL terminate this WCHAR*, then we need 2 null bytes, not 1!
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    6ff9625 View commit details
    Browse the repository at this point in the history
  4. Revert to ASCII only version of DnsInfo

    Rationale: there's no urge in providing localized UNICODE info...
    That's IP addresses, etc...
    Eventually, server names could be UNICODE but this seems to be a real mess!
    The short term goal is to enable compilation with -DUNICODE
    For longer term, we'll see later.
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    32840ac View commit details
    Browse the repository at this point in the history
  5. iconPath is char*, LoadImage expects a TCHAR*

    We now interpret iconPath as UTF-8 encoded
    We convert it to WideChar and call the W version.
    TODO: for now, do not deal with UNC long filenames...
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    475d84c View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    ef245b6 View commit details
    Browse the repository at this point in the history
  7. DPRINTF must take a TCHAR*fmt because wvsprintf does!

    though, vfprintf does not, so reconcile by using _vftprintf from <tchar.h>
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    07ff6a6 View commit details
    Browse the repository at this point in the history
  8. NOTIFYICONDATA.szTip maybe a WCHAR* if -DUNICODE

    so let's use appropriate TCHAR* functions/macros
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    dfe4d09 View commit details
    Browse the repository at this point in the history
  9. _DISPLAY_DEVICE.DeviceString may be a WCHAR* if -DUNICODE, we cannot …

    …simply sprintf
    
    We have to test #ifdef UNICODE, and if so, convert to UTF8
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    db33158 View commit details
    Browse the repository at this point in the history
  10. VM_VERSION_TEXT is a TCHAR*, we cannot simply fprintf

    But there is no need for UNICODE in VM_VERSION_TEXT
    Revert to plain ASCII and rename it VM_VERSION_VERBOSE
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    46bd992 View commit details
    Browse the repository at this point in the history
  11. iniName, manufacturer and model may be WCHAR* if -DUNICODE

    Let iniName be WCHAR unconditionally.
    Get manufacturer and model into a UTF16 buffer, then convert them to UTF8
    while at it, protect buffer overrun strcat - > wcsNcat
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    c5f207c View commit details
    Browse the repository at this point in the history
  12. Information queried in Registry can be WideChar if -DUNICODE

    We want to generate an UTF8 report (do we really?).
    So provide a `RegLookupUTF8String`, so as to make all the queries with W variant,
    then convert all information to UTF8
    
    While at it, use snprintf instead of sprintf and strncat instead of strcat
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    d9b3927 View commit details
    Browse the repository at this point in the history
  13. Make stderrName and stdoutName be WCHAR*

    There was a mixture of TCHAR* and char* which could not work with -DUNICODE
    Who knows, the TempPath might be localized, so go UNICODE...
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    8b14fbf View commit details
    Browse the repository at this point in the history
  14. Retract support for Windows95 (no you don't dream it's nearly 2019!)

    I'm very sorry to dilapidate all this historical knowledge, but frankly YAGNI!
    Even if we are a museum, we can't expose all our art in permanent collections!
    Note that minimal version is already set to XP (via WINVER:=-D_WIN32_WINNT=0x0501 -DWINVER=0x0501 in Makefile.tools)
    So we're keeping this stuf for nothing, and now it gets in our way to UNICODE.
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    643a5e3 View commit details
    Browse the repository at this point in the history
  15. Handle UTF16 logName and serviceName

    These are the last two obstacles that generate compiler warnings with -DUNICODE
    No compiler warning does not mean that UNICODE is OK and ready to go.
    It means that we have at least eliminated all the trivial TCHAR*/char*/WCHAR* mismatch (and there were a bunch of them!!!)
    Also the code is like a battlefield with lot of different ad hoc recipes, that would deserve more uniform approach (refactoring)
    But small steps!
    
    To reach the current stage, we have to overhaul printCommandLine()
    It can't answer a TCHAR* when sqMain expects a char*argv[]...
    So, like what is done in WinMain thru getCommandLineW, we first produce a Wide command line, then convert to UTF8.
    There is currently no provision for conversion failure: I don't know how to report and exit when the VM is ran as a service.
    
    Note that command line parsing is then working the other way around: some of the arguments are converted back to UTF16.
    I call this style tricotage coding: une maille à l'endroit, une maille à l'envers.
    Maybe a deeper change will be to WCHAR* all the way down, but let's differ this decision. Small steps!
    
    Note that another source of problems is RegQueryValueExW.
    When we query a WCHAR* string, it is not necessarily NULL terminated.
    But every answer is handled as raw un-interpretd-bytes (char *), whose byte length is answered in dwSize.
    That's not the WCHAR character length! So buffer[dwSize] = 0 is not the right thing:
    
    - if buffer is declared WVCHAR*, this would write the NULL well beyond the real string length (and eventually overrun the buffer);
    - if buffer is declared char*, this would not guaranty a terminating NULL WCHAR (we need 2 zero bytes to make a NULL WCHAR).
    
    Usage of this function will require a careful review, the price of low level API...
    
    Also note that we seem to compile with flag -DNO_SERVICE right now. Why?
    I have not enough knowledge in this area and lack tests/examples.
    In that conditions, it's not easy to test the modifications! Any help will be greatly appreciated.
    The good part is that It won't break current VM usage...
    You know what I think of dead-code, but half alive Frankenstein code scares me as well ;)
    nicolas-cellier-aka-nice committed Dec 31, 2018
    Configuration menu
    Copy the full SHA
    1f61637 View commit details
    Browse the repository at this point in the history

Commits on Jan 1, 2019

  1. Configuration menu
    Copy the full SHA
    e5cd4fb View commit details
    Browse the repository at this point in the history
  2. Fix my own confusion about wcsncat

    wcsncat does not care about the write limit! (buffer overrun).
    It only specifies the maximum number of characters to read from source...
    
    This way, we pay one more Shlemiel the painter run, and write hyper convoluted code.
    Nice!
    nicolas-cellier-aka-nice committed Jan 1, 2019
    Configuration menu
    Copy the full SHA
    4ded318 View commit details
    Browse the repository at this point in the history
  3. Fix: Shlemiel the painter needs to paint strncat too

    Like wcsncat, the ugly and correct usage is
    
        strncat( dest, src, sizeof(dest)  - 1 - strlen(dest) );
    nicolas-cellier-aka-nice committed Jan 1, 2019
    Configuration menu
    Copy the full SHA
    af19ed7 View commit details
    Browse the repository at this point in the history
  4. Fix a TCHAR*crashInfo trying to mix a char*msg

    Here, i don't want to redefine error() to take a TCHAR*
    So the simplest alternative is to switch to MessageBoxA
    But vmLogDirA may contain UNICODE, and UTF8 encoded character may be mangled in the MessageBox.
    
    The right way is to switch to W variant unconditionnally, and interpret msg as UTF8...
    nicolas-cellier-aka-nice committed Jan 1, 2019
    Configuration menu
    Copy the full SHA
    bf3840c View commit details
    Browse the repository at this point in the history

Commits on Jan 2, 2019

  1. Fix another potential Buffer overrun in sqWin32MIDI.c

    The joy of 0-based indices...
    [skip travis]
    nicolas-cellier-aka-nice committed Jan 2, 2019
    Configuration menu
    Copy the full SHA
    252e2a8 View commit details
    Browse the repository at this point in the history
  2. fixup: skip only 18 WCHAR if keyName begins with \\registry\\machine

    Note that keyName was declared char * in the original code.
    But now it is WCHAR*, so keyName+18 already does the right thing (skip 18 char) while modified code would now skip 36!
    Note that original code was using memcpy for this case of overlapping memory which is BAD and eventually UB.
    [skip travis]
    nicolas-cellier-aka-nice committed Jan 2, 2019
    Configuration menu
    Copy the full SHA
    3e51616 View commit details
    Browse the repository at this point in the history