Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for ARM64 Windows #8142

Open
wants to merge 29 commits into
base: master
Choose a base branch
from

Conversation

cocoa-xu
Copy link
Contributor

Hi, this PR includes initial patches and instructions to bring Erlang/OTP to ARM64 Windows. These changes should only affect the behaviour when running in WSL and user executes one of the following command to set up the build environment

  • ./otp_build env_win32 arm64, set up build environment for compiling ARM64 Erlang/OTP natively,
  • ./otp_build env_win32 x64_arm64, set up build environment for cross-compiling ARM64 Erlang/OTP on an x86_64 machine

Although I understand that ARM64 Windows might not be anywhere close to the main tier of support yet, this PR can serve as an initial guide for people in the community who're also interested in running Erlang/OTP natively on ARM64 Windows.

@CLAassistant
Copy link

CLAassistant commented Feb 17, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

github-actions bot commented Feb 17, 2024

CT Test Results

   10 files    248 suites   3h 34m 33s ⏱️
3 024 tests 2 887 ✅ 137 💤 0 ❌
4 837 runs  4 457 ✅ 380 💤 0 ❌

Results for commit 551751f.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@cocoa-xu cocoa-xu force-pushed the cx/windows-arm64 branch 4 times, most recently from 1b08767 to 73bc87e Compare February 18, 2024 13:59
@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 19, 2024

Hi, I have also managed to craft a GitHub workflow that can build it natively on an ARM64 Windows machine, and here is the workflow log, the corresponding CI yaml file and the final installer! This build was based on 6c6b409.

There're still some issues when cross-compiling on x86_64 machine, which I'll look into later.

Other than that, all libraries except wx and odbc build without any issue, ARM64 JIT also seem to work. Although I haven't got time to do any thorough tests yet.

And one more thing is to adjust nsis scripts where relevant, such as defaulting to Program Files (Arm) and adding the vcredist_arm64.exe file.

Copy link
Contributor

@jhogberg jhogberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I've got a few questions in the comments below :)

erts/emulator/sys/win32/erl_win_sys.h Outdated Show resolved Hide resolved
erts/etc/win32/wsl_tools/vc/cc.sh Outdated Show resolved Hide resolved
lib/common_test/configure Show resolved Hide resolved
elif [ X"$CONFIG_SUBTYPE" = X"arm64" ]; then
GCC="aarch64-w64-mingw32-gcc -m64"
elif [ X"$CONFIG_SUBTYPE" = X"x64_arm64" ]; then
GCC="aarch64-w64-mingw32-clang -m64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time following what x64_arm64 is supposed to mean and how it differs from plain arm64, and an even harder time seeing why one uses gcc and the other clang. Can you elaborate a bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jhogberg, x64_arm64 is used when cross-compiling to arm64 windows on an x86_64 machine because we need to call vcvarsall.bat in erts/etc/win32/wsl_tools/SetupWSLcross.bat with different parameters to get right msvc toolchain, for example,

  • call vcvarsall.bat arm64 if compile natively
  • call vcvarsall.bat amd64_arm64 if cross-compile on x86_64 machines

And I think that it's somewhat awkward naming on Windows because win32 and win64 traditionally only stands for 32-bit and 64-bit respectively, but they actually means x86 and x86_64. I wonder if we should standardise these naming while providing "aliases" so that build script won't break. I'm think that adding a universal entry point for setting up environment variables:

./otp_build env_win [<BUILD_ARCH>-]<TARGET_ARCH>

where BUILD_ARCH and TARGET_ARCH each can be one of x86, x86_64 and aarch64.

For example,

x86 target

For compiling natively on x86 windows, env_win32 should be the alias of env_win x86 (or vice-versa). The following commands should behave the same

  • ./otp_build env_win x86, universal entry point
  • ./otp_build env_win32, alias for backward compatibility

For cross-compiling, we will have

  • ./otp_build env_win x86_64-x86, universal entry point
  • ./otp_build env_win aarch64-x86, universal entry point

x86_64 target

Similarly, the following commands should behave the same

  • ./otp_build env_win x86_64, universal entry point
  • ./otp_build env_win32 x64, alias for backward compatibility
  • ./otp_build env_win32 x64, alias for backward compatibility
  • ./otp_build env_win64 x64, alias for backward compatibility

For cross-compiling, we will have

  • ./otp_build env_win x86-x86_64, universal entry point
  • ./otp_build env_win aarch64-x86_64, universal entry point

aarch64 target

For native compiling,

  • ./otp_build env_win aarch64, universal entry point

Maybe we can have these aliases which would follow the convention:

  • ./otp_build env_win32 x86-aarch64
  • ./otp_build env_win32 x64-aarch64
  • ./otp_build env_win64 x86-aarch64
  • ./otp_build env_win64 x64-aarch64

And for cross-compiling, we will have

  • ./otp_build env_win x86-aarch64 , universal entry point
  • ./otp_build env_win x86_64-aarch64, universal entry point

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why using clang for cross-compiling is that aarch64-w64-mingw32-gcc seems to crash the windows machine😅 I'll double verify this on another x86_64 windows machine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What GCC version is aarch64-w64-mingw32-gcc? There's a known bug in GCC 7 (maybe 8 or 9 too?) where it segfaults while compiling the JIT. If so, I don't think we want this workaround.

As for the other stuff, I think @dgud is more equipped to answer than me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What GCC version is aarch64-w64-mingw32-gcc? There's a known bug in GCC 7 (maybe 8 or 9 too?) where it segfaults while compiling the JIT. If so, I don't think we want this workaround.

It's actually LLVM with a wrapper script, https://github.com/mstorsjo/llvm-mingw/releases. I tested aarch64-w64-mingw32-gcc on a physical x86_64 machine, and it seems to be okay now (as I'm not able to do a full build -- there're still some issues when cross-compiling).

I think the reason was that I was testing on a virtual machine, and perhaps there were some bugs in the virtualization layer/software.

erts/etc/win32/wsl_tools/reg_query.sh Outdated Show resolved Hide resolved
@jhogberg jhogberg self-assigned this Feb 19, 2024
@jhogberg jhogberg added the team:VM Assigned to OTP team VM label Feb 19, 2024
@jhogberg jhogberg added the team:PS Assigned to OTP team PS label Feb 19, 2024
@cocoa-xu cocoa-xu force-pushed the cx/windows-arm64 branch 7 times, most recently from b590118 to becdd6d Compare February 19, 2024 13:20
erts/etc/win32/wsl_tools/vc/cc.sh Show resolved Hide resolved
make/autoconf/otp.m4 Outdated Show resolved Hide resolved
otp_build Outdated Show resolved Hide resolved
updated installation guide build with native-ethr-impls enable

updated installation guide
arm64 jit now works on windows
fix cross-compile on x86_64 for aarch64
@cocoa-xu
Copy link
Contributor Author

so that I can get it to install Erlang/OTP in C:\Program Files (Arm) by default.

Program Files (Arm) is for Arm32 applications; 64-bit applications should always be installed in Program Files.

ref: https://learn.microsoft.com/en-us/windows/arm/faq#what-program-files-folder-should-i-use-to-install-my-arm64-application-

Oh I didn't know that before. Thanks for the information! Gonna update the nsis script file!

erts/etc/win32/nsis/erlang20.nsi Outdated Show resolved Hide resolved
erts/etc/win32/nsis/find_redist.sh Outdated Show resolved Hide resolved
cocoa-xu and others added 2 commits February 20, 2024 15:32
Co-authored-by: Dan Gudmundsson <dangud@gmail.com>
@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 20, 2024

And here we have the test results. I'm not sure which of these files should I archive and upload here (and also because the test result directory is about ~1.5 GB, and 259MB after compressing), so I only did a screenshot for now. Please let me know if you need me to upload the test results.

tests results below is outdated

![IMG_6891CE8947DB-1](https://github.com/erlang/otp/assets/89497197/0859f782-a722-4059-b321-fc9fe964c5f4)

@jhogberg
Copy link
Contributor

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 20, 2024

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

Hi, I decided to upload the test result here, https://github.com/cocoa-xu/otp-build/releases/download/v27.0-rc1/test_server-otp_win_arm64_27.0-rc1.tar.xz, and using xz reduces its size to 75MB. It will be easier to check all relevant logs, as well as for other people to look and maybe give any feedbacks.

It corresponds to this installer, https://github.com/cocoa-xu/otp-build/releases/download/v27.0-rc1/otp_win_arm64_27.0-rc1.exe.

Also please let me know if any tests need a re-run, and I'll update the test results. :)

@cocoa-xu
Copy link
Contributor Author

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

As for the ones that cause crashes, they're bs_construct_SUITE:error_info/1 and logger_disk_log_h_SUITE:disk_log_full/1

bs_construct_SUITE:error_info/1

=== Test case: [bs_construct_SUITE:error_info/1](file:///Users/cocoa/Downloads/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-20_06.40.17/tests.emulator_test.logs/run.2024-02-20_06.40.46/bs_construct_suite.src.html#error_info-1) (click for source code)

=== Config value:

    [{watchdog,<0.115844.16>},
     {tc_logfile,"c:/src/release/tests/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-20_06.40.17/tests.emulator_test.logs/run.2024-02-20_06.40.46/bs_construct_suite.error_info.html"},
     {tc_group_properties,[]},
     {tc_group_path,[]},
     {data_dir,"c:/src/release/tests/emulator_test/bs_construct_SUITE_data/"},
     {priv_dir,"c:/src/release/tests/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-20_06.40.17/tests.emulator_test.logs/run.2024-02-20_06.40.46/log_private/"},
     {nodenames,[]}]

=== Current directory is "c:/src/release/tests/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-20_06.40.17"

=== Started at 2024-02-20 06:45:21



<< Atom / binary , Binary / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< Binary / binary , Atom / binary >>: segment 2 of type 'binary': expected a binary but got: some_atom

<< 1 : 32 , Binary / binary , Atom / binary >>: segment 3 of type 'binary': expected a binary but got: some_atom

<< 1 : 32 , "xyz" , Binary / binary , Atom / binary >>: segment 4 of type 'binary': expected a binary but got: some_atom

<< Atom : 32 >>: segment 1 of type 'integer': expected an integer but got: some_atom

<< Atom : ( id ( 32 ) ) >>: segment 1 of type 'integer': expected an integer but got: some_atom

<< LongList : 32 >>: segment 1 of type 'integer': expected an integer but got: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19|...]

<< 42 : Atom >>: segment 1 of type 'integer': expected a non-negative integer as size but got: some_atom

<< Atom : 32 >>: segment 1 of type 'integer': expected an integer but got: some_atom

<< 42 : NegSize >>: segment 1 of type 'integer': expected a non-negative integer as size but got: -1

<< 42 : HugeNegSize >>: segment 1 of type 'integer': expected a non-negative integer as size but got: -18446744073709551616

<< 42 : ( 1 bsl 58 ) / unit : 255 >>: segment 1 of type 'integer': the size 288230376151711744 is too large

<< 42 : ( 1 bsl 60 ) / unit : 8 >>: segment 1 of type 'integer': the size 1152921504606846976 is too large

<< 42 : ( 1 bsl 64 ) >>: segment 1 of type 'integer': the size 18446744073709551616 is too large

<< Atom : 10 / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< Atom : ( id ( 10 ) ) / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< Binary : Atom / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: some_atom

<< Binary : NegSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: -1

<< Binary : HugeNegSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: -18446744073709551616

<< Binary : BadSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: bad_size

<< BadBinary : BadSize / binary >>: segment 1 of type 'binary': expected a non-negative integer as size but got: bad_size

<< Binary : 10 / binary >>: segment 1 of type 'binary': the value <<"abc">> is shorter than the size of the segment

<< Binary : ( id ( 10 ) ) / binary >>: segment 1 of type 'binary': the value <<"abc">> is shorter than the size of the segment

<< Atom / binary >>: segment 1 of type 'binary': expected a binary but got: some_atom

<< ( id ( << 1 : 1 >> ) ) / binary >>: segment 1 of type 'binary': the size of the value <<1:1>> is not a multiple of the unit for the segment

<< ( id ( << 0 : 1111 >> ) ) / binary >>: segment 1 of type 'binary': the size of the value <<0,0,0,0,0,0,0,0...,0,0:7>> is not a multiple of the unit for the segment

<< 0 , ( id ( << 1 : 1 >> ) ) / binary >>: segment 2 of type 'binary': the size of the value <<1:1>> is not a multiple of the unit for the segment

<< 0 , ( id ( << 0 : 1111 >> ) ) / binary >>: segment 2 of type 'binary': the size of the value <<0,0,0,0,0,0,0,0...,0,0:7>> is not a multiple of the unit for the segment

<< Binary : ( 1 bsl 64 ) / binary >>: segment 1 of type 'binary': the size 18446744073709551616 is too large

<< Binary : ( id ( 1 bsl 64 ) ) / binary >>: segment 1 of type 'binary': the size 18446744073709551616 is too large

<< Atom : 64 / float >>: segment 1 of type 'float': expected a float or an integer but got: some_atom

<< Atom : Atom / float >>: segment 1 of type 'float': expected a non-negative integer as size but got: some_atom

<< 42.0 : NegSize / float >>: segment 1 of type 'float': expected a non-negative integer as size but got: -1

<< 42.0 : HugeNegSize / float >>: segment 1 of type 'float': expected a non-negative integer as size but got: -18446744073709551616

<< 42.0 : ( id ( 1 ) ) / float >>: segment 1 of type 'float': expected one of the supported sizes 16, 32, or 64 but got: 1

<< HugeBig : ( id ( 64 ) ) / float >>: segment 1 of type 'float': the value 350746621104...713208549376 is outside the range expressible as a float

<< HugeBig : 64 / float >>: segment 1 of type 'float': the value 350746621104...713208549376 is outside the range expressible as a float

<< 42.0 : ( id ( 1 bsl 64 ) ) / float >>: segment 1 of type 'float': the size 18446744073709551616 is too large

<< Atom / utf8 >>: segment 1 of type 'utf8': expected a non-negative integer encodable as utf8 but got: some_atom

<< Atom / utf16 >>: segment 1 of type 'utf16': expected a non-negative integer encodable as utf16 but got: some_atom

<< Atom / utf32 >>: segment 1 of type 'utf32': expected a non-negative integer encodable as utf32 but got: some_atom

<< 0 : ( MaxSmall ) / unit : 32 , 0 : 64 >>: segment 1 of type 'integer': the size 576460752303423487 is too large

<< 0 : ( MaxSmall ) / unit : 32 , ( id ( 0 ) ) : 64 >>: segment 1 of type 'integer': the size 576460752303423487 is too large

logger_disk_log_h_SUITE:disk_log_full/1

*** System report during logger_disk_log_h_SUITE:disk_log_full/1 in tty 2024-02-20 07:35:45.575 ***[🔗](file:///Users/cocoa/Downloads/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-20_06.45.23/tests.kernel_test.logs/run.2024-02-20_06.45.55/logger_disk_log_h_suite.disk_log_full.html#e-1363)
=ERROR REPORT==== 20-Feb-2024::07:35:45.575000 ===
** Generic server <0.106479.0> terminating 
** Last message in was {disk_log,'test_server@COCOA-ARM64-PC',
                                 logger_disk_log_h_SUITE,full}
** When Server state == #{id => logger_disk_log_h_logger_disk_log_h_SUITE,
                          module => logger_h_common,mode => async,
                          idle => false,burst_limit_enable => true,
                          drop_mode_qlen => 200,flush_qlen => 1000,
                          sync_mode_qlen => 10,burst_msg_count => 9,
                          burst_win_ts => -24019194990094072,
                          cb_state =>
                              #{id => logger_disk_log_h_SUITE,
                                module => logger_disk_log_h,
                                filesync_repeat_interval => 5000,
                                ctrl_sync_count => 11,
                                handler_state =>
                                    #{log_opts =>
                                          #{type => halt,
                                            file =>
                                                "c:/src/release/tests/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-20_06.45.23/tests.kernel_test.logs/run.2024-02-20_06.45.55/log_private/disk_log_full",
                                            max_no_bytes => 50,
                                            max_no_files => 1},
                                      prev_log_result => ok,
                                      prev_sync_result => ok,
                                      prev_disk_log_info => undefined},
                                last_op => write,
                                rep_sync_tref =>
                                    #Ref<0.3961621595.2628255746.191094>},
                          last_load_ts => -24019194990093602,last_qlen => 0,
                          mode_ref =>
                              {logger_olp,
                                  logger_disk_log_h_logger_disk_log_h_SUITE},
                          overload_kill_restart_after => 5000,
                          burst_limit_max_count => 500,
                          burst_limit_window_time => 1000,
                          overload_kill_enable => false,
                          overload_kill_mem_size => 3000000,
                          overload_kill_qlen => 20000}
** Reason for termination ==
** {badarg,[{erlang,display_string,
                    ["Logger - error: "],
                    [{file,"erlang.erl"},
                     {line,2139},
                     {error_info,#{cause => 'The handle is invalid.\r\n',
                                   module => erl_erts_errors}}]},
            {logger,internal_log,2,[{file,"logger.erl"},{line,726}]},
            {logger_disk_log_h,error_notify_new,4,
                               [{file,"logger_disk_log_h.erl"},{line,365}]},
            {logger_h_common,handle_info,2,
                             [{file,"logger_h_common.erl"},{line,310}]},
            {logger_olp,try_callback_call,4,
                        [{file,"logger_olp.erl"},{line,617}]},
            {logger_olp,handle_info,2,[{file,"logger_olp.erl"},{line,271}]},
            {gen_server,try_handle_info,3,
                        [{file,"gen_server.erl"},{line,2158}]},
            {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,2246}]}]}

@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 20, 2024

It seems that the emulator and kernel tests crashed hard, can you see which tests it crashed on?

new tests results are available in the reply below

outdated test results Updates for the `emulator` tests. I looked and removed the lines that cause hard-crashes, and now the test results is
2316 Ok, 12 Failed, 208 Skipped of 2536

hard-crashes

common

In

  • bs_construct_SUITE:error_info
  • bs_construct_no_opt_SUITE:error_info
  • bs_construct_r25_SUITE:error_info
  • bs_construct_stripped_types_SUITE:error_info
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,(id(0)):64>>),

bs_construct_no_opt_SUITE:error_info

{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,(id(0)):64>>),

failed

  • bif_SUITE:display_string
  • bs_construct_SUITE:dynamic
  • bs_construct_SUITE:error_info
  • bs_construct_no_opt_SUITE:error_info
  • bs_construct_r25_SUITE:dynamic
  • bs_construct_r25_SUITE:error_info
  • bs_construct_stripped_types_SUITE:error_info
  • hello_SUITE:hello
  • port_SUITE: pipe_limit_env
  • port_SUITE: unregister_name
  • smoke_test_SUITE:jump_table
  • time_SUITE:time_unit_conversion

failed to init

mtx_SUITE:init_per_suite
nif_SUITE init_per_suite

Re-running tests in wsl seems to make them work, will update test results in a few hours.

@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 21, 2024

Updates for all tests.

I did a clean compile and removed the lines that cause hard-crashes in emulator and kernel.

For the emulator tests, the lines I removed were:

{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(id(MaxSmall))/unit:32,(id(0)):64>>),

In these test cases:

  • bs_construct_SUITE:error_info
  • bs_construct_no_opt_SUITE:error_info
  • bs_construct_r25_SUITE:error_info
  • bs_construct_stripped_types_SUITE:error_info

In addition to the lines above, I also have to remove these two lines in bs_construct_no_opt_SUITE:error_info

{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,0:64>>),
{system_limit, {1,integer,size,MaxSmall}, _} = ?ERROR_INFO(<<0:(MaxSmall)/unit:32,(id(0)):64>>),

For the kernel tests, I need to change these two tests to

disk_log_full(Config) ->
    ok.
% ...
disk_log_events(_Config) ->
    ok.

After the above changes, I ran the tests in WSL (which was done in Powershell before), and the latest test results are:

test_results

Also, sorry that I'm not sure how I should properly remove/truncate successful tests to save space, so I still have to upload the full test results as a GitHub release file here, as the attachment file cannot exceed 25MBs in PR comments.

@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 21, 2024

And I was wondering if and which ones of these failed test cases should we address in this PR? Perhaps the ones in the emulator, kernel and system?

@@ -6,6 +6,7 @@
"installation_guide/INSTALL.md": [source: "../../HOWTO/INSTALL.md"],
"installation_guide/INSTALL-CROSS.md": [source: "../../HOWTO/INSTALL-CROSS.md"],
"installation_guide/INSTALL-WIN32.md": [source: "../../HOWTO/INSTALL-WIN32.md"],
"installation_guide/INSTALL-WIN32-ARM64.md": [source: "../../HOWTO/INSTALL-WIN32-ARM64.md"],
Copy link
Contributor Author

@cocoa-xu cocoa-xu Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was causing an issue when testing documentation in CI (although the file should be there?), should I remove this line for now?

@jhogberg
Copy link
Contributor

And I was wondering if and which ones of these failed test cases should we address in this PR? Perhaps the ones in the emulator, kernel and system?

The hard crashes must be fixed before we can include this in the nightly test runs, and the rest need to be investigated and fixed before we merge it. :-)

@cocoa-xu
Copy link
Contributor Author

I found that a few of these hard-crashed ones were related to erlang:display_string/{1,2} errored with bad argument. However, the call was unsuccessful because the fd passed to WriteFile is an invalid handle according to GetLastError.

#ifdef __WIN32__
    printf("[debug] we're in display_string_2 win32\r\n");
    HANDLE fd;
    if (ERTS_IS_ATOM_STR("stdout", BIF_ARG_1)) {
        fd = GetStdHandle(STD_OUTPUT_HANDLE);
        printf("[debug] and the BIF_ARG_1 is stdout, fd=%d\r\n", fd);
    } else if (ERTS_IS_ATOM_STR("stderr", BIF_ARG_1)) {
        fd = GetStdHandle(STD_ERROR_HANDLE);
        printf("[debug] and the BIF_ARG_1 is stderr, fd=%d\r\n", fd);
        if (fd == INVALID_HANDLE_VALUE) {
            printf("[debug] and fd == INVALID_HANDLE_VALUE\r\n");
        }
    }
#else
// ...
#endif

#ifdef __WIN32__
        Uint32 w;
        if (!WriteFile(fd, str, len, &w, NULL)) {
            DWORD last_error = GetLastError();
            LPTSTR error_text = NULL;
            FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_IGNORE_INSERTS, NULL, HRESULT_FROM_WIN32(last_error), MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPTSTR)&error_text, 0, NULL);
            if (error_text != NULL) {
                printf("[debug] error msg: %s\r\n", error_text);
                LocalFree(error_text);
            } else {
                printf("[debug] error msg: cannot get formatted error message\r\n");
            }
            goto error;
        }
        written = (Sint)w;
#else
// ...
#endif

The coresponding test case was changed to:

display_string(Config) when is_list(Config) ->
    # using `erlang:display` for comparison and check that we can display to console
    true = erlang:display("hej"),
    true = erlang:display_string("hej"),
    ok.

And I got this output when running this test case:

$ /mnt/c/src/release/aarch64-windows-msvc/bin/erl.exe -noinput -eval "ts:run(emulator, bif_SUITE, display_string, [batch]),erlang:halt()"
Command: erl -sname test_server -rsh ssh -env PATH "c:\src\release\tests\emulator_test;c:\src\release\AARCH6~1\ERTS-1~1.2\bin;c:\Program Files (Arm)\MICROS~1\JDK-16~1.7-H\bin;c:\PROGRA~1\PARALL~1\PARALL~1\APPLIC~1;c:\Windows\System32;c:\Windows;c:\Windows\System32\wbem;c:\Windows\System32\WindowsPowerShell\v1.0;c:\Windows\System32\OpenSSH;c:\PROGRA~1\Git\cmd;c:\PROGRA~3\CHOCOL~1\bin;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1\Scripts;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1;c:\Users\cocoa\AppData\Local\MICROS~1\WINDOW~1;c:\Users\cocoa\AppData\Local\Programs\MICROS~1\bin;c:\PERL_A~1\bin;c:\Users\cocoa\AppData\Local\GITHUB~1\bin;c:\ERLANG~2\bin;c:\Users\cocoa\ELIXIR~1.1\bin;c:\PROGRA~1\MICROS~1\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\HOSTAR~1\arm64;c:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\arm64;c:\Program Files (Arm)\MICROS~1\JDK-16~1.7-H\bin;c:\PROGRA~1\PARALL~1\PARALL~1\APPLIC~1;c:\Windows\System32;c:\Windows;c:\Windows\System32\wbem;c:\Windows\System32\WindowsPowerShell\v1.0;c:\Windows\System32\OpenSSH;c:\PROGRA~1\Git\cmd;c:\PROGRA~3\CHOCOL~1\bin;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1\Scripts;c:\Users\cocoa\AppData\Local\Programs\Python\PYTHON~1;c:\Users\cocoa\AppData\Local\MICROS~1\WINDOW~1;c:\Users\cocoa\AppData\Local\Programs\MICROS~1\bin;c:\PERL_A~1\bin;c:\Users\cocoa\AppData\Local\GITHUB~1\bin;c:\ERLANG~2\bin;c:\Users\cocoa\ELIXIR~1.1\bin" -env ERL_CRASH_DUMP c:/src/release/tests/test_server/emulator_erl_crash.dump -boot start_sasl -sasl errlog_type error -pz "c:/src/release/tests/test_server" -enable-feature maybe_expr -ct_test_vars "{net_dir,\"\"}" -eval "ts_run:ct_run_test(\"c:/src/release/tests/emulator_test\", [{abort_if_missing_suites,true},{suite,bif_SUITE},{testcase,display_string},{logdir,\"../test_server\"},{config,[\"../test_server/ts.config\",\"../test_server/ts.win32.config\"]},{vars,[{verbose,0}]},batch,{scale_timetraps,true}])"  -noinput -eval "erlang:halt(0,[{flush,false}])."

Common Test starting (cwd is c:/src/release/tests/emulator_test)


Common Test: Running make in test directories...
Recompile: bif_SUITE
Recompile: gen_tcp_dist
Recompile: ignore_cores

CWD set to: "c:/src/release/tests/test_server/ct_run.test_server@COCOA-ARM64-PC.2024-02-21_22.31.26"

Testing tests.emulator_test.bif_SUITE.display_string: Starting test, 1 test cases
"hej"
[debug] we're in display_string_2 win32
[debug] and the BIF_ARG_1 is stderr, fd=96
[debug] WriteFile went wrong
[debug] error msg: The handle is invalid.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
erlang:display_string failed on line 2139
Reason: {badarg,[{erlang,...},{...}|...]}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Testing tests.emulator_test.bif_SUITE.display_string: *** FAILED test case 1 of 1 ***
Testing tests.emulator_test.bif_SUITE.display_string: TEST COMPLETE, 0 ok, 1 failed of 1 test cases

Updating c:/src/release/tests/test_server/index.html ... done
Updating c:/src/release/tests/test_server/all_runs.html ... done

And I also attempted to use _fileno and _write instead of GetStdHandle and WriteFile but it ended up pretty the same:

#ifdef __WIN32__
    int fd;
    if (ERTS_IS_ATOM_STR("stdout", BIF_ARG_1)) {
        fd = _fileno(stdout);
    } else if (ERTS_IS_ATOM_STR("stderr", BIF_ARG_1)) {
        fd = _fileno(stderr);
    }
#else
// ...
#endif

#ifdef __WIN32__
        written = 0;
        do {
            res = _write(fd, str+written, len-written);
            if (res < 0 && errno != ERRNO_BLOCK && errno != EINTR) {
                printf("[debug] _write went wrong, errno=%d\r\n", errno);
                goto error;
            }
            written += res;
        } while (written < len);
#else
// ...
#endif

The test outputted:

Testing tests.emulator_test.bif_SUITE.display_string: Starting test, 1 test cases
"hej"
[debug] we're in display_string_2 win32
[debug] and the BIF_ARG_1 is stderr, fd=96
[debug] _write went wrong, errno=9

Yet this function seems to be fine when using erl either in PowerShell or WSL:

$ /mnt/c/src/release/aarch64-windows-msvc/bin/erl.exe
Erlang/OTP 27 [RELEASE CANDIDATE 1] [erts-14.2.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Eshell V14.2.2 (press Ctrl+G to abort, type help(). for help)
1> erlang:display_string("hej").
hejtrue

I wonder if we should skip this test for ARM64 Windows or using some alternatives?

@cocoa-xu
Copy link
Contributor Author

I wonder if we should skip this test for ARM64 Windows or using some alternatives?

And for this particular function, erlang:system_info(system_architecture) only returns win32 on all windows systems. Should we consider to change it to aarch64-pc-windows by changing these lines of the echo_env_wsl function in otp_build?

    if [ X"$X64" = X"true" ]; then
        echo_setenv OVERRIDE_TARGET win32 ';'
        echo_setenv CONFIG_SUBTYPE win64 ';'
    elif [ X"$ARM64" = X"arm64" -o X"$ARM64" = X"x64_arm64" ]; then
        echo_setenv OVERRIDE_TARGET aarch64-pc-windows ';'
        echo_setenv CONFIG_SUBTYPE "$ARM64" ';'
	else
        echo_setenv OVERRIDE_TARGET win32 ';'
    fi

@dgud
Copy link
Contributor

dgud commented Feb 21, 2024

Isn't win32 used as the name of the used API and we still use that on arm for windows?

@cocoa-xu
Copy link
Contributor Author

Isn't win32 used as the name of the used API and we still use that on arm for windows?

Oh it's probably just a minor thing. I was trying to find a way to detect the CPU architecture using

erlang:system_info(system_architecture).

For other systems, the result can usually be used to check the CPU architecture, for example, x86_64-pc-linux-gnu or aarch64-apple-darwin23.2.0. But for Windows we have to do another check using

os:getenv("PROCESSOR_ARCHITECTURE").

Besides that, it also doesn't follow the official document:

system_architecture
  Returns a string containing the processor and OS architecture the emulator is built for.

Maybe it's too late and would definitely break a lot of things for x86 and x86_64 Windows, but perhaps we can do it for ARM64 Windows. WDYT?

@cocoa-xu
Copy link
Contributor Author

I still can't figure out why Windows think the file handle we get from GetStdHandle is invalid, but I found a workaround, 2d79abb, and it fixed all failed test cases due to erlang:display_string/2, including:

emulator

  • bif_SUITE/display_string
  • hello_SUITE/hello

kernel

  • logger_disk_log_h_SUITE/mem_kill_new
  • logger_disk_log_h_SUITE/restart_after

I'll upload a screenshot for the emulator and kernel tests shortly.

@cocoa-xu
Copy link
Contributor Author

cocoa-xu commented Feb 24, 2024

Some explanation for commit 892019a

During my tests, I found that high in erts_time_unit_conversion can overflow and cause the test case time_unit_conversion in time_SUITE to fail. For example, on my ARM64 Windows machine when initialising erts_time_sup__.r.o.start_offset.nsec in erts_init_time_sup,

    erts_time_sup__.r.o.start_offset.nsec = (ErtsMonotonicTime)
	erts_time_unit_conversion((Uint64) abs_native_offset,
				  (Uint32) ERTS_MONOTONIC_TIME_UNIT,
				  (Uint32) 1000*1000*1000);

The value of ERTS_MONOTONIC_TIME_UNIT on my machine is 24000000, and abs_native_offset is 576460752312000000. And high will overflow in erts_time_unit_conversion, as the minimal example shown below:

#include <stdio.h>
#include <stdint.h>

uint64_t
erts_time_unit_conversion(uint64_t value,
	uint32_t from_time_unit,
	uint32_t to_time_unit)
{
	uint64_t high, low, result;
	if (value <= ~((uint64_t) 0)/to_time_unit)
		return (value*to_time_unit)/from_time_unit;

	low = value & ((uint64_t) 0xffffffff);
	high = (value >> 32) & ((uint64_t) 0xffffffff);
	
	low *= to_time_unit;
	high *= to_time_unit;
	
	high += (low >> 32) & ((uint64_t) 0xffffffff);
	low &= ((uint64_t) 0xffffffff);
	
	result = high % from_time_unit;
	high /= from_time_unit;
	printf("[!] high=%llu\n", high);
	printf("[!] high will overflow=%s\n",  (high >> 32) != 0 ? "yes" : "no");
	high <<= 32;
	printf("[!] high=%llu, result=%llu\n", high, result);
	
	result <<= 32;
	result += low;
	result /= from_time_unit;
	result += high;
	
	return result;
}

int main(int argc, char *argv[]) {
	printf("%lld\n\n", erts_time_unit_conversion(576460752312000000, 24000000, 1000*1000*1000));
	printf("%lld\n", erts_time_unit_conversion(576460752312000000, 24000000, 1000*1000));
}

The output of the above minimal code is

[!] high=5592405333
[!] high will overflow=yes
[!] high=5572453937501437952, result=9996874
5572453939290448384

[!] high=5592405
[!] high will overflow=no
[!] high=24019196580986880, result=8001996
24019198013000000

The correct value of the final result should be 24019198012999999488 yet it's too large even for uint64_t to hold it. So practically, there's no easy way to get around with it without changing the type of ErtsMonotonicTime.

And this overflow will cause all subsequent calculations depending on ERTS_MONOTONIC_OFFSET_NSEC to give wrong values. One particular case is that when the user wants to use nano seconds as the time unit in erlang:monotonic_time/1.

As for the fix, I think one possible (and perhaps the easiest and the most striaghtforwad) way is that we can simply do the calculations in erlang code. I tried to detect if there's any overflow in the C code and if so, we handle it in erlang code. But after I've done that, I feel that it just adds some unnecessary complexity to the C code.

I hope all this would make sense, but I'm happy to make any other changes to the C code if there's a better solution for it.

@cocoa-xu
Copy link
Contributor Author

I was wondering if I should try to use __int128_t for ErtsMonotonicTime (if __int128_t is available) or just leave it for now?

@IngelaAndin IngelaAndin removed the team:PS Assigned to OTP team PS label Mar 12, 2024
@garazdawi garazdawi mentioned this pull request May 8, 2024
@cocoa-xu
Copy link
Contributor Author

Hi all! As this is kinda stale, I wonder if it would be easier for us if I break this PR into smaller PRs? I can start with PRs that definitely won't break any existing things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants