Timeouts on ARM64 Linux Cavium #52589

Closed
rmacnak-google opened this issue Jun 1, 2023 · 12 comments
Labels
area-infrastructure Use area-infrastructure for SDK infrastructure issues, like continuous integration bot changes.

Comments

@rmacnak-google
Contributor

There are new test failures on [test] Don't apply simulator timeout multipler to ARM hardware..

The tests

service/add_breakpoint_rpc_kernel_test/dds Timeout (expected Pass)
service/add_breakpoint_rpc_kernel_test/service Timeout (expected Pass)
service/allocations_test/dds Timeout (expected Pass)
service/allocations_test/service Timeout (expected Pass)
service/async_next_regression_18877_test/dds Timeout (expected Pass)
service/async_next_regression_18877_test/service Timeout (expected Pass)
service/async_next_test/dds Timeout (expected Pass)
service/async_next_test/service Timeout (expected Pass)
service/async_scope_test/dds Timeout (expected Pass)
service/async_scope_test/service Timeout (expected Pass)
service/async_single_step_exception_test/dds Timeout (expected Pass)
service/async_single_step_exception_test/service Timeout (expected Pass)
service/async_single_step_into_test/dds Timeout (expected Pass)
service/async_single_step_into_test/service Timeout (expected Pass)
service/async_single_step_out_test/dds Timeout (expected Pass)
service/async_single_step_out_test/service Timeout (expected Pass)
service/async_star_single_step_into_test/dds Timeout (expected Pass)
service/async_star_single_step_into_test/service Timeout (expected Pass)
service/async_star_step_out_test/service Timeout (expected Pass)
service/async_step_out_test/dds Timeout (expected Pass)
service/async_step_out_test/service Timeout (expected Pass)
service/auth_token_test/dds Timeout (expected Pass)
service/auth_token_test/service Timeout (expected Pass)
service/awaiter_async_stack_contents_2_test/dds Timeout (expected Pass)
service/awaiter_async_stack_contents_2_test/service Timeout (expected Pass)
service/awaiter_async_stack_contents_test/dds Timeout (expected Pass)
service/awaiter_async_stack_contents_test/service Timeout (expected Pass)
service/bad_reload_test/dds Timeout (expected Pass)
    and 517 more tests

are failing on configurations

vm-linux-release-arm64
@rmacnak-google rmacnak-google added the area-infrastructure Use area-infrastructure for SDK infrastructure issues, like continuous integration bot changes. label Jun 1, 2023
@rmacnak-google
Contributor Author

Seems like another reason to migrate to GCE instances

@alexmarkov
Contributor

This started with 87362c6, which greatly decreased timeouts on arm and arm64.
@rmacnak-google Does it make sense to revert that change until the infra is improved?

@rmacnak-google
Contributor Author

I would give up service tests on Linux ARM64 to get testing at all going on Windows ARM64.

copybara-service bot pushed a commit that referenced this issue Jun 8, 2023
Several tests seem to be just at the current timeout threshold and flaking between timeout and their normal outcome.

Cf. 87362c6

Bug: #52589
Change-Id: I4e89aa71618c51a9ca1d38e8d03fffbcd919a744
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/307972
Reviewed-by: Alexander Thomas <athom@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
@dcharkes
Contributor

I'm reapproving:

vm-aot-linux-release-arm64
flaky -> Timeout (expected Pass)

  • service/client_resume_approvals_approve_then_disconnect_test/service
  • service/client_resume_approvals_identical_names_test/dds
  • service/dominator_tree_vm_test/dds_1
  • service/dominator_tree_vm_test/dds_2
  • service/dominator_tree_vm_with_double_field_test/dds_1
  • service/dominator_tree_vm_with_double_field_test/service_0
  • service/dominator_tree_vm_with_double_field_test/service_1
  • service/dominator_tree_vm_with_double_field_test/service_2
  • service/external_service_synchronous_invocation_test/service
  • service/gc_test/dds
  • service/gc_test/service
  • service/get_object_store_rpc_test/dds
  • service/http_get_vm_rpc_test/dds
  • service/http_get_vm_rpc_test/service
  • service/kill_running_test/dds
  • service/kill_running_test/service
  • service/regress_45684_test/service
  • service/set_vm_name_rpc_test/dds
  • service/set_vm_name_rpc_test/service
  • service/string_optimized_out_is_not_sentinel_test/service
  • service/get_isolate_after_stack_overflow_error_test/service
  • service/get_user_level_retaining_path_rpc_test/dds
  • service/get_user_level_retaining_path_rpc_test/service
  • service/typed_data_test/dds

@sstrickl
Contributor

Reapproving flaky -> Timeout (expected Pass) on vm-aot-linux-release-arm64 for:

  • service/contexts_test/service

@dcharkes
Contributor

dcharkes commented Jun 26, 2023

Reapproved:

✔ service/dds_disconnects_existing_clients_test/dds
✔ service/dev_fs_weird_char_test/service
✔ service/get_flag_list_rpc_test/dds
✔ service/get_instances_rpc_test/service
✔ service/get_isolate_rpc_test/dds
✔ service/get_supported_protocols_test/dds
✔ service/get_supported_protocols_test/service
✔ service/http_get_isolate_group_rpc_test/dds
✔ service/http_get_isolate_group_rpc_test/service
✔ service/inbound_references_test/dds
✔ service/object_graph_isolate_group_test/service
✔ service/pause_on_unhandled_exceptions_catcherror_test/dds
✔ service/pause_on_unhandled_exceptions_catcherror_test/service
✔ service/reachable_size_test/dds
✔ service/regexp_function_test/dds_0
✔ service/regexp_function_test/service_0
✔ service/regexp_function_test/service_1
✔ service/string_escaping_test/dds
✔ service/string_escaping_test/service
✔ service/vm_service_dds_test/dds

And also

✔ service/collect_all_garbage_test/dds
✔ service/collect_all_garbage_test/service
✔ service/dominator_tree_vm_test/service_2

This is getting rather labor intensive.

> I would give up service tests on Linux ARM64 to get testing at all going on Windows ARM64.

@bkonyi thoughts?

@sstrickl
Contributor

sstrickl commented Jul 3, 2023

Reapproved:

  • service/dev_fs_http_put_weird_char_test/service
  • service/dominator_tree_vm_test/dds_1
  • service/dominator_tree_vm_with_double_field_test/dds_0
  • service/dominator_tree_vm_with_double_field_test/service_0
  • service/dominator_tree_vm_with_double_field_test/service_1
  • service/dominator_tree_vm_with_double_field_test/service_2
  • service/external_service_registration_test/service
  • service/get_client_name_rpc_test/service
  • service/get_isolate_after_stack_overflow_error_test/service
  • service/http_enable_timeline_logging_service_test/service
  • service/kill_running_test/dds
  • service/reachable_size_test/service
  • service/regress_45684_test/dds
  • service/string_optimized_out_is_not_sentinel_test/dds
  • service/uri_mappings_lookup_test/service

@sstrickl
Contributor

sstrickl commented Jul 7, 2023

Reapproving:

  • service/custom_stream_listen_test/service
  • service/dds_custom_stream_listen_test/dds
  • service/get_isolate_after_stack_overflow_error_test/dds
  • service/get_ports_rpc_test/dds
  • service/get_ports_rpc_test/service
  • service/get_retaining_path_rpc_test/dds
  • service/object_graph_isolate_group_test/dds
  • service/object_graph_isolate_group_test/service
  • service/observatory_assets_test/dds
  • service/process_service_test/dds
  • service/vm_service_dds_test/service
  • service/vm_timeline_events_test/service
  • vm/dart/use_add_readonly_data_symbols_flag_test

copybara-service bot pushed a commit that referenced this issue Jul 10, 2023
The timeout multiplier for linux-arm64 was originally 4, then lowered to
1 (see [0]), then increased again to 2 (see [1]).

It seems that service tests are still flakily timing out, though, so
let's try restoring the original multiplier.

[0] https://dart-review.googlesource.com/c/sdk/+/306662
[1] https://dart-review.googlesource.com/c/sdk/+/307972

Also special case `ia32` in timeout calculations due to not using
an AppJIT trained `kernel-isolate` snapshot and therefore being
very slow, especially in ia32-debug mode.

Issue #52589
TEST=ci

Change-Id: Iab8c768866aec9e77bb83c7a3242cc5de8fb4e2f
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/312905
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
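
For readers unfamiliar with how such a multiplier acts on a test's deadline, here is a minimal sketch of the calculation the commit message above describes. It is written in Python for illustration and is not the Dart test runner's actual code. The function names, the 60-second base timeout, and the ia32 and debug factors are all assumptions; only the linux-arm64 values (4, then 1, then 2, then back to 4) come from the commit message.

```python
# Illustrative sketch only. The names, the base timeout, and the ia32 and
# debug factors are hypothetical; only the arm64 multiplier values
# (4 -> 1 -> 2 -> 4) are taken from the commit message.

BASE_TIMEOUT_SECONDS = 60  # hypothetical base timeout per test

# linux-arm64 multiplier over time, as described in the commit message.
ARM64_MULTIPLIER_HISTORY = [4, 1, 2, 4]


def timeout_multiplier(arch: str, mode: str, arm64_multiplier: int = 4) -> int:
    """Return the factor applied to the base timeout for a configuration."""
    multiplier = 1
    if arch == "arm64":
        # Slow ARM64 hardware gets extra time.
        multiplier *= arm64_multiplier
    elif arch == "ia32":
        # Special case: ia32 is slow because it does not use an AppJIT
        # trained kernel-isolate snapshot. The factor 4 is an assumption.
        multiplier *= 4
    if mode == "debug":
        # Debug builds are slower still. The factor 2 is an assumption.
        multiplier *= 2
    return multiplier


def timeout_seconds(arch: str, mode: str, arm64_multiplier: int = 4) -> int:
    """Return the effective per-test timeout in seconds."""
    return BASE_TIMEOUT_SECONDS * timeout_multiplier(arch, mode, arm64_multiplier)


if __name__ == "__main__":
    for m in ARM64_MULTIPLIER_HISTORY:
        print(f"arm64 release, multiplier {m}: {timeout_seconds('arm64', 'release', m)}s")
```

Under these assumed numbers, dropping the multiplier from 4 to 1 cuts the deadline to a quarter, so a test that normally finishes just inside the old deadline would start timing out, and one that finishes near the new deadline would flake between Timeout and Pass.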
@liamappelbe
Contributor

liamappelbe commented Jul 10, 2023

@mkustermann I had to reapprove this timeout again, even though you just increased the timeout multiplier. Thoughts?

@mkustermann
Member

> @mkustermann I had to reapprove this timeout again, even though you just increased the timeout multiplier. Thoughts?

Here's the impact of my daa35fd, which restored the timeout multiplier for vm-linux-release-arm64 to its original value:

[Image: status]

=> I suspect the majority of the flaky timeouts are gone now for vm-linux-release-arm64.

@derekxu16
Member

reapproving service/process_service_test/service, service/set_library_debuggable_test/dds, and service/regress_28443_test/dds

https://ci.chromium.org/ui/p/dart/builders/ci.sandbox/vm-linux-release-arm64/732/overview

copybara-service bot pushed a commit that referenced this issue Aug 7, 2023
Bug: #52589
Bug: #53138
Change-Id: Iaf6e499d80726b5ec8a810a5d3bd1d90a0d93af9
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/318800
Commit-Queue: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
copybara-service bot pushed a commit that referenced this issue Oct 9, 2023
In particular, this un-skips the service tests.

TEST=ci
Bug: b/302156166
Bug: #26109
Bug: #27806
Bug: #33057
Bug: #52589
Change-Id: Ieddf50ac6c27d23c5efa26a7e5bf7f9044350a5e
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/328381
Reviewed-by: Jonas Termansen <sortie@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue Oct 10, 2023
Bug: b/302156166
Bug: #52589
Change-Id: I9470dc13d3f41bf7f5e77c4168efeec7d17e9263
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/329564
Reviewed-by: Jonas Termansen <sortie@google.com>
Reviewed-by: Alexander Thomas <athom@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
7 participants