Service and profiler tests failing on dartk-linux-release-arm64 and app_jitk-linux-debug-x64 #46746

Closed
sstrickl opened this issue Jul 28, 2021 · 10 comments

Comments

@sstrickl
Contributor

After 746b8f1, there were a lot of service test failures on a small number of bots. 9397b8f fixes most of the tests, but a handful are still failing in runs after it landed:

On dartk-linux-release-arm64 and app_jitk-linux-debug-x64, the following tests are still failing:

     service_2/get_allocation_samples_test/service   (Pass -> RuntimeError, expected Pass) at 746b8f..babfe0
     service_2/get_allocation_samples_test/dds   (Pass -> RuntimeError, expected Pass) at 746b8f..babfe0

On dartk-linux-release-arm64, the following additional tests are still failing:

     service_2/rewind_optimized_out_test/service   (RuntimeError -> Timeout, expected Pass) at 94bdcc
     vm/cc/Profiler_BasicSourcePositionOptimized   (Pass -> Crash, expected Pass) at 746b8f..f424f3
     service_2/get_vm_timeline_rpc_test/service   (RuntimeError -> Timeout, expected Pass) at 94bdcc
     service_2/get_allocation_traces_test/service   (Pass -> RuntimeError, expected Pass) at 746b8f..f424f3
     vm/cc/Profiler_BinaryOperatorSourcePositionOptimized   (Pass -> Crash, expected Pass) at 746b8f..f424f3
     vm/cc/Profiler_SourcePositionOptimized   (Pass -> Crash, expected Pass) at 746b8f..f424f3
     service_2/get_allocation_traces_test/dds   (Pass -> RuntimeError, expected Pass) at 746b8f..f424f3
     vm/cc/Profiler_FunctionInline   (Pass -> Fail, expected Pass) at 746b8f..babfe0
     service_2/rewind_optimized_out_test/dds   (RuntimeError -> Timeout, expected Pass) at cb5d0b..41cf79

Given the number of remaining failing tests, I'm going to approve them for now.

cc @bkonyi

@sstrickl sstrickl changed the title Service tests failing on dartk-linux-release-arm64 and app_jitk-linux-debug-x64 Service and profiler tests failing on dartk-linux-release-arm64 and app_jitk-linux-debug-x64 Jul 28, 2021
@sstrickl
Contributor Author

sstrickl commented Jul 28, 2021

CL 208321 (for 9397b8f) showed all the failures on dartk-linux-debug-ia32 and vm-kernel-optcounter-threshold-linux-release-ia32-try as fixed (on the latter, one test previously approved for a Timeout returned to that state). It also showed all the failures on app_jitk-linux-debug-x64 as fixed, but one still showed up on the non-trybot, so I imagine there's at least one more similar heisenbug remaining.

@sstrickl
Contributor Author

Given that the fix I added was needed even before the original commit landed, I expect it may not be an issue with the commit itself, but just that it exposed already existing issues that were less likely to trigger before.

dart-bot pushed a commit that referenced this issue Jul 28, 2021
…UserTag"

This reverts commits 746b8f1 and
9ee2259.

Reason for revert: #46746

Original change's description:
> [ package:dds ] Add support for caching CPU samples based on UserTag
>
> DDS can be configured to listen for CPU sample events and cache samples
> that were collected while certain UserTags are active. These cached
> samples are stored in a ring buffer and are stored until the isolate
> shuts down.
>
> TEST=pkg/dds/test/get_cached_cpu_samples_test.dart
>
> Change-Id: Ib20770f59f1672c703413486f87795b3bb23f676
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/207206
> Commit-Queue: Ben Konyi <bkonyi@google.com>
> Reviewed-by: Kenzie Schmoll <kenzieschmoll@google.com>

TEST=ci
TBR=bkonyi@google.com,rmacnak@google.com,kenzieschmoll@google.com

Change-Id: I1b6655ad7e3b10e1145ff545cc90ecf3bc6e092d
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/208341
Commit-Queue: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Siva Annamalai <asiva@google.com>
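
For readers unfamiliar with the reverted change, the behavior its description outlines (keep only the CPU samples recorded under selected UserTags, in a bounded ring buffer) can be sketched roughly as below. This is a minimal illustration, not the package:dds implementation: the class name `TaggedSampleCache`, the `on_sample` and `cached` methods, and the `userTag` dictionary key are all hypothetical.

```python
from collections import deque


class TaggedSampleCache:
    """Hypothetical cache that keeps samples for selected user tags only."""

    def __init__(self, tags, capacity):
        self.tags = set(tags)
        # A deque with maxlen acts as a ring buffer: once full, appending
        # a new item silently evicts the oldest one.
        self.samples = deque(maxlen=capacity)

    def on_sample(self, sample):
        # Assumption: each sample is a dict carrying the tag that was
        # active when it was recorded under the key "userTag".
        if sample["userTag"] in self.tags:
            self.samples.append(sample)

    def cached(self):
        # Oldest retained sample first.
        return list(self.samples)
```

With a capacity of 3 and four matching samples, only the three most recent matching samples remain, which is the trade-off a ring buffer makes to keep memory bounded.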
@bkonyi bkonyi self-assigned this Jul 28, 2021
@bkonyi
Contributor

bkonyi commented Aug 2, 2021

Unfortunately, I can't seem to reproduce these issues on my Linux box... :-( this is going to be a fun one...

Also @sstrickl, in the future could you please provide links to logs for a subset of the failures? The builder names and test configuration names don't match up 1:1, so it's a pain to go digging for them in the tree. Having even just one log per failure is a huge help!

@aam
Contributor

aam commented Aug 4, 2021

@bkonyi wrote

> Unfortunately, I can't seem to reproduce these issues on my Linux box... :-( this is going to be a fun one...

Should https://dart.googlesource.com/sdk/+/17961606060fa36971fdc40cee8105a13262e97e be reverted while you investigate the failures?

@bkonyi
Contributor

bkonyi commented Aug 4, 2021

No, the failures are approved for now and I have a fix for the failures I could reproduce here. The failures are likely due to us running out of sample buffer space, so increasing the number of sample blocks should resolve the failures.
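
To illustrate the hypothesis in this comment, here is a toy model of a sample buffer built from a fixed pool of blocks. It is only a sketch under stated assumptions, not the VM's profiler code: `SampleBlockBuffer`, `run`, and the drop-when-full policy are hypothetical, and the block and sample counts are arbitrary.

```python
class SampleBlockBuffer:
    """Toy store: at most `num_blocks` blocks of `block_size` samples each."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = [[]]   # start with one empty block
        self.dropped = 0     # samples lost because no free block was left

    def record(self, sample):
        current = self.blocks[-1]
        if len(current) == self.block_size:
            if len(self.blocks) == self.num_blocks:
                # Pool exhausted: assume the sample is simply dropped.
                self.dropped += 1
                return False
            current = []
            self.blocks.append(current)
        current.append(sample)
        return True

    def samples(self):
        return [s for block in self.blocks for s in block]


def run(num_blocks, total_samples=100, block_size=10):
    """Record `total_samples` samples; return (retained, dropped)."""
    buf = SampleBlockBuffer(num_blocks, block_size)
    for i in range(total_samples):
        buf.record(i)
    return len(buf.samples()), buf.dropped
```

In this model, `run(4)` retains 40 samples and drops 60, while `run(16)` retains all 100. A test that expects to find a particular sample would fail intermittently in the first configuration, depending on how many samples arrive before the one it is looking for, which matches the timing-dependent failures described in this thread.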

@aam
Contributor

aam commented Aug 4, 2021

It looks like https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-kernel-nnbd-linux-debug-ia32/286, https://ci.chromium.org/p/dart/builders/ci.sandbox/vm-kernel-linux-debug-ia32/7219, https://ci.chromium.org/p/dart/builders/luci.dart.ci.sandbox/cross-vm-linux-release-arm64 all seem to be affected by the reland.

service/get_allocation_samples_test, service/get_allocation_traces_test, vm/cc/Profiler_BasicSourcePositionOptimized, vm/cc/Profiler_FunctionInline, vm/cc/Profiler_SourcePositionOptimized, vm/cc/Profiler_BinaryOperatorSourcePositionOptimized are the tests that started to fail with the reland on ia32, arm64 bots.

@aam
Contributor

aam commented Aug 4, 2021

> No, the failures are approved for now

Main concern with this is that by approving failures we are potentially disrupting/breaking things downstream from the Dart SDK (Flutter, Google).

@bkonyi
Contributor

bkonyi commented Aug 4, 2021

I strongly believe that the fix in the CL linked above will resolve most, if not all of the issues. Given that this is next to impossible to reproduce locally on machines with various performance characteristics and we're not actually seeing crashes, just test failures, reverting the reland without landing the potential fix first would be counterproductive.

@aam
Contributor

aam commented Aug 4, 2021

> reverting the reland without landing the potential fix first would be counterproductive.

Not sure I understand. The revert can be landed quickly as it has no conflicts, and the fix can go together with the reland.

@bkonyi
Contributor

bkonyi commented Aug 4, 2021

Going to take a different approach for the fix. Revert up here.

@bkonyi bkonyi closed this as completed Aug 25, 2021