This repository was archived by the owner on Jan 23, 2023. It is now read-only.

@echesakov echesakov commented May 23, 2018

As discovered in #17851, JIT_CheckedWriteBarrier was not properly patched during GCToEEInterface::StompWriteBarrier when operation=WriteBarrierOp::StompResize.

As a consequence, objects in a higher generation can sometimes miss cards in the card table, which causes objects in the ephemeral range (assigned via JIT_CheckedWriteBarrier) to be collected. This situation is described in https://github.com/dotnet/coreclr/issues/17851#issuecomment-391206977

The problem manifests itself as intermittent segfaults in GC or WriteBarrier code and as HeapVerify failures when COMPlus_HeapVerify=1. It affects both Windows and Linux.

This PR enables on ARM the mechanism, already implemented for ARM64, that updates g_highest_address and g_lowest_address via ::StompWriteBarrierResize. I also found that a ::FlushWriteBarrierInstructionCache call is needed right after such a call (at least on ARM).
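To illustrate the failure mode, here is a minimal toy model of a checked write barrier that compares the destination against its own copy of the heap bounds. It is only a sketch, not the CoreCLR code: every name in it (`HeapBounds`, `checked_barrier`, `store_after_resize_marks_card`), the addresses, and the 1 KiB card size are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model; none of these names come from the CoreCLR sources.
constexpr unsigned kCardShift = 10;  // assumed: one card byte per 1 KiB of heap

struct HeapBounds {
    std::uintptr_t lowest, highest;    // whole GC heap: [lowest, highest)
    std::uintptr_t eph_low, eph_high;  // ephemeral range: [eph_low, eph_high)
};

// Models a checked barrier: dst may lie outside the GC heap, so it is
// range-checked first. The store itself is omitted.
// Returns true when a card was marked for dst.
bool checked_barrier(const HeapBounds& b, std::vector<std::uint8_t>& cards,
                     std::uintptr_t dst, std::uintptr_t ref) {
    if (dst < b.lowest || dst >= b.highest) return false;    // treated as non-heap: no card
    if (ref < b.eph_low || ref >= b.eph_high) return false;  // ref is not ephemeral: no card
    std::size_t card = (dst - b.lowest) >> kCardShift;
    assert(card < cards.size());
    cards[card] = 0xFF;
    return true;
}

// Grows the heap, optionally refreshes the barrier's copy of the bounds, then
// stores an ephemeral reference into the newly added part of the heap.
// Returns true if that store marked a card.
bool store_after_resize_marks_card(bool barrier_bounds_updated) {
    HeapBounds gc{0x10000, 0x20000, 0x18000, 0x20000};
    HeapBounds barrier = gc;  // stands in for the values patched into the barrier code

    gc.highest = 0x30000;     // the GC heap grows
    if (barrier_bounds_updated) barrier = gc;

    std::vector<std::uint8_t> cards((gc.highest - gc.lowest) >> kCardShift, 0);
    std::uintptr_t dst = 0x28000;  // field of an object in the newly added range
    std::uintptr_t ref = 0x19000;  // object in the ephemeral range
    return checked_barrier(barrier, cards, dst, ref);
}
```

With stale bounds the barrier treats the destination as a non-heap location and skips the card, so the next ephemeral GC has no record of the reference. Once the barrier's copy is refreshed, the same store marks a card.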

Fixes #17851

Tested manually on Ubuntu/arm for more than 10 hours and on Windows/arm for a couple of hours.
@janvorli @jkotas PTAL

@echesakov
Author

@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test

@janvorli janvorli merged commit 08beb29 into dotnet:master May 24, 2018
@echesakov echesakov deleted the UpdateGCHeapRangeStompWriteBarrierResize branch May 24, 2018 17:18
@CarolEidt

@echesakovMSFT - I've loosely followed the discussion on #17851, and it seems like there was a lot of background knowledge that you had to acquire in order to track this down. Do you think you'd be able to write up some notes that might make it easier for someone else to track down similar issues in future?

@echesakov
Author

@CarolEidt Sure, I can try to collect all the ideas in a document somewhere, but I mostly followed the debugging procedure described by @janvorli (thanks a lot for writing this!) in the comments here https://github.com/dotnet/coreclr/issues/15381#issuecomment-377197181 and there https://github.com/dotnet/coreclr/issues/15381#issuecomment-377206297. Also enabling additional stress logging helped a lot.

The main challenge for me was getting LLDB running on Ubuntu/arm so that I could use SOS (and at least be able to run sos DumpLog). I believe this should definitely be documented somewhere (if it isn't already).

@tommcdon @mikem8361 Are you familiar with the LLDB issue on Ubuntu/arm (when LLDB is installed from the Ubuntu repo) that manifests itself as error: process launch failed: Lost debug server connection?
I found that LLVM_HOST_TRIPLE=arm-linux-gnueabihf must be explicitly passed to cmake when compiling LLDB targeting arm. If you want me to document it somewhere, what would be the appropriate place to do so? I suspect a paragraph in https://github.com/dotnet/coreclr/blob/master/Documentation/building/buildinglldb.md but please confirm. I have also created a Dockerfile that cross-compiles LLDB for arm (which is definitely much faster than native compilation on an ARM device), but I am not sure if there is a place for this in the dotnet/coreclr repo.

/cc @janvorli

@mikem8361

mikem8361 commented May 24, 2018 via email
