Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC memory commit issue with currently not supported OS (Nintendo Switch) #41675

Open
RalfKornmannEnvision opened this issue Sep 1, 2020 · 10 comments
Labels
area-GC-coreclr enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@RalfKornmannEnvision
Copy link
Contributor

I am in the process of getting CoreRT running on the Nintendo Switch. Most thins are already working. While getting the background GC working I run in a little issue with the way the GC commits the memory for the background GC mark array.
The mark array is reserved together with other management data but the actual commit is postponed until the first background GC is running. This triggers a second commit on the same reserve. But as the mark array is not aligned to a page boundary the first page of this second commit overlaps with the last page that was already committed. The GC code now expect that any additional page is zeroed but the first one should not be touched.

Unfortunately the virtual memory manager I can use is not smart enough for this. Therefore it either let additional pages uninitialized or clears everything.

For performances reasons I would prefer to not implement a tracking for the already committed pages on top of the virtual memory manager. 

A possible fix that would help me could be if the GC avoid this overlapping commit for the background gc mark array. This could be done very easily in the gc_heap::commit_mark_array_by_range function. I already prototyped it on my system and it allows me to get the background GC running even with a virtual memory manager that doesn't behave that nice when it comes to multi commits of the same page.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-GC-coreclr untriaged New issue has not been triaged by the area owner labels Sep 1, 2020
@ghost
Copy link

ghost commented Sep 1, 2020

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@mangod9 mangod9 added this to the Future milestone Sep 1, 2020
@mangod9
Copy link
Member

mangod9 commented Sep 1, 2020

@RalfKornmannEnvision would you mind linking the PR of the prototype? Assume you are asking for a fix in CoreRT, not .net core?

@mangod9 mangod9 added the enhancement Product code improvement that does NOT require public API changes/additions label Sep 1, 2020
@jkotas
Copy link
Member

jkotas commented Sep 1, 2020

CoreRT/NativeAOT is using the exact same GC as CoreCLR. We do not want the implementation to diverge. Also, we want all GC fixes to be submitted via dotnet/runtime repo.

The root cause of the issue is that GC commits same region of memory multiple times today. The Windows and Linux memory managers deal with it gracefully, but the Switch memory manager does not.

Can the GC be fixed so that it always commits given region exactly once?

@mangod9
Copy link
Member

mangod9 commented Sep 1, 2020

CoreRT/NativeAOT is using the exact same GC as CoreCLR. We do not want the implementation to diverge. Also, we want all GC fixes to be submitted via dotnet/runtime repo.

Ok, thanks for the clarification.

@mangod9 mangod9 modified the milestones: Future, 6.0.0 Sep 1, 2020
@RalfKornmannEnvision
Copy link
Contributor Author

Created a draft PR (#41728) with the small change that would be needed to fix this issue.

All it does is changing the start page for the commit of the mark array. So far it always committed starting with the page that contains at least a part of the mark array. The change ensure that it starts with the first page that only contains mark array data. The page that contains although other data has always been already committed at this point as part of the initial commit that happens directly after the memory has been reserved. This initial reserve only exclude the full mark array. There will be nothing behind the mark array therefore we don't need to considerate that the last page of the mark array might be already committed, too. The edge case here is when the mark array actually starts on a page boundary. In this case there will be no page that contains  mixed data. The new align_upper_page helper function ensures that the address will not change if it is already on the start of a page. Therefore the whole mark array will be committed in the edge case, too.

After the commit the function runs a check (in the debug version) to see if the whole mark array has been zeroed. This check will although validate that all pages of the mark array have been committed. 

@RalfKornmannEnvision
Copy link
Contributor Author

I take it back. It seems I have overlooked something here. Beside of a full commit of the whole mark array there seems to be a use case were it is only partially committed. In this case my whole approach to fix this issue break down.

@Maoni0
Copy link
Member

Maoni0 commented Sep 2, 2020

even without partial commit it doesn't work - you could have a new segment that's fully in range and is close enough to an existing fully in range segment that it shares the first page of its mark array with the mark array for the existing segment.

you can make this work by looking to see if there's a seg that would share such a page with the range you want to commit and if that page has been committed based on the old BGC range.

what kind of apps do you run on the switch that uses corert? do you actually need BGC? if these apps have very small heaps you can just disable BGC altogether.

@RalfKornmannEnvision
Copy link
Contributor Author

I am aware that I can disable the BGC. Actually I just have activated it by implementing FEATURE_MANUALLY_MANAGED_CARD_BUNDLES and FEATURE_USE_SOFTWARE_WRITE_WATCH_FOR_GC_HEAP for CoreRT on ARM64. This was when I noticed the issue. 

We might get away without the BGC but as the long term plan is to get other Switch developer access to the sources, too there might be people who are more in need.  But for now it might be the best to disable the BGC until there is a better solution

Sorry for wasting your time.

@mangod9
Copy link
Member

mangod9 commented Sep 2, 2020

Thanks @RalfKornmannEnvision for raising this issue. We will leave this issue on the backlog, but given that you have a workaround for now, it might be a lower priority. If BGC need arises again please feel free to contribute a change and we will evaluate it for .net 6. Thx!

@jeffschwMSFT jeffschwMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@Maoni0
Copy link
Member

Maoni0 commented Sep 2, 2020

@RalfKornmannEnvision no need to apologize! and thanks for bringing to this to our attention.

@mangod9 mangod9 modified the milestones: 6.0.0, 7.0.0 Jul 10, 2021
@mangod9 mangod9 modified the milestones: 7.0.0, Future Jul 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-GC-coreclr enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

6 participants