Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/7.0] Fixing a bug that causes us to mistakenly demote a gen2 region to gen0 #83341

Merged
merged 1 commit into from
Apr 4, 2023

Conversation

github-actions[bot]
Copy link
Contributor

@github-actions github-actions bot commented Mar 13, 2023

Backport of #82413 to release/7.0

/cc @mangod9 @Maoni0

Customer Impact

An infinite loop in GC. Multiple customers have reported this issue and it happens when a gen0 region that's almost fully occupied (< 24 bytes free) with one giant plug (ie, no free objects at all in the region). This causes allocate_in_condemned_generations to go into an infinite loop because in ephemeral generations we expect short plugs, ie, we should be able to allocate a min free object in front of each plug. And normally we can because when we allocate objects in gen0 we make sure to break up the allocation contexts with min free objects and when we compact into gen1 we form short plugs.

Reported by multiple customers in #80073. While validating we also found that the repro causes another failure which is fixed by #83342

Testing

local functional testing; GC stress testing; also validated by a customer who hit the issue.

Risk

low risk. this would cause a region that happened to be in gen0 by accident to now actually go through our demotion logic which we already do for all other regions anyway.

IMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

We are seeing a gen0 region that's almost fully occupied (< 24 bytes free) with one giant plug (ie, no free objects at all in the region).
This causes allocate_in_condemned_generations to go into an infinite loop because in ephemeral generations we expect short plugs,
ie, we should be able to allocate a min free object in front of each plug. And normally we can because when we allocate objects in gen0
we make sure to break up the allocation contexts with min free objects and when we compact into gen1 we form short plugs.

We are in this situation when all of the following conditions are true -

+ we did a gen2 compacting GC that generates a pinned plug in a gen2 region almost as big as the whole region. my guess for the reason why there's this giant pinned plug is because that gen2 region was already really compact so when we called allocate_in_condemned_generations on the non pinned plugs that are next to some pinned plugs in it we discovered we can't move the non pinned plugs anyway so we artificially pinned them and formed a giant pinned plug. and during this GC those objects were no longer pinned so we have one giant non pinned plug.
+ this gen2 region needs to be the last region with pinned plugs;
+ this gen2 region hasn't been consumed by allocate_in_condemned_generations yet so it was processed by process_remaining_regions;

Then in process_remaining_regions we'll set the plan_gen_num for that gen2 region to 0 because we are doing

set_region_plan_gen_num_sip (current_region, current_plan_gen_num);

instead of going through the demotion logic to decide whether we should demote this region or not.
@ghost
Copy link

ghost commented Mar 13, 2023

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

Backport of #82413 to release/7.0

/cc @mangod9 @Maoni0

Customer Impact

Testing

Risk

IMPORTANT: Is this backport for a servicing release? If so and this change touches code that ships in a NuGet package, please make certain that you have added any necessary package authoring and gotten it explicitly reviewed.

Author: github-actions[bot]
Assignees: -
Labels:

area-GC-coreclr

Milestone: -

Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved. we will take for consideration in 7.0.x

@jeffschwMSFT jeffschwMSFT added the Servicing-approved Approved for servicing release label Mar 16, 2023
@jeffschwMSFT jeffschwMSFT added this to the 7.0.6 milestone Mar 16, 2023
@carlossanlop
Copy link
Member

I'm retargeting this PR to the new release/7.0-staging branch, which is the one that we will use from now on for servicing fixes.

Repo maintainers will now be allowed to merge their own servicing PR as long as it meets the requirements:

  • It is approved by Tactics (signaled by adding the Servicing-approved label).
  • It's signed-off by an area owner.
  • The CI is green, or the failures are investigated as unrelated.
  • And if the PR touches an OOB package, the necessary OOB authoring changes are added.

The new process is described here: runtime/docs/project/library-servicing.md.

The infra team will be actively monitoring servicing PRs to ensure all requirements are met and to help with any issues.

Let me know if you have any questions.

@carlossanlop carlossanlop changed the base branch from release/7.0 to release/7.0-staging March 28, 2023 20:57
@carlossanlop carlossanlop added Servicing-approved Approved for servicing release and removed Servicing-approved Approved for servicing release labels Mar 30, 2023
@carlossanlop
Copy link
Member

Reminder: April 10th is the last day to merge backport PRs to ensure they get included in the May Release. PR owners are now in charge of merging their own PRs.

@mangod9 mangod9 merged commit f8777ff into release/7.0-staging Apr 4, 2023
@carlossanlop carlossanlop deleted the backport/pr-82413-to-release/7.0 branch April 4, 2023 21:20
@ghost ghost locked as resolved and limited conversation to collaborators May 5, 2023
@leecow leecow modified the milestones: 7.0.6, 7.0.7 Jun 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-GC-coreclr Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants