Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

sdmaclea
Copy link

@sdmaclea sdmaclea commented Mar 26, 2020

Fixes dotnet/runtime#34109

This has already been fixed in .NET 5 (see #25958). This PR ports that fix to release/3.1.

Customer Impact

Setting a debugger breakpoint on Amd64 (x64) can result in incorrect calculation of floating point values.

Regression?

This was a known and difficult to fix issue. The hand written Amd64 instruction parser was flaky and nearly impossible to maintain.

The problem arose because the debugger must determine the length and form of the instruction where the breakpoint is inserted. The original code was a minimal implementation which only covered the instruction form historically used by the JIT. As new forms and especially intrinsics were added, the code was not updated because the complexity was far too high.

We completely missed decode of all x64 simd instruction forms.

The fix required was this substantial change. At the time we missed the 3.0 branch and we did not have sufficient feedback to consider backport to 3.x.

Testing

  • This code was manually tested to fix the originally reported issue.
    • Additionally the same code was tested for regressions in other areas.
      • Trying a breakpoint on every line of code
      • Stepping through every line of code
  • The code has been in the 5.0 branch for more than 6 months with no new regressions
    • Shipped in all .NET Core 5.0 previews to date
    • Over the last 90 days, there have been
      • 184K debugging sessions launched on .NET Core 5 previews
      • from 3.3K unique MAC addresses
    • We have no reports of issues linked to this fix
  • The 5.0 code has been tested against recently reported similar issues
    • We have several confirmation that this issue observed in 3.0/3.1 has been fixed by 5.0
  • The internal diagnostic test suite was amended to test for this issue and prevent further identical regressions.
  • @tommcdon & @hoyosjs have each used the x64 debugger since this change for debugging various reported/discovered issues. They saw no issues suggesting a regression.

Risk

Low.

  • Modify amd64walker to use table based decode #25958 has been in 5.0 since 9/2019
  • Only affects x64 debugger operation.
  • Removes a flaky instruction parser and replaces it with a machine generated one.
    • Rewrites one function NativeWalker::DecodeInstructionForPatchSkip(). Used only for debugging x64 code.
    • Adds 1 new machine generated instruction decode table. (10k lines)
  • Adds tools to regenerate the parser tables
    • Add 3 files to archive the tool used to generate the decode table.to machine generate. (Only used in development)
  • Adds documentation (1 file)
  • Sufficient testing coverage

Code Reviewers

@hoyosjs @tannergooding @noahfalk @jkotas

@sdmaclea sdmaclea added area-Diagnostics Servicing-consider Issue for next servicing release review labels Mar 26, 2020
@sdmaclea sdmaclea added this to the 3.1.4 milestone Mar 26, 2020
@sdmaclea sdmaclea self-assigned this Mar 26, 2020
@jeffschwMSFT jeffschwMSFT modified the milestones: 3.1.4, 3.1.x Mar 26, 2020
Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM the additional +43/-43 changes (as compared to #25958) are just stripping additional whitespace from the end of lines in src/debug/ee/amd64/amd64walker.cpp

Copy link
Member

@hoyosjs hoyosjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also verified the customer scenario reported gets fixed.

@jeffschwMSFT jeffschwMSFT removed the Servicing-consider Issue for next servicing release review label Mar 31, 2020
@jeffschwMSFT
Copy link
Member

Will consider again in a few months

@jeffschwMSFT jeffschwMSFT added Servicing-consider Issue for next servicing release review and removed Servicing-consider Issue for next servicing release review labels Apr 27, 2020
@tommcdon
Copy link
Member

Adding @noahfalk @jkotas for additional review

@noahfalk
Copy link
Member

I'm taking a look at the change but one question that rises to the top - do we have telemetry telling us how much usage we've gotten on this code path in .Net 5? For example can we determine the number of VS debugging sessions that have occurred with a .Net 5 x64 runtime? My fear is that we'd make an assumption that .Net 5 has given the scenario lots of coverage and the lack of feedback implies the change has no issues when in practice the code had been used relatively little and there are still bugs lurking. It seems unlikely that we'd catch an issue in this change via code review.

@noahfalk
Copy link
Member

A basic pass through revealed no obvious issues, but again I wouldn't expect to be able to spot anything without spending considerable time studying the X64 instruction encoding reference documentation and then doing a detailed comparison between the spec and this code. If we can demonstrate considerable usage has already occurred in .Net 5 that would bring down the risk.

sdmaclea and others added 3 commits April 30, 2020 20:21
* Modify amd64walker to use table based decode

Use tables for NativeWalker::DecodeInstructionForPatchSkip()
Also files used to generate the decode tables in doc folder.
+ Typos (dotnet#26968)
+ Feedback from 3.1 amd64InstrDecode review (dotnet/runtime#35670)
@sdmaclea
Copy link
Author

sdmaclea commented May 1, 2020

Over the last 90 days, there have been

  • 184K debugging sessions launched on .NET Core 5
  • from 3.3K unique MAC addresses
  • All of these build had the 5.0 version of this PR.

@sdmaclea sdmaclea added the Servicing-consider Issue for next servicing release review label May 16, 2020
@leecow leecow added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels May 21, 2020
@leecow leecow modified the milestones: 3.1.x, 3.1.6 May 21, 2020
@Anipik
Copy link

Anipik commented Jun 9, 2020

@tannergooding @hoyosjs @jkotas can you review and approve the change after the latest commits ?

@Anipik Anipik merged commit 20ee1c4 into dotnet:release/3.1 Jun 9, 2020
@sdmaclea sdmaclea deleted the Amd64Decode branch June 10, 2021 00:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Diagnostics Servicing-approved Approved for servicing release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants