-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(MQ cleanup) Remove some unsafe code from System.Xml #43379
(MQ cleanup) Remove some unsafe code from System.Xml #43379
Conversation
Tagging subscribers to this area: @buyaa-n, @krwq, @jeffhandley |
{ | ||
if ((uint)iByte >= (uint)bytes.Length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is written this way to avoid bounds checks later in the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GrabYourPitchforks, (maybe you already know it but) since #40180 was merged, the uint
cast workaround is not needed in some cases where constant operands are involved. It turned out that in some cases, RyuJIT does not elide the bound check with uint
casts, where it was doing before. I posted some findings on macOS here: #11623 (comment). It is unlikely that this construct with (non const) operands is affected by that change, but perhaps would be good to double check with the latest master. I have a feeling that in some places in the framework, we can remove the uint
cast, as they are deoptomized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@am11 This is a good observation! As these other optimizations come online I think it'll be useful to perform a libraries-wide sweep of all of these patterns. It's always great to make the code more readable while maintaining peak efficiency. :)
return new string(p, 0, cch); | ||
} | ||
// bitblt source bytes directly into the destination char span | ||
// n.b. source buffer assumed to be well-formed UTF-16 machine endian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of an expansion on this comment: the original code had two different behaviors based on whether the underlying data was aligned or unaligned. If the underlying data was aligned, the original code would basically memmove the contents of the old buffer into the new string, not performing any UTF-16 validation. If the underlying data was unaligned, the original code would go through Encoding.Unicode.GetString
, which would perform UTF-16 validation. This really only affects whether unpaired surrogate characters get replaced with '\uFFFD'
in the destination string.
Since the updated code is just a simple memmove, I opted for the "don't perform any validation" behavior.
|
||
int cch = dstChars.Length; | ||
ReadOnlySpan<byte> srcBytes = state._data.AsSpan(state.pos, checked(cch * sizeof(char))); | ||
Span<byte> dstBytes = MemoryMarshal.AsBytes(dstChars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically MemoryMarshal
is an unsafe-equivalent API. But at least we got rid of the pointer manipulation in this method. :)
@@ -2890,7 +2859,7 @@ public static string DoubleToString(double dbl) | |||
|
|||
if (IsInteger(dbl, out iVal)) | |||
{ | |||
return IntToString(iVal); | |||
return iVal.ToString(CultureInfo.InvariantCulture); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed custom implementation in favor of delegating to the runtime's already-optimized int.ToString
implementation.
On .NET 5/6, dereferencing CultureInfo.InvariantCulture
is very fast due to recent JIT optimizations.
3636e0e
to
7dba9f6
Compare
{ | ||
goto Return; | ||
} | ||
} | ||
} | ||
|
||
if (pChar < pCharsEndPos && *pChar == '=') | ||
if ((uint)iChar < (uint)chars.Length && chars[iChar] == '=') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to cast it everywhere? Should you just make it unsigned from the start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows the "bounds check elision" pattern used elsewhere in the runtime code base, where the indexer is cast to uint only for the purposes of comparison against Length. All other uses of the indexer remain as normal signed integers.
I'm not a huge fan of this pattern, since I wish we'd just use nuint
everywhere and call it a day. :)
src/libraries/System.Private.Xml/src/System/Xml/BinaryXml/XmlBinaryReader.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some questions
Latest iteration is just a merge. I'm blocked running unit tests locally or testing other changes until #43137 is resolved. |
This is an MQ cleanup work item to remove some of the unsafe code from System.Xml in favor of safer alternatives when such alternatives exist.
There's still quite a bit of unsafe code that can be removed, but this should represent a good first effort.