Switch DirectoryControl to use AsnWriter, AsnDecoder #101512
Replaced this with the managed AsnDecoder, removing PInvoke from a potential hot path.
Also removed the manual API calls to ldap_create_sort_control - this is now built in managed code. This then has knock-on effects to eliminate the SortKeyInterop classes.
Most of the Control tests were hardcoded to the output of BerConverter, which uses four-byte lengths in all cases. This behaviour is now different: the same output is returned across all platforms for .NET, and remains unchanged for .NET Framework. This should also close issue 34679.
Reduce number of copies required in TransformControls, and enable these copies to take advantage of newer intrinsics where available.
Windows domain controllers may return a distinguished name starting with OU=, rather than ou=.
Out of curiosity, any benchmarks for perf. numbers before/after switching to
I've not got benchmarks right now, but will write some in the next few days. In advance of these, I expect a modest reduction in managed and unmanaged memory usage, and a reduction in execution time (although the saving will remain within the margin of error for the network request itself).
Preallocating space for AsnWriter buffers to reduce memory usage. Correctly handling attribute names in SortControls.
Benchmarks are below. To summarize:
Performance header
AsqRequestControl.GetValue: -90% execution time, -35% Gen0 memory allocation
CrossDomainMoveControl.GetValue: -54% execution time, -62% Gen0 memory allocation
DirSyncRequestControl.GetValue: -89% execution time, -35% Gen0 memory allocation
ExtendedDNControl.GetValue: -89% execution time, -30% Gen0 memory allocation
PageResultRequestControl.GetValue: -90% execution time, -40% Gen0 memory allocation
QuotaControl.GetValue: -92% execution time, -28% Gen0 memory allocation
SearchOptionsControl.GetValue: -90% execution time, -30% Gen0 memory allocation
SecurityDescriptorFlagControl.GetValue: -88% execution time, -30% Gen0 memory allocation
SortRequestControl.GetValue: -79% execution time, +4% Gen0 memory allocation
DirectoryControl.TransformControls: -75% execution time, -76% Gen0 memory allocation
VerifyNameControl.GetValue: -87% execution time, -46% Gen0 memory allocation
VlvRequestControl.GetValue: -84% execution time, -69% Gen0 memory allocation
I've made a performance adjustment by specifying the initial size of AsnWriter, since this is trivial to calculate (or always static). AsnWriter grows in 1KB increments, which is much larger than the size of a normal directory control and causes memory usage to balloon.
One inefficiency which I couldn't eliminate is that when writing strings as ASN.1 octet strings, I want to manually select the encoding to use and encode directly into the AsnWriter buffer. This isn't possible (probably to keep AsnWriter specification-compliant), so I have to reserve/allocate a byte array, encode into that and write that out as an octet string. An example of this behaviour is in
Edit: the updated build has completed and the test failures are unrelated, so I'm now happy that the benchmarks are valid @PaulusParssinen
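To make the two points above concrete, here is a minimal sketch rather than the PR's actual code. It assumes a runtime where the `AsnWriter(AsnEncodingRules, int initialCapacity)` constructor is available; the class name, the method name and the 4-byte overhead estimate are hypothetical and only hold for short-form lengths.

```csharp
using System;
using System.Formats.Asn1;
using System.Text;

public static class PresizedWriterSketch
{
    // Hypothetical helper: wraps a single string in SEQUENCE { OCTET STRING }.
    public static byte[] EncodeSingleString(string value)
    {
        Encoding utf8 = Encoding.UTF8;
        int byteCount = utf8.GetByteCount(value);

        // Assumed estimate: 2 bytes (tag + length) for the SEQUENCE, 2 for the
        // OCTET STRING, plus the payload. The writer still grows if this is wrong.
        AsnWriter writer = new AsnWriter(AsnEncodingRules.BER, 4 + byteCount);

        // The chosen encoding cannot write into the writer's buffer directly,
        // so encode into a scratch array first and copy it in as an octet string.
        byte[] scratch = new byte[byteCount];
        utf8.GetBytes(value, 0, value.Length, scratch, 0);

        using (writer.PushSequence())
        {
            writer.WriteOctetString(scratch);
        }

        return writer.Encode();
    }
}
```

Calling `PresizedWriterSketch.EncodeSingleString("cn")` should produce a six-byte payload, so the writer never needs to grow beyond its initial allocation in this case.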
...irectoryServices.Protocols/src/System/DirectoryServices/Protocols/common/DirectoryControl.cs
Thanks @PaulusParssinen, I'll correct the nits shortly. I dug into the codegen for stackalloc. The results were fairly interesting.
Variable string length stackalloc
Sample code
Span<byte> byteSpan = stackalloc byte[myString.Length];
return byteSpan.Length;
Assembly
Constant length stackalloc
Sample code
Span<byte> byteSpan = stackalloc byte[0x100];
return byteSpan.Length;
Assembly
Constant length stackalloc, with SkipLocalsInit
Sample code
Span<byte> byteSpan = stackalloc byte[0x100];
return byteSpan.Length;
Assembly
The general pattern of the assembly is thus:
The relevant section is surprising. localsinit affects a stackalloc'd Span, even though the documentation is fairly specific in saying that this might not happen:
I'll mark the DirectoryControls method to disable localsinit, then make the Span length constant. If localsinit were to be left enabled, without any intrinsics helping the zeroing along, switching to using a constant 256-byte Span is slower than a dynamic (but shorter) one. The benchmark for this is:
It doesn't change the control-by-control benchmarks earlier in the PR though - it's a rounding error in comparison.
IIRC, BCL is marked with
Update: If I'm reading it right, this should be enabling it for all libraries.
runtime/src/libraries/Directory.Build.targets, lines 211 to 227 in 34e65b9
Corrected off-by-one error in stack allocation comparison. Made some stackallocs constant (unless a meaningful constant size would be >256 bytes) and reviewed these constant values based on real-world attribute lengths/server names.
Excellent, thanks. I've fixed the off-by-one error and made all but one of the stackallocs constant. In the case of attribute names, I've then cut the threshold from 256 -> 64, since the largest AD attribute name is 53 bytes.
The stackalloc which I've kept dynamic is the server name in VerifyNameControl. Looking at RFC1035, a three-label FQDN could be up to 384 bytes when UTF-16 encoded - potentially 512 bytes in the case of an enterprise AD environment. I think that at that point, the elimination of two MOVs is outweighed by the cost of always allocating 512 bytes of stack space.
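The constant-size scratch buffer with a heap fallback described above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the class and method names are hypothetical, and the 64-byte threshold is simply the figure quoted in the comment.

```csharp
using System;
using System.Text;

public static class ScratchBufferSketch
{
    // Assumed threshold, taken from the 64-byte figure discussed above.
    private const int AttributeNameStackAllocationThreshold = 64;

    // Hypothetical helper: encodes the name as UTF-8 and returns the byte count.
    // If localsinit is skipped for the method (for example via [SkipLocalsInit],
    // which needs unsafe code enabled), the scratch space is not zeroed first.
    public static int EncodeAttributeName(string attributeName)
    {
        // Constant-size stackalloc: the length is known at JIT time.
        Span<byte> scratch = stackalloc byte[AttributeNameStackAllocationThreshold];

        int byteCount = Encoding.UTF8.GetByteCount(attributeName);

        // Use the stack when the name fits, otherwise fall back to the heap.
        Span<byte> buffer = byteCount <= AttributeNameStackAllocationThreshold
            ? scratch.Slice(0, byteCount)
            : new byte[byteCount];

        return Encoding.UTF8.GetBytes(attributeName.AsSpan(), buffer);
    }
}
```

The comparison uses `<=` so that a name of exactly 64 bytes still uses the stack buffer, which is the kind of boundary the off-by-one fix mentioned above is concerned with.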
// Attribute name is optional: AD for example never returns attribute name
asnReadSuccessful = AsnDecoder.TryReadEncodedValue(asnSpan, AsnEncodingRules.BER, out Asn1Tag octetStringTag, out _, out int octetStringLength, out _);
if (asnReadSuccessful)
{
    // decoding might fail as attribute is optional
    o = BerConverter.Decode("{e}", value);
    Debug.Assert(o != null && o.Length == 1);
    Span<byte> attributeNameBuffer = octetStringLength <= AttributeNameStackAllocationThreshold ? attributeNameScratchSpace.Slice(0, octetStringLength) : new byte[octetStringLength];

    result = (int)o[0];
    _ = AsnDecoder.TryReadOctetString(asnSpan, attributeNameBuffer, AsnEncodingRules.BER, out _, out _);
    attribute = s_utf8Encoding.GetString(attributeNameBuffer);
}
TryReadEncodedValue can fail for a lot of sub-reasons; but I think you're just using it for "there was still data". In which case you should rewrite this to
if (!asnSpan.IsEmpty)
{
    scoped ReadOnlySpan<byte> attributeNameBuffer;
    if (asnSpan.Length <= AttributeNameStackAllocationThreshold)
    {
        asnReadSuccessful = AsnDecoder.TryReadOctetString(
            asnSpan,
            attributeNameScratchSpace,
            AsnEncodingRules.BER,
            out int consumed,
            out int written);
        // TryReadOctetString can't fail when the output buffer is at least as big as the input buffer.
        Debug.Assert(asnReadSuccessful);
        attributeNameBuffer = attributeNameScratchSpace.Slice(0, written);
        asnSpan = asnSpan.Slice(consumed);
    }
    else
    {
        attributeNameBuffer = AsnDecoder.ReadOctetString(
            asnSpan,
            AsnEncodingRules.BER,
            out int consumed);
        asnSpan = asnSpan.Slice(consumed);
    }
    attribute = s_utf8Encoding.GetString(attributeNameBuffer);
}
if (!asnSpan.IsEmpty)
{
    throw new BerConversionException();
}
SortResponseControl sort = ...;
If this library wants to change to using array pool for "too big for the stack", then you'd go with something like your current code. But once you're using new byte[len] you should just call the array-allocating variety.
I'm using the pattern of TryReadEncodedValue + TryReadOctetString to do a few things. The octet strings in question are optional, so I'm interrogating the buffer to figure out whether it's about to yield an octet string and, if so, how big it is. I'm also adding a post-hoc check to make sure that it's the last thing in the sequence, in line with the prior review comment.
Let's assume you have a definite-length payload: 30 04 0A 01 01 04.
- ReadSequence will report success. The definite length encoding means it isn't going to bother checking that the children make sense.
- ReadEnumerated will consume 3 bytes (0A 01 01). The sequence contents now have one byte left (which is never valid).
- TryReadEncodedValue will return false. It can read a tag, but not a length, because you're out of data. Nothing was corrupt, so no exception, just a false return.
- What state are you in now?
The code I wrote is "there's more stuff. Just read it as an OCTET STRING, because that's the only optional thing remaining". That will cause an exception, because the data is insufficient and therefore corrupt. If there was NO data remaining, it'd not read anything else, and satisfy the OPTIONAL.
I don't know what BerConverter would do in this invalid-data case. If it previously rejected it (which I hope it would) then it needs to continue to be rejected. If it's super loosey goosey, then what you have is more compatible (though I think the more strict read is still better)
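The stricter read described in this comment can be tried against the example payload with a small sketch like the one below. It is not the PR's code: the class name, method name and returned strings are hypothetical, and only `AsnDecoder` calls from System.Formats.Asn1 are used.

```csharp
using System;
using System.Formats.Asn1;

public static class TruncatedPayloadSketch
{
    // Hypothetical helper: reports what happens to whatever follows the ENUMERATED.
    public static string Inspect(byte[] payload)
    {
        // Definite-length SEQUENCE: only the outer tag and length are checked here.
        AsnDecoder.ReadSequence(payload, AsnEncodingRules.BER,
            out int contentOffset, out int contentLength, out _);
        ReadOnlySpan<byte> contents = payload.AsSpan(contentOffset, contentLength);

        // Mandatory ENUMERATED result code.
        AsnDecoder.ReadEnumeratedBytes(contents, AsnEncodingRules.BER, out int consumed);
        contents = contents.Slice(consumed);

        // No data left: the OPTIONAL attribute name is simply absent.
        if (contents.IsEmpty)
        {
            return "no optional value";
        }

        try
        {
            // Any remaining data must be a complete OCTET STRING.
            byte[] name = AsnDecoder.ReadOctetString(contents, AsnEncodingRules.BER, out consumed);
            return $"octet string of {name.Length} bytes";
        }
        catch (AsnContentException)
        {
            // A lone tag byte with no length lands here instead of being ignored.
            return "rejected";
        }
    }
}
```

With the payload 30 04 0A 01 01 04 this should take the "rejected" path, whereas a TryReadEncodedValue check alone would return false and leave the stray byte unnoticed. A complete implementation would also verify that nothing follows the octet string.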
Your example SortControl payload and explanation make sense. Incidentally, BerConverter will reject that if I use the {eO} format specifier.
That payload should naturally fail for either of two reasons:
- An octet string is expected to follow the enumerated value, and the data which actually follows that enumerated value isn't a valid octet string.
- The data following the enumerated value isn't an octet string, so we're expecting the enumerated value to be the last thing in the sequence. It isn't.
It doesn't right now because TryReadEncodedValue will return false and the updated code will ignore the extra data inside the payload's sequence. I need to fix that when handling PageControl, SortControl and VlvResponseControl.
_keys = new SortKey[value.Length];
for (int i = 0; i < value.Length; i++)
{
    _keys[i] = new SortKey(value[i].AttributeName, value[i].MatchingRule, value[i].ReverseOrder);
    _keysAsnLength += 13 + (value[i].AttributeName?.Length ?? 0) + (value[i].MatchingRule?.Length ?? 0);
I strongly advise ditching all of the size estimation and just using a shared writer with the default 1k first page.
Thanks @bartonjs - your review makes sense, and I'll apply the feedback later today.
What's your reasoning behind ditching the size estimation? It seems to be a fairly natural progression to shrink the first allocated block to a more appropriate size (since most controls will never even come close to 1KB in size). From there, the sizes are either constant (because the control's structure is a set of fixed-size fields) or trivially calculated; in both cases, the length of each control is implied by the specification which defines it, so I don't see it as violating separation of concerns. AsnWriter will expand if the estimate is wrong, so I don't think it's creating any exceptions in the edge case.
Is it a problem solely because the estimation makes it difficult to pool the AsnWriter instances, or does it have a problem in its own right?
Assuming the objects are pooled centrally, it's that a low estimate from one area just forces a resize to happen somewhere else. And otherwise that it's a lot of fiddly calculations to explain, especially in the ones that aren't just a sequence of an integer. If they're pooled by class that needs them, then I suppose the "this is the size I should need here" applies... at least for rigidly sized ones.
* Added limited caching of AsnWriters, as small/medium/large buckets.
* Replaced several Debug.Asserts with BerExceptions.
* Changed the type of raised exception to retain backwards-compatibility.
* Added extra checks on BER content lengths to match BerConverter.
That makes sense, thanks. I've changed this to use a set of up to three AsnWriters based on expected payload length (0-32, 33-128, 129+) and this seems to work well. As a result, everything except for The changes have slightly improved performance, but nothing to write home about - about 10-20ns shaved off the previous changes. The encode/decode tests pass locally, but I've not currently got access to an AD server to perform real-world testing. I'll re-test once I do.
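One way to picture the three buckets is the sketch below. It is only a guess at the shape of such a cache: the per-thread fields, the class and method names and the initial capacities are assumptions, and the PR may store and share its writers quite differently.

```csharp
#nullable enable
using System;
using System.Formats.Asn1;

public static class AsnWriterCacheSketch
{
    // Assumption: one cached writer per bucket and per thread, created lazily.
    [ThreadStatic] private static AsnWriter? t_small;
    [ThreadStatic] private static AsnWriter? t_medium;
    [ThreadStatic] private static AsnWriter? t_large;

    // Maps an expected payload length to a bucket: 0-32, 33-128, 129+.
    public static int BucketFor(int expectedLength) =>
        expectedLength <= 32 ? 0 : expectedLength <= 128 ? 1 : 2;

    // Returns an empty writer from the matching bucket.
    public static AsnWriter Rent(int expectedLength)
    {
        AsnWriter writer;
        switch (BucketFor(expectedLength))
        {
            case 0:
                writer = t_small ??= new AsnWriter(AsnEncodingRules.BER, 32);
                break;
            case 1:
                writer = t_medium ??= new AsnWriter(AsnEncodingRules.BER, 128);
                break;
            default:
                // Large payloads keep the writer's default growth behaviour.
                writer = t_large ??= new AsnWriter(AsnEncodingRules.BER);
                break;
        }

        // Discard anything left over from a previous use before handing it out.
        writer.Reset();
        return writer;
    }
}
```

Because a rented writer is reused on the next call for the same bucket, a caller has to finish with it (for example by calling `Encode()`) before renting again on the same thread.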
Relates to #97540.
This PR replaces all uses of BerConverter in LDAP directory control generation/parsing with AsnWriter and AsnDecoder. SortRequestControl didn't use BerConverter directly - it called the OpenLDAP and WLDAP ldap_create_sort_control APIs instead. This class was the only thing in S.DS.P which referenced the ldap_create_sort_control API and the SortKeyInterop struct, so I deleted them both.
SortRequestControl
The change to SortRequestControl's generation mechanism might also resolve #34679, since there shouldn't be any mechanism for the heap corruption to occur.
Most of the SortRequestControl's new ASN.1 encoding is pretty uncontroversial, but there was a bit of discussion in PR #65548 around the encoding of the sort key's attribute name, and this was marshalled (as part of SortKeyInterop) with different encodings between Windows and Linux. In the RFC, this is defined (indirectly) as an LdapString; this is described as ISO10646 characters, encoded as a UTF-8 string and represented as an OCTET STRING. I'm fairly sure that UTF8Encoding.GetBytes fulfils this, and running the associated test case against a real AD domain controller passes.
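As an illustration of the encoding discussed above, the sketch below writes a one-element SortKeyList in the layout RFC 2891 defines: a SEQUENCE OF SEQUENCE with an attribute type, an optional [0] ordering rule and a [1] reverse-order flag. It is not the PR's implementation; the class name, the method name and the choice of UTF-8 for the matching rule are assumptions.

```csharp
#nullable enable
using System;
using System.Formats.Asn1;
using System.Text;

public static class SortKeyEncodingSketch
{
    // Hypothetical helper: encodes a SortKeyList holding a single sort key.
    public static byte[] Encode(string attributeName, string? matchingRule, bool reverseOrder)
    {
        AsnWriter writer = new AsnWriter(AsnEncodingRules.BER);

        using (writer.PushSequence())   // SortKeyList
        using (writer.PushSequence())   // one sort key
        {
            // attributeType: UTF-8 bytes carried in a universal OCTET STRING.
            writer.WriteOctetString(Encoding.UTF8.GetBytes(attributeName));

            // orderingRule [0] is OPTIONAL, so it is only written when present.
            if (!string.IsNullOrEmpty(matchingRule))
            {
                writer.WriteOctetString(
                    Encoding.UTF8.GetBytes(matchingRule),
                    new Asn1Tag(TagClass.ContextSpecific, 0));
            }

            // reverseOrder [1] defaults to FALSE, so only TRUE is written.
            if (reverseOrder)
            {
                writer.WriteBoolean(true, new Asn1Tag(TagClass.ContextSpecific, 1));
            }
        }

        return writer.Encode();
    }
}
```

Since the same managed code runs on every platform, a call such as `SortKeyEncodingSketch.Encode("cn", null, false)` yields the same bytes on Windows and Linux, which is the consistency the description is aiming for.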
Test changes
There are also test changes, but these are largely to change the special-casing of expected byte values between OpenLDAP and WLDAP - .NET now generates these values in a consistent format (the OpenLDAP format) regardless of platform. The .NET Framework tests continued to use the version of S.DS.P from the GAC in my environment, so I've special-cased by the framework version rather than by the platform.
Misc. optimizations
There were a handful of byte-by-byte array copies, which I've switched over to using span-based copies in hopes that they'll benefit slightly from vectorisation. TransformControls and GetValue have a related change: where they used to reference properties returning byte arrays (which took defensive copies) they now reference the property values directly. These should both reduce GC traffic slightly.
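The difference between the two copy styles mentioned above is small in code terms. The sketch below is a generic illustration with hypothetical names, not code taken from TransformControls.

```csharp
using System;

public static class CopySketch
{
    // The older style: one element at a time.
    public static byte[] CopyByLoop(byte[] source)
    {
        byte[] destination = new byte[source.Length];
        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = source[i];
        }
        return destination;
    }

    // The span-based style: a single bulk copy that the runtime can optimise.
    public static byte[] CopyBySpan(byte[] source)
    {
        byte[] destination = new byte[source.Length];
        source.AsSpan().CopyTo(destination);
        return destination;
    }
}
```

Both methods return the same bytes; any gain from the span version comes from the runtime's bulk-copy path and is likely to be modest for buffers as small as directory controls.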