-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BerConverter does not use AsnWriter, AsnDecoder APIs #97540
Comments
Tagging subscribers to this area: @dotnet/area-system-directoryservices, @jay98014 Issue DetailsAt the moment, the The AsnDecoder functions outperform BerConverter by a fair margin, although I don't think it's called enough times to make a measurable difference. The main value lies in being able to remove all of the BerPal class except for FreeBerElement (which is called by LdapConnection.) It'll mean taking a dependency on System.Formats.Asn1, which seems reasonable for an LDAP library. Performance benchmarks - 3-6x execution time improvement, 3.4-5x memory allocation reductionThese performance benchmarks are for a sequence of 10,000 integers. The AsnWriter uses an appropriately-sized initial buffer, which can be estimated from the format string; without this buffer, the memory allocation of AsnWriterEncode explodes by a factor of 9, to triple that of BerConverterEncode. BenchmarkDotNet v0.13.12, Windows 11 (10.0.22621.3007/22H2/2022Update/SunValley2)
This will change the BER output on Windows, but Windows is already a slight anomaly. At present, Windows'
There's also a difference in the ASN.1 BIT STRING. BerConverter doesn't appear to include the unused bit count, while AsnWriter and OpenLDAP do. An example of the changed output for this for the BIT STRING needed for a value of 4 (and two unused bits) is:
Both of these are already noted to a certain degree in BerConverterTests, and switching to AsnWriter and AsnDecoder would make the output consistent across platforms. Although it's a public API, it's worth nothing that this is only called within the library when building the list of
|
I've been looking at ber_scanf's documentation states that the inbound byte array is interpreted according to the format string - but in doing so, it ignores the ASN.1 tags, to a degree that their contents can be interpreted corruptly. This can cut both ways: a valid ASN.1-formatted byte array can generate invalid results, and an invalid buffer can generate valid results. Case 1: Valid byte array, invalid resultsI've lifted this example byte array from
The final output of ber_scanf with a format string of Case 2: Invalid byte array, valid resultsThis is simply an adapted version of the previous byte array. It's an invalid ASN.1 structure which will be parsed as if it was correct.
ImpactThe impact of using AsnWriter to power BerConverter.Encode is pretty clear, but using AsnDecoder for BerConverter.Decode will mean that the format string needs to match the actual ASN.1 data types directly. This'll cause Decode to throw an exception in some situations. I think that's the right thing to do though: at the moment, it's easy for Decode to interpret data in undefined ways. I was also hoping to avoid API changes, but if Decode is to be made stricter then the exception behaviour of Decode and Encode should change to make sure that we're providing useful information when we throw Separately to this, there are a few test cases missing in BerConverterTests to verify that the Proposed API changespublic static class BerConverter
{
public static byte[] Encode(string format, params object[] value);
+ public static byte[] Encode(ReadOnlySpan<char> format, params object[] value);
+ public static bool TryEncode(string format,
+ Span<byte> destination, out int bytesWritten, params object[] value);
+ public static bool TryEncode(ReadOnlySpan<char> format,
+ Span<byte> destination, out int bytesWritten, params object[] value);
public static object[] Decode(string format, byte[] value);
+ public static object[] Decode(ReadOnlySpan<char> format, byte[] value);
+ public static bool TryDecode(string format, byte[] value, out object[] destination);
+ public static bool TryDecode(ReadOnlySpan<char> format, byte[] value, out object[] destination);
}
public class BerConversionException
{
+ // Position in the format string where the exception was generated.
+ public int FormatIndex { get; }
+ // When encoding, the type of parameter which was expected. Null when dealing with structures
+ // which don't require a parameter (such as sequences.) Null when decoding.
+ public Type? ExpectedParameterType { get; }
+ // When encoding, the parameter being encoded when the exception was generated. Null when
+ // dealing with structures which don't require a parameter.
+ // When decoding, the decoded parameter value (if this is known when the exception was generated.)
+ public object? ParameterValue { get; }
+ public BerConversionException(int formatIndex, Type? expectedParameterType);
+ public BerConversionException(int formatIndex, Type? expectedParameterType, string message);
+ public BerConversionException(int formatIndex, Type? expectedParameterType,
+ object? parameterValue);
+ public BerConversionException(int formatIndex, Type? expectedParameterType,
+ object? parameterValue, string message);
} What's the right way for me to receive initial feedback on the proposal and to get it ready for the API review process? |
Tag experts in the |
Adding new API and changing the underlying implementation are really two different things; so they should be in two separate issues. The new exception data might depend on the implementation changing, so they might be interdependent in that sense... but otherwise they feel orthogonal. For the API part:
For the implementation change:
|
Thanks for the review. The additional API request was largely a result of the Decode API changes. The Encode API changes are pretty simple (although I'm a bit curious why Windows is always returning the long-form length attribute - was this for backwards compatibility with something, and if so then is it still around?) In a nutshell: the existing BerConverter.Decode API calls AsnReader is better-behaved - when we try to read (e.g.) an integer, it'll actually validate the datatype of the ASN element in its input buffer, and throw if it doesn't match. Throwing an exception in this circumstance is a breaking change, but given that the input buffer comes directly from the remote server, I think throwing is preferrable - trying to blindly interpret the data in the same way that ber_scanf does feels like a short-cut to undefined behaviour, even if bounds checking protects against the buffer overreads I mentioned earlier. This then led to the extra fields in BerConversionException: if we're going to throw this exception when the input buffer doesn't match the format string, it made sense to indicate which part of the format string had the error. I then expanded the same principle to the Encode logic, which currently just performs Debug.WriteLine if there's a type mismatch while encoding. Your points around the rest of the API surface for BerConverter make a lot of sense. My thought process for using ReadOnlySpans for the format strings was that using this would make string constant inlining work, and that the very short format strings used in user code might interact with the JIT's PGO and loop unrolling to turn into (essentially) a JIT-generated BER deserializer. If that's not the case, then the ROSes can go - they're not going to do any good. There's some value in being able to decode directly from the input buffer as a span and avoid copying a byte array around. I'll discard the TryDecode and ReadOnlySpans, and will see about adding a Decode overload which accepts that span. This might be a few days though. |
Looking into it further, neither BerConverter.Encode nor BerConverter.Decode seem to be very popular (8 hits for Decode on https://grep.app, and 11 for Encode); so I'm not actually convinced that any performance gains here are worth the compatibility risk of changing either to use System.Formats.Asn1. (Callers, of course, should feel encouraged to do so... even within the System.DirectoryServices.Protocols library (when safe to do so)) I don't see any particular problems with adding a Span-writing TryEncode, or an overload of Decode that takes |
Writing the length as runtime/src/libraries/System.Formats.Asn1/src/System/Formats/Asn1/AsnWriter.cs Lines 569 to 582 in 8f33d62
|
Your latter point makes a lot of sense, thanks. They're both valid BER outputs, and avoiding a potentially-expensive block copy seems sensible. I agree that BerConverter's not used often, but it's on the inner loop of a hot path when using System.DirectoryServices.Protocols to query an LDAP server. Every parsed LDAP result can have a directory control, and this directory control needs to be parsed from a BER-formatted byte array (sometimes multiple times). Each character in the BER format string is a PInvoke into winldap/openldap, sometimes with the associated allocation of unmanaged memory, copies into it, etc. Another approach is just to use AsnWriter/AsnDecoder directly when parsing directory controls; this'd eliminate the impact of using BerConverter on a hot path while maintaining backwards compatibility. If we do this, I'd actually be in favour of obsoleting BerConverter entirely: it's serving no purpose other than wrapping ber_scanf/ber_printf, and there's now a managed, type-safe alternative which does that more efficiently. |
That was largely my suggestion to @bartonjs. private static byte[] EncodeSequenceInteger(int value) {
#if NET8_0_OR_GREATER
AsnWriter writer = new(AsnEncodingRules.BER, initialCapacity: 10);
#else
AsnWriter writer = new(AsnEncodingRules.BER);
#end
using (writer.PushSequence()) {
writer.WriteInteger(value);
}
return writer.Encode();
}
I did consider suggesting that... but backed off when I saw that other people, albeit a small number of them, were using it and might find its templating scheme useful. A quick look at the code to me also suggests that there are a number of allocation improvements that can be made to |
I'm in two minds on the value of the templating scheme for Encode. It's certainly quicker to work with - but somebody who needs to generate a BER object should already know the structure of BER object they want to create. I personally think it's more readable and maintainable to see that structure explicitly laid out in a few lines of an AsnWriter rather than looking up the format strings for ber_printf. I'm still of the opinion that we should mark it as obsolete, but I'd be happy to make the allocation improvements to the obsoleted method (and potentially check to see if some of the defensive copies are necessary - I think a lot of them might be removeable.) When it comes to Decode, the templating scheme is the largest part I dislike, and it's down to the underlying libraries not validating the actual type of the ASN element with the expected type in the format string, proceeding assuming that they match. We've not got a place to add that validation in .NET, and it renders everything in the buffer after that point untrustworthy - it's either flat-out unreadable (i.e. the value is null) or the values are incorrect in ways that can't reliably be detected from managed code. I'd really prefer to either deprecate it in favour of AsnDecoder, or to accept the breaking change ("your format string must now exactly match the incoming BER-formatted byte array.") and use AsnDecoder internally. If we do the latter, I'd prefer to keep the class consistent and use AsnWriter for Encode too (and accept the small behaviour change in sequence length formatting.) |
I'm not an owner for DirectoryServices; but my general view from the product-wide perspective is that I do not believe any breaking changes are justified here. Moving to AsnReader would potentially make sense for bringing up a new platform if ber_scanf was unavailable... but then there'd be a fair bit of hemming and hawing deciding whether to do that on existing platforms or leave them doing their existing P/Invokes. On the one hand: "It's a function not called by many, they probably won't care about the breaks.." I'm not, personally, a fan of how Decode appears to be structured; but I don't know enough about when people want to use it that I have an opinion about marking it as I'm (personally) supportive of anything that wants to move from ber_scanf to using AsnReader (at least partially because I wrote AsnReader)... but only when compatibility has been taken into account. And for BerConverter.Decode it doesn't sound (to me) like the compat risk is worth the gain. |
That's a pity, but a fair point - the API's been around for a long time, and BER generation/parsing is usually close to foundational to the protocols it's used in. I understand wanting to keep 100% backwards compatibility there. I think the current design of Encode is more forgiving here: we know we're generating an array with the correctly-typed BER elements, so if we can replicate the length encoding and if I try to understand what's happening with the BIT STRING processing then AsnWriter becomes a workable option. With that in mind, I think it'd be more reasonable to:
I'd expect these to be community contributions (although the AsnWriter compatibility flag would be an API review.) Once BerConverter's out of the hot path it's not the highest priority, but if you're happy with the approach I'll put it on my backburner. |
At the moment, the
System.DirectoryServices.Protocols.BerConverter
class calls the underlying LDAP library's ber_printf and ber_scanf functions. With the more recentSystem.Formats.Asn1
library, this is no longer needed; we can call the variousAsnWriter
andAsnDecoder
functions instead.The AsnDecoder functions outperform BerConverter by a fair margin, although I don't think it's called enough times to make a measurable difference. The main value lies in being able to remove all of the BerPal class except for FreeBerElement (which is called by LdapConnection.) It'll mean taking a dependency on System.Formats.Asn1, which seems reasonable for an LDAP library.
Performance benchmarks - 3-6x execution time improvement, 3.4-5x memory allocation reduction
These performance benchmarks are for a sequence of 10,000 integers. The AsnWriter uses an appropriately-sized initial buffer, which can be estimated from the format string; without this buffer, the memory allocation of AsnWriterEncode explodes by a factor of 9, to triple that of BerConverterEncode.
BenchmarkDotNet v0.13.12, Windows 11 (10.0.22621.3007/22H2/2022Update/SunValley2)
Intel Core i7-8565U CPU 1.80GHz (Whiskey Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.101
[Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2 [AttachedDebugger]
DefaultJob : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
This will change the BER output on Windows, but Windows is already a slight anomaly. At present, Windows'
ber_printf
encodes a{
(an ASN.1 SEQUENCE) with a four-byte length; OpenLDAP and AsnWriter will only use the number of bytes actually required to encode the sequence length. An example of the changed output for a very simple sequence of one integer (0x10) is:There's also a difference in the ASN.1 BIT STRING. BerConverter doesn't appear to include the unused bit count, while AsnWriter and OpenLDAP do. An example of the changed output for this for the BIT STRING needed for a value of 4 (and two unused bits) is:
Both of these are already noted to a certain degree in BerConverterTests, and switching to AsnWriter and AsnDecoder would make the output consistent across platforms.
Although it's a public API, it's worth nothing that this is only called within the library when building the list of
DirectoryControl
s. There might also be a performance improvement if we change the callers to use AsnDecoder directly, to avoid the overhead of processing a format string which will always be static.The text was updated successfully, but these errors were encountered: