One-shot PEM reader #29588
Here is the proposed API / implementation code:
using System;
using System.Collections.Generic;
using System.Text;
// bogus namespace
namespace PEMOneShotNamespace
{
// bogus class name
public class PEMOneShot
{
private const String PREEB_PREFIX = "-----BEGIN ";
private const String PREEB_POSTFIX = "-----";
private const String POSTEB_PREFIX = "-----END ";
private const String POSTEB_POSTFIX = PREEB_POSTFIX;
private const int NOT_FOUND = -1;
/// <summary>
/// Finds a PEM data structure and returns the location of the label, the location of the base64 encoded contents and the end of the PEM structure.
/// This method follows the "lax" definition of PEM within RFC 7468 for maximum compatibility.
/// </summary>
/// <remarks>
/// <para>
/// This method does not allow additional header lines to be present; anything between the encapsulation boundaries is considered to be base64.
/// </para>
/// <para>
/// The <code>label</code> and <code>contents</code> will not have a valid value and may be <code>null</code> if no valid PEM structure is found.
/// The <code>bytesInContent</code> and <code>endOfPEM</code> values will be set to -1 and 0 respectively if no valid PEM structure is found.
/// </para>
/// </remarks>
///
/// <param name="pemData">Character buffer that - possibly - contains a valid PEM structure.</param>
/// <param name="label">Output parameter which is a slice in the <code>pemData</code> that denotes the validated label.</param>
/// <param name="contents">Output parameter which is a slice in the <code>pemData</code> that denotes the validated base64 encoding.</param>
/// <param name="bytesInContent">Output parameter that shows the precise number of bytes in the base64 content.</param>
/// <param name="endOfPEM">Output parameter set to the next character after the PEM structure.</param>
/// <returns><code>true</code> if and only if a valid PEM structure is found</returns>
public static bool FindNextPem(ReadOnlySpan<char> pemData, out ReadOnlySpan<char> label, out ReadOnlySpan<char> contents, out int bytesInContent, out int endOfPEM)
{
// define default output values returned if no PEM structure or an erroneous PEM structure is detected
label = null;
contents = null;
bytesInContent = NOT_FOUND;
endOfPEM = 0;
// note that IndexOf always starts to parse at the start of a span, so we need to create multiple spans
ReadOnlySpan<char> curSpan = pemData;
int startPreeb = curSpan.IndexOf(PREEB_PREFIX);
if (startPreeb == NOT_FOUND)
{
return false;
}
int startPreebLabel = startPreeb + PREEB_PREFIX.Length;
curSpan = pemData.Slice(startPreebLabel);
int endPreebLabel = curSpan.IndexOf(PREEB_POSTFIX);
if (endPreebLabel == NOT_FOUND)
{
return false;
}
endPreebLabel += startPreebLabel;
int endPreeb = endPreebLabel + PREEB_POSTFIX.Length;
ReadOnlySpan<char> preebLabelSlice = pemData.Slice(startPreebLabel, endPreebLabel - startPreebLabel);
if (!ValidateLabel(preebLabelSlice))
{
return false;
}
curSpan = pemData.Slice(endPreeb);
int startPosteb = curSpan.IndexOf(POSTEB_PREFIX);
if (startPosteb == NOT_FOUND)
{
return false;
}
startPosteb += endPreeb;
int startPostebLabel = startPosteb + POSTEB_PREFIX.Length;
curSpan = pemData.Slice(startPostebLabel);
int endPostebLabel = curSpan.IndexOf(POSTEB_POSTFIX);
if (endPostebLabel == NOT_FOUND)
{
return false;
}
endPostebLabel += startPostebLabel;
int endPosteb = endPostebLabel + POSTEB_POSTFIX.Length;
ReadOnlySpan<char> postebLabel = pemData.Slice(startPostebLabel, endPostebLabel - startPostebLabel);
if (!ValidateLabel(postebLabel))
{
return false;
}
// perform base64 validation at the end
ReadOnlySpan<char> contentSlice = pemData.Slice(endPreeb, startPosteb - endPreeb);
if (!ValidateAndCountBase64Bytes(contentSlice, out int bytes))
{
return false;
}
label = preebLabelSlice;
contents = contentSlice;
bytesInContent = bytes;
endOfPEM = endPosteb;
return true;
}
/// <summary>
/// Validates that the content consists of valid base64: that it doesn't contain invalid characters,
/// too many or misplaced base64 padding characters.
/// </summary>
/// <param name="base64">The base64 encoded input array, using standard base64 with padding</param>
/// <param name="bytes">The number of bytes encoded by the base64 encoding</param>
/// <returns><code>true</code> if and only if valid base64 is found</returns>
private static bool ValidateAndCountBase64Bytes(ReadOnlySpan<char> base64, out int bytes)
{
// TODO find out if the base64 codec of .NET is compatible with the whitespace (especially w.r.t. the padding characters)
bytes = NOT_FOUND;
int count = 0;
int padCharFound = 0;
int offset = 0;
int end = base64.Length;
while (offset < end)
{
char c = base64[offset++];
// TODO test if IsLetterOrDigit is not too permissive (Unicode after all)
if (Char.IsLetterOrDigit(c) || c == '+' || c == '/')
{
if (padCharFound > 0)
{
return false;
}
count++;
continue;
}
// TODO test if IsWhiteSpace is not too permissive (Unicode after all)
if (Char.IsWhiteSpace(c))
{
continue;
}
if (c == '=')
{
count++;
if (++padCharFound > 2)
{
return false;
}
continue;
}
return false;
}
if (count % 4 != 0)
{
return false;
}
bytes = (count / 4) * 3 - padCharFound;
return true;
}
/// <summary>
/// Validates that the label conforms to RFC 7468.
/// </summary>
/// <param name="label">The label to validate.</param>
/// <returns><code>true</code> if and only if the label consists of valid characters.</returns>
private static bool ValidateLabel(ReadOnlySpan<char> label)
{
// set to true to automatically detect space or hyphen at start of label
bool previousSpaceOrHyphen = true;
foreach (char c in label)
{
// we'll handle spaces 0x20 and hyphens 0x2D separately
if (c < 0x20 || c > 0x7E)
{
return false;
}
if (c == ' ' || c == '-')
{
if (previousSpaceOrHyphen)
{
return false;
}
previousSpaceOrHyphen = true;
}
else
{
previousSpaceOrHyphen = false;
}
}
// do not end with space or hyphen
if (previousSpaceOrHyphen)
{
return false;
}
return true;
}
}
}
PS: removed one or two copy / paste bugs during editing, nothing big |
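The byte count returned by `ValidateAndCountBase64Bytes` follows from base64 packing every 3 bytes into 4 characters, with each `=` pad removing one byte. The sketch below isolates that arithmetic and, as an assumption that addresses the two TODOs, restricts the alphabet and the whitespace to ASCII (roughly the whitespace RFC 7468 permits) instead of using `Char.IsLetterOrDigit` and `Char.IsWhiteSpace`, which also accept non-ASCII letters, digits and spaces. `Base64LengthSketch` and `TryCountDecodedBytes` are hypothetical names, not part of the proposal.

```csharp
using System;

// Hypothetical helper, not part of the proposal: validates a base64 payload
// (whitespace allowed) and counts the bytes it encodes.
public static class Base64LengthSketch
{
    // ASCII-only alphabet, so Unicode letters and digits are rejected.
    private static bool IsBase64Char(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9') || c == '+' || c == '/';

    // ASCII whitespace only: space, tab, CR, LF, vertical tab, form feed.
    private static bool IsPemWhiteSpace(char c) =>
        c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\v' || c == '\f';

    public static bool TryCountDecodedBytes(ReadOnlySpan<char> base64, out int bytes)
    {
        bytes = -1;
        int significant = 0; // base64 characters seen, including '=' padding
        int padding = 0;

        foreach (char c in base64)
        {
            if (IsPemWhiteSpace(c))
            {
                continue;
            }

            if (c == '=')
            {
                if (++padding > 2)
                {
                    return false; // at most two padding characters
                }
            }
            else if (!IsBase64Char(c) || padding > 0)
            {
                return false; // invalid character, or data after padding
            }

            significant++;
        }

        if (significant % 4 != 0)
        {
            return false;
        }

        // Every 4 characters carry 3 bytes; each '=' removes one of them.
        bytes = significant / 4 * 3 - padding;
        return true;
    }
}
```

For example, `"AQI="` has 4 significant characters and one pad, so 4 / 4 * 3 - 1 = 2 bytes. `Char.IsLetterOrDigit` would also accept a character such as U+0661 (ARABIC-INDIC DIGIT ONE), which this version rejects.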
Thanks, @owlstead! From a requirements perspective, things that I can see someone wanting to do:
So I think, functionality-wise, it meets the needs. Other things that might be reasonable for PEM (to help identify a type container for this)
Restricting to the cryptography uses of PEM (RFC 7468) makes all of this so much easier than my original thoughts, since the header/attributes section is out of scope. Additionally, I know I'm the one who suggested I'll ignore my last sentence for now, to give the theoretical state of
// assembly: System.Security.Cryptography.Encoding
namespace System.Security.Cryptography
{
public static partial class PemEncoding
{
public static bool TryFindNextPem(
ReadOnlySpan<char> pemData,
out ReadOnlySpan<char> label,
out ReadOnlySpan<char> contents,
out int contentDecodeLength,
out int charsRead);
public static bool TryWriteData(
Span<char> destination,
ReadOnlySpan<char> label,
ReadOnlySpan<byte> data,
out int charsWritten);
// The formula, while straightforward, is probably too much to assume a caller would want
// to redefine, so this is a helper for TryWriteData to see if destination is too small.
public static int GetOutputSize(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
public static void WriteData(
TextWriter destination,
ReadOnlySpan<char> label,
ReadOnlySpan<byte> data);
}
}
I'm not entirely sold on any of the method names; I'm just getting the creative juices flowing. In particular, it seems like AsnWriter variants should be proper overloads, so the "Write" name should unify.
public static void WriteData(
TextWriter destination,
ReadOnlySpan<char> label,
AsnWriter data);
Feels... wrongish. |
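For reference, one way the output-size helper's formula could look, under stated assumptions: the writer emits `-----BEGIN <label>-----`, then the base64 body wrapped at 64 characters per line (RFC 7468's strict format), each line ending in `\n`, then `-----END <label>-----` with no trailing newline. `PemSizeSketch` is a hypothetical name; its `Write` method exists only to check the formula.

```csharp
using System;
using System.Text;

// Illustrative only: size of a strictly formatted PEM, assuming 64-character
// base64 lines, '\n' line endings, and no newline after the END boundary.
public static class PemSizeSketch
{
    private const int LineLength = 64;

    public static int GetEncodedSize(int labelLength, int dataLength)
    {
        if (labelLength < 0)
            throw new ArgumentOutOfRangeException(nameof(labelLength));
        if (dataLength < 0)
            throw new ArgumentOutOfRangeException(nameof(dataLength));

        // Base64 turns every started group of 3 bytes into 4 characters.
        long base64Length = (dataLength + 2L) / 3 * 4;
        // Each base64 line, including a final partial one, ends with '\n'.
        long lineBreaks = (base64Length + LineLength - 1) / LineLength;
        const int preebOverhead = 11 + 5 + 1; // "-----BEGIN " + "-----" + '\n'
        const int postebOverhead = 9 + 5;     // "-----END " + "-----"

        long total = base64Length + lineBreaks + preebOverhead + postebOverhead
                   + 2L * labelLength;
        return checked((int)total); // throws OverflowException if too large
    }

    // Reference writer used only to check the formula above.
    public static string Write(string label, byte[] data)
    {
        string base64 = Convert.ToBase64String(data);
        var sb = new StringBuilder();
        sb.Append("-----BEGIN ").Append(label).Append("-----\n");
        for (int i = 0; i < base64.Length; i += LineLength)
        {
            sb.Append(base64, i, Math.Min(LineLength, base64.Length - i)).Append('\n');
        }
        sb.Append("-----END ").Append(label).Append("-----");
        return sb.ToString();
    }
}
```

Under this layout 48 input bytes fill exactly one 64-character line, and an empty label with empty data still costs 31 characters for the two boundaries.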
OK, I can see that giving the start of the PEM might also be a good idea, as that will automatically show how to get the "description" or pre-amble, where needed (presuming that the right The Maybe we should return the Otherwise I would largely agree with your interface specification. I could, if you want, think of a |
The advantage of emitting the label as a While it's also true that a string can go to a (read-only) span, the string required making a new object and a copy (since .NET strings are length-prepended for performance and null-terminated for interop). So a string (or System.Range) would make sense for returning it as a field in a parse structure, but the span has advantage when it can be done via the |
As for that last comment, I don't see any advantage myself to keep So what seems natural to me is to supply:
I've used "Smurf naming conventions" to allow for Other changes in order of appearance:
Most importantly I removed Very much interested in your opinions! Oh, yeah, got Unit testing working and have a working implementation of above. Added constructors and methods to |
Oh, I can test things now. I used command line to create a test project and then imported in VS. However, before I test I would like to know if the API is OK-ish this way. Note that I'm personally not that happy with evaluating my own application. I am critical enough to think of some pretty nasty tests (boundary tests), but I cannot change my own thought process. So some review will of course be required. |
When reading a large multi-PEM (like /etc/ssl/certs.pem) the allocations of each string would cause the GC to kick in more often, which can have an impact on servers. Of course, the API could internally avoid that by caching known (or encountered) strings and returning the string that is equivalent to the identified span... so then it comes down to a) always allocate a string (easy, usable, higher GC impact), b) return shared strings (more work, usable, low GC impact), c) just return a span (easy, less friendly than string, no GC impact). Since very few different names are expected I'm sure that (a) would end up being (b) if we switched to using this to read /etc/ssl/certs.pem on Linux.
Type name: RSA, DSA, and ECDsa definitely already violate the rule, but the rule is that acronyms should either be expanded or be treated as a simple word, so "PEM" (Privacy Enhanced Mail) would be "Pem". I threw "Encoding" on it because "Pem" as a standalone typename felt weird, and it's using the data encoding model from PEM without actually being about Privacy Enhanced Mail. (The actual guideline is "DO capitalize only the first character of acronyms with three or more characters, except the first word of a camel-cased identifier.")
Methods/Structs:
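Option (b) could look roughly like the sketch below: compare the label span against a small table of known labels and return the shared instance, allocating a new string only on a miss. `LabelCache` and its label list are hypothetical; whether unknown labels should also be cached (and how to do that thread-safely) is left open.

```csharp
using System;

// Hypothetical helper illustrating "return shared strings" for PEM labels.
public static class LabelCache
{
    // Example set of frequently seen labels; purely illustrative.
    private static readonly string[] s_knownLabels =
    {
        "CERTIFICATE", "PRIVATE KEY", "PUBLIC KEY", "ENCRYPTED PRIVATE KEY",
        "X509 CRL", "CERTIFICATE REQUEST",
    };

    // Returns a shared string instance for known labels; allocates otherwise.
    public static string GetLabelString(ReadOnlySpan<char> label)
    {
        foreach (string known in s_knownLabels)
        {
            if (label.SequenceEqual(known.AsSpan()))
            {
                return known; // no allocation for a known label
            }
        }

        return label.ToString(); // unknown label: allocate a new string
    }
}
```

Reading a bundle full of `CERTIFICATE` entries would then hand back the same string object each time, while the parser itself still only deals in spans.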
So, using "on the fence" means "don't change it", I'd suggest
// assembly: System.Security.Cryptography.Encoding
namespace System.Security.Cryptography
{
public static partial class PemEncoding
{
public readonly ref struct FieldsWithBase64
{
public FieldsWithBase64(string label, ReadOnlySpan<char> contentInBase64, int contentSizeBytes);
public readonly string Label;
public readonly ReadOnlySpan<char> ContentInBase64;
public readonly int ContentSizeBytes;
}
public static bool TryFindPem(ReadOnlySpan<char> pemData, out FieldsWithBase64 fields, out Range location);
public static int GetOutputSize(int labelLength, int contentBytesLength);
public static bool TryWrite(Span<char> destination, ReadOnlySpan<char> label, ReadOnlySpan<byte> contentBytes, out int charsWritten);
}
}
And a formal API review would likely add
Testing: I'd just expect tests to go in src/System.Security.Cryptography.Encoding/tests/, along with the rest of the xunit tests. After you build src/System.Security.Cryptography.Encoding/ref/ with the new API structure, the tests should be able to reference it. ( One test that I think you're missing, from the current code: reading a payload with mismatched labels:
The usual flow, per https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md, is
Of course, sometimes it takes actually doing the implementation and tests to understand how the API really needs to work.. Mainly I'm sharing that just because you and I think we've hit a good state, doesn't mean the (rest of the) API review will agree 😄. |
I'm leaning toward If I were using this API (which is a great idea!) my typical usage pattern would probably be something like:
// If ROS<char>
FieldsWithBase64 fields = //get fields;
if (fields.Label.SequenceEqual("CERTIFICATE")) {
//
}
// If string
FieldsWithBase64 fields = //get fields;
if (fields.Label == "CERTIFICATE") {
//
}
So I don't think using If we feel that a I don't actually like my idea, I much prefer In the proposal the
|
OK, so I've tried to take all your comments into consideration and came up with the following API, which is currently implemented and running some tests; I've simply refactored.
public static class PemEncoding
{
public readonly ref struct Base64Fields
{
public Base64Fields(ReadOnlySpan<char> label, ReadOnlySpan<char> base64, int encodedByteSize);
public readonly ReadOnlySpan<char> Label;
public readonly ReadOnlySpan<char> Base64;
public readonly int EncodedByteSize;
public byte[] DecodeBase64();
}
public static bool TryFind(ReadOnlySpan<char> pemData, out Base64Fields fields, out Range location);
public static int GetEncodedSize(int labelSize, int dataSize);
public static bool TryWrite(Span<char> destination, ReadOnlySpan<char> label, ReadOnlySpan<byte> data, out int charsWritten);
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
// implementation only
private static bool ValidateLabel(ReadOnlySpan<char> label);
private static bool ValidateAndCountBase64Bytes(ReadOnlySpan<char> base64, out int encodedByteSize);
private static int SkipWhiteSpace(ReadOnlySpan<char> pemData, int startOffset);
}
I've checked that the writers write all the PEM structures in the RFC, doing a full compare afterwards. I'm also skipping / parsing the following PEMs successfully even though they are obviously tosh:
Then I can write them again as strictly formatted PEM. I've added a I'm not completely happy with the Note that I'm quite the polyglot w.r.t. PL, but sometimes it takes a bit of time to get up to speed with e.g. C# best practices - I'm pretty much rooted in Java, taking up Kotlin as well. |
OK, let's discuss the two bears left: streaming support and error handling. The problem with streaming is that a stream doesn't allow backtracking. However, streaming is required if e.g. large CMS structures need to be supported, because those can be of arbitrary length and may not fit into memory. It is useful but maybe not required if PEM structures are present in a stream, e.g. received through HTTP. As for the large data structures: it seems .NET only supports encoding of data in a It is relatively easy to add streaming for the generating part, but it would not be a symmetric API. I think I saw some requests to do this, so we could basically copy the generation code as I don't see any generic way of merging the two. Now for the error handling. Currently the Finally: should we create a |
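As an illustration of why the generating side streams easily: base64 works on independent 3-byte groups, so a writer can read a `Stream` in 48-byte chunks (exactly one 64-character line) and only let the final chunk be short, which keeps `=` padding at the very end of the body. This is a sketch assuming a `TextWriter` destination and `\n` line endings; `PemStreamWriter` is a hypothetical name, not a proposed API, and it does not validate the label.

```csharp
using System;
using System.IO;

// Hypothetical streaming writer (not a proposed API). Assumes '\n' line
// endings and 64-character base64 lines, as in RFC 7468's strict format.
public static class PemStreamWriter
{
    private const int BytesPerLine = 48; // 48 bytes encode to exactly 64 chars

    public static void Write(TextWriter destination, string label, Stream source)
    {
        byte[] buffer = new byte[BytesPerLine];
        char[] chars = new char[64];

        destination.Write("-----BEGIN ");
        destination.Write(label);
        destination.Write("-----\n");

        int read;
        while ((read = ReadFully(source, buffer)) > 0)
        {
            int written = Convert.ToBase64CharArray(buffer, 0, read, chars, 0);
            destination.Write(chars, 0, written);
            destination.Write('\n');

            if (read < BytesPerLine)
            {
                break; // a short chunk can only happen at the end of the stream
            }
        }

        destination.Write("-----END ");
        destination.Write(label);
        destination.Write("-----");
    }

    // Stream.Read may return fewer bytes than requested before the end, so keep
    // reading until the buffer is full or the stream is exhausted; otherwise
    // '=' padding could end up in the middle of the body.
    private static int ReadFully(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = source.Read(buffer, total, buffer.Length - total);
            if (n == 0)
            {
                break;
            }
            total += n;
        }
        return total;
    }
}
```

The reading side has no equally simple counterpart, since it has to find the END boundary (and check the label) before it can decide whether what it has consumed was valid.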
That seems reasonable. From personal experience, I would see myself using this for small PEM bundles that supply a certificate chain, and reading a few kilobytes into memory shouldn't be problematic. My suggestion would be to get this API done now and consider streaming as a separate API proposal. It could either be added on later to these types or perhaps a new
Throwing exceptions would probably be useful, though I can't say with any certainty. Some .NET developers have taken
I agree with that, especially if this is intended to be a "lax" parser. |
For the API shape: I think that the nested struct needs to move out to be a peer type. We don't have crisp guidelines here, but it boils down to something like "If the average person working with the API needs to use the type by name, it shouldn't be nested". Since it's the output of the primary API (TryFind) it seems pretty "average user" to me. (So what does get to be a public nested type? Mostly struct-based iterators that most people never see, they just put the call in a foreach and let things happen). "Base64Fields" is a little generic once it moves out. ParsedPemFields, maybe? (Or PemFields?) I don't think that the output struct needs the DecodeBase64 method on it. Since the base64 data is already exposed the caller can just use the existing base64 decoding APIs.
I accept the logic here. Leaving it out until there's a demand seems reasonable.
We have guidance on this, which is (effectively) all
find "B", or throw? I'm, personally, having trouble deciding which I want it to do, so I'm wide open to suggestions. (I think I might be leaning toward false means no preeb was found, and throw for "couldn't successfully get to the end" (bad base64, can't find a posteb))
I think that I agree that there's not a lot of value in the "the first (or first non-whitespace) character wasn't
internal static PemFields FindPemAtStart(ReadOnlySpan<char> pem)
{
PemFields fields = PemEncoding.Find(pem, out Range range);
if (range.Start.Value > 0)
{
ReadOnlySpan<char> leader = pem.Slice(0, range.Start.Value);
for (int i = 0; i < leader.Length; i++)
{
if (!char.IsWhiteSpace(leader[i]))
{
// Use the standard exception
PemEncoding.Find("", out _);
Debug.Fail("Should have thrown");
}
}
}
return fields;
}
Yeah, it's more than 3 lines of code, but it's not tricky logic... and given that the RFC says leaders are allowed, it doesn't fit with the class. |
I'm all open to use Ah, and here I just inserted the exceptions. Personally I'm in favor of throwing errors if a full header line / pre-encapsulation boundary is found (i.e. five dashes, BEGIN, some kind of label (any chars) and then five dashes again). Because if anything is wrong after that, it is hard to argue that this is not the PEM that you were looking for. Basically you ask it to find a PEM, it finds a full header, but then still returns false. What would you do next as a user? And if no exception is thrown then symmetry is also lost between TryFind and Find. So yeah, I'm not convinced that returning false instead of a I'll try and finish cleaning up the code and post early next week. I haven't got a compiled version of Core 3 here so I'll just post it as a file (or two) here if you don't mind. I've got a holiday coming so time is getting precious (and I need to post soon because the race may eat me, look up "Pointe du Raz" and then imagine kayaking in it :P ). |
Sorry, things taking too much time. I hope tomorrow or Wednesday. |
To get things moving along again... here's the API proposal that I think we have, so far:
namespace System.Security.Cryptography
{
public static class PemEncoding
{
public readonly ref struct Base64Fields
{
public Base64Fields(ReadOnlySpan<char> label, ReadOnlySpan<char> base64, int encodedByteSize);
public readonly ReadOnlySpan<char> Label;
public readonly ReadOnlySpan<char> Base64;
public readonly int EncodedByteSize;
public byte[] DecodeBase64();
}
public static bool TryFind(ReadOnlySpan<char> pemData, out Base64Fields fields, out Range location);
public static int GetEncodedSize(int labelSize, int dataSize);
public static bool TryWrite(Span<char> destination, ReadOnlySpan<char> label, ReadOnlySpan<byte> data, out int charsWritten);
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
}
}
I think the only feedback I would have left on this is that
With that change, the API proposal becomes:
namespace System.Security.Cryptography
{
public static class PemEncoding
{
public readonly ref struct Base64Fields
{
public Base64Fields(ReadOnlySpan<char> label, ReadOnlySpan<char> base64, int encodedByteSize);
public ReadOnlySpan<char> Label { get; }
public ReadOnlySpan<char> Base64 { get; }
public int EncodedByteSize { get; }
public byte[] DecodeBase64();
}
public static bool TryFind(ReadOnlySpan<char> pemData, out Base64Fields fields, out Range location);
public static int GetEncodedSize(int labelSize, int dataSize);
public static bool TryWrite(Span<char> destination, ReadOnlySpan<char> label, ReadOnlySpan<byte> data, out int charsWritten);
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
}
} |
I'll try and spend some more time on it. I've switched from laptop to desktop as it got too slow and ran out of SSD, and I'm currently running under Ubuntu - might switch back to Windows 10 due to audio problems among others... |
I wouldn't worry about implementation too much at this point - this needs to make it past the API review process. They may have significant feedback on the API shape. Unless there is any more feedback on what the public API is, I think the next best step is to get any additional feedback from @bartonjs on the API proposal, and possibly get it flagged as "ready for review". |
I still think the nested type needs to be changed to a peer type (and then it probably needs a name change to PemFields, or something like that). We are currently codifying what the guidelines are for TryWrite. I think that it'll end up being
Since the data portion is written as Base64 and Label is limited to printable ASCII (U+0021-U+002C, U+002E-U+007E, with single hyphens or spaces allowed between those characters), we know we're always writing ASCII output. Should we make TryWriteAscii/WriteAscii that writes to bytes instead of chars?
And, lastly, S.S.Cryptography is getting a bit bloated. Maybe it's time to consider a new namespace. Here we have PemEncoding and PemFields. ASN.1 will propose AsnReader, AsnWriter, AsnTag (or whatever final names they get). I've heard whispers of CBOR, which would probably add a reader and a writer. Maybe all of these things should be
namespace System.Security.Cryptography
{
public static class PemEncoding
{
public static bool TryFind(ReadOnlySpan<char> pemData, out PemFields fields, out Range location);
public static PemFields Find(ReadOnlySpan<char> pemData, out Range location);
public static int GetEncodedSize(int labelSize, int dataSize);
public static bool TryWrite(Span<char> destination, ReadOnlySpan<char> label, ReadOnlySpan<byte> data, out int charsWritten);
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
}
public readonly ref struct PemFields
{
public PemFields(ReadOnlySpan<char> label, ReadOnlySpan<char> base64, int encodedByteSize);
public ReadOnlySpan<char> Label { get; }
public ReadOnlySpan<char> Base64 { get; }
public int EncodedByteSize { get; }
public byte[] DecodeBase64();
}
}
(I left out WriteAscii, because it's an easy addition, and didn't change the namespace here because there's no consensus yet.) The only thing that feels weird in this now is the EncodedByteSize value. It totally makes sense for the PemEncoding class to calculate it, since it does that on the fly while reading. But should the ctor validate it? Or is it garbage-in garbage-out? I can't think of why anyone (other than PemEncoding) would build one, so maybe it doesn't matter. Or maybe it means it should be an internal ctor to enforce that there's no reason to make one. |
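Since both the label and the base64 body are ASCII, a byte-oriented writer could encode straight into a `Span<byte>` with `System.Buffers.Text.Base64.EncodeToUtf8`. This is a sketch with a hypothetical `TryWriteAscii` signature, assuming strict RFC 7468 layout (64-character lines, `\n` line endings, no trailing newline) and a label the caller has already validated as ASCII.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

// Hypothetical byte-oriented writer; the label is assumed to be valid ASCII.
public static class PemAsciiSketch
{
    public static bool TryWriteAscii(ReadOnlySpan<char> label, ReadOnlySpan<byte> data,
                                     Span<byte> destination, out int bytesWritten)
    {
        bytesWritten = 0;

        // 31 fixed boundary chars, the label twice, base64, one '\n' per line.
        long base64Length = (data.Length + 2L) / 3 * 4;
        long lineBreaks = (base64Length + 63) / 64;
        long required = 31L + 2L * label.Length + base64Length + lineBreaks;
        if (destination.Length < required)
        {
            return false;
        }

        int pos = 0;
        pos += CopyAscii("-----BEGIN ", destination.Slice(pos));
        pos += CopyAscii(label, destination.Slice(pos));
        pos += CopyAscii("-----\n", destination.Slice(pos));

        while (!data.IsEmpty)
        {
            // 48 input bytes become exactly 64 ASCII bytes (fewer for the last chunk).
            ReadOnlySpan<byte> chunk = data.Slice(0, Math.Min(48, data.Length));
            OperationStatus status = Base64.EncodeToUtf8(chunk, destination.Slice(pos), out _, out int written);
            if (status != OperationStatus.Done)
            {
                return false;
            }
            pos += written;
            destination[pos++] = (byte)'\n';
            data = data.Slice(chunk.Length);
        }

        pos += CopyAscii("-----END ", destination.Slice(pos));
        pos += CopyAscii(label, destination.Slice(pos));
        pos += CopyAscii("-----", destination.Slice(pos));
        bytesWritten = pos;
        return true;
    }

    // Narrows each char to a byte; only correct for ASCII input.
    private static int CopyAscii(ReadOnlySpan<char> text, Span<byte> destination)
    {
        for (int i = 0; i < text.Length; i++)
        {
            destination[i] = (byte)text[i];
        }
        return text.Length;
    }
}
```

Because every intermediate chunk is a multiple of 3 bytes, calling `EncodeToUtf8` with its default `isFinalBlock: true` only ever produces padding on the last line.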
Is there existing API surface in S.S.C that would fit in this new Encoding namespace? Would the new namespace lead to confusion, "Why is X not in the Encoding namespace?" "Because it didn't exist for these existing APIs" |
Looking through the existing types I think my previous concern isn't valid. I'm always wary of namespaces, but it makes sense given the way the rest of corefx is broken up.
I don't mind the idea of this type being a simple way to get information out and not doing the validation at all.
Also probably true, then perhaps that is enough reason to mark the constructor as
So this then?
namespace System.Security.Cryptography.Encoding
{
public static class PemEncoding
{
public static bool TryFind(ReadOnlySpan<char> pemData, out PemFields fields, out Range location);
public static PemFields Find(ReadOnlySpan<char> pemData, out Range location);
public static int GetEncodedSize(int labelSize, int dataSize);
public static bool TryWrite(Span<char> destination, ReadOnlySpan<char> label, ReadOnlySpan<byte> data, out int charsWritten);
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
}
public readonly ref struct PemFields
{
internal PemFields(ReadOnlySpan<char> label, ReadOnlySpan<char> base64, int encodedByteSize);
public ReadOnlySpan<char> Label { get; }
public ReadOnlySpan<char> Base64 { get; }
public int EncodedByteSize { get; }
public byte[] DecodeBase64();
}
} |
Looks reasonable to me. I think API review will rewrite @owlstead any thoughts? (I'm going to go ahead and mark it as ready for review, but we review things on Tuesdays, so there's at least a couple days between now and when it actually makes it to the front of the queue) |
Feedback:
|
Should it be
The
Here is the proposal with
namespace System.Security.Cryptography
{
public static class PemEncoding
{
public static bool TryFind(ReadOnlySpan<char> pemData, out PemFields fields) => throw null;
public static PemFields Find(ReadOnlySpan<char> pemData) => throw null;
public static int GetEncodedSize(int labelLength, int dataLength) => throw null;
public static bool TryWrite(ReadOnlySpan<char> label, ReadOnlySpan<byte> data, Span<char> destination, out int charsWritten) => throw null;
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data) => throw null;
}
public readonly struct PemFields
{
internal PemFields(Range label, Range base64data, Range location, int decodedDataLength) => throw null;
public Range Location { get; }
public Range Label { get; }
public Range Base64Data { get; }
public int DecodedDataLength { get; }
}
} |
I don't think the internal ctor needs ReadOnlyMemory values, since it's just storing ranges now, but the important part for the proposal is that it has an internal ctor :). For DecodedByteSize and/or the word Length, perhaps
Part of the reason for the struct having Range values instead of the Span values is that it's easier to map it back to Memory or array values, if delayed processing is desired. (Or you're in an async method and the state machine complains that you have a ref struct.) It also sets things up for working with char8/byte later for UTF-8 data, if we wanted.
string pemString = await File.ReadAllTextAsync(path);
PemFields pemFields = PemEncoding.Find(pemString);
// Yeah, this uses Substring, but it works.
byte[] der = Convert.FromBase64String(pemString[pemFields.Base64Data]);
ReadOnlySpan<char> pemString = ...;
PemFields pemFields = PemEncoding.Find(pemString);
byte[] rented = ArrayPool<byte>.Shared.Rent(pemFields.DecodedDataLength);
if (!Convert.TryFromBase64Chars(pemString[pemFields.Base64Data], rented, out int bytesWritten) ||
bytesWritten != pemFields.DecodedDataLength)
{
throw new InvalidOperationException();
}
// The rented array may be larger than requested, so only pass the decoded bytes.
key.ImportSubjectPublicKeyInfo(rented.AsSpan(0, bytesWritten), out _);
ArrayPool<byte>.Shared.Return(rented);
Assuming that seems sane, the only remaining question is the namespace. I know I'm the one who suggested creating the Encoding subnamespace, but I'm now having second thoughts... mainly because I'm thinking that the AsnReader/AsnWriter/etc types should really go somewhere other than System.Security.Cryptography, since ASN.1 is also used by LDAP (and people have asked for it in the context of LDAP). I brought that up in the meeting, but we didn't really resolve it 😄. If these would be the only two types in that namespace then we probably want to just shove them in S.S.C with everything else. (Kinda feels like a "there's no right answer" problem.) If we move these types out of S.S.C I'd want to change "Pem" to "SimplePem" or something like that, to indicate it's the PEM variant that doesn't do attributes. |
Oh crud, I didn't mean to include the ctor that accepted ReadOnlyMemory. I was toying around with using ReadOnlyMemory in the ctor and as properties, too. Edited to fix and remove namespace for now. Should we keep the namespace and make it a new project entirely? If we have PEM, and ASN.1, and who-knows-what-else in the future (CBOR?), it might be "just two" for now but I can see more being added in the future. Too late now, but I would have also considered some of the PKCS12 implementation useful there as well.
A hallway chat for this suggested that we probably want a new assembly, distributed via NuGet (anticipating netstandard2.x requests)--and probably some engineering trickery to be able to use it as internal implementation details without getting into packaging.
Really it was that crypto doesn't (shouldn't) have enough stuff for an Encodings intermediate namespace (assuming that we want the ASN types to be together in a subnamespace), it'd pretty much be these two types. And CBOR/etc would probably just be separate packages... there's not much need for it within the platform, just as a requested offering. I don't think this would be under S.S.C, because there's not even a "propensity" argument. But maybe this suggests we want System.{Encodings.Binary | BinaryEncodings}.Asn / System.{Encodings.Binary | BinaryEncodings}.{SomethingForCBOR}. (COSE would probably be something subnamespaced under S.S.C (depending on complexity), but that's a thing built on top of CBOR, and it doesn't need to be in the same assembly or namespace.) Also: I see you left |
@vcsjones Once we can close down on |
@bartonjs I amended my comment here which I think is the most "current" proposal with changes: #29588 (comment)
No preference, I don't think? Maybe I had a stronger opinion a month ago that I am unable to recall. Regardless, |
namespace System.Security.Cryptography
{
public static class PemEncoding
{
public static bool TryFind(ReadOnlySpan<char> pemData, out PemFields fields);
public static PemFields Find(ReadOnlySpan<char> pemData);
public static int GetEncodedSize(int labelLength, int dataLength);
public static bool TryWrite(ReadOnlySpan<char> label, ReadOnlySpan<byte> data, Span<char> destination, out int charsWritten);
public static char[] Write(ReadOnlySpan<char> label, ReadOnlySpan<byte> data);
}
public readonly struct PemFields
{
public Range Location { get; }
public Range Label { get; }
public Range Base64Data { get; }
public int DecodedDataLength { get; }
}
} |
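To make the `Range`-based result concrete, here is a deliberately simplified sketch of how `TryFind`/`Find` might fill in those ranges. It is not the actual implementation: `PemFieldsSketch` and `PemFinderSketch` are hypothetical names, the label characters are not checked against RFC 7468, `Base64Data` covers everything between the boundaries including line breaks, and `Find` simply throws when `TryFind` returns `false` (one of the error-handling options discussed earlier in the thread).

```csharp
using System;

// Hypothetical stand-in for the proposed PemFields (public ctor only for the sketch).
public readonly struct PemFieldsSketch
{
    public PemFieldsSketch(Range location, Range label, Range base64Data, int decodedDataLength)
    {
        Location = location;
        Label = label;
        Base64Data = base64Data;
        DecodedDataLength = decodedDataLength;
    }

    public Range Location { get; }
    public Range Label { get; }
    public Range Base64Data { get; }
    public int DecodedDataLength { get; }
}

public static class PemFinderSketch
{
    private const string PreebPrefix = "-----BEGIN ";
    private const string PostebPrefix = "-----END ";
    private const string BoundarySuffix = "-----";

    // One of the discussed policies: Find just throws when TryFind fails.
    public static PemFieldsSketch Find(ReadOnlySpan<char> pemData) =>
        TryFind(pemData, out PemFieldsSketch fields)
            ? fields
            : throw new ArgumentException("No valid PEM structure found.", nameof(pemData));

    public static bool TryFind(ReadOnlySpan<char> pemData, out PemFieldsSketch fields)
    {
        fields = default;
        int searchFrom = 0;
        while (searchFrom < pemData.Length)
        {
            int begin = pemData.Slice(searchFrom).IndexOf(PreebPrefix.AsSpan());
            if (begin < 0) return false;
            begin += searchFrom;
            searchFrom = begin + 1; // if this candidate fails, look for a later one

            int labelStart = begin + PreebPrefix.Length;
            int labelEnd = pemData.Slice(labelStart).IndexOf(BoundarySuffix.AsSpan());
            if (labelEnd < 0) return false;
            labelEnd += labelStart;
            ReadOnlySpan<char> label = pemData[labelStart..labelEnd];

            int contentStart = labelEnd + BoundarySuffix.Length;
            int contentEnd = pemData.Slice(contentStart).IndexOf(PostebPrefix.AsSpan());
            if (contentEnd < 0) return false;
            contentEnd += contentStart;

            // The END boundary must repeat the BEGIN label exactly.
            int postLabelStart = contentEnd + PostebPrefix.Length;
            ReadOnlySpan<char> rest = pemData.Slice(postLabelStart);
            if (!rest.StartsWith(label) || !rest.Slice(label.Length).StartsWith(BoundarySuffix.AsSpan()))
                continue;

            if (!TryCountDecodedBytes(pemData[contentStart..contentEnd], out int decoded))
                continue;

            int locationEnd = postLabelStart + label.Length + BoundarySuffix.Length;
            fields = new PemFieldsSketch(begin..locationEnd, labelStart..labelEnd, contentStart..contentEnd, decoded);
            return true;
        }
        return false;
    }

    // Counting rules modeled on ValidateAndCountBase64Bytes, restricted to ASCII.
    private static bool TryCountDecodedBytes(ReadOnlySpan<char> base64, out int bytes)
    {
        bytes = 0;
        int significant = 0, padding = 0;
        foreach (char c in base64)
        {
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\v' || c == '\f') continue;
            bool isData = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (c == '=')
            {
                if (++padding > 2) return false;
            }
            else if (!isData || padding > 0)
            {
                return false;
            }
            significant++;
        }
        if (significant % 4 != 0) return false;
        bytes = significant / 4 * 3 - padding;
        return true;
    }
}
```

Because the result only holds `Range` values, it can be applied later to the original `string`, array or `Memory<char>`, as in the `pemString[pemFields.Base64Data]` example in the thread.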
I was giving some hints on this StackOverflow question where the idea of a PEM reader / writer for .NET was coming up. Jeremy then posted that if such a reader were available, it would be a nice idea to make it a one-shot, RFC 7468 compliant reader using the new Span<char> framework.
So I've created a sample implementation / API proposal which parses the "lax" (i.e. permissive) PEM format. I've used the following additional goals / non-goals:
Goals:
Span;
Span;
Non-goals:
Niceties:
Parsing the base64 is easily done afterwards with the existing parser, and we may just want to skip the PEM.
Note that I didn't get my unit testing working for Core 3, so the implementation has not yet received any significant testing. First let me know what you think.
Implementation notes: this implementation searches for the post-encapsulation boundary (footer) before validating the base64. It might be possible to use a state machine instead to create a single pass parser. This adds significant complexity to the parsing though, and searching for a static string should be relatively performant already.
Code in followup comment for now, as I don't know yet where to put it.
(Edited in):
API Proposal