Add methods to convert between hexadecimal strings and bytes #17837
Comments
|
Thanks for creating this issue @GSPP. cc: @terrajobst |
|
Replacement is not the ONLY way, you can just skip the char if you need to.
I agree that the format is useless in MOST cases. However, if you think about it a little more and look as far outside the box as possible, you will see that there is an extra index for every octet; you can then achieve some interesting things by realigning the data or using the extra indexes. |
|
@GSPP how about augmenting the BitConverter type with these new methods? I feel like that would be a better place for these since that is already a type people use for converting between various formats. |
|
There are at least a couple of other issues tracking something like this: Let's track it with a single issue (this one). It does look like there is interest in implementing this. @GSPP, @GrabYourPitchforks, @KrzysztofCwalina, are any of you interested in writing up a formal API proposal for this? |
|
If the slices API lands before this, I'd go with something like:
// all instance methods are thread-safe
public abstract class BinaryEncoding
{
    protected abstract void ToBytesOverride(ReadOnlySpan<char> sourceChars, Span<byte> destinationBytes);
    protected abstract int GetByteLengthOverride(int stringLength);
    protected abstract void ToCharsOverride(ReadOnlySpan<byte> sourceBytes, Span<char> destinationChars);
    protected abstract int GetStringLengthOverride(int byteLength);
    public byte[] ToBytes(ReadOnlySpan<char> str);
    public void ToBytes(ReadOnlySpan<char> sourceChars, Span<byte> destinationBytes);
    public int GetByteLength(int stringLength);
    public string ToString(ReadOnlySpan<byte> bytes);
    public char[] ToChars(ReadOnlySpan<byte> bytes);
    public void ToChars(ReadOnlySpan<byte> sourceBytes, Span<char> destinationChars);
    public int GetStringLength(int byteLength);
}
public class HexEncoding
{
    public static HexEncoding UpperCase { get; }
    public static HexEncoding LowerCase { get; }
}
perhaps add some additional overloads with |
|
We don't take any new API additions if they would be obsolete when |
|
Span has landed so this can be unblocked. |
|
I propose a slightly different API, instead putting this on the
namespace System
{
    // NEW methods on EXISTING type
    public static class Convert
    {
        public static byte[] FromHexString(ReadOnlySpan<char> chars); // doesn't match an existing overload of FromBase64String, but could be useful
        public static byte[] FromHexString(string s);
        public static string ToHexString(byte[] inArray, HexFormattingOptions options = default);
        public static string ToHexString(byte[] inArray, int offset, int length, HexFormattingOptions options = default);
        public static string ToHexString(ReadOnlySpan<byte> bytes, HexFormattingOptions options = default);
    }
    [Flags]
    public enum HexFormattingOptions
    {
        None = 0,
        InsertLineBreaks = 1,
        Lowercase = 2,
    }
}
It also makes sense IMO to have |
|
To give some parity with the
public static class Convert
{
    // Members of a static class must themselves be static.
    public static bool TryFromHexString(ReadOnlySpan<char> chars, Span<byte> buffer, out int bytesWritten);
    public static bool TryFromHexString(string s, Span<byte> buffer, out int bytesWritten);
    public static bool TryToHexString(ReadOnlySpan<byte> bytes, Span<char> buffer, out int charsWritten, HexFormattingOptions options = default);
}
The rationale for this is that in my typical usage I expect a hex string to be short (think a SHA digest), which tops out at 64 bytes right now. I'm not strongly tied to these, but they are worth considering during review. Other considerations, does there need to be an option for the What about continuing from line breaks? |
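The short-digest scenario described above can be sketched as follows. This is only an illustration under stated assumptions: `TryHexSketch`, `TryToHexChars` and `FormatDigest` are hypothetical names, the helper always emits lowercase, and it ignores the `HexFormattingOptions` parameter from the proposal. The idea is that the caller owns the buffer, so a digest of up to 64 bytes can be formatted into stack memory.

```csharp
using System;

// Hypothetical helper for illustration; not an existing .NET type.
public static class TryHexSketch
{
    // Writes two lowercase hex chars per input byte into a caller-supplied buffer.
    // Returns false (and writes nothing) when the buffer is too small.
    public static bool TryToHexChars(ReadOnlySpan<byte> bytes, Span<char> buffer, out int charsWritten)
    {
        charsWritten = 0;
        if (buffer.Length < bytes.Length * 2)
            return false;

        const string digits = "0123456789abcdef";
        for (int i = 0; i < bytes.Length; i++)
        {
            buffer[2 * i] = digits[bytes[i] >> 4];
            buffer[2 * i + 1] = digits[bytes[i] & 0xF];
        }
        charsWritten = bytes.Length * 2;
        return true;
    }

    // Formats a digest of at most 64 bytes using a stack-allocated scratch buffer.
    public static string FormatDigest(ReadOnlySpan<byte> digest)
    {
        Span<char> buffer = stackalloc char[128]; // 64 bytes -> 128 hex chars
        if (!TryToHexChars(digest, buffer, out int written))
            throw new ArgumentException("Digest is longer than 64 bytes.", nameof(digest));
        return buffer.Slice(0, written).ToString();
    }
}
```

With this shape the only heap allocation in `FormatDigest` is the final string; a caller that writes the characters elsewhere could skip even that.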
|
@vcsjones I envision |
|
I'd not put it on Should this API accept any other input besides spans? It could accept arrays, strings, We certainly need methods that do not throw in the error case. My original proposal missed those.
Is this a widely known standard format? Is it only for base64 or also for hex? New API proposal:
I chose the name prefix to be "Get" so that "TryGet" is fitting. This succinct naming would not be possible when placing these methods in |
|
There are many variants of this API. It can be used on the web, in cryptography, and in many other places. Some hex conversion is used in JSON for writing non-ASCII chars (\u0000 - \uFFFF). I think that .NET Core must provide optimized variants for each purpose.
Writing API
All of the above gives us more than 100 function declarations. If it also takes byte[] inputs, it needs many more. For example, just writing the API for ReadOnlySpan as source, with all kinds of separators and only lowercase, gives us 6 optimized variants
It is a bad idea to put this in existing places. I think it is better to place this API in an external netcoreapp2.1+netstandard2.1 package, something like System.Text.Hex |
|
Also, as @vcsjones mentioned for short strings, we need a different allocation mechanism to avoid dirty hacks for reducing the allocation count: (https://docs.microsoft.com/en-us/dotnet/csharp/how-to/modify-string-contents#unsafe-modifications-to-string)
For example, in this case we could allocate the string memory (twice the byte array length) and write to it directly, without going through a copy constructor. |
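One way to get this effect without unsafe code is `string.Create`, available since .NET Core 2.1, which allocates the string once and hands the callback a writable span over its characters. The sketch below is an illustration, not a proposed implementation; `StringCreateSketch` and `ToHex` are hypothetical names, and the input is taken as `byte[]` because a span cannot be passed as the state argument here.

```csharp
using System;

// Hypothetical helper for illustration; not an existing .NET type.
public static class StringCreateSketch
{
    public static string ToHex(byte[] bytes)
    {
        // The string is allocated at its final length (2 chars per byte) and
        // filled in place, so no intermediate char[] or second copy is needed.
        return string.Create(bytes.Length * 2, bytes, (chars, source) =>
        {
            const string digits = "0123456789ABCDEF";
            for (int i = 0; i < source.Length; i++)
            {
                chars[2 * i] = digits[source[i] >> 4];
                chars[2 * i + 1] = digits[source[i] & 0xF];
            }
        });
    }
}
```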
|
I consolidated the different proposals and comments made in this discussion and came up with the following API. I also provide a sample implementation (internally utilizing AVX2, SSE4.1, SSSE3, SSE2 or Scalar/Lookup with the following performance characteristics) of the API to experiment with via a separate GitHub repository and NuGet package.
Sample implementation to play around with
https://github.com/tkp1n/HexMate
Addressed proposals and comments:
Open questions:
API proposal
namespace System
{
    // NEW type
    [Flags]
    public enum HexFormattingOptions
    {
        None = 0,
        InsertLineBreaks = 1,
        Lowercase = 2
    }
    // NEW methods on EXISTING type
    public static class Convert
    {
        // Decode from chars
        public static byte[] FromHexCharArray(char[] inArray, int offset, int length) => throw null;
        public static bool TryFromHexChars(ReadOnlySpan<char> chars, Span<byte> bytes, out int bytesWritten) => throw null;
        // Decode from strings
        public static byte[] FromHexString(string s) => throw null;
        public static byte[] FromHexString(ReadOnlySpan<char> chars) => throw null;
        public static bool TryFromHexString(string s, Span<byte> bytes, out int bytesWritten) => throw null;
        // Encode to chars
        public static int ToHexCharArray(byte[] inArray, int offsetIn, int length, char[] outArray, int offsetOut, HexFormattingOptions options = default) => throw null;
        public static bool TryToHexChars(ReadOnlySpan<byte> bytes, Span<char> chars, out int charsWritten, HexFormattingOptions options = default) => throw null;
        // Encode to strings
        public static string ToHexString(byte[] inArray, HexFormattingOptions options = default) => throw null;
        public static string ToHexString(byte[] inArray, int offset, int length, HexFormattingOptions options = default) => throw null;
        public static string ToHexString(ReadOnlySpan<byte> bytes, HexFormattingOptions options = default) => throw null;
    }
}
namespace System.Buffers.Text
{
    // NEW type
    public static class Hex
    {
        // Decode
        public static OperationStatus DecodeFromUtf8(ReadOnlySpan<byte> utf8, Span<byte> bytes, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true) => throw null;
        public static OperationStatus DecodeFromUtf8InPlace(Span<byte> buffer, out int bytesWritten) => throw null;
        // Encode
        public static OperationStatus EncodeToUtf8(ReadOnlySpan<byte> bytes, Span<byte> utf8, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true) => throw null;
        public static OperationStatus EncodeToUtf8InPlace(Span<byte> buffer, int dataLength, out int bytesWritten) => throw null;
    }
}
|
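To show how the `OperationStatus`-based shape in `System.Buffers.Text` might behave, here is a scalar sketch of the encode direction. It is an assumption-laden illustration, not the sample implementation linked above: `HexUtf8Sketch` is a hypothetical name, the `isFinalBlock` parameter is left out, output is always uppercase, and the choice to encode as many whole bytes as fit before reporting `DestinationTooSmall` is my assumption, modeled on how the existing Base64 encoder reports partial progress.

```csharp
using System;
using System.Buffers;

// Hypothetical helper for illustration; not an existing .NET type.
public static class HexUtf8Sketch
{
    // Encodes as many input bytes as fit into the destination (2 output bytes each)
    // and reports how far it got, so the caller can resume with a larger buffer.
    public static OperationStatus EncodeToUtf8(
        ReadOnlySpan<byte> bytes, Span<byte> utf8, out int bytesConsumed, out int bytesWritten)
    {
        int count = Math.Min(bytes.Length, utf8.Length / 2);
        for (int i = 0; i < count; i++)
        {
            utf8[2 * i] = Digit(bytes[i] >> 4);
            utf8[2 * i + 1] = Digit(bytes[i] & 0xF);
        }
        bytesConsumed = count;
        bytesWritten = count * 2;
        return count == bytes.Length ? OperationStatus.Done : OperationStatus.DestinationTooSmall;
    }

    // Maps a nibble (0..15) to its uppercase ASCII hex digit.
    private static byte Digit(int nibble) =>
        (byte)(nibble < 10 ? '0' + nibble : 'A' + nibble - 10);
}
```

A caller would typically loop: on `DestinationTooSmall`, flush the written bytes, slice the input by `bytesConsumed`, and call again.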
|
I think we should demonstrate a need for inserting line breaks first. I am not aware of any common scenario where this would be used. If this feature is added then the line length must be variable. I cannot imagine any fixed line length to be appropriate. What are the intended use cases?
By default, the parsing routines should be maximally strict. This means no support for whitespace or any separator. When we parse data we generally want to be as strict as possible in order to find bugs in the originating systems and in order to not accidentally accept invalid data. The motto Parsing should accept any case of the letters I see no need for supporting a separator. This is a rare need. An easy workaround exists at a performance cost: |
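The strictness argument above, and the kind of separator workaround it alludes to, could look roughly like the sketch below. Everything here is an assumption for illustration: `StrictHexSketch`, `TryDecode` and `TryDecodeWithSeparator` are hypothetical names, and stripping the separator with `string.Replace` is one possible workaround, not necessarily the one the comment had in mind.

```csharp
using System;

// Hypothetical helper for illustration; not an existing .NET type.
public static class StrictHexSketch
{
    // Strict, non-throwing decode: rejects null, odd length, whitespace,
    // separators and any other non-hex character. Accepts either letter case.
    public static bool TryDecode(string hex, out byte[] bytes)
    {
        bytes = Array.Empty<byte>();
        if (hex == null || hex.Length % 2 != 0)
            return false;

        var result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            int hi = Nibble(hex[2 * i]);
            int lo = Nibble(hex[2 * i + 1]);
            if (hi < 0 || lo < 0)
                return false;
            result[i] = (byte)((hi << 4) | lo);
        }
        bytes = result;
        return true;
    }

    // Workaround for separated input: strip the separator first.
    // This costs an extra string allocation and an extra pass over the input.
    public static bool TryDecodeWithSeparator(string hex, string separator, out byte[] bytes) =>
        TryDecode(hex.Replace(separator, ""), out bytes);

    // Returns the value of a hex digit, or -1 if the char is not a hex digit.
    private static int Nibble(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }
}
```

Note that the workaround is looser than a real separator-aware parser: it also accepts input where the separators are misplaced, such as "0-001", which is part of the trade-off.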
|
@GrabYourPitchforks, it seems we should survey what internal implementations we have and all the places where we expose something similar. It seems new proposals were added after you marked it as ready for review. |
|
Really looking forward to seeing this. I benchmarked @tkp1n's implementation and it's faster than everything else in .NET Core. Compared to these methods: https://stackoverflow.com/a/624379/2828480 |
It is quite common to need to convert bytes to hex strings and back. The .NET Framework does not have an API to do that. Look at the lengths people have gone to in order to solve this problem themselves: How do you convert Byte Array to Hexadecimal String, and vice versa? Many of those solutions are bad in some way (e.g. slow, broken, or not validating).
BitConverter.ToString(byte[]) generates strings in a practically useless format: BitConverter.ToString(new byte[] { 0x00, 0x01, 0xAA, 0xFF }) => 00-01-AA-FF. I don't understand how this decision was made. This seems to be a special solution for some specific purpose (debug output?). Using this API requires Replace(str, "-", ""), which is ugly and slow. Also, this API is not discoverable because it does not have "hex" in its name.
I propose adding an API that does the following:
The use case for char[] in- and outputs is low-allocation scenarios.
The API could look like this:
A possible extension would be to support Stream and IEnumerable<byte> as sources and destinations. I would consider this out of scope.
I feel that converting hexadecimal strings is an important and frequent enough thing that this should be supported well. Of course, a "light" version of this proposal is also feasible (just support for string and byte[] with no user-provided output buffer). Not sure if anyone wants/needs lowercase hex; I don't, but others might.
This seems cheap enough to implement. Testing should be easily automatable. I don't see any particular usability problems. This is suitable for community contribution.
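For reference, the `BitConverter` behaviour and the `Replace` workaround criticized above look like this in practice. The wrapper class `BitConverterWorkaround` and its method names are hypothetical and exist only to make the snippet self-contained; the calls inside are the existing framework APIs.

```csharp
using System;

// Hypothetical wrapper for illustration; only the BitConverter/string calls are real APIs.
public static class BitConverterWorkaround
{
    // BitConverter.ToString separates octets with '-'.
    public static string Dashed(byte[] bytes) => BitConverter.ToString(bytes);

    // The workaround: a second pass that strips the dashes and allocates a second string.
    public static string Plain(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", "");

    // Lowercase output needs yet another pass and another allocation.
    public static string PlainLower(byte[] bytes) => Plain(bytes).ToLowerInvariant();
}
```

Each extra step allocates a new string, which is the cost the proposal aims to avoid.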