Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add System.BinaryData #41686

Closed
terrajobst opened this issue Sep 1, 2020 · 11 comments
Closed

Add System.BinaryData #41686

terrajobst opened this issue Sep 1, 2020 · 11 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@terrajobst
Copy link
Member

terrajobst commented Sep 1, 2020

@KrzysztofCwalina suggested to add a new type called BinaryData. The Azure SDK team found in user studies that people struggle with the various ways we represent binary data (span, memory, byte arrays, streams). They experimented with a new type BinaryData that is a struct-based wrapper around a byte[] and provides constructors, factory methods, and conversion methods to convert data from and to the various representations, including JSON serialization.

Full spec is here: dotnet/designs#155

@terrajobst terrajobst added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Sep 1, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Sep 1, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@terrajobst terrajobst added this to the 6.0.0 milestone Sep 1, 2020
@terrajobst terrajobst added area-System.Text.Json api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 1, 2020
@terrajobst
Copy link
Member Author

terrajobst commented Sep 1, 2020

Video

  • The type is very opinionated in order to achieve usability:
    • Serialization is done with System.Text.Json, with specific options
    • String encoding is always UTF8
  • We're concerned with hard coding the type to System.Text.Json, but we acknowledge that defaults are very valuable. If we need to support other serializers, then we can add an enum
    • We could decide to make the constructor that performs JSON serialization as a factory method. Combined with a hypothetical feature extension statics we could make the type more low level than serialization
    • An alternative design is to replace constructors and factories with instance members (SetBytes, SetObject) which will throw when the underlying array is non-null. That prevents mutation and uses a simple pattern var x = new BinaryData(); x.SetBytes(...);. And on top, we can use regular extension methods to split out JSON serialization.
  • This type can't live in corlib because it would depend on System.Text.Json (and whatever future serializers it needs to support). Hence, most areas in the BCL wouldn't be able to take BinaryData, but technologies above the BCL could (Microsoft.Extensions, ASP.NET Core, Azure SDK etc)
  • Thus, we shouldn't think of this as an exchange type in the BCL. We already have exchange types; the point of this type is unify them by providing conversions; the point isn't to provide yet another type. Thus, we don't see the need to, for example, provide new virtual methods on Stream.
namespace System
{
    public readonly struct BinaryData
    {
        public BinaryData(ReadOnlySpan<byte> data);
        public BinaryData(byte[] data);
        public BinaryData(object jsonSerializable, Type? type = null);
        public BinaryData(ReadOnlyMemory<byte> data);
        public BinaryData(string data);
        public static BinaryData FromBytes(ReadOnlyMemory<byte> data);
        public static BinaryData FromBytes(ReadOnlySpan<byte> data);
        public static BinaryData FromBytes(byte[] data);
        public static BinaryData FromObject<T>(T jsonSerializable, CancellationToken cancellationToken = default);
        public static Task<BinaryData> FromObjectAsync<T>(T jsonSerializable, CancellationToken cancellationToken = default);
        public static BinaryData FromStream(Stream stream);
        public static Task<BinaryData> FromStreamAsync(Stream stream, CancellationToken cancellationToken = default);
        public static BinaryData FromString(string data);
        public static implicit operator ReadOnlyMemory<byte>(BinaryData data);
        public ReadOnlyMemory<byte> ToBytes();
        public T ToObject<T>(CancellationToken cancellationToken = default);
        public ValueTask<T> ToObjectAsync<T>(CancellationToken cancellationToken = default);
        public Stream ToStream();
        [EditorBrowsable(EditorBrowsableState.Never)]
        public override bool Equals(object? obj);
        [EditorBrowsable(EditorBrowsableState.Never)]
        public override int GetHashCode();
        public override string ToString();
    }
}

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 2, 2020

I have a similar type in my own libraries and one thing I've found useful which isn't in this surface is base64 read and write capability.

@terrajobst terrajobst added api-ready-for-review API is ready for review, it is NOT ready for implementation api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 8, 2020
@terrajobst
Copy link
Member Author

terrajobst commented Sep 8, 2020

  • Spec: Initial draft of spec for System.BinaryData designs#155
  • We believe there are usability issue with instance methods as proposed in the spec.
  • We should consider exposing an ObjectSerializer abstraction, like Azure SDK has. This would solve the 3rd party extension problem in general.
  • Using an enum as a format will be harder to in static linking
  • Current POR:
    • Leave constructors
    • Leave statics
    • Consider suffixing FromObject and ToObject

@carlreinke
Copy link
Contributor

carlreinke commented Sep 9, 2020

Regarding constructors vs factory methods, perhaps future tooling could suggest FactoryAttribute-ed static methods from a type when the user types new Whatever. (FactoryAttribute is/was being considered for use in a future version of C# where the compiler would require that an attributed method return a new instance.) Or even without FactoryAttribute, tooling could suggest static methods that return the containing type when the user types new Whatever.

@dersia
Copy link

dersia commented Sep 16, 2020

I didn't see @carlreinke comment when I suggested #42314 but I think this is exactly what you were referring to

@terrajobst terrajobst added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work api-approved API was approved in API review, it can be implemented and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 20, 2020
@terrajobst
Copy link
Member Author

terrajobst commented Oct 20, 2020

Video

  • The type isn't meant to unify different representations
  • Rather, it's a helper to make it easier for users to convert data
  • Thus, it's not meant as high-performance exchange type, like span.
  • It's for "user data", that is, the contract for the API is "give form blob of binary data". It's not for protocol-like APIs where the underlying encoding is part of the API contract, e.g. a REST API.
    • This means the API is meant for things like Azure Blob Storage, Azure Queuing, but not HttpClient
  • We should consider making the type a class. It's unlike to have perf implications (given that constructing it generally requires copying)
    • This also gives us flexibility in the future, e.g. sub-classing
    • Not much value in sealing (we only give up not being able to add abstracts later)
  • The methods that perform JSON serialization/deserialization should take JSON options
    • We should make this the second parameter so we can add new serializers where type remains optional
  • We can remove the FromObjectAsync and ToObjectAsync methods, they are hold over from the serialization abstraction which had async implementations
  • We should remove the constructor and factory that takes a span, because it copies and it collides wit the ReadOnlyMemory<T>
  • We should add an implicit conversion to span (guidelines say that one shouldn't overload between them, so it seems generally fine)
  • We should consider hiding Equals(), GetHashCode()
  • The constructors that take byte[] and ReadOnlyMemory<byte> will not copy; they just wrap the payload. Mutations will be observed.
namespace System
{
    public class BinaryData
    {
        public BinaryData(byte[] data);
        public BinaryData(object jsonSerializable, JsonSerializerOptions options = default, Type? type = null);
        public BinaryData(ReadOnlyMemory<byte> data);
        public BinaryData(string data);
        public static BinaryData FromBytes(ReadOnlyMemory<byte> data);
        public static BinaryData FromBytes(byte[] data);
        public static BinaryData FromObjectAsJson<T>(T jsonSerializable, JsonSerializerOptions options = default, CancellationToken cancellationToken = default);
        public static BinaryData FromStream(Stream stream);
        public static Task<BinaryData> FromStreamAsync(Stream stream, CancellationToken cancellationToken = default);
        public static BinaryData FromString(string data);
        public static implicit operator ReadOnlyMemory<byte>(BinaryData data);
        public static implicit operator ReadOnlySpan<byte>(BinaryData data);
        public ReadOnlyMemory<byte> ToBytes();
        public T ToObjectFromJson<T>(JsonSerializerOptions options = default, CancellationToken cancellationToken = default);
        public Stream ToStream();
        [EditorBrowsable(EditorBrowsableState.Never)]
        public override bool Equals(object? obj);
        [EditorBrowsable(EditorBrowsableState.Never)]
        public override int GetHashCode();
        public override string ToString();
    }
}

@terrajobst terrajobst removed the blocking Marks issues that we want to fast track in order to unblock other important work label Oct 20, 2020
@ReubenBond
Copy link
Member

ReubenBond commented Oct 26, 2020

I believe this should support ReadOnlySequence<byte>, too:

namespace System
{
    public class BinaryData
    {
        public BinaryData(ReadOnlySequence<byte> data);
        public static BinaryData FromBytes(ReadOnlySequence<byte> data);
    }
}

@GSPP
Copy link

GSPP commented Oct 29, 2020

It seems like a layering violation that...

  • ...the namespace System refers to JSON as a concept
  • ...a type that is about representing binary data has ties to a specific serialization format (JSON)

As it stands, this type appears to be a DTO helper object hard-coded to use System.Text.Json. It would more cleanly fit into that namespace and package.

@stephentoub
Copy link
Member

@terrajobst, with https://github.com/dotnet/runtime/tree/8a52f1e948b6f22f418817ec1068f07b8dae2aa5/src/libraries/System.Memory.Data, can this be closed now?

@terrajobst
Copy link
Member Author

Yep

@dotnet dotnet locked as resolved and limited conversation to collaborators Feb 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

8 participants