Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add BinaryData.ContentType #87068

Closed
KrzysztofCwalina opened this issue Jun 2, 2023 · 18 comments
Closed

[API Proposal]: Add BinaryData.ContentType #87068

KrzysztofCwalina opened this issue Jun 2, 2023 · 18 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Jun 2, 2023

Background and motivation

It's often useful to know the format of the binary payload stored in BinaryData. It's also often possible for APIs returning BinaryData to know the format. For example,

BinaryData bytes = blob.Download(); // blob metadata often specifies the format of the bytes stored in blob
var model = new Model();
model.Foo = bytes;
var json = model.ToJson(); // this serializes the model to JSON so we can send it in an HTTP request.

... when the model is being serialized (to JSON) we would like to know if the bytes represent JSON or unknown binary. If JSON, we would serialize it as a JSON value. If unknown binary, we would Base64 encode it and serialize it as a string.

API Proposal

Add the following members to BinaryData

public class BinaryData {

    // new members
    public string ContentType { get; } // MIME type, e.g. "application/json"
    public BinaryData WithContentType(string contentType);

    // new overloads of existing constructors adding contentType parameter
    public BinaryData(ReadOnlyMemory<byte> data, string contentType);
    public BinaryData(byte[] data, string contentType);

    // new overloads of existing factory methods adding contentType parameter
    public static BinaryData FromBytes(byte[] data, string contentType);
    public static BinaryData FromStream(Stream stream,  string contentType);
    public static Task<BinaryData> FromStreamAsync(Stream stream,  string contentType);
}

Existing constructors and factory methods would initialize the ContentType property as follows

  1. FromBytes => "application/octet-stream"
  2. FromStream=> "application/octet-stream"
  3. FromObjectAsJson=> "application/json"
  4. FromString=> "text/plain"
  5. ctor(byte[]) => "application/octet-stream"
  6. ctor(ReadOnlyMemory) => "application/octet-stream"
  7. ctor(object) => "application/json"
  8. ctor(string) => "text/plain"

API Usage

BinaryData bytes = blob.Download(); // blob metadata often specifies the format of the bytes stored in blob
var model = new Model();
model.Foo = bytes;
var json = model.ToJson(); // this serializes the model to JSON so we can send it in an HTTP request.
BinaryData bytes = BinaryData.FromStream(stream, MediaTypeNames.Text.Plain);
blob.Upload(bytes); // sets the blob content type to text/plain

Alternative Designs

No response

Risks

No response

@KrzysztofCwalina KrzysztofCwalina added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 2, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 2, 2023
@ghost
Copy link

ghost commented Jun 2, 2023

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

It's often useful to know the format of the binary payload stored in BinaryData. It's also often possible for APIs returning BinaryData to know the format. For example,

BinaryData bytes = blob.Download(); // blob metadata often specifies the format of the bytes stored in blob
var model = new Model();
model.Foo = bytes;
var json = model.ToJson(); // this serializes the model to JSON so we can send it in an HTTP request.

... when the model is being serialized (to JSON) we would like to know if the bytes represent JSON or unknown binary. If JSON, we would serialize it as a JSON value. If unknown binary, we would Base64 encode it and serialize it as a string.

API Proposal

Add the following members to BinaryData

public class BinaryData {

    // new members
    public string ContentType { get; } // MIME type, e.g. "application/json"
    public BinaryData WithContentType(string contentType);

    // new overloads of existing constructors adding contentType parameter
    public BinaryData(ReadOnlyMemory<byte> data, string contentType);
    public BinaryData(byte[] data, string contentType);

    // new overloads of existing factory methods adding contentType parameter
    public static BinaryData FromBytes(byte[] data, string contentType);
    public static BinaryData FromStream(Stream stream,  string contentType);
    public static Task<BinaryData> FromStreamAsync(Stream stream,  string contentType);
}

Existing constructors and factory methods would initialize the ContentType property as follows

  1. FromBytes => "application/octet-stream"
  2. FromStream=> "application/octet-stream"
  3. FromObjectAsJson=> "application/json"
  4. FromString=> "text/plain"
  5. ctor(byte[]) => "application/octet-stream"
  6. ctor(ReadOnlyMemory) => "application/octet-stream"
  7. ctor(object) => "application/json"
  8. ctor(string) => "text/plain"

API Usage

BinaryData bytes = blob.Download(); // blob metadata often specifies the format of the bytes stored in blob
var model = new Model();
model.Foo = bytes;
var json = model.ToJson(); // this serializes the model to JSON so we can send it in an HTTP request.
BinaryData bytes = BinaryData.FromStream(stream, MediaTypeNames.Text.Plain);
blob.Upload(bytes); // sets the blob content type to text/plain

Alternative Designs

No response

Risks

No response

Author: KrzysztofCwalina
Assignees: -
Labels:

api-suggestion, area-System.Text.Json

Milestone: -

@terrajobst terrajobst added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Jun 2, 2023
@terrajobst terrajobst added this to the 8.0.0 milestone Jun 2, 2023
@ghost
Copy link

ghost commented Jun 3, 2023

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

It's often useful to know the format of the binary payload stored in BinaryData. It's also often possible for APIs returning BinaryData to know the format. For example,

BinaryData bytes = blob.Download(); // blob metadata often specifies the format of the bytes stored in blob
var model = new Model();
model.Foo = bytes;
var json = model.ToJson(); // this serializes the model to JSON so we can send it in an HTTP request.

... when the model is being serialized (to JSON) we would like to know if the bytes represent JSON or unknown binary. If JSON, we would serialize it as a JSON value. If unknown binary, we would Base64 encode it and serialize it as a string.

API Proposal

Add the following members to BinaryData

public class BinaryData {

    // new members
    public string ContentType { get; } // MIME type, e.g. "application/json"
    public BinaryData WithContentType(string contentType);

    // new overloads of existing constructors adding contentType parameter
    public BinaryData(ReadOnlyMemory<byte> data, string contentType);
    public BinaryData(byte[] data, string contentType);

    // new overloads of existing factory methods adding contentType parameter
    public static BinaryData FromBytes(byte[] data, string contentType);
    public static BinaryData FromStream(Stream stream,  string contentType);
    public static Task<BinaryData> FromStreamAsync(Stream stream,  string contentType);
}

Existing constructors and factory methods would initialize the ContentType property as follows

  1. FromBytes => "application/octet-stream"
  2. FromStream=> "application/octet-stream"
  3. FromObjectAsJson=> "application/json"
  4. FromString=> "text/plain"
  5. ctor(byte[]) => "application/octet-stream"
  6. ctor(ReadOnlyMemory) => "application/octet-stream"
  7. ctor(object) => "application/json"
  8. ctor(string) => "text/plain"

API Usage

BinaryData bytes = blob.Download(); // blob metadata often specifies the format of the bytes stored in blob
var model = new Model();
model.Foo = bytes;
var json = model.ToJson(); // this serializes the model to JSON so we can send it in an HTTP request.
BinaryData bytes = BinaryData.FromStream(stream, MediaTypeNames.Text.Plain);
blob.Upload(bytes); // sets the blob content type to text/plain

Alternative Designs

No response

Risks

No response

Author: KrzysztofCwalina
Assignees: -
Labels:

area-System.Memory, area-System.Text.Json, api-ready-for-review

Milestone: 8.0.0

@terrajobst terrajobst added the blocking Marks issues that we want to fast track in order to unblock other important work label Jun 6, 2023
@bartonjs
Copy link
Member

bartonjs commented Jun 8, 2023

Video

  • We feel like the content-type defaulting for the existing byte-based inputs is complicating things, especially since the new property cannot be serialized properly. So the property should be marked nullable, and will default to null except for FromObjectAsJson (application/json) and FromString (text/plain)
  • Added FromString with contentType to square off the scenario of the caller already receiving JSON as a String and being able to do both parts in one call.
  • Based on usage, this seems to be more like the "Media Type" value from HTTP, vs the full "Content Type", so we've renamed it accordingly.
public class BinaryData {

    // new members
    public string? MediaType { get; } // MIME type, e.g. "application/json"
    public BinaryData WithMediaType(string? mediaType);

    // new overloads of existing constructors adding mediaType parameter
    public BinaryData(ReadOnlyMemory<byte> data, string? mediaType);
    public BinaryData(byte[] data, string? mediaType);
    public BinaryData(string data, string? mediaType);

    // new overloads of existing factory methods adding mediaType parameter
    public static BinaryData FromBytes(byte[] data, string? mediaType);
    public static BinaryData FromStream(Stream stream, string? mediaType);
    public static Task<BinaryData> FromStreamAsync(Stream stream, string? mediaType);
    public static BinaryData FromString(string data, string? mediaType);
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 8, 2023
@dersia
Copy link

dersia commented Jun 8, 2023

Making this nullable is the right decision. When we are talking about blob storage, there is also another behavior to keep in mind, when the blob storage is use as static website or the container is made pu lic and blobs are accessed as web resource i.e. using a browser, then the Metadata is used to set the content type of the response. And as far as I know the blob storage when used as static web site does some inferring when the content type is not set when it is returning the blob. So having the option to not set it for blob storage makes sense.

That being said, right now it is at least two calls when uploading blobs to set the content type, so having an option to set it while uploading makes it much easier.

@AlexRadch
Copy link
Contributor

AlexRadch commented Jun 17, 2023

I think you forget to add the FromBytes method with ReadOnlyMemory and mediaType parameters. The new constructor is added but the new method FromBytes is not added.

    // new overloads of existing constructors adding mediaType parameter
    public BinaryData(ReadOnlyMemory<byte> data, string? mediaType); // It is added
    
    // new overloads of existing factory methods adding mediaType parameter
    public static BinaryData FromBytes(ReadOnlyMemory<byte> data, string? mediaType); // Should it be added also?

@AlexRadch
Copy link
Contributor

I want to implement this API. Can I be assigned?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 18, 2023
@JoshLove-msft
Copy link
Contributor

  • Based on usage, this seems to be more like the "Media Type" value from HTTP, vs the full "Content Type", so we've renamed it accordingly.

I don't think there is really a distinction between Media Type and Content Type, other than the ContentType relating to the Content-Type header. The ContentType contains Media Type values. Even the spec provides examples where the charset parameter is specified for "MediaTypes" - https://www.rfc-editor.org/rfc/rfc6838#section-4.2.1

@JoshLove-msft
Copy link
Contributor

/cc @bartonjs @GrabYourPitchforks wondering if we can reconsider naming the property to ContentType instead of MediaType.

@AlexRadch
Copy link
Contributor

/cc @bartonjs @GrabYourPitchforks wondering if we can reconsider naming the property to ContentType instead of MediaType.

We have a complex ContentType class that contains MediaType property inside https://learn.microsoft.com/dotnet/api/system.net.mime.contenttype .

So we have only the MediaType property here, not the ContentType class.

@JoshLove-msft
Copy link
Contributor

Interesting, I don't see any mention in the spec that MediaType should not contain any parameters. https://www.rfc-editor.org/rfc/rfc6838#section-4.2.1

@AlexRadch
Copy link
Contributor

AlexRadch commented Jun 23, 2023

Interesting, I don't see any mention in the spec that MediaType should not contain any parameters. https://www.rfc-editor.org/rfc/rfc6838#section-4.2.1

ContentType class contains not only MediaType but also can contain CharSet for content encoding, Name for a file name that downloaded/uploaded, Boundary for sections in email, and Parameters for other parameters.

@AlexRadch
Copy link
Contributor

My question is:

Should BinaryData be extended only by simple MediaType or should it be extended by full content description with ContentType class (https://learn.microsoft.com/dotnet/api/system.net.mime.contenttype)?

I think the issue should be reviewed to add the ContentType class not only MediaType.

/cc @bartonjs @GrabYourPitchforks

@Neme12
Copy link

Neme12 commented Jul 1, 2023

Regarding serialization, the serializer could use the data URL format to also store the content type in addition to the data: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs
I think this would be a great way to serialize BinaryData if the MediaType isn't null. It's a universal, standard format for storing binary data in a string. The deserializer could check if the string starts with data: - if it does, deserialize the media type as well, otherwise treat it as regular base64. This would be a valid way to do things because base64 itself cannot contain a semicolon.

Of course, it could always use the data URL format from now on when serializing, even when the MediaType is null, but the deserializer still needs to be able to recognize the old format.

It would be great to implement this to ensure that the type remains roundtrippable when serializing.

@Neme12
Copy link

Neme12 commented Jul 1, 2023

I also feel like it should store the full content type (using either System.Net.Mime.ContentType or System.Net.Http.Headers.MediaTypeHeaderValue - btw why are there two types for the same thing? 😄), not just the mime type, so that it could represent things like "this is a deflate compressed tiff". I understand that compression is usually a concern of the transport layer, but still - in case someone has BinaryData with this type of content in it, storing just the fact that the content type is tiff is very misleading. If someone sees that content type, they would assume that the data is a valid tiff format, but it's not, and interpreting it as such by loading the image would fail. Even though it might not be too common, stuff like compression still affects the format the data is in and there should be the ability to store that info in addition to the mime type. Describing such data with just the mime type itself is just plain wrong and misleading for any consumers of this data. Anyone who validates such data would wrongly assume that it's a valid tiff/(anything else) format when it's not.

@tannergooding
Copy link
Member

@KrzysztofCwalina, @stephentoub. Is this critical for .NET 8, or is it fine to slip to .NET 9?

@dersia
Copy link

dersia commented Jul 27, 2023

For us it would be very useful if it is in net8

@KrzysztofCwalina
Copy link
Member Author

KrzysztofCwalina commented Jul 27, 2023

Yes, we have projects that need that asap. If you are running out of time, we would be willing to put some resources to help finish the PR

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 1, 2023
@tannergooding
Copy link
Member

Resolved in #89605

@ghost ghost locked as resolved and limited conversation to collaborators Sep 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants