Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a stateful CBOR writer #32047

Closed
8 tasks done
bartonjs opened this issue Feb 10, 2020 · 7 comments
Closed
8 tasks done

Create a stateful CBOR writer #32047

bartonjs opened this issue Feb 10, 2020 · 7 comments

Comments

@bartonjs
Copy link
Member

bartonjs commented Feb 10, 2020

Create a reader for CBOR (Concise Binary Object Representation), as defined in IETF RFC 7049.

  • Initial implementation should be done directly into a test project
  • Assume a new, netstandard2.0-based package (hedged bet)
  • After the reader and writer are stable and feature complete, schedule an API review
  • After API review, and associated changes, split things into ref/src/test with the declared package.

Rough project plan (approximately a PR for each stage, and 3-5 days of work)

  • Support writing "unsigned" (positive) integers, negative integers, definite-length byte strings, definite-length text strings, as well as common structure (e.g. producing output).
  • Support writing arrays (definite length only).
  • Support writing maps (definite length only), in caller order.
    • Note that maps are like Dictionary<object, object>... key types are heterogeneous.
  • Support writing maps (definite length only) in canonical order.
    • Probably involves giving the writer a canonicalization mode
  • Support writing major type 7 (float: half, single, double; boolean: true, false; null; undefined)
  • Support writing single values, raw, with "single value" validation.
  • Support writing tags
  • Support important tagged types
    • The two DateTimeOffset types
    • Maybe the two BigInteger types
@bartonjs bartonjs added this to the 5.0 milestone Feb 10, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Feb 10, 2020
@Tornhoof
Copy link
Contributor

May I suggest support for writing Streams?
Use Case:
Elasticsearch supports something called ingest-attachment (https://www.elastic.co/guide/en/elasticsearch/plugins/master/ingest-attachment.html), where you upload binary data (and some metadata) via a rest api to an ingest pipeline and it processes the data. The normal way to do that is via JSON and b64 encoded byte array. The downside of that approach is, that b64 encoding is not really useful if we're talking about 100s of MB of data for ingestion (think large documents for a full text search). So they internally support sending CBOR too, where binary data is just written as is to the cbor stream (+ some header), making this the more efficient way.

Caveat:
To get that working you probably need to design your writer with async support, e.g. via ValueTasks, so you do not need to buffer the streams or other large values (strings).
My own experiments at https://github.com/Tornhoof/PipeCborExperiments/tree/master/src worked well enough for the arbitrarily sized streams approach, I use a highly stripped down version of such a Cborwriter (only map support for strings and streams) for the ES use case above.

@bartonjs bartonjs removed the untriaged New issue has not been triaged by the area owner label Feb 10, 2020
@bartonjs
Copy link
Member Author

Supporting writing directly to streams requires one of:

  • Requiring maps and arrays be prebuilt
  • Supporting indefinite writing
  • Only permitting seekable-writable Streams.

It also, as you observe, requires making async versions of everything.

It would possibly make sense to have something like a CborStreamWriter (assuming the simple writer is called CborWriter) which does write directly to a Stream instead of an internal buffer.

Alternatively and/or additionally, a static CborEncoder type could have static methods that write pre-sized data.

public partial static class CborEncoder
{
    public static bool TryEncodeNumber(int value, Span<byte> destination, out int bytesWritten) => throw null;
    public static void EncodeNumber(int value, Stream destination);
}

The problem that I see with that, is that there isn't any type safety on maps or arrays... e.g. it either says it accepts object[] or T[] with unconstrained T; but it would fail at runtime on EncodeArray<Guid>(...)... this feature is emphatically not a serializer.

The design in my head, FWIW, would be that writing something like https://tools.ietf.org/html/rfc8152#appendix-C.1.1

   98(
     [
       / protected / h'',
       / unprotected / {},
       / payload / 'This is the content.',
       / signatures / [
         [
           / protected / h'a10126' / {
               \ alg \ 1:-7 \ ECDSA 256 \
             } / ,
           / unprotected / {
             / kid / 4:'11'
           },
           / signature / h'e2aeafd40d69d19dfe6e52077c5d7ff4e408282cbefb
   5d06cbf414af2e19d982ac45ac98b8544c908b4507de1e90b717c3d34816fe926a2b
   98f53afd2fa0f30a'
         ]
       ]
     ]
   )

could be done via

byte[] payload = Encoding.UTF8.GetBytes("This is the content.");
CoseKey key = ...;

using (CborWriter writer = new CborWriter(CborCanonicalization.FIDO2))
{
    writer.WriteTag(98);
    using (writer.StartArray())
    {
        // protected
        writer.WriteByteString(ReadOnlySpan<byte>.Empty);
        // unprotected
        writer.WriteStartMap();
        writer.WriteEndMap();
        // payload
        writer.WriteByteString(payload);

        // signatures
        using (writer.StartArray())
        using (writer.StartArray())
        {
            // protected
            using (writer.StartByteStringWrapper())
            using (writer.StartMap())
            {
                writer.WriteNumber((int)CoseProperties.AlgorithmId);
                writer.WriteNumber((int)key.AlgorithmId);
            }

            // unprotected
            using (writer.StartMap())
            {
                writer.WriteNumber((int)CoseProperties.KeyIdentifier);
                writer.WriteByteString(key.Id);
            }

            // signature
            writer.WriteByteString(key.Sign(payload));
        }
    }
}

(Actual code would be broken up into more helper routines, each one writing down their part)

Using a definite encoding and a non-seekable stream; that can only write the tag; it has to hold back the entirety of the outer array until it knows how long it is. Even with a seekable stream, using the canonical CBOR "write the length in the fewest bytes possible" it would have to hold everything in case it mis-predicted the length-length.

Indefinite encoding would work; but that's not compatible with protocols that specify canonical CBOR (such as FIDO2 / WebAuthn). Plus, I completely skipped over the detail of needing to sort the map by encoded key values (RFC 7049 3.9 rule 3).

So... Streams are hard. But, it's good to know that someone wants them... it's an interesting thought experiment.

@eiriktsarpalis eiriktsarpalis self-assigned this Feb 20, 2020
@eiriktsarpalis
Copy link
Member

To consider: adding support for System.Decimal. The CBOR spec already supports decimal fractionals.

@charlesroddie
Copy link

How will this project compare to https://github.com/peteroupc/CBOR?

@charlesroddie
Copy link

charlesroddie commented Mar 23, 2020

Also area-System.Security is the wrong tag for CBOR-related issues/PRs. Should be area-Serialization. With System.CBOR as the namespace to correspond to System.XML and System.Text.Json. COSE is just one application.

@bartonjs
Copy link
Member Author

@charlesroddie Currently it's only "funded" via System.Security; and the namespace is something we're working on.

@eiriktsarpalis
Copy link
Member

Closing this issue as the implementation aspect of this work is done.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

No branches or pull requests

6 participants