Skip to content

Add a @pbIndex annotation and use it for reading and writing using the correct field indices#21

Merged
cb372 merged 11 commits intomasterfrom
field-indices
Dec 19, 2019
Merged

Add a @pbIndex annotation and use it for reading and writing using the correct field indices#21
cb372 merged 11 commits intomasterfrom
field-indices

Conversation

@cb372
Copy link
Copy Markdown
Member

@cb372 cb372 commented Dec 13, 2019

Add an annotation called @pbIndex for annotating message fields with their protobuf indices (field numbers):

case class MyMessage(
  @pbIndex(1) a: Int,
  @pbIndex(5) b: String
)

This annotation is used when writing to protobuf, to write the fields with the correct indices, and when reading from protobuf, to look for the correct index for each field. The previous behaviour was to assume field indices started at 1 and increased sequentially.

Note: This is a breaking change in behaviour for pbdirect, as users now must annotate every field of their message. (We use some shapeless voodoo to verify at compile time that every field is annotated.) Skeuomorph will need to be updated to generate these annotations when it generates code for Mu. I changed my mind and made the annotation optional. For unannotated case classes, the existing behaviour is preserved.

This is the first step towards making pbdirect a "proper" spec-compliant protobuf serializer, with the ability to handle default scalar values, oneof fields, fixed-width integer fields, packed repeated fields, etc.

Also in this PR:

  • refactored the writer code, splitting "message writers" and "field writers" into separate traits
  • refactored the writer tests accordingly
  • added some property-based tests for the roundtrip of a complex message to protobuf and back, to test for consistency between our reader and writer

If people are happy with the annotation approach, in later PRs we can add similar annotations on fields to support:

Other approaches I considered before deciding on using an annotation:

  • tagging the type with shapeless Nat, e.g. case class Msg(a: Int @@ _1, b: String @@ _2)
    • Nats only go up to the magic number 22
    • quite intrusive, e.g. could interfere with Avro serialization in Mu
  • using a type annotation instead, e.g. case class Msg(a: Int @pbIndex(1), b: String @pbIndex(2))
    • should work quite nicely with use of shapeless Coproduct to encode oneof fields, e.g. case class Msg(a: Int @pbIndex(1) :+: String @pbIndex(2) :+: CNil)
    • support for working with type annotations exists in shapeless master but hasn't been released (shapeless hasn't been released for 2 years!)

@cb372 cb372 changed the title [WIP] Add a @pbIndex annotation and use it for writing Add a @pbIndex annotation and use it for reading and writing using the correct field indices Dec 17, 2019
@cb372 cb372 marked this pull request as ready for review December 17, 2019 18:10
@@ -0,0 +1,29 @@
/*
* Copyright (c) 2019 Beyond the lines
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juanpedromoreno What do you think we should do about the copyright headers in this project? I am not a lawyer but it seems strange to add this copyright to newly added files.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds reasonable to me.

We can tweak the sbt-headers accordingly to make the copyright the more accurate possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking into the MIT licence a little.

  • Apparently there is no need for a copyright header in every file. The Apache Foundation recommend it if you are using Apache 2.0, but there is no need for MIT.
  • There is no need to list changes. The example @fedefernandez linked above is from a project using Apache 2.0, which has a requirement to list changes.
  • The only requirement is a LICENSE file containing a copyright notice and the MIT licence text.

So the simplest solution could be:

  • remove the source file headers
  • add a note to the LICENSE file saying 47 Degrees forked the project in 2019 and we claim copyright for any code added or changed since the fork.

What do you think? @juanpedromoreno @fedefernandez

References:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me too 👍

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll do that in a separate PR after I've merged this.

(FieldIndex(annotation.first :: annotation.more.toList), value)
}
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the most interesting part of the PR.

Given some case class instance, e.g.

case class MyMessage(@pbIndex(1) a: Int, @pbIndex(2) b: String)
val msg = MyMessage(123, "hello")

we summon the generic representation:

123 :: "hello" :: HNil

and the list of @pbIndex annotations on the fields:

Some(pbIndex(1)) :: Some(pbIndex(2)) :: HNil

We validate that every field is annotated and zip the two HLists together using the zipWithFieldIndex polymorphic function, giving us an HList of tuples:

(FieldIndex(1), 123) :: (FieldIndex(2), "hello") :: HNil

Then we use consWriter and hnilWriter to inductively resolve the correct PBFieldWriter instance for each field and write the fields to the output stream.

case Some(annotation) => FieldIndex(annotation.first :: annotation.more.toList)
}
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to the writer, but in reverse.

We use the annotations to build an HList of FieldIndexes:

FieldIndex(1) :: FieldIndex(2) :: HNil

and give that HList to the PBProductReader, along with the input bytes.

consProductReader makes use of the head of the HList of FieldIndexes to see what index it should be looking for, then passes the tail of the HList in the recursive step.

Add an annotation called `@pbIndex` for annotating message fields with
their protobuf indices (field numbers). The idea is that skeuomorph will
generate these annotations when it generates code for Mu.

Started work on making use of these annotations to write fields with the
correct field numbers when encoding messages to protobuf.

Note: This is a breaking change in behaviour for pbdirect, as you now
*have to* annotate every field of your message. (We use some shapeless
voodoo to verify at compile time that every field is annotated.)

This is the first step towards making pbdirect a "proper" spec-compliant
protobuf serializer, with the ability to handle default scalar values,
`oneof` fields, decoding messages without relying on field order, etc.

Still to do:

* finish implementation of protobuf encoding
* decide what to do about annotating `oneof` fields (coproducts), as
they have multiple field numbers
* implement protobuf decoding (a bit more complicated)
* write more tests
The existing instance for Functor is still there, but these are added
for convenience so people don't have to remember to add the appropriate
cats imports.
This is to support encoding of shapeless Coproducts as protobuf 'oneof'
fields in future. (That functionality is not implemented yet.)
This at least gives us some confidence that the library is consistent
with itself, even if it's not yet compliant with the protobuf spec.
Copy link
Copy Markdown

@tzimisce012 tzimisce012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment thread src/main/scala/pbdirect/PBFieldWriter.scala
Copy link
Copy Markdown
Member

@juanpedromoreno juanpedromoreno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks @cb372 :)

I'd suggest adding some docs to the README file, but code here LGTM

@@ -0,0 +1,29 @@
/*
* Copyright (c) 2019 Beyond the lines
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds reasonable to me.

We can tweak the sbt-headers accordingly to make the copyright the more accurate possible.

@cb372
Copy link
Copy Markdown
Member Author

cb372 commented Dec 18, 2019

I'd suggest adding some docs to the README file

Good point, will do.

cb372 added a commit to higherkindness/skeuomorph that referenced this pull request Dec 18, 2019
See 47degrees/pbdirect#21

Also tidy up the generated code slightly by putting each field on a
separate line.
Also fixed incorrect byte array contents in example
Copy link
Copy Markdown

@BenFradet BenFradet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I just have a question: thinking from the user's perspective, could we have an all or nothing in terms of specifying @pbIndex? I.e. if none are specified, they are inferred from the generic representation.

@cb372
Copy link
Copy Markdown
Member Author

cb372 commented Dec 19, 2019

@BenFradet I agree, making the annotation mandatory is probably too disruptive for users. I've made it optional in 22380f3.

@cb372 cb372 merged commit cbd378a into master Dec 19, 2019
@cb372 cb372 deleted the field-indices branch December 19, 2019 11:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants