
einride/protobuf-avro-go


Protobuf + Avro

Functionality for converting between Protocol Buffers and Avro. This can, for example, be used to bulk-load protobuf messages into BigQuery.

Examples

Examples use the following protobuf message:

package google.example.library.v1;

message Book {
  string name = 1;
  string author = 2;
  string title = 3;
  bool read = 4;
}

protoavro.InferSchema

Infers an Avro schema from the descriptor of an arbitrary protobuf message.

func ExampleInferSchema() {
	msg := &library.Book{}
	schema, err := protoavro.InferSchema(msg.ProtoReflect().Descriptor())
	if err != nil {
		panic(err)
	}
	// The message itself and each of its fields are wrapped in avro.Nullable.
	expected := avro.Nullable(avro.Record{
		Type:      avro.RecordType,
		Name:      "Book",
		Namespace: "google.example.library.v1",
		Fields: []avro.Field{
			{Name: "name", Type: avro.Nullable(avro.String())},
			{Name: "author", Type: avro.Nullable(avro.String())},
			{Name: "title", Type: avro.Nullable(avro.String())},
			{Name: "read", Type: avro.Nullable(avro.Boolean())},
		},
	})
	fmt.Println(cmp.Equal(expected, schema))
	// Output: true
}
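For readers who want to see what `expected` looks like as an Avro JSON schema, the sketch below builds the equivalent document with `encoding/json` only. It assumes that `avro.Nullable(t)` corresponds to the Avro union `["null", t]` with the null branch first; the types `field` and `record` and the functions `nullable` and `bookSchemaJSON` are hypothetical helpers for illustration, not part of protoavro.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// field mirrors one entry of an Avro record's "fields" array.
type field struct {
	Name string `json:"name"`
	Type any    `json:"type"`
}

// record mirrors an Avro record schema. A struct (rather than a map) keeps
// the JSON keys in a fixed, readable order.
type record struct {
	Type      string  `json:"type"`
	Name      string  `json:"name"`
	Namespace string  `json:"namespace"`
	Fields    []field `json:"fields"`
}

// nullable wraps a schema in an Avro union whose first branch is "null"
// (an assumption about how avro.Nullable is rendered).
func nullable(t any) []any { return []any{"null", t} }

// bookSchemaJSON renders the schema that the InferSchema example expects.
func bookSchemaJSON() (string, error) {
	schema := nullable(record{
		Type:      "record",
		Name:      "Book",
		Namespace: "google.example.library.v1",
		Fields: []field{
			{Name: "name", Type: nullable("string")},
			{Name: "author", Type: nullable("string")},
			{Name: "title", Type: nullable("string")},
			{Name: "read", Type: nullable("boolean")},
		},
	})
	b, err := json.Marshal(schema)
	return string(b), err
}

func main() {
	s, err := bookSchemaJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Note how the namespace comes from the protobuf package and every field, not only the record, is a union with null.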

protoavro.Marshaler

Writes protobuf messages to an Avro Object Container File (OCF).

func ExampleMarshaler() {
	var msg library.Book
	var b bytes.Buffer
	// The Avro schema is derived from the Book descriptor; the encoded file is written to b.
	marshaller, err := protoavro.NewMarshaler(msg.ProtoReflect().Descriptor(), &b)
	if err != nil {
		panic(err)
	}
	if err := marshaller.Marshal(
		&library.Book{
			Name:   "shelves/1/books/1",
			Title:  "Harry Potter",
			Author: "J. K. Rowling",
		},
		&library.Book{
			Name:   "shelves/1/books/2",
			Title:  "Lord of the Rings",
			Author: "J. R. R. Tolkien",
		},
	); err != nil {
		panic(err)
	}
}
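After `ExampleMarshaler` runs, `b` should hold an Object Container File. A cheap sanity check, taken from the Avro specification rather than from this library, is that every OCF starts with the four magic bytes `'O'`, `'b'`, `'j'`, `1`. The names `ocfMagic` and `looksLikeOCF` are hypothetical, and the sketch assumes the Marshaler has written the file header by the time you inspect the buffer.

```go
package main

import (
	"bytes"
	"fmt"
)

// ocfMagic is the 4-byte header defined by the Avro specification for
// Object Container Files: ASCII 'O', 'b', 'j' followed by the version byte 1.
var ocfMagic = []byte{'O', 'b', 'j', 1}

// looksLikeOCF reports whether data begins with the Avro OCF magic bytes.
// It does not validate the rest of the header or any data blocks.
func looksLikeOCF(data []byte) bool {
	return bytes.HasPrefix(data, ocfMagic)
}

func main() {
	fmt.Println(looksLikeOCF([]byte("Obj\x01rest-of-file"))) // true
	fmt.Println(looksLikeOCF([]byte(`{"json": true}`)))      // false
}
```

In the example above, `looksLikeOCF(b.Bytes())` would be the corresponding call.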

protoavro.Unmarshaler

Reads protobuf messages from an Avro Object Container File.

func ExampleUnmarshaler() {
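	// reader must supply an Object Container File, such as one written by
	// protoavro.Marshaler. It is left nil here only to keep the example short.
	var reader io.Reader
	unmarshaller, err := protoavro.NewUnmarshaler(reader)
	if err != nil {
		panic(err)
	}
	// Scan advances to the next message in the file; Unmarshal decodes it.
	for unmarshaller.Scan() {
		var msg library.Book
		if err := unmarshaller.Unmarshal(&msg); err != nil {
			panic(err)
		}
		// Use msg here.
	}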
	}
}

Mapping

Messages are mapped to nullable records in Avro, and all of their fields are nullable. Field names keep the same casing as in the protobuf descriptor.

Oneofs are mapped to nullable fields in Avro, of which at most one is set at a time.
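Because a oneof becomes a group of ordinary nullable fields, the "at most one is set" rule is an invariant of the data rather than something the Avro schema enforces. The sketch below checks that invariant on a decoded record. It assumes a generic representation of the record as `map[string]any`, with `nil` standing for the null branch; that representation, the oneof members `isbn` and `url`, and the functions `setBranches` and `validOneof` are all hypothetical.

```go
package main

import "fmt"

// setBranches returns the names of the oneof member fields that are non-null
// in a decoded record. A nil value stands for the Avro null branch.
func setBranches(record map[string]any, members []string) []string {
	var set []string
	for _, name := range members {
		if v, ok := record[name]; ok && v != nil {
			set = append(set, name)
		}
	}
	return set
}

// validOneof reports whether at most one oneof member is set.
func validOneof(record map[string]any, members []string) bool {
	return len(setBranches(record, members)) <= 1
}

func main() {
	// Hypothetical oneof "source" with members "isbn" and "url".
	members := []string{"isbn", "url"}
	ok := map[string]any{"isbn": "isbn-1", "url": nil}
	bad := map[string]any{"isbn": "isbn-1", "url": "url-1"}
	fmt.Println(validOneof(ok, members), validOneof(bad, members)) // true false
}
```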

Maps are mapped to an array of records with two fields, key and value. The order of map entries is undefined.
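The sketch below shows the shape of that mapping for a hypothetical `map<string, int64>` field: each entry becomes a key/value record, and reading the array back rebuilds the map. The type `mapEntry` and the functions `toEntries` and `fromEntries` are illustrative only, not protoavro internals. Go's own map iteration order is randomized, which mirrors the undefined entry order; the sketch sorts by key only so its output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// mapEntry mirrors the two-field Avro record used for one map entry.
type mapEntry struct {
	Key   string
	Value int64
}

// toEntries flattens a map into key/value records, sorted by key so the
// result is reproducible (the mapping itself guarantees no order).
func toEntries(m map[string]int64) []mapEntry {
	entries := make([]mapEntry, 0, len(m))
	for k, v := range m {
		entries = append(entries, mapEntry{Key: k, Value: v})
	}
	sort.Slice(entries, func(i, j int) bool { return entries[i].Key < entries[j].Key })
	return entries
}

// fromEntries rebuilds the map; if a key repeats, the last entry wins.
func fromEntries(entries []mapEntry) map[string]int64 {
	m := make(map[string]int64, len(entries))
	for _, e := range entries {
		m[e.Key] = e.Value
	}
	return m
}

func main() {
	counts := map[string]int64{"b": 2, "a": 1}
	entries := toEntries(counts)
	fmt.Println(entries)                   // [{a 1} {b 2}]
	fmt.Println(len(fromEntries(entries))) // 2
}
```

Consumers of the Avro data (for example BigQuery queries) should therefore not rely on entry position.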

Enums are mapped to Avro enums, whose symbols are strings.
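As an illustration, the sketch below shows an Avro enum schema for a hypothetical protobuf enum `Genre { GENRE_UNSPECIFIED = 0; FANTASY = 1; }`, assuming the symbols are the protobuf enum value names. It also shows the lookup from a protobuf enum number to its symbol. The type `enumSchema` and the function `symbolFor` are hypothetical; the point to notice is that a number with no declared name (possible with open proto3 enums) has no symbol to map to.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// enumSchema mirrors an Avro enum schema.
type enumSchema struct {
	Type    string   `json:"type"`
	Name    string   `json:"name"`
	Symbols []string `json:"symbols"`
}

// symbolFor maps a protobuf enum number to its value name, assumed here to
// be the Avro symbol. ok is false for numbers with no declared name.
func symbolFor(names map[int32]string, number int32) (string, bool) {
	name, ok := names[number]
	return name, ok
}

func main() {
	// Hypothetical enum: enum Genre { GENRE_UNSPECIFIED = 0; FANTASY = 1; }
	names := map[int32]string{0: "GENRE_UNSPECIFIED", 1: "FANTASY"}
	schema := enumSchema{Type: "enum", Name: "Genre", Symbols: []string{"GENRE_UNSPECIFIED", "FANTASY"}}
	b, err := json.Marshal(schema)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))           // {"type":"enum","name":"Genre","symbols":["GENRE_UNSPECIFIED","FANTASY"]}
	fmt.Println(symbolFor(names, 1)) // FANTASY true
}
```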

Some well-known types have special mappings:

| Protobuf | Avro |
| --- | --- |
| Wrappers (e.g. google.protobuf.DoubleValue) | Nullable scalars (e.g. [null, double]) |
| google.protobuf.Any | string containing the JSON encoding of the Any |
| google.protobuf.Struct | string containing the JSON encoding of the Struct |
| google.protobuf.Timestamp | long.timestamp-micros |
| google.protobuf.Duration | float (seconds) |
| google.type.Date | int.date |
| google.type.TimeOfDay | long.time-micros |

Limitations

Avro does not have a native type for timestamps with nanosecond precision. google.protobuf.Timestamp and google.type.TimeOfDay are truncated to microsecond precision when encoded as Avro.