Proposal: Ballerina SerDes module #2964

Closed
MohamedSabthar opened this issue Jun 1, 2022 · 2 comments
Labels
module/serdes Issues related to the Ballerina serdes module Status/Implemented Implemented proposals Type/Proposal

Comments

MohamedSabthar commented Jun 1, 2022

Summary

This proposal introduces an effective way to serialize and deserialize Ballerina anydata values so that they can be sent efficiently over the wire.

Goals

  • Come up with a package to serialize/deserialize Ballerina anydata that requires minimal user input.
  • Protocol modules such as HTTP, WS, Kafka, NATS, RabbitMQ should be able to seamlessly integrate with this package.
  • Protocol buffers could be used to serialize data in a compact binary format.

Motivation

The Ballerina language supports the anydata type, whose values can be sent across the network. To do so, anydata values must be serialized and deserialized in some form. The current approach converts anydata to JSON before sending it over the wire. Although this is appropriate for some use cases, there are more efficient serialization formats, such as proto3 (by Google) or Avro (an Apache project that originated in Hadoop).

Description

As mentioned in the Goals section, the primary goal of this proposal is to come up with a standard library package that serializes and deserializes anydata values. Initially, proto3 will be the main underlying technology used to implement the package. Other technologies, such as Avro, will be added to the package in the future.

Proto3

Protocol Buffers is a method of serializing structured data developed by Google, and proto3 is the latest version of Protocol Buffers at the time of this proposal. Protocol Buffers use .proto files to define messages (similar to a Ballerina record type) in proto3 syntax. These message definitions form the proto schema, which can later be populated with data. The resulting messages can be serialized and deserialized more efficiently than JSON. More information about proto3 can be found in the official documentation.

| Ballerina type | proto3 type |
|---|---|
| int | sint64 |
| byte | bytes |
| float | double |
| decimal | message |
| boolean | bool |
| string | string |
| array | repeated type |
| tuple | message |
| map | message |
| table | message |
| record | message |
| union | message |
| enum | message |
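
The table maps Ballerina int to sint64 rather than int64. In protobuf, sint64 values are ZigZag-encoded before being written as varints, so small negative numbers stay small on the wire. The Java sketch below illustrates that encoding only. The names `ZigZagSketch`, `encode`, `decode` and `varintSize` are made up for this illustration and are not part of the serdes module or the protobuf Java API.

```java
final class ZigZagSketch {
    // Interleaves signed values onto unsigned ones: 0, -1, 1, -2 become 0, 1, 2, 3.
    static long encode(long n) {
        return (n << 1) ^ (n >> 63);
    }

    // Inverse of encode().
    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Number of bytes a base-128 varint needs for the given (unsigned) value.
    static int varintSize(long v) {
        int size = 0;
        do {
            size++;
            v >>>= 7;
        } while (v != 0);
        return size;
    }
}
```

With this sketch, `varintSize(-1L)` (the value written as a plain int64 varint) is 10 bytes, while `varintSize(encode(-1L))` is 1 byte, which is a plausible reason for choosing sint64 for a signed 64-bit Ballerina int.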

The following examples show how Ballerina types map to proto messages.

Primitive

  1. int, byte, string, boolean and float use the following message format
Each Ballerina declaration is followed by its proto message:
string package = "serdes";
message StringValue {
    string atomicField = 1;
}
int age = 24;
message IntValue {
  sint64 atomicField = 1;
}
  2. The decimal type uses the following message format
The Ballerina declaration is followed by its proto message:
decimal salary = 2e5;
BigDecimal
message DecimalValue {
    uint32 scale = 1;
    uint32 precision = 2;
    bytes value = 3;
}
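
The `BigDecimal` note and the three fields of `DecimalValue` suggest that a decimal is split into the scale, precision and unscaled value of a `java.math.BigDecimal`. The sketch below shows one way to do that split and to rebuild the value. It is an assumption about the encoding, not the module's confirmed implementation, and `DecimalParts` is a hypothetical helper class. Because `scale` is declared as uint32, the sketch also assumes that negative scales (for example `2e5`, which `BigDecimal` stores as 2 with scale -5) are expanded to scale 0 first.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class DecimalParts {
    final int scale;      // digits to the right of the decimal point
    final int precision;  // total number of digits in the unscaled value
    final byte[] value;   // two's-complement bytes of the unscaled value

    DecimalParts(int scale, int precision, byte[] value) {
        this.scale = scale;
        this.precision = precision;
        this.value = value;
    }

    static DecimalParts fromBigDecimal(BigDecimal d) {
        // Assumption: a uint32 field cannot hold a negative scale, so 2E+5 becomes 200000.
        BigDecimal normalized = d.scale() < 0 ? d.setScale(0) : d;
        return new DecimalParts(
                normalized.scale(),
                normalized.precision(),
                normalized.unscaledValue().toByteArray());
    }

    BigDecimal toBigDecimal() {
        // Scale and unscaled value are enough to rebuild the number;
        // precision is carried along only because the message declares it.
        return new BigDecimal(new BigInteger(value), scale);
    }
}
```

For `new BigDecimal("2e5")` this yields scale 0 and precision 6, and `toBigDecimal()` returns 200000.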

Array

  1. Simple arrays
Each Ballerina type is followed by its proto message:
type IntArray int[];
message ArrayBuilder {
    repeated sint64 arrayField = 1;
}
type FloatArray float[];
message ArrayBuilder {
    repeated double arrayField = 1;
}
type DecimalArray decimal[];
message ArrayBuilder {
  message DecimalValue {
     uint32 scale  = 1;
     uint32 precision  = 2;
     bytes value  = 3;
  }
  repeated DecimalValue arrayField  = 1;
}
  2. Multidimensional arrays
Each Ballerina type is followed by its proto message:
type String2DArray string[][];
message ArrayBuilder {
  message ArrayBuilder {
     repeated string arrayField  = 1;
  }
  repeated ArrayBuilder arrayField  = 1;
}
type Decimal3DArray decimal[][][];
message ArrayBuilder {
  message ArrayBuilder {
    message ArrayBuilder {
      message DecimalValue {
         uint32 scale  = 1;
         uint32 precision  = 2;
         bytes value  = 3;
      }
      repeated DecimalValue arrayField  = 1;
    }
    repeated ArrayBuilder arrayField  = 1;
  }
  repeated ArrayBuilder arrayField  = 1;
}
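
The array examples above nest one `ArrayBuilder` message per extra dimension, with the innermost message holding the repeated element type. The recursive Java sketch below generates that schema text for scalar element types. `ArraySchemaSketch` and `arrayBuilder` are hypothetical names, and the sketch leaves out the nested `DecimalValue` definition that decimal arrays need.

```java
final class ArraySchemaSketch {
    // Returns the proto text for an array of the given proto scalar type and dimension count.
    static String arrayBuilder(String protoType, int dimensions) {
        if (dimensions < 1) {
            throw new IllegalArgumentException("dimensions must be at least 1");
        }
        return build(protoType, dimensions, "");
    }

    private static String build(String protoType, int dimensions, String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append("message ArrayBuilder {\n");
        if (dimensions > 1) {
            // Each outer dimension wraps a nested ArrayBuilder and repeats it.
            sb.append(build(protoType, dimensions - 1, indent + "  "));
            sb.append(indent).append("  repeated ArrayBuilder arrayField = 1;\n");
        } else {
            // The innermost dimension repeats the element type itself.
            sb.append(indent).append("  repeated ").append(protoType).append(" arrayField = 1;\n");
        }
        sb.append(indent).append("}\n");
        return sb.toString();
    }
}
```

Calling `ArraySchemaSketch.arrayBuilder("string", 2)` produces the same nesting as the `String2DArray` example.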

Union

  1. Union with primitive types
The Ballerina type is followed by its proto message:
type PrimitiveUnion  int|byte|float|decimal|string?; 
message UnionBuilder {
  message DecimalValue {
     uint32 scale  = 1;
     uint32 precision  = 2;
     bytes value  = 3;
  }
  sint64 int___unionField  = 1;
  bytes byte___unionField  = 2;
  double float___unionField  = 3;
  DecimalValue decimal___unionField  = 4;
  string string___unionField  = 5;
  bool nullField  = 6;
}
The `<type>___` prefix is added to avoid name collisions during protobuf schema generation.
  2. Union of multidimensional arrays
The Ballerina type is followed by its proto message:
type UnionWithArray int[][]|float[]|string[][][];
message UnionBuilder {
  message int___ArrayBuilder_1 {
     repeated sint64 arrayField  = 1;
  }
  message string___ArrayBuilder_2 {
    message ArrayBuilder {
       repeated string arrayField  = 1;
    }
    repeated ArrayBuilder arrayField  = 1;
  }
  repeated int___ArrayBuilder_1 int___arrayField_2___unionField  = 1;
  repeated double float___arrayField_1___unionField  = 2;
  repeated string___ArrayBuilder_2 string___arrayField_3___unionField  = 3;
}
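A (union) member array uses the following name formats for the message field name and the nested message name:
  • Field name format: `<type>___arrayField_<dimension>___unionField`
  • Nested message name format: `<type>___ArrayBuilder_<dimension>`
Here `<type>` and `<dimension>` are used to avoid name collisions during protobuf schema generation. In the example above, the index of the nested message is one less than the index of the field it backs (for example, `int___ArrayBuilder_1` backs `int___arrayField_2___unionField`).
  3. Union of union-arrays
The Ballerina types are followed by the generated proto message: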
type IntOrString int|string;
type FloatOrNill float?;
type UnionArray IntOrString[]|FloatOrNill[];
message UnionBuilder {
  message IntOrString___UnionBuilder {
     sint64 int___unionField  = 1;
     string string___unionField  = 2;
  }
  message FloatOrNill___UnionBuilder {
     double float___unionField  = 1;
     bool nullField  = 2;
  }
  repeated IntOrString___UnionBuilder IntOrString___arrayField_1___unionField  = 1;
  repeated FloatOrNill___UnionBuilder FloatOrNill___arrayField_1___unionField  = 2;
}
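
The naming rules used in the union examples can be sketched as a few string helpers. This is a reading of the examples above, not the module's actual code: `UnionNamingSketch` and its methods are hypothetical, and the rule that the nested builder index is the array dimension minus one is inferred from names such as `string___ArrayBuilder_2` for `string[][][]`.

```java
final class UnionNamingSketch {
    private static final String SEPARATOR = "___";

    // Field name for a non-array union member, e.g. "int___unionField".
    static String unionFieldName(String type) {
        return type + SEPARATOR + "unionField";
    }

    // Field name for an array union member, e.g. "int___arrayField_2___unionField".
    static String arrayUnionFieldName(String type, int dimensions) {
        return type + SEPARATOR + "arrayField_" + dimensions + SEPARATOR + "unionField";
    }

    // Nested message name for an array member with two or more dimensions,
    // e.g. "int___ArrayBuilder_1" for int[][] (assumed: index = dimensions - 1).
    static String nestedArrayBuilderName(String type, int dimensions) {
        if (dimensions < 2) {
            throw new IllegalArgumentException("one-dimensional arrays need no nested message");
        }
        return type + SEPARATOR + "ArrayBuilder_" + (dimensions - 1);
    }
}
```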

Record

  1. Simple record with primitive types
The Ballerina type is followed by its proto message:
type Employee record {
    string name;
    byte age;
    int weight;
    float height;
    boolean isMarried;
    decimal salary;
};
message Employee {
  message DecimalValue {
     uint32 scale  = 1;
     uint32 precision  = 2;
     bytes value  = 3;
  }
  string name  = 1;
  bytes age  = 2;
  sint64 weight  = 3;
  double height  = 4;
  bool isMarried  = 5;
  DecimalValue salary  = 6;
}
The proto message name and field names are the same as the Ballerina record type name and field names.
  2. Record with array fields
Each Ballerina type is followed by its proto message:
type RecordWithSimpleArrays record {
    string[] stringArray;
    int[] intArray;
    float[] floatArray;
    boolean[] boolArray;
    byte[] byteArray;
};
message RecordWithSimpleArrays {
   repeated string stringArray  = 1;
   repeated sint64 intArray  = 2;
   repeated double floatArray  = 3;
   repeated bool boolArray  = 4;
   bytes byteArray  = 5;
}
type RecordWithMultidimentionalArrays record {
    string[][][] string3DArray;
    decimal[][] decimal2DArray;
};
message RecordWithMultidimentionalArrays {
  message decimal2DArray___ArrayBuilder {
    message DecimalValue {
       uint32 scale  = 1;
       uint32 precision  = 2;
       bytes value  = 3;
    }
    repeated DecimalValue arrayField  = 1;
  }
  message string3DArray___ArrayBuilder {
    message ArrayBuilder {
       repeated string arrayField  = 1;
    }
    repeated ArrayBuilder arrayField  = 1;
  }
  repeated string3DArray___ArrayBuilder string3DArray  = 1;
  repeated decimal2DArray___ArrayBuilder decimal2DArray  = 2;
}
  3. Record with union fields
The Ballerina type is followed by its proto message:
type RecordWithUnion record {
    int|string? data;
};
message RecordWithUnion {
  message data___UnionBuilder {
     bool nullField  = 1;
     sint64 int___unionField  = 2;
     string string___unionField  = 3;
  }
  data___UnionBuilder data  = 1;
}
Nested message names of union messages are prefixed with the Ballerina record field name to avoid name collisions. In general, the union message name has the form `<recordFieldName>___UnionBuilder`.
  4. Record with cyclic references
The Ballerina types are followed by the generated proto message:
type Node1 record {
    string name;
    Node2? nested;
};
type Node2 record {
    string name;
    Node3? nested;
};
type Node3 record {
    string name;
    Node1? nested;
};
message Node1 {
  message nested___UnionBuilder {
    message Node2 {
      message nested___UnionBuilder {
        message Node3 {
          message nested___UnionBuilder {
             Node1 Node1___unionField  = 1;
             bool nullField  = 2;
          }
          string name  = 1;
          nested___UnionBuilder nested  = 2;
        }
        Node3 Node3___unionField  = 1;
        bool nullField  = 2;
      }
      string name  = 1;
      nested___UnionBuilder nested  = 2;
    }
    Node2 Node2___unionField  = 1;
    bool nullField  = 2;
  }
  string name  = 1;
  nested___UnionBuilder nested  = 2;
}

Map

  1. Map with primitive types
Each Ballerina type is followed by its proto message:
type MapInt map<int>;
message MapBuilder {
  message MapFieldEntry {
     string key  = 1;
     sint64 value  = 2;
  }
  repeated MapFieldEntry mapField  = 1;
}
type MapDecimal map<decimal>;
message MapBuilder {
  message MapFieldEntry {
    message DecimalValue {
       uint32 scale  = 1;
       uint32 precision  = 2;
       bytes value  = 3;
    }
    string key  = 1;
    DecimalValue value  = 2;
  }
  repeated MapFieldEntry mapField  = 1;
}
  2. Map with records
The Ballerina types are followed by the generated proto message:
type Status record {
    int code;
    string message?;
};
type MapRecord map<Status>;
message MapBuilder {
  message MapFieldEntry {
    message Status {
       sint64 code  = 1;
       string message  = 2;
    }
    string key  = 1;
    Status value  = 2;
  }
  repeated MapFieldEntry mapField  = 1;
}
  3. Map with arrays
The Ballerina types are followed by the generated proto message:
type IntMatrix int[][];
type MapArray map<IntMatrix>;
message MapBuilder {
  message MapFieldEntry {
    message ArrayBuilder {
       repeated sint64 arrayField  = 1;
    }
    string key  = 1;
    repeated ArrayBuilder value  = 2;
  }
  repeated MapFieldEntry mapField  = 1;
}
  4. Map with unions
The Ballerina types are followed by the generated proto message:
type Status record {
    int code;
    string message?;
};
type IntMatrix int[][];
type MapUnion map<Status|IntMatrix>;
message MapBuilder {
  message MapFieldEntry {
    message value___UnionBuilder {
      message Status {
         sint64 code  = 1;
         string message  = 2;
      }
     message int___ArrayBuilder_1 {
        repeated sint64 arrayField  = 1;
      }
      Status Status___unionField  = 1;
      repeated int___ArrayBuilder_1 int___arrayField_2___unionField  = 2;
    }
    string key  = 1;
    value___UnionBuilder value  = 2;
  }
  repeated MapFieldEntry mapField  = 1;
}
  5. Map with maps
The Ballerina types are followed by the generated proto message:
type Status record {
    int code;
    string message?;
};
type IntMatrix int[][];
type MapUnion map<Status|IntMatrix>;
type MapOfMaps map<MapUnion>;
message MapBuilder {
  message MapFieldEntry {
    message MapBuilder {
      message MapFieldEntry {
        message value___UnionBuilder {
          message Status {
             sint64 code  = 1;
             string message  = 2;
          }
         message int___ArrayBuilder_1 {
            repeated sint64 arrayField  = 1;
          }
          Status Status___unionField  = 1;
          repeated int___ArrayBuilder_1 int___arrayField_2___unionField  = 2;
        }
        string key  = 1;
        value___UnionBuilder value  = 2;
      }
      repeated MapFieldEntry mapField  = 1;
    }
    string key  = 1;
    MapBuilder value  = 2;
  }
  repeated MapFieldEntry mapField  = 1;
}
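
In the map examples above, a map is not stored as a native proto map. It is flattened into a repeated list of key/value entries. The Java sketch below mirrors that shape for a `map<int>`, using plain collections instead of generated protobuf classes. `MapFieldEntrySketch`, `toEntries` and `fromEntries` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MapFieldEntrySketch {
    final String key;   // corresponds to "string key = 1"
    final long value;   // corresponds to "sint64 value = 2"

    MapFieldEntrySketch(String key, long value) {
        this.key = key;
        this.value = value;
    }

    // Flattens a map into the repeated-entry form, preserving iteration order.
    static List<MapFieldEntrySketch> toEntries(Map<String, Long> map) {
        List<MapFieldEntrySketch> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : map.entrySet()) {
            entries.add(new MapFieldEntrySketch(e.getKey(), e.getValue()));
        }
        return entries;
    }

    // Rebuilds the map from its entries; a later duplicate key overwrites an earlier one.
    static Map<String, Long> fromEntries(List<MapFieldEntrySketch> entries) {
        Map<String, Long> map = new LinkedHashMap<>();
        for (MapFieldEntrySketch e : entries) {
            map.put(e.key, e.value);
        }
        return map;
    }
}
```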

Table

  1. Table with Map constraint
The Ballerina types are followed by the generated proto message:
type Score map<int>;
type ScoreTable table<Score>;
message TableBuilder {
  message MapBuilder {
    message MapFieldEntry {
       string key  = 1;
       sint64 value  = 2;
    }
    repeated MapFieldEntry mapField  = 1;
  }
  repeated MapBuilder tableEntry  = 1;
}
  2. Table with record constraint
The Ballerina types are followed by the generated proto message:
type Row record {
    int id;
    string name;
};
type RecordTable table<Row>;
message TableBuilder {
  message Row {
     sint64 id  = 1;
     string name  = 2;
  }
  repeated Row tableEntry  = 1;
}

Tuple

  1. Tuple with primitive type elements
The Ballerina type is followed by its proto message:
type PrimitiveTuple [byte, int, float, boolean, string, decimal];
message TupleBuilder {
  message DecimalValue {
     uint32 scale  = 1;
     uint32 precision  = 2;
     bytes value  = 3;
  }
  bytes element_1  = 1;
  sint64 element_2  = 2;
  double element_3  = 3;
  bool element_4  = 4;
  string element_5  = 5;
  DecimalValue element_6  = 6;
}
  2. Tuple with union elements
The Ballerina type is followed by its proto message:
type TupleWithUnion [byte|string, decimal|boolean];
message TupleBuilder {
  message element_1___UnionBuilder {
     bytes byte___unionField  = 1;
     string string___unionField  = 2;
  }
  message element_2___UnionBuilder {
   message DecimalValue {
      uint32 scale  = 1;
      uint32 precision  = 2;
      bytes value  = 3;
    }
    bool boolean___unionField  = 1;
    DecimalValue decimal___unionField  = 2;
  }
  element_1___UnionBuilder element_1  = 1;
  element_2___UnionBuilder element_2  = 2;
}
  3. Tuple with array elements
The Ballerina types are followed by the generated proto message:
type UnionTupleElement byte|string;
type TupleWithArray [string[], boolean[][], int[][][], UnionTupleElement[]];
message TupleBuilder {
  message int___ArrayBuilder_2 {
    message ArrayBuilder {
       repeated sint64 arrayField  = 1;
    }
    repeated ArrayBuilder arrayField  = 1;
  }
 message UnionTupleElement___UnionBuilder {
    bytes byte___unionField  = 1;
    string string___unionField  = 2;
  }
 message boolean___ArrayBuilder_1 {
    repeated bool arrayField  = 1;
  }
  repeated string element_1  = 1;
  repeated boolean___ArrayBuilder_1 element_2  = 2;
  repeated int___ArrayBuilder_2 element_3  = 3;
  repeated UnionTupleElement___UnionBuilder element_4  = 4;
}
  4. Tuple with record elements
The Ballerina types are followed by the generated proto message:
type Student record {
    string name;
    int courseId;
    decimal fees;
};
type Teacher record {
    string name;
    int courseId;
    decimal salary;
};
type TupleWithRecord [Student, Teacher];
message TupleBuilder {
  message Teacher {
    message DecimalValue {
       uint32 scale  = 1;
       uint32 precision  = 2;
       bytes value  = 3;
    }
    sint64 courseId  = 1;
    string name  = 2;
    DecimalValue salary  = 3;
  }
  message Student {
    message DecimalValue {
      uint32 scale  = 1;
      uint32 precision  = 2;
      bytes value  = 3;
    }
    sint64 courseId  = 1;
    DecimalValue fees  = 2;
    string name  = 3;
  }
  Student element_1  = 1;
  Teacher element_2  = 2;
}
  5. Tuple with tuple elements
The Ballerina types are followed by the generated proto message:
type PrimitiveTuple [byte, int, float, boolean, string, decimal];
type TupleWithUnion [byte|string, decimal|boolean];
type TupleOfTuples [PrimitiveTuple, TupleWithUnion];
message TupleBuilder {
  message element_2___TupleBuilder {
    message element_1___UnionBuilder {
       bytes byte___unionField  = 1;
       string string___unionField  = 2;
    }
    message element_2___UnionBuilder {
     message DecimalValue {
        uint32 scale  = 1;
        uint32 precision  = 2;
        bytes value  = 3;
      }
      bool boolean___unionField  = 1;
      DecimalValue decimal___unionField  = 2;
    }
    element_1___UnionBuilder element_1  = 1;
    element_2___UnionBuilder element_2  = 2;
  }
  message element_1___TupleBuilder {
    message DecimalValue {
       uint32 scale  = 1;
       uint32 precision  = 2;
       bytes value  = 3;
    }
    bytes element_1  = 1;
    sint64 element_2  = 2;
    double element_3  = 3;
    bool element_4  = 4;
    string element_5  = 5;
    DecimalValue element_6  = 6;
  }
  element_1___TupleBuilder element_1  = 1;
  element_2___TupleBuilder element_2  = 2;
}

Enum

A Ballerina enum is syntactic sugar for a union of string constants, so an enum is handled as a union at the protobuf level.

Each Ballerina type is followed by its proto message:
enum Color {
    RED = "red",
    GREEN,
    BLUE
}
message UnionBuilder {
   string string___unionField  = 1;
}
const OPEN = "open";
const CLOSE = "close";
type STATE OPEN|CLOSE;
message UnionBuilder {
   string string___unionField  = 1;
}

SerDes API

// serdes Error
public type Error distinct error; 

// Abstract object
public type Schema object {

  public isolated function serialize(anydata data) returns byte[]|Error;

  public isolated function deserialize(byte[] encodedMessage, typedesc<anydata> T = <>) returns T|Error;
};

Ex: Implementing Proto3 version of serialization & deserialization

public class Proto3Schema {
  *Schema;

   // Implement serialize(), deserialize() methods here

   public isolated function deserialize(byte[] encodedMessage,  typedesc<anydata> T = <>) returns T|Error = 
   @java:Method {
      'class: "io.ballerina.stdlib.serdes...."
   }  external;
}

Other serialization and deserialization formats (for example, an AvroSchema) can be implemented in the future in the same way as the Proto3Schema above.

Serialize

import ballerina/serdes;

type Person record {
   string name;
   int age;
};

public function main() returns error? {

   Person president = { name: "Joe", age: 70 };

   // This should generate the required schema and then the classes
   serdes:Schema schema = check new serdes:Proto3Schema(Person);

   // Serialize president value into a byte array
   byte[] encoded = check schema.serialize(president);
}

Deserialize

import ballerina/serdes;

type Person record {
   string name;
   int age;
};

public function main() returns error? {

    // Placeholder for a byte array produced earlier by serialize()
    byte[] encoded = <bytes>;

    // This should generate the required schema and then the classes
    serdes:Schema schema = check new serdes:Proto3Schema(Person);

    // Deserialize the encoded bytes into a Person value
    Person president = check schema.deserialize(encoded);
}

Design Considerations

The design of the Standard Library SerDes package can be divided into the following two phases:

  1. Implement dynamic schema & message generation using the proto3 Java API.

Dynamic Schema creation could be implemented as a separate Java package with the help of Protocol Buffers' Java API. Making this implementation independent of the rest of the code will make it easier to move between different serialization/deserialization technologies such as Avro in the future.

  2. Create a mapping between Ballerina anydata types and proto3 messages.

A parser should be implemented to convert Ballerina anydata to proto3 message and vice versa. The parser should be able to map Ballerina anydata types to proto3 field types.
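
As a rough illustration of the second phase, the type table from the Description section can be written as a lookup from a Ballerina type kind to a proto3 field type. This is only a sketch under the assumption that type kinds are passed as plain strings. `TypeMappingSketch` and `protoFieldType` are hypothetical names, and arrays are left out because they map to a `repeated` field of their element type rather than to a single type name.

```java
final class TypeMappingSketch {
    // Maps a Ballerina type kind to the proto3 field type listed in the proposal's table.
    static String protoFieldType(String ballerinaType) {
        switch (ballerinaType) {
            case "int":
                return "sint64";
            case "byte":
                return "bytes";
            case "float":
                return "double";
            case "boolean":
                return "bool";
            case "string":
                return "string";
            case "decimal":
            case "tuple":
            case "map":
            case "table":
            case "record":
            case "union":
            case "enum":
                // Structured types get their own (nested) message definition.
                return "message";
            default:
                throw new IllegalArgumentException("unsupported type: " + ballerinaType);
        }
    }
}
```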

Alternatives

Thrift and Avro are alternative serialization technologies, but proto3 was chosen because it is considered to offer the best performance of the three. A comparison between Avro and Protocol Buffers can be found here.

@MohamedSabthar MohamedSabthar added Type/Proposal module/serdes Issues related to the Ballerina serdes module labels Jun 1, 2022
@MohamedSabthar
Member Author

Related Issue: #780

@shafreenAnfar
Contributor

@MohamedSabthar Shall we update the description section of the proposal with the mapping between the proto schema and Ballerina types?

@shafreenAnfar shafreenAnfar added the Status/Active Proposals that are under review label Jul 11, 2022
@MohamedSabthar MohamedSabthar added Status/Implemented Implemented proposals and removed Status/Active Proposals that are under review labels Aug 22, 2022