Updating documentation
rooooooooob committed Jul 20, 2021
1 parent 4ad79a6 commit 4d1de9e
Showing 2 changed files with 53 additions and 3 deletions.
48 changes: 48 additions & 0 deletions GENERATING_SERIALIZATION_LIB.md
@@ -0,0 +1,48 @@
# Generating/updating cardano-serialization-lib using this library

We generated the bulk of the base CBOR struct code for our [cardano-serialization-lib](https://github.com/Emurgo/cardano-serialization-lib/). However, there are things that are not supported that IOHK have used, and some things are supported but might need editing post-generation. This document is useful mostly for people maintaining `cardano-serialization-lib`, but parts can be useful for anyone who wants to use this codegen tool on more complex CDDL types.



## pre-processing


Remember that we don't (yet) support multi-file CDDL so everything will have to be inlined into one file. The order technically doesn't matter as we do multi-phase parsing + code generation.


### Specific to IOHK's CDDL

Before we can generate code from, for example, `alonzo.cddl`, we must realize that this is not the complete CDDL. Inside that directory's `mock/` we can find `crypto.cddl` and `extras.cddl`, which contain important types. The crypto ones are partially incorrect, as they are just mocks with testing sizes, so care must be taken to ensure that they are of the appropriate size. We mostly do not generate these types directly but instead have some macros inside the serialization lib's `crypto.rs` + `util.rs` to implement some of them for us. Some other types such as `positive_interval = #6.30([1, 2])` are purely mocked out, as I believe IOHK tests against/using the CDDL directly and this made it easier for them.

### General tips

As noted in the readme, not every aspect of CDDL is fully supported by this codegen tool. To work around this, it might be necessary to edit the CDDL beforehand. You can define an unsupported complex type as something simple like `foo = [0]`, which generates a base struct whose complex details you can fill in by hand later. If just a single field is not supported, changing only that one and hand-writing it later is a good approach. Before generics were implemented, we used to just inline the types directly into the generic implementation. This can still be done for more complex generics, as we only support the normal cases.


### Integer precision

CDDL gives us the `uint` type, which can be up to 64 bits, but this isn't very easy to use, especially for wasm/js interop. For example, `u64` converts to `bigint` in JS when built via `wasm_bindgen`, and your environment might not support this yet. To give more control over this, we provide our own Rust-specific types when `USE_EXTENDED_PRELUDE` is enabled: `u32`, `i32`, `u64`, `i64`. Note that the structs will fail to deserialize if an integer outside these smaller bounds is encountered, even if it's valid according to the CDDL, as it won't fit in the `u32` or whichever type you selected.
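
To illustrate the failure mode described above, here is a minimal sketch of narrowing a decoded `uint` into a `u32`. The helper name `narrow_to_u32` and the `String` error type are assumptions made for this example; they are not what cddl-codegen emits.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not generated by cddl-codegen): narrows a CBOR `uint`,
/// decoded as `u64`, to `u32`. It returns an error instead of truncating when
/// the value does not fit.
pub fn narrow_to_u32(raw: u64) -> Result<u32, String> {
    u32::try_from(raw).map_err(|_| format!("uint {} does not fit in u32", raw))
}
```

A value such as `u32::MAX as u64 + 1` is a valid CDDL `uint` but is rejected here, which mirrors why a struct using `u32` fails to deserialize it.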



## post-processing

### Deserialization not supported?

Some types, such as array-represented records with optional fields, do not support deserialization, but the tool will still generate the rest of the code, including serialization. It might be useful to generate code from the CDDL of these specific types with the optional fields removed, so that the library outputs a deserialize trait implementation that you can then extend by hand. This can be non-trivial, especially if there are multiple optional fields or some overlap of types. For example, the type `foo = [uint, ? uint, uint, ? text, text]` is non-trivial because we can't just check the next type: upon reading the 2nd `uint`, we can't know whether it was meant for the 2nd or 3rd field until we've parsed more. We also can't always rely on a length check, since the length of an indefinite-length encoded array is not known up front and we would need to read more to figure it out, and a length alone does not settle the case where there are multiple optional fields. This non-triviality is precisely why a 100% general solution was not implemented in cddl-codegen.
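
As a rough illustration of what a hand-written deserializer for the `foo` example has to do, the sketch below buffers every element first and only then decides which optional fields were present. The `Item` enum, the `Foo` struct, and `parse_foo` are all hypothetical stand-ins invented for this example; real code would read from the CBOR deserializer rather than a pre-decoded slice.

```rust
/// Simplified stand-in for an already-decoded CBOR item (not a cddl-codegen type).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    UInt(u64),
    Text(String),
}

/// Hand-written target for an array of: uint, optional uint, uint, optional text, text.
#[derive(Debug, PartialEq)]
pub struct Foo {
    pub a: u64,
    pub b: Option<u64>,
    pub c: u64,
    pub d: Option<String>,
    pub e: String,
}

pub fn parse_foo(items: &[Item]) -> Result<Foo, String> {
    // Count the leading run of uints; we cannot assign them to fields
    // until we know whether there are 2 or 3 of them.
    let n_uints = items
        .iter()
        .take_while(|i| matches!(i, Item::UInt(_)))
        .count();
    let (head, rest) = items.split_at(n_uints);
    let uints: Vec<u64> = head
        .iter()
        .filter_map(|i| match i {
            Item::UInt(x) => Some(*x),
            _ => None,
        })
        .collect();
    // Everything after the uints must be text.
    let texts: Vec<String> = rest
        .iter()
        .map(|i| match i {
            Item::Text(s) => Ok(s.clone()),
            other => Err(format!("unexpected item {:?}", other)),
        })
        .collect::<Result<Vec<String>, String>>()?;
    let (a, b, c) = match uints.as_slice() {
        [a, c] => (*a, None, *c),
        [a, b, c] => (*a, Some(*b), *c),
        _ => return Err(format!("expected 2 or 3 uints, found {}", uints.len())),
    };
    let (d, e) = match texts.as_slice() {
        [e] => (None, e.clone()),
        [d, e] => (Some(d.clone()), e.clone()),
        _ => return Err(format!("expected 1 or 2 text items, found {}", texts.len())),
    };
    Ok(Foo { a, b, c, d, e })
}
```

Buffering the whole array is the "parse more" step in its simplest form. It works for both definite and indefinite encodings, at the cost of not being able to fail early.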

### Extra checks?

While the tool now supports the `.size` specifier, e.g. `.size (0..32)`, we don't support other modifiers such as `n*m`, `*`, `+`, etc., so these checks will need to be hand-written after the fact. Regular `.size` ones are now done for you.
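
A hand-written occurrence check could look roughly like the sketch below. The function `check_occurrences` is hypothetical and not part of the generated code; you would call it yourself from the constructor or deserializer of the affected type.

```rust
/// Hypothetical post-generation check for CDDL occurrence indicators.
/// `n*m` maps to (n, Some(m)), `*` to (0, None), and `+` to (1, None).
pub fn check_occurrences(len: usize, min: usize, max: Option<usize>) -> Result<(), String> {
    if len < min {
        return Err(format!("expected at least {} elements, found {}", min, len));
    }
    if let Some(max) = max {
        if len > max {
            return Err(format!("expected at most {} elements, found {}", max, len));
        }
    }
    Ok(())
}
```

For a rule like `foo = [2*4 uint]`, the check would be `check_occurrences(elems.len(), 2, Some(4))`.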

### Constant enums?

cddl-codegen can generate arbitrary type choices, which encompass enums, but this generality leads to extra code generation when we just need a simple constant-valued enumeration. In these cases you might get both a `FooEnum` and a `FooKind` from `foo = 0 / 1 / 2` where only one would do the job. This will hopefully be implemented in the tool as a special case, but in the meantime just get rid of `FooKind` and edit the code so that you can use `FooEnum` for both purposes. For a reference see `NetworkId` in `cardano-serialization-lib`. Single-value enums are also ambiguous, as is the case for `language = 0 ; Plutus v1`, which will generate a function like `pub fn language() -> u32 { return 0; }` whereas what we wanted was an enum with 1 value. To generate that, just write `language = 0 / 1` and remove the other variant by hand later; otherwise cddl-codegen will see it as a single isolated constant.
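
A minimal sketch of what a single merged enum for `foo = 0 / 1 / 2` could look like after hand-editing is shown below. The variant names and the `to_uint`/`from_uint` methods are assumptions chosen for illustration; this is not the generator's output, nor a copy of `NetworkId`.

```rust
/// Hypothetical hand-edited enum covering both roles for `foo = 0 / 1 / 2`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FooEnum {
    Zero = 0,
    One = 1,
    Two = 2,
}

impl FooEnum {
    /// The constant that would be written to CBOR for this variant.
    pub fn to_uint(self) -> u64 {
        self as u64
    }

    /// Maps a decoded constant back to a variant, rejecting unknown values.
    pub fn from_uint(raw: u64) -> Result<FooEnum, String> {
        match raw {
            0 => Ok(FooEnum::Zero),
            1 => Ok(FooEnum::One),
            2 => Ok(FooEnum::Two),
            other => Err(format!("{} is not a valid foo value", other)),
        }
    }
}
```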

### wasm_bindgen issues?

Type aliases for primitives can potentially lead the generator to emit `&T` instead of `T` for a parameter even when `T` is a primitive. We should investigate this at some point if it's not fixed by now. `wasm_bindgen` also does not properly support `Option<&T>`, so if you have a field like `foo = { ? 0 : &T }` then you will likely want to rewrite the accessors/ctor for that field, or change it to an `Option<T>` and do some cloning. It's unfortunate, but until `wasm_bindgen` is improved there will be a lot of needless cloning to make a memory-safe library.
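
The cloning workaround can be sketched as follows. `Foo` and `Bar` are hypothetical types, and the `#[wasm_bindgen]` attributes are deliberately left out so the snippet compiles with only the standard library; in a real build they would be added to the structs and the `impl` block.

```rust
/// Hypothetical field type standing in for `T`.
#[derive(Debug, Clone, PartialEq)]
pub struct Bar {
    pub value: u64,
}

/// Hypothetical struct with one optional field.
#[derive(Debug, Clone)]
pub struct Foo {
    bar: Option<Bar>,
}

impl Foo {
    pub fn new() -> Self {
        Foo { bar: None }
    }

    /// Takes a reference and clones, so the caller keeps ownership of `bar`.
    pub fn set_bar(&mut self, bar: &Bar) {
        self.bar = Some(bar.clone());
    }

    /// Returns an owned `Option<Bar>` rather than `Option<&Bar>`.
    pub fn bar(&self) -> Option<Bar> {
        self.bar.clone()
    }
}
```

The extra clones are the price of keeping every value that crosses the boundary owned on one side only.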

### to_bytes() / from_bytes()

This is specific only to `cardano-serialization-lib`. We don't use the `ToBytes`/`FromBytes` traits from this codegen tool, as we need to provide good errors both to JS consumers of wasm builds and to people using the Rust code (e.g. mobile bindings). Instead we have a `to_from_bytes!()` macro, which we call on every type we wish to have those functions and which auto-converts to the appropriate error types.
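
To show the general idea of such a macro, here is a simplified sketch. It is not the actual `to_from_bytes!()` from `cardano-serialization-lib`: the macro name `to_from_bytes_sketch`, the `RawCodec` trait, the `BytesError` type, and the one-byte `Coin` encoding (which is not CBOR) are all invented for this example.

```rust
/// Stand-in for the (de)serialization traits; an assumption, not the real API.
pub trait RawCodec: Sized {
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Result<Self, String>;
}

/// Hypothetical caller-facing error type.
#[derive(Debug, PartialEq)]
pub struct BytesError(pub String);

/// Adds `to_bytes()` / `from_bytes()` to a type, converting the internal
/// error into the caller-facing one.
macro_rules! to_from_bytes_sketch {
    ($name:ident) => {
        impl $name {
            pub fn to_bytes(&self) -> Vec<u8> {
                RawCodec::encode(self)
            }

            pub fn from_bytes(bytes: &[u8]) -> Result<$name, BytesError> {
                <$name as RawCodec>::decode(bytes).map_err(BytesError)
            }
        }
    };
}

/// Toy type with a one-byte encoding, used only to exercise the macro.
#[derive(Debug, PartialEq)]
pub struct Coin(pub u8);

impl RawCodec for Coin {
    fn encode(&self) -> Vec<u8> {
        vec![self.0]
    }

    fn decode(bytes: &[u8]) -> Result<Self, String> {
        match bytes {
            [b] => Ok(Coin(*b)),
            _ => Err(format!("expected 1 byte, found {}", bytes.len())),
        }
    }
}

to_from_bytes_sketch!(Coin);
```

Keeping the error conversion inside the macro means each build target can swap in its own error type in one place.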
8 changes: 5 additions & 3 deletions README.md
@@ -12,7 +12,7 @@ To run, execute `cargo run` from within this directory, and it will read `input.

Generates a `/export/` folder with wasm-compilable rust code (including Cargo.toml, etc) which can then be compiled with `wasm-pack build`.
The `lib.rs` contains all wasm-exposable code that clients of the generated code can use, and `serialization.rs` contains internal implementations for serialization/deserialization.
All structs have a `new(...)` constructor as well as a `to_bytes()`, and all supported ones have a `from_bytes()` exposed within their `lib.rs` impls that call these which (de)serialize to/from byte buffers the CBOR structure.
All structs have a `new(...)` constructor as well as a `to_bytes()` (with `GENERATE_TO_FROM_BYTES` enabled), and all supported ones have a `from_bytes()` exposed within their `lib.rs` impls that call these which (de)serialize to/from byte buffers the CBOR structure.
The constructor will contain all mandatory fields as arguments, whereas optional parameters will have a `set_*` function generated.
There is also a `prelude.rs` for helper code used by both (errors, traits, etc).

@@ -31,14 +31,17 @@ There is also a `prelude.rs` for helper code used by both (errors, traits, etc).
* Type choices - `foo = uint / tstr`
* Serialization for all supported types.
* Deserialization for almost all supported types (see limitations section).
* CDDL Generics - `foo<T> = [T]`, `bar = foo<uint>`
* Length bounds - `foo = bytes .size (0..32)`
* Support for the CDDL standard prelude (using raw CDDL from the RFC) - `biguint`, etc

We generate getters for all fields, and setters for optional fields. Mandatory fields are set via the generated constructor. All wasm-facing functions are set to take references for non-primitives and clone when needed. Returns are also cloned. This helps make usage from wasm more memory safe.

Identifiers and fields are also changed to Rust style, i.e. `foo_bar = { Field-Name: text }` gets converted into `struct FooBar { field_name: String }`.
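
As an approximate picture of the resulting API shape, the sketch below shows a struct with one mandatory and one optional field. It is hand-written for illustration, not actual generator output: the optional `count` field is an addition to the example above, and taking `String` by value in the constructor is an assumption.

```rust
/// Hand-written approximation of a struct with a mandatory `Field-Name: text`
/// field and a hypothetical optional `count: uint` field.
#[derive(Debug, Clone)]
pub struct FooBar {
    field_name: String,
    count: Option<u64>,
}

impl FooBar {
    /// Mandatory fields are constructor arguments; optional ones start unset.
    pub fn new(field_name: String) -> Self {
        FooBar { field_name, count: None }
    }

    /// Getter returns a clone rather than a reference.
    pub fn field_name(&self) -> String {
        self.field_name.clone()
    }

    /// Optional fields get a `set_*` function.
    pub fn set_count(&mut self, count: u64) {
        self.count = Some(count);
    }

    pub fn count(&self) -> Option<u64> {
        self.count
    }
}
```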

There are several arguments that are set at the top of `main.rs` to configure code generation:
* `ANNOTATE_FIELDS` - Annotates errors with locational context if set. On by default.
* `BINARY_WRAPPERS` - When we encounter a type that is an alias or transitively an alias for binary bytes, we create a wrapper type for it, as in some use cases those should not be mixed and are crypto keys, hashes, and so on. Otherwise generates a type alias. On by default.
* `USE_EXTENDED_PRELUDE` - Whether to use our extended prelude (`u64`, `i32`, etc)
* `GENERATE_TO_FROM_BYTES` - Generates `to_bytes()` and `from_bytes()` usable from wasm in addition to the `Serialize` and `Deserialize` traits. Off by default.

#### Heterogeneous Arrays
@@ -69,7 +72,6 @@ Any field that is `T / null` is transformed as a special case into `Option<T>` r
* No support for sockets
* No inlined heterogeneous maps as fields - `foo = ( x: { y: uint, z: uint } )`, but is fine for `bar = { y: uint, z: uint }` then `foo = ( x: bar )`.
* No inlined heterogeneous arrays as fields - `foo: [uint]` is fine but `foo: [uint, tstr]` is not.
* CDDL generics not supported - just edit the cddl to inline it yourself for now.
* Keys in struct-type maps are limited to `uint` and text. Other types are not found anywhere in `shelley.cddl`.
* Optional fixed-value fields not properly supported - `(? foo: 5)`
* Deserialization not supported for maps with nested plain groups - `foo = (uint, uint), bar = { foo, text }` due to maps not being ordered.