Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3479: [rust] Avro Schema Derive Proc Macro #1631

Merged
merged 34 commits into from
Apr 16, 2022

Conversation

jklamer
Copy link
Contributor

@jklamer jklamer commented Apr 2, 2022

Description

This PR is meant to start what should and will be a long review process. There is outstanding work to be done, but all the major features to ship with I believe are included.
Outstanding work:

  • Use schema equals assertions in tests module to harden types to help stop breaking changes accidentally
  • Extensive README.md and module documentation that looks appropriated in the docs.crates.io location
  • Confirm naming of companion trait
  • Restructure to #[avro(namespace="")] pattern
  • BONUS: added in doc in annotation support
  • Unit test inner functions used within Macro
  • Restructure the crate to be a feature on top of the normal crate as started in Add avro_derive project in the workspace #1579
New Features

The two traits defined within schema.rs

pub trait AvroSchema {
    fn get_schema() -> Schema;
}

pub trait AvroSchemaWithResolved {
    fn get_schema_with_resolved(resolved_schemas: &mut Names) -> Schema;
}

and a proc macro invoked as either

#[derive(AvroSchema)]
struct Test {}
// or
#[derive(AvroSchema)]
#[namespace = "com.testing.namespace"]
struct Test {}
Reasoning/Desires

The best would be to have fn get_schema() -> &'static Schema but I was unable to figure out how to do that without global state and this solution is easy to work with to avoid repeated calls to get_schema() which will create the same valid schema every time. The AvroSchemaWithResolved is a companion trait needed to help resolve cyclic schema dependencies and reuse of named types across the same struct. The derivation of AvroSchema actually derives an implementation for AvroSchemaWithResolved and is converted to a standalone implementation using a blanket implementation where the schema being returned is a root schema.

impl<T> AvroSchema for T
where
    T: AvroSchemaWithResolved,
{
    fn get_schema() -> Schema {
        T::get_schema_with_resolved(&mut HashMap::default())
    }
}
Desired user workflow

Anything that can be serialized the "The serde way" should be able to be serialized/deserialized without further configuration. Anything that can be #[derive(Serialize, Deserialize)] should also be able to add #[derive(Serialize, Deserialize, AvroSchema)]

Caveats

This means that we are not attacking special cases that make sense for Avro because they will not work when integrated with the serde code. Biggest example being Vec<u8> is not derived as Schema::Bytes because it is serialized as Schema::Array . This might be something we can tightly couple with serde attributes later.
Types that cannot be both serialized and deserialized accurately are currently not covered, char u32, u64 namely. Special exception for this rule to non static lifetimed references (They can be serialized and not deserialized).

Current Flow
use apache_avro::Schema;

let raw_schema = r#"
    {
        "type": "record",
        "name": "test",
        "fields": [
            {"name": "a", "type": "long", "default": 42},
            {"name": "b", "type": "string"}
        ]
    }

use apache_avro::Writer;

#[derive(Debug, Serialize)]
struct Test {
    a: i64,
    b: String,
}

// if the schema is not valid, this function will return an error
let schema = Schema::parse_str(raw_schema).unwrap();

let mut writer = Writer::new(&schema, Vec::new());
let test = Test {
    a: 27,
    b: "foo".to_owned(),
};
writer.append_ser(test).unwrap();
let encoded = writer.into_inner();
New Flow
use apache_avro::Writer;

#[derive(Debug, Serialize, AvroSchema)]
struct Test {
    a: i64,
    b: String,
}
// derived schema, always valid or code fails to compile with a descriptive message
let schema = Test::get_schema();

let mut writer = Writer::new(&schema, Vec::new());
let test = Test {
    a: 27,
    b: "foo".to_owned(),
};
writer.append_ser(test).unwrap();
let encoded = writer.into_inner();
crate import

To use this functionality it comes as an optional feature (modeled off serde)

cargo.toml

apache-avro = { version = "X.Y.Z", features = ["derive"] }

Jira

https://issues.apache.org/jira/browse/AVRO-3479

Tests

My PR includes many tests to intent to challenge the macro in common and uncommon use cases.

Documentation

  • WIP see above

@github-actions github-actions bot added the Rust label Apr 2, 2022
@jklamer jklamer changed the title [rust] Avro Schema Derive Proc Macro [rust] Avro Schema Derive Proc Macro [WIP] Apr 2, 2022
@jklamer
Copy link
Contributor Author

jklamer commented Apr 5, 2022

Cc @martin-g

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g martin-g changed the title [rust] Avro Schema Derive Proc Macro [WIP] AVRO-3479: [rust] Avro Schema Derive Proc Macro [WIP] Apr 8, 2022
Add TODOs

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
lang/rust/avro/src/schema.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/Cargo.toml Outdated Show resolved Hide resolved
lang/rust/avro_derive/src/lib.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/src/lib.rs Show resolved Hide resolved
lang/rust/avro_derive/src/lib.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/src/lib.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/tests/derive.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/tests/derive.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/tests/derive.rs Outdated Show resolved Hide resolved
lang/rust/avro_derive/tests/derive.rs Outdated Show resolved Hide resolved
jklamer and others added 5 commits April 9, 2022 00:25
Fix typos.
Format the code/doc.
Apply suggestions by the IDE to use assert_eq!() instead of assert!()

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Member

@jklamer While reading about procedure macros I found https://crates.io/crates/darling. I have the feeling it may simplify our derive functionality.

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Member

While reading about procedure macros I found https://crates.io/crates/darling. I have the feeling it may simplify our derive functionality.

I've just pushed ff75150 - it uses Darling to parse the derive attributes.
I didn't manage to pass the avro_attrs.doc to the quote!() that generates the TokenStream for the Schema :-/ doc: #doc didn't work.

Feel free to revert this commit if you don't like it for any reason!

@jklamer
Copy link
Contributor Author

jklamer commented Apr 12, 2022

@martin-g darling is exactly what I was thinking there should be when I was making it! Thank you for getting it set up! Refactored it a bit to break it up into NamedTypes and their schema options , and fields and their schema options! So we get docs on fields and types out of the box and super easily. Check it out:

#[derive(Debug, Serialize, Deserialize, AvroSchema, Clone, PartialEq)]
    #[avro(namespace = "com.testing.namespace", doc = "A Documented Record")]
    struct TestBasicWithAttributes {
        #[avro(doc = "Documented Field")]
        a: i32,
        b: String,
    }

Also I was able to solve the doc issue where ToTokens on Option was flattening away None by manually preserving it in the output tokens:

fn preserve_optional(op: Option<impl quote::ToTokens>) -> TokenStream {
    match op {
        Some(tt) => quote! {Some(#tt.to_owned())},
        None => quote! {None},
    }
}

@martin-g
Copy link
Member

According to https://docs.rs/syn/latest/syn/struct.Attribute.html the Rust doc comments are attributes! So probably there is no need to use the DeriveInput at all !

@jklamer jklamer changed the title AVRO-3479: [rust] Avro Schema Derive Proc Macro [WIP] AVRO-3479: [rust] Avro Schema Derive Proc Macro Apr 15, 2022
@jklamer
Copy link
Contributor Author

jklamer commented Apr 15, 2022

Call this officially ready*! Except for the one outstanding issue: I can't get the cargo package to recognize the avro_derive dependency that is local:
running it results in the error

   Packaging apache-avro v0.14.0
    Updating crates.io index
error: failed to prepare local package for uploading

Caused by:
  no matching package named `avro_derive` found
  location searched: registry `crates-io`
  required by package `apache-avro v0.14.0`

*for final reviews and revisions

warning: unresolved link to `AvroSchemaComponent`
    --> avro/src/schema.rs:1524:70
     |
1524 | /// through `derive` feature. Do not implement directly, implement [`AvroSchemaComponent`]
     |                                                                      ^^^^^^^^^^^^^^^^^^^ no item named `AvroSchemaComponent` in scope
     |
     = note: `#[warn(rustdoc::broken_intra_doc_links)]` on by default
     = help: to escape `[` and `]` characters, add '\' before them like `\[` or `\]`

warning: `apache-avro` (lib doc) generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 10.13s

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Member

Yes, packaging is not easy! :-)
I just tried strum project and it fails the same way.

The way https://github.com/TedDriggs/darling is structured supports it though!
Here the main project is both a lib (it has ./src/lib.rs) and it is also a workspace with a -core and -macro subprojects.

For consistency.
Add Cargo.toml metadata

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g
Copy link
Member

I think the package build of Darling works just because 0.14.0 is at crates.io...

For some reason Rustdoc build sometimes (not always!) complain that the
import of std::sync::Mutex is not used ...

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
martin-g and others added 3 commits April 16, 2022 23:22
Validate successfully Value::Int into Schema::Long

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
@martin-g martin-g merged commit cbc4372 into apache:master Apr 16, 2022
@martin-g
Copy link
Member

Great work, @jklamer !
Thank you a lot for this contribution!

martin-g pushed a commit that referenced this pull request Apr 16, 2022
* port crate

* namespace port

* dev depends

* resolved against main

* Cons list tests

* rebased onto master resolution

* namespace attribute in derive

* std pointers

* References, testing, and refactoring

* [AVRO-3479] Clean up for PR

* AVRO-3479: Add missing ASL2 headers

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Minor improvements

Add TODOs

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* Schema assertions and PR comments

* test failure fixing

* add readme

* README + implementation guide + bug fix with enclosing namespaces

* AVRO-3479: Minor improvements

Fix typos.
Format the code/doc.
Apply suggestions by the IDE to use assert_eq!() instead of assert!()

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Fix typos

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Use darling crate to parse derive attributes

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* darling for NamedTypes and fields

* AVRO-3479 pr review naming

* AVRO-3479 doc comment doc and small tests

* AVRO-3479 featurize

* AVRO-3479 cargo engineering

* Fix a docu warning:

warning: unresolved link to `AvroSchemaComponent`
    --> avro/src/schema.rs:1524:70
     |
1524 | /// through `derive` feature. Do not implement directly, implement [`AvroSchemaComponent`]
     |                                                                      ^^^^^^^^^^^^^^^^^^^ no item named `AvroSchemaComponent` in scope
     |
     = note: `#[warn(rustdoc::broken_intra_doc_links)]` on by default
     = help: to escape `[` and `]` characters, add '\' before them like `\[` or `\]`

warning: `apache-avro` (lib doc) generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 10.13s

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Rename avro_derive to apache-avro-derive

For consistency.
Add Cargo.toml metadata

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Use fqn for Mutex

For some reason Rustdoc build sometimes (not always!) complain that the
import of std::sync::Mutex is not used ...

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Update darling to 0.14.0

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Fix the version of apache-avro-derive

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Minor cleanups

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Inline a pub function that is used only in avro_derive

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Derive Schema::Long for u32

Validate successfully Value::Int into Schema::Long

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

* AVRO-3479: Bump dependencies to their latest versions

Signed-off-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>

Co-authored-by: Martin Tzvetanov Grigorov <mgrigorov@apache.org>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
(cherry picked from commit cbc4372)
@jklamer
Copy link
Contributor Author

jklamer commented Apr 17, 2022

Wow! Thank you so much for the help and merge @martin-g! Very much enjoyed this.

As far as packaging is concerned, I couldn't get the serde project to package either, so I'm assuming they package and release their crates in order of dependency so that they next package can use it?
So for us that could look like,
Steps:

  1. package/release avro_derive v 0.14.0 (or any version)
  2. package/release avro v 0.14.0 (or any version)

Would that work? / Anywhere I need to add that process to release scripts?

@jklamer jklamer deleted the jklamer/avro-derive branch April 17, 2022 20:46
@martin-g
Copy link
Member

Would that work?

Yes, I think this is how it should be done!

Anywhere I need to add that process to release scripts?

We have a PR draft with automated release steps at #1570.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants