Derive macro to generate database (e.g. Firehose) schema from Rust struct. WIP

DoumanAsh/shema

shema

Derive macro to generate database schema code from Rust struct

All parameters are specified via the shema attribute, e.g. #[shema(firehose_schema)]

Struct parameters

  • firehose_schema - Enables firehose schema generation
  • firehose_partition_code - Enables generation of code to access partition information
  • firehose_parquet_schema - Enables generation of a parquet schema similar to the one used by AWS Glue
  • parquet_code - Enables generation of parquet code that writes the struct according to the schema. This requires the parquet and serde_json crates to be added as dependencies

Field parameters

  • json - Specifies that the field is to be encoded as a json object (automatically derived for std's collections)
  • enumeration - Specifies that the field is to be encoded as an enumeration (depending on the database, it will be encoded as a string or an object)
  • index - Specifies that the field is to be indexed by the underlying database engine (e.g. declared as a partition key in the AWS Glue schema)
  • firehose_date_index - Specifies that the field is to be used as the timestamp within the firehose schema, which will produce year, month and day fields. The field must be of a timestamp type, e.g. time::OffsetDateTime
  • rename - Uses a different name for the field. The argument MUST be a string, specified as rename = "new_name"

Firehose date index

If specified, the firehose output expects the field to be serialized as an RFC3339 encoded string.

You should configure the Hive JSON deserializer with the possible RFC3339 formats.

Terraform Reference: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kinesis_firehose_delivery_stream#timestamp_formats-1
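
To illustrate how a single RFC3339 timestamp can yield the year, month and day fields mentioned above, here is a minimal sketch using only the standard library. The function date_partitions and the helper digits are hypothetical names introduced for illustration; they are not part of shema, and the crate's generated code may derive these values differently (for example directly from time::OffsetDateTime).

```rust
/// Hypothetical helper (not part of shema): parses a fixed-width run of
/// ASCII digits into a number, rejecting anything else.
fn digits<T: std::str::FromStr>(s: &str, len: usize) -> Option<T> {
    if s.len() == len && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Hypothetical helper (not part of shema): extracts (year, month, day)
/// from the leading `YYYY-MM-DD` portion of an RFC3339 timestamp.
/// It only range-checks month and day; it does not validate the calendar
/// (e.g. February 31 is accepted) or the time portion.
fn date_partitions(rfc3339: &str) -> Option<(u16, u8, u8)> {
    // The date portion of an RFC3339 timestamp is the first 10 bytes.
    let date = rfc3339.get(..10)?;
    let mut parts = date.split('-');
    let year: u16 = digits(parts.next()?, 4)?;
    let month: u8 = digits(parts.next()?, 2)?;
    let day: u8 = digits(parts.next()?, 2)?;
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}
```

Calling date_partitions("2024-03-09T12:30:00Z") yields Some((2024, 3, 9)), while input that does not start with a well-formed date yields None.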

Schema output

The following constants will be declared for affected structs:

  • SHEMA_TABLE_NAME - Table name in lower case
  • SHEMA_FIREHOSE_SCHEMA - Firehose glue table schema, if enabled
  • SHEMA_FIREHOSE_PARQUET_SCHEMA - Parquet schema compatible with the firehose data stream, if enabled

The following methods will be defined for affected structs:

  • shema_firehose_partition_keys_ref - Returns a tuple with references to the partition keys
  • shema_firehose_partition_keys - Returns a tuple with owned values of the partition keys
  • shema_firehose_s3_path_prefix - Returns a fmt::Display type that writes the full path prefix for the S3 destination object
  • shema_is_firehose_s3_path_prefix_valid - Returns true if the prefix written by shema_firehose_s3_path_prefix is valid (i.e. no partition string is empty)
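
The sketch below shows, by hand, what a fmt::Display prefix type and its validity check could look like for a struct partitioned by date, client_id and session_id. It is an illustration only: the type S3PathPrefix, its fields, and the Hive-style key=value/ layout are assumptions that the source does not confirm, and the code actually generated by the macro may differ.

```rust
use std::fmt;

/// Hypothetical stand-in for the value returned by
/// `shema_firehose_s3_path_prefix`; the real generated type may differ.
struct S3PathPrefix<'a> {
    year: u16,
    month: u8,
    day: u8,
    client_id: &'a str,
    session_id: &'a str,
}

impl fmt::Display for S3PathPrefix<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Assumed layout: Hive-style `key=value/` segments, zero-padded dates.
        write!(
            f,
            "year={:04}/month={:02}/day={:02}/client_id={}/session_id={}/",
            self.year, self.month, self.day, self.client_id, self.session_id
        )
    }
}

impl S3PathPrefix<'_> {
    /// Mirrors the documented rule: the prefix is valid only when no
    /// string partition key is empty.
    fn is_valid(&self) -> bool {
        !self.client_id.is_empty() && !self.session_id.is_empty()
    }
}
```

Returning a Display type rather than a String lets the caller write the prefix straight into an existing buffer without an intermediate allocation.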

Following parquet crate traits are implemented:

Firehose specifics

The firehose schema expects a flat structure, so any complex struct or array must be serialized as a string
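
As an illustration of this flattening, the following sketch turns a list of strings into a single JSON-encoded string column using only the standard library. The functions push_json_string and array_to_json_string are hypothetical and not part of shema; a real project would more likely rely on serde_json for the encoding.

```rust
/// Hypothetical helper (not part of shema): appends `value` to `out`
/// as a JSON string literal, escaping quotes, backslashes and control
/// characters.
fn push_json_string(value: &str, out: &mut String) {
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Hypothetical helper (not part of shema): flattens an array field into
/// one JSON-encoded string so it fits a flat, column-per-field schema.
fn array_to_json_string(items: &[String]) -> String {
    let mut out = String::from("[");
    for (idx, item) in items.iter().enumerate() {
        if idx > 0 {
            out.push(',');
        }
        push_json_string(item, &mut out);
    }
    out.push(']');
    out
}
```

With this approach a field such as array: Vec<String> would be stored as one string column (for example ["a","b"]) instead of a nested array type.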

// Local stub standing in for the `Struct` type of the prost_wkt_types crate
mod prost_wkt_types {
    pub struct Struct;
}

use shema::Shema;

#[derive(Shema)]
#[shema(firehose_schema, firehose_parquet_schema, firehose_partition_code)]
pub(crate) struct Analytics<'a> {
    #[shema(index, firehose_date_index)]
    ///Special field that will be transformed in firehose as year,month,day
    r#client_time: time::OffsetDateTime,
    r#server_time: time::OffsetDateTime,
    r#user_id: Option<String>,
    #[shema(index)]
    ///Index key will go into firehose's partition_keys
    r#client_id: String,
    #[shema(index)]
    r#session_id: String,
    #[shema(json)]
    r#extras: Option<prost_wkt_types::Struct>,
    #[shema(json)]
    r#props: prost_wkt_types::Struct,
    r#name: String,

    byte: i8,
    short: i16,
    int: i32,
    long: i64,
    ptr: isize,

    float: f32,
    double: f64,
    boolean: bool,
    #[shema(rename = "stroka")]
    strka: &'a str,

    array: Vec<String>,
}

assert_eq!(Analytics::SHEMA_TABLE_NAME, "analytics");
