Derive macro to generate database schema code from a Rust struct
All parameters are specified via the `shema` attribute.
Container attributes:

- `firehose_schema` - Enables firehose schema generation
- `firehose_partition_code` - Enables code generation to access partition information
- `firehose_parquet_schema` - Enables parquet schema generation similar to AWS Glue's
- `parquet_code` - Enables generation of parquet code to write the struct according to the schema. This requires the `parquet` and `serde_json` crates to be added as dependencies
Field attributes:

- `json` - Specifies that the field is to be encoded as a JSON object (automatically derived for std's collections)
- `enumeration` - Specifies that the field is to be encoded as an enumeration (depending on the database, it will be encoded as a string or an object); see the sketch after this list
- `index` - Specifies that the field is to be indexed by the underlying database engine (e.g. declared as a partition key in the AWS Glue schema)
- `firehose_date_index` - Specifies that the field is to be used as the timestamp within the `firehose` schema, which will produce `year`, `month` and `day` fields. The field is required to be of a timestamp type, e.g. `time::OffsetDateTime`
- `rename` - Tells the macro to use a different name for the field. The argument MUST be a string specified as `rename = "new_name"`
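For the `enumeration` attribute, a hypothetical sketch is shown below; the `Level` enum and `LogRecord` struct are illustrative only, and any additional trait requirements the crate places on enum fields are not covered here:

```rust
use shema::Shema;

// Illustrative enum; depending on the database it will be encoded as
// a string or as an object (see the attribute description above).
pub enum Level {
    Info,
    Warn,
    Error,
}

#[derive(Shema)]
#[shema(firehose_schema)]
pub struct LogRecord {
    #[shema(enumeration)]
    level: Level,
    #[shema(rename = "msg")]
    message: String,
}
```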
If `firehose_date_index` is specified, the firehose output will expect an RFC3339-encoded string during serialization.
You should configure the HIVE JSON deserializer with the possible RFC3339 formats.
Terraform Reference: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kinesis_firehose_delivery_stream#timestamp_formats-1
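For reference, a minimal sketch of producing the RFC3339 encoding in question with the `time` crate directly (independent of this derive; the timestamp value is arbitrary):

```rust
use time::format_description::well_known::Rfc3339;
use time::OffsetDateTime;

// Any `OffsetDateTime` will do; the epoch is used here for brevity.
let ts = OffsetDateTime::UNIX_EPOCH;
let encoded = ts.format(&Rfc3339).expect("RFC3339 formatting should not fail");
println!("{encoded}"); // e.g. "1970-01-01T00:00:00Z"
```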
Generated constants:

- `SHEMA_TABLE_NAME` - Table name in lower case
- `SHEMA_FIREHOSE_SCHEMA` - Firehose Glue table schema (if enabled)
- `SHEMA_FIREHOSE_PARQUET_SCHEMA` - Parquet schema compatible with the firehose data stream (if enabled)
Generated methods:

- `shema_firehose_partition_keys_ref` - Returns a tuple with references to the partition keys
- `shema_firehose_partition_keys` - Returns a tuple with owned values of the partition keys
- `shema_firehose_s3_path_prefix` - Returns a `fmt::Display` type that writes the full path prefix for the S3 destination object
- `shema_is_firehose_s3_path_prefix_valid` - Returns `true` if `shema_firehose_s3_path_prefix` is valid (i.e. no string among the partitions is empty)
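As a rough illustration of the generated constants and methods, here is a hedged sketch. The `Event` struct and its fields are hypothetical, and the exact receiver types and tuple ordering are assumptions rather than documented guarantees:

```rust
use shema::Shema;

// Hypothetical struct with two index fields.
#[derive(Shema)]
#[shema(firehose_schema, firehose_partition_code)]
struct Event {
    #[shema(index)]
    tenant: String,
    #[shema(index)]
    region: String,
}

let event = Event { tenant: "acme".to_owned(), region: "eu".to_owned() };
// Documented: the table name constant is the struct name in lower case.
assert_eq!(Event::SHEMA_TABLE_NAME, "event");
// Assumed: methods take `&self` and yield index fields in declaration order.
let (_tenant, _region) = event.shema_firehose_partition_keys_ref();
if event.shema_is_firehose_s3_path_prefix_valid() {
    // Returns a `fmt::Display` value with the full S3 path prefix.
    println!("{}", event.shema_firehose_s3_path_prefix());
}
```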
The following `parquet` crate traits are implemented:

- `RecordWriter` - Enables writing via `SerializedFileWriter`
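Hedged sketch of writing records through the `parquet` crate's `SerializedFileWriter`, assuming the `RecordWriter` implementation follows the usual `parquet_derive` pattern of being implemented for `&[T]`; the `Row` struct, the file name, and the exact set of container attributes required are assumptions:

```rust
use std::fs::File;
use std::sync::Arc;

use parquet::file::properties::WriterProperties;
use parquet::file::writer::SerializedFileWriter;
use parquet::record::RecordWriter;
use shema::Shema;

// Hypothetical struct; `parquet_code` requires the `parquet` and
// `serde_json` crates as dependencies (see the attribute list above).
#[derive(Shema)]
#[shema(firehose_parquet_schema, parquet_code)]
struct Row {
    id: i64,
    name: String,
}

fn write_rows(rows: &[Row]) -> Result<(), Box<dyn std::error::Error>> {
    // Assumed: `schema()` comes from the `RecordWriter` impl on `&[Row]`.
    let schema = rows.schema()?;
    let file = File::create("rows.parquet")?;
    let props = Arc::new(WriterProperties::builder().build());
    let mut writer = SerializedFileWriter::new(file, schema, props)?;
    let mut row_group = writer.next_row_group()?;
    rows.write_to_row_group(&mut row_group)?;
    row_group.close()?;
    writer.close()?;
    Ok(())
}
```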
The firehose schema expects a flat structure, so any complex struct or array must be serialized as strings.
```rust
// Minimal stand-in for the real `prost_wkt_types` crate, so that the
// example is self-contained.
mod prost_wkt_types {
    pub struct Struct;
}

use shema::Shema;
#[derive(Shema)]
#[shema(firehose_schema, firehose_parquet_schema, firehose_partition_code)]
pub(crate) struct Analytics<'a> {
    #[shema(index, firehose_date_index)]
    /// Special field that will be transformed in firehose into `year`, `month` and `day` fields.
    r#client_time: time::OffsetDateTime,
    r#server_time: time::OffsetDateTime,
    r#user_id: Option<String>,
    #[shema(index)]
    /// Index keys go into firehose's `partition_keys`.
    r#client_id: String,
    #[shema(index)]
    r#session_id: String,
    #[shema(json)]
    r#extras: Option<prost_wkt_types::Struct>,
    #[shema(json)]
    r#props: prost_wkt_types::Struct,
    r#name: String,
    byte: i8,
    short: i16,
    int: i32,
    long: i64,
    ptr: isize,
    float: f32,
    double: f64,
    boolean: bool,
    #[shema(rename = "stroka")]
    strka: &'a str,
    array: Vec<String>,
}
assert_eq!(Analytics::SHEMA_TABLE_NAME, "analytics");
```