
ARROW-5123: [Rust] Parquet derive for simple structs #4140

Closed
wants to merge 36 commits into from

Conversation

@xrl (Contributor) commented Apr 11, 2019

A rebase and significant rewrite of sunchao/parquet-rs#197

Big improvement: I now use a more natural nested enum style; it helps break out the patterns of supported data types. The rest of the broad strokes still apply.

Goal

Writing many columns to a file is a chore. If you can put your values into a struct which mirrors the schema of your file, this `derive(ParquetRecordWriter)` will write out all the fields, in the order in which they are defined, to a row group.

How to Use

```
extern crate parquet;
#[macro_use] extern crate parquet_derive;

#[derive(ParquetRecordWriter)]
struct ACompleteRecord<'a> {
  pub a_bool: bool,
  pub a_str: &'a str,
}
```

RecordWriter trait

This is the new trait which parquet_derive will implement for your structs.

```
use super::RowGroupWriter;

pub trait RecordWriter<T> {
  fn write_to_row_group(&self, row_group_writer: &mut Box<RowGroupWriter>);
}
```
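To make the trait concrete, here is a hand-written impl of the shape the derive generates, written against a simplified, std-only stand-in for the row group writer. `MockRowGroupWriter` and `write_bool_batch` are hypothetical names for illustration, not the parquet-rs API:

```rust
// A simplified stand-in for parquet's RowGroupWriter: it just records
// the batches it is handed, one Vec per column.
#[derive(Default)]
struct MockRowGroupWriter {
    bool_batches: Vec<Vec<bool>>,
}

impl MockRowGroupWriter {
    fn write_bool_batch(&mut self, vals: Vec<bool>) {
        self.bool_batches.push(vals);
    }
}

// The trait from above, with the writer type swapped for the mock.
trait RecordWriter<T> {
    fn write_to_row_group(&self, row_group_writer: &mut MockRowGroupWriter);
}

struct Record {
    a_bool: bool,
    a2_bool: bool,
}

// Roughly what #[derive(ParquetRecordWriter)] expands to: one batch per
// field, emitted in declaration order.
impl RecordWriter<Record> for &[Record] {
    fn write_to_row_group(&self, row_group_writer: &mut MockRowGroupWriter) {
        row_group_writer.write_bool_batch(self.iter().map(|r| r.a_bool).collect());
        row_group_writer.write_bool_batch(self.iter().map(|r| r.a2_bool).collect());
    }
}

fn main() {
    let records = vec![
        Record { a_bool: true, a2_bool: false },
        Record { a_bool: false, a2_bool: false },
    ];
    let mut writer = MockRowGroupWriter::default();
    (&records[..]).write_to_row_group(&mut writer);
    // Column-major output: one batch per field.
    assert_eq!(writer.bool_batches, vec![vec![true, false], vec![false, false]]);
}
```

The point of the derive is exactly this mechanical transposition from row-major structs to column-major batches.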

How does it work?

The `parquet_derive` crate adds code-generating functionality to the Rust compiler. The code generation takes Rust syntax and emits additional syntax. This macro expansion works on Rust 1.15+ stable. It is a dynamic plugin, loaded by the machinery in cargo; users don't have to do any special `build.rs` steps, it's automatic once `parquet_derive` is included in their project. The `parquet_derive/Cargo.toml` has a section saying as much:

```
[lib]
proc-macro = true
```

The Rust struct tagged with `#[derive(ParquetRecordWriter)]` is provided to the `parquet_record_writer` function in `parquet_derive/src/lib.rs`. The `syn` crate parses the struct from a string representation to an AST (a recursive enum value). The AST contains all the values I care about when generating a `RecordWriter` impl:

  • the name of the struct
  • the lifetime variables of the struct
  • the fields of the struct

The fields of the struct are translated from AST to a flat `FieldInfo` struct. It has the bits I care about for writing a column: `field_name`, `field_lifetime`, `field_type`, `is_option`, `column_writer_variant`.
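As a sketch, that flat shape might look like the following, with field names following the description above (the derive's real internal type may differ):

```rust
// Hypothetical FieldInfo: one per struct field, flattened from the syn AST.
#[derive(Debug)]
struct FieldInfo {
    field_name: String,
    field_lifetime: Option<String>, // e.g. Some("'a") for a `&'a str` field
    field_type: String,             // e.g. "bool" or "&str"
    is_option: bool,                // true for Option<T> fields (nullable columns)
    column_writer_variant: String,  // e.g. "BoolColumnWriter"
}

fn main() {
    // What `a_str: &'a str` from the example above might flatten to.
    let a_str = FieldInfo {
        field_name: "a_str".to_string(),
        field_lifetime: Some("'a".to_string()),
        field_type: "&str".to_string(),
        is_option: false,
        column_writer_variant: "ByteArrayColumnWriter".to_string(),
    };
    assert!(!a_str.is_option);
    println!("{:?}", a_str);
}
```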

The code then does the equivalent of templating to build the `RecordWriter` implementation. The templating functionality is provided by the `quote` crate. At a high level, the template for `RecordWriter` looks like:

```
impl RecordWriter for $struct_name {
  fn write_row_group(..) {
    $({
      $column_writer_snippet
    })
  }
}
```

This template is then added under the struct definition, ending up something like:

```
struct MyStruct {
}
impl RecordWriter for MyStruct {
  fn write_row_group(..) {
    {
       write_col_1();
    };
    {
       write_col_2();
    }
  }
}
```

and finally THIS is the code passed to rustc. It's just code now, fully expanded and standalone. If a user ever changes their `struct MyValue` definition, the `ParquetRecordWriter` will be regenerated. There are no intermediate values to version control or worry about.

Viewing the Derived Code

To see the generated code before it's compiled, one very useful trick is to install `cargo expand` (more info at https://github.com/dtolnay/cargo-expand); then you can do:

```
cd $WORK_DIR/parquet-rs/parquet_derive_test
cargo expand --lib > ../temp.rs
```

then you can dump the contents:

```
struct DumbRecord {
    pub a_bool: bool,
    pub a2_bool: bool,
}
impl RecordWriter<DumbRecord> for &[DumbRecord] {
    fn write_to_row_group(
        &self,
        row_group_writer: &mut Box<parquet::file::writer::RowGroupWriter>,
    ) {
        let mut row_group_writer = row_group_writer;
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a_bool).collect();
            let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
            if let parquet::column::writer::ColumnWriter::BoolColumnWriter(ref mut typed) =
                column_writer
            {
                typed.write_batch(&vals[..], None, None).unwrap();
            }
            row_group_writer.close_column(column_writer).unwrap();
        };
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a2_bool).collect();
            let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
            if let parquet::column::writer::ColumnWriter::BoolColumnWriter(ref mut typed) =
                column_writer
            {
                typed.write_batch(&vals[..], None, None).unwrap();
            }
            row_group_writer.close_column(column_writer).unwrap();
        }
    }
}
```

Now I need to write out all the combinations of types we support and make sure it writes out data.

Procedural Macros

The `parquet_derive` crate can ONLY export the derivation functionality. No traits, nothing else. The derive crate cannot host test cases. It's kind of like a "dummy" crate which is only used by the compiler, never by runtime code.

The parent crate cannot use the derivation functionality, which is important because it means test code cannot live in the parent crate. This forces us to have a third crate, `parquet_derive_test`.

I'm open to being wrong on any one of these finer points. I had to bang on this for a while to get it to compile!

Potentials For Better Design

  • Recursion could be limited by generating the code as "snippets" instead of one big `quote!` AST generator. Or so I think. It might be nicer to push generating each column's writing code into its own loop.
  • It would be nicer if I didn't have to be so picky about data going into the `write_batch` function. Could we make a version of the function which accepts `Into<DataType>` or similar? This would greatly simplify the derivation code, as it would not need to enumerate all the supported types. Something like `write_generic_batch(&[impl Into<DataType>])` would be neat. (Not tackling in this generation of the plugin.)
  • Another idea for improving column writing: could we have a write function for `Iterator`s? I already have a `Vec<DumbRecord>`; if I could just write a mapping for accessing the one value, we could skip the whole intermediate vec for `write_batch`. That should have some significant memory advantages. (Not tackling in this generation of the plugin; it's a bigger parquet-rs enhancement.)
  • It might be worthwhile to derive a parquet schema directly from a struct definition. That should stamp out opportunities for type errors. (Moved to #203.)
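The `Into<DataType>` idea from the second bullet can be sketched with plain std; `ParquetValue` and `write_generic_batch` are hypothetical names for illustration, not parquet-rs API:

```rust
// Hypothetical unified value type a generic write_batch could accept.
#[derive(Debug, PartialEq)]
enum ParquetValue {
    Bool(bool),
    Int64(i64),
}

impl From<bool> for ParquetValue {
    fn from(b: bool) -> Self { ParquetValue::Bool(b) }
}
impl From<i64> for ParquetValue {
    fn from(i: i64) -> Self { ParquetValue::Int64(i) }
}

// One generic entry point instead of one write path per physical type:
// the derive would no longer need to enumerate every supported type.
fn write_generic_batch<T: Into<ParquetValue>>(vals: Vec<T>) -> Vec<ParquetValue> {
    vals.into_iter().map(Into::into).collect()
}

fn main() {
    let bools = write_generic_batch(vec![true, false]);
    assert_eq!(bools, vec![ParquetValue::Bool(true), ParquetValue::Bool(false)]);

    let ints = write_generic_batch(vec![1i64, 2]);
    assert_eq!(ints, vec![ParquetValue::Int64(1), ParquetValue::Int64(2)]);
}
```

The trade-off is an extra enum wrapping per value, which is why the real write path keeps per-type column writers.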

Status

I have successfully integrated this work with my own data exporter (takes postgres/couchdb and outputs a single parquet file).

I think this code is worth including in the project, with the caveat that it only generates simplistic `RecordWriter`s. As people start to use it, we can add code generation for more complex, nested structs. We can convert the nested matching style to a fancier looping style. But for now, this explicit nesting is easier to debug and understand (to me at least!).

@xrl (Contributor, Author) commented Apr 11, 2019

@sadikovi @sunchao I hope we can pick up where we left off on the last PR 😆

@codecov-io commented Apr 11, 2019

Codecov Report

Merging #4140 into master will decrease coverage by 0.04%.
The diff coverage is n/a.


```
@@            Coverage Diff             @@
##           master    #4140      +/-   ##
==========================================
- Coverage   87.78%   87.74%   -0.05%
==========================================
  Files         758      758
  Lines       92394    92193     -201
  Branches     1251     1251
==========================================
- Hits        81112    80893     -219
- Misses      11165    11179      +14
- Partials      117      121       +4

Impacted Files                          Coverage Δ
go/arrow/math/int64_avx2_amd64.go       0% <0%> (-100%) ⬇️
go/arrow/memory/memory_avx2_amd64.go    0% <0%> (-100%) ⬇️
go/arrow/math/float64_avx2_amd64.go     0% <0%> (-100%) ⬇️
go/arrow/math/uint64_avx2_amd64.go      0% <0%> (-100%) ⬇️
go/arrow/memory/memory_amd64.go         28.57% <0%> (-14.29%) ⬇️
go/arrow/math/math_amd64.go             31.57% <0%> (-5.27%) ⬇️
js/src/ipc/metadata/json.ts             92.39% <0%> (-4.35%) ⬇️
cpp/src/arrow/csv/column-builder.cc     95.32% <0%> (-1.76%) ⬇️
cpp/src/parquet/arrow/reader.cc         84.15% <0%> (-1.48%) ⬇️
cpp/src/plasma/thirdparty/ae/ae.c       70.75% <0%> (-0.95%) ⬇️
... and 16 more
```

Last update 9d333f4...45348e7.

@sunchao (Member) commented Apr 11, 2019

Thanks for the PR @xrl! Will take a look at it soon.

@sunchao sunchao changed the title [ARROW-5123][RUST] Parquet derive for simple structs ARROW-5123: [Rust] Parquet derive for simple structs Apr 11, 2019
@xrl (Contributor, Author) commented Apr 12, 2019

I'm now adding support for casting `NaiveDateTime` to an `i64` to support TIMESTAMP_MILLIS. This is a feature that could use some design work or feedback. I think I can support other timestamp types too, but `NaiveDateTime` is the "most accurate" because it assumes UTC, which I think is the most compatible with how parquet treats timestamps?

In any case, to be clear, the following now works automatically:

```
#[derive(ParquetRecordWriter)]
struct MyStruct {
  timestamp: NaiveDateTime
}
```

Are there other logical types that would be useful in this preliminary release? Timestamps scratch my itch, since I'm translating records from postgres over to parquet and our app uses a lot of timezone-free timestamps.
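For reference, TIMESTAMP_MILLIS is just milliseconds since the Unix epoch; the conversion chrono performs can be sketched std-only using Howard Hinnant's civil-days algorithm (this assumes UTC, like `NaiveDateTime`; the function names here are illustrative, not chrono's API):

```rust
// Days between a civil date and 1970-01-01 (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468
}

// Milliseconds since the Unix epoch, i.e. the TIMESTAMP_MILLIS value.
fn timestamp_millis(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> i64 {
    (((days_from_civil(y, mo, d) * 24 + h) * 60 + mi) * 60 + s) * 1000
}

fn main() {
    assert_eq!(timestamp_millis(1970, 1, 1, 0, 0, 0), 0);
    assert_eq!(timestamp_millis(1970, 1, 1, 0, 0, 1), 1_000);
    // 2019-04-11 is 17,997 days after the epoch.
    assert_eq!(timestamp_millis(2019, 4, 11, 0, 0, 0), 17_997 * 86_400_000);
}
```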

@sunchao (Member) left a comment
Thanks @xrl! I have some comments on the PR. Still reading it, so may have more coming. :)

Review comments on rust/parquet_derive/README.md, rust/parquet_derive/Cargo.toml, rust/parquet_derive/src/lib.rs, and rust/parquet_derive/src/parquet_field.rs.
```
syn::PathArguments::AngleBracketed(angle_args) => {
    let mut gen_args_iter = angle_args.args.iter();
    let first_arg = gen_args_iter.next().unwrap();
    assert!(gen_args_iter.next().is_none());
```
@sunchao (Member) commented:
Can these be simplified to the following?

```
let first_arg = &angle_args.args[0];
```

@xrl (Contributor, Author) replied:

I want to make sure there is only one generic argument to the type; it's just a precondition assertion so the code doesn't mysteriously break further along in the code generation. We don't support things like `Map<&str, bool>`, for example. Perhaps it should be a different kind of error? This is the only `assert!` in the code, so I think I should switch to the more standard `unimplemented!()`.

@sunchao (Member) commented Apr 18, 2019:
In that case you can use this:

```
assert!(angle_args.args.len() == 1);
let first_arg = &angle_args.args[0];
```

It's better than using an iterator.

@xrl (Contributor, Author) replied:

Good point.

@sadikovi (Contributor) left a comment
Great work! As long as this has sufficient documentation on how to use the feature, it should be good to merge.

Review comments on rust/parquet_derive_test/src/lib.rs, rust/parquet_derive_test/Cargo.toml, and dev/release/00-prepare.sh.
@sunchao (Member) left a comment

Thanks @xrl. Some more comments.

Review comments on rust/parquet_derive/src/lib.rs, rust/parquet_derive/src/parquet_field.rs, and rust/parquet_derive_test/src/lib.rs.
@kszucs kszucs force-pushed the master branch 2 times, most recently from ed180da to 85fe336 Compare July 22, 2019 19:29
@emkornfield (Contributor) commented:
@xrl @sunchao is this change still relevant? It doesn't look like there has been any update in 6 months.

@xrl (Contributor, Author) commented Oct 19, 2019

@emkornfield yes, this is still relevant. I need to address some final points and make sure this still compiles with the latest parquet-rs. I have been using this in production without a hitch and it's time to push the work over the finish line.

@xrl (Contributor, Author) commented Oct 24, 2019

@sunchao can you take another look at this PR? :)

@xrl (Contributor, Author) commented Nov 29, 2019

@sunchao ping :)

@sunchao (Member) commented Dec 2, 2019

Sorry @xrl - didn't see your comment earlier. Will take a look this week.

@sunchao (Member) left a comment
Thanks @xrl for continuing work on this, and sorry for the late review. I think this overall looks pretty good! I just have some cosmetic comments plus one suggestion on the API. Let me know what you think. :)


```
[package]
name = "parquet_derive"
version = "0.13.0"
```

@sunchao (Member): nit, this version is out of date.


```
[package]
name = "parquet_derive_test"
version = "0.13.0"
```

@sunchao (Member): same, the version is out of date. We need to keep it the same as the Arrow version.


```
(quote! {
impl#generics RecordWriter<#derived_for#generics> for &[#derived_for#generics] {
fn write_to_row_group(&self, row_group_writer: &mut Box<parquet::file::writer::RowGroupWriter>) -> Result<(), parquet::errors::ParquetError> {
```

@sunchao (Member): nit, long line.

```
use super::super::file::writer::RowGroupWriter;

pub trait RecordWriter<T> {
fn write_to_row_group(
```

@sunchao (Member): Thinking whether we can have a higher-level API, so that users do not need to directly manipulate row groups. Instead, could we pass a file writer, like the following?

```
fn write(&self, file: &mut dyn FileWriter) -> Result<()>;
```

This internally would write a row group for each call.

```
fn write_to_row_group(
&self,
row_group_writer: &mut Box<RowGroupWriter>,
) -> Result<(), ParquetError>;
```

@sunchao (Member): nit, we can import and use the result type from parquet-rs so this can just be `Result<()>`.

```
let when = Field::from(&fields[0]);
assert_eq!(when.writer_snippet().to_string(),(quote!{
{
let vals : Vec<_> = records.iter().map(|rec| rec.henceforth.signed_duration_since(chrono::NaiveDate::from_ymd(1970, 1, 1)).num_days() as i32).collect();
```

@sunchao (Member): nit, long lines; let's keep them within 90 chars.

```
/// }
/// ```
///
#[proc_macro_derive(ParquetRecordWriter)]
```

@sunchao (Member): Instead of ParquetRecordWriter, what do you think of making it shorter, such as ParquetWrite, ParquetSerialize, etc.?

@xrl (Contributor, Author) replied:
Yeah, the trick is that ParquetRecordWriter does both serialization (conversion to column types from a variety of input types) and writing out batches of data. I'll think about this more, but I'm leaning towards ParquetWrite or ParquetWriter.

A contributor replied: ParquetWriter 👍

Review comment on rust/parquet_derive/src/parquet_field.rs.
```
///
/// Can only generate writers for basic structs, for example:
///
/// struct Record {
```

@sunchao (Member): nit, quotes around these?

@bryantbiggs (Contributor) commented:

Hey all, any update on this? It would be great to start using this.

@xrl (Contributor, Author) commented Jan 21, 2020

@bryantbiggs this feature is still near and dear to my heart; my team uses it every day with great success. I will have to carve out some time to address the PR feedback.

Are you open to trying this code out in your project? Could you give any feedback?

@bryantbiggs (Contributor) commented:

Hello @xrl, my apologies for the delayed response; I'm just coming back around to this project. I created a branch off of yours to start making the recommended changes and will be running this on our current project. Any input/feedback is appreciated. I'd love to get this merged in if possible: master...clowdhaus:parquet_derive

@bryantbiggs (Contributor) commented:

Also @sunchao ☝️

@xrl (Contributor, Author) commented Mar 15, 2020

@bryantbiggs thanks for the assist! I have not had the bandwidth to work on this more (and parquet writer/schema derive have been working great for us). Happy to merge in your work and keep this ball rolling.

@wesm (Member) commented Apr 1, 2020

It looks like this PR still has some activity. We're trying to close out stale PRs, but it would be great to see this work completed at some point, so I'll leave this open :)

@bryantbiggs (Contributor) commented:

Yes, sorry, I was poking at this but got stuck on this release issue. Any insights @wesm or @andygrove on release tags:

```
Failure: test_version_post_tag(PrepareTest)
/home/runner/work/arrow/arrow/dev/release/00-prepare-test.rb:320:in `test_version_post_tag'
     317:       prepare("VERSION_PRE_TAG",
     318:               "VERSION_POST_TAG")
     319:     end
  => 320:     assert_equal([
     321:                    {
     322:                      path: "c_glib/configure.ac",
     323:                      hunks: [
/home/runner/work/arrow/arrow/dev/release/00-prepare-test.rb:36:in `block (2 levels) in setup'
/home/runner/work/arrow/arrow/dev/release/00-prepare-test.rb:31:in `chdir'
/home/runner/work/arrow/arrow/dev/release/00-prepare-test.rb:31:in `block in setup'
/opt/hostedtoolcache/Ruby/2.6.5/x64/lib/ruby/2.6.0/tmpdir.rb:93:in `mktmpdir'
/home/runner/work/arrow/arrow/dev/release/00-prepare-test.rb:28:in `setup'
<[{:hunks=>
   [["-m4_define([arrow_glib_version], 1.0.0)",
     "+m4_define([arrow_glib_version], 2.0.0-SNAPSHOT)"]],
  :path=>"c_glib/configure.ac"},
 {:hunks=>[["-version = '1.0.0'", "+version = '2.0.0-SNAPSHOT'"]],
  :path=>"c_glib/meson.build"},
 {:hunks=>[["-pkgver=1.0.0", "+pkgver=1.0.0.9000"]],
  :path=>"ci/scripts/PKGBUILD"},
 {:hunks=>
   [["-set(ARROW_VERSION \"1.0.0\")",
     "+set(ARROW_VERSION \"2.0.0-SNAPSHOT\")"]],
```

@nevi-me (Contributor) commented Sep 13, 2020

@sunchao given the age of this PR, I'd like to propose merging it if CI is green; we can make further changes in separate PRs. I suspect that if people start using the functionality we'd be able to get more eyes on the code.

@sunchao (Member) commented Sep 13, 2020

@nevi-me sounds good to me - let's do that.

@nevi-me (Contributor) commented Sep 14, 2020

@kszucs I might need help :( I believe I updated dev/release/00-prepare-test.rb correctly, but I'm still getting a test failure. I'm on Windows, so I'm not sure how I can test locally.

@kszucs (Member) commented Sep 14, 2020

Will try to take a look at it, also cc @kou

@kou (Member) commented Sep 14, 2020

Could you apply 0001-Update-parquet_derive-version.txt? (I don't know why I can't push to tureus:parquet_derive.)

```
$ git am < 0001-Update-parquet_derive-version.txt
```

@nevi-me (Contributor) commented Sep 14, 2020

> Could you apply 0001-Update-parquet_derive-version.txt? (I don't know why I can't push to tureus:parquet_derive.)
>
> $ git am < 0001-Update-parquet_derive-version.txt

It's because the repo was forked under the organisation :(

Thanks, I see what I missed now. I've pushed your patch.

@nevi-me nevi-me closed this in 90e474d Sep 14, 2020
@nevi-me (Contributor) commented Sep 14, 2020

I've merged this now, the CI failures were unrelated.

Thanks for the contribution, and apologies that we've taken this long to get this merged @xrl @bryantbiggs. I'll try to find some time to look at the comments that weren't addressed on this PR.

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 16, 2020
Closes apache#4140 from xrl/parquet_derive

Lead-authored-by: Xavier Lange <xrlange@gmail.com>
Co-authored-by: Neville Dipale <nevilledips@gmail.com>
Co-authored-by: Bryant Biggs <bryantbiggs@gmail.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
A rebase and significant rewrite of sunchao/parquet-rs#197

Big improvement: I now use a more natural nested enum style, it helps break out what patterns of data types are . The rest of the broad strokes still apply.

Goal
===

Writing many columns to a file is a chore. If you can put your values in to a struct which mirrors the schema of your file, this `derive(ParquetRecordWriter)` will write out all the fields, in the order in which they are defined, to a row_group.

How to Use
===

```
extern crate parquet;
#[macro_use] extern crate parquet_derive;

#[derive(ParquetRecordWriter)]
struct ACompleteRecord<'a> {
  pub a_bool: bool,
  pub a_str: &'a str,
}
```

RecordWriter trait
===

This is the new trait which `parquet_derive` will implement for your structs.

```
use super::RowGroupWriter;

pub trait RecordWriter<T> {
  fn write_to_row_group(&self, row_group_writer: &mut Box<RowGroupWriter>);
}
```

How does it work?
===

The `parquet_derive` crate adds code generating functionality to the rust compiler. The code generation takes rust syntax and emits additional syntax. This macro expansion works on rust 1.15+ stable. This is a dynamic plugin, loaded by the machinery in cargo. Users don't have to do any special `build.rs` steps or anything like that, it's automatic by including `parquet_derive` in their project. The `parquet_derive/src/Cargo.toml` has a section saying as much:

```
[lib]
proc-macro = true
```

The rust struct tagged with `#[derive(ParquetRecordWriter)]` is provided to the `parquet_record_writer` function in `parquet_derive/src/lib.rs`. The `syn` crate parses the struct from a string-representation to a AST (a recursive enum value). The AST contains all the values I care about when generating a `RecordWriter` impl:

 - the name of the struct
 - the lifetime variables of the struct
 - the fields of the struct

The fields of the struct are translated from AST to a flat `FieldInfo` struct. It has the bits I care about for writing a column: `field_name`, `field_lifetime`, `field_type`, `is_option`, `column_writer_variant`.
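As a sketch (the real definition lives in `parquet_derive/src/lib.rs` and its types may differ), `FieldInfo` can be pictured as a plain struct carrying those five pieces of information:

```rust
// Illustrative only: field names follow the description above; the
// actual types used in parquet_derive may differ.
#[derive(Debug, Clone)]
struct FieldInfo {
    field_name: String,             // e.g. "a_bool"
    field_lifetime: Option<String>, // e.g. Some("'a") for `&'a str`
    field_type: String,             // e.g. "bool" or "&str"
    is_option: bool,                // true for Option<T> fields
    column_writer_variant: String,  // e.g. "BoolColumnWriter"
}

fn main() {
    let info = FieldInfo {
        field_name: "a_bool".to_string(),
        field_lifetime: None,
        field_type: "bool".to_string(),
        is_option: false,
        column_writer_variant: "BoolColumnWriter".to_string(),
    };
    assert!(!info.is_option);
    println!("{:?}", info);
}
```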

The code then does the equivalent of templating to build the `RecordWriter` implementation. The templating functionality is provided by the `quote` crate. At a high-level the template for `RecordWriter` looks like:

```
impl RecordWriter for $struct_name {
  fn write_to_row_group(..) {
    $({
      $column_writer_snippet
    })
  }
}
```
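The same idea can be reduced to plain string templating (`quote!` works on token streams rather than strings, but the shape is the same; this sketch is illustrative, not the real generator):

```rust
// Generate the per-column writer snippet for one field, as a string.
// quote! does this with token streams; strings keep the sketch std-only.
fn column_snippet(field_name: &str) -> String {
    format!(
        "{{ let vals: Vec<_> = self.iter().map(|x| x.{0}).collect(); /* write_batch(&vals) */ }}",
        field_name
    )
}

fn main() {
    let fields = ["a_bool", "a2_bool"];
    // One snippet per field, concatenated into the method body.
    let body: String = fields.iter().map(|f| column_snippet(f)).collect();
    assert!(body.contains("x.a_bool"));
    assert!(body.contains("x.a2_bool"));
    println!("{}", body);
}
```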

This template is then expanded under the struct definition, ending up with something like:

```
struct MyStruct {
}
impl RecordWriter for MyStruct {
  fn write_to_row_group(..) {
    {
       write_col_1();
    };
   {
       write_col_2();
   }
  }
}
```

and finally _THIS_ is the code passed to rustc. It's just code now, fully expanded and standalone. If a user ever changes their `struct MyValue` definition, the `ParquetRecordWriter` impl will be regenerated. There are no intermediate values to version control or worry about.

Viewing the Derived Code
===

To see the generated code before it's compiled, it is very useful to install `cargo expand` ([more info on gh](https://github.com/dtolnay/cargo-expand)); then you can do:

```
cd $WORK_DIR/parquet-rs/parquet_derive_test
cargo expand --lib > ../temp.rs
```

then you can view the contents:

```
struct DumbRecord {
    pub a_bool: bool,
    pub a2_bool: bool,
}
impl RecordWriter<DumbRecord> for &[DumbRecord] {
    fn write_to_row_group(
        &self,
        row_group_writer: &mut Box<parquet::file::writer::RowGroupWriter>,
    ) {
        let mut row_group_writer = row_group_writer;
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a_bool).collect();
            let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
            if let parquet::column::writer::ColumnWriter::BoolColumnWriter(ref mut typed) =
                column_writer
            {
                typed.write_batch(&vals[..], None, None).unwrap();
            }
            row_group_writer.close_column(column_writer).unwrap();
        };
        {
            let vals: Vec<bool> = self.iter().map(|x| x.a2_bool).collect();
            let mut column_writer = row_group_writer.next_column().unwrap().unwrap();
            if let parquet::column::writer::ColumnWriter::BoolColumnWriter(ref mut typed) =
                column_writer
            {
                typed.write_batch(&vals[..], None, None).unwrap();
            }
            row_group_writer.close_column(column_writer).unwrap();
        }
    }
}
```

Now I need to write out all the combinations of types we support and make sure it writes out data correctly.
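The shape of the generated impl (a trait implemented for a slice of records) can be exercised without the parquet crate at all. Here is a std-only mock where the "row group" is just a Vec of boolean columns; everything below is illustrative, not the real parquet API:

```rust
// Simplified stand-in for parquet's RowGroupWriter: each column is
// collected as a flat Vec<bool>.
#[derive(Default)]
struct MockRowGroup {
    columns: Vec<Vec<bool>>,
}

trait RecordWriter<T> {
    fn write_to_row_group(&self, row_group: &mut MockRowGroup);
}

struct DumbRecord {
    pub a_bool: bool,
    pub a2_bool: bool,
}

// Mirrors the derived impl above: one projection per field, in order.
impl<'a> RecordWriter<DumbRecord> for &'a [DumbRecord] {
    fn write_to_row_group(&self, row_group: &mut MockRowGroup) {
        row_group.columns.push(self.iter().map(|x| x.a_bool).collect());
        row_group.columns.push(self.iter().map(|x| x.a2_bool).collect());
    }
}

fn main() {
    let records = vec![
        DumbRecord { a_bool: true, a2_bool: false },
        DumbRecord { a_bool: true, a2_bool: true },
    ];
    let mut rg = MockRowGroup::default();
    (&records[..]).write_to_row_group(&mut rg);
    assert_eq!(rg.columns, vec![vec![true, true], vec![false, true]]);
}
```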

Procedural Macros
===

The `parquet_derive` crate can ONLY export the derivation functionality. No traits, nothing else. The derive crate cannot host test cases. It's kind of like a "dummy" crate which is only used by the compiler, never by application code.

The parent crate cannot use the derivation functionality, which is important because it means test code cannot be in the parent crate. This forces us to have a third crate, `parquet_derive_test`.
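A sketch of how the test crate might reference the other two (the paths here are illustrative, not the repository's actual layout):

```
# parquet_derive_test/Cargo.toml (illustrative)
[dependencies]
parquet = { path = "../parquet" }
parquet_derive = { path = "../parquet_derive" }
```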

I'm open to being wrong on any one of these finer points. I had to bang on this for a while to get it to compile!

Potentials For Better Design
===

 - [x] Recursion could be limited by generating the code as "snippets" instead of one big `quote!` AST generator. Or so I think. It might be nicer to push generating each column's writing code to another loop.
 - [X] ~~It would be nicer if I didn't have to be so picky about data going in to the `write_batch` function. Is it possible we could make a version of the function which accept `Into<DataType>` or similar? This would greatly simplify this derivation code as it would not need to enumerate all the supported types. Something like `write_generic_batch(&[impl Into<DataType>])` would be neat.~~ (not tackling in this generation of the plugin)
 - [X] ~~Another idea to improving writing columns, could we have a write function for `Iterator`s? I already have a `Vec<DumbRecord>`, if I could just write a mapping for accessing the one value, we could skip the whole intermediate vec for `write_batch`. Should have some significant memory advantages.~~ (not tackling in this generation of the plugin, it's a bigger parquet-rs enhancement)
 - [X] ~~It might be worthwhile to derive a parquet schema directly from a struct definition. That should stamp out opportunities for type errors.~~ (moved to apache#203)

Status
===

I have successfully integrated this work with my own data exporter (takes postgres/couchdb and outputs a single parquet file).

I think this code is worth including in the project, with the caveat that it only generates simplistic `RecordWriter`s. As people start to use it, we can add code generation for more complex, nested structs. We can convert the nested matching style to a fancier looping style. But for now, this explicit nesting is easier to debug and understand (to me at least!).

Closes apache#4140 from xrl/parquet_derive

Lead-authored-by: Xavier Lange <xrlange@gmail.com>
Co-authored-by: Neville Dipale <nevilledips@gmail.com>
Co-authored-by: Bryant Biggs <bryantbiggs@gmail.com>
Co-authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>