Add write method to Json Writer #1383

matthewmturner · 2022-03-02T06:14:03Z

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

codecov-commenter · 2022-03-02T06:27:57Z

Codecov Report

Merging #1383 (eb8bda4) into master (483a502) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head eb8bda4 differs from pull request most recent head bdfcc27. Consider uploading reports for the commit bdfcc27 to get more accurate results

@@            Coverage Diff             @@
##           master    #1383      +/-   ##
==========================================
+ Coverage   83.00%   83.02%   +0.01%     
==========================================
  Files         181      181              
  Lines       52994    53016      +22     
==========================================
+ Hits        43990    44015      +25     
+ Misses       9004     9001       -3

Impacted Files	Coverage Δ
arrow/src/json/writer.rs	`92.11% <100.00%> (+0.33%)`	⬆️
parquet_derive/src/parquet_field.rs	`66.21% <0.00%> (+0.22%)`	⬆️
arrow/src/datatypes/field.rs	`54.10% <0.00%> (+0.30%)`	⬆️
arrow/src/datatypes/datatype.rs	`66.80% <0.00%> (+0.39%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 483a502...bdfcc27.

alamb

LGTM -- thanks @matthewmturner

v1gnesh · 2022-03-04T04:18:50Z

Hello @alamb and @matthewmturner,

Any comments on how the work in these two projects overlaps with the JSON/Arrow work that happens here (in arrow-rs)?

https://github.com/chmp/serde_arrow
https://github.com/simd-lite/simd-json-derive

alamb · 2022-03-04T19:51:50Z

Hi @v1gnesh, I am not familiar with those crates. This is probably a question better directed at their authors (including whether there is anything they would like to contribute back to the arrow crate).

@chmp @Licenser

chmp · 2022-03-04T21:35:06Z

Sure, happy to sum up serde_arrow. First: at the moment it's only an experiment. I found it useful for some private data processing and thought it might also be helpful to others. Its current status is usable, but only the cases I have needed so far are handled.

The basic idea is to allow record batches to be used as a "file format" for serde. A typical way of using serde is to take some Rust structs, implement Serialize for them, and then generate JSON from these structs. serde_arrow lets you use the same structs but generate record batches instead. So you don't have to use the arrow builder API; you can simply derive serde::Serialize and then call serde_arrow::to_record_batch. The reverse process, reading structs from a record batch, is also supported. One complication is that the data models of serde and arrow do not match. For that reason, serde_arrow offers additional logic that lets the user specify how to translate the serde data model into the arrow one and back again.

For example:

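use serde::Serialize;

// Deriving Serialize is what lets serde_arrow walk the struct's fields.
#[derive(Serialize)]
struct Example {
    a: f32,
    b: i32,
}

let examples = vec![
    Example { a: 1.0, b: 1 },
    Example { a: 2.0, b: 2 },
];

// Infer a schema from the records, then build the record batch from it.
// The `?` operator assumes this runs inside a function that returns a Result.
let schema = serde_arrow::Schema::from_records(&examples)?;
let batch = serde_arrow::to_record_batch(&examples, schema)?;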

alamb · 2022-03-04T23:16:06Z

That is interesting -- @chmp -- I tried to do something similar with IOx (creating RecordBatches from Vecs of objects we already had and then exposing them as SQL-queryable tables).

chmp · 2022-03-05T11:59:56Z

That sounds awesome. Could you point me to the implementation for some inspiration? In case you're interested, I typed up an explanation of how serde_arrow works here.

alamb · 2022-03-05T12:21:46Z

@chmp this is as far as I got: https://github.com/influxdata/influxdb_iox/issues/1013#issuecomment-806741682

We eventually went with manual conversion to Arrow:
https://github.com/influxdata/influxdb_iox/blob/main/db/src/system_tables.rs
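
As a rough illustration of what a manual conversion like this involves, the sketch below walks a collection of row objects once and pushes each field into its own column. It uses only the standard library: `Row`, `Columns`, and `to_columns` are hypothetical names invented for this example and come from neither IOx nor the arrow crate. Real code would presumably use arrow's array builders in place of the plain `Vec`s.

```rust
/// Hypothetical row object standing in for an existing in-memory struct.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    name: String,
    count: Option<u64>,
}

/// Hypothetical columnar layout: one Vec per field, with a separate
/// validity vector marking which `count` entries are non-null.
#[derive(Debug, Default, PartialEq)]
struct Columns {
    name: Vec<String>,
    count: Vec<u64>,
    count_valid: Vec<bool>,
}

/// Transpose rows into columns. Accepts any iterable of row references,
/// so slices, Vecs, and filtered iterators all work.
fn to_columns<'a>(rows: impl IntoIterator<Item = &'a Row>) -> Columns {
    let mut cols = Columns::default();
    for row in rows {
        cols.name.push(row.name.clone());
        // Null slots still occupy a position; the value is a placeholder.
        cols.count.push(row.count.unwrap_or_default());
        cols.count_valid.push(row.count.is_some());
    }
    cols
}

fn main() {
    let rows = vec![
        Row { name: "cpu".to_string(), count: Some(3) },
        Row { name: "mem".to_string(), count: None },
    ];
    let cols = to_columns(&rows);
    println!("{:?}", cols);
}
```

Taking `impl IntoIterator` rather than a concrete `&[Row]` is a design choice made for the sketch: it keeps the conversion usable on iterators without first collecting them into a `Vec`.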

chmp · 2022-03-09T18:06:19Z

Thanks for the link. The code actually looks quite similar. Definitely going to borrow the idea of supporting iterables directly :)

alamb · 2022-03-10T11:34:10Z

Yeah, I would love a way to automatically create arrow arrays from something that implements serde -- so many cool projects to do, I just ran out of time ;)

Add write method

1467ca9

github-actions bot added the arrow (Changes to the arrow crate) label Mar 2, 2022

Add docs

bdfcc27

alamb approved these changes Mar 3, 2022

alamb merged commit 4642b3e into apache:master Mar 3, 2022

chmp mentioned this pull request May 1, 2022

Arrow-rs now generate schema & batches with json values of json file chmp/serde_arrow#8

Closed
