Add write method to Json Writer #1383

Merged
merged 2 commits into from Mar 3, 2022

Conversation

matthewmturner
Contributor

Which issue does this PR close?

Closes #1340

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?
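
The template sections above were left empty. Going by the PR title and the linked issue ("Add write method to JsonWriter"), the change adds a method for writing a single batch. The following is an illustration only: a std-only sketch of how such a single-batch `write` convenience can delegate to a multi-batch method. `Batch`, `JsonLinesWriter`, and `write_batches` here are hypothetical stand-ins, not the arrow crate's types or signatures.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a record batch: rows of (column name, value) pairs.
/// arrow's RecordBatch is columnar; this row layout just keeps the sketch short.
struct Batch {
    rows: Vec<Vec<(String, i64)>>,
}

/// Hypothetical line-delimited JSON writer over any `io::Write` sink.
struct JsonLinesWriter<W: Write> {
    out: W,
}

impl<W: Write> JsonLinesWriter<W> {
    fn new(out: W) -> Self {
        Self { out }
    }

    /// Write every row of every batch as one JSON object per line.
    /// Assumes column names need no JSON escaping.
    fn write_batches(&mut self, batches: &[Batch]) -> io::Result<()> {
        for batch in batches {
            for row in &batch.rows {
                let fields: Vec<String> = row
                    .iter()
                    .map(|(name, value)| format!("\"{}\":{}", name, value))
                    .collect();
                writeln!(self.out, "{{{}}}", fields.join(","))?;
            }
        }
        Ok(())
    }

    /// Single-batch convenience: view `batch` as a one-element slice and
    /// delegate, so both entry points share one code path.
    fn write(&mut self, batch: &Batch) -> io::Result<()> {
        self.write_batches(std::slice::from_ref(batch))
    }

    /// Hand back the underlying sink (e.g. to inspect a `Vec<u8>` buffer).
    fn into_inner(self) -> W {
        self.out
    }
}

fn main() -> io::Result<()> {
    let batch = Batch {
        rows: vec![
            vec![("a".to_string(), 1), ("b".to_string(), 2)],
            vec![("a".to_string(), 3), ("b".to_string(), 4)],
        ],
    };
    let mut writer = JsonLinesWriter::new(Vec::new());
    writer.write(&batch)?;
    let text = String::from_utf8(writer.into_inner()).expect("writer only emits UTF-8");
    // Prints two lines: {"a":1,"b":2} and {"a":3,"b":4}
    print!("{text}");
    Ok(())
}
```

Using `std::slice::from_ref` avoids cloning the batch just to call the slice-based method.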

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 2, 2022
@codecov-commenter

Codecov Report

Merging #1383 (eb8bda4) into master (483a502) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head eb8bda4 differs from the pull request's most recent head bdfcc27. Consider uploading reports for the commit bdfcc27 to get more accurate results.

@@            Coverage Diff             @@
##           master    #1383      +/-   ##
==========================================
+ Coverage   83.00%   83.02%   +0.01%     
==========================================
  Files         181      181              
  Lines       52994    53016      +22     
==========================================
+ Hits        43990    44015      +25     
+ Misses       9004     9001       -3     
Impacted Files                          Coverage Δ
arrow/src/json/writer.rs                92.11% <100.00%> (+0.33%) ⬆️
parquet_derive/src/parquet_field.rs     66.21% <0.00%> (+0.22%) ⬆️
arrow/src/datatypes/field.rs            54.10% <0.00%> (+0.30%) ⬆️
arrow/src/datatypes/datatype.rs         66.80% <0.00%> (+0.39%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 483a502...bdfcc27.
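
The coverage numbers in the report can be cross-checked from its raw counts. Coverage is hits / lines, every tracked line is either a hit or a miss (43990 + 9004 = 52994), and the +0.01% delta comes from the exact ratios rather than from the rounded percentages (83.02 - 83.00 would suggest +0.02%). Below is a std-only sketch that works in integer hundredths of a percent. It assumes Codecov rounds down to two decimals, which is consistent with 43990/52994 = 83.0094% being shown as 83.00%.

```rust
/// Coverage in hundredths of a percent, rounded down: floor(hits / lines * 10_000).
fn coverage_hundredths(hits: i128, lines: i128) -> i128 {
    (hits * 10_000).div_euclid(lines)
}

/// Change in coverage between two reports, in hundredths of a percent, computed
/// from the exact ratios (not from the already-rounded percentages).
fn delta_hundredths(base: (i128, i128), head: (i128, i128)) -> i128 {
    let (h1, l1) = base;
    let (h2, l2) = head;
    // h2/l2 - h1/l1 = (h2*l1 - h1*l2) / (l1*l2); div_euclid floors for a positive divisor.
    ((h2 * l1 - h1 * l2) * 10_000).div_euclid(l1 * l2)
}

/// Render a non-negative number of hundredths of a percent, e.g. 8300 -> "83.00%".
fn fmt_pct(hundredths: i128) -> String {
    format!("{}.{:02}%", hundredths / 100, hundredths % 100)
}

fn main() {
    let master = (43_990, 52_994); // (hits, lines) on master
    let pr = (44_015, 53_016); // (hits, lines) on #1383

    // Misses are the lines that are not hits; the PR adds 22 tracked lines.
    assert_eq!(master.1 - master.0, 9_004);
    assert_eq!(pr.1 - pr.0, 9_001);
    assert_eq!(pr.1 - master.1, 22);

    println!("master: {}", fmt_pct(coverage_hundredths(master.0, master.1))); // 83.00%
    println!("#1383:  {}", fmt_pct(coverage_hundredths(pr.0, pr.1))); // 83.02%
    println!("delta: +{}", fmt_pct(delta_hundredths(master, pr))); // +0.01%
}
```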

Contributor

@alamb alamb left a comment

LGTM -- thanks @matthewmturner

@alamb alamb merged commit 4642b3e into apache:master Mar 3, 2022
@v1gnesh

v1gnesh commented Mar 4, 2022

Hello @alamb and @matthewmturner,

Do you have any thoughts on how the work in these two projects overlaps with the JSON/Arrow work happening here in arrow-rs:

https://github.com/chmp/serde_arrow
https://github.com/simd-lite/simd-json-derive

@alamb
Contributor

alamb commented Mar 4, 2022

Hi @v1gnesh, I am not familiar with those crates. This question is probably better directed at their authors (as is whether there is anything they would want to contribute back to the arrow crate).

@chmp @Licenser

@chmp

chmp commented Mar 4, 2022

Sure, happy to sum up serde_arrow. First: at the moment it's only an experiment. I found it useful for some private data processing and thought it might be helpful to others as well. It is usable right now, but it only handles the cases I have needed so far.

The basic idea is to allow record batches to be used as a "file format" for serde. A typical way of using serde is to take some Rust structs, implement Serialize for them, and then generate JSON from these structs. serde_arrow lets you use the same structs to generate record batches instead. So you don't have to use the arrow builder API; you can simply derive serde::Serialize and then call serde_arrow::to_record_batch. The reverse process, reading structs from a record batch, is also supported. One complication is that the data models of serde and arrow do not match. For that reason, serde_arrow offers additional logic that lets the user specify how to translate the serde data model into the arrow one and back again.

For example:

use serde::Serialize;

#[derive(Serialize)]
struct Example {
    a: f32,
    b: i32,
}

let examples = vec![
    Example { a: 1.0, b: 1 },
    Example { a: 2.0, b: 2 },
];

// Derive a schema from the records, then serialize them into a record batch.
let schema = serde_arrow::Schema::from_records(&examples)?;
let batch = serde_arrow::to_record_batch(&examples, schema)?;
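
At its core, the conversion described above is a change of layout. Serde visits one record at a time, while a record batch stores one array per field. The following is a hand-written, std-only sketch of that transposition for the `Example` struct, in both directions. Plain `Vec`s stand in for arrow arrays, and `ExampleColumns`, `to_columns`, and `from_columns` are hypothetical names, not serde_arrow's API. serde_arrow derives this mapping through serde instead of hand-coding it per struct.

```rust
/// Same shape as the `Example` struct in the comment above (serde derive omitted).
#[derive(Debug, PartialEq)]
struct Example {
    a: f32,
    b: i32,
}

/// Hypothetical columnar container: one Vec per field, like the columns of a batch.
#[derive(Debug, Default, PartialEq)]
struct ExampleColumns {
    a: Vec<f32>,
    b: Vec<i32>,
}

/// Rows -> columns: push each field of each record onto its own column.
fn to_columns(rows: &[Example]) -> ExampleColumns {
    let mut cols = ExampleColumns::default();
    for row in rows {
        cols.a.push(row.a);
        cols.b.push(row.b);
    }
    cols
}

/// Columns -> rows: the reverse direction, zipping the i-th entry of each column.
fn from_columns(cols: &ExampleColumns) -> Vec<Example> {
    assert_eq!(cols.a.len(), cols.b.len(), "columns must have equal length");
    cols.a
        .iter()
        .zip(&cols.b)
        .map(|(&a, &b)| Example { a, b })
        .collect()
}

fn main() {
    let examples = vec![Example { a: 1.0, b: 1 }, Example { a: 2.0, b: 2 }];
    let cols = to_columns(&examples);
    assert_eq!(cols.a, vec![1.0, 2.0]);
    assert_eq!(cols.b, vec![1, 2]);
    // Round trip: reading the columns back yields the original records.
    assert_eq!(from_columns(&cols), examples);
}
```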

@alamb
Contributor

alamb commented Mar 4, 2022

That is interesting -- @chmp -- I tried to do something similar with IOx (to create RecordBatches from vecs of objects we already had and then expose them as SQL-queryable tables)

@chmp

chmp commented Mar 5, 2022

That sounds awesome! Could you point me to the impl for some inspiration? In case you're interested, I typed up an explanation of how serde_arrow works here.

@alamb
Contributor

alamb commented Mar 5, 2022

@chmp this is as far as I got: https://github.com/influxdata/influxdb_iox/issues/1013#issuecomment-806741682

We eventually went with manual conversion to Arrow:
https://github.com/influxdata/influxdb_iox/blob/main/db/src/system_tables.rs

@chmp

chmp commented Mar 9, 2022

Thanks for the link. The code actually looks quite similar. Def. gonna borrow the idea of supporting iterables directly :)
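
In Rust, "supporting iterables directly" usually means accepting `impl IntoIterator<Item = T>` instead of a slice. Callers can then pass a `Vec`, an array, or a lazy adapter chain without collecting it first. The following is a std-only sketch with a hypothetical `Record` type and `to_columns` function, not serde_arrow's or IOx's actual code. For two fields, `Iterator::unzip` splits the stream of pairs into two columns in one pass.

```rust
/// Hypothetical record type for this sketch.
struct Record {
    id: u32,
    name: String,
}

/// Accept any iterable of records (Vec, array, iterator chain, ...) rather than
/// requiring `&[Record]`, and build one column per field in a single pass.
fn to_columns<I>(records: I) -> (Vec<u32>, Vec<String>)
where
    I: IntoIterator<Item = Record>,
{
    records.into_iter().map(|r| (r.id, r.name)).unzip()
}

fn main() {
    // A lazy iterator chain: no intermediate Vec<Record> is built.
    let (ids, names) = to_columns((1..=3).map(|i| Record { id: i, name: format!("row{i}") }));
    assert_eq!(ids, vec![1, 2, 3]);
    assert_eq!(names, vec!["row1", "row2", "row3"]);
}
```

With more than two fields, `unzip` no longer fits directly, and each column would be pushed to explicitly inside a loop over the iterator.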

@alamb
Contributor

alamb commented Mar 10, 2022

Yeah, I would love a way to automatically create arrow arrays from anything that implements serde -- so many cool projects to do, I just ran out of time ;)

Labels
arrow Changes to the arrow crate
Development

Successfully merging this pull request may close these issues.

Add write method to JsonWriter
5 participants