Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Arbitrary JSON values in JSON Reader (#4905) #4911

Merged
merged 4 commits into from
Oct 12, 2023

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Oct 9, 2023

Which issue does this PR close?

Closes #4905

Rationale for this change

We should support decoding JSON payloads that don't have an object as the root.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 9, 2023
@tustvold
Copy link
Contributor Author

This does represent a fairly minor performance regression, but I'm not too concerned

small_bench_primitive   time:   [7.0711 µs 7.0744 µs 7.0778 µs]
                        change: [+2.9718% +3.2020% +3.6122%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

large_bench_primitive   time:   [2.4153 ms 2.4161 ms 2.4169 ms]
                        change: [+5.7843% +5.8318% +5.8805%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

small_bench_list        time:   [13.203 µs 13.211 µs 13.220 µs]
                        change: [+1.2264% +1.3806% +1.5824%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really solid to me -- thank you @tustvold

I had some documentation suggestions, but nothing I think that is required before merging

Comment on lines 24 to 26
//! The reader is agnostic to whitespace, including `\n` and `\r`, and will ignore such characters.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor suggestion to improve the wording here

Suggested change
//! The reader is agnostic to whitespace, including `\n` and `\r`, and will ignore such characters.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.
//! The reader ignores whitespace between JSON values, including `\n` and `\r`.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.

schema,
}
}

/// Create a new [`ReaderBuilder`] with the provided [`FieldRef`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create a new [`ReaderBuilder`] with the provided [`FieldRef`]
/// Create a new [`ReaderBuilder`] that will parse JSON values with a root schema of [`FieldRef`].

Perhaps we can add a note in new() that says it does require the root to be an object like {..}.

I wonder if new_from_field might be a more descriptive name (as this isn't making a new Field, it is making a new reader that reads data with the type on the field

false,
)?;
let (data_type, nullable) = match self.is_field {
false => (DataType::Struct(self.schema.fields.clone()), false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not allow null root fields with structs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because RecordBatch can't support nulls at the root level

@@ -297,7 +297,7 @@ macro_rules! next {
pub struct TapeDecoder {
elements: Vec<TapeElement>,

num_rows: usize,
cur_row: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cur_row: usize,
/// logical row being decoded
cur_row: usize,

@tustvold tustvold merged commit 90bc5ec into apache:master Oct 12, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for reading JSON Array to Arrow
2 participants