Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The current implementation of decoding JSON into Arrow arrays needs to convert `batch_size` JSON strings into `serde_json::Value`s first:

arrow-rs/arrow-json/src/reader.rs, line 685 (at e1b5657)

This requires a lot of memory for the `serde_json::Value`s, so a big `batch_size` can OOM, and a large `batch_size` is desirable because it usually yields a good compression rate.
Describe the solution you'd like
Current implementation in pseudocode:

```rust
for batch in value_iter {
    let mut rows: Vec<Value> = Vec::with_capacity(batch_size);
    // ... parse batch_size JSON strings into rows ...
    let arrays = convert_function(rows);
}
```
Converting only one JSON string at a time into a `serde_json::Value` would save 3x-5x memory or more (I didn't record the numbers carefully).
I have already implemented a version this way in our online product, because we use a large `batch_size`. The pseudocode is:
```rust
let mut field_builder: Vec<Box<dyn ArrayBuilder>> = create_array_builder(batch_size);
for (i, row) in value_iter.enumerate() {
    let value = serde_json::from_str(row);
    for (index, field) in schema.fields().iter().enumerate() {
        let col_name = field.name();
        // append this row's value for the column to that column's builder
        field_builder[index].append(value.get(col_name));
    }
    if i == batch_size {
        let array_refs = field_builder.iter_mut().map(|builder| builder.finish()).collect();
        // ... emit the arrays as a batch, then reset the builders ...
    }
}
```
This implementation didn't affect performance.
But it doesn't support deeply nested lists and maps.
I'm not sure this is an elegant way to do it, or whether it is possible to support deeply nested lists and maps this way.
If this is a good idea, I can try to make a PR for it.
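To make the idea concrete, here is a minimal self-contained sketch of the builder-per-column approach. It is not the actual reader.rs code: it assumes a recent arrow-rs API and only handles flat `Int64`/`Utf8` columns, which is exactly the nested-type limitation mentioned above.

```rust
use std::sync::Arc;

use arrow::array::{ArrayBuilder, ArrayRef, Int64Builder, StringBuilder};
use arrow::datatypes::{DataType, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;
use serde_json::Value;

/// Decode JSON rows into a RecordBatch one row at a time, so only a single
/// serde_json::Value is alive at any moment.
fn decode_rows(rows: &[&str], schema: Arc<Schema>) -> Result<RecordBatch, ArrowError> {
    // One builder per column; this sketch only covers flat Int64/Utf8.
    let mut builders: Vec<Box<dyn ArrayBuilder>> = schema
        .fields()
        .iter()
        .map(|f| match f.data_type() {
            DataType::Int64 => Box::new(Int64Builder::new()) as Box<dyn ArrayBuilder>,
            DataType::Utf8 => Box::new(StringBuilder::new()) as Box<dyn ArrayBuilder>,
            dt => unimplemented!("sketch only covers Int64/Utf8, got {dt:?}"),
        })
        .collect();

    for row in rows {
        // This Value is dropped at the end of each iteration, so peak memory
        // is one decoded document, not batch_size of them.
        let value: Value =
            serde_json::from_str(row).map_err(|e| ArrowError::JsonError(e.to_string()))?;
        for (index, field) in schema.fields().iter().enumerate() {
            let cell = value.get(field.name());
            match field.data_type() {
                DataType::Int64 => builders[index]
                    .as_any_mut()
                    .downcast_mut::<Int64Builder>()
                    .unwrap()
                    .append_option(cell.and_then(Value::as_i64)),
                DataType::Utf8 => builders[index]
                    .as_any_mut()
                    .downcast_mut::<StringBuilder>()
                    .unwrap()
                    .append_option(cell.and_then(Value::as_str)),
                _ => unreachable!(),
            }
        }
    }

    // ArrayBuilder::finish returns an ArrayRef, erasing the concrete builder type.
    let columns: Vec<ArrayRef> = builders.iter_mut().map(|b| b.finish()).collect();
    RecordBatch::try_new(schema, columns)
}
```

Because only one `serde_json::Value` exists at a time, peak memory no longer scales with `batch_size` times the decoded document size; supporting nested lists and maps would mean adding nested builders in the match, which is the open question.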
Describe alternatives you've considered
Additional context