-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic JSON support #3
Conversation
@michaelmior thanks for the implementation - the JSON idea is very nice :) I have made some comments. Let me know if you need any help. If you wouldn't like to deal with it, I can take it over from you. |
@michaelmior I allowed myself to add my two cents here. Hope you don't mind :) Pushed commit that makes process_row allow to accept both &str and String. This allows to keep previous performance of CSV parsing. Also moved file_type test out of the loop just in case that could cause some regressions. It adds some code duplication, so possibly in the future we should refactor this portion. Especially, if we will want to add another file type. @piobab I would propose merging this as this PR after your final review. |
@kodieg Thanks! I just rebased and made one minor change. |
Removed two new clippy warnings regarding &Vec<...> refs in function arguments. Replaced with slices. |
If all else looks good, let me know if you want to rebase to clean up the messy history. |
src/pipeline.rs
Outdated
.map({ | ||
|c| { | ||
if !c.complex { | ||
smallvec![parsed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use other types for non-complex column as well (something similar as it is for arrays):
let elem = parsed.at_key(&c.name).unwrap();
let value = match elem.get_type() {
dom::element::ElementType::String => elem.get_string().unwrap(),
_ => elem.minify(),
};
smallvec![value]
After modification we can read such lines:
{"users": 1, "products": ["p1", "p2"], "brands": ["b1", "b2"]}
{"users": 2, "products": ["p2", "p3", "p4"], "brands": ["b1"]}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Although personally I would lean towards disallowing array values here since they should probably be treated as complex. This is why I only allowed strings initially. Although certainly other atomic values would make sense here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right but column config is here for "support". If there is some array it is treated as single entity and a. user see it in the output file. Otherwise the user should prepare json file with just strings.
@michaelmior I made a bench for TSV based on very big file - no regression. We will add benches to CI pipeline in other PR. I've made a review. Please take a look. After your changes I think we can merge. |
I've addressed a couple final issues and rebased into a single commit so I think this is ready to be merged. |
I'd like to be able to use Cleora for JSON data, so this is an attempt to implement support.
I'm not an experienced Rustacean, so there's definitely some cruft here to be removed, but it's basically working.
There is now a file type parameter that can be passed in to parse input data as JSON.
Note that I've removed any mention of CSV since it doesn't actually seem to be supported, but correct if I'm wrong.
If this seems like something you may eventually want to merge, happy to take suggestions on improvements