Add support for down-converting ion to JSON #310
This commit adds functionality to the Ion text writer to allow a basic, lossy conversion to JSON encoding. An existing issue that needs some consensus is how multiple top-level values are handled. With this commit, multiple top-level values are serialized with no delimiter. This is supported by some JSON parsers, such as the one used by the tool jq, but may be an issue for others. A simple, hacky fix would be to place all top-level values into a list.
The AppVeyor build is failing with:
The error returned (value: 7) is
The macOS build with gcc-11 failed due to an assertion in the linker... which is... neat.
The macOS issue was due to a bug in Xcode 14.0.1. I'm running 14.1 and was unable to reproduce the issue, so I've added a build step for macOS builds in the GH Actions workflow that sets the default Xcode to 14.1 (it is installed in the image, just not used by default). That seems to have fixed the macOS issue... now on to the Windows issue.
Reproduced the issue in the Windows build and tracked it down to a misplaced
Nice. Minor comments inline.
Regarding how multiple top-level values should be handled, I'm fine with the behavior as-is: newlines separate top-level values when pretty is enabled, but not when pretty is disabled. If users need to use the output with a JSON parser that doesn't support multiple top-level values, they can easily accommodate that manually, e.g. by wrapping everything in a single list.
We could add a lines format option, but I see that as orthogonal to this change, as it could apply to both text Ion and JSON. I'm aware of use cases where "Ion lines" has come in handy.
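For illustration, here is a small standalone sketch of the three framings discussed for multiple top-level values: no delimiter, one value per line (as with JSON Lines or a lines option), and everything wrapped in a single list. It works on already-serialized JSON text; frame_mode, frame_values and append are hypothetical names, not part of ion-c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical framing modes for a stream of top-level JSON values. */
typedef enum {
    FRAME_CONCATENATED, /* values back to back with no delimiter */
    FRAME_LINES,        /* one value per line, JSON Lines style */
    FRAME_ARRAY         /* every value wrapped in one top-level list */
} frame_mode;

/* Appends s to out, keeping it NUL-terminated; fails if it would not fit. */
static int append(char *out, size_t out_size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len + 1 > out_size) {
        return -1;
    }
    memcpy(out + *used, s, len + 1);
    *used += len;
    return 0;
}

/* Joins already-serialized JSON values according to mode.
   Returns the output length, or -1 if out is too small. */
static int frame_values(const char *const *values, size_t count,
                        frame_mode mode, char *out, size_t out_size)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (mode == FRAME_ARRAY && append(out, out_size, &used, "[") != 0) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        if (mode == FRAME_ARRAY && i > 0 && append(out, out_size, &used, ",") != 0) {
            return -1;
        }
        if (append(out, out_size, &used, values[i]) != 0) {
            return -1;
        }
        if (mode == FRAME_LINES && append(out, out_size, &used, "\n") != 0) {
            return -1;
        }
    }
    if (mode == FRAME_ARRAY && append(out, out_size, &used, "]") != 0) {
        return -1;
    }
    return (int)used;
}
```

One caveat if top-level scalars are written with no delimiter at all: adjacent numbers such as 1 and 2 would come out as 12, which any parser reads as a single value. The lines and list framings avoid that ambiguity.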
if (json_downconvert)
    ION_PUT(pwriter->output, '"');
IONCHECK(_ion_writer_text_append_ascii_cstr(pwriter->output, temp));
if (json_downconvert)
    ION_PUT(pwriter->output, '"');
We haven't defined a formal style guide for this repo, but let's use braces for every if (for all new code, anyway).
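As a sketch of the requested style, here is the quoting logic from the first excerpt above rewritten with braces in a self-contained form. out_buf, put_char, put_str and write_text_value are hypothetical stand-ins for the writer's output stream, ION_PUT and the append helper, and the text is assumed to contain nothing that needs escaping.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the writer's output stream. */
typedef struct {
    char data[128];
    size_t len;
} out_buf;

static void put_char(out_buf *b, char c)
{
    assert(b->len + 1 < sizeof b->data); /* leave room for the NUL */
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
}

static void put_str(out_buf *b, const char *s)
{
    size_t n = strlen(s);
    assert(b->len + n < sizeof b->data);
    memcpy(b->data + b->len, s, n + 1);
    b->len += n;
}

/* Writes a value's text form, wrapped in double quotes when down-converting
   to JSON. Assumes text contains nothing that needs escaping. */
static void write_text_value(out_buf *b, const char *text, bool json_downconvert)
{
    if (json_downconvert) {
        put_char(b, '"');
    }
    put_str(b, text);
    if (json_downconvert) {
        put_char(b, '"');
    }
}
```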
if (pwriter->depth == 0 && pwriter->annotation_count == 0 && pstr->value[0] == '$'
        && _ion_symbol_table_parse_version_marker(pstr, NULL, NULL)) {
    // The text $ion_<int>_<int> is reserved for the IVMs. This is a no-op.
    SUCCEED();
}
else {
    char quote = ION_TEXT_WRITER_IS_JSON() ? '"':'\'';
Can use down_convert here for consistency.
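The IVM check in the excerpt above only drops $ion_<int>_<int> symbols at depth 0 with no annotations. A rough standalone approximation of that condition follows; looks_like_ivm and should_skip_symbol are hypothetical helpers, and the shape test is an assumption about what _ion_symbol_table_parse_version_marker accepts, not its implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Consumes one or more ASCII digits starting at *p; false if there are none. */
static bool skip_digits(const char **p)
{
    const char *start = *p;
    while (isdigit((unsigned char)**p)) {
        (*p)++;
    }
    return *p != start;
}

/* True if text has the reserved shape $ion_<int>_<int>. */
static bool looks_like_ivm(const char *text)
{
    const char *p = text;

    assert(text != NULL);
    if (strncmp(p, "$ion_", 5) != 0) {
        return false;
    }
    p += 5;
    if (!skip_digits(&p) || *p != '_') {
        return false;
    }
    p++;
    return skip_digits(&p) && *p == '\0';
}

/* Only an unannotated IVM-shaped symbol at the top level is dropped;
   anywhere else it is written like any other symbol. */
static bool should_skip_symbol(int depth, int annotation_count, const char *text)
{
    return depth == 0 && annotation_count == 0 && looks_like_ivm(text);
}
```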
{
    iENTER;
    SIZE written;
    char quote_char = (pwriter->options.json_downconvert) ? '"' : '\'';
ION_TEXT_WRITER_IS_JSON() for consistency.
ION_PUT(poutput, '\'');
if (as_ascii) {
    IONCHECK(_ion_writer_text_append_escaped_string(poutput, p_str, '\''));
if (_ion_symbol_needs_quotes(p_str, system_identifiers_need_quotes) || pwriter->options.json_downconvert) {
ION_TEXT_WRITER_IS_JSON() for consistency.
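Taken together, the quote_char and needs-quotes changes in the excerpts above mean that in JSON mode every symbol is written as a double-quoted string, while Ion text only single-quotes symbols that need it. The sketch below assumes a simplified quoting rule (identifier characters, not a reserved keyword; symbol IDs like $10 are ignored) and escapes only backslashes and the active quote character. symbol_needs_quotes, put and write_symbol are hypothetical, not ion-c's _ion_symbol_needs_quotes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Simplified, assumed rule: identifier-like symbols that are not keywords
   may be written bare in Ion text; everything else needs quotes. */
static bool symbol_needs_quotes(const char *s)
{
    static const char *const keywords[] = {"null", "true", "false", "nan"};

    if (!(isalpha((unsigned char)s[0]) || s[0] == '_' || s[0] == '$')) {
        return true; /* includes the empty symbol */
    }
    for (const char *p = s + 1; *p != '\0'; p++) {
        if (!(isalnum((unsigned char)*p) || *p == '_' || *p == '$')) {
            return true;
        }
    }
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(s, keywords[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Appends c to out, keeping it NUL-terminated; fails if out is full. */
static bool put(char *out, size_t out_size, size_t *n, char c)
{
    if (*n + 1 >= out_size) {
        return false;
    }
    out[(*n)++] = c;
    out[*n] = '\0';
    return true;
}

/* JSON mode: always a double-quoted string. Ion mode: single quotes only
   when needed. Returns false if out is too small. */
static bool write_symbol(char *out, size_t out_size, const char *s, bool is_json)
{
    const char quote = is_json ? '"' : '\'';
    const bool quoted = is_json || symbol_needs_quotes(s);
    size_t n = 0;

    assert(out != NULL && out_size > 0 && s != NULL);
    out[0] = '\0';
    if (quoted && !put(out, out_size, &n, quote)) {
        return false;
    }
    for (const char *p = s; *p != '\0'; p++) {
        if (quoted && (*p == quote || *p == '\\') && !put(out, out_size, &n, '\\')) {
            return false;
        }
        if (!put(out, out_size, &n, *p)) {
            return false;
        }
    }
    return !quoted || put(out, out_size, &n, quote);
}
```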
if (!down_convert)
    image = _ion_writer_get_control_escape_string(c);
else
    image = _ion_writer_get_control_escape_string_json(c);
I have a minor preference to swap the bodies and lose the negation. Disregard if this style was chosen deliberately.
Yeah, that wasn't done for any particular reason; it definitely reads better without the negation.
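With the bodies swapped as suggested, the JSON branch comes first. The standalone sketch below also shows why a separate JSON table is needed: JSON (RFC 8259) only defines \b, \t, \n, \f, \r and \uXXXX for control characters, while the Ion branch here assumes Ion text's additional \0, \a, \v and \xHH forms. control_escape is a hypothetical helper, not ion-c's _ion_writer_get_control_escape_string(_json).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes the escape for control character c (U+0000..U+001F) into buf and
   returns buf. buf must hold at least 7 bytes ("\u001f" plus the NUL). */
static const char *control_escape(unsigned char c, bool down_convert,
                                  char *buf, size_t buf_size)
{
    const char *shared = NULL;

    assert(c < 0x20 && buf != NULL && buf_size >= 7);
    switch (c) { /* short escapes that Ion text and JSON both define */
    case '\b': shared = "\\b"; break;
    case '\t': shared = "\\t"; break;
    case '\n': shared = "\\n"; break;
    case '\f': shared = "\\f"; break;
    case '\r': shared = "\\r"; break;
    default: break;
    }
    if (shared != NULL) {
        strcpy(buf, shared);
    } else if (down_convert) {
        /* JSON has no \0, \a, \v or \xHH, so fall back to a \u escape. */
        snprintf(buf, buf_size, "\\u%04x", (unsigned)c);
    } else if (c == 0x00) {
        strcpy(buf, "\\0");
    } else if (c == 0x07) {
        strcpy(buf, "\\a");
    } else if (c == 0x0B) {
        strcpy(buf, "\\v");
    } else {
        snprintf(buf, buf_size, "\\x%02x", (unsigned)c);
    }
    return buf;
}
```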
IONJSON_CMP("-inf", "null");
IONJSON_CMP("1.0e0", "1");
IONJSON_CMP("1.5e0", "1.5");
IONJSON_CMP("1e-5", "1e-05");
That's... interesting? Is the 05 necessary?
The exponent of -5 is needed to force the float to be rendered in exponential notation. The literal 05 in the string isn't required by anything; it's just how ion-c currently renders exponents to text.
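To make the snprintf behavior concrete, here is a minimal float down-conversion sketch. JSON cannot represent NaN or infinity, so those become null, matching the -inf test above. Finite values use %g with an assumed precision of 15 significant digits (ion-c's actual format string may differ); format_json_float is a hypothetical name. With %g, C switches to exponential notation when the decimal exponent is below -4 (or at least the precision), drops trailing zeros, and always prints at least two exponent digits, which is where the 05 comes from.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Formats d as JSON number text into out; returns the snprintf result. */
static int format_json_float(double d, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    if (isnan(d) || isinf(d)) {
        /* JSON has no NaN or infinity literals. */
        return snprintf(out, out_size, "null");
    }
    /* %g: exponential form only when the decimal exponent is < -4 or >= the
       precision; trailing zeros are dropped; exponents get at least 2 digits.
       The precision of 15 is an assumption, not ion-c's actual setting. */
    return snprintf(out, out_size, "%.15g", d);
}
```

With this sketch, 1e-4 still prints as 0.0001 while 1e-5 prints as 1e-05. Using %.17g instead would guarantee that every double round-trips, at the cost of output such as 0.10000000000000001 for 0.1.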
IONJSON_CMP("1d0", "1");
IONJSON_CMP("1.", "1");
IONJSON_CMP("1d-0", "1");
IONJSON_CMP("0d1", "0E+1");
Looks like we're using lowercase e for some, and uppercase E for others?
Yeah, the way decimals are rendered differs from how floats are. The decNumber library handles the decimals, while we're using snprintf for floats. I'm adding a task to the plan to get both following the same rules. Negligible priority, I'd imagine, but annoying nonetheless.
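For comparison, decNumber implements the to-scientific-string rule from the General Decimal Arithmetic specification, which is where the uppercase E comes from. A simplified sketch of that rule is below, assuming the coefficient fits in a long long (real Ion decimals have arbitrary precision, and negative zero is ignored here); decimal_to_sci_string is a hypothetical name, not ion-c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Renders coefficient * 10^exponent following the to-scientific-string rule.
   Returns false if out is too small. */
static bool decimal_to_sci_string(long long coefficient, int exponent,
                                  char *out, size_t out_size)
{
    static const char zeros[] = "000000"; /* at most 5 are ever needed */
    char digits[24];
    const char *sign = coefficient < 0 ? "-" : "";
    unsigned long long mag = coefficient < 0
        ? 0ULL - (unsigned long long)coefficient
        : (unsigned long long)coefficient;
    int ndigits = snprintf(digits, sizeof digits, "%llu", mag);
    int adjusted = exponent + ndigits - 1; /* exponent if written as d.ddd */
    int written;

    assert(out != NULL && out_size > 0);
    if (exponent <= 0 && adjusted >= -6) {
        int point = ndigits + exponent; /* digits left of the decimal point */
        if (exponent == 0) {
            written = snprintf(out, out_size, "%s%s", sign, digits);
        } else if (point > 0) {
            written = snprintf(out, out_size, "%s%.*s.%s", sign, point, digits, digits + point);
        } else {
            written = snprintf(out, out_size, "%s0.%.*s%s", sign, -point, zeros, digits);
        }
    } else {
        /* Exponential form: first digit, optional fraction, signed exponent. */
        written = snprintf(out, out_size, "%s%c%s%.*sE%+d", sign, digits[0],
                           ndigits > 1 ? "." : "", ndigits - 1, digits + 1, adjusted);
    }
    return written >= 0 && (size_t)written < out_size;
}
```

Under this rule, 1d0, 1. and 1d-0 all have coefficient 1 and exponent 0 and render as 1, while 0d1 has a positive exponent and renders as 0E+1, matching the test expectations above. Since the JSON number grammar accepts both e and E, the mismatch is cosmetic rather than a correctness problem.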
Thank you @tgregg! I'm going to merge this tonight and follow up with a PR for the changes mentioned above if no one has any arguments against that.
Issue #, if available: n/a
Description of changes:
This PR adds functionality to the Ion text writer to allow a basic, lossy conversion to JSON encoding. The down-conversion is done in the same manner described in the cookbook, with a couple of minor differences that will be corrected soon.
The most noticeable difference is the range of characters that get escaped. Currently, every non-ASCII character is emitted as a Unicode escape sequence. I'll follow up with a fix for that soon; I have identified the issue and just need to implement the fix and test it.
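For reference, RFC 8259 only requires escaping the quotation mark, the backslash and U+0000 through U+001F, so other non-ASCII characters may be emitted as raw UTF-8. When a character is escaped anyway, code points above U+FFFF must be written as a UTF-16 surrogate pair. A minimal sketch of both rules follows; json_must_escape and json_unicode_escape are hypothetical helpers, not ion-c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* RFC 8259 only requires escaping '"', '\\' and U+0000..U+001F. */
static bool json_must_escape(uint32_t cp)
{
    return cp < 0x20 || cp == '"' || cp == '\\';
}

/* Writes cp as \uXXXX, or as a UTF-16 surrogate pair above U+FFFF.
   Returns the number of characters written (6 or 12), or -1 on failure. */
static int json_unicode_escape(uint32_t cp, char *out, size_t out_size)
{
    int n;

    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1; /* not a Unicode scalar value */
    }
    if (cp <= 0xFFFF) {
        n = snprintf(out, out_size, "\\u%04X", (unsigned)cp);
    } else {
        uint32_t v = cp - 0x10000;                     /* 20 significant bits */
        unsigned high = 0xD800 + (unsigned)(v >> 10);  /* top 10 bits */
        unsigned low = 0xDC00 + (unsigned)(v & 0x3FF); /* bottom 10 bits */
        n = snprintf(out, out_size, "\\u%04X\\u%04X", high, low);
    }
    return (n >= 0 && (size_t)n < out_size) ? n : -1;
}
```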
An existing issue that needs some consensus is how multiple top-level values are handled. With this commit, multiple top-level values are serialized with no delimiter. This is supported by some JSON parsers, such as the one used by the tool jq, but may be an issue for others. A simple, hacky fix would be to place all top-level values into a list, but I didn't feel comfortable with that implementation unless it is truly needed. Another option might be to use JSON Lines (the website seems to be down).
The changes add a new option to ION_WRITER_OPTIONS called json_downconvert which, when true, formats any output written to the writer using the JSON-friendly formatting described in the cookbook link (minus the notes above). An example of converting Ion data to JSON can be found in the convert_to_json function in test/test_ion_text.cpp.
This PR also extends the CLI to allow the output formats json and json-pretty, which perform the down-conversion with pretty printing off and on, respectively. Examples of using ion process to perform the down-conversion have also been added to the help.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.