Add a 'lines' text writer format #415

Merged
merged 8 commits into from
Oct 3, 2022

Conversation

Contributor

@jobarr-amzn jobarr-amzn commented Sep 9, 2022

Description of changes:
I needed to generate newline-centric Ion data from binary and didn't have an easy way to do it, so I ended up doing something more involved with regular expressions, a small Perl script, and reasoning about indentation. This is the capability that I wish I had then.

I've also added a space_between_top_values parameter to the RawTextWriterBuilder so that top-level values can be handled differently, and extracted a heap-allocated WhitespaceConfig struct as suggested in the comment around raw text writer whitespace configuration.

Please let me know wherever this can be improved; my Rust is as yet very unoxidized.

Representatives of each format, from the documentation:

default

{foo: 1, bar: 2, baz: 3} [1, 2, 3] true "hello"

lines

{foo: 1, bar: 2, baz: 3}
[1, 2, 3]
true
"hello"

pretty

{
    foo: 1,
    bar: 2,
    baz: 3
}
[
    1,
    2,
    3
]
true
"hello"

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
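
The three formats shown in the description can be modeled as whitespace settings that differ only in a few strings. The following is a minimal sketch under stated assumptions, not the ion-rs implementation: `WhitespaceConfig`, its field names, the `Value` enum, and `write_top_level` are hypothetical stand-ins, and reusing `space_after_container_start` before a closing delimiter is a simplification made for this example.

```rust
/// Hypothetical, simplified whitespace settings; not the ion-rs API.
pub struct WhitespaceConfig {
    pub space_between_top_level_values: &'static str,
    pub space_between_nested_values: &'static str,
    pub indentation: &'static str,
    pub space_after_field_name: &'static str,
    pub space_after_container_start: &'static str,
}

pub const COMPACT: WhitespaceConfig = WhitespaceConfig {
    space_between_top_level_values: " ",
    space_between_nested_values: " ",
    indentation: "",
    space_after_field_name: " ",
    space_after_container_start: "",
};

/// Same as COMPACT, except that each top-level value starts a new line.
pub const LINES: WhitespaceConfig = WhitespaceConfig {
    space_between_top_level_values: "\n",
    space_between_nested_values: " ",
    indentation: "",
    space_after_field_name: " ",
    space_after_container_start: "",
};

pub const PRETTY: WhitespaceConfig = WhitespaceConfig {
    space_between_top_level_values: "\n",
    space_between_nested_values: "\n",
    indentation: "    ",
    space_after_field_name: " ",
    space_after_container_start: "\n",
};

/// Toy data model covering only what the sample output needs.
pub enum Value {
    Int(i64),
    Bool(bool),
    Text(&'static str),
    List(Vec<Value>),
    Struct(Vec<(&'static str, Value)>),
}

/// Writes `open`, each item (comma-delimited, spaced per `ws`), then `close`.
fn write_container<T>(
    out: &mut String,
    ws: &WhitespaceConfig,
    depth: usize,
    (open, close): (char, char),
    items: &[T],
    write_item: impl Fn(&mut String, &T),
) {
    out.push(open);
    for (n, item) in items.iter().enumerate() {
        if n == 0 {
            out.push_str(ws.space_after_container_start);
        } else {
            out.push(','); // syntactic delimiter
            out.push_str(ws.space_between_nested_values); // presentation only
        }
        out.push_str(&ws.indentation.repeat(depth + 1));
        write_item(out, item);
    }
    if !items.is_empty() {
        // Simplification: mirror the opening whitespace before the closer.
        out.push_str(ws.space_after_container_start);
        out.push_str(&ws.indentation.repeat(depth));
    }
    out.push(close);
}

fn write_value(out: &mut String, ws: &WhitespaceConfig, depth: usize, value: &Value) {
    match value {
        Value::Int(i) => out.push_str(&i.to_string()),
        Value::Bool(b) => out.push_str(&b.to_string()),
        Value::Text(s) => {
            out.push('"');
            out.push_str(s);
            out.push('"');
        }
        Value::List(items) => write_container(out, ws, depth, ('[', ']'), items, |out, item| {
            write_value(out, ws, depth + 1, item)
        }),
        Value::Struct(fields) => write_container(out, ws, depth, ('{', '}'), fields, |out, field| {
            out.push_str(field.0);
            out.push(':');
            out.push_str(ws.space_after_field_name);
            write_value(out, ws, depth + 1, &field.1)
        }),
    }
}

/// Writes a stream of top-level values separated only by whitespace.
pub fn write_top_level(ws: &WhitespaceConfig, values: &[Value]) -> String {
    let mut out = String::new();
    for (n, value) in values.iter().enumerate() {
        if n > 0 {
            out.push_str(ws.space_between_top_level_values);
        }
        write_value(&mut out, ws, 0, value);
    }
    out
}
```

In this sketch, calling `write_top_level` with `COMPACT`, `LINES`, or `PRETTY` on the same four values reproduces the three layouts from the description. The only difference between `COMPACT` and `LINES` is the top-level separator.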

@jobarr-amzn
Contributor Author

I've already patched ion-cli to support --format newline as well, and will fire that PR out after this ships. Happy to take suggestions on the newline name as well. oneline? Something else?

    }
} else {
    // Otherwise, this is not the first value in this container. Emit the container's
    // delimiter (for example: in a list, write a `,`) before we write the value itself.
    self.write_value_delimiter()?;
    write!(&mut self.output, "{}", self.space_between_values)?;

Contributor

Did you consider hooking into the write_value_delimiter fn that's already switching on ContainerType or otherwise combining this with that logic?

Contributor Author

I did look at it, but I didn't see a satisfactory way. If I had to put this in write_value_delimiter then I'd still have to switch there on whether or not to add the whitespace_config.space_between_values. It also looks to me like write_value_delimiter is focused on the syntactic delimiters, not the presentation/whitespace additions.

I'd rather leave all the "spacing" concerns in one place, I think.
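
The split described here, syntactic delimiters in one function and presentation whitespace in another, can be sketched as follows. This is an illustration only: `ContainerType`, `Spacing`, `value_delimiter`, and `value_spacing` are hypothetical names, and the delimiter choices (a comma in lists and structs, nothing at the top level or in s-expressions) are assumptions about how such a writer might behave.

```rust
/// Hypothetical container kinds a text writer might track.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ContainerType {
    TopLevel,
    List,
    SExpression,
    Struct,
}

/// Hypothetical presentation settings; these never affect the data's meaning.
pub struct Spacing {
    pub between_top_level_values: &'static str,
    pub between_nested_values: &'static str,
}

/// Syntax only: the delimiter the text encoding requires between values.
pub fn value_delimiter(container: ContainerType) -> &'static str {
    match container {
        ContainerType::List | ContainerType::Struct => ",",
        ContainerType::TopLevel | ContainerType::SExpression => "",
    }
}

/// Presentation only: the whitespace that follows the delimiter.
pub fn value_spacing(container: ContainerType, spacing: &Spacing) -> &'static str {
    match container {
        ContainerType::TopLevel => spacing.between_top_level_values,
        _ => spacing.between_nested_values,
    }
}

/// Everything written between two sibling values.
pub fn separator(container: ContainerType, spacing: &Spacing) -> String {
    format!("{}{}", value_delimiter(container), value_spacing(container, spacing))
}
```

Keeping the two functions apart means a new layout only touches `Spacing`, while `value_delimiter` stays tied to the grammar.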

This also makes me wonder whether the builder ought to validate that all the supplied whitespace values are semantically whitespace.
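
One way such a validation could look is sketched below. It assumes "semantically whitespace" means the string contains only the code points U+0009 through U+000D and U+0020; whether comments or other forms should also count is left open. `is_whitespace_only` and `validate_whitespace` are hypothetical helpers, not part of the builder in this PR.

```rust
/// Returns true if every character is one of the assumed whitespace
/// code points: U+0009 through U+000D, or U+0020 (space).
pub fn is_whitespace_only(text: &str) -> bool {
    text.chars().all(|c| matches!(c, '\u{09}'..='\u{0D}' | ' '))
}

/// Hypothetical check a builder setter could run before storing a value.
/// `setting` is only used to make the error message readable.
pub fn validate_whitespace<'a>(setting: &str, text: &'a str) -> Result<&'a str, String> {
    if is_whitespace_only(text) {
        Ok(text)
    } else {
        Err(format!("{} must contain only whitespace, got {:?}", setting, text))
    }
}
```

An empty string passes the check, which matches the compact layout's use of `""` for `space_after_container_start`.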

@rmarrowstone
Contributor

Per your note, I don't love the 'newline' name either. I don't have a great suggestion, but would suggest something like 'distinct' or '(per|each)(Item|datum)' something something...

@jobarr-amzn
Contributor Author

Oof, this didn't show up in cargo build locally but I need to address the new TextKind in ion_c_writer.rs#L17.

I'll have to go back to the drawing board a bit.
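
For context, adding a variant to an enum breaks every exhaustive `match` on it that has no wildcard arm. The sketch below uses a hypothetical `TextKind` with three variants; the variant names and the `top_level_separator` helper are assumptions for illustration. The thread does not say why the error did not appear locally; code behind a disabled Cargo feature is one common reason a plain `cargo build` skips a file.

```rust
/// Hypothetical set of text layouts; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TextKind {
    Compact,
    Lines,
    Pretty,
}

/// Exhaustive match: removing the `Lines` pattern below would be a
/// compile error, which is how a newly added variant surfaces in other files.
pub fn top_level_separator(kind: TextKind) -> &'static str {
    match kind {
        TextKind::Compact => " ",
        TextKind::Lines | TextKind::Pretty => "\n",
    }
}
```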

@codecov

codecov bot commented Sep 21, 2022

Codecov Report

Merging #415 (58a19c3) into main (b53ba24) will increase coverage by 0.00%.
The diff coverage is 83.78%.

@@           Coverage Diff           @@
##             main     #415   +/-   ##
=======================================
  Coverage   83.40%   83.41%           
=======================================
  Files          83       83           
  Lines       15813    15909   +96     
=======================================
+ Hits        13189    13270   +81     
- Misses       2624     2639   +15     
Impacted Files Coverage Δ
src/text/text_writer.rs 78.72% <0.00%> (-7.33%) ⬇️
src/value/writer.rs 61.53% <0.00%> (+4.39%) ⬆️
src/text/raw_text_writer.rs 87.50% <87.73%> (+1.60%) ⬆️
src/lib.rs 73.04% <0.00%> (-1.04%) ⬇️
src/text/raw_text_reader.rs 90.58% <0.00%> (+0.16%) ⬆️
src/reader.rs 82.40% <0.00%> (+0.30%) ⬆️

Contributor

@zslayton zslayton left a comment

Nice job! Some superficial comments below.

src/text/raw_text_writer.rs (outdated, resolved)
src/text/raw_text_writer.rs (resolved)
src/text/raw_text_writer.rs (outdated, resolved)
) -> RawTextWriterBuilder {
    self.space_after_field_name = space_after_container_start.into();
    self.whitespace_config.space_after_field_name = space_after_container_start;

Contributor

There's some dissonance here that I probably caused but can't remember why. Should space_after_container_start and space_after_field_name have two separate setters? Or should we unify their names somehow?

Contributor Author

I think this is just a bug? Here's the relevant snippet of the compact text format configuration:

    // Single space between field names and values
    space_after_field_name: " ",
    // The first value in a container appears next to the opening delimiter
    space_after_container_start: "",

and here's the relevant snippet of the pretty text format configuration:

    // Field names and values are separated by a single space
    space_after_field_name: " ",
    // The first value in a container appears on a line by itself
    space_after_container_start: "\n",

Each configuration assigns its own, separate values to these two fields, and those separate values are what get used. Only this setter conflates them, and no tests exercise it.

Contributor Author

I added some tests that exercise the builder setters.
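
A test of this kind might look like the sketch below, which checks that each setter writes only its own field. `WriterBuilder`, `compact`, and the `with_*` setter names are hypothetical stand-ins for the real builder; the two field names come from the snippets quoted above.

```rust
/// Hypothetical subset of the whitespace settings discussed above.
#[derive(Debug, Clone, PartialEq)]
pub struct WhitespaceConfig {
    pub space_after_field_name: String,
    pub space_after_container_start: String,
}

/// Hypothetical builder holding its settings behind a `Box`.
pub struct WriterBuilder {
    whitespace_config: Box<WhitespaceConfig>,
}

impl WriterBuilder {
    /// Starts from compact-style defaults: one space after a field name,
    /// nothing after a container's opening delimiter.
    pub fn compact() -> Self {
        WriterBuilder {
            whitespace_config: Box::new(WhitespaceConfig {
                space_after_field_name: " ".to_string(),
                space_after_container_start: String::new(),
            }),
        }
    }

    /// Sets only the whitespace written after a field name.
    pub fn with_space_after_field_name(mut self, space: &str) -> Self {
        self.whitespace_config.space_after_field_name = space.to_string();
        self
    }

    /// Sets only the whitespace written after a container's opening delimiter.
    pub fn with_space_after_container_start(mut self, space: &str) -> Self {
        self.whitespace_config.space_after_container_start = space.to_string();
        self
    }

    pub fn whitespace_config(&self) -> &WhitespaceConfig {
        &self.whitespace_config
    }
}
```

A setter that assigned to the wrong field, as in the diff above, would fail an assertion that the other field kept its default.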

src/text/raw_text_writer.rs (resolved)
Comment on lines 168 to 169
// Otherwise use the compact/default layout from `DEFAULT_WS_CONFIG`
..DEFAULT_WS_CONFIG

Contributor

💯
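
The `..DEFAULT_WS_CONFIG` line uses Rust's struct update syntax: fields not named explicitly are taken from the base value. A minimal sketch, assuming a config made of `&'static str` fields; the struct name `WsConfig`, the field names, and `LINES_WS_CONFIG` are illustrative, and only `DEFAULT_WS_CONFIG` appears in the diff.

```rust
/// Hypothetical whitespace settings with illustrative field names.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct WsConfig {
    pub space_between_top_level_values: &'static str,
    pub space_between_nested_values: &'static str,
    pub indentation: &'static str,
}

/// Compact/default layout.
pub const DEFAULT_WS_CONFIG: WsConfig = WsConfig {
    space_between_top_level_values: " ",
    space_between_nested_values: " ",
    indentation: "",
};

/// Overrides one field and inherits the rest from the default.
pub const LINES_WS_CONFIG: WsConfig = WsConfig {
    space_between_top_level_values: "\n",
    ..DEFAULT_WS_CONFIG
};
```

Written this way, a field added to the default later is inherited by the derived config automatically.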

indentation: String,
space_after_field_name: String,
space_after_container_start: String,
whitespace_config: Box<WhitespaceConfig>,

Contributor

Nice, this drops 96 bytes of stack space down to 8! 🙌
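
The size difference can be checked with `std::mem::size_of`. The sketch below assumes four inline `String` fields, which would match the 96-byte figure on a typical 64-bit target (4 × 24 bytes); the fourth field name and both struct names are assumptions, since the diff above shows only three of the removed fields.

```rust
use std::mem::size_of;

/// Hypothetical "before" layout: settings stored inline as `String`s.
pub struct InlineFields {
    pub indentation: String,
    pub space_after_field_name: String,
    pub space_after_container_start: String,
    pub space_between_values: String, // assumed fourth field
}

/// Hypothetical "after" layout: one pointer to heap-allocated settings.
pub struct BoxedFields {
    pub whitespace_config: Box<InlineFields>,
}

/// Bytes the inline settings occupy inside the owning struct.
pub fn inline_size() -> usize {
    size_of::<InlineFields>()
}

/// Bytes the boxed settings occupy inside the owning struct.
pub fn boxed_size() -> usize {
    size_of::<BoxedFields>()
}
```

The trade-off is one extra pointer dereference when the writer reads its whitespace settings.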

src/text/text_writer.rs (outdated, resolved)

Contributor

@zslayton zslayton left a comment

Excellent! 🙌

There's a clippy suggestion about using !is_empty instead of != "". Could you address that before merging?
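
For reference, the change Clippy asks for is a one-token swap. A minimal sketch with hypothetical helper names, not the code from this PR:

```rust
/// True when a whitespace setting would actually write something.
/// Clippy prefers this over the equivalent `space != ""`.
pub fn has_spacing(space: &str) -> bool {
    !space.is_empty()
}

/// Appends the spacing only when there is something to append.
pub fn push_spacing(out: &mut String, space: &str) {
    if has_spacing(space) {
        out.push_str(space);
    }
}
```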

@jobarr-amzn jobarr-amzn changed the title from "Add a 'newline' text writer format" to "Add a 'lines' text writer format" Oct 3, 2022
@jobarr-amzn jobarr-amzn merged commit ec8384d into main Oct 3, 2022
@jobarr-amzn jobarr-amzn deleted the newline-text-format branch October 3, 2022 17:58

3 participants