Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy reader support for s-expressions #627

Merged
merged 28 commits into from
Sep 1, 2023
Merged

Lazy reader support for s-expressions #627

merged 28 commits into from
Sep 1, 2023

Conversation

zslayton
Copy link
Contributor

Builds on outstanding PRs #612, #613, #614, #616, #617, #619, #620, #621, #622, and #623.

Adds support for reading text s-expressions in the lazy reader.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented Aug 18, 2023

Codecov Report

Patch coverage: 61.09% and project coverage change: -0.25% ⚠️

Comparison is base (eda4e2b) 81.54% compared to head (f935c64) 81.30%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #627      +/-   ##
==========================================
- Coverage   81.54%   81.30%   -0.25%     
==========================================
  Files         123      123              
  Lines       23158    23446     +288     
  Branches    23158    23446     +288     
==========================================
+ Hits        18885    19062     +177     
- Misses       2549     2649     +100     
- Partials     1724     1735      +11     
Files Changed Coverage Δ
src/lazy/decoder.rs 60.00% <ø> (ø)
src/lazy/encoding.rs 0.00% <ø> (ø)
src/lazy/raw_value_ref.rs 71.67% <ø> (ø)
src/lazy/text/encoded_value.rs 75.38% <0.00%> (-0.59%) ⬇️
src/lazy/value_ref.rs 74.07% <ø> (ø)
src/lazy/binary/raw/sequence.rs 45.34% <32.35%> (-14.66%) ⬇️
src/lazy/any_encoding.rs 42.47% <38.35%> (-6.91%) ⬇️
src/lazy/text/raw/sequence.rs 53.07% <42.85%> (-0.21%) ⬇️
src/lazy/sequence.rs 69.78% <65.33%> (-3.30%) ⬇️
src/lazy/text/raw/reader.rs 87.74% <75.00%> (-0.91%) ⬇️
... and 7 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ cargo fmt rearranged the imports a bit in this file.

Comment on lines +56 to +57
List(s) => count_sequence_children(s.iter())?,
SExp(s) => count_sequence_children(s.iter())?,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ In binary Ion, the parsing logic for the bodies of lists and for s-expressions is identical. However, in text the parsing is substantially different. Not only do they have different delimiters for opening ([ vs (), closing (] vs ), and between values (, vs whitespace-or-nothing), but s-expressions also allow for the special grammar production of operators.

In order to accommodate these differences without introducing runtime overhead via branching or dynamic dispatch, I had to breaking the LazySequence type into LazyList and LazySExp types that could house their own logic. This change was also made in the raw level and accounts for the majority of this diff.

Meanwhile, the Sequence type is still unified in the Element API. I think this is reasonable, as the Element API is intentionally divorced from the reading logic needed to deserialize a stream into Elements.

fn count_sequence_children(lazy_sequence: &LazyBinarySequence) -> IonResult<usize> {
fn count_sequence_children<'a, 'b>(
lazy_sequence: impl Iterator<Item = IonResult<LazyBinaryValue<'a, 'b>>>,
) -> IonResult<usize> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This example program takes an iterator to generically handle either a list or sexp. We can always introduce a LazySequence trait later if desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ All of the changes in this file come from splitting the LazyRawAnySequence type into LazyRawAnyList and LazyRawAnySExp. There is no interesting new logic.

@@ -188,7 +188,7 @@ mod tests {
let lazy_list = reader.next()?.expect_value()?.read()?.expect_list()?;
// Exercise the `Debug` impl
println!("Lazy Raw Sequence: {:?}", lazy_list);
let mut list_values = lazy_list.iter();
let mut list_values = lazy_list.sequence.iter();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ In the binary reader, the LazyRawBinaryList and LazyRawBinarySExp types each wrap an instance of a LazyRawBinarySequence helper type that houses their shared parsing logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The changes in this file all come from splitting the LazySequence type into distinct LazyList and LazySExp types. There is no interesting new logic.

Comment on lines -170 to -199
match self.ion_type() {
IonType::SExp => {
write!(f, "(")?;
for value in self {
write!(
f,
"{:?} ",
value
.map_err(|_| fmt::Error)?
.read()
.map_err(|_| fmt::Error)?
)?;
}
write!(f, ")")?;
}
IonType::List => {
write!(f, "[")?;
for value in self {
write!(
f,
"{:?},",
value
.map_err(|_| fmt::Error)?
.read()
.map_err(|_| fmt::Error)?
)?;
}
write!(f, "]")?;
}
_ => unreachable!("LazySequence is only created for list and sexp"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This removal gives a taste of the sort of split logic a unified LazySequence would need to house at each level of abstraction. Now the List and SExp types can simply house their respective logic.

Comment on lines +361 to +379
/// Matches an optional annotation sequence and a trailing value.
pub fn match_annotated_value(self) -> IonParseResult<'data, LazyRawTextValue<'data>> {
pair(
opt(Self::match_annotations),
whitespace_and_then(Self::match_value),
)
.map(|(maybe_annotations, mut value)| {
if let Some(annotations) = maybe_annotations {
value.encoded_value = value
.encoded_value
.with_annotations_sequence(annotations.offset(), annotations.len());
// Rewind the value's input to include the annotations sequence.
value.input = self.slice_to_end(annotations.offset() - self.offset());
}
value
})
.parse(self)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This method was moved, not changed. You'll see the removal immediately following.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Half of the changes in this file come from changind LazyRawTextSequence into LazyRawTextList. The other half come from introducing the new LazyRawTextSExp type; these contain interesting new logic. I'll call out where the new logic begins.

@@ -171,6 +146,139 @@ impl<'data> Iterator for RawTextSequenceIterator<'data> {
}
}

// ===== S-Expressions =====
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ New logic begins here.

@zslayton zslayton marked this pull request as ready for review August 18, 2023 20:02
"(a)",
"(a b)",
"(a++)",
"(++a)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test case with operator with different characters in it.

Suggested change
"(++a)",
"(++a)",
"(a+=b)",

}

impl MatchedSymbol {
pub fn read<'data>(
&self,
matched_input: TextBufferView<'data>,
) -> IonResult<RawSymbolTokenRef<'data>> {
use MatchedSymbol::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a local use right before the match expression really cleans it up a lot.

This is something new-ish for you, I think. I've noticed this in a few places in the lazy reader PRs, but I don't remember seeing it when I was working on Ion Rust 6-9 months ago. Anyway, it's very nice, and I'm going to be copying this from you.

Base automatically changed from lazy-timestamps to main August 28, 2023 16:34
@zslayton zslayton self-assigned this Aug 29, 2023
@zslayton zslayton merged commit efd90a7 into main Sep 1, 2023
17 of 18 checks passed
@zslayton zslayton deleted the lazy-sexps branch September 1, 2023 14:54
zslayton pushed a commit that referenced this pull request Sep 7, 2023
zslayton added a commit that referenced this pull request Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants