Lazy reader support for s-expressions #627

Merged
merged 28 commits into from
Sep 1, 2023
28 commits
e0a83d8
Top-level nulls, bools, ints
zslayton Jul 16, 2023
89f79aa
Consolidate impls of AsUtf8 w/helper fn
zslayton Jul 25, 2023
840be4d
Improved TextBufferView docs, removed DataSource
zslayton Jul 25, 2023
5db1ff0
Adds lazy text floats
zslayton Jul 27, 2023
07d4a70
Adds LazyRawTextReader support for comments
zslayton Jul 27, 2023
181e0a5
Adds LazyRawTextReader support for reading strings
zslayton Jul 28, 2023
357ca8f
clippy fixes
zslayton Jul 28, 2023
716ff34
Fix a couple of unit tests
zslayton Jul 29, 2023
e29fec5
Less ambitious float eq comparison
zslayton Jul 29, 2023
8f79a36
Adds LazyRawTextReader support for reading symbols
zslayton Aug 1, 2023
4cb9b2b
Adds more doc comments
zslayton Aug 1, 2023
54470d2
More doc comments
zslayton Aug 1, 2023
78014e7
Adds `LazyRawTextReader` support for reading lists
zslayton Aug 3, 2023
a6a3aa8
Adds `LazyRawTextReader` support for structs
zslayton Aug 10, 2023
4fc9078
More doc comments
zslayton Aug 10, 2023
11174ac
Adds `LazyRawTextReader` support for reading IVMs
zslayton Aug 10, 2023
719dbaa
Initial impl of a LazyRawAnyReader
zslayton Aug 11, 2023
f603872
Improved comments.
zslayton Aug 11, 2023
4696ca5
Adds LazyRawTextReader support for annotations
zslayton Aug 11, 2023
c7129ac
Adds lazy reader support for timestamps
zslayton Aug 14, 2023
44435ea
Lazy reader support for s-expressions
zslayton Aug 18, 2023
d50e05b
Fixed doc comments
zslayton Aug 18, 2023
8283422
Fix internal doc link
zslayton Aug 18, 2023
4b53bb3
Merge remote-tracking branch 'origin/main' into lazy-timestamps
zslayton Aug 23, 2023
60d5a17
Incorporates review feedback
zslayton Aug 23, 2023
db9718d
Matcher recognizes +00:00 as Zulu
zslayton Aug 23, 2023
37264a3
Merge remote-tracking branch 'origin/lazy-timestamps' into lazy-sexps
zslayton Aug 23, 2023
f935c64
Merge remote-tracking branch 'origin/main' into lazy-sexps
zslayton Aug 28, 2023
21 changes: 12 additions & 9 deletions examples/lazy_read_all_values.rs
Contributor Author

🗺️ cargo fmt rearranged the imports a bit in this file.

@@ -1,28 +1,28 @@
#[cfg(feature = "experimental-lazy-reader")]
use ion_rs::IonResult;

#[cfg(not(feature = "experimental-lazy-reader"))]
fn main() {
println!("This example requires the 'experimental-lazy-reader' feature to work.");
}

#[cfg(feature = "experimental-lazy-reader")]
use ion_rs::IonResult;

#[cfg(feature = "experimental-lazy-reader")]
fn main() -> IonResult<()> {
lazy_reader_example::read_all_values()
}

#[cfg(feature = "experimental-lazy-reader")]
mod lazy_reader_example {
use std::fs::File;
use std::process::exit;

use memmap::MmapOptions;

use ion_rs::lazy::r#struct::LazyBinaryStruct;
use ion_rs::lazy::reader::LazyBinaryReader;
use ion_rs::lazy::sequence::LazyBinarySequence;
use ion_rs::lazy::value::LazyBinaryValue;
use ion_rs::lazy::value_ref::ValueRef;
use ion_rs::IonResult;
use memmap::MmapOptions;
use std::fs::File;
use std::process::exit;

pub fn read_all_values() -> IonResult<()> {
let args: Vec<String> = std::env::args().collect();
@@ -53,14 +53,17 @@ mod lazy_reader_example {
fn count_value_and_children(lazy_value: &LazyBinaryValue) -> IonResult<usize> {
use ValueRef::*;
let child_count = match lazy_value.read()? {
List(s) | SExp(s) => count_sequence_children(&s)?,
List(s) => count_sequence_children(s.iter())?,
SExp(s) => count_sequence_children(s.iter())?,
Comment on lines +56 to +57
Contributor Author

🗺️ In binary Ion, the parsing logic for the bodies of lists and of s-expressions is identical. In text, however, the parsing is substantially different. Not only do the two use different delimiters for opening (`[` vs `(`), closing (`]` vs `)`), and separating values (`,` vs whitespace-or-nothing), but s-expressions also allow the special grammar production of operators.

To accommodate these differences without introducing runtime overhead via branching or dynamic dispatch, I had to break the LazySequence type into LazyList and LazySExp types that could house their own logic. This change was also made at the raw level and accounts for the majority of this diff.

Meanwhile, the Sequence type is still unified in the Element API. I think this is reasonable, as the Element API is intentionally divorced from the reading logic needed to deserialize a stream into Elements.
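
For readers unfamiliar with the pattern, here is a minimal sketch of how two separate types can carry their own syntax rules and be selected at compile time. Everything in it (`SequenceSyntax`, `ListSyntax`, `SExpSyntax`, `children`) is a hypothetical stand-in rather than an ion-rs item, and it only handles flat sequences of bare tokens: no nesting, no strings, and no operator splitting such as `(a+b)`.

```rust
// Hypothetical sketch: none of these names come from ion-rs.

/// Syntax rules that differ between text lists and text s-expressions.
trait SequenceSyntax {
    const OPEN: char;
    const CLOSE: char;
    /// Splits the text between the delimiters into child tokens.
    fn split_body(body: &str) -> Vec<&str>;
}

struct ListSyntax;
struct SExpSyntax;

impl SequenceSyntax for ListSyntax {
    const OPEN: char = '[';
    const CLOSE: char = ']';
    // List children are separated by commas.
    fn split_body(body: &str) -> Vec<&str> {
        body.split(',')
            .map(str::trim)
            .filter(|token| !token.is_empty())
            .collect()
    }
}

impl SequenceSyntax for SExpSyntax {
    const OPEN: char = '(';
    const CLOSE: char = ')';
    // S-expression children are separated by whitespace in this toy version.
    fn split_body(body: &str) -> Vec<&str> {
        body.split_whitespace().collect()
    }
}

/// Generic over the syntax type, so each instantiation is monomorphized:
/// there is no `match` on a "kind" flag and no trait object at runtime.
fn children<S: SequenceSyntax>(text: &str) -> Option<Vec<&str>> {
    let body = text
        .trim()
        .strip_prefix(S::OPEN)?
        .strip_suffix(S::CLOSE)?;
    Some(S::split_body(body))
}
```

Under these assumptions, `children::<ListSyntax>("[1, 2, 3]")` and `children::<SExpSyntax>("(a + b)")` each compile to code that knows its own delimiters, while input with the wrong delimiters yields `None`.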

Struct(s) => count_struct_children(&s)?,
_ => 0,
};
Ok(1 + child_count)
}

fn count_sequence_children(lazy_sequence: &LazyBinarySequence) -> IonResult<usize> {
fn count_sequence_children<'a, 'b>(
lazy_sequence: impl Iterator<Item = IonResult<LazyBinaryValue<'a, 'b>>>,
) -> IonResult<usize> {
Contributor Author

🗺️ This example program takes an iterator to generically handle either a list or sexp. We can always introduce a LazySequence trait later if desired.
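
As a rough illustration of that approach, the sketch below accepts any iterator of fallible items, so two unrelated container types can share one counting function without a common trait. `ToyList`, `ToySExp`, `ParseError`, and `count_children` are hypothetical names invented for this example and are not part of ion-rs.

```rust
// Hypothetical stand-ins; these are not ion-rs types.
#[derive(Debug)]
struct ParseError;

struct ToyList(Vec<i64>);
struct ToySExp(Vec<i64>);

impl ToyList {
    /// Yields each child wrapped in `Ok`, mimicking a lazily parsed value.
    fn iter(&self) -> impl Iterator<Item = Result<i64, ParseError>> + '_ {
        self.0.iter().copied().map(Ok)
    }
}

impl ToySExp {
    /// Same shape as `ToyList::iter`, but on an unrelated type.
    fn iter(&self) -> impl Iterator<Item = Result<i64, ParseError>> + '_ {
        self.0.iter().copied().map(Ok)
    }
}

/// Counts children from any source that can produce the right item type.
/// The first `Err` encountered is returned to the caller.
fn count_children(
    values: impl Iterator<Item = Result<i64, ParseError>>,
) -> Result<usize, ParseError> {
    let mut count = 0;
    for value in values {
        value?; // propagate a parse failure instead of counting it
        count += 1;
    }
    Ok(count)
}
```

Both `count_children(list.iter())` and `count_children(sexp.iter())` type-check here because the function depends only on the iterator's item type, not on the container that produced it.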

let mut count = 0;
for value in lazy_sequence {
count += count_value_and_children(&value?)?;
184 changes: 144 additions & 40 deletions src/lazy/any_encoding.rs
@@ -7,7 +7,9 @@ use crate::lazy::binary::raw::r#struct::{
LazyRawBinaryField, LazyRawBinaryStruct, RawBinaryStructIterator,
};
use crate::lazy::binary::raw::reader::LazyRawBinaryReader;
use crate::lazy::binary::raw::sequence::{LazyRawBinarySequence, RawBinarySequenceIterator};
use crate::lazy::binary::raw::sequence::{
LazyRawBinaryList, LazyRawBinarySExp, RawBinarySequenceIterator,
};
use crate::lazy::binary::raw::value::LazyRawBinaryValue;
use crate::lazy::decoder::private::{
LazyContainerPrivate, LazyRawFieldPrivate, LazyRawValuePrivate,
@@ -22,7 +24,9 @@ use crate::lazy::text::raw::r#struct::{
LazyRawTextField, LazyRawTextStruct, RawTextStructIterator,
};
use crate::lazy::text::raw::reader::LazyRawTextReader;
use crate::lazy::text::raw::sequence::{LazyRawTextSequence, RawTextSequenceIterator};
use crate::lazy::text::raw::sequence::{
LazyRawTextList, LazyRawTextSExp, RawTextListIterator, RawTextSExpIterator,
};
use crate::lazy::text::value::{LazyRawTextValue, RawTextAnnotationsIterator};
use crate::{IonResult, IonType, RawSymbolTokenRef};

@@ -36,7 +40,8 @@ pub struct AnyEncoding;
impl<'data> LazyDecoder<'data> for AnyEncoding {
type Reader = LazyRawAnyReader<'data>;
type Value = LazyRawAnyValue<'data>;
type Sequence = LazyRawAnySequence<'data>;
type List = LazyRawAnyList<'data>;
type SExp = LazyRawAnySExp<'data>;
type Struct = LazyRawAnyStruct<'data>;
type AnnotationsIterator = RawAnyAnnotationsIterator<'data>;
}
@@ -246,101 +251,200 @@ impl<'data> Iterator for RawAnyAnnotationsIterator<'data> {
}
}

// ===== Sequences ======
// ===== Lists ======

#[derive(Debug, Clone)]
pub struct LazyRawAnyList<'data> {
encoding: LazyRawListKind<'data>,
}

#[derive(Debug, Clone)]
pub enum LazyRawListKind<'data> {
Text_1_0(LazyRawTextList<'data>),
Binary_1_0(LazyRawBinaryList<'data>),
}

impl<'data> LazyContainerPrivate<'data, AnyEncoding> for LazyRawAnyList<'data> {
fn from_value(value: LazyRawAnyValue<'data>) -> Self {
match value.encoding {
LazyRawValueKind::Text_1_0(v) => LazyRawAnyList {
encoding: LazyRawListKind::Text_1_0(LazyRawTextList::from_value(v)),
},
LazyRawValueKind::Binary_1_0(v) => LazyRawAnyList {
encoding: LazyRawListKind::Binary_1_0(LazyRawBinaryList::from_value(v)),
},
}
}
}

pub struct RawAnyListIterator<'data> {
encoding: RawAnyListIteratorKind<'data>,
}

pub enum RawAnyListIteratorKind<'data> {
Text_1_0(RawTextListIterator<'data>),
Binary_1_0(RawBinarySequenceIterator<'data>),
}

impl<'data> Iterator for RawAnyListIterator<'data> {
type Item = IonResult<LazyRawAnyValue<'data>>;

fn next(&mut self) -> Option<Self::Item> {
match &mut self.encoding {
RawAnyListIteratorKind::Text_1_0(i) => i
.next()
.map(|value_result| value_result.map(|value| value.into())),
RawAnyListIteratorKind::Binary_1_0(i) => i
.next()
.map(|value_result| value_result.map(|value| value.into())),
}
}
}

impl<'data> LazyRawSequence<'data, AnyEncoding> for LazyRawAnyList<'data> {
type Iterator = RawAnyListIterator<'data>;

fn annotations(&self) -> <AnyEncoding as LazyDecoder<'data>>::AnnotationsIterator {
self.as_value().annotations()
}

fn ion_type(&self) -> IonType {
match &self.encoding {
LazyRawListKind::Text_1_0(s) => s.ion_type(),
LazyRawListKind::Binary_1_0(s) => s.ion_type(),
}
}

fn iter(&self) -> Self::Iterator {
match &self.encoding {
LazyRawListKind::Text_1_0(s) => RawAnyListIterator {
encoding: RawAnyListIteratorKind::Text_1_0(s.iter()),
},
LazyRawListKind::Binary_1_0(s) => RawAnyListIterator {
encoding: RawAnyListIteratorKind::Binary_1_0(s.iter()),
},
}
}

fn as_value(&self) -> LazyRawAnyValue<'data> {
match &self.encoding {
LazyRawListKind::Text_1_0(s) => (s.as_value()).into(),
LazyRawListKind::Binary_1_0(s) => (s.as_value()).into(),
}
}
}

impl<'data> From<LazyRawTextList<'data>> for LazyRawAnyList<'data> {
fn from(value: LazyRawTextList<'data>) -> Self {
LazyRawAnyList {
encoding: LazyRawListKind::Text_1_0(value),
}
}
}

impl<'data> From<LazyRawBinaryList<'data>> for LazyRawAnyList<'data> {
fn from(value: LazyRawBinaryList<'data>) -> Self {
LazyRawAnyList {
encoding: LazyRawListKind::Binary_1_0(value),
}
}
}

// ===== SExps =====

#[derive(Debug, Clone)]
pub struct LazyRawAnySequence<'data> {
encoding: LazyRawSequenceKind<'data>,
pub struct LazyRawAnySExp<'data> {
encoding: LazyRawSExpKind<'data>,
}

#[derive(Debug, Clone)]
pub enum LazyRawSequenceKind<'data> {
Text_1_0(LazyRawTextSequence<'data>),
Binary_1_0(LazyRawBinarySequence<'data>),
pub enum LazyRawSExpKind<'data> {
Text_1_0(LazyRawTextSExp<'data>),
Binary_1_0(LazyRawBinarySExp<'data>),
}

impl<'data> LazyContainerPrivate<'data, AnyEncoding> for LazyRawAnySequence<'data> {
impl<'data> LazyContainerPrivate<'data, AnyEncoding> for LazyRawAnySExp<'data> {
fn from_value(value: LazyRawAnyValue<'data>) -> Self {
match value.encoding {
LazyRawValueKind::Text_1_0(v) => LazyRawAnySequence {
encoding: LazyRawSequenceKind::Text_1_0(LazyRawTextSequence::from_value(v)),
LazyRawValueKind::Text_1_0(v) => LazyRawAnySExp {
encoding: LazyRawSExpKind::Text_1_0(LazyRawTextSExp::from_value(v)),
},
LazyRawValueKind::Binary_1_0(v) => LazyRawAnySequence {
encoding: LazyRawSequenceKind::Binary_1_0(LazyRawBinarySequence::from_value(v)),
LazyRawValueKind::Binary_1_0(v) => LazyRawAnySExp {
encoding: LazyRawSExpKind::Binary_1_0(LazyRawBinarySExp::from_value(v)),
},
}
}
}

pub struct RawAnySequenceIterator<'data> {
encoding: RawAnySequenceIteratorKind<'data>,
pub struct RawAnySExpIterator<'data> {
encoding: RawAnySExpIteratorKind<'data>,
}

pub enum RawAnySequenceIteratorKind<'data> {
Text_1_0(RawTextSequenceIterator<'data>),
pub enum RawAnySExpIteratorKind<'data> {
Text_1_0(RawTextSExpIterator<'data>),
Binary_1_0(RawBinarySequenceIterator<'data>),
}

impl<'data> Iterator for RawAnySequenceIterator<'data> {
impl<'data> Iterator for RawAnySExpIterator<'data> {
type Item = IonResult<LazyRawAnyValue<'data>>;

fn next(&mut self) -> Option<Self::Item> {
match &mut self.encoding {
RawAnySequenceIteratorKind::Text_1_0(i) => i
RawAnySExpIteratorKind::Text_1_0(i) => i
.next()
.map(|value_result| value_result.map(|value| value.into())),
RawAnySequenceIteratorKind::Binary_1_0(i) => i
RawAnySExpIteratorKind::Binary_1_0(i) => i
.next()
.map(|value_result| value_result.map(|value| value.into())),
}
}
}

impl<'data> LazyRawSequence<'data, AnyEncoding> for LazyRawAnySequence<'data> {
type Iterator = RawAnySequenceIterator<'data>;
impl<'data> LazyRawSequence<'data, AnyEncoding> for LazyRawAnySExp<'data> {
type Iterator = RawAnySExpIterator<'data>;

fn annotations(&self) -> <AnyEncoding as LazyDecoder<'data>>::AnnotationsIterator {
todo!()
self.as_value().annotations()
}

fn ion_type(&self) -> IonType {
match &self.encoding {
LazyRawSequenceKind::Text_1_0(s) => s.ion_type(),
LazyRawSequenceKind::Binary_1_0(s) => s.ion_type(),
LazyRawSExpKind::Text_1_0(s) => s.ion_type(),
LazyRawSExpKind::Binary_1_0(s) => s.ion_type(),
}
}

fn iter(&self) -> Self::Iterator {
match &self.encoding {
LazyRawSequenceKind::Text_1_0(s) => RawAnySequenceIterator {
encoding: RawAnySequenceIteratorKind::Text_1_0(s.iter()),
LazyRawSExpKind::Text_1_0(s) => RawAnySExpIterator {
encoding: RawAnySExpIteratorKind::Text_1_0(s.iter()),
},
LazyRawSequenceKind::Binary_1_0(s) => RawAnySequenceIterator {
encoding: RawAnySequenceIteratorKind::Binary_1_0(s.iter()),
LazyRawSExpKind::Binary_1_0(s) => RawAnySExpIterator {
encoding: RawAnySExpIteratorKind::Binary_1_0(s.iter()),
},
}
}

fn as_value(&self) -> LazyRawAnyValue<'data> {
match &self.encoding {
LazyRawSequenceKind::Text_1_0(s) => (s.as_value()).into(),
LazyRawSequenceKind::Binary_1_0(s) => (s.as_value()).into(),
LazyRawSExpKind::Text_1_0(s) => (s.as_value()).into(),
LazyRawSExpKind::Binary_1_0(s) => (s.as_value()).into(),
}
}
}

impl<'data> From<LazyRawTextSequence<'data>> for LazyRawAnySequence<'data> {
fn from(value: LazyRawTextSequence<'data>) -> Self {
LazyRawAnySequence {
encoding: LazyRawSequenceKind::Text_1_0(value),
impl<'data> From<LazyRawTextSExp<'data>> for LazyRawAnySExp<'data> {
fn from(value: LazyRawTextSExp<'data>) -> Self {
LazyRawAnySExp {
encoding: LazyRawSExpKind::Text_1_0(value),
}
}
}

impl<'data> From<LazyRawBinarySequence<'data>> for LazyRawAnySequence<'data> {
fn from(value: LazyRawBinarySequence<'data>) -> Self {
LazyRawAnySequence {
encoding: LazyRawSequenceKind::Binary_1_0(value),
impl<'data> From<LazyRawBinarySExp<'data>> for LazyRawAnySExp<'data> {
fn from(value: LazyRawBinarySExp<'data>) -> Self {
LazyRawAnySExp {
encoding: LazyRawSExpKind::Binary_1_0(value),
}
}
}
2 changes: 1 addition & 1 deletion src/lazy/binary/raw/reader.rs
@@ -188,7 +188,7 @@ mod tests {
let lazy_list = reader.next()?.expect_value()?.read()?.expect_list()?;
// Exercise the `Debug` impl
println!("Lazy Raw Sequence: {:?}", lazy_list);
let mut list_values = lazy_list.iter();
let mut list_values = lazy_list.sequence.iter();
Contributor Author

🗺️ In the binary reader, the LazyRawBinaryList and LazyRawBinarySExp types each wrap an instance of a LazyRawBinarySequence helper type that houses their shared parsing logic.
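
The wrapping arrangement might look roughly like the sketch below: two distinct public-facing types that each hold the same helper in a `sequence` field. The names `SharedSequence`, `BinaryList`, and `BinarySExp`, along with the byte-slice representation and the `kind` methods, are assumptions made for illustration and do not reflect the actual ion-rs definitions.

```rust
// Hypothetical sketch; names and fields are illustrative, not ion-rs items.

/// Helper that owns the iteration logic shared by both container kinds.
#[derive(Debug, Clone)]
struct SharedSequence<'data> {
    bytes: &'data [u8],
}

impl<'data> SharedSequence<'data> {
    /// Walks the underlying bytes; a real reader would decode values here.
    fn iter(&self) -> std::iter::Copied<std::slice::Iter<'data, u8>> {
        self.bytes.iter().copied()
    }
}

/// Distinct wrapper types keep lists and s-expressions separate in the
/// type system while delegating the shared work to `sequence`.
#[derive(Debug, Clone)]
struct BinaryList<'data> {
    sequence: SharedSequence<'data>,
}

#[derive(Debug, Clone)]
struct BinarySExp<'data> {
    sequence: SharedSequence<'data>,
}

impl<'data> BinaryList<'data> {
    fn kind(&self) -> &'static str {
        "list"
    }
}

impl<'data> BinarySExp<'data> {
    fn kind(&self) -> &'static str {
        "sexp"
    }
}
```

With this shape, a caller reaches the shared iterator through the field, as in `lazy_list.sequence.iter()`, while the wrapper type still records which kind of container it is.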

assert_eq!(list_values.next().expect("first")?.ion_type(), IonType::Int);
assert_eq!(
list_values.next().expect("second")?.ion_type(),