Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements writing delimited and length-prefixed sequence types #701

Merged
merged 13 commits into from
Feb 8, 2024

Conversation

zslayton
Copy link
Contributor

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Contributor Author

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ PR tour

let untyped_ptr: *mut () = ptr.cast();
untyped_ptr
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ These methods were redundantly defined in two different modules, so I created a private unsafe_helpers module to house them.

// container.
// 2. a delimited container, this will be the parent's own encoding buffer, to which the delimited
// container start opcode has already been written.
buffer: &'value mut BumpVec<'top, u8>,
}

impl<'value, 'top> BinaryContainerWriter_1_1<'value, 'top> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ I did some refactoring of lifetimes that allowed me to remove some methods/types.

Previously, each container writer had two lifetime parameters and could instantiate an affiliated type with a single lifetime parameter to be used as a closure argument.

  • BinaryContainerWriter_1_1<'value, 'top> -> BinaryContainerValuesWriter_1_1<'value>
  • BinaryListWriter_1_1<'value, 'top> -> BinaryListValuesWriter_1_1<'value>
  • BinarySExpWriter_1_1<'value, 'top> -> BinarySExpValuesWriter_1_1<'value>
  • BinarStructWriter_1_1<'value, 'top> -> BinaryStructFieldsWriter_1_1<'value>

Now, the two-lifetime types can be used as the closure argument themselves which simplifies things quite a bit.

@@ -24,6 +22,7 @@ use crate::{
pub struct BinaryValueWriter_1_1<'value, 'top> {
allocator: &'top BumpAllocator,
encoding_buffer: &'value mut BumpVec<'top, u8>,
delimited_containers: bool,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ Where previously I was using type families to statically express different configurations (e.g. with/without annotations), I'm going to start moving towards branching on stored values.

pub fn with_delimited_containers(mut self) -> Self {
self.delimited_containers = true;
self
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This builder-style config method allows users to write code like:

writer
  .value_writer()
  .with_delimited_containers()
  .write_sexp(|sexp| {
    for text in str_array {
      sexp.write_string(text)?;
    }
  })?;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you start using delimited containers, and then you want to write a non-delimited container, how would you do that? E.g.: I want to write a struct where one field is a delimited list and another field is a length-prefixed list; or I want to write a delimited list that contains one or more length-prefixed s-expressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each value gets its own ValueWriter; they're single-use. In your example, a value writer would be created for the outer struct, and another would be created for each of the nested field values, allowing you to configure the encoding options for each one separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or I want to write a delimited list that contains one or more length-prefixed s-expressions

writer
  .value_writer()
  .with_delimited_containers()
  .write_list(|list| {
    list
      // The type's `WriteAsIon` impl will use the default ValueWriter configuration (currently
      // always length-prefixed)
      .write([1, 2, 3].as_sexp())?
      // Also uses the default ValueWriter configuration (currently always length-prefixed)
      .write_sexp(|sexp| sexp.write(1)?.write(2)?.write(3)?)?
      // Allows you to explicitly configure the value writer
      .value_writer()
        .with_delimited_containers()
        .write([1, 2, 3].as_sexp())?;
    Ok(())
  })?;

The above would produce a delimited list that contains two length-prefixed s-expressions and a delimited s-expression.

>(
mut self,
sexp_fn: F,
) -> IonResult<()> {
self.sexp_writer().write_values(sexp_fn)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ While it was very tempting to try and write a generic "sequence" implementation for the list/sexp impls to call, in practice I found that it required a lot of generics and ended up being not much shorter. I decided to just give each their own impl for simplicity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an idea, not a request for a change.
If you really want to DRY it up, I think it should work to do something like this:

    fn write_length_prefixed_list<
        F: for<'a> FnOnce(&mut <Self as ValueWriter>::ListWriter<'a>) -> IonResult<()>,
    >(
        mut self,
        list_fn: F,
    ) -> IonResult<()> {
        const LIST_NIBBLE: u8 = 0xA;
        self.write_length_prefixed_sequence(LIST_NIBBLE, |container_writer| {
            // Create a BinaryListWriter_1_1 to pass to the user's closure.
            list_fn(&mut BinaryListWriter_1_1::new(container_writer))
        })
    }

    fn write_length_prefixed_sexp<
        F: for<'a> FnOnce(&mut <Self as ValueWriter>::SExpWriter<'a>) -> IonResult<()>,
    >(
        mut self,
        sexp_fn: F,
    ) -> IonResult<()> {
        const SEXP_NIBBLE: u8 = 0xB;
        self.write_length_prefixed_sequence(SEXP_NIBBLE, |container_writer| {
            // Create a BinarySExpWriter_1_1 to pass to the user's closure.
            sexp_fn(&mut BinarySExpWriter_1_1::new(container_writer))
        })
    }

    fn write_length_prefixed_sequence<F: FnOnce(BinaryContainerWriter_1_1) -> IonResult<()>>(mut self, container_nibble: u8, container_fn: F) -> IonResult<()> {
        // We're writing a length-prefixed sexp/list, so we need to set up a space to encode the list's children.
        let child_encoding_buffer = self
            .allocator
            .alloc_with(|| BumpVec::new_in(self.allocator));
        let mut container_writer =
            BinaryContainerWriter_1_1::new(self.allocator, child_encoding_buffer);

        // Pass it to the closure, allowing the user to encode child values.
        container_fn(container_writer)?;

        // Write the appropriate opcode for a sexp of this length
        let encoded_length = container_writer.buffer().len();
        match encoded_length {
            0..=15 => {
                let opcode = (container_nibble << 4) | encoded_length as u8;
                self.push_byte(opcode);
            }
            _ => {
                let opcode = 0xF | container_nibble; // SExp or List w/FlexUInt length
                self.push_byte(opcode);
                FlexUInt::write_u64(self.encoding_buffer, encoded_length as u64)?;
            }
        }
        self.encoding_buffer
            .extend_from_slice(sexp_writer.container_writer.buffer());
        Ok(())
    }

type StructWriter<'a> = BinaryStructFieldsWriter_1_1<'a>;
type ListWriter<'a> = BinaryListWriter_1_1<'value, 'top>;
type SExpWriter<'a> = BinarySExpWriter_1_1<'value, 'top>;
type StructWriter<'a> = BinaryStructWriter_1_1<'value, 'top>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ The aforementioned change in closure argument types.

let untyped_ptr: *mut () = ptr.cast();
untyped_ptr
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗺️ This is the second file where these methods were defined; they were removed and usages updated to use the unsafe_helpers module.

pub fn with_delimited_containers(mut self) -> Self {
self.delimited_containers = true;
self
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you start using delimited containers, and then you want to write a non-delimited container, how would you do that? E.g.: I want to write a struct where one field is a delimited list and another field is a length-prefixed list; or I want to write a delimited list that contains one or more length-prefixed s-expressions?

>(
mut self,
sexp_fn: F,
) -> IonResult<()> {
self.sexp_writer().write_values(sexp_fn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an idea, not a request for a change.
If you really want to DRY it up, I think it should work to do something like this:

    fn write_length_prefixed_list<
        F: for<'a> FnOnce(&mut <Self as ValueWriter>::ListWriter<'a>) -> IonResult<()>,
    >(
        mut self,
        list_fn: F,
    ) -> IonResult<()> {
        const LIST_NIBBLE: u8 = 0xA;
        self.write_length_prefixed_sequence(LIST_NIBBLE, |container_writer| {
            // Create a BinaryListWriter_1_1 to pass to the user's closure.
            list_fn(&mut BinaryListWriter_1_1::new(container_writer))
        })
    }

    fn write_length_prefixed_sexp<
        F: for<'a> FnOnce(&mut <Self as ValueWriter>::SExpWriter<'a>) -> IonResult<()>,
    >(
        mut self,
        sexp_fn: F,
    ) -> IonResult<()> {
        const SEXP_NIBBLE: u8 = 0xB;
        self.write_length_prefixed_sequence(SEXP_NIBBLE, |container_writer| {
            // Create a BinarySExpWriter_1_1 to pass to the user's closure.
            sexp_fn(&mut BinarySExpWriter_1_1::new(container_writer))
        })
    }

    fn write_length_prefixed_sequence<F: FnOnce(BinaryContainerWriter_1_1) -> IonResult<()>>(mut self, container_nibble: u8, container_fn: F) -> IonResult<()> {
        // We're writing a length-prefixed sexp/list, so we need to set up a space to encode the list's children.
        let child_encoding_buffer = self
            .allocator
            .alloc_with(|| BumpVec::new_in(self.allocator));
        let mut container_writer =
            BinaryContainerWriter_1_1::new(self.allocator, child_encoding_buffer);

        // Pass it to the closure, allowing the user to encode child values.
        container_fn(container_writer)?;

        // Write the appropriate opcode for a sexp of this length
        let encoded_length = container_writer.buffer().len();
        match encoded_length {
            0..=15 => {
                let opcode = (container_nibble << 4) | encoded_length as u8;
                self.push_byte(opcode);
            }
            _ => {
                let opcode = 0xF | container_nibble; // SExp or List w/FlexUInt length
                self.push_byte(opcode);
                FlexUInt::write_u64(self.encoding_buffer, encoded_length as u64)?;
            }
        }
        self.encoding_buffer
            .extend_from_slice(sexp_writer.container_writer.buffer());
        Ok(())
    }

src/unsafe_helpers.rs Outdated Show resolved Hide resolved
Base automatically changed from write-timestamp to main February 8, 2024 14:06
@zslayton zslayton merged commit cac2784 into main Feb 8, 2024
20 checks passed
@zslayton zslayton deleted the write-sequences branch February 8, 2024 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants