Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement FlatArrayMessageReader #25

Closed
ArtemGr opened this issue Feb 6, 2015 · 18 comments
Closed

implement FlatArrayMessageReader #25

ArtemGr opened this issue Feb 6, 2015 · 18 comments

Comments

@ArtemGr
Copy link

ArtemGr commented Feb 6, 2015

Hi! In http://youtu.be/A65w-qoyTYg?t=14m57s you say that reading from a Cap'n Proto message is comparable to a struct field access, or at least I've got that impression from your talk.
Now, testing it with a simple benchmark

#[bench] fn capnp_unpack (bencher: &mut Bencher) {
  bencher.bytes = 3;

  let encoded: Vec<u8> = {
    let mut builder = MallocMessageBuilder::new_default();
    { let mut cache_value = builder.init_root::<cache_value::Builder>();
      cache_value.set_value (b"foo"); }
    let mut buf = Vec::with_capacity (128);
    capnp::serialize::write_message (&mut buf, &builder) .unwrap();
    buf};

  bencher.iter (|&:| {
    let mut bytes = encoded.as_slice();
    let reader = capnp::serialize::new_reader (&mut bytes, capnp::ReaderOptions::new()) .unwrap();
    let value = reader.get_root::<cache_value::Reader>();
    assert! (value.get_value() == b"foo");
  });}

which uses the following schema

struct CacheValue {
  expires @0 :UInt64;
  value @1 :Data;
  tags @2 :List(Text);
}

shows that reading from a Cap'n Proto message is much more expensive than a struct access.
I get 3931 ns/iter from it.
(In fact, the LMDB database where I keep this value is faster at returning it from the database (616 ns/iter!) than Cap'n Proto is at decoding it).

Theoretically I'd imagine that the decoding should involve just some memory accesses, but examining the new_reader code I see a couple of Vec allocations there.

So,
Am I doing something wrong?
Is it a state of the art Cap'n Proto performance or will it be improved?

@dwrensha
Copy link
Member

dwrensha commented Feb 6, 2015

Those allocations are indeed probably the most expensive part of what you're doing. It should be possible to avoid them, although I admit that the interfaces I've provided don't make it obvious how to do so.

To avoid any allocations, you want to use a SegmentArrayMessageReader. That'll be easiest if your messages are always a single segment. If you have an upper bound n on the size of your messages, you can ensure that they are all a single segment by using BuilderOptions with first_segment_words equal to n.

You should write your messages using MessageBuilder::get_segments_for_output(), which just gives you the raw segments. When you later want to read the message back in, you should supply the segments exactly as you got them.

For this to work, you'll probably need to be able convert between &[u8] and &[Word]. I think you'll need to resort to an unsafe cast for this. (Hm... I should probably provide functions that do this conversion for you.) For some example code that does a similar conversion, see this zeromq converter (which may be somewhat outdated).

@ArtemGr
Copy link
Author

ArtemGr commented Feb 6, 2015

Thanks for the pointers. Sounds complicated.
Any chance capnp::serialize::new_reader will work without those allocations some day or are they required by the protocol design?

@dwrensha
Copy link
Member

dwrensha commented Feb 6, 2015

The allocations are not required by the protocol; they are required because we need a place to put the data.

capnp::serialize::new_reader converts an input stream to an MessageReader that owns its segments, so it needs to be able to allocate memory for those segments.

We could avoid some allocation by implementing a ScratchSpaceMessageReader, similar to what we currently have for ScratchSpaceMallocMessageBuilder. It would work by borrowing a slice of memory and using that space for its first segment. If you are processing message in sequence, you'd be able to reuse that memory between messages. If you have an upper bound on message size, this could allow you to avoid most of the allocations you would otherwise need.

@ArtemGr
Copy link
Author

ArtemGr commented Feb 6, 2015

If you are processing message in sequence, you'd be able to reuse that memory between messages.

I imagine it will be possible to avoid the allocation by using a stack-allocated array, eh?

ScratchSpaceMessageReader looks like what we need! Should I leave this request open till then?

@dwrensha
Copy link
Member

dwrensha commented Feb 6, 2015

Yes. Leave this open until we've got something working for you. :)

@ArtemGr
Copy link
Author

ArtemGr commented Apr 5, 2015

capnp::serialize::new_reader converts an input stream to an MessageReader that owns its segments, so it needs to be able to allocate memory for those segments.

Isn't one of the promises of Cap'n Proto protocol that one can mmap a message and never lift a finger at deserialization, because the bytes are in the needed format already?

(There's a lot about it on https://capnproto.org/: no encoding/decoding step, mmap, random access).

I take it the Rust version doesn't really implement that part?

@dwrensha
Copy link
Member

dwrensha commented Apr 5, 2015

If you have a &[&[Word]] representing your message's segments, you can use it to create a new SegmentArrayMessageReader, which won't need to allocate new space for the segments.

The part that's missing right now in capnproto-rust is something that takes a &[u8] representing a serialized message, reads the segment table, and slices it into a &[&[Word]]. This is basically what is done by FlatArrayMessageReader in the C++ implementation. I think this would be fairly straightforward to implement. Maybe you'd like to take it on? :)

@ArtemGr
Copy link
Author

ArtemGr commented Apr 5, 2015

Yup, I'd like to try. Will check the FlatArrayMessageReader, thanks.

@ArtemGr
Copy link
Author

ArtemGr commented Apr 6, 2015

Implemented in userspace:

#[derive(Debug)] pub enum FlatArrayMessageReaderError {
  MessageTooShort,
  MessageEndsInTable,
  MessageEndsPrematurely}

/// Unpacking the Cap'n Proto stream serialization.
///
/// cf. https://capnproto.org/encoding.html, "Serialization Over a Stream"
pub fn flat_array_message_reader<V,R> (bytes: &[u8], mut visitor: V) -> Result<R, FlatArrayMessageReaderError>
where V: FnMut(&[&[capnp::Word]])->R {
  use capnp::Word;
  use capnp::private::endian::WireValue;
  use std::mem::{transmute, size_of, uninitialized};
  use std::slice::from_raw_parts;

  // Since zero segments can not be encoded we assume there is at least one segment,
  // therefore the message is at least 8 bytes: segment count + the size of the first segment.
  if bytes.len() < 8 {return Err (FlatArrayMessageReaderError::MessageTooShort)}

  let array = || -> *const Word {unsafe {transmute (bytes.as_ptr())}};
  let table = || -> *const WireValue<u32> {unsafe {transmute (bytes.as_ptr())}};
  let segment_count = unsafe {(*table()).get()} + 1;
  let mut offset = segment_count / 2 + 1;
  if bytes.len() < offset as usize * size_of::<Word>() {return Err (FlatArrayMessageReaderError::MessageEndsInTable)}

  let size_of_segment = |segment_number| {unsafe {(*table().offset (segment_number as isize + 1)).get()}};
  let slice_at = |offset, segment_size| {
    if bytes.len() < (offset + segment_size) as usize * size_of::<Word>() {return Err (FlatArrayMessageReaderError::MessageEndsPrematurely)}
    Ok (unsafe {from_raw_parts (array().offset (offset as isize), segment_size as usize)})};

  if segment_count == 1 {
    let segment0 = try! (slice_at (offset, size_of_segment (0)));
    Ok (visitor (&[segment0]))
  } else if segment_count <= 64 {
    let mut segments: [&[Word]; 64] = unsafe {uninitialized()};
    for i in 0 .. segment_count {
      let segment_size = size_of_segment (i);
      let segment = try! (slice_at (offset, segment_size));
      segments[i as usize] = segment;
      offset += segment_size;}
    Ok (visitor (&segments[0 .. segment_count as usize]))
  } else {
    let mut segments: Vec<&[Word]> = Vec::with_capacity (segment_count as usize);
    for i in 0 .. segment_count {
      let segment_size = size_of_segment (i);
      let segment = try! (slice_at (offset, segment_size));
      segments.push (segment);
      offset += segment_size;}
    Ok (visitor (&segments[..]))
  }}

On a very small message I see a speedup from 243 ns/iter to 160 ns/iter.

@ArtemGr
Copy link
Author

ArtemGr commented Apr 9, 2015

BTW, I've been thinking how one would go about creating an object, e.g. FlatArrayMessageReader, instead of using the visitor. Problem is, I'd like to keep the one-segment path and small-number-of-segments path to have different stack requirements, e.g. just [&[Word]; 1] for one segment and [&[Word]; 64] for a small number of segments, but I'm not sure one can do this without resorting to boxing or some other kind of memory allocation.

Am I missing something?

Would the [&[Word]; 64] version be good enough for the one-segment path?
(Speed-wise it is, what I'm asking is whether it's okay that the stack is wasted?)

P.S. One obvious approach is to teach capnproto-rust to use some special type instead of &[&[Word]], then we could simply use the memory layout as it is, without having to copy segment locations to an array of slices. But I won't take this long path.

@dwrensha
Copy link
Member

Hm. Your implementation and comments got me thinking. Maybe we should change ReaderArena::new() to take a &[Option<&[Word]>] instead of a &[&[Word]]. Then we could have something like:


impl <'a> FlatArrayMessageReader<'a> {
   pub fn new<'b>(message : &'b [Word],
                  space_for_segment_array: &'b mut [Option<&[Word]>],
                  overflow_space_for_segment_array : &mut Vec<Option<&[Word]>>)
                 -> FlatArrayMessageReader<'b> {
      // ...
    }
}

And it could be used like this:

let space = [None, None, None]; // Caller chooses how much space will suffice for the typical case.
let overflow_space = Vec::new();
let reader = FlatArrayMessageReader::new(message, &mut space, &mut overflow_space);
reader.get_root::<foo::Reader>();
// ...

This would avoid unnecessary allocations and retain maximal flexibility, but it might be somewhat cumbersome to use.

Thoughts?

@dwrensha
Copy link
Member

I think the lifetimes might not quite work out in the proposal I sketched in the last comment. But it seems like there ought to be some way to fix that...

@ArtemGr
Copy link
Author

ArtemGr commented Apr 10, 2015 via email

@dwrensha dwrensha changed the title Decoding speed? implement FlatArrayMessageReader Jul 9, 2015
@dwrensha
Copy link
Member

My recent changes in b1c54c9 should make this significantly easier.

I think that it's now straightforward to implement something like this:

fn read_message_from_flat_array<'a>(&'a [Word]) -> message::Reader<message::SegmentArray<'a>> {
   // ...
}

@ArtemGr
Copy link
Author

ArtemGr commented Jul 16, 2015

It's much nicer!

I'm using it the old closure-way still:

    let bytes = try! (keys.get::<&[u8]> (&hash));
    try! (flat_array_message_reader (bytes, |words| {
      let reader = capnp::message::Reader::new (capnp::message::SegmentArray::new (words), capnp::message::ReaderOptions::new());

Should look into spawning the Reader. Thanks.

@dwrensha
Copy link
Member

I've pushed some changes that include a new function ::capnp::serialize::read_message_from_words(). It's probably not yet quite as efficient as your flat_array_message_reader() above, but it should be straightfoward to optimize it if desired. @ArtemGr, if you end up trying out the new function, I'd be curious to hear about how its performance compares to what you're doing now.

@dwrensha
Copy link
Member

dwrensha commented Feb 4, 2017

Closing, because I think the problems brought up here have long since been solved. @ArtemGr, please feel free to open a new issue if you are still having problems.

@dwrensha dwrensha closed this as completed Feb 4, 2017
@ArtemGr
Copy link
Author

ArtemGr commented Feb 6, 2017

NP! Sorry I wasn't able to make a FlatArrayMessageReader in a timely manner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants