
Shared memory support for imaging #54

Closed
rpavlik opened this issue Mar 9, 2015 · 13 comments

@rpavlik (Member) commented Mar 9, 2015

The goal is to avoid having to break images up into small chunks.

Current approach

Fixed-size circular queue of buffers, with sequence numbers. If a recipient wants to keep an image, it needs to copy it out. (Alternatively, we could use per-buffer semaphores to keep images around.)
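Roughly, the slot layout and lookup could look like the sketch below (`Entry` and `ImageRingBuffer` are illustrative names, not the actual OSVR types, and this in-process version ignores the shared-memory allocation and locking details):

```cpp
// Sketch of a fixed-size circular queue of image buffers with sequence numbers.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Entry {
    std::uint64_t sequence = 0;          // 0 means "never written"
    std::vector<unsigned char> pixels;   // image data for this slot
};

class ImageRingBuffer {
  public:
    explicit ImageRingBuffer(std::size_t slots) : entries_(slots) {}

    // Producer: overwrite the oldest slot and stamp it with the next sequence number.
    void push(const unsigned char *data, std::size_t size) {
        Entry &e = entries_[nextSeq_ % entries_.size()];
        e.sequence = nextSeq_++;
        e.pixels.assign(data, data + size);
    }

    // Consumer: fetch a specific frame; returns nullptr if it has already been
    // overwritten (i.e. the consumer was too slow and missed it).
    const Entry *getBySequence(std::uint64_t seq) const {
        const Entry &e = entries_[seq % entries_.size()];
        return (seq != 0 && e.sequence == seq) ? &e : nullptr;
    }

    // Consumer: grab whatever frame was written most recently.
    const Entry *getLatest() const {
        return nextSeq_ > 1 ? getBySequence(nextSeq_ - 1) : nullptr;
    }

  private:
    std::vector<Entry> entries_;
    std::uint64_t nextSeq_ = 1;  // sequence numbers start at 1
};
```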

Previous approach

We had been looking at a reference-counted publish-subscribe model, with a per-subscriber message queue that can hold reference-counting smart pointers to image buffers.

Subscriber registry:

  • Subscriber opens subscriber registry, adds self with unique ID.
  • Subscriber opens message buffer keyed by ID.
  • When subscriber exits, removes self from registry.

Right now, we have an implementation of the subscriber registry using shared memory, but it doesn't correctly handle apps closing "badly"; something like a pipe would be better.

General flow - Publisher

  • On data send: the publisher allocates a block of the right size/alignment, managed by a smart pointer, and puts data in it, adding a ref. (ref count = 1)
  • The publisher gets the subscriber list and puts a message in each subscriber's buffer, with each message owning a ref. (ref count = 1 + subscriber count)
  • The publisher releases its ref to that data. (ref count = subscriber count)

General flow - Subscriber

  • When the subscriber updates, it checks for messages in its buffer.
  • If one is found, the reference to the data block is moved to the message handler and the message is handled, maintaining a proper reference count. (Both flows are sketched below.)
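As an in-process analogy of both flows (`std::shared_ptr` stands in for the shared-memory-aware smart pointers, e.g. from Boost.Interprocess, that the real design would need; all names here are illustrative, not actual OSVR code):

```cpp
// In-process analogy of the reference-counted publish/subscribe flow above.
#include <cstddef>
#include <deque>
#include <map>
#include <memory>
#include <vector>

using ImageBlock = std::vector<unsigned char>;
using ImageRef = std::shared_ptr<ImageBlock>;

class PubSub {
  public:
    void addSubscriber(int id) { queues_[id]; }  // registry entry + empty queue

    // Publisher flow: allocate the block (ref count = 1), hand one ref to each
    // subscriber's queue (ref count = 1 + N), then drop the local ref
    // (ref count = N) when `block` goes out of scope.
    void publish(const unsigned char *data, std::size_t size) {
        ImageRef block = std::make_shared<ImageBlock>(data, data + size);
        for (auto &kv : queues_) {
            kv.second.push_back(block);
        }
    }

    // Subscriber flow: move each ref out of the queue into the handler, so the
    // block is freed once the last subscriber that cares about it is done.
    template <typename Handler>
    void update(int id, Handler &&handle) {
        auto &queue = queues_.at(id);
        while (!queue.empty()) {
            ImageRef msg = std::move(queue.front());
            queue.pop_front();
            handle(msg);
        }
    }

  private:
    std::map<int, std::deque<ImageRef>> queues_;  // per-subscriber message queues
};
```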

Detail/Complication

To maintain a proper reference count on the shared-memory image blocks, each pending message needs to hold a reference (smart pointer) to the image block it corresponds to. However, we must then make sure that each message is serviced somehow, either by an active application or by some cleanup when that application exits; otherwise we'll exhaust our shared memory segment with images that nobody is really looking at.

@godbyk godbyk self-assigned this Apr 8, 2015
@rpavlik rpavlik modified the milestone: 1.0 May 8, 2015
@rpavlik rpavlik assigned rpavlik and unassigned godbyk May 14, 2015
@russell-taylor (Contributor) commented May 14, 2015

The complication seems like an extreme case of the general problem where the producer sends faster than one of the consumers can handle messages. This may happen often in a chain of image analysis. This is a real problem in VRPN, where a sender can flood the network and the client can't ever get out of the message-handling loop; there is an ugly hack there to only process a certain number of messages per mainloop(), and then a slightly less ugly source-quench to slow the sender down. But source-quench doesn't work well when there are multiple clients.

Rather than trying to track each message and who cares about it, consider switching to a circular queue of fixed size for each published message queue. Include a sequence ID on the messages, so that the receivers can know when they have missed one or more messages because they were too slow. Then the client can choose either to always request the most recent image, or to try to gobble up all of the ones that have arrived since it last received one if it doesn't want to miss any.

This is agnostic to the number of listeners and decouples tracking, but requires the receiver to copy the message out if it doesn't want it to disappear out from under it (or block the sender).

A hybrid approach that avoids the copy-out would be to have each client increase a semaphore on an image when it opens it and decrease it when it is done with it. If the server finds the next image it wants to write locked, it can allocate a new buffer for its circular queue, pull the image out of the queue, and leave the orphan to be cleaned up when the last client is done and reduces the semaphore. This reduces the memory leak to those buffers that an application had locked before it died rather than all un-handled images. Cleanup for that app could reduce the semaphore on all of its locked buffers. A watch-dog timer could also sweep through all semaphored images and get rid of ones that hung around longer than a specified timeout.
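A rough sketch of the writer-side logic of that hybrid scheme (the names and the plain reader count are illustrative only; a real shared-memory version would use interprocess semaphores or locks and allocate the buffers in the shared segment):

```cpp
// Illustrative writer-side logic: if the next slot in the ring is still held
// by readers, detach it as an orphan and replace it with a fresh buffer rather
// than blocking or overwriting it. The atomic reader count stands in for the
// per-image semaphore described above.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Slot {
    std::uint64_t sequence = 0;
    std::atomic<int> readers{0};         // raised by clients while reading
    std::vector<unsigned char> pixels;
};

class HybridRing {
  public:
    explicit HybridRing(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            slots_.push_back(std::make_unique<Slot>());
    }

    void write(const unsigned char *data, std::size_t size) {
        std::size_t idx = nextSeq_ % slots_.size();
        if (slots_[idx]->readers.load() > 0) {
            // Slot still locked by a slow (or dead) client: orphan it and put
            // a fresh buffer in its place. The orphan would be reclaimed once
            // its reader count drops to zero, or by a watchdog timeout.
            orphans_.push_back(std::move(slots_[idx]));
            slots_[idx] = std::make_unique<Slot>();
        }
        slots_[idx]->sequence = nextSeq_++;
        slots_[idx]->pixels.assign(data, data + size);
    }

  private:
    std::vector<std::unique_ptr<Slot>> slots_;
    std::vector<std::unique_ptr<Slot>> orphans_;
    std::uint64_t nextSeq_ = 1;
};
```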

@VRguy (Contributor) commented May 14, 2015

I agree with the circular queue approach, which is what we previously discussed. The size of the queue can be defined in the producer or during the instantiation of the plugin.

@rpavlik (Member, Author) commented May 14, 2015

OK, that seems reasonable. I thought we had decided on reference counting to reduce copying, subsequent to discussing a circular queue. In any case, the circular queue is substantially easier, especially in the absence of pipes.

@russell-taylor (Contributor)

The hybrid approach avoids copying.

@rpavlik (Member, Author) commented May 14, 2015

Sounds good to me - this is why team dev is good :D

@rpavlik (Member, Author) commented May 21, 2015

OK, I have a functional prototype of this, with some todos and some commit squashing required, in the shared-memory branch. Successfully passing 640x480 video from my webcam via shared memory, with no modifications to the OSVR pluginkit or clientkit API required. (I changed the plugin to not scale down the image, but that was it.) CPU usage is very low, so that's promising.

I ended up using an "interprocess sharable mutex" for each of the ring buffer entries, and returning smart pointers that hold a sharable lock on the buffer entry they correspond to. (I couldn't quite work out how to use a semaphore here, but this is effectively a reader-writer lock, so it seemed like a logical choice.)
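For reference, that "smart pointer holding a sharable lock" pattern might look roughly like this with Boost.Interprocess (the `Entry`/`EntryHandle` names are placeholders, not the actual OSVR classes, and allocation of the entries inside the shared segment is omitted):

```cpp
// Rough illustration: hand out an image entry together with a sharable
// (reader) lock on its interprocess mutex, so the entry can't be overwritten
// while a client still holds it.
#include <memory>
#include <boost/interprocess/sync/interprocess_sharable_mutex.hpp>
#include <boost/interprocess/sync/sharable_lock.hpp>

namespace bip = boost::interprocess;

struct Entry {
    bip::interprocess_sharable_mutex mutex;  // lives in the shared segment
    unsigned char *pixels = nullptr;         // image data for this ring slot
};

// A handle that keeps a reader lock alive for as long as the caller holds it.
class EntryHandle {
  public:
    explicit EntryHandle(Entry &entry)
        : lock_(entry.mutex), entry_(&entry) {}  // blocks writers, not readers

    const unsigned char *pixels() const { return entry_->pixels; }

  private:
    bip::sharable_lock<bip::interprocess_sharable_mutex> lock_;
    Entry *entry_;
};

// Returning a shared_ptr to the handle lets several pieces of client code
// share one lock, released when the last copy goes away.
std::shared_ptr<EntryHandle> accessEntry(Entry &entry) {
    return std::make_shared<EntryHandle>(entry);
}
```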

A VRPN message is now sent with the image metadata, information on connecting to the shared memory, and a sequence number, after the server has copied the image into shared memory and released all of its locks. (If the image is small enough, we still send the old kind of message as well.)

Currently remaining issues:

  • Client access violation (crash) on shutdown, related to freeing of buffers. It's apparently succeeding enough to free the lock, since it's not blocking the server, but that's not acceptable. May need reference counting versus own/free; I'll have to dig into it more.
  • With the "generic" backend (which I've chosen by default), the client doesn't notice, after a server restart during operation, that the shared memory segment it's using is no longer the one being used by the server.
  • Potential control of buffer size from the server/plugin side: right now, the ring buffer gets allocated on the first image send, at exactly the right size and with a hardcoded number of entries. (If the size changes at runtime, the segment is abandoned and we start a new one at exactly the new size.)
  • Client-side ability to skip frames/only get the latest. The ring buffer has a method to get the latest entry rather than a particular sequence number, which might be desirable, though since the timestamp comes in the VRPN message, that information would be missing.

@rpavlik (Member, Author) commented May 21, 2015

(Oh, and multiple clients work fine, with no apparent change in performance.)

@russell-taylor (Contributor)

You might try pulling the VRPN pull request dealing with shared memory on the vrpn_ConnectionManager. The existing code sometimes causes multi-threaded code from another project to crash on exit when a bunch of threads stop at the same time; that pull request may fix the crash on shutdown.

@rpavlik (Member, Author) commented May 22, 2015

Well, the crash is not in VRPN code, so I'm pretty sure that's not it, but I will take a look.

@rpavlik (Member, Author) commented May 26, 2015

OK, #131 integrates this: I've solved the crash on shutdown, but have left the other open items listed above.

@russell-taylor (Contributor)

If the sequence number is included in both the VRPN message and the shared-memory message, then the application should be able to determine the appropriate timestamp and other metadata for a given message. The OSVR interface may be able to keep a table of these and handle the matching if needed. It seems like it might be necessary to delay the delivery of a shared-memory message until the corresponding VRPN description arrives.

@rpavlik (Member, Author) commented Jun 1, 2015

Ah, but that's already what we're doing: the IPC ring buffer has a "get by sequence number" method, which gets called when we get the VRPN message. (That's how the client is signalled that there is a frame; it doesn't do any regular checking of shared memory, and it avoids an IPC mutex that way.) The open item was about the fact that we can also (internally, atomically, if we so desire) grab the most recent frame, not just the last frame we received a message about; that is, "get latest" in addition to "get by sequence number". Not sure if that's a meaningful difference.
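A rough sketch of that client-side flow (the types and method names here are placeholders, not the actual OSVR API):

```cpp
// Hypothetical sketch: the VRPN message carries the metadata plus a sequence
// number, and the handler uses that number to pull the matching frame out of
// the IPC ring buffer.
#include <cstdint>

struct FrameMetadata {
    std::uint64_t sequenceNumber;  // parsed from the VRPN message
    double timestamp;              // also comes only via the VRPN message
};

template <typename RingBuffer, typename Callback>
void onVrpnImageMessage(RingBuffer &ring, const FrameMetadata &meta,
                        Callback &&deliver) {
    // Look up exactly the frame the message refers to; no polling of shared
    // memory is needed, so the client only touches the IPC lock here.
    if (auto frame = ring.getBySequence(meta.sequenceNumber)) {
        deliver(meta, *frame);
    }
    // A "get latest" variant could instead fetch ring.getLatest(), but its
    // timestamp/metadata would be missing until the matching VRPN message
    // arrives.
}
```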

@rpavlik (Member, Author) commented Nov 6, 2015

Closing this since it's in and usable.
