Sync ReadLimiter #201

appaquet · 2020-09-11T13:57:16Z

As mentioned in #191 and #121, capnp::message::Reader<capnp::serialize::OwnedSegments> isn't Sync because of the Cell<u64> in the ReadLimiter.

This solution will only work with platforms that have AtomicU64 and only works from 1.34 onward. This modification could be put behind a feature if this affects any user of the crate.

Let me know !

dwrensha · 2020-09-12T17:39:06Z

Thanks!

Do you know which platforms support AtomicU64? This async-std pull request seems to suggest the Arm, MIPS, and Powerpc do not support it. I would prefer not to break capnp on those platforms.

dwrensha · 2020-09-12T17:43:07Z

capnp/src/private/arena.rs

    }

    #[inline]
    pub fn can_read(&self, amount: u64) -> Result<()> {
-        let current = self.limit.get();
-        if amount > current {
+        let read = self.read.fetch_add(amount, Ordering::Relaxed) + amount;


This potentially allows self.read to overflow and wrap around.

I thought about it, but it also return an error at first if it reaches the limit, so I would expect this to be an undefined behaviour to call it afterward.

My first implementation was a load + cas, but that would be definitely less performant.

We could also have a second AtomicBool that indicates that we reached the limit and return right away. That could potentially be over the limit, but decreases odds of overflow.

appaquet · 2020-09-12T18:27:10Z

Makes sense. I couldn't find the list of targets that this crate was tested against. Could be a good idea to add tests, or even just check compilation against them (ex: using cross) in CI.

How you would go about it ? We could have a different implementation for platforms that don't support it. It could also be a feature to support Sync that would need to be explicitly enabled, or fallback to non-sync implementation if it's not a supported platform.

Let me know !

dwrensha · 2020-09-13T16:28:40Z

Some options:

We could use some kind of conditional compilation (feature flag or platform detection) to control when the AtomicU64 is used. I worry that this approach might add unpleasant surprises for downstream users, with things becoming !Sync in unexpected situations.
Assuming that AtomicU32 is sufficiently widely supported, we could use it instead of AtomicU64. The downside is that the read limiter would then capped at 32 gigabytes, but, come to think of it, that doesn't seem too bad, as long as we provide an explicit way to opt out of read limiting. It seems reasonable to expect that all untrusted messages people care about will be smaller than that (though maybe there are some use cases that I have not imagined).
We could make read limiting customizable via a trait, perhaps folding it into the ReaderSegments trait. Then message::Reader<S> could be Sync or not depending on whether S is Sync. It's unclear to me exactly what the new API would look like, but this feels doable to me.

appaquet · 2020-09-13T17:46:17Z

Another option would be to use AtomicUsize instead. It has the benefit from being potentially of u64 size on 64 bits platforms while being the most portable type according to https://doc.rust-lang.org/std/sync/atomic/index.html.

In both AtomicU32 and AtomicUsize option, we will have to change the type of the limit in ReaderOptions to the appropriate type though, which is an API breaking change. It could be left to a u64 and checked at runtime to make sure it doesn't overflow the u32 or usize.

In the 3rd option, if I understand correctly, you'd move the read limiting responsibility to the ReaderSegments implementations? Or would you see it more like a ReaderSegments implementation that wraps an inner ReaderSegments but with the read limiting capability ? Nevertheless, I see this 3rd option as being probably more work and API breaking changes since the we're moving the limiting functionally out of the reader.

appaquet · 2020-09-22T23:13:22Z

@dwrensha Pushed an updated version that uses AtomicUsize behind a feature flag. I would have liked to use proper conditional compilation and compile the sync version when the AtomicUsize is available, but rust-lang/rust#32976 will need to be stabilized before. Radium could also be used as a replacement to AtomicUsize that fallbacks to a Cell<usize> if it isn't available, but I don't think the extra dependency is worth it for this sole usage.

dwrensha · 2020-09-23T01:26:20Z

Thanks!

Having thought about this some more, I think I prefer option 2 above, i.e. switching to AtomicU32. You are correct that it would be a breaking change, so it would require a version bump. We would probably want to wait until we have other pending breaking changes, so things can be batched together and we minimize version churn. I'm not sure when that would be, but historically it has been something like every six months.

If we want to quickly land a fix/workaround, then maybe your feature flag is the way to go. I hesitate because I don't like adding this kind of complexity, but maybe it's worth it.

Are you able to workaround the problem on your end by sharing an OwnedSegments or SegmentArray object rather than a message::Reader object?

appaquet · 2020-09-24T01:28:30Z

Unfortunately, sharing segments and creating a Reader for every access makes it impossible for me to have a proper lifetime on the message. When reading a message, its lifetime is tied to the Reader, not to the underlying segments, which means that I can't carry around that message or its data without keeping the Reader. Self-referential structs would have helped me here by providing me a way to wrap the Reader and my message together in a struct to be carried around.

dwrensha · 2020-10-03T18:40:27Z

I was curious about which targets support which atomics, so I tried cross compiling the following program on a sampling of targets:

#![no_std]

pub fn foo() {
    let x = core::sync::atomic::AtomicUsize::new(0);
    let x = core::sync::atomic::AtomicU64::new(0);
    let x = core::sync::atomic::AtomicU32::new(0);
}

targets that support `AtomicUsize`, `AtomicU64`, and `AtomicU32`

x86_64-unknown-linux-gnu
arm-unknown-linux-gnueabi
i686-unknown-linux-gnu
wasm32-unknown-unknown
aarch64-unknown-linux-gnu

targets that support `AtomicUsize` and `AtomicU32`, but not `AtomicU64`

arm-linux-androideabi
powerpc-unknown-linux-gnu
thumbv7em-none-eabihf

targets that do not support `AtomicUsize`, `AtomicU64`, or `AtomicU32`

riscv32i-unknown-none-elf

These results are more or less what I expected: while the most important targets support AtomicU64, there are some targets that we should care about that don't support it.

I'm not sure what's going on with riscv32i-unknown-none-elf.

appaquet · 2020-10-03T20:49:06Z

I think riscv32i-unknown-none-elf just doesn't support atomics in fact: https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/riscv32i_unknown_none_elf.rs#L20

Looks like Radium also requires disabling atomics for these targets: ferrilab/radium#3

That's why I think having a feature flag to disable the sync reader is the best option to support all cases correctly, at the expense of conditional code.

Let me know if you want me to switch to AtomicU32 instead of AtomicUsize in this PR. Otherwise, if you don't want to support sync readers, just let me know too. You're the maintainer afterall. I'll try to work around it on my side or perhaps find an alternative.

appaquet · 2020-11-19T14:05:24Z

@dwrensha What's your thoughts on this? Do you think we could get this done? Otherwise I need to plan a refactor or an alternative on my side since I can't publish without this.

dwrensha · 2020-11-21T14:12:07Z

My current feeling is that the best solution to this and a bunch of other problems will be to expose inner Arena type as a type parameter. So instead of having a foo::Reader<'a> as you do today, you would have a foo::Reader<'a, A>, where A: ReaderArena. That way, if your arena implementation is Sync, then the typesystem will remember it, and you can do the sharing-between-threads that you need to. Moreover, we will be able to avoid a bunch of dynamic dispatch, which seems to be slowing down performance, and we'll be able make alignment checking configurable via the trait system, rather than through feature flags.

The main obstacle is that getting this to work (particularly with capnp::traits::Owned) probably requires generic associated types, and it's unclear when those will land on stable rust.

In the meantime, it could make sense to land a targeted temporary fix, like what you've proposed here. I don't like adding complexity when it seems like there is a simplifying solution within reach, but maybe it's worth it in this case.

appaquet · 2020-11-22T14:23:27Z

Sounds good for the arena as a type parameter. Eager to see GATs landing too at some point, as it will also help simplify some code on my side too.

As for this temporary fix, from my opinion, since it's pretty constrained and isn't exposed to the users (other than the new feature), I think it's a good trade-off to get the reader Sync, but I'm probably biased ;) Let me know if there is anything you'd like me to change in my changes, or feel free to do them if it can get them to land earlier.

dwrensha · 2020-11-26T15:31:26Z

capnp/src/private/read_limiter.rs

+
+        #[inline]
+        pub fn can_read(&self, amount: usize) -> Result<()> {
+            let read = self.read.load(Ordering::Relaxed) + amount;


It looks to me like this can overflow if self.read.load(Ordering::Relaxed) + amount > core::usize::MAX.

Why not write this is the same way as the non-Sync version, with a single limit: AtomicUsize field that decreases on each call? I think overflows and underflows are easier to avoid in that case.

Well, unless I don't see an obvious solution here, the way it's done in the non-sync version is possible because it's single threaded. In a multi-threaded version, I need to make sure that the limit underflowing is handled correctly. If I was to get then sub without checking underflow, the limit would wrap around and allows the next readers to read usize::MAX.

I think the version I pushed is better and handle this better than my previous version. I use an extra flag to indicate that the limit has been reached when the limit got to 0 or when it underflows.

dwrensha · 2020-11-26T15:32:23Z

capnp/Cargo.toml

@@ -25,7 +25,7 @@ quickcheck = { version = "0.9", optional = true }
 quickcheck = "0.9"

 [features]
-default = ["std"]
+default = ["std", "sync_reader"]


I would prefer to make the new feature not a default feature, so that we minimize the chance of breaking anyone's code.

dwrensha · 2020-11-26T15:41:30Z

capnp/src/private/read_limiter.rs

+
+    pub struct ReadLimiter {
+        pub limit: usize,
+        pub read: AtomicUsize,


Making this an AtomicU64 would more closely match the current implementation, which uses a u64. I'm worried that someone might be setting the current limit to a large u64 value, and when this new feature gets enabled their code will break.

The issue with AtomicU64, though, is that it is supported on fewer platforms. I don't see a nice solution here...

dwrensha · 2020-11-28T00:25:19Z

capnp/src/private/read_limiter.rs

+            }
+
+            let prev_limit = self.limit.fetch_sub(amount, Ordering::Relaxed);
+            if prev_limit == amount {


Is this branch necessary? When prev_limit == amount, the current limit will be 0, and therefore we'll get an error on the next attempt to read.

You're right, it's not necessary

dwrensha · 2020-11-28T00:26:56Z

capnp/src/private/read_limiter.rs

+            if prev_limit == amount {
+                self.limit_reached.store(true, Ordering::Relaxed);
+            } else if prev_limit < amount {
+                self.limit_reached.store(true, Ordering::Relaxed);


If the goal is to cause future reads to fail too, then I think we could do

set.fetch_add(amount, Ordering::Relaxed)

here and not need to worry about adding a limit_reached flag.

Won't this just resets the value to what it was right before the fetch_sub ?

If we want to use a single variable, we could set the value to 0 here when fully consumed (since the new value after fetch_sub may have underflow) and check if we have enough remaining limit before reading on each call.

appaquet · 2020-12-13T17:33:16Z

Friendly ping @dwrensha. Let me know if there is anything I can do to land this faster :)

dwrensha · 2020-12-13T22:56:48Z

I would like get this pull request landed, but I think we first need to resolve the usize vs u64 discrepancy. Imagine if a crate today is setting traversal_limit_in_words to u64::MAX to effectively mean "no traversal limit". If the sync_reader feature is enabled, the crate will start to panic on 32-bit systems in the ReadLimiter::new() function. Note that crates don't always get to control when features are enabled -- if the crate imports some other library that uses capnp, and that other library enables the feature, then the feature gets globally enabled for the full crate build.

I think a satisfactory path forward would be:

make ReaderOptions::traversal_limit_in_words have type Option<usize>, where None means "no limit".
when the sync_reader feature flag is not enabled, ReadLimiter has a limit: Cell<usize> field.
when the sync_reader feature flag is enabled, ReadLimiter instead has a limit: AtomicUsize field.
ReadLimiter also gets a panic_on_limit_exceeded: bool field, for when traversal_limit_in_words is None
To avoid excessive branching, limit always gets updated on can_read(). panic_on_limit_exceeded is only checked when the limit is exceeded.
We bump the release version to 0.14, because this is a breaking change.

How does that sound?

Relatedly, I've been working on the arena type parameter approach (work in progress in #210), and while I've made some progress and learned some things, I would no longer describe a solution as "within reach" using that approach, because there are a lot of details to be worked out, and things generally look complicated.

appaquet · 2020-12-15T02:02:43Z

I agree for the u64 vs usize problems for 32-bit systems. The initial proposal was to prevent breaking changes, but I agree that making things right is more beneficial that the cost of the breaking change.

Steps 1-2-3 makes sense.

Step 4-5 are confusing me here. Which limit are we comparing against if the traversal_limit_in_words is none? usize::MAX? Shouldn't the semantic of a None value in ReaderOptions::traversal_limit_in_words rather be that the read limiting is disabled completely? Also, should we panic or simply return an error?

Quite massive refactor you have in #210. I like the removal of the higher-ranked trait bounds. I also don't think it is within reach because there is no clear roadmap for when GATs will actually be stabilized. There are still a few issues to be tackled according to the tracking issue.

dwrensha · 2020-12-15T02:28:38Z

Shouldn't the semantic of a None value in ReaderOptions::traversal_limit_in_words rather be that the read limiting is disabled completely?

To an outside observer, yes. What I'm trying to describe is an optimization.

Sorry, panic_on_limit_exceeded is not the right name for the field. I don't actually intend to panic. Better would be error_on_limit_exceeded: bool.
For the non-Sync case, I'm imagining something like this:

    pub fn can_read(&self, amount: usize) -> Result<()> {
        let current = self.limit.get();
        if amount > current && self.error_on_limit_exceeded {
            Err(Error::failed(format!("read limit exceeded")))
        } else {
            self.limit.set(current.wrapping_sub(amount));
            Ok(())
        }
    }

So when no limit is enabled, the self.limit variable still gets updated, it just never causes an error.
The point is to avoid having two branches on every call to can_read() when a limit is enabled.

Compare to this naive version, where self.limit stores a Cell<Option<usize>>:

    pub fn can_read(&self, amount: usize) -> Result<()> {
        if let Some(current) = self.limit.get() {
            if amount > current {
                Err(Error::failed(format!("read limit exceeded")))
            } else {
                self.limit.set(current - amount);
                 Ok(())
            }
        }
     }

(Of course, it's possible that this optimization won't have a noticeable effect. It seems easy enough, though, to be worth doing.)

appaquet · 2020-12-15T02:38:05Z

I'm wondering if the sync version should do the same though. The atomic operations are probably most costly than having the extra branching. So disabling the limit should probably try to avoid the atomic load and store.

Which benchmark do you think would be impact the most here? I could probably bench the 2 alternatives.

dwrensha · 2020-12-15T03:36:46Z

I suppose I mostly care about avoiding performance regression on the non-sync version.

You could try run_all_benchmarks in the benchmark folder or https://github.com/llogiq/serdebench.

dwrensha · 2020-12-15T03:37:49Z

(also, fyi, I've bumped the version numbers in preparation for landing this and other breaking changes: f40b6dc)

dwrensha · 2020-12-19T05:20:37Z

Thanks for the pull request and discussion! I am going to merge this and then add a few changes.

appaquet · 2020-12-19T12:45:56Z

Sorry for the radio silence. Was planning to do the changes over the weekend. But I'm happy that this got merged!

Saw the other changes you merged on master too. Can't wait for the release!

dwrensha · 2020-12-19T13:07:19Z

I looked at a few benchmarks myself. My findings were:

fetch_sub() was significantly slower than sequentially doing load() then store():

capnproto-rust/capnp/src/private/read_limiter.rs

Lines 54 to 71 in 37e7110

    
           pub fn can_read(&self, amount: usize) -> Result<()> { 
        
               // We use separate AtomicUsize::load() and AtomicUsize::store() steps, which may 
        
               // result in undercounting reads if multiple threads are reading at the same. 
        
               // That's okay -- a denial of service attack will eventually hit the limit anyway. 
        
               // 
        
               // We could instead do a single fetch_sub() step, but that seems to be slower. 
        
               let current = self.limit.load(Ordering::Relaxed); 
        
               if amount > current && self.error_on_limit_exceeded { 
        
                   return Err(Error::failed(format!("read limit exceeded"))); 
        
               } else { 
        
                   // The common case is current >= amount. Note that we only branch once in that case. 
        
                   // If we combined the fields into an Option<AtomicUsize>, we would 
        
                   // need to branch twice in the common case. 
        
                   self.limit.store(current.wrapping_sub(amount), Ordering::Relaxed); 
        
               } 
        
               Ok(()) 
        
           }

.

I was unable to detect a performance difference between Option<AtomicUsize> and AtomicUsize,bool. I went with the latter because I feel like minimizing branches is important.

You are welcome to look into benchmarks too and to propose changes.

My plan is to release 0.14 later today or tomorrow.

Atomic read limiter

55729c9

dwrensha reviewed Sep 12, 2020

View reviewed changes

Use AtomicUsize for read limiter + feature gate it

04b2110

dwrensha reviewed Nov 26, 2020

View reviewed changes

Atomic read limiter: better logic

f6c86c7

dwrensha reviewed Nov 28, 2020

View reviewed changes

Atomic read limiter: better logic

25f9948

dwrensha mentioned this pull request Dec 13, 2020

Work in progress (do not land): expose arena as type parameter #210

Open

dwrensha merged commit 83c76be into capnproto:master Dec 19, 2020

compiler-errors mentioned this pull request Jul 3, 2022

Stabilize generic associated types rust-lang/rust#96709

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync ReadLimiter #201

Sync ReadLimiter #201

appaquet commented Sep 11, 2020

dwrensha commented Sep 12, 2020

dwrensha Sep 12, 2020

appaquet Sep 12, 2020

appaquet commented Sep 12, 2020

dwrensha commented Sep 13, 2020

appaquet commented Sep 13, 2020

appaquet commented Sep 22, 2020

dwrensha commented Sep 23, 2020

appaquet commented Sep 24, 2020

dwrensha commented Oct 3, 2020

appaquet commented Oct 3, 2020

appaquet commented Nov 19, 2020

dwrensha commented Nov 21, 2020

appaquet commented Nov 22, 2020

dwrensha Nov 26, 2020

appaquet Nov 27, 2020

dwrensha Nov 26, 2020

dwrensha Nov 26, 2020

dwrensha Nov 26, 2020

dwrensha Nov 28, 2020

appaquet Nov 28, 2020

dwrensha Nov 28, 2020

appaquet Nov 28, 2020

appaquet commented Dec 13, 2020

dwrensha commented Dec 13, 2020

appaquet commented Dec 15, 2020

dwrensha commented Dec 15, 2020 •

edited

appaquet commented Dec 15, 2020

dwrensha commented Dec 15, 2020

dwrensha commented Dec 15, 2020

dwrensha commented Dec 19, 2020

appaquet commented Dec 19, 2020 •

edited

dwrensha commented Dec 19, 2020

Sync ReadLimiter #201

Sync ReadLimiter #201

Conversation

appaquet commented Sep 11, 2020

dwrensha commented Sep 12, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

appaquet commented Sep 12, 2020

dwrensha commented Sep 13, 2020

appaquet commented Sep 13, 2020

appaquet commented Sep 22, 2020

dwrensha commented Sep 23, 2020

appaquet commented Sep 24, 2020

dwrensha commented Oct 3, 2020

targets that support AtomicUsize, AtomicU64, and AtomicU32

targets that support AtomicUsize and AtomicU32, but not AtomicU64

targets that do not support AtomicUsize, AtomicU64, or AtomicU32

appaquet commented Oct 3, 2020

appaquet commented Nov 19, 2020

dwrensha commented Nov 21, 2020

appaquet commented Nov 22, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

appaquet commented Dec 13, 2020

dwrensha commented Dec 13, 2020

appaquet commented Dec 15, 2020

dwrensha commented Dec 15, 2020 • edited

appaquet commented Dec 15, 2020

dwrensha commented Dec 15, 2020

dwrensha commented Dec 15, 2020

dwrensha commented Dec 19, 2020

appaquet commented Dec 19, 2020 • edited

dwrensha commented Dec 19, 2020

targets that support `AtomicUsize`, `AtomicU64`, and `AtomicU32`

targets that support `AtomicUsize` and `AtomicU32`, but not `AtomicU64`

targets that do not support `AtomicUsize`, `AtomicU64`, or `AtomicU32`

dwrensha commented Dec 15, 2020 •

edited

appaquet commented Dec 19, 2020 •

edited