New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsound uses of Unsafe in API #289

Closed
axon-q opened this Issue Jun 8, 2018 · 32 comments

Comments

Projects
None yet
@axon-q
Contributor

axon-q commented Jun 8, 2018

Right now the actix-web code contains 100+ uses of unsafe. Presumably this is in order to achieve the best possible performance in hot parts of the code.

However, web servers often face the public Internet, so security is extremely important for web server implementations. This issue is especially critical for organizations that intend to use the software in large-scale production environments. One of the main reasons to choose a Rust-based HTTP implementation is the guaranteed memory safety that safe Rust provides. Unfortunately this guarantee is eroded for every use of unsafe in the codebase. Performance isn't worth much if it comes at the cost of critical security vulnerabilities due to unsafe memory access. It's also nice to know for certain that your web server won't segfault in production.

I propose that we leave this open as a tracking issue to track design and implementation issues concerning the use of unsafe code. Some of the items that should be explored:

  • Is it possible to remove any of the current uses of unsafe without significantly impacting performance?
  • Is it appropriate to remove some uses of unsafe even if there's a performance impact?
  • Is there a long-term plan to reduce or eliminate the use of unsafe code?
  • Security analysis, testing, and fuzzing of the codebase
  • Profiling and performance analysis to assess the impact of converting unsafe to safe code
@DoumanAsh

This comment has been minimized.

Contributor

DoumanAsh commented Jun 8, 2018

@axon-q First of all if your concern is unsafe code then you cannot use even Rust's standard library which is the biggest collection of unsafe code.

Is it possible to remove any of the current uses of unsafe without significantly impacting performance?

Most likely no, but it should be investigated on per case basis, not overall

Is it appropriate to remove some uses of unsafe even if there's a performance impact?

No, I don't believe it is right approach to problem.
Just because code is unsafe doesn't mean it is going to to segfault or cause memory corruption.
Note that Rust has strong guarantees about UB and it is really difficult to write code that would unexpectedly cause problems (applicable for most unsafe code aside from raw pointers)

P.s. Of course if code is not correct it should be corrected

Is there a long-term plan to reduce or eliminate the use of unsafe code?

AFAIK there is no long-term plan, but nothing stops you from eliminating it on per case basis(as long as it doens't impact performance significantly)

But overall goal should not be to remove 100% of unsafe code

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 8, 2018

unsafe code is used mostly for two reasons:

  • performance. and i dont think it is possible to remove those unsafes without impact. we need to evaluate every use
  • api ergonomics. i think we can remove some of the usafes

in any case we have to evaluate every use case for unsafe. i used unsafe for the reason, in most cases i couldnt come up with safe solution. also you need to take into account size of the codebase, at the moment actix-web is ~27k loc

@axon-q

This comment has been minimized.

Contributor

axon-q commented Jun 8, 2018

@DoumanAsh The point is to reduce risk and the attack surface. Of course the entire stack, including std, rustc, LLVM and the kernel can potentially have vulnerabilities. The Rust foundations are considered very solid, however, due to their maturity and the amount of use and scrutiny they receive. The question is how much risk actix-web adds on top of that, and how to minimize it. This is something that everyone considering using it in production has to evaluate.

I agree that unsafe isn't necessarily bad, and that some kinds of unsafe code are better than others. But from the perspective of risk, the less unsafe the better.

@axon-q

This comment has been minimized.

Contributor

axon-q commented Jun 8, 2018

@fafhrd91 Makes sense! Good to know that it might be possible to work toward removing some of the non-performance-critical usages. Also, some uses of unsafe are more problematic than others: having a bit of unsafe sealed inside an internal component is not so bad, but unsafe handling of things like memory buffers and parsing that are directly affected by untrusted input is a red flag from a security perspective.

@memoryruins

This comment has been minimized.

Contributor

memoryruins commented Jun 12, 2018

If anyone would like to try fuzzing, these will come in handy:

@seanmonstar

This comment has been minimized.

seanmonstar commented Jun 15, 2018

After noticing this issue, and then noticing #301, I figured I'd do a more in-depth analysis of the unsafe usages here. I've so far found ~15 cases where a user can "safely" trigger memory bugs. I first offered to send this in a private email, but it was preferred I just post this here.

I haven't done a complete review. I first looked in public API files, and so have not analyzed client/*, server/*, or pipeline.rs, all of which do happen to contain a large amount of unsafe code as well.

The two main classes of issues I have found are:

  1. Unsafe Send, allowing memory unsafety to occur if the affected code every moves to another thread.
  2. Unchecked multiple mutable references, allowing invalidating references to memory accidentally.

While a couple unsafe usages can be fine, I'd strongly recommend to remove as much as possible. Consider:

  • The performance in many cases isn't actually improved because of it.
  • When performance is improved, usually by just a tiny bit, it's not worth it when it's so easy to violate memory safety. Otherwise why would we have Rust, when we could have this fun property in C/C++.
  • Many C/C++ libraries have felt that they can manage the unsafety, and yet we constantly see CVS notices from those same libraries. Unsafety is extremely tricky.
  • I suspect most all of these can be made safe with no performance cost, just be restructuring some things.

The issues I've found so far:

Actix core

  • ContextImpl unsafe impl Send
    • ContextImpl contains a vector of Box<Future> (note, not Box<Future + Send>).
    • ContextImpl::spawn has F: Future bounds, so a user can safely insert !Send futures into the items list. If that ContextImpl ever crosses to another thread, it can easily trigger memory unsafety (double-free or use-after-free).
  • mutable alias of &mut Actor in ContextImpl
    • A user can call Context::create(new_actor) to create a new Context. After grabbing a &T of something on their Actor (&mut self), they can mem::replace(context, new_context). When the returned old_context drops, it will have invalidated the memory of &T.
  • ContextImpl::actor() returns an aliased &mut Actor
    • Someone implementing a Context using ContextImpl could safely get 2 mutable references to the Actor, and then mutating one can invalidate references into the 2nd.
  • Writer contains UnsafeWrite, which is Send, but holds just an Arc<UnsafeCell>
    • Calling Writer::new(some_io, ctx) will spawn a clone of the UnsafeWriter to be costantly flushing its buffer on the io
    • Writer can then be sent to another thread, where a user can "safely" write into it's buffer.
    • If the Actor in the first thread accesses &self.buffer to pass to std::io::Write::write just before the second thread causes the buffer to re-allocate to grow, the Write will try to copy freed memory.

Actix-web

  • HttpRequest unsafely accesses mutable inner fields without synchronization
    • HttpRequest internally has an Rc
    • InnerHttpMessage has get_mut() method which just transmutes from &self to &mut self.
    • HttpRequest can be cloned (both impl Clone, and methods like change_state, drop_state, etc).
    • This allows having 2 HttpRequests, and while taking a reference to some inner data, like let header = req1.headers().get("connection"), and then invalidate the memory it points to, by doing req2.headers_mut() = HeaderMap::new().
  • HttpRequest::connection_info transmutes references from the internal HeaderMap to &'static.
    • A call to req.connection_info() save static references from the current HeaderMap and Uri
    • Afterwards, a user can mutate the headers, like req.headers_mut().insert("x-forward-for", "woopsies").
    • A second call to req.connection_info() will access cached "static" strs that have had their memory freed.
    • It seems that filters/predicates/middleware actually fill this in automatically, so all it would take is mutating some headers to invalidate those references.
  • HttpRequest::query() and match_info() store references into the Uri path as &'static strs in the Extensions, just like connection_info.
    • Its possible to generate these unsafe static refernces with req.query()
    • Then, the extensions can be replaced via req.extensions_mut().
    • The extensions can be put in a different HttpRequest.
    • req.query() will return the cached but likely freed "static" references.
  • HttpContext mutable alias of self, along with HttpContext::actor(new_actor) will free the old actor, which can be done inside any Actor method
    • impl Actor for Woops {
          fn start(&mut self, ctx: &mut HttpContext<Woops>) {
              let bomb = &self.some_str[10..];
              ctx.actor(Woops::new())
              println!("hey look at this free memory: {}", bomb);
          }
      }
  • Scope takes an unsafe reference to the HttpRequest::uri before passed &mut HttpRequest to each Predicate
    • A user can implement a safe Predicate that could accidentally free the Uri.
    • There is a constructor, HttpRequest::new. A user could clone each of the pieces, modifying a little, and then inserting the new HttpRequest in Predicate::check(&mut req), via *req = my_sub_request.
    • The next step in the loop would then check the path for the next filter, accessing freed memory.
  • WebsocketContext mutable alias has same problem as HttpContext
    • A user can replace the ctx with a new one, thanks to constructors, and free the Actor that represents &mut self
    • A user can also do it more easily with just ctx.actor(Woopsies::new()
  • Identity middleware has unsafe impl for Box<Identity> (not Box<Identity + Send>)
    • A user can very easily accidentally use !Send types when implementing Identity, such as using Rc
    • That type is then stored in the extensions, and if they get sent to another thread, can lead to memory unsafety
  • The Session middleware has the exact same problem.
  • ExtractorConfig has an Rc<UnsafeCell>
    • It implements Clone, and DerefMut, without synchronization, meaning a user can have multiple mutable references to the same Config
@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 16, 2018

@seanmonstar thanks for analysis. i will think about this.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 16, 2018

@seanmonstar thanks again for pointing.

  • using tokio threadpool was bad idea in first place. actix does not use threadpool anymore, so all Send related problems should be fixed.
  • Extensions is fixed too #315
  • uri will be fixed later with #178
  • HttpContext/WebsocketContext should be fixed

last is header_mut(), i need to think how to fix it

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 17, 2018

fixed most of the problems. let's open new ticket for each new case.
closing this one

@fafhrd91 fafhrd91 closed this Jun 17, 2018

@seanmonstar

This comment has been minimized.

seanmonstar commented Jun 18, 2018

I'm glad that some of the issues could be fixed, that's great!

However, I think it's early to close this issue, as "most of the problems" are not actually fixed. Some of the holes have been filled in, but the underlying unsafety still exists, and several of the "fixes" actually still leave a public API that is unsound.

  1. It does seem like most of the unsoundness due to unsafe impl Send has been removed (though, I was a little shocked to read one of the patches add a brand new unsafe impl Send, instead of removing all of them. I haven't yet analyzed the new one.).
  2. Some of the things were fixed, like transmuting &'a str into &'static str. However, several others still remain. The Contexts still have ways to trigger memory bugs. HttpRequest still has was for a user to trigger memory bugs. The URI issue is not fixed.

The biggest problem is there several different versions of code like this:

let alias: &mut Foo = unsafe { &mut *(self as *const Foo as *mut Foo) };
self.inner.do_something_mut(alias)

This is a very unsafe thing to do! True, it's possible for the do_something_mut method to not trigger memory unsafey, but doing the above has completely disable the compiler from helping you notice when you do! And, this pattern is actually the root of most of the unsound APIs that are available. I'd beg that instead of trying to fill in more holes, the root of the unsoundness be fixed instead.

For that reason, I'm not spelling out exactly how to trigger the memory bugs that still exist (though I have exact steps), since it's not my goal to have the ones I find fixed. It's my goal to have people using and writing safe software. Instead, I can write up some steps on how to fix the roots.


  • The HttpRequest continues to have just an Rc<InnerHttpMessage>, and yet thanks to HttpRequest::as_mut() simply transmuting to a &mut InnerHttpMessage, references can still be invalidated. The as_mut method should add a check that there is only 1 reference, and panic otherwise.

    fn as_mut(&mut self) -> &mut InnerHttpMessage {
        if Rc::strong_count(self) > 1 || Rc::weak_count(self) > 0 {
            panic!("as_mut access when other references exist")
        }
        // now take a mutable reference
    }
  • The Scope takes an unsafe reference to the path, and then passes a mutable reference to each predicate (memory bugs are still possible with the fixes so far). It should not do this. Instead, it should only grab a temporary reference when it is needed.

    // This should be removed
    let path = unsafe { &*(&req.uri().path()[prefix..] as *const [u8]) }

    Instead, this should be done:

    thing_using_path_prefix(&req.uri().path()[prefix..])
  • The various Contexts, ContextImpl, Context, HttpContext, WebsocketContext, all do the mutable alias example I mentioned above:

    let ctx: &mut Context = unsafe { &mut *(self as *const Context as *mut Context) };
    self.inner.poll(ctx)

    This still has several unsound holes, and they could all be fixed by not doing the above. Since the Context is passed a &mut Context to all Actor and Handler methods, the structure should be changed. The Context should stop holding the actual Actor, so that a mutable alias isn't required to pass the Context to an actor method.


Besides the unsoundness that I've been able to identify, there may exist other instances that I haven't noticed yet. Instead of fixing the individual issues, fixing the root makes any unnoticed issues go away too. I'd expect this issue to remain open so users of Actix can track how to know that the code they rely on is safe, until everything has been resolved.

  1. The fixes that have been applied so far are only for the unreleased 0.7 version. How can a user of 0.6 be sure their code is sound, without having to migrate through breaking changes to get 0.7 (and hope any other libraries that they use that also depend on Actix upgrade to 0.7 quickly)?
  2. How does a user feel confident that these kinds of unsoundness don't happen again? With the root cause remaining in the library, it's far too easy to forget and introduce a new unsound API. That's what happened the first time, the compiler errors were turned off, assuming that things were safe, and perhaps the very first usage of it was. But since the unsafety gets hidden away in a small function somewhere, while the invariant that keeps it safe is not actually local to that function, new instances can easily pop up.

(An example of number 2 is how server::TransferEncoding holds a *mut BytesMut. This is unsafe, but it's unsafety has been wrapped up in a couple of debug assertions. While analyzing the code, it doesn't appear that memory bugs are triggered, there is absolutely no need to have this unsafety, but allows for future code to accidently violate some invariants. Instead of keeping a *mut BytesMut inside the TransferEncoding, a &mut BytesMut could be passed to TransferEncoding::encode, like encoder.encode(&mut self.buffer, bytes).)

I believe Actix users would want these to be addressed. If not by reopening this issue, then by opening a new one with that objective.

@hayd

This comment has been minimized.

hayd commented Jun 18, 2018

Also, the issue's headline is:

Right now the actix-web code contains 100+ uses of unsafe.

This is still the case. Being able to track this is useful...

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 19, 2018

@seanmonstar i can not see how to trigger memory unsafety with req.uri().path() and predicates. could you elaborate?

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Jun 19, 2018

First of all if your concern is unsafe code then you cannot use even Rust's standard library which is the biggest collection of unsafe code.

Each use of unsafe in the std library contains a comment "proving"/explaining why the unsafe operations performed are safe. The explanations typically list the invariants that, when violated, could lead to undefined behavior, and the code typically asserts these, such that the program will abort or unwind instead of running into undefined behavior.

OTOH unsafe inside actix-web appears to be used nitty-willy, without any explanations of why each unsafe operation is actually safe. That's a completely different bar.

i used unsafe for the reason, in most cases i couldnt come up with safe solution.

Do you mean that you couldn't come up with a safe solution in safe Rust? Because unsafe is a tool to write safe code that the compiler cannot prove correctly, not a tool to write code that is not safe (EDIT: that is, that has undefined behavior).

@ZerothLaw

This comment has been minimized.

ZerothLaw commented Jun 19, 2018

So I'm a newish Rust developer (but long time developer in other stuff). Will efforts to start refactoring the API/Code to remove reliance on unsafe be accepted if sound and clean?

@ZhangHanDong

This comment has been minimized.

ZhangHanDong commented Jun 20, 2018

Recommended to follows three rules-of-thumb for designing a hybrid, memory-safe architecture, as proposed by the Rust SGX SDK project:

  • Unsafe components must not taint safe components, especially for public APIs and data structures.
  • Unsafe components should be as small as possible and decoupled from safe components.
  • Unsafe components should be explicitly marked during deployment and ready to upgrade.

@mitsuhiko mitsuhiko added the unsafe label Jun 20, 2018

@mitsuhiko mitsuhiko reopened this Jun 20, 2018

@mitsuhiko

This comment has been minimized.

Member

mitsuhiko commented Jun 20, 2018

I am going to reopen this issue as there is clear evidence that the API exposed currently is not sound. Thanks @seanmonstar for pointing this out. I want to spend some time in helping to solve these problems. At the same time I want to set some expectations that this is going to be a process that will take quite a bit of time.

I identified one issue #332 which is the core of a lot of this. It's that we currently use an inner type that is used among multiple clonable request objects and through it multiple mutable references can be obtained. It's also clear that this happens in practice. The solution will require some API changes and I want to find out what good ways there are to do this. That request object also sets a pattern that has been used in various places in actix so getting a good story there in place is likely to then set some guidelines of how to do this elsewhere.

This is a topic that's very important to me and I would love if people with experience in that field would be able to assist us.

@mitsuhiko mitsuhiko changed the title from Use of Unsafe to Unsound uses of Unsafe in API Jun 20, 2018

@DoumanAsh

This comment has been minimized.

Contributor

DoumanAsh commented Jun 20, 2018

@mitsuhiko As I mentioned in gitter I think it would be better to look in direction of removing Clone at all

Things like with_state should be with_state(self) -> Self instead of cloning itself

@mitsuhiko

This comment has been minimized.

Member

mitsuhiko commented Jun 20, 2018

@DoumanAsh and then replace it with explicit Rcs? I don't think that will work unless the objects themselves encapsulate some interior mutability. For instance a common pattern at the moment is to clone the request object in a middleware and have those might then take mutable references later to set some data. The tricky bit is that the request object would not really be able to use Rc internally as this would mean it loses the ability to mutate if any reference is out there.

I think i like the idea of removing Clone for these but it means the API needs to be clearly structured so that users can still do what they wanted to do in the first place. The extractors currently clone the request left and right sadly.

@DoumanAsh

This comment has been minimized.

Contributor

DoumanAsh commented Jun 20, 2018

It is true that it is going to be difficult but the whole need for smart pointers here is to let you clone requests around.
Instead it would be better to use transformation approach i.e. take Self and return Self

But this is really pain as you mentioned since clone is used extensively.

P.s. it may imply performance degradation I wonder?

@fafhrd91 I wonder what do you think?

@mitsuhiko

This comment has been minimized.

Member

mitsuhiko commented Jun 20, 2018

I really don't see how Clone be reasonably be removed. The only thing that comes to mind right now are loads of new smart pointers. The entire thing internally is already Rced so all that is really missing is ensure that all APIs where mutation can happen is well guarded.

So for instance it would be trivial to have an API contract like fn method(&self) -> &Method if we never mutate the method internally (or where it's stored on). On the other hand we have stuff like extensions which is tricker. Here I think Extensions could just use interior mutability and a refcell internally.

The trickest one is headers() and friends clearly. I have no idea how to deal with this yet without creating a really weird API.

@mitsuhiko

This comment has been minimized.

Member

mitsuhiko commented Jun 20, 2018

I think for now I want to land some changes that mark functions as unsafe that should need that or tag them to be only there for internal test uses. That should make it clearer which ones are okay and which ones are not. And add markers to indicate why things are unsafe.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jun 20, 2018

Let’s not rush. We need to play with different approaches. In general, I am inclined to remove unsafe rather than making unsafe fn. I am not very happy about this shitstorm atm

@kamek-pf

This comment has been minimized.

kamek-pf commented Jun 24, 2018

For anyone keeping track of this, last week actix-web had over 120 unsafes, as of today I only count 38.
Shout out to the team !

@botev

This comment has been minimized.

botev commented Jun 25, 2018

The trickest one is headers() and friends clearly. I have no idea how to deal with this yet without creating a really weird API.

@mitsuhiko first I want to say that I'm not really an expert on web development or any of the such so my understanding of these issues is relatively low. However, I did have several times have to spend time in Rust developing things which were very counter-intuitive to do with the borrow checker as they required various mutations which are not allowed. As such I'm quite curious what exactly is the issue/use-case/problem here. Is it possible for you to describe a bit the flow of the API and where/why this is required and which bits make it harder? Again for someone not so familiar with HTTP libraries.

@Pzixel

This comment has been minimized.

Pzixel commented Jun 25, 2018

@botev unsafe can make safe Rust to perform an UB operation which would lead to anything, including data corruption or panic. Here is an example on playground. Although it compiles it crashes at runtime (best outcome). However, UB says that it could silently corrupt your data or send your credit card number to African drug dealers (not so nice outcome).

@DoumanAsh

This comment has been minimized.

Contributor

DoumanAsh commented Jun 25, 2018

@Pzixel Please don't be misleading, unsafe code may lead to UB(in a very rare occasions) or memory corruption. But it doesn't mean that all unsafe code equals to UB or corruption or segfault

@Pzixel

This comment has been minimized.

Pzixel commented Jun 25, 2018

@DoumanAsh plase, reread my sentence carefully. That's what I have written:

unsafe can make safe Rust to perform an UB

@botev

This comment has been minimized.

botev commented Jun 25, 2018

@Pzixel maybe I reference the wrong part, but I intended this towards the issue of removing 'Clone' and in general allowing productive mutations when passing object around. I think there have been enough discussion on 'unsafe' and UB and I can't add anything to that.

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jul 5, 2018

UB should be fixed. some unsafe are still there but should not bring UB. if anyone what to review code re-open ticket or create new one

@berkus

This comment has been minimized.

berkus commented Jul 14, 2018

@fafhrd91 is there a commit id/range with regards to fixes related to this ticket?

@fafhrd91

This comment has been minimized.

Member

fafhrd91 commented Jul 14, 2018

@berkus there is no any specific commit, changes are too large. Most of commits for the last two weeks are related to fixes

@berkus

This comment has been minimized.

berkus commented Jul 14, 2018

Okay, I'll look through those, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment