Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ ergonomic usage suggestion #8

Closed
bruxisma opened this issue Jan 9, 2020 · 4 comments
Closed

C++ ergonomic usage suggestion #8

bruxisma opened this issue Jan 9, 2020 · 4 comments

Comments

@bruxisma
Copy link

bruxisma commented Jan 9, 2020

As mentioned in #5, there is need for better namespace usage. Additionally, I understand the need for ABI compatibility, and quickly hacked up an improved API suggestion on the C++ side. This will of course required additional work on the rust side of things, but wanted to get feedback first.

Below, is a C++17 (though backportable to C++11 with minimal effort) API mockup for an improved API. Among the changes are

  • Renaming types to use the C++ standard style
  • Placing everything inside an inline namespace named v1
  • Made the internal representation of str actually private. If its needed in generated code, I recommend using the friend keyword.
  • Left a note that the from_raw and into_raw could be turned into construction and explicit casting to a more opaque class (this class could be made into a friend of box, or vice versa. This reduces the need for unnecessary encapsulation, and also allows the type system to take care of some unsafety for us). For now I've just used a unique_ptr, which will cause a compiler error if an opaque type would be destructed.
  • Improved unnecessary code generation by using iosfwd instead of iostream. I recommend looking at what happens per translation unit when including iostream. It's not pretty, but iosfwd allows us to forward declare the ostream insertion operators without incurring this overhead.
  • Changed all cast operators to explicit. This saves more headaches in the long run, I assure you.
  • Box now has an explicit operator bool, so it can be used in if() statements, and is more in line with unique_ptr, which is the closest thing on the C++ side.
  • box also can now do in-place construction, saving unnecessary copies. We also construct correctly, via std::addressof and ::new. Adopting this API would allow Windows support?  #6 to be completed.

I also know quite a bit about build systems and the C++ language (and API design) in general, so feel free to ask any questions :)

#ifndef CXXBRIDGE_H
#define CXXBRIDGE_H

#include <cstdint>
#include <cstddef>

#include <type_traits>
#include <utility>
#include <iosfwd>
#include <string>
#include <array>

namespace cxxbridge {
inline namespace v1 {
namespace rust {

struct string final {
  string (string const&) noexcept;
  string (string&&) noexcept
  string () noexcept;
  ~string () noexcept;

  string (std::string const&);
  string (std::string&&);
  string (char const*);

  string& operator = (string const&) noexcept;
  string& operator = (string&&) noexcept;

  void swap (string&) noexcept;

  explicit operator std::string () const;

  char const* data () const noexcept; // no null terminator
  std::size_t size () const noexcept;
private:
  std::array<intptr_t, 3> repr;
};

class str final {
  // internal is private and must not be used other than by generated code.
  // Not really ABI compatible with &str, codegen will translate to
  // cxx::rust_str::RustStr
  struct internal {
    char const* ptr;
    std::size_t len;
  };

public:
  str (str const&) noexcept;
  str (str&&) = delete;
  str () noexcept;

  str (std::string&&) = delete;
  str (std::string const&) noexcept;
  str (char const*) noexcept;
  str (internal) noexcept;

  str& operator = (str const&) noexcept;

  void swap (str&) noexcept;

  explicit operator std::string () const;

private:
  internal repr;
};

template <class T>
struct box final {
  using value_type = T;
  using const_pointer = std::add_pointer_t<std::add_const_t<value_type>>;
  using pointer = std::add_pointer_t<value_type>;

  box (box const& that) : box(*that) { }
  box (box&& that) : repr(std::exchange(that.repr, 0)) { }

  template <class... Args>
  box (std::in_place_t, Args&&... args) {
    ::new (std::addressof(**this)) value_type(std::forward<Args>(args)...);
  }
  box (T const& value) : box(std::in_place, value) { }

  // replace this with a raw<T> wrapper so that we KNOW a type came from box
  box (std::unique_ptr<value_type> handle) {
    this->raw(handle.release());
  }

  box& operator = (box const& that) {
    box(that).swap(*this);
    return *this;
  }

  box& operator = (box&& that) {
    if (*this) { this->drop(); }
    this->repr = std::exchange(that.repr, 0);
    return *this;
  }

  void swap (box& that) {
    using std::swap;
    swap(this.repr, that.repr);
  }

  explicit operator bool () const noexcept { return this->repr; }

  // This should be replaced with a raw<T> wrapper so that we KNOW a type came
  // from a box.
  explicit operator std::unique_ptr<value_type> () noexcept {
    auto ptr = std::unique_ptr { this->get() };
    this->repr = 0;
    return ptr;
  }

  const_pointer operator -> () const noexcept { return this->get(); }
  pointer operator -> () noexcept { return this->get(); }

  decltype(auto) operator * () const noexcept { return *this->get(); }
  decltype(auto) operator * () noexcept { return *this->get(); }

private:
  void destroy () noexcept;
  void uninit() noexcept;

  void raw (pointer) noexcept;
  pointer raw () noexcept;
  
  const_pointer get () const noexcept;
  pointer get () noexcept;

  std::uintptr_t repr { };
};

std::ostream& operator << (std::ostream&, string const&);
std::ostream& operator << (std::ostream&, str const&);

}}} /* cxxbridge::v1::rust */

#endif /* CXXBRIDGE_H */
@dtolnay
Copy link
Owner

dtolnay commented Jan 9, 2020

Thanks so much for putting this together! This is super helpful. I will take care of incorporating your suggestions over the next couple days.

One question for you on box (box&& that) -- In Rust, moves are destructive so there isn't a destructor that runs on the "old" object, and Box is non-nullable. In C++ I want cxxbridge::rust::box to be move constructible because otherwise it would be pretty horrible (I think; right?) but we only get non-destructive moves. As you saw, in the current implementation a move just sticks 0 in repr. This is fine as long as Rust never sees that 0, i.e. the moved-from box is about to be destroyed and ~box checks for 0 and avoids calling Rust's Drop in that case.

Now suppose that someone had:

#[cxx::bridge]
mod ffi {
    struct Thing {
        b: Box<X>,
    }

    extern "C" {
        type X;
        fn f(thing: &mut Thing);
    }
}
void f(Thing &thing) {
    box<X> moveconstruct(std::move(thing.b));
}

This is UB because we move from the box but the whole Thing and Box are still owned by Rust, so the Box's Drop impl will be run by Rust and see the 0. I want code like this to be considered wrong C++ code. Do we get to just say "don't do that", meaning don't move from a box that isn't immediately going to be destroyed? As I understand it, move is usually expected to leave the old object in some valid but unspecified state that is safe to destroy. How does that mesh here? It's safe to destroy in C++ but not Rust. Are we forced to do something more expensive in move, or cut move altogether?

@jsgf
Copy link

jsgf commented Jan 9, 2020

Is the issue that the C++ box<T> is actually equivalent to a Rust Box<Option<T>>, but we don't want to scatter that type throughout the ffi layer?

@bruxisma
Copy link
Author

bruxisma commented Jan 9, 2020

Thanks so much for putting this together! This is super helpful. I will take care of incorporating your suggestions over the next couple days.

Happy to help! The API could be improved further, but this was just me spending a bit of time on the interface :)

One question for you on box (box&& that) -- In Rust, moves are destructive so there isn't a destructor that runs on the "old" object, and Box is non-nullable. In C++ I want cxxbridge::rust::box to be move constructible because otherwise it would be pretty horrible (I think; right?) but we only get non-destructive moves. As you saw, in the current implementation a move just sticks 0 in repr. This is fine as long as Rust never sees that 0, i.e. the moved-from box is about to be destroyed and ~box checks for 0 and avoids calling Rust's Drop in that case.

Now suppose that someone had:

#[cxx::bridge]
mod ffi {
    struct Thing {
        b: Box<X>,
    }

    extern "C" {
        type X;
        fn f(thing: &mut Thing);
    }
}
void f(Thing &thing) {
    box<X> moveconstruct(std::move(thing.b));
}

This is UB because we move from the box but the whole Thing and Box are still owned by Rust, so the Box's Drop impl will be run by Rust and see the 0. I want code like this to be considered wrong C++ code. Do we get to just say "don't do that", meaning don't move from a box that isn't immediately going to be destroyed?

If this is the thing you're trying to avoid, there are options available. For instance as long as don't ever actually move the value onto the C++ side with the intention that it will be destroyed there, some kind of an "adopting" pointer might be useful (though my proposed type P0468 is not slated until C++23, it is easily implementable and might be useful here. However, it relies on reference counting and would be better served for helping to implement a C++ equivalent to Rc<T>, which retain_ptr can be used to implement but not represent). Moving from using uintptr_t to a unique_ptr with a custom deleter (where the pointer member alias is uintptr_t, see cppreference for more info) might be the better option. Unfortunately, uintptr_t does not meet the needs of NullablePointer, though in practice the requirements for the interface for unique_ptr is relaxed, and thus it would be doable, but not guaranteed for all implementations. If we could guarantee that a Box on the rust side will never exist solely on the C++ side (i.e., C++ won't ever actually call the underlying destructor), then we can make a shim wrapper that allows the type system to enforce all of this, while still allowing for moves on the C++ side.

Also keep in mind that clang now has a [[clang::trivially_relocatable]] attribute and this allows a user to treat a type as though its move constructor is destructible in some cases (we're starting to investigate the path of destructible move constructors in C++, but we're calling them relocations for reasons that are currently up in the air. I talked about them briefly at RustConf 2019 this past August.

As I understand it, move is usually expected to leave the old object in some valid but unspecified state that is safe to destroy. How does that mesh here? It's safe to destroy in C++ but not Rust. Are we forced to do something more expensive in move, or cut move altogether?

The nice thing about the wording is that while the standard requires that types be in an unspecified state, we the users can actually specify the exact state it is in. (Thus if we document what we do, people can't technically be surprised because it means they didn't read the documentation 😅)

Is the issue that the C++ box<T> is actually equivalent to a Rust Box<Option<T>>, but we don't want to scatter that type throughout the ffi layer?

Not so much. Having an unique_ptr<T> is closer to box<T> (though no copies because "what if you wanted a custom allocator per instance, etc. etc." is the difference). Both unique_ptr and optional do implement explicit operator bool and having optional be the inner type would be a bit redundant and add additional overhead. Effectively an empty unique_ptr would be like having Box<None>, but would require a user to check if it has anything "stored".

@dtolnay
Copy link
Owner

dtolnay commented Apr 11, 2020

I've incorporated or filed issues to track all of these suggestions. Thanks again! This has made the library much better.

The remaining work is tracked in #58 (swaps) and #102 (in_place constructions).

One issue that's come up where I may need expert C++ guidance is #104, deciding which constructors to make explicit.

@dtolnay dtolnay closed this as completed Apr 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants