Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crate redesign #23

Open
Kixunil opened this issue Dec 15, 2020 · 9 comments · May be fixed by #24
Open

Crate redesign #23

Kixunil opened this issue Dec 15, 2020 · 9 comments · May be fixed by #24

Comments

@Kixunil
Copy link
Owner

Kixunil commented Dec 15, 2020

Rust language made a lot of progress since this crate was created and it'd be great if this crate caught up.

Specifically:

  • MaybeUninit<[u8]> is the right way of dealing with possibly uninitialized buffers
  • Vectored IO is supported by std
  • async is becoming very popular

std design and anything that copies it has several limitations that this crate could attempt to solve (by not trying hard to follow std design):

  • Read is safe and takes &mut [u8]. We already have unsafe version but it seems that having a safe Read with read taking some special, safe type is a better approach. It avoids many issues, including forgetting to specialize the implementation and having more unsafe than necessary.
  • Vectored IO is an afterthought and suffers from similar problems that read does. Even weirdly, some vectored impls (e.g. TLS) make programs slower, not faster. While it might be a better design if the impls were fixed, it could be fundamentally impossible. (Not sure.) The resulting design should somehow take this into account.
  • AsyncRead and AsyncWrite traits don't exist in std and there was confusion where they belong. Previously, the design even didn't use such traits which was quite a footgun. We can do better by encoding Pending in the return type, but this needs to be researched to work correctly with Interrupted.
  • The Read and Write traits don't have a way to self-describe their implementor which makes declaring errors that contain this information complicated.

Rough proposal:

// buffer containing IO slices
// Note that conversion to low-level representation is done by the implementor.
trait IOSliceBuf {
    fn push<'a>(&'a mut self, &'a mut OutSlice<u8>);
}

unsafe trait Buffer {
    fn get_mut(&mut self) -> &mut OutSlice<u8>;
    unsafe fn advance(&mut self, len: usize);
    fn fill_io_slice<'a>(&'a mut self, &'a mut dyn IoSliceBuf);

    // provided impls:
    fn try_push(&mut self, byte: u8) -> bool { /*...*/ }
    fn try_push_slice(&mut self, &[u8]) -> bool { /*...*/ }
    // push as many bytes as possible, return the remaining slice
    fn push_slice_min<'a>(&mut self, &'a [u8]) -> &'a [u8] { /*...*/ }
    fn zeroed(&mut self) -> &mut [u8];
}

// Separated so it can be shared between Read and Write
trait ObjectDescription {
    type Description;

    fn describe(&self) -> Self::Description;
}

unsafe trait Read: ObjectDescription {
    type Error: fmt::Display + fmt::Debug;

    // Implemented by the implementor
    // we don't return the number of bytes because it's reflected in the `Buffer`
    fn read_impl(&mut self, buffer: &mut dyn Buffer) -> Result<(), Self::Error>;

    fn read(&mut self, buffer: &mut Buffer) -> Result<(), Error<Self::Description, Self::Error>> { /* ... */ }

   // other helpers like read_exact, etc
}

// Any reader that returns this error can be converted to `Future`
enum AsyncError<E> {
    Pending,
    Other(E),
}

struct Error<D, E> {
    object: D,
    error: E
}

Of course, dyn is sketchy and ideally we don't want it.

@Kixunil
Copy link
Owner Author

Kixunil commented Mar 4, 2021

After discussion with @ColinFinck it seems like trying an approach that could be expected to be included in core one day is a good idea.

In general, things in std tends to be biased towards simplicity. I expect ObjectDescription to be considered too much, so let's throw it away for now.

Goals

  • The traits need to be abstract enough to connect various implementations of readers to their users
  • The traits need to allow efficient code
  • There shouldn't be footguns
  • If possible keep it simple
  • Needs to support modern idiomatic Rust
  • Abstracting over multiple different readers/writers should be reasonably easy

Let's discuss various design decisions one-by-one:

Should the Read trait be unsafe?

In general, there are not many unsafe traits in std/core and the general approach is towards minimal amount of unsafe code. Having unsafe trait Read puts burden on people who aren't attempting to do anything complicated. As such it makes more sense to make the buffer OutSlice<u8> with length tracking.

Should the buffer passed to read be be type or trait?

Since the read method will need to accept OutSlice<u8> it's required that the amount of bytes written is correctly reported. We can't just return this information as it's inherently coupled with the buffer. At best we could return &mut [u8]. But that's messy.

So we need the reader to update the position of initialized data. There are several options to do this:

  • struct Buffer { slice: &mut MaybeUninit<[u8]>, pos: usize }
  • trait Buffer { ... }, fn read(&mut self, buffer: impl Buffer>) - not object-safe
  • trait Buffer { ... }, fn read(&mut self, buffer: &mut dyn Buffer>) - this may invoke dynamic dispatch for every byte written 🤮
  • trait Buffer { ... }, trait Read<B: Buffer> fn read(&mut self, buffer: &mut B>) - annoying to write, looks complicated, annoying dynamic dispatch
  • struct Buffer { ... } but allow initializing with Vec etc which gets synced on drop. This looks quite interesting, so it will be researched.

What should the error type be?

io::Error was criticized by me already and some people seem to agree with the criticism. It attempts to be both low level and high level, failing at both of them. Even if it was somehow incorporated into core, the are only few ways I could see it working, all of them terrible:

  • Allow core to store ErrorKind only - no way to store custom information
  • Require allocator and do hacks with pointers - not general for all no_std applications anymore
  • Give core some reserved number of bytes to store arbitrary information - problematic because big errors are supposed to be avoided

As a result, I don't really see a good way to avoid error being generic. I believe being generic is the right thing to do. Similar design was selected for Future, where object-safety is important, yet it wasn't a concern.
A crate that wants to abstract over several readers/writers with error types can map errors. The traits can provide a combinators to do that.

How to handle async?

The readiness-based async approach used in Rust leads to what seems to be an obvious solution: have an error type indicate WouldBlock. This was already used in the ecosystem in the form of io::Error and there doesn't seem to be any issue with it.

We have two possible approaches here:

  • require that Error: IsWouldBlock
  • require that Error = AsyncError

Requiring specific error type is somewhat simpler but may lead to annoying bounds (<E> where T: Read<Error=AsyncError<E>>) and inflated type size by discriminant in the case of io::Error.
IsWouldBlock (or simply AsyncError?) is annoying to impl for each error type.

Using trait and providing enum type may be best of both worlds but adds more code. :(

Thus I'm not certain which it should be. However a good thing about this is it can be added later.

How should vectored API look like?

std::io added a special method for vectored IO because of compatibility issues. It's also tied to the OS and as such it can never go to core. After it was implemented, new problems appeared - there were actually many cases where it led to worse performance.

My understanding of the issue

When a user of IO trait tried to call vectored IO, it first filled IoSlice with the desired slice and then passed it to the vectored method. The default implementation just called the appropriate method on each item in the slice. As a result there was code that was just copying things into and from slice without doing anything more interesting.

What do do?

The question is how to do it better? This is not easy. It may be tempting to put this on hold. However, I'm afraid that this may significantly influence the rest of the design. So I prefer to look at it first.

One interesting idea that comes to mind is this:

fn prepare_vectored_writer<S: Storage, F: FnOnce -> S>(&mut self,  alloc_storage: S) -> VectoredWriter<Self, S>;

VectoredWriter would be similar to ArrayVec<IoSlice> or BufWriter, where slices are stored instead of copying the bytes. S is storage where slices would actually reside. It'd most likely be an array but could be Box or Vec. Non-vectored writer would simply not call alloc_storage and internally set the storage to be Option<IoSlice>. This would cause the users to pass slices one-by-one.

This is almost good but has one issue: it could waste stack space if vectored IO is not being used as VectoredWriter needs to have enum internally. This has to be researched further. I originally had an idea to have two traits but I'm just thinking that maybe having an associated type that is one of two options would be simpler. This sadly leads to difficulties around abstracting over multiple writers, but I think it can be solved somehow.

Sadly, this whole thing is getting complicated, which may be a dealbreaker for inclusion into core but perhaps it's better to get started somewhere.

@ColinFinck
Copy link

Thanks for raising these questions. The goals sound fine and ObjectDescription should indeed be left out for now.
Find my answers inline.

Should the Read trait be unsafe?

Absolutely not! That would lead to many use cases employing unsafe code for no reason. But I see that we agree on that.

Should the buffer passed to read be be type or trait?

I would need to see the full implementation you have in mind before judging on your proposal here. OutSlice doesn't seem to be defined anywhere and it's not part of the Rust core either. Hence, that would also need to become part of core in your proposal.
Does it correspond to MaybeUninitSlice from https://github.com/Kixunil/possibly_uninit?

What should the error type be?
I believe being generic is the right thing to do.

I agree that std::io::Error is not a good example, and nothing that could become part of core.
However, assuming that we also move read, read_exact, read_to_end to "core::io::Read" and flush, write, write_all to "core::io::Write", we need a standardized error enum containing at least their error variants.
ErrorKind is such an enum in std that could also work in core. Is your plan to move that as well? And how exactly would you go about returning ErrorKind error variants and still supporting a generic error type?

How to handle async?

I agree that this topic is up for a later discussion. In particular, it shouldn't be discussed before we have an actual use case. The crates I have in mind, which would benefit from "core::io::Read" and "core::io::Write", all don't need any async support.

A minimal "core::io::Read" and "core::io::Write" should only ensure that its design doesn't prevent async support from being added later.

How should vectored API look like?

I share the same opinion as for async support on this matter: Without any immediate use case, it's not the right time to discuss vectored I/O now.
It only came to Rust std much later than standard I/O, hence it should be fine and way easier to agree on a minimum "core::io::Read" and "core::io::Write" without vectored I/O support.

Again, we should only make sure that vectored I/O can still be added later.
I see no blockers in the current proposal, neither for std::io's vectored I/O APIs, nor for your prepare_vectored_writer proposal.

@Kixunil
Copy link
Owner Author

Kixunil commented Mar 7, 2021

Does it correspond to MaybeUninitSlice from https://github.com/Kixunil/possibly_uninit?

Yes, it was supposed to be renamed but I didn't have the time to do it yet. out reference is the correct technical term which I didn't know at the time of writing of possibly_uninit.

Note that I was imprecise in that part and what actually needs to be passed is a buffer with OutSlice and position. I will try to write it soonish.

we need a standardized error enum containing at least their error variants.

I intend for them to just have the internal errors wrapped in special error types but open to other possibilities if it turns out to be problematic.

ErrorKind is such an enum in std that could also work in core. Is your plan to move that as well?

It was not originally planned but since you brought it up I will consider it (but most likely it will be there). While I'm not a huge fan of it I guess I can see its usefulness and reasons why people like it.

And how exactly would you go about returning ErrorKind error variants and still supporting a generic error type?

The most obvious thing to do is to define IoError trait which requires fn kind(&self) -> ErrorKind and then require IoError on error type. Another possibility is to return struct IoError { kind: ErrorKind, error: E, }` but strongly I suspect that'd be highly annoying to use in practice.

before we have an actual use case.

I had a hobby-maybe-serious-later program that would need that but I didn't have time to finish it. Anyway I feel like the current approach in std-based crates (AsyncRead, AsyncWrite) is either right or close to right.

should only ensure that its design doesn't prevent async support from being added later.

Agree, I don't see how what I have in mind would prevent good async support.

I share the same opinion as for async support on this matter: Without any immediate use case, it's not the right time to discuss vectored I/O now.

Here I'm not that confident that we can skip it without introducing similar problems std experiences. I don't mind having only minimal support for it but I strongly believe it should be considered in the design.

My idea is that if you write a crate that uses genio you can use it equally well on embedded or in OS-based program. This includes support for vectored IO.

@HeroicKatora
Copy link

How to handle async?

The strategy can be: don't. What the caller more precisely wants to do is to construct a Future. There are several reasons why this is a bad fit for the Read/Write trait itself:

  • It's only feasilble to write with higher-ranked-trait bounds, or allocation—not necessarily global heap allocation but an allocation parameter nevertheless.
  • The mechanism for registering a source with the task::Context is entirely orthogonal to the mechanism of reading. It's only necessary that the reading itself doesn't block which is part of the construction of the value, and following HAL's design shouldn't be part of the trait. How the source is registered to source events that make it awaitable is a concern for a separate set of libraries such as mio, and the 'async fn read' is little more than waiting on such an event and then calling the original Read::read anyways.

Other than that some feedback on the library: Enabling byteorder changes the trait Read/Write traits from being object-safe to not being object-safe which is very undesirable with regards to code bloat. This is probably not intended.

The std::io::Error object is boxed by design to make it small and fast, i.e. it is only pointer sized. An opaque error design might be able to hide this detail depending on whether alloc/std is available.

@Kixunil
Copy link
Owner Author

Kixunil commented Mar 9, 2021

Thanks for feedback!

It's only feasilble to write with higher-ranked-trait bounds, or allocation

I'm thinking of something like this:

struct ReadFuture<'a, R: Read> {
    reader: R,
    buffer: &'a mut OutSlice<u8>,
}

// without Pin for simplicity
impl<'a, R: Read> Future for ReadFuture<'a, R> {
    type Output = Result<&'a mut [u8], R::Error>;

    fn poll(&mut self) -> Poll<Self::Output> {
        // API envisioned above
        // it's a struct with position: usize and the slice
        let mut buffer = self.buffer.as_buffer();
        match self.reader.read(&mut buffer) {
            // into_initialized returns same lifetime as the slice
            Ok(_) => Ok(Poll::Ready(buffer.into_initialized())),
            Err(AsyncError::WouldBlock) => Ok(Poll::WouldBlock),
            Err(AsyncError::Other(error)) => Err(error),
        }
    }
}

Do you see any problem with it?

The mechanism for registering a source with the task::Context is entirely orthogonal to the mechanism of reading.

Correct, I don't propose to do any registration. Am I missing something?

it is only pointer sized

No, it isn't

An opaque error design might be able to hide this detail depending on whether alloc/std is available.

How? If allocation is impossible, then io::Error can not carry arbitrary types which it currently can. This would lead to very sad error handling.

@HeroicKatora
Copy link

Do you see any problem with it?

This is how you can write a specific Future type, and indeed this is possible, but it can not be a trait method. Also note that returning Poll::Pending [sic] without having been registered into a Context, as it is done here, is wrong and might not make any more progress. See here.

When a function returns Pending, the function must also ensure that the current task is scheduled to be awoken when progress can be made.

This integration is inherently impossible to do correctly without an outer event loop (read: OS, or interrupt controller) and also completely unnecessary for the basic read/write trait.

@roblabla
Copy link

roblabla commented Mar 9, 2021

Regarding async, I don't see why it should be integrated into a core::io::Read trait. It almost certainly needs to be a separate trait, as not every IO source can be read asynchronously, along with the other problems @HeroicKatora mentions. Besides, AsyncRead and friends aren't even in std yet, they live in the various async frameworks, or in the futures crate. Any no_std improvement to AsyncRead should probably happen there IMO, instead of duplicating efforts.

@HeroicKatora
Copy link

There is a conflict of interest here, which also becomes apparent in the custom wrapper for MaybeUninit. Either the crate offers std-like trait in a no_std environment, with good compatibility. Or it innovates with a different design for no_std. The former options should imply that one patiently waits on libs-team decision with regards to the uninit and async interface and it also seems to fare better with regards to the compatibility. That's also been a major motivation for not-io, to stay firmly within a subset of std::io. Since all other crates I had looked at tried to de-facto stabilize or deviate from the established traits for no quantifiable reason, which conflicts with the compatibility goal once such an addition and stabilization happens within std.

Case in point; the initializer approach to uninitialized reading is being replaced by another accepted RFC and it feels , for the lack of a better word, unsettling that the proposed interface here—and in particular the lack of ReadBuf–has not been informed by the RFC. (In particular, trying instead a pseudo-out parameters). If that has been due to a lack of information, then maybe reconsider the interface now that you have it.

Kixunil added a commit that referenced this issue Jan 26, 2022
This refactors lot of stuff based on knowledge learned over past years
and lot of thinking. Apart from obvious details like using `Infallible`
or the `alloc` crate, this introduces the concept of maybe uninitialized
buffers. Their actual initializedness is tracked as typestate or
dynamically as needed. This should enable efficient bridging with `std`
among other things.

The code contains quite a bit of `unsafe` that needs better auditing and
ther's still shitload of work to do but looks like this direction is
good.

Closes #23
Kixunil added a commit that referenced this issue Jan 26, 2022
This refactors lot of stuff based on knowledge learned over past years
and lot of thinking. Apart from obvious details like using `Infallible`
or the `alloc` crate, this introduces the concept of maybe uninitialized
buffers. Their actual initializedness is tracked as typestate or
dynamically as needed. This should enable efficient bridging with `std`
among other things.

The code contains quite a bit of `unsafe` that needs better auditing and
ther's still shitload of work to do but looks like this direction is
good.

Closes #23
@Kixunil Kixunil linked a pull request Jan 26, 2022 that will close this issue
@Kixunil
Copy link
Owner Author

Kixunil commented Jan 26, 2022

Oh, WTF, I thought I replied to this. Anyway, I finally found some time to write something and made a not-entirely-finished PR to show the changes.

Agree to initially skip async. It can be just a marker trait that also requires the error type to be identifiable as WouldBlock or Other.

Some other things from @HeroicKatora I'd like to address:

the initializer approach to uninitialized reading is being replaced by another accepted RFC and it feels , for the lack of a better word, unsettling that the proposed interface here—and in particular the lack of ReadBuf–has not been informed by the RFC

Actually, what I proposed above is very similar to that RFC! The only difference was lack of additional field. I dislike the fact that people who may want to use genio would have to pay performance penalty of handling additional field even if they don't use the field. So I figured out to parametrise it - what the PR implements. Buffer<'_ init::Dynamic> is pretty much the exact same thing as the ReadBuf type in that RFC. I just had to use a wrapper instead of reference to allow fast and sound casting of types. Otherwise it's the same. I will support casting those types as much as possible.

In particular, trying instead a pseudo-out parameters

I don't see any problem with these. It's a cleaner design than having unsafe method returning &mut [MaybeUninit<u8>] because it tracks the safety requirements in the type.

If that has been due to a lack of information, then maybe reconsider the interface now that you have it.

Well, I did reconsider the third field and decided to make it generic. :)

TODOs before release:

  • deep review of unsafe with a fresh head
  • more tests
  • better doc explaining in-depth how to work with BufInit
  • cookbook for various common scenarios
  • combinator to convert BufInit similarly to map_read_err
  • map_write_err
  • Clean up - a module name called util is screaming "low-quality crap"
  • Possibly write a migration guide

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants