Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider using extension traits on Vec<u8>/&[u8] #5

Closed
BurntSushi opened this issue May 3, 2019 · 5 comments · Fixed by #8
Closed

consider using extension traits on Vec<u8>/&[u8] #5

BurntSushi opened this issue May 3, 2019 · 5 comments · Fixed by #8

Comments

@BurntSushi
Copy link
Owner

BurntSushi commented May 3, 2019

Currently, in 0.1 of bstr, the primary way to use and manipulate byte strings is with the BString (owned, growable) and BStr (borrowed slice) types. However, a major alternative design to using explicit types is to define extension traits that add more methods to the Vec<u8> and &[u8] types.

Reflecting back, I don't think I quite gave the extension trait path a thorough review before going with the explicit BString/BStr types. In particular, I perceive few key advantages to using explicit types:

  1. Having distinct types provides some "semantic" meaning that the bytes should be treated as a string rather than just an arbitrary collection of bytes.
  2. Have a convenient Debug representation that prioritizes the "stringness" of BString/BStr over the "slice of u8" representation shown for Vec<u8>/&[u8]. For example, "abc" instead of [97, 98, 99].
  3. As a riff on (2), there may be other traits that one wants to implement "specially" for byte strings as opposed to "slice of u8." Serde comes to mind here.

If (1) were the only benefit, I think I could be persuaded to drop that line of reasoning, although it does appeal to me aesthetically. However, in my view, (2) is a fairly significant benefit, and it's one of the most important ergonomic improvements that I look forward to whenever I bring bstr in to one of my crates. Otherwise, I fairly consistently define my own helper functions for printing out byte strings when I don't have bstr, and it's honestly a pain. Especially when Vec<u8>/&[u8] are part of some other type.

With that said, in the course of actually using bstr in crates, I've come to the belief that using extension traits would make using string oriented APIs much more seamless and more ergonomic overall, with the notable exception of the aforementioned problems with the debug representation. In particular, using BString/BStr often requires annoying conversion routines between Vec<u8>/&[u8]. e.g., Most of the raw I/O APIs in std want a &[u8], so you wind up needing to write my_byte_string.as_bytes() quite a bit, which is annoying.

Moreover, using BString/BStr really motivates one to use them in as many places as possible, because of aforementioned annoying conversions. But this isn't always desirable, because you might want to expose APIs in terms of &[u8] for various reasons, including, but not limited to, not adding a public dependency on bstr. If we were using extension traits instead, then you could just import the traits and start using the new APIs immediately.

One possible alternative to this would to implement Deref and DerefMut for BString/BStr, which would eliminate the various as_bytes() calls, but you still need to explicitly construct byte strings. Moreover, this kind of feels like an abuse of Deref.

Another benefit of extension traits is that the API surface area of bstr could be quite a bit smaller, since many of the methods on BString/BStr are just re-exports of methods by the same name on Vec<u8>/&[u8].

Overall, my sense is that this crate would be better if it used extension traits. To mitigate (but not completely solve) my Debug problem, we could keep the BString/BStr types, but remove all of their inherent methods, make them implement Deref and add an appropriate Debug impl. You still have to explicit convert from Vec<u8>/&[u8], which is a little annoying, but I expect their use would be more limited and the Deref impl would make them overall more convenient to use.

Obviously, this is a fairly large breaking change to the current API, but given the only consumer (that I know of) is me, I think it's okay to do this. The library is called an experiment after all, and if we're going to make this change, then now would be the time to do it.

Pinging @joshtriplett and @eddyb, who I believe might have thoughts on this. (Please ping others that you think might have an opinion on this.)

@eddyb
Copy link

eddyb commented May 3, 2019

I'd say [u8] should be the main type to talk about here, since Vec<u8> derefs to it.
(But the &[u8] syntax may not have been meant literally, as BStr is a newtype around [u8], not around a reference)

And, yeah, I'd be fine with BStr that derefs to [u8], since I'd prefer having most of the functionality in libcore, and just get e.g. the debug formatting from this crate.

@joshtriplett
Copy link
Contributor

joshtriplett commented May 3, 2019 via email

@BurntSushi
Copy link
Owner Author

BurntSushi commented Jun 19, 2019

One little hiccup that I've run into is that a lot of the current BStr APIs accept anything that implements AsRef<[u8]>. So that means you can use somebstr.find("foo") or somebstr.find(b"foo"), which is nice for ergonomics. I can continue to do this for the extension trait APIs of course, but with an extension trait, there are several methods already defined on slices that we wouldn't want to redefine. For example, there are already starts_with and ends_with methods defined on &[u8], but their only valid parameter is &[u8] which means somebstr.starts_with("foo") won't work. I see a few choices here, all of which kind of suck honestly:

  1. Don't define methods like starts_with that already exist and keep the AsRef<[u8]> for APIs we do define. We live with the inconsistency. (Users will invariably ask, "why do some methods accept a &str but others don't?!")
  2. When there are conflicting methods, like starts_with, provide "str" focused variants, e.g., starts_with_str and ends_with_str.
  3. When there are conflicting methods, define new methods with the same name and require users to disambiguate methods via UFCS. (Yuck. Only listed for completeness.)
  4. Get rid of the AsRef<[u8]> convenience on all APIs and make everything accept a concrete &[u8] in order to be consistent. This is kind of a bummer, since then you can't just write, somebstr.find("☃☃☃"), but instead must write somebstr.find("☃☃☃".as_bytes()).

Any thoughts on what the preferred route ought to be?

@lespea
Copy link

lespea commented Jun 19, 2019

I like the second option personally

@BurntSushi
Copy link
Owner Author

@lespea Could you elaborate a bit more? :)

BurntSushi added a commit that referenced this issue Jun 25, 2019
This commit effectively rewrites the entire API of this crate to use
extension traits on `[u8]` and `Vec<u8>`. While the `BStr` and `BString`
types are still present, they are now just dumb wrappers that deref to
`[u8]` and `Vec<u8>`, respectively. Their primary purpose is for
convenient debug impls.

The motivation for this design is laid out in #5.

Closes #5
BurntSushi added a commit that referenced this issue Jun 25, 2019
This commit effectively rewrites the entire API of this crate to use
extension traits on `[u8]` and `Vec<u8>`. While the `BStr` and `BString`
types are still present, they are now just dumb wrappers that deref to
`[u8]` and `Vec<u8>`, respectively. Their primary purpose is for
convenient debug impls.

The motivation for this design is laid out in #5.

Closes #5
BurntSushi added a commit that referenced this issue Jun 25, 2019
This commit effectively rewrites the entire API of this crate to use
extension traits on `[u8]` and `Vec<u8>`. While the `BStr` and `BString`
types are still present, they are now just dumb wrappers that deref to
`[u8]` and `Vec<u8>`, respectively. Their primary purpose is for
convenient debug impls.

The motivation for this design is laid out in #5.

Closes #5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants