Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider mutable representation for heap variant #12

Closed
matklad opened this issue Sep 20, 2021 · 8 comments
Closed

Consider mutable representation for heap variant #12

matklad opened this issue Sep 20, 2021 · 8 comments

Comments

@matklad
Copy link
Contributor

matklad commented Sep 20, 2021

Currently, the heap variant is Arc<str>. This has a couple drawbacks:

  • Arc wastes code/space for weak pointer support which we don't use
  • mutating the string requires a full copy of data
  • (need to re-check this) pointer in Arc<str> points to the start of the ArcInner, not to the start of str, so there's extra offset during deref.

An ideal representation would be to store *const str, which points:

                             here
                               V
| rc: AtomicUsize | cap: usize | str byte data | spare capacity |

This should allow us to just mutate the string just fine (using Arc::make_mut-like API).

Admittedly, this increases the complexity of the implementation quit a bit.

@novacrazy
Copy link

How does one even allocate a struct like this?

#[repr(C)]
struct Inner {
    rc: AtomicUsize,
    cap: usize,
    data: str,
}

as I get to the point where alloc::alloc(layout) as *mut Inner but it's rejected due to casting a thin-pointer to a fat-pointer. And in fact, even how I define the layout is dubious. Three usize's and cap bytes after? Can't determine it directly due to the entire struct being unsized now.

Otherwise, I have a little prototype for a version that doesn't point directly to a str in the middle of the struct: https://gist.github.com/novacrazy/6541c9000b12bed7f55edcbc0b4eea1b

Which does generate offset instructions on deref, unfortunately.

@matklad
Copy link
Contributor Author

matklad commented Sep 22, 2021

How does one even allocate a struct like this?

In today's Rust, via toil, pain, and suffering.

One way is to make it just struct Inner { rc: AtomicUsize, cap: usize } and just maintain the invariant that whatever follows the strut in memory is str.

If you want to keep it a proper DST you might want to look at https://github.com/rust-analyzer/rowan/blob/master/src/arc.rs which does a similar thing (the code originated in triomphe).

@CAD97 also authored https://github.com/CAD97/pointer-utils which is a bunch of proper safe abstractions around these ideas.

@NobodyXu
Copy link
Contributor

NobodyXu commented Sep 23, 2021

@novacrazy You can take a look at triomphe, which povides a Arc and ThinArc (can store DST with the same size of Arc) that can store ad create DST easily and safely.

@ParkMyCar
Copy link
Owner

I think this is a great idea, I'm wondering if we can store the ref count on the stack though, ideally make cloning even faster, e.g.

pub struct ArcStr {
    ref_count: AtomicUsize,
    pointer: *const u8,
    len: usize,
}

@CAD97
Copy link
Contributor

CAD97 commented Sep 23, 2021

if we can store the ref count on the stack though

Uh, no. The whole point of the ref count is that it's shared, as it tracks how many owners exist to the heap memory, and all owners need to see consistent state. The ref count needs to be shared, thus on the heap.

@ParkMyCar
Copy link
Owner

Uh, no. The whole point of the ref count is that it's shared, as it tracks how many owners exist to the heap memory, and all owners need to see consistent state. The ref count needs to be shared, thus on the heap.

Lol great point, I forgot about that

@ParkMyCar
Copy link
Owner

Closing this out because a CompactStr is now mutable! We no longer use atomic reference counting, although the mutable ArcString that was created does still exist in the code base in repr::arc.

I switched from reference counted strings to a normal heap/box string after reading about C++'s woes in making reference counted strings, this series of blog posts goes into detail:

  1. Reference counting Part I
  2. Reference counting Part II
  3. Reference counting Part III

I'm not opposed to introducing a mutable reference counted version of CompactStr, maybe CompactArcStr, in the future. Just requires a bit of work

nottirb pushed a commit to nottirb/compact_str that referenced this issue May 15, 2022
@matklad
Copy link
Contributor Author

matklad commented Jun 18, 2022

I switched from reference counted strings to a normal heap/box string after reading about C++'s woes in making reference counted strings, this series of blog posts goes into detail:

Not my fastest reply ever but … it seems that issues raised in those posts are C++ specific and are not applicable to Rust? I am 0.95 sure that Rust’s COW string would work just fine without any locks. The Arc::make_mut is basically all we need here.

The specific thing which would be different between C++ and Rust is that Rust wouldn’t need the bMarkUnshareable runtime flag —- in rust, this info would be tracked by the compiler, that’s basically &mut.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants