Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share immutable chunks externally #47

Closed
calmofthestorm opened this issue Dec 3, 2021 · 8 comments
Closed

Share immutable chunks externally #47

calmofthestorm opened this issue Dec 3, 2021 · 8 comments

Comments

@calmofthestorm
Copy link

Is it possible to get a copy-on-write reference to a chunk? Or alternatively, a rope that guarantees it only ever consists of a single chunk?

The context here is a struct which needs a memory contiguous string for some operations, but to produce a rope containing that string for other operations. Currently, I am storing both a Rope and a String that have the same content, but this isn't ideal.

@cessen
Copy link
Owner

cessen commented Dec 5, 2021

The context here is a struct which needs a memory contiguous string for some operations, but to produce a rope containing that string for other operations.

I'm not 100% sure I understand your use case here. Do you mean you want something that's (essentially) actually a String but that you can pass to functions as if it were a Rope (presumably just for read-only operations)?

@calmofthestorm
Copy link
Author

That's essentially it. As a (simplified) analogy, if I consider Rope a list of Arc<String>, I'd like to be able to write code that shares the Arc<String> with the Rope. This means I can pass &str to a function which requires contiguous storage, but can also append that same string to a Rope, without needing double the memory.

At present, I either double store or have a function that checks if myrope.chunks().count() is >1, in which case it calls to_string, but if it is == 1, I can instead just pass the single chunk to a function that takes &str. This works well in practice because Rope tends to keep them in a single chunk in this size range.

Having looked through the design more, however, I think I can understand why this might not work that well. AIUI, if this were implemented, the Rope might need to change the node in subsequent operations, causing a copy-on-clone, so now it and my code each have their own copy, with an end result of double storage anyway in many cases. Workarounds are presumably possible, but would introduce other complexity. Does that sound right?

@cessen
Copy link
Owner

cessen commented Dec 5, 2021

Ah, I see. So you don't want a String that can behave as a Rope, rather you want a shared reference into a contiguous part of a Rope but that doesn't prevent the Rope itself from being mutated. Is that right?

(Sorry for the back-and-forth. I just want to make sure I understand what you're aiming for before I try to suggest things, ha ha.)

@calmofthestorm
Copy link
Author

Yes, that is correct.

@cessen
Copy link
Owner

cessen commented Dec 6, 2021

Got it. In that case, I think what you're really looking for is some kind of Rope clone. The trivial approach would be to just clone the whole Rope, but then you'll get a lot of duplicated chunks anyway as the original Rope gets modified. Ideally what you want is a way to tell Ropey "make a Rope clone out of just this chunk". At the moment there isn't a way to do this, but implementing it would be pretty straightforward.

Having said that, I'd rather not grow Ropey's API surface any further if I can reasonably avoid it, particularly for something that (at least at first glance) seems kind of niche. So what kind of memory savings are we talking about here? Are you making tens of thousands of these (currently cloned) strings?

Edit: and how large are these strings typically? Are they usually about as large as a chunk (on the order of 500-1000 bytes)?

@calmofthestorm
Copy link
Author

Looking at the distribution of sizes, there are high 1,000s - low 10,000s of strings in question, but essentially all are under 500 bytes. As such, it seems easy enough for me to just write a wrapper around the rope that checks the chunk count and saves just the rope if it is 1, or both the rope and a string if it is more. This is essentially a type that does the same thing as several of my existing processing functions that special case the 1-chunk rope. Duplicating the occasional odd long string seems fine.

@cessen
Copy link
Owner

cessen commented Dec 10, 2021

Yeah, that sounds like a reasonable approach. Sorry I couldn't be of any more help. If you run into anything else, please don't hesitate to file another issue!

@calmofthestorm
Copy link
Author

It seems to work so far, so I'm making progress and I'm happy. Thanks for taking the time to understand my use case thoroughly and help identify the best way forward. Also, FWIW, I appreciate avoiding scope creep in APIs, and doing that means making hard calls on what to include.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants