Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow string split by char with yield #2541

Closed
wants to merge 1 commit into from

Conversation

kostya
Copy link
Contributor

@kostya kostya commented May 2, 2016

sometimes this is needed for speed

@asterite
Copy link
Member

asterite commented May 2, 2016

Indeed, yielding slices will be much more efficient because no copying is needed. However, I would expect split with a block to yield strings, not slices.

Maybe we can have a slice_split (or another, better name) for this. Then slice can invoke slice_split and build strings out of it. What do you think?

@lbguilherme
Copy link
Contributor

A nicer alternative could be having a readonly StringView class that implements mostly the same api as String. Similar to returning a slice, but it would contain chars, not bytes. It would have to hold a pointer to the original buffer and a pair offsets. A problem though (with slices too) is that it can get unsafe if the string is changed. (It may call realloc to reduce the buffer size)

@kostya
Copy link
Contributor Author

kostya commented May 2, 2016

added split_slice,
@lbguilherme i think there is no problem, because strings immutable

@asterite
Copy link
Member

asterite commented May 2, 2016

Yes, something like a StringView would be ideal. Maybe Slice(UInt8) is already that type. We definitely have to think this a bit more.

@lbguilherme
Copy link
Contributor

I didn't notice before that String is immutable. Maybe String could be the StringView class. Currently a String stores a pointer to the data and a size (am I right here?), it could also store an offset in that buffer, so we could reuse the same buffer in multiple strings and the GC will take care of freeing it only when all of the strings are gone. The advantage is that there wouldn't be an extra class for this, would be a transparent optimization. What do you think @asterite?

@asterite
Copy link
Member

asterite commented May 2, 2016

@lbguilherme There are two issues:

  1. A string also stores a lazily computed UTF-8 size (and in the future I'd maybe like to store it's hash value so a hash with string keys is faster). All strings also have a trailing \0 char so that they can be just just-as-is to C functions. A view won't have this \0 char, nor a computed size.
  2. Keeping substrings of a bigger string by using pointers isn't always good. Sometimes you have a big text and only keep a small substring, and the bigger string won't be able to be freed because it's internal data is referenced by the program. Java had this problem.

@jhass
Copy link
Member

jhass commented May 29, 2016

I don't quite follow the public API usecases for split_slice, it feels like it's only useful for when you have binary/non-UTF8 data in a String, which shouldn't happen in the first place. What am I missing?

@kostya
Copy link
Contributor Author

kostya commented May 29, 2016

useful when you reuse parts of string, instead of create new, example: https://github.com/kostya/simple_idn/blob/master/src/simple_idn.cr#L231 (here i implement this method by myself)

@asterite
Copy link
Member

I'd eventually like to have String-like methods for Bytes, but adding these directly to String feels too low-level and different than other String methods, so for now I'll close this until we find a better way to do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants