#single_byte_optimizable? is a private helper method that returns true when the size and bytesize of a string are identical. It is similar to #ascii_only? but does not guarantee that the byte values are valid code points (ASCII characters use only 7 bits).
This method is commonly used to optimize algorithms that iterate over a string's contents and can be more efficient when they do not need to account for multibyte characters.
Such algorithms are not limited to stdlib applications and can be found in user code as well. So I think it would be helpful to expose this method in the public API to make it usable elsewhere. Otherwise, shard authors tend to use the less efficient #ascii_only? (unless #12020 gets implemented, but even then the semantics of #single_byte_optimizable? might be preferable).
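To illustrate the kind of user code this is about, here is a minimal sketch. Both `single_byte_optimizable?` (written here as a free function standing in for the private method) and `byte_offset` are hypothetical names made up for this example, not existing API:

```crystal
# Hypothetical user-level stand-in for the private String helper.
def single_byte_optimizable?(string : String) : Bool
  string.bytesize == string.size
end

# Hypothetical helper: byte offset at which the character at `char_index` starts.
def byte_offset(string : String, char_index : Int32) : Int32?
  return nil if char_index < 0 || char_index >= string.size

  # Fast path: every character is one byte, so both indices coincide.
  return char_index if single_byte_optimizable?(string)

  # Slow path: walk the characters and add up their encoded widths.
  offset = 0
  string.each_char_with_index do |char, index|
    return offset if index == char_index
    offset += char.bytesize
  end
  nil
end

p byte_offset("hello", 3) # => 3
p byte_offset("héllo", 3) # => 4 ("é" takes two bytes in UTF-8)
```

The fast path only relies on each character occupying one byte. It does not care whether those bytes are valid ASCII, which is why the semantics differ from `#ascii_only?`.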
Now might be a good time to consider embedding this information in the string itself. I forgot ascii_only? is not just checking for @bytesize == size 😞
That said, what prevents users from checking string.bytesize == string.size? Nothing. I don't think we should expose single_byte_optimizable? (there's no such thing in Ruby either)
I feel like exposing this explicitly would also further proliferate the dual use of String as byte array in addition to a character array. I still would highly prefer making Bytes the most convenient and efficient for those.
In my ideal world we would return to validating input to be valid UTF-8 on String creation; then ascii_only? could indeed just be size == bytesize. A path toward that could be tracking whether the string is valid UTF-8 with what we might call a dirty flag for now. Then there could be a happy path of return size == bytesize unless dirty.
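A rough sketch of that happy path, assuming the flag lives on a wrapper type. `TrackedString` and its `dirty?` getter are hypothetical and only for illustration; Crystal's String has no such flag today:

```crystal
# Hypothetical wrapper that remembers whether its contents are valid UTF-8.
class TrackedString
  getter value : String
  getter? dirty : Bool

  def initialize(@value : String)
    # Validate once, at creation time, and remember the result.
    @dirty = !@value.valid_encoding?
  end

  def ascii_only? : Bool
    # Happy path: in valid UTF-8, one byte per character implies pure ASCII.
    return @value.bytesize == @value.size unless dirty?

    # Dirty strings fall back to the existing byte-by-byte check.
    @value.ascii_only?
  end
end

p TrackedString.new("hello").ascii_only? # => true
p TrackedString.new("héllo").ascii_only? # => false
```

The validation cost is paid once at construction, so repeated `ascii_only?` calls on clean strings reduce to comparing two sizes.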