Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add
StringScanner#read_char
and#read_byte
#11785base: master
Are you sure you want to change the base?
Add
StringScanner#read_char
and#read_byte
#11785Changes from 1 commit
d276f5f
96a8fd3
45d0506
ac6bde4
1204472
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this representative of the character offset, and not
byte_offset
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but it raises error when calling
#read_byte
to a multibyte character then calling#offset
in the current implementation.I concern this behavior is expected or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think this change would be a breaking change so in theory we can'd do this.
However, I consider the existing definition of
offset
to be incorrect.offset
should actually return the byte offset because that's more useful, and it's the only correct thing we can return if one can advance byte per byte. So we can consider this change a bugfix instead of a breaking change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useful to who? Isn't it pretty useful to use the same index values when parsing
String
s as theString
s themselves use when indexing withString#[]
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the suggested change is inconsistent with
offset=
, so if this change is wanted then that also needs to be updated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what situation is this something that is reasonable (rather than reading a full utf8 character)? The only situations I can think of is where the data isn't actually a valid string, and I'd argue that if that is the case a solution working directly on
Slice
would be more appropriate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think that if we go with this, we should return
UInt8?
, notString?
But yes, it would be nice to know the actual use case for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to return
UInt8?
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit unintuitive that
read_char
returns aString?
and not aChar?
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to return
Char?
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/./
might actually miss a few characters. The safest way is to do it in Crystal with something likeString#char_bytesize_at
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
/./m
?/./
misses newline characters, so changing it multiline mode, it detects newline characters. It also changes behaviors of^
and$
, but it doesn't matter.