Add `StringScanner#read_char` and `#read_byte` #11785
base: master
Changes from 4 commits
@@ -38,6 +38,8 @@
 # * `#scan_until`
 # * `#skip`
 # * `#skip_until`
+# * `#read_byte`
+# * `#read_char`
 #
 # Methods that look ahead:
 # * `#peek`
@@ -68,12 +70,12 @@ class StringScanner
   # Sets the *position* of the scan offset.
   def offset=(position : Int)
     raise IndexError.new unless position >= 0
-    @byte_offset = @str.char_index_to_byte_index(position) || @str.bytesize
+    @byte_offset = Math.min(position, @str.bytesize)
   end

   # Returns the current position of the scan offset.
   def offset : Int32
-    @str.byte_index_to_char_index(@byte_offset).not_nil!
+    @byte_offset
   end

   # Tries to match with *pattern* at the current position. If there's a match,
@@ -280,6 +282,37 @@ class StringScanner
     @str.byte_slice(@byte_offset, @str.bytesize - @byte_offset)
   end

+  # Returns one byte from the current offset.
+  # ```
+  # require "string_scanner"
+  #
+  # s = StringScanner.new("あ")
+  # s.read_byte # => 227
+  # s.read_byte # => 129
+  # s.read_byte # => 130
+  # ```
+  def read_byte : UInt8?
+    return nil if eos?
+    s = @str.byte_at(@byte_offset)
+    @byte_offset += 1
+    s

Avoid using one-letter variable names, please.

OK, I corrected it.

+  end
+
+  # Returns one char from the current offset.
+  # ```
+  # require "string_scanner"
+  #
+  # s = StringScanner.new("ab")
+  # s.read_char # => 'a'
+  # s.read_char # => 'b'
+  # ```
+  def read_char : Char?
+    reader = Char::Reader.new(@str, @byte_offset)
+    c = reader.current_char
+    @byte_offset += c.bytesize
+    c

Ditto.

+  end
+
   # Writes a representation of the scanner.
   #
   # Includes the current position of the offset, the total size of the string,

Isn't this representative of the character offset, and not `byte_offset`?

I think so, but in the current implementation it raises an error when `#read_byte` is called on a multibyte character and `#offset` is called afterwards. I'm not sure whether this behavior is expected.

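The failure described above can be illustrated outside the scanner. This is a minimal sketch, not code from the PR: it assumes only the `String` methods already used in the diff, and the variable names are made up for illustration. A byte index that lands in the middle of a multibyte character has no matching character index, which is presumably why the `.not_nil!` in the old `offset` raises.

```crystal
str = "あ" # one character, encoded as three bytes in UTF-8

str.size     # => 1
str.bytesize # => 3

# Byte indices on a character boundary map to a character index.
start_index = str.byte_index_to_char_index(0) # => 0
end_index = str.byte_index_to_char_index(3)   # => 1

# Byte index 1 is where the scanner sits after a single read_byte.
# It falls inside the character, so the lookup is expected to find nothing.
inside_index = str.byte_index_to_char_index(1)
p inside_index
```

If `inside_index` is `nil` as expected, calling `.not_nil!` on it raises, which matches the error reported in this thread.
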
Right, I think this would be a breaking change, so in theory we can't do this.

However, I consider the existing definition of `offset` to be incorrect. `offset` should actually return the byte offset because that's more useful, and it's the only correct thing we can return if one can advance byte by byte. So we can consider this change a bugfix instead of a breaking change.

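As a rough sketch of the argument above (again not code from the PR, with a hypothetical local `byte_offset` standing in for the scanner's instance variable), a byte offset stays well defined whether the position advances by a whole character or by a single byte:

```crystal
str = "あb"
byte_offset = 0

# Advance by one whole character, the way read_char in the diff does.
char = Char::Reader.new(str, byte_offset).current_char
byte_offset += char.bytesize
after_char = byte_offset # => 3

# Advance by a single byte, the way read_byte in the diff does.
byte = str.byte_at(byte_offset)
byte_offset += 1
after_byte = byte_offset # => 4
```

Both steps leave `byte_offset` at a value that can be reported directly, with no conversion that might fail.
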
Useful to whom? Isn't it pretty useful to use the same index values when parsing `String`s as the `String`s themselves use when indexing with `String#[]`?

Also, the suggested change is inconsistent with `offset=`, so if this change is wanted then that also needs to be updated.

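For reference, the two index systems being weighed in this thread differ as soon as a string contains a multibyte character. The following is a small sketch under that assumption, using only standard `String` methods and illustrative variable names:

```crystal
str = "あb"

# String#[] takes character indices.
second_char = str[1] # => 'b'

# The same position expressed as a byte index.
byte_index = str.char_index_to_byte_index(1) # => 3

# Byte-oriented accessors take byte indices.
second_byte = str.byte_at(3)          # => 98
second_slice = str.byte_slice(3, 1)   # => "b"
```

Whichever unit `offset` reports, `offset=` has to accept the same unit, otherwise a value read from one cannot be safely written back through the other.
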