Add StringScanner#read_char and #read_byte #11785

Open
wants to merge 5 commits into
base: master
Changes from 4 commits
28 changes: 28 additions & 0 deletions spec/std/string_scanner_spec.cr
@@ -274,3 +274,31 @@ describe StringScanner, "#terminate" do
s.eos?.should eq(true)
end
end

describe StringScanner, "#read_byte" do
it "returns one byte from the current offset and advances the offset" do
s = StringScanner.new("あ")
s.read_byte.should eq 227
s.offset.should eq 1
s.read_byte.should eq 129
s.offset.should eq 2
s.read_byte.should eq 130
s.offset.should eq 3
s.read_byte.should be_nil
s.offset.should eq 3
s.eos?.should eq(true)
end
end

describe StringScanner, "#read_char" do
it "returns a char from the current offset and advances the offset" do
s = StringScanner.new("ab")
s.read_char.should eq 'a'
s.offset.should eq 1
s.read_char.should eq 'b'
s.offset.should eq 2
s.read_byte.should be_nil
s.offset.should eq 2
s.eos?.should eq(true)
end
end
37 changes: 35 additions & 2 deletions src/string_scanner.cr
@@ -38,6 +38,8 @@
# * `#scan_until`
# * `#skip`
# * `#skip_until`
# * `#read_byte`
# * `#read_char`
#
# Methods that look ahead:
# * `#peek`
@@ -68,12 +70,12 @@ class StringScanner
# Sets the *position* of the scan offset.
def offset=(position : Int)
raise IndexError.new unless position >= 0
- @byte_offset = @str.char_index_to_byte_index(position) || @str.bytesize
+ @byte_offset = Math.min(position, @str.bytesize)
end

# Returns the current position of the scan offset.
def offset : Int32
- @str.byte_index_to_char_index(@byte_offset).not_nil!
+ @byte_offset
Contributor

Isn't this representative of the character offset, and not byte_offset?

Contributor Author

I think so, but in the current implementation, calling #read_byte on a multibyte character and then calling #offset raises an error.
I am not sure whether this behavior is expected.

require "string_scanner"

s = StringScanner.new("あ")
s.read_byte
s.offset #=> Unhandled exception: Nil assertion failed (NilAssertionError)

Member

Right, I think this would be a breaking change, so in theory we can't do this.

However, I consider the existing definition of offset to be incorrect. offset should actually return the byte offset because that's more useful, and it's the only correct thing we can return if one can advance byte per byte. So we can consider this change a bugfix instead of a breaking change.

Contributor

Useful to who? Isn't it pretty useful to use the same index values when parsing Strings as the Strings themselves use when indexing with String#[]?

Contributor

@yxhuvud Jan 31, 2022

Also, the suggested change is inconsistent with offset=, so if this change is wanted then that also needs to be updated.
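
The two index schemes discussed in this thread can be compared directly on a plain `String`. The following is a minimal sketch, not code from this PR; it only uses `String#size`, `String#bytesize`, `String#[]`, and the two conversion methods that already appear in the diff above. The sample string is an assumption chosen so that the first character is multibyte.

```crystal
str = "あb" # 'あ' is 3 bytes in UTF-8, 'b' is 1 byte

puts str.size     # 2 (characters)
puts str.bytesize # 4 (bytes)

# String#[] indexes by character, not by byte.
puts str[1] # b

# Converting between the two index schemes.
puts str.char_index_to_byte_index(1).inspect # 3
puts str.byte_index_to_char_index(3).inspect # 1

# Byte offset 1 falls inside 'あ', so it has no character index.
puts str.byte_index_to_char_index(1).inspect # nil
```

The last line is the case behind the `NilAssertionError` reported earlier in the thread: after a single `#read_byte` on a multibyte character, the byte offset has no corresponding character index.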

end

# Tries to match with *pattern* at the current position. If there's a match,
@@ -280,6 +282,37 @@ class StringScanner
@str.byte_slice(@byte_offset, @str.bytesize - @byte_offset)
end

# Returns the byte at the current offset and advances the offset by one,
# or returns `nil` at the end of the string.
# ```
# require "string_scanner"
#
# s = StringScanner.new("あ")
# s.read_byte # => 227
# s.read_byte # => 129
# s.read_byte # => 130
# ```
def read_byte : UInt8?
return nil if eos?
s = @str.byte_at(@byte_offset)
@byte_offset += 1
s
Contributor

Please avoid one-letter variable names.

Suggested change
- s = @str.byte_at(@byte_offset)
- @byte_offset += 1
- s
+ byte = @str.byte_at(@byte_offset)
+ @byte_offset += 1
+ byte

Contributor Author

OK, I corrected it.

end

# Returns the character at the current offset and advances the offset past it.
# ```
# require "string_scanner"
#
# s = StringScanner.new("ab")
# s.read_char # => 'a'
# s.read_char # => 'b'
# ```
def read_char : Char?
return nil if eos?
reader = Char::Reader.new(@str, @byte_offset)
c = reader.current_char
@byte_offset += c.bytesize
c
Contributor

ditto

Suggested change
- c = reader.current_char
- @byte_offset += c.bytesize
- c
+ char = reader.current_char
+ @byte_offset += char.bytesize
+ char

end
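
The character-reading step above can be tried outside the scanner. The sketch below is an illustration under stated assumptions, not the PR's implementation: `read_char_at` is a hypothetical helper name, it takes the string and byte offset as arguments instead of using `@str` and `@byte_offset`, and it assumes the offset sits on a character boundary. Because `read_char` is declared to return `Char?`, the sketch checks for the end of the string explicitly before decoding.

```crystal
# Hypothetical helper for illustration only; it is not part of this PR.
# Returns the character at *byte_offset* together with the next byte offset,
# or nil when there is nothing left to read.
def read_char_at(str : String, byte_offset : Int32) : Tuple(Char, Int32)?
  # Guard first: nothing is left at or past the end of the string.
  return nil if byte_offset >= str.bytesize

  char = Char::Reader.new(str, byte_offset).current_char
  {char, byte_offset + char.bytesize}
end

p read_char_at("あb", 0) # => {'あ', 3}
p read_char_at("あb", 3) # => {'b', 4}
p read_char_at("あb", 4) # => nil
```

Returning the next offset alongside the character keeps the helper free of state, which makes the byte-based advance (`char.bytesize`) easy to see.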

# Writes a representation of the scanner.
#
# Includes the current position of the offset, the total size of the string,