Better docs, and optimizations for the chop functions #53242
Changes from all commits
234eb84
21ba90b
dea529f
fa488f2
a0f442a
4706865
5dc505a
2872def
6873d7f
d486f9c
4aaf754
37d4760
7c6cb20
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -363,20 +363,36 @@ function endswith(s::SubString{String}, r::Regex) | |
return PCRE.exec_r(r.regex, s, 0, r.match_options | PCRE.ENDANCHORED) | ||
end | ||
|
||
|
||
# https://stackoverflow.com/questions/5663987/how-to-properly-escape-characters-in-regexp | ||
_is_regex_escape(c) = in(c, raw"()[{*+.$^\|?") | ||
|
||
function chopprefix(s::AbstractString, prefix::Regex) | ||
pattern = prefix.pattern | ||
|
||
# fast path | ||
startswith(s, pattern) && any(!_is_regex_escape, pattern) && return chopprefix(s, pattern) | ||
|
||
m = match(prefix, s, firstindex(s), PCRE.ANCHORED) | ||
m === nothing && return SubString(s) | ||
return SubString(s, ncodeunits(m.match) + 1) | ||
end | ||
|
||
function chopsuffix(s::AbstractString, suffix::Regex) | ||
pattern = suffix.pattern | ||
|
||
# fast path for regexes meant for file endings | ||
startswith(pattern, raw"\.") && any(!_is_regex_escape, @view pattern[3:end]) && endswith(s, @view pattern[2:end]) && return chopsuffix(s, @view pattern[2:end]) | ||
|
||
# more general fast path | ||
endswith(s, pattern) && any(!_is_regex_escape, pattern) && return chopsuffix(s, pattern) | ||
Comment on lines +385 to +388
These "fast paths" are still of dubious quality (combinations of startswith/endswith with a recursive …). As stated before, I'd merge the docstring improvements, but not these code changes, at least not without thorough benchmarking data showing how much they improve the cases where they apply and, conversely, how they affect the cases where they don't apply.
||
|
||
m = match(suffix, s, firstindex(s), PCRE.ENDANCHORED) | ||
m === nothing && return SubString(s) | ||
isempty(m.match) && return SubString(s) | ||
return SubString(s, firstindex(s), prevind(s, m.offset)) | ||
end | ||
|
||
|
||
""" | ||
match(r::Regex, s::AbstractString[, idx::Integer[, addopts]]) | ||
|
||
|
Don't you mean all instead of any? Also, why is the endswith needed?
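To make the any/all question concrete, here is a minimal sketch that reuses the `_is_regex_escape` helper from the diff. The pattern `"a+b"` and the variable names are hypothetical and chosen only for illustration. It shows that `any(!_is_regex_escape, pattern)` holds as soon as one character is a literal, so a pattern that still contains metacharacters would pass the fast-path guard.

```julia
# Helper copied from the diff: true if `c` is a regex metacharacter.
_is_regex_escape(c) = in(c, raw"()[{*+.$^\|?")

# Hypothetical pattern: '+' is a metacharacter, 'a' and 'b' are literals.
pattern = "a+b"
s = "a+b"

# `any` is satisfied by a single literal character ...
passes_any = any(!_is_regex_escape, pattern)   # true
# ... while `all` requires every character to be a literal.
passes_all = all(!_is_regex_escape, pattern)   # false

# With the `any` guard, the literal comparison is taken as a shortcut:
literal_hit = startswith(s, pattern)           # true
# But the regex a+b (one or more 'a', then 'b') does not match "a+b":
regex_hit = startswith(s, Regex(pattern))      # false

println((passes_any, passes_all, literal_hit, regex_hit))
```

With the `any` guard, the literal shortcut would strip the prefix here even though the regex itself does not match, which is the kind of mismatch an `all` check would rule out.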