Better docs, and optimizations for the chop functions #53242

Closed
wants to merge 13 commits into from
18 changes: 17 additions & 1 deletion base/regex.jl
@@ -363,20 +363,36 @@ function endswith(s::SubString{String}, r::Regex)
    return PCRE.exec_r(r.regex, s, 0, r.match_options | PCRE.ENDANCHORED)
end


# https://stackoverflow.com/questions/5663987/how-to-properly-escape-characters-in-regexp
_is_regex_escape(c) = in(c, raw"()[{*+.$^\|?")

function chopprefix(s::AbstractString, prefix::Regex)
    pattern = prefix.pattern

    # fast path
    startswith(s, pattern) && any(!_is_regex_escape, pattern) && return chopprefix(s, pattern)

    m = match(prefix, s, firstindex(s), PCRE.ANCHORED)
    m === nothing && return SubString(s)
    return SubString(s, ncodeunits(m.match) + 1)
end

function chopsuffix(s::AbstractString, suffix::Regex)
    pattern = suffix.pattern

    # fast path for regexes meant for file endings
    startswith(pattern, raw"\.") && any(!_is_regex_escape, @view pattern[3:end]) && endswith(s, @view pattern[2:end]) && return chopsuffix(s, @view pattern[2:end])
Contributor commented:

Don't you mean `all` instead of `any`? Also, why is the `endswith` needed?
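
To make the `any`/`all` question concrete, here is a small sketch. The predicate is copied from the diff; the pattern `ab*` and the input string are made-up illustrations, and the last two calls use the stock `chopprefix` methods (assuming Julia 1.8 or later), not the fast path proposed in this PR.

```julia
# Predicate copied verbatim from the diff above.
_is_regex_escape(c) = in(c, raw"()[{*+.$^\|?")

pattern = "ab*"  # illustrative pattern; '*' is a regex metacharacter

# `any` is satisfied as soon as ONE character is not a metacharacter,
# so this pattern passes an `any` guard even though it contains '*'.
any_ok = any(!_is_regex_escape, pattern)
# `all` requires EVERY character to be a non-metacharacter, so it rejects it.
all_ok = all(!_is_regex_escape, pattern)

# The literal and regex readings of the same pattern can disagree:
s = "ab*bbc"
literal_result = chopprefix(s, pattern)         # strips the literal text "ab*"
regex_result   = chopprefix(s, Regex(pattern))  # strips the regex match "ab"

println((any_ok, all_ok, literal_result, regex_result))
```

Reading the guard as written, an input like this one would appear to take the literal fast path and return a different result than the regex match does, which is presumably what the question is getting at.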


    # more general fast path
    endswith(s, pattern) && any(!_is_regex_escape, pattern) && return chopsuffix(s, pattern)
Contributor commented on lines +385 to +388:

These "fast paths" are still of dubious quality (combinations of `startswith`/`endswith` with a recursive `chopsuffix` call). They also slow down the general case.

As stated before, I'd merge the docstring improvements, but not these code changes, at least not without thorough benchmarking data showing how much they improve the cases where they apply and, conversely, how they affect the cases where they don't.
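
As a rough idea of the kind of data being asked for, the following sketch times `chopsuffix` on one input where the proposed fast path would seem to apply and one where it would not. The helper `time_chop` and both test patterns are hypothetical and not part of the PR; it uses only Base timing, so the numbers are coarse, and a dedicated benchmarking package would be more reliable. The same script would have to be run on builds with and without the patch to compare them.

```julia
# Hypothetical helper (not part of the PR): average seconds per call of `f(s, pat)`.
function time_chop(f, s, pat; n = 100_000)
    f(s, pat)  # warm-up call so compilation is not timed
    total = @elapsed begin
        for _ in 1:n
            f(s, pat)
        end
    end
    return total / n
end

s = "my_Julia_program.jl"

# Case where the proposed fast path would seem to apply: an escaped-dot file ending.
t_literal = time_chop(chopsuffix, s, r"\.jl")

# Case where it would not apply: the pattern uses a character class.
t_general = time_chop(chopsuffix, s, r"\.j[lL]")

println("literal-like pattern: ", t_literal, " s/call")
println("general pattern:      ", t_general, " s/call")
```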


    m = match(suffix, s, firstindex(s), PCRE.ENDANCHORED)
    m === nothing && return SubString(s)
    isempty(m.match) && return SubString(s)
    return SubString(s, firstindex(s), prevind(s, m.offset))
end


"""
match(r::Regex, s::AbstractString[, idx::Integer[, addopts]])

10 changes: 10 additions & 0 deletions base/strings/util.jl
@@ -224,6 +224,8 @@ end
chopprefix(s::AbstractString, prefix::Union{AbstractString,Regex}) -> SubString

Remove the prefix `prefix` from `s`. If `s` does not start with `prefix`, a string equal to `s` is returned.
Note that `prefix` can also be a regular expression, which makes this quite powerful. For example, one can
use such to make the prefix comparison case-insensitive, with the i-modifier, see below.

See also [`chopsuffix`](@ref).

@@ -237,6 +239,9 @@ julia> chopprefix("Hamburger", "Ham")

julia> chopprefix("Hamburger", "hotdog")
"Hamburger"

julia> chopprefix("HTTP://abc.com", r"https?://"i) # The regex with i, and "s?", to also match e.g. http and httpS (the secure variant)
"abc.com"
```
"""
function chopprefix(s::AbstractString, prefix::AbstractString)
@@ -265,6 +270,8 @@ end
chopsuffix(s::AbstractString, suffix::Union{AbstractString,Regex}) -> SubString

Remove the suffix `suffix` from `s`. If `s` does not end with `suffix`, a string equal to `s` is returned.
Note that `suffix` can also be a regular expression, which makes this quite powerful. For example, one can
use such to make the suffix comparison case-insensitive, with the i-modifier, see below.

See also [`chopprefix`](@ref).

@@ -278,6 +285,9 @@ julia> chopsuffix("Hamburger", "er")

julia> chopsuffix("Hamburger", "hotdog")
"Hamburger"

julia> chopsuffix("my_Julia_program.jl", r"\.jl"i) # The regex with i, to also match e.g. .JL
"my_Julia_program"
```
"""
function chopsuffix(s::AbstractString, suffix::AbstractString)