Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make `startswith`, `endswith` work with `Regex` #29790

Merged
merged 17 commits into from Feb 1, 2019

Conversation

@dalum
Copy link
Contributor

commented Oct 24, 2018

The rationale behind this PR is two-fold.

The first is motivated by a real use鈥揷ase of checking the file extension of a file, where something like occursin(r"\.skf$"i, filename) seems to employ a "reverse query" logic to me that reduces readability: does this pattern, which must be at the end, occur in filename? Compared with: endswith(filename, r"\.skf"i): does filename have this pattern at the end?

The second is an argument about genericity, where a function such as occurs_not_endswith(pat, s) = occursin(pat, s) && !endswith(s, pat) presently wouldn't work for pat::Regex. I don't have a use for this, but it seems to me that it should work.

As an aside, the implementation here allows use such as endswith("abc\ndef", r"c"m) to check if any line in the string ends with "c".

The implementation of course has the slight overhead that if passed an already compiled Regex, this will generate and compile a new Regex on every call, but I would think that the cases where this is a real issue are not that frequent. And users who need this little bit of extra speed could probably be expected to understand the caveats of using a convenience function like the ones introduced here. 馃檪

@dalum

This comment has been minimized.

Copy link
Contributor Author

commented Oct 25, 2018

From suggestion by @ScottPJones, I changed the methods to use ANCHORED and ENDANCHORED in the compile options instead of manipulating the pattern. I had to add ENDANCHORED to the allowed COMPILE_MASK in pcre.jl, and I have no clue if there was a reason it was left out or not. At least the tests are passing locally, so hopefully there is no deeper meaning to its omission.

test/regex.jl Outdated Show resolved Hide resolved
@StefanKarpinski
Copy link
Member

left a comment

Looks great. I like the version with anchor options very much.

@dalum

This comment has been minimized.

Copy link
Contributor Author

commented Oct 25, 2018

The Windows 32 build is failing over something related to profiling. Can't quite figure out if it is related to this PR. 馃し鈥嶁檧

@fredrikekre
Copy link
Member

left a comment

Add some simple docstrings?

@fredrikekre fredrikekre added the strings label Oct 25, 2018

base/regex.jl Outdated Show resolved Hide resolved
@stev47

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2018

Isn't the weirdness of occursin(r"\.skf$"i, filename) mostly a naming problem? A pattern doesn't "occur" in a string, it "matches" a string.
Extending methods startswith and endswith (that are used since the regex counterpart would be slower) to incorporate functionality that regex already delivers on might be unnecessary.
Thinking out loudly: somebody reading a regex pattern will also understand that one extra character ^ or $.

@KristofferC
Copy link
Contributor

left a comment

I think we need to address the 100x slowdown over occursin.

base/regex.jl Show resolved Hide resolved
base/regex.jl Show resolved Hide resolved
chethega and others added 2 commits Oct 28, 2018
Update base/regex.jl
Co-Authored-By: dalum <17059936+dalum@users.noreply.github.com>
Update base/regex.jl
Co-Authored-By: dalum <17059936+dalum@users.noreply.github.com>
base/regex.jl Outdated Show resolved Hide resolved
@dalum

This comment has been minimized.

Copy link
Contributor Author

commented Oct 30, 2018

With the latest changes thanks to @chethega, this version is now only 1.3x slower than using occursin on my machine when the pattern doesn't match, and about 2x slower when the pattern matches. I will add a note to the docstrings mentioning that occursin should be preferred in performance critical situations.

dalum added 2 commits Oct 30, 2018
base/regex.jl Outdated Show resolved Hide resolved
base/regex.jl Outdated Show resolved Hide resolved

1.3x seems good enough :)

dalum added 3 commits Nov 3, 2018
Update regex.jl
Add newline
@dalum

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

Error messages during bootstrap are obscure. This bit meant that it failed to interpolate a $ inside a docstring: 馃槙

Core.CodeInfo(code=Array{Any, (2,)}[
  Expr(:call, :include, "regex.jl"),
  Expr(:return, nothing)], codelocs=Array{Int32, (2,)}[1, 1], method_for_inference_limit_heuristics=nothing, ssavaluetypes=2, linetable=Array{Any, (1,)}[Core.LineInfoNode(mod=Base, method=Symbol("top-level scope"), file=Symbol("Base.jl"), line=207, inlined_at=0)], ssaflags=Array{UInt8, (0,)}[], slotflags=Array{UInt8, (0,)}[], slotnames=Array{Any, (0,)}[], inferred=false, inlineable=false, propagate_inbounds=false, pure=false)Core.CodeInfo(code=Array{Any, (3,)}[
  Expr(:call, Core.getproperty, :Core, :(:include)),
  Expr(:call, SSAValue(1), :Main, "Base.jl"),
  Expr(:return, nothing)], codelocs=Array{Int32, (3,)}[1, 1, 1], method_for_inference_limit_heuristics=nothing, ssavaluetypes=3, linetable=Array{Any, (1,)}[Core.LineInfoNode(mod=Main, method=Symbol("top-level scope"), file=Symbol("sysimg.jl"), line=3, inlined_at=0)], ssaflags=Array{UInt8, (0,)}[], slotflags=Array{UInt8, (0,)}[], slotnames=Array{Any, (0,)}[], inferred=false, inlineable=false, propagate_inbounds=false, pure=false)
@dalum

This comment has been minimized.

Copy link
Contributor Author

commented Feb 1, 2019

@StefanKarpinski @fredrikekre Ping! Tests are passing. I think it's ready now. 馃帀

@StefanKarpinski StefanKarpinski merged commit 2b4f003 into JuliaLang:master Feb 1, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@dalum dalum deleted the dalum:dalum/regex branch Feb 2, 2019

fredrikekre added a commit that referenced this pull request Feb 6, 2019
fredrikekre added a commit that referenced this pull request Feb 6, 2019
@fredrikekre fredrikekre referenced this pull request Feb 6, 2019
fredrikekre added a commit that referenced this pull request Feb 6, 2019
doc: misc updates (#30977)
compat annotations, news and manual updates for #29790, #30496 and #30915.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can鈥檛 perform that action at this time.