Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create a Regex from a string matching this string literally (with escaping) #31989

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

rfourquet
Copy link
Member

@rfourquet rfourquet commented May 10, 2019

EDIT: convert(Regex, s) renamed to escape_regex(s). I like to believe that two thumbs-down were due to the old name.
This was suggested by @StefanKarpinski, so putting it here for discussion. Currently, the way to match a string s exactly is to write r"" * s, with this PR we can now write convert(Regex, s), which is more direct, if not terser or much faster.

@rfourquet rfourquet added the domain:strings "Strings!" label May 10, 2019
@KristofferC
Copy link
Sponsor Member

KristofferC commented May 10, 2019

I don't think this should pun on convert (convert is especially bad since it is implicitly called in many places).

@rfourquet
Copy link
Member Author

It didn't strike me as a pun... do you see Regex and String as too different kind of objects for being convertible to each other?

@fredrikekre
Copy link
Member

It is pretty weird that e.g. this would not be an error anymore:

julia> r = [r"a", r"b"]
2-element Array{Regex,1}:
 r"a"
 r"b"

julia> push!(r, "c")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Regex

@KristofferC
Copy link
Sponsor Member

do you see Regex and String as too different kind of objects for being convertible to each other?

Yeah, this should in my opinion use some kind of constructor.

@rfourquet
Copy link
Member Author

I renamed the function to escape_regex as suggested in #29643, which solves the same problem (IIUC), except that here we produce directly a Regex instead of an escaped string. Comments welcome.

@KristofferC
Copy link
Sponsor Member

Perhaps it is worth updating the title of the PR?

@rfourquet rfourquet changed the title add convert(::Type{Regex}, ::Union{AbstractString,AbstractChar}) create a Regex from a string matching this string literally (with escaping) Aug 20, 2019
@StefanKarpinski
Copy link
Sponsor Member

Regarding naming, this is a bit different than escape_string:

  • escape_string takes a string value and gives the input form one would need to write in order to evaluate back to the input value
  • unescape_string takes the input form for strings and converts it to the string value that would evaluate to
  • Base.shell_escape takes a raw value and "neuters" it so that it no longer contains any values with special meaning to the shell
  • Base.shell_parse is the inverse function from Base.shell_escape

The proposed functionality is more like Base.shell_escape than it is like escape_string. There is a different function that would be more like escape_string: it would take a regular expression and produce the input form for that regular expression—of course, we don't generally need that since we save the input form inside of the Regex object.

I would argue that I named Base.shell_escape poorly and it should have been called Base.shell_quote and that this function should be named analogously to that: regex_quote rather than escape_regex. Note that escape and unescape functions are natural pairs, whereas the inverse of quote is parse—and the inverse form of Base.shell_escape is Base.shell_parse. Oops.

So how can we get there? Base.shell_escape is not an exported function, so we can technically just rename it. That might break a bunch of things since I suspect quite a few packages and codes out there use this internal function, so maybe it would be better to deprecate it for an exported shell_quote function.

@StefanKarpinski
Copy link
Sponsor Member

Oh, another "oops" is that the actual inverse of Base.shell_escape is Base.shell_split and they're not perfect inverses: Base.shell_escape quotes a single string as a shell word while Base.shell_split takes an entire shell command and parses it into unquoted shell words. Perhaps we can rationalize that issue as well. Sorry to pull the artist tentatively known as escape_regex into the mess I made long ago with Base.shell_escape but it would be good to get the terminology right before introducing an official API here.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 21, 2019

It's fairly unlikely scripts should be using Base.shell_escape, since it's only the opposite of Base.shell_parse and will not protect you against shell injection in general (as you say, it quotes it, not escapes it). The relevant escaping function is currently Base.shell_escape_posixly for an arbitrary posix shell, and that may be augmented with Base.shell_escape_dosly and Base.shell_escape_powerfully in the future.

@mlhetland
Copy link
Contributor

As mentioned in #29643, it might be worthwhile to have a function for producing a string or string-like object with escapes in place, as this could have other uses (e.g., further manipulation) than simply searching directly. Producing the regex directly is mainly an optimization, which could be a variant of the function or which one could handle just like compiling regexen in general? Just a thought.

@rfourquet
Copy link
Member Author

Sorry to pull the artist tentatively known as escape_regex into the mess

No worries, I'm glad that you look into the issue. I'm not familiar with the other stringy functions so can't contribute to the naming discussion right now, but will happily trust your judgment :)

it might be worthwhile to have a function for producing a string or string-like object with escapes in place

I didn't expect it would be needed, but thank you for mentioning this point. It's currently possible via Base.wrap_string(s, zero(UInt32), so it will basically be a matter of renaming wrap_string to a name that we want to export.

@StefanKarpinski
Copy link
Sponsor Member

Good point, @mlhetland: I actually think that regex_quote or whatever we call it, should probably be a function from String to String so that the way to get a pattern that matches a string literally would be Regex(regex_quote(s)). That's slightly verbose but it seems like an odd thing to do anyway, so I think that's ok.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Aug 22, 2019

I should probably open a separate issue for the shell_quote stuff. (Done: #33024)

@fredrikekre
Copy link
Member

I actually think that regex_quote or whatever we call it, should probably be a function from String to String

I agree, thats what we have in Documenter: https://github.com/JuliaDocs/Documenter.jl/blob/22639383221b89921e44a93650080efda807fc36/src/Utilities/Utilities.jl#L13 + https://github.com/JuliaDocs/Documenter.jl/blob/22639383221b89921e44a93650080efda807fc36/src/Utilities/Utilities.jl#L23-L24

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants