Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add chopprefix, chopsuffix #40995

Merged
merged 3 commits into from
Nov 18, 2021
Merged

add chopprefix, chopsuffix #40995

merged 3 commits into from
Nov 18, 2021

Conversation

kcajf
Copy link
Contributor

@kcajf kcajf commented May 28, 2021

Attempt to address #38477

@dkarrasch dkarrasch added the domain:strings "Strings!" label May 29, 2021
@stevengj
Copy link
Member

Maybe chophead and choptail?

"""
function chopprefix(s::AbstractString, prefix::AbstractString)
if startswith(s, prefix)
SubString(s, nextind(s, lastindex(prefix), 1), lastindex(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be correct if s and prefix are different subtypes of AbstractString, since in that case lastindex(prefix) may not be a correct index into s.

Copy link
Member

@stevengj stevengj May 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we need two methods: one for AbstractString arguments, and an optimized method for Union{String,SubString{String}} arguments. (In the latter case you can just use 1+ncodeunits(prefix) as the starting index.)

if isempty(s) || !endswith(s, suffix)
SubString(s)
else
SubString(s, firstindex(s), prevind(s, lastindex(s), length(suffix)))
Copy link
Member

@stevengj stevengj May 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine in principle, but is a little suboptimal because length(suffix) and prevind require two extra O(length(suffix)) loops. In the case where both s and suffix are of type String or SubString{String}, one option is to use thisind(s, ncodeunits(s) - ncodeunits(suffix)).

(As above, we probably want two methods here, one optimized for String.)

@JeffreySarnoff
Copy link
Contributor

+1 for chophead, choptail
or removeprefix, removesuffix

@kcajf
Copy link
Contributor Author

kcajf commented Jun 16, 2021

Yes, I like chophead/choptail. Thanks @stevengj for the feedback, I'll add those optimisations shortly.

@thchr
Copy link
Contributor

thchr commented Jun 17, 2021

It might be nice to avoid introducing a new function name entirely and instead just add this as a new method for chop or lstrip/rstrip (that would feel more in tune with multiple dispatch to me)?
Right now, there's already a fair number of different options for changing the ends of strings, whose names nevertheless invoke pretty much the same idea in my mind:

  • chop: remove n characters from tail/head.
  • chomp: remove \n from tail of string.
  • lstrip/rstrip: remove specific character(s) or predicate from tail/head of string.

I always have trouble remembering which of these applies to any given situation. It would be nice if this could be added without adding more similarly sounding function names for slightly different versions of "get rid of something at the ends of a string".

E.g., could adding this to lstrip/rstrip work, i.e. as a ::AbstractString, ::AbstractString method? Then it could potentially also be generalized later to stripping multiple prefixes/suffixes sequentially as a ::AbstractString, Vector{<:AbstractString} method.

@StefanKarpinski
Copy link
Sponsor Member

The names chophead and choptail sound to me like they unconditionally remove something, which matches the behavior of chop(str, head=3) and chop(str, tail=3). That does suggest that maybe we could do

chop(str, head="prefix")
chop(str, tail="suffix")

Although those suggest to me that they might error if str doesn't begin with prefix or end with suffix.

The trouble with defining lstrip(::AbstractString, ::AbstractString) is that we already have this:

julia> rstrip("Hello", collect("lo"))
"He"

I actually expected rstrip("Hello", "lo") to work and do the same thing. It doesn't, but I think that since we treat strings as collections of characters, it would be bad to give that a different meaning.

Overall my favorite so far is chop(str, head="prefix") and chop(str, tail="suffix").

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jul 19, 2021

Stylistically, I think I much prefer chophead and choptail to kwargs for that. Using named arguments instead of positional for APIs like this (where the order can be confused) looks helpful, but seems inconsistent to change now.

@vtjnash vtjnash added the status:forget me not PRs that one wants to make sure aren't forgotten label Jul 19, 2021
Comment on lines 218 to 246
if startswith(s, prefix)
SubString(s, nextind(s, lastindex(prefix), 1), lastindex(s))
else
SubString(s)
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do this fairly compactly as follows (startswith could also be written this way, to show they are the same):

Suggested change
if startswith(s, prefix)
SubString(s, nextind(s, lastindex(prefix), 1), lastindex(s))
else
SubString(s)
end
i, j = iterate(a), iterate(b)
while true
j === nothing && i === nothing && return SubString(s, 1, 0) # s == prefix: empty result
j === nothing && return SubString(s, i[2]) # ran out of prefix: success!
i === nothing && return SubString(s) # ran out of source: failure
i[1] == j[1] || return SubString(s) # mismatch: failure
i, j = iterate(a, i[2]), iterate(b, j[2])
end

Then the general chopend is pretty much the same, but with slight changes (such as calling iterate(Iterators.Reverse(a)) at the top, and SubString(s, firstindex(s), i[2]) on success)

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to just go ahead and take this over and finish it. I think it is ready to merge, but can add to triage if (@StefanKarpinski) we want to discuss merging chop|chopprefix|chopsuffix into chop(s; [head], [tail] [prefix], [suffix]) and deciding which order to apply those in.

@kcajf
Copy link
Contributor Author

kcajf commented Jul 20, 2021

Thanks @vtjnash!

@stevengj
Copy link
Member

LGTM, modulo bikeshedding the names.

@oscardssmith
Copy link
Member

Lets bikeshed the names.

@oscardssmith oscardssmith added the status:triage This should be discussed on a triage call label Nov 18, 2021
@JeffBezanson
Copy link
Sponsor Member

Triage says 👍

@JeffBezanson JeffBezanson merged commit 85f4db2 into JuliaLang:master Nov 18, 2021
@JeffBezanson JeffBezanson removed status:triage This should be discussed on a triage call status:forget me not PRs that one wants to make sure aren't forgotten labels Nov 18, 2021
@stevengj
Copy link
Member

Was this PR missing a NEWS?

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants