New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue 13632: Enhancement to std.string.strip #6023
Conversation
Thanks for your pull request, @aravindavk! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Some tips to help speed things up:
Bear in mind that large or tricky changes may require multiple rounds of review and revision. Please see CONTRIBUTING.md for more information. Bugzilla references
|
01f9e15
to
41f03f9
Compare
std/string.d
Outdated
@@ -2840,15 +2841,17 @@ if (isForwardRange!Range && isSomeChar!(ElementEncodingType!Range) && | |||
auto c = input.front; | |||
if (std.ascii.isASCII(c)) | |||
{ | |||
if (!std.ascii.isWhite(c)) | |||
if ((chars == "" && !std.ascii.isWhite(c)) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is null
is the more idiomatic D way to check for empty strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review, I will update the patch to use null and refresh the patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is null
is most definitely not the way to check for empty strings. That's how you check that a string is actually null
. A null
string is empty, but an empty string isn't necessarily null. The idiomatic way to check for empty is the empty
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please don't use null to check for empty string, as it will introduce subtle bugs where empty strings with a non-null .ptr
will not get recognized as empty. Either use .empty
or do an explicit length check chars.length == 0
.
std/string.d
Outdated
@@ -2827,7 +2828,7 @@ if (isConvertibleToString!Range) | |||
See_Also: | |||
Generic stripping on ranges: $(REF _stripLeft, std, algorithm, mutation) | |||
+/ | |||
auto stripLeft(Range)(Range input) | |||
auto stripLeft(Range)(Range input, string chars = "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to be adding this as an optional argument instead of an overload? This is going to make the common case slower, because it's going to have to keep checking for whether chars
is empty or not. IMHO, it would make far more sense to either provide a second overload or to make this a variadic template so that the presence of the second argument can be checked at compile-time, and the extra runtime checks can be avoided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I will work on changing this as overloaded function as you suggested. Thanks for the review
71546f4
to
e88e56d
Compare
I'm pretty sure that is not @jmdavis had in mind when he said overload. |
Yeah, this is way too error-prone. The fact that you're providing a second function argument should be what determines whether the normal version of strip is used or not, not whether you passed a template argument and passed a second function argument. The simplest way to deal with this is to just create two separate functions:
That way, the function signatures are clear, and the efficiency of the existing |
Using second argument as compile time switch didn't worked since value should be known during compile time. Without duplicating the function content I couldn't achieve two overloaded functions. Template argument is used in the intermediate function, both
First function is internal function, I have to explore how it can be made private. Other two functions are exposed which doesn't use template parameter. Thanks. |
Just create two completely separate functions. It's simpler. e.g.
|
To avoid duplication, you could use a templated auto stripLeft(R)(R range)
if(...)
{
stripLeftImpl!(Yes.useChars)(range, chars);
}
auto stripLeft(R)(R range, string chars)
if(...)
{
stripLeftImpl!(No.useChars)(range, chars);
}
private stripLeftImpl(Flag!"UseChars" useChars, Args...)(Args args)
{
alias range = args[0];
... code ...
static if (useChars)
{
... // args[1]
}
} |
If you go that route, the flag is pointless. The number of arguments gives you the information you need. There is zero need to pass any kind of template argument indicating which version of the function to use. |
Looking over these again, I definitely vote for separate functions. Each overload should be doing fundamentally different things, and honestly, the fact that you've just swapped out With the new overload, for And with Just create completely separate overloads for these versions of |
Thanks @jmdavis for detailed review, will work on the suggested changes and refresh the patch by today. |
e88e56d
to
2f2dfbc
Compare
Updated the code as suggested. Please review. |
Build is failing in CI, looks like unable to find
|
std/string.d
Outdated
@@ -2870,10 +2900,24 @@ if (isForwardRange!Range && isSomeChar!(ElementEncodingType!Range) && | |||
assert(stripLeft([paraSep] ~ "hello world" ~ paraSep) == | |||
"hello world" ~ [paraSep]); | |||
|
|||
assert(stripLeft(" hello world ", " ") == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put the new unittests in a separate unittest block, and also, please move the new stripLeft
overload after the original unittest block. This is so that the generated documentation will make sense.
In other words, what you currently have is this:
/** ... */
auto stripLeft(...) ... // original stripLeft()
/** ... */
auto stripLeft(...) // new stripLeft
///
unittest
{
... // original test cases
... // new test cases
}
What it should be is this:
/** ... */
auto stripLeft(...) ... // original stripLeft
///
unittest { ... } // original test cases
/** ... */
auto stripLeft(...) ... // new stripLeft
///
unittest { ... } // new test cases
std/string.d
Outdated
@@ -2882,6 +2926,12 @@ if (isConvertibleToString!Range) | |||
return stripLeft!(StringTypeOf!Range)(str); | |||
} | |||
|
|||
auto stripLeft(Range)(auto ref Range str, string chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two overloads also need to have the right order. Here's what the order should be:
/** ... */
auto stripLeft(...) ... // original stripLeft
///
unittest { ... } // original test cases
/// ditto <--- add this comment
auto stripLeft(Range)(auto ref Range str)
if (isConvertibleToString!Range) ...
///
unittest { ... } // more original test cases
/** ... */
auto stripLeft(...) ... // new stripLeft
///
unittest { ... } // new test cases
/// ditto <--- add this comment
auto stripLeft(Range)(auto ref Range str, string chars)
if (isConvertibleToString!Range) ...
///
unittest { ... } // more new test cases (if there are any specific to the 2nd overload)
assert(stripRight(" hello world ", " ") == | ||
" hello world"); | ||
assert(stripRight(" hello worldxy ", "xy ") == | ||
" hello world"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for stripRight
. The original overload + unittests should be left as-is in their original order, and the new stripRight
should be added after the original unittest, and the new tests should follow, in their own unittest block.
std/string.d
Outdated
{ | ||
return strip!(StringTypeOf!Range)(str, chars); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for these overloads. Keep the original single-argument overloads grouped together with their respective unittest blocks, followed by the new 2-argument overloads and the new unittests.
@quickfur thanks for the review, I will make the suggested changes and refresh the patch. |
2f2dfbc
to
c67239f
Compare
@quickfur updated the patch as suggested. Please review |
Hi Reviewers, Let me know if any more changes required to the patch. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better now. The one thing that still stands out is that we can relax the requirements on the incoming range a bit more.
std/string.d
Outdated
+/ | ||
auto stripRight(Range)(Range str, string chars) | ||
if (isSomeString!Range || | ||
isRandomAccessRange!Range && hasLength!Range && hasSlicing!Range && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, we don't need to require a random access range here. It is enough to require a bidirectional range and use back
and popBack
to strip from the end, in a similar way to stripLeft
. In that case, we also don't need to require hasLength
and hasSlicing
, though of course in the implementation we can take advantage of those attributes where available.
std/string.d
Outdated
+/ | ||
auto strip(Range)(Range str, string chars) | ||
if (isSomeString!Range || | ||
isRandomAccessRange!Range && hasLength!Range && hasSlicing!Range && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, we can relax the constraints here to just isBidirectionalRange!Range
, and that should be sufficient to implement this function.
c67239f
to
2bd8a7a
Compare
@quickfur updated the patch to use |
It triggers on new commits. So, just do a |
Only people with write permissions can login at Jenkins -> I just restarted Jenkins for you. git commit --amend
git push -f Note that we currently have a couple of issues with Jenkins. The one you saw is dlang/ci#135 |
LGTM ping @andralex Looks good now? |
Thanks for all the review comments. Let me know if any change required. @andralex please review the changes. (jenkins/pr-merge failure is not related to the patch #6023 (comment)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getting there
std/string.d
Outdated
assert(stripLeft(" hello ", "") == " hello "); | ||
} | ||
|
||
auto stripLeft(Range, Char)(auto ref Range str, const(Char)[] chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public and undocumented not good. Please merge constraints and bodies with the overload above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed similar to existing stripLeft(Range)(auto ref Range str)
. Confused because none of the functions with auto ref
input are documented in this file. Please clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pursue lifting the quality of code overall, and in the process of doing that it's possible that new code is subject to a higher bar than existing code. Two standards arose from experience:
- All public symbols must be documented
- Prefer fewer overloads
I understand this can be frustrating - please bear with us. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, happy to work on the changes, just needed a clarification. Thanks
std/string.d
Outdated
" hello world"); | ||
} | ||
|
||
auto stripRight(Range, Char)(auto ref Range str, const(Char)[] chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please merge with the other overload
std/string.d
Outdated
assert(strip(" hello ", "") == " hello "); | ||
} | ||
|
||
auto strip(Range, Char)(auto ref Range str, const(Char)[] chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please merge with the other oveload
5515402
to
23acfee
Compare
@andralex merged the overloads as suggested. Please review. |
714108a
to
3d37afa
Compare
ci/circleci is failing due to following error.
Is this a known failure or do I need to do anything to fix this? |
Known |
And fixed: #6099 (I restarted your build on CircleCi) |
{ | ||
for (; !str.empty; str.popBack) | ||
{ | ||
if (chars.indexOf(str.back) == -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use foreach
here?
In this case there's any performance penalty of this abstraction.
#6092 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're iterating from the back, so we'd need foreach_reverse
or retro
if you wanted to use an abstraction.
Or we could write this as:
while (!str.empty && chars.indexOf(str.back) != -1)
str.popBack;
which to me reads more intuitively as "while the range is not empty and the back is a character to the stripped, pop the back".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I change this to use while
as suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important, but I like the for
form. Kurz und gut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important, but I like the for form. Kurz und gut.
Yes, I was referring to foreach_reverse
:
foreach_reverse (s; str)
if (chars.indexOf(s) == -1)
break;
FWIW LDC optimizes foreach_reverse
a lot better as it avoid auto-decoding:
_D7example3fooFAyaQdZv:
sub rsp, 72
mov qword ptr [rsp + 64], rcx
mov qword ptr [rsp + 56], rdx
mov qword ptr [rsp + 40], rdi
mov qword ptr [rsp + 48], rsi
mov rcx, qword ptr [rsp + 64]
mov rdx, qword ptr [rsp + 56]
mov qword ptr [rsp + 24], rdx
mov qword ptr [rsp + 32], rcx
mov rcx, qword ptr [rsp + 24]
mov qword ptr [rsp + 16], rcx
.LBB0_1:
mov rax, qword ptr [rsp + 16]
mov rcx, rax
sub rcx, 1
mov qword ptr [rsp + 16], rcx
cmp rax, 0
je .LBB0_6
mov eax, 1
mov rcx, qword ptr [rsp + 16]
mov rdx, qword ptr [rsp + 32]
mov sil, byte ptr [rdx + rcx]
mov byte ptr [rsp + 15], sil
movzx esi, byte ptr [rsp + 15]
mov rdx, qword ptr [rsp + 40]
mov rcx, qword ptr [rsp + 48]
mov edi, 1
mov dword ptr [rsp + 8], eax
call _D3std6string__T7indexOfTAyaZQnFNaNbNiNfQpxwxEQBs8typecons__T4FlagVQBqa13_6361736553656e736974697665ZQBoZl@PLT
cmp rax, -1
jne .LBB0_4
jmp .LBB0_6
.LBB0_4:
jmp .LBB0_5
.LBB0_5:
jmp .LBB0_1
.LBB0_6:
add rsp, 72
ret
I hope you notice that it could inline back
completely.
For comparison, the for
:
_D7example3fooFAyaQdZv:
push r15
push r14
push r12
push rbx
sub rsp, 24
mov r14, rsi
mov r12, rdi
mov qword ptr [rsp + 8], rdx
mov qword ptr [rsp + 16], rcx
test rdx, rdx
je .LBB0_9
mov rdi, rdx
mov rsi, rcx
call _D3std5range10primitives__T4backTyaZQjFNaNdNfAyaZw@PLT
mov edi, 1
mov esi, eax
mov rdx, r12
mov rcx, r14
call _D3std6string__T7indexOfTAyaZQnFNaNbNiNfQpxwxEQBs8typecons__T4FlagVQBqa13_6361736553656e736974697665ZQBoZl@PLT
cmp rax, -1
je .LBB0_9
lea r15, [rsp + 8]
.LBB0_3:
mov rbx, qword ptr [rsp + 8]
test rbx, rbx
je .LBB0_6
mov rdi, rbx
mov rsi, r15
call _D3std3utf__T10strideBackTAyaZQrFNaNfKQmmZk@PLT
mov eax, eax
sub rbx, rax
cmp rbx, qword ptr [rsp + 8]
ja .LBB0_5
mov qword ptr [rsp + 8], rbx
test rbx, rbx
je .LBB0_9
mov rsi, qword ptr [rsp + 16]
mov rdi, rbx
call _D3std5range10primitives__T4backTyaZQjFNaNdNfAyaZw@PLT
mov edi, 1
mov esi, eax
mov rdx, r12
mov rcx, r14
call _D3std6string__T7indexOfTAyaZQnFNaNbNiNfQpxwxEQBs8typecons__T4FlagVQBqa13_6361736553656e736974697665ZQBoZl@PLT
cmp rax, -1
jne .LBB0_3
.LBB0_9:
add rsp, 24
pop rbx
pop r12
pop r14
pop r15
ret
.LBB0_6:
lea rsi, [rip + .L.str]
lea rcx, [rip + .L.str.1]
mov edi, 69
mov edx, 92
mov r8d, 2206
call _d_assert_msg@PLT
.LBB0_5:
lea rsi, [rip + .L.str.1]
mov edi, 92
mov edx, 2207
call _d_arraybounds@PLT
If we would have perfect abstraction, I would have suggested str.retro.until!(s => chars.indexOf(s) > 0);
, but unfortunately that's - even for the LLVM optimizer - too much.
edit: after thinking about this, it's is probably better here to do auto-decoding there because the user-input of chars could consist of utf-8 chars :/
return stripLeft!(StringTypeOf!Range)(input, chars); | ||
else | ||
{ | ||
for (; !input.empty; input.popFront) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use foreach
here?
In this case there's any performance penalty of this abstraction.
#6092 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the catch here is that we need to iterate from the back, not from the front. So we'll either have to use foreach_reverse
(ick), or std.range.retro
, which then would require passing the range in by reference, since we need to guarantee that the range is actually modified in order to return it to the user. So it'll end up being something ugly like this:
import std.range : retro;
foreach (ch; (&input).retro) { ... }
which is equally ick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, what I said only applies to the stripRight
variant. Here we're moving forwards, so there's no foreach_reverse
or retro
needed.
I'd say the for
loop here is justified (using foreach would require passing the range by ref, just in case, so it still entails the ugly &range
syntax).
Or perhaps a more self-documenting way to write this would be:
while (!input.empty && chars.indexOf(input.front) != -1)
input.popFront;
which, at least to me, reads more intuitively as "while the range is not empty and the front element is a character to be stripped, pop the front element".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it boils down to whether we want to do auto-decoding or not ... which after thinking about this is probably better here because the user-input of chars could consist of utf-8 chars.
@@ -3177,6 +3289,89 @@ if (isConvertibleToString!Range) | |||
}); | |||
} | |||
|
|||
/// Ditto | |||
auto strip(Range, Char)(Range str, const(Char)[] chars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional overload would take a third argument:
auto strip(Range, Char)(Range str, const(Char)[] leftChars, const(Char)[] rightChars)
and would strip leftChars
on the left and rightChars
on the right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I will add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the new overload as suggested and added the respective tests.
3d37afa
to
947c8cc
Compare
std/string.d
Outdated
/// | ||
@safe pure unittest | ||
{ | ||
assert(stripLeft(" hello world ", " ") == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
std/string.d
Outdated
assert(stripLeft("xxxyy hello world ", "xy ") == | ||
"hello world "); | ||
|
||
import std.array : array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good sign that it should be a separate unittest:
}
/// unittest
{
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Added second argument similar to Python `str.strip` Second argument accepts a string of characters to strip and strips only those characters. Examples: "xyzhello".stripLeft("xyz") == "hello" "helloxy ".stripRight("xy ") == "hello" "xhellox".strip("x") == "hello" Signed-off-by: Aravinda VK <mail@aravindavk.in>
947c8cc
to
2346990
Compare
Alright, I think this is good to go. Any further fixes can be done in followup PRs. Merge ahoy! |
Feeling very happy to see my first contribution to D language :) Contribution experience was very nice, review comments were very valid and helped me to understand the language better. Happy to contribute more to D language in future. Thanks a lot |
Nice work @aravindavk, you made a solid contribution to the standard library! |
BTW it's online on the changelog already: https://dlang.org/changelog/pending.html#std-string-strip
👍 Also, https://issues.dlang.org is open 24/7 😉 |
Added second argument similar to Python
str.strip
Second argument accepts list of chars to strip and strips only
those chars.
Examples:
Signed-off-by: Aravinda VK mail@aravindavk.in