New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ASCII fast path to stripLeft & avoid unnecessary decoding #6727
Conversation
Thanks for your pull request, @n8sh! Bugzilla references
Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub fetch digger
dub run digger -- build "master + phobos#6727" |
683b961
to
4fcb911
Compare
EDIT: new benchmark code: stripBench.d//1st module
import std.traits;
import std.range;
auto old_stripLeft(Range)(Range input)
if (isForwardRange!Range && isSomeChar!(ElementEncodingType!Range) &&
!isInfinite!Range && !isConvertibleToString!Range)
{
static import std.ascii;
static import std.uni;
import std.utf : decodeFront;
while (!input.empty)
{
auto c = input.front;
if (std.ascii.isASCII(c))
{
if (!std.ascii.isWhite(c))
break;
input.popFront();
}
else
{
auto save = input.save;
auto dc = decodeFront(input);
if (!std.uni.isWhite(dc))
return save;
}
}
return input;
}
auto new_stripLeft(Range)(Range input)
if (isForwardRange!Range && isSomeChar!(ElementEncodingType!Range) &&
!isInfinite!Range && !isConvertibleToString!Range)
{
static import std.ascii;
static import std.uni;
static if (is(Unqual!(ElementEncodingType!Range) == dchar)
|| is(Unqual!(ElementEncodingType!Range) == wchar))
{
// Decoding is never needed for dchar. It happens not to be needed
// here for wchar because no whitepace is outside the basic
// multilingual plane meaning every whitespace character is encoded
// with a single wchar and due to the design of UTF-16 those wchars
// will not occur as part of the encoding of multi-wchar codepoints.
static if (isDynamicArray!Range || (isRandomAccessRange!Range && hasLength!Range
&& __traits(compiles, { size_t s = input.length / 2; input = input[s .. $]; })))
{
size_t i = 0;
for (const size_t end = input.length; i < end; ++i)
{
if (!std.uni.isWhite(input[i]))
break;
}
input = input[i .. $];
return input;
}
else
{
while (!input.empty)
{
if (!std.uni.isWhite(input.front))
break;
input.popFront();
}
return input;
}
}
else
{
static if (isDynamicArray!Range || (isRandomAccessRange!Range && hasLength!Range
&& __traits(compiles, { size_t s = input.length / 2; input = input[s .. $]; })))
{{
// ASCII optimization for dynamic arrays & similar.
size_t i = 0;
for (const size_t end = input.length; i < end; ++i)
{
auto c = input[i];
if (c >= 0x80) goto NonAsciiPath;
if (!std.ascii.isWhite(c)) break;
}
input = input[i .. $];
return input;
NonAsciiPath:
input = input[i .. $];
// Fall through to standard case.
}}
static if (ElementType!Range.sizeof > 1)
{
// Type performs its own decoding.
while (!input.empty)
{
if (!std.uni.isWhite(input.front))
break;
input.popFront();
}
return input;
}
else
{
// Type doesn't perform its own decoding.
import std.utf : decodeFront, UseReplacementDchar;
while (!input.empty)
{
auto c = input.front;
if (std.ascii.isASCII(c))
{
if (!std.ascii.isWhite(c))
break;
input.popFront();
}
else
{
auto save = input.save;
auto dc = decodeFront!(UseReplacementDchar.yes)(input);
if (!std.uni.isWhite(dc))
return save;
}
}
return input;
}
}
}
void main(string[] args)
{
import core.stdc.string;
import std.datetime.stopwatch;
import std.stdio;
import bench_data;
auto timings = benchmark!(
()
{
blackhole += old_stripLeft(ascii_1)[0];
},
()
{
blackhole += new_stripLeft(ascii_1)[0];
},
()
{
blackhole += old_stripLeft(non_ascii_1)[0];
},
()
{
blackhole += new_stripLeft(non_ascii_1)[0];
},
()
{
blackhole += old_stripLeft(w_ascii_1)[0];
},
()
{
blackhole += new_stripLeft(w_ascii_1)[0];
},
()
{
blackhole += old_stripLeft(w_non_ascii_1)[0];
},
()
{
blackhole += new_stripLeft(w_non_ascii_1)[0];
},
)(50_000_000);
writefln("old_stripLeft ascii %s", timings[0]);
writefln("new_stripLeft ascii %s", timings[1]);
writefln("old_stripLeft non_ascii %s", timings[2]);
writefln("new_stripLeft non_ascii %s", timings[3]);
writefln("old_stripLeft w_ascii %s", timings[4]);
writefln("new_stripLeft w_ascii %s", timings[5]);
writefln("old_stripLeft w_non_ascii %s", timings[6]);
writefln("new_stripLeft w_non_ascii %s", timings[7]);
} bench_data.d// 2nd module to hide data from optimizer
module bench_data;
immutable string ascii_1 = "\n\t\v\rhello world\n\t\v\r";
immutable string non_ascii_1 = "\u2028\u2028hello world\u2028\u2028";
immutable wstring w_ascii_1 = "\n\t\v\rhello world\n\t\v\r"w;
immutable wstring w_non_ascii_1 = "\u2028\u2028hello world\u2028\u2028"w;
int blackhole; |
could you add a testcase that combines an ascii and non-ascii whitespace, so that the break out statement are correct. |
@burner Added. |
std/string.d
Outdated
size_t s = input.length / 2; | ||
input = input[s .. $]; | ||
})) | ||
{{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enlighten me, why double curly braces here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To create a scope. static if
curly braces don't do that so I need a second pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry i must be tired but i don't see why a scope is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's not needed, I just prefer it.
Reworked. Added optimizations for |
976b466
to
d38016a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two optimizations here. One thing that isn't crystal clear to me is whether the random access usage is needed for anything but arrays, as we don't know the mechanics of generic random-access ranges (and whether indexing is faster than front/popFront). This code would certainly look a lot cleaner without those extra optimizations, especially the char
branch.
{{ | ||
// ASCII optimization for dynamic arrays & similar. | ||
size_t i = 0; | ||
for (const size_t end = input.length; i < end; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested modification in the other branch makes this cleaner as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... I was thinking this too could be rewritten:
foreach (i; 0 .. input.length)
{
auto c = input[i];
if (c >= 0x80)
{
input = input[i .. $];
goto NonAsciiPath;
}
if (!std.ascii.isWhite)
return input[i .. $];
}
return input[$ .. $];
NonAsciiPath:
While it doesn't get rid of the goto, the loop looks cleaner to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rewrite makes the ASCII loop slower while avoiding a call to _d_arraybounds
. When processing "\n\t\v\rhello world\n\t\v\r"
, with ldc2 -O2
the function overall takes ~5% longer and with dmd -O -inline
it takes ~30% longer. Obviously when -release
is added and no bounds checks are performed the comparison gets worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be a cache line thing, because the major work done in the loop (calling isWhite and checking if c is non-ascii) is identical in both cases. In any case, it's only marginally more readable, so the original is fine.
for (const size_t end = input.length; i < end; ++i) | ||
{ | ||
auto c = input[i]; | ||
if (c >= 0x80) goto NonAsciiPath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid a goto
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so.
std/string.d
Outdated
// Fall through to standard case. | ||
}} | ||
|
||
static if (ElementType!Range.sizeof > 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about is(ElementType!Range != ElementEncodingType!Range)
. The sizeof thing requires one to deduce that you only got here if the element encoding type is char.
7094a4f
to
82e0f2c
Compare
f10c0b8
to
94a38a2
Compare
I haven't had the time to review this PR, just wanted to mention that if you're iterating sequentially, indexing with a CPU predictable index variable (e.g. |
Changed the PR to only use an index variable for arrays. |
Add ASCII fast path to stripLeft & avoid redundant decoding.
I would think indexing and incrementing would be less expensive than |
This is a great idea. There are more functions in Phobos that would benefit from ASCII-optimized paths. I managed to find these
. I bet there are more. This will have a big impact on the performance of text processing, such as parsers, as a large percentage of the input is ASCII-clean. |
ping @wilzbach I can't figure out what went wrong in the buildkite. Seems to be consistent, but there is no real error message except that some script failed:
Is this a false error? |
Yes. |
Auto-merge toggled on |
OK, thanks. Just used the old mechanism instead. |
Similar to #6361. Code for benchmarks in next post.
string benchmark, ldc2:
wstring benchmark, ldc2:
string benchmark, dmd:
wstring benchmark, dmd: