`substring_search` benchmark is timing out in nightly performance run #917
…em size from 10**8 to 10**6
Issue #917: Change `substring_search` benchmark problem size to `10**6`
I haven't been able to dive too deep into this, but here's what I've found so far. I ran `substring_search` on chapcs (single node, comm=none) and don't see much better performance there, which makes me suspect this is more about local overheads than inherent communication limitations (though there could still be some comm overhead). On chapcs, here's the performance I see:
Looking at the current code, there are a fair number of memory allocations, so I wonder if the overhead is mostly from lots of short-lived allocations, similar to what we saw in #266 (comment). The allocations I spotted on a quick look are the ones that slice the segmented array and the ones that create a string from that slice. The patch below eliminates some of these allocations. It's a prototype, since it only works on a single node, but it should be good enough for some experiments:

```diff
diff --git a/src/CastMsg.chpl b/src/CastMsg.chpl
index 621f09f..54e40d8 100644
--- a/src/CastMsg.chpl
+++ b/src/CastMsg.chpl
@@ -203,7 +203,7 @@ module CastMsg {
} else {
end = oa[i+1] - 1;
}
- e = interpretAsString(va[start..end]) : toType;
+ e = interpretAsString(va, start, end-start+1) : toType;
}
} catch e: IllegalArgumentError {
var errorMsg = "bad value in cast from string to %s".format(toType:string);
diff --git a/src/SegmentedArray.chpl b/src/SegmentedArray.chpl
index ce12360..75ac24c 100644
--- a/src/SegmentedArray.chpl
+++ b/src/SegmentedArray.chpl
@@ -148,7 +148,7 @@ module SegmentedArray {
end = offsets.a[idx+1] - 1;
}
// Take the slice of the bytearray and "cast" it to a chpl string
- var s = interpretAsString(values.a[start..end]);
+ var s = interpretAsString(values.a, start, end-start+1);
return s;
}
@@ -509,18 +509,18 @@ module SegmentedArray {
when SearchMode.contains {
forall (o, l, h) in zip(oa, lengths, hits) with (var myRegex = _unsafeCompileRegex(pattern)) {
// regexp.search searches the receiving string for matches at any offset
- h = myRegex.search(interpretAsString(va[o..#l])).matched;
+ h = myRegex.search(interpretAsString(va, o, l)).matched;
}
}
when SearchMode.startsWith {
forall (o, l, h) in zip(oa, lengths, hits) with (var myRegex = _unsafeCompileRegex(pattern)) {
// regexp.match only returns a match if the start of the string matches the pattern
- h = myRegex.match(interpretAsString(va[o..#l])).matched;
+ h = myRegex.match(interpretAsString(va, o, l)).matched;
}
}
when SearchMode.endsWith {
forall (o, l, h) in zip(oa, lengths, hits) with (var myRegex = _unsafeCompileRegex(pattern)) {
- var matches = myRegex.matches(interpretAsString(va[o..#l]));
+ var matches = myRegex.matches(interpretAsString(va, o, l));
var lastMatch: reMatch = matches[matches.size-1][0];
// h = true iff start(lastMatch) + len(lastMatch) == len(string) (-1 to account for null byte)
h = lastMatch.offset + lastMatch.size == l-1;
@@ -531,7 +531,7 @@ module SegmentedArray {
// regexp.match only returns a match if the start of the string matches the pattern
// h = true iff len(match) == len(string) (-1 to account for null byte)
// if no match is found reMatch.size returns -1
- h = myRegex.match(interpretAsString(va[o..#l])).size == l-1;
+ h = myRegex.match(interpretAsString(va, o, l)).size == l-1;
}
}
}
@@ -638,7 +638,7 @@ module SegmentedArray {
var rightStart: [offsets.aD] int;
forall (o, len, i) in zip(oa, lengths, offsets.aD) with (var myRegex = _unsafeCompileRegex(delimiter)) {
- var matches = myRegex.matches(interpretAsString(va[o..#len]));
+ var matches = myRegex.matches(interpretAsString(va, o, len));
if matches.size < times {
// not enough occurances of delim, the entire string stays together, and the param args
// determine whether it ends up on the left or right
@@ -1269,19 +1269,16 @@ module SegmentedArray {
}
/* Convert an array of raw bytes into a Chapel string. */
- inline proc interpretAsString(bytearray: [?D] uint(8)): string {
+ inline proc interpretAsString(bytearray: [?D] uint(8), start, size): string {
// Byte buffer must be local in order to make a C pointer
- var localBytes: [{0..#D.size}] uint(8) = bytearray;
- var cBytes = c_ptrTo(localBytes);
+ var cBytes = c_ptrTo(bytearray[start]);
// Byte buffer is null-terminated, so length is buffer.size - 1
// The contents of the buffer should be copied out because cBytes will go out of scope
// var s = new string(cBytes, D.size-1, D.size, isowned=false, needToCopy=true);
- var s: string;
try {
- s = createStringWithNewBuffer(cBytes, D.size-1, D.size);
+ return createStringWithBorrowedBuffer(cBytes, size-1, size);
} catch {
- s = "<error interpreting bytes as string>";
+ return "<error interpreting bytes as string>";
}
- return s;
}
```

The patch reinterprets pointers into the segmented array directly as strings, without copies (and it will break if, for instance, the index into the segmented array is remote). Here's the performance I see with it on chapcs:
This shows some promise, but there's probably another copy in there that I missed.
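To make the copy-versus-borrow distinction concrete outside of Chapel, here is a minimal Python sketch. It assumes a simplified segmented-string layout (all strings concatenated into one byte buffer, each NUL-terminated, with an offsets array marking where each string starts), modeled loosely on what the patch above relies on rather than on Arkouda's actual `SegmentedArray` code. `build_segments`, `segment_bounds`, `segment_view`, `contains_copying`, and `contains_borrowing` are hypothetical names. The copying version builds new objects per element, like the original `interpretAsString(va[o..#l])`. The borrowing version searches the shared buffer in place, roughly like the patched `interpretAsString(va, o, l)`.

```python
from typing import List, Tuple


# Hypothetical segmented-string layout: all strings are stored back to back
# in `values`, each followed by a NUL byte, and offsets[i] is where string i
# starts. This mirrors the patch's assumptions, not Arkouda's real classes.
def build_segments(strings: List[str]) -> Tuple[bytes, List[int]]:
    values = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(values))
        values += s.encode("utf-8") + b"\x00"
    return bytes(values), offsets


def segment_bounds(values: bytes, offsets: List[int], i: int) -> Tuple[int, int]:
    """Return (start, size) for segment i; size counts the NUL terminator."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(values)
    return offsets[i], end - offsets[i]


def segment_view(values: bytes, offsets: List[int], i: int) -> memoryview:
    """Zero-copy window onto segment i (NUL excluded), like a borrowed pointer."""
    start, size = segment_bounds(values, offsets, i)
    return memoryview(values)[start:start + size - 1]


def contains_copying(values: bytes, offsets: List[int], needle: str) -> List[bool]:
    hits = []
    for i in range(len(offsets)):
        start, size = segment_bounds(values, offsets, i)
        # Slicing bytes copies the segment, and decode() builds another new
        # object: short-lived allocations for every element.
        s = values[start:start + size - 1].decode("utf-8")
        hits.append(needle in s)
    return hits


def contains_borrowing(values: bytes, offsets: List[int], needle: str) -> List[bool]:
    pattern = needle.encode("utf-8")  # encoded once, outside the loop
    hits = []
    for i in range(len(offsets)):
        start, size = segment_bounds(values, offsets, i)
        # bytes.find accepts start/end bounds, so it searches the segment in
        # place; the end bound keeps a match from spilling into the next string.
        hits.append(values.find(pattern, start, start + size - 1) != -1)
    return hits


values, offsets = build_segments(["apple", "banana", "grape"])
print(contains_copying(values, offsets, "ap"))     # [True, False, True]
print(contains_borrowing(values, offsets, "ap"))   # [True, False, True]
print(segment_view(values, offsets, 1).tobytes())  # b'banana'
```

The `memoryview` slice plays the role of the borrowed C pointer: it shares the underlying buffer, so, as with the patch, it's only meaningful while that buffer is alive and directly accessible.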
Hmm, the remaining allocation is coming from the regex module here. Using some hacks to eliminate that allocation, here's what I see:
I don't see any extra allocations at this point. I'm not positive, but there may be some other micro-optimizations, such as avoiding UTF-8 validation when creating the strings.
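A related way to sidestep both the string construction and its UTF-8 validation is to run the search predicates directly on the bytes. The Python sketch below uses the same hypothetical NUL-terminated layout; `segment_lengths`, `starts_with`, `ends_with`, and `full_match` are illustrative names, and literal byte prefixes/suffixes stand in for the patch's regexes. It also spells out the arithmetic behind the patch's `l-1` checks: each segment's length `l` counts its NUL terminator, so the string itself is `l - 1` bytes long.

```python
from typing import List, Tuple


# Same hypothetical layout: NUL-terminated strings stored back to back in
# `values`, with offsets[i] marking where string i starts.
def build_segments(strings: List[str]) -> Tuple[bytes, List[int]]:
    values = bytearray()
    offsets = []
    for s in strings:
        offsets.append(len(values))
        values += s.encode("utf-8") + b"\x00"
    return bytes(values), offsets


def segment_lengths(values: bytes, offsets: List[int]) -> List[int]:
    """Length l of each segment, counting its NUL terminator."""
    ends = offsets[1:] + [len(values)]
    return [end - start for start, end in zip(offsets, ends)]


def starts_with(values: bytes, offsets: List[int], lengths: List[int],
                prefix: bytes) -> List[bool]:
    # Compare only within [o, o + l - 1) so the NUL byte never takes part.
    return [values.startswith(prefix, o, o + l - 1)
            for o, l in zip(offsets, lengths)]


def ends_with(values: bytes, offsets: List[int], lengths: List[int],
              suffix: bytes) -> List[bool]:
    return [values.endswith(suffix, o, o + l - 1)
            for o, l in zip(offsets, lengths)]


def full_match(values: bytes, offsets: List[int], lengths: List[int],
               literal: bytes) -> List[bool]:
    # A literal matches the whole string iff its byte length equals l - 1
    # (the string length without the NUL) and it is also a prefix.
    return [len(literal) == l - 1 and values.startswith(literal, o, o + l - 1)
            for o, l in zip(offsets, lengths)]


values, offsets = build_segments(["banana", "bandana", "caf\u00e9"])
lengths = segment_lengths(values, offsets)
print(lengths)                                                    # [7, 8, 6]
print(starts_with(values, offsets, lengths, b"ban"))              # [True, True, False]
print(ends_with(values, offsets, lengths, "\u00e9".encode()))      # [False, False, True]
print(full_match(values, offsets, lengths, "caf\u00e9".encode()))  # [False, False, True]
```

Byte-wise prefix and suffix comparisons agree with comparisons on decoded strings whenever both sides are valid UTF-8, because UTF-8 is self-synchronizing; that is what makes skipping validation plausible for literal patterns. Regex features such as character classes would need more care.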
Misc related notes:
@pierce314159 @reuster986 Just FYI, my high-level plan for this is:
As mentioned before, we're in the middle of our 1.25 release, so I'll be a bit delayed getting to this. I'm hoping to tackle it in the next 1-2 weeks, but don't hold me to that. @pierce314159, could you take a look at:
It also looks like #920 is adding a
…_search` benchmark to allow use of `ak.stick` and speed up `test_substring` construction time
Issue #917: Speedup `substring_search` benchmark startup
Allocations were minimized in chapel-lang/chapel#18733, and there are now no allocations for operations like `contains`. At the moment we can't really increase the nightly problem size, since the configuration that tests against Chapel 1.25.0 will still be slow. We're releasing 1.25.1 shortly after Thanksgiving, and I think once nightly testing is using that we can bump the problem size.
`substring_search` was heavily optimized across #935, chapel-lang/chapel#18465, #931, and chapel-lang/chapel#18733, so I'm bumping the problem size back up in #1038. Here's performance on 16-node-cs-hdr with
@ronawho found that the `substring_search` benchmark is timing out in the nightly performance runs. He's offered to help track down the performance bottleneck later next week. He recommends switching the default problem size from `10**8` to `10**6` for the time being.

Here are the times for `10**6` on 16-node-cs-hdr:

ronawho's original comment: #912 (comment)