Add String::push_with_ascii_fast_path, bench it against String::push
`String::push(&mut self, ch: char)` currently has a single code path
that calls `Char::encode_utf8`.
Perhaps it could be faster for ASCII `char`s, which are represented as
a single byte in UTF-8.
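
As a rough illustration of the idea in present-day Rust (not the code from this commit), the sketch below appends a `char` to a plain `Vec<u8>`. The function name is borrowed from the commit, but the free-function signature and the `scratch` buffer are assumptions made for this example; unlike the method in the diff, it goes through an intermediate 4-byte buffer on the slow path.

```rust
/// Appends `ch` to `buf` as UTF-8, taking a shortcut for ASCII.
/// Hypothetical free function for illustration; the commit adds a
/// private method on `String` instead.
fn push_with_ascii_fast_path(buf: &mut Vec<u8>, ch: char) {
    // Fast path: code points below 0x80 encode as exactly one byte.
    if (ch as u32) < 0x80 {
        buf.push(ch as u8);
        return;
    }
    // Slow path: encode into a 4-byte scratch buffer (the UTF-8 maximum),
    // then append only the bytes that were actually used.
    let mut scratch = [0u8; 4];
    let encoded = ch.encode_utf8(&mut scratch);
    buf.extend_from_slice(encoded.as_bytes());
}
```

Pushing `'a'` adds one byte and pushing `'â'` adds two, so the buffer stays valid UTF-8 either way.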

This commit leaves the method unchanged,
adds a copy of it with the fast path,
and adds benchmarks to compare them.

Results show that the fast path makes repeatedly pushing an ASCII `char`
roughly 9 times faster (6563 vs. 59552 ns/iter),
but does not measurably affect the performance for a non-ASCII `char`
(71452 vs. 71520 ns/iter, within noise), where the fast path is not taken.

Output of `make check-stage1-collections NO_REBUILD=1 PLEASE_BENCH=1 TESTNAME=string::tests::bench_push`

```
test string::tests::bench_push_char_one_byte                 ... bench:     59552 ns/iter (+/- 2132) = 167 MB/s
test string::tests::bench_push_char_one_byte_with_fast_path  ... bench:      6563 ns/iter (+/- 658) = 1523 MB/s
test string::tests::bench_push_char_two_bytes                ... bench:     71520 ns/iter (+/- 3541) = 279 MB/s
test string::tests::bench_push_char_two_bytes_with_slow_path ... bench:     71452 ns/iter (+/- 4202) = 279 MB/s
test string::tests::bench_push_str                           ... bench:        24 ns/iter (+/- 2)
test string::tests::bench_push_str_one_byte                  ... bench:     38910 ns/iter (+/- 2477) = 257 MB/s
```
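
The MB/s column appears to follow from `b.bytes` and the ns/iter figure: bytes per iteration times 1000, divided by nanoseconds per iteration, truncated to an integer. The helper below is a hypothetical sketch of that arithmetic, not the test harness's own code, but it reproduces the figures above.

```rust
/// Throughput in MB/s (10^6 bytes per second), truncated to an integer.
/// Hypothetical helper; the name and signature are assumptions.
fn throughput_mb_per_s(bytes_per_iter: u64, ns_per_iter: u64) -> u64 {
    // (bytes / ns) * (10^9 ns/s) / (10^6 bytes/MB) = bytes * 1000 / ns
    bytes_per_iter * 1000 / ns_per_iter
}
```

For example, the one-byte benchmarks process 10,000 bytes per iteration, so 59552 ns/iter gives 167 MB/s and 6563 ns/iter gives 1523 MB/s; the two-byte benchmarks process 20,000 bytes per iteration.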

A benchmark of pushing a one-byte-long `&str` is added for comparison,
but its performance has varied a lot lately
(see the comment on #19640).
(When the input is fixed, `s.push_str("x")` could be used
instead of `s.push('x')`.)
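
A minimal sketch of that substitution in present-day Rust, assuming hypothetical helper names: both functions build the same string, and which one is faster depends on the compiler version and would need to be measured.

```rust
/// Builds a string of `n` copies of 'x' by pushing a `char` each time.
fn build_with_push(n: usize) -> String {
    let mut s = String::new();
    for _ in 0..n {
        s.push('x');
    }
    s
}

/// Same result, but pushes the fixed one-byte `&str` "x" instead.
fn build_with_push_str(n: usize) -> String {
    let mut s = String::new();
    for _ in 0..n {
        s.push_str("x");
    }
    s
}
```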
SimonSapin committed Dec 20, 2014
1 parent 8f51ad2 commit cc33ce6
Showing 1 changed file with 80 additions and 0 deletions.
80 changes: 80 additions & 0 deletions src/libcollections/string.rs
@@ -528,6 +528,29 @@ impl String {
        }
    }

    #[inline]
    fn push_with_ascii_fast_path(&mut self, ch: char) {
        if (ch as u32) < 0x80 {
            self.vec.push(ch as u8);
            return;
        }

        let cur_len = self.len();
        // This may use up to 4 bytes.
        self.vec.reserve(4);

        unsafe {
            // Attempt to not use an intermediate buffer by just pushing bytes
            // directly onto this string.
            let slice = RawSlice {
                data: self.vec.as_ptr().offset(cur_len as int),
                len: 4,
            };
            let used = ch.encode_utf8(mem::transmute(slice)).unwrap_or(0);
            self.vec.set_len(cur_len + used);
        }
    }

    /// Works with the underlying buffer as a byte slice.
    ///
    /// # Examples
@@ -1408,6 +1431,63 @@ mod tests {
        });
    }

    const REPETITIONS: u64 = 10_000;

    #[bench]
    fn bench_push_str_one_byte(b: &mut Bencher) {
        b.bytes = REPETITIONS;
        b.iter(|| {
            let mut r = String::new();
            for _ in range(0, REPETITIONS) {
                r.push_str("a")
            }
        });
    }

    #[bench]
    fn bench_push_char_one_byte(b: &mut Bencher) {
        b.bytes = REPETITIONS;
        b.iter(|| {
            let mut r = String::new();
            for _ in range(0, REPETITIONS) {
                r.push('a')
            }
        });
    }

    #[bench]
    fn bench_push_char_one_byte_with_fast_path(b: &mut Bencher) {
        b.bytes = REPETITIONS;
        b.iter(|| {
            let mut r = String::new();
            for _ in range(0, REPETITIONS) {
                r.push_with_ascii_fast_path('a')
            }
        });
    }

    #[bench]
    fn bench_push_char_two_bytes(b: &mut Bencher) {
        b.bytes = REPETITIONS * 2;
        b.iter(|| {
            let mut r = String::new();
            for _ in range(0, REPETITIONS) {
                r.push('â')
            }
        });
    }

    #[bench]
    fn bench_push_char_two_bytes_with_slow_path(b: &mut Bencher) {
        b.bytes = REPETITIONS * 2;
        b.iter(|| {
            let mut r = String::new();
            for _ in range(0, REPETITIONS) {
                r.push_with_ascii_fast_path('â')
            }
        });
    }

    #[bench]
    fn from_utf8_lossy_100_ascii(b: &mut Bencher) {
        let s = b"Hello there, the quick brown fox jumped over the lazy dog! \