Enable AVX support. Up to 2x faster in some cases.
RazrFalcon committed Oct 20, 2020
1 parent 37d21ee commit a5737da
Showing 8 changed files with 319 additions and 236 deletions.
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -19,7 +19,7 @@ float-cmp = { version = "0.8", default-features = false, features = ["std"] }
memmap2 = { version = "0.1", optional = true }
num-ext = { git = "https://github.com/RazrFalcon/num-ext", rev = "109f01a" }
png = { version = "0.16.6", optional = true }
wide = { version = "0.6.1", features = ["std"] }
wide = { git = "https://github.com/Lokathor/wide", rev = "cee7b6f", features = ["std"] }

[features]
default = ["png-format"]
42 changes: 20 additions & 22 deletions README.md
@@ -15,7 +15,7 @@ with a focus on a rendering quality, speed and binary size.

The main motivation behind this library is to have a small, high-quality 2D rendering
library that can be used by [resvg]. And the choice is rather limited.
You basically have to choose between cairo, Qt and Skia. And all of them are
You basically have to choose between [cairo], Qt and Skia. And all of them are
relatively bloated, hard to compile and distribute. Not to mention that none of them
is written in Rust.

@@ -32,42 +32,38 @@ uses an obscure build system (`gn`) which still uses Python2
and doesn't really support 32bit targets.

`tiny-skia` tries to be small, simple and easy to build.
Currently, it has around 12 KLOC and compiles in less than 5s on a modern CPU.

## Performance

Is `tiny-skia` as fast as [Skia]? The short answer is no. The longer one is: it depends.
Currently, `tiny-skia` is 20-100% slower than Skia.
Which is still faster than [cairo] and [raqote].

The heart of Skia's CPU rendering is
[SkRasterPipeline](https://github.com/google/skia/blob/master/src/opts/SkRasterPipeline_opts.h).
And this is an extremely optimized piece of code.
But to be a bit pedantic, it's not really a C++ code. It relies on clang's
non-standard vector extensions, which means that you must build it with clang.
non-standard vector extensions, which means that it works only with clang.
You can actually build it with gcc/msvc, but it will simply ignore all the optimizations
and become 15-30 *times* slower! Which makes it kinda useless. And `tiny-skia`
is way closer to a clang version.

Also, `SkRasterPipeline` supports AVX2 instructions, which provide 256-bits wide types.
This makes common operations almost 2x faster, compared to a generic SSE2/128-bits one.
Which is no surprise.<br>
The problem is that Skia doesn't support dynamic CPU detection.
So by enabling AVX2 you're making the resulting binary non-portable,
since you need a Haswell processor or newer.<br>
Right now, `tiny-skia` directly supports only SSE2 instructions
and relies on autovectorization for newer one.
and become 15-30 *times* slower! Which makes it kinda useless.

Skia also supports ARM NEON instructions, which are unavailable in a stable Rust at the moment.
Therefore a default scalar implementation will be used instead on ARM and other non-x86 targets.
Therefore a fallback scalar implementation will be used instead on ARM and other non-x86 targets.
So if you're targeting ARM, you better stick with Skia.

Accounting for all of the above, `tiny-skia` is 20-100% slower than "a Skia built for a generic x86_64 CPU".
Which still makes it faster than `cairo` in many cases.
Also note that neither Skia nor `tiny-skia` supports dynamic CPU detection,
so by enabling newer instructions you're making the resulting binary non-portable.

We can technically use `SkRasterPipeline` directly to achieve the same performance as Skia.
But it means that we have to complicate our build process quite a lot.
Mainly because we have to use only clang.
So having a pure Rust library, even a bit slower one, is still a good trade off.
Essentially, you will get a decent performance on x86 targets by default.
But if you are looking for an even better performance, you should compile your application
with `RUSTFLAGS="-Ctarget-cpu=haswell"` env variables to enable AVX instructions.
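
As a rough illustration of what "no dynamic CPU detection" means in practice, the sketch below contrasts a compile-time check with a runtime one. It is not code from `tiny-skia`: the function names `compiled_simd_level` and `cpu_has_avx2` are hypothetical, and the only real APIs used are the standard `cfg!` macro and `std::arch::is_x86_feature_detected!`. With `-Ctarget-cpu=haswell` the compile-time branch should report `avx2` regardless of the machine the binary later runs on, which is why such a binary is non-portable.

```rust
/// Reports which SIMD level this binary was compiled for (hypothetical helper).
/// `cfg!` is resolved at compile time, so the answer depends on the
/// `-Ctarget-cpu` / `-Ctarget-feature` flags, not on the CPU running the binary.
fn compiled_simd_level() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "sse2") {
        "sse2"
    } else {
        "scalar"
    }
}

/// Runtime check of the actual CPU (hypothetical helper).
/// This is the kind of dynamic detection the README says is NOT performed.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    std::arch::is_x86_feature_detected!("avx2")
}

/// On non-x86 targets there is no AVX2 at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

fn main() {
    println!("compiled for: {}", compiled_simd_level());
    println!("CPU supports AVX2 at runtime: {}", cpu_has_avx2());
}
```

If the first line prints `avx2` on a machine where the second prints `false`, the binary may crash with an illegal instruction as soon as an AVX2 code path is reached.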

You can find more information in [benches/README.md](./benches/README.md).

## Rendering quality

Unless there is a bug, `tiny-skia` must produce exactly the same results as Skia.

## API overview

The API is a bit unconventional. It doesn't look like cairo, QPainter (Qt), HTML Canvas or even Skia.
@@ -163,7 +159,7 @@ Therefore we have to compromise or even rewrite some parts from scratch.

## Alternatives

Right now, the only pure Rust alternative is [raqote](https://github.com/jrmuizel/raqote).
Right now, the only pure Rust alternative is [raqote].

- It doesn't support high-quality antialiasing (hairline stroking in particular).
- It's very slow (see [benchmarks](./benches/README.md)).
@@ -179,4 +175,6 @@ The project relies on some unsafe code.
The same as used by [Skia]: [New BSD License](./LICENSE)

[Skia]: https://skia.org/
[cairo]: https://www.cairographics.org/
[raqote]: https://github.com/jrmuizel/raqote
[resvg]: https://github.com/RazrFalcon/resvg
5 changes: 2 additions & 3 deletions benches/Cargo.lock

Some generated files are not rendered by default.

22 changes: 11 additions & 11 deletions benches/README.md
@@ -44,17 +44,17 @@ Filling a shape with a solid color.
| overlay | 826,240 | 514,436 | 733,158 | 385,684 | 3,412,192 | 7,016,443 |
| darken | 631,307 | 434,548 | 610,493 | 317,215 | 3,579,931 | 5,230,384 |
| lighten | 633,910 | 434,123 | 621,340 | 319,867 | 3,584,409 | 5,255,202 |
| color_dodge | 1,794,356 | 1,386,241 | 780,556 | 679,996 | 5,151,488 | 9,634,447 |
| color_burn | 1,878,874 | 1,437,832 | 861,359 | 708,962 | 5,007,181 | 9,617,102 |
| color_dodge | 1,794,356 | 1,002,832 | 780,556 | 679,996 | 5,151,488 | 9,634,447 |
| color_burn | 1,878,874 | 1,055,831 | 861,359 | 708,962 | 5,007,181 | 9,617,102 |
| hard_light | 804,040 | 508,038 | 734,878 | 370,307 | 3,442,584 | 7,034,417 |
| soft_light | 2,730,721 | 2,219,540 | 1,225,756 | 986,403 | 5,900,415 | 11,941,630 |
| soft_light | 2,730,721 | 1,412,907 | 1,225,756 | 986,403 | 5,900,415 | 11,941,630 |
| difference | 650,915 | 440,554 | 632,797 | 326,285 | 3,936,284 | 5,718,776 |
| exclusion | 530,865 | 399,089 | 618,645 | 316,235 | 3,842,131 | 6,082,393 |
| multiply | 586,406 | 434,776 | 627,202 | 316,841 | 3,608,817 | 5,986,364 |
| hue | 3,662,119 | 3,099,846 | 1,705,745 | 1,413,791 | 7,517,902 | 13,716,827 |
| saturation | 3,675,423 | 3,026,723 | 1,676,187 | 1,411,117 | 7,443,261 | 13,752,382 |
| color | 3,125,045 | 2,549,478 | 1,431,804 | 1,115,800 | 6,070,058 | 10,537,391 |
| luminosity | 3,117,391 | 2,419,344 | 1,351,584 | 1,090,402 | 6,124,294 | 10,488,916 |
| hue | 3,662,119 | 2,008,062 | 1,705,745 | 1,413,791 | 7,517,902 | 13,716,827 |
| saturation | 3,675,423 | 2,006,886 | 1,676,187 | 1,411,117 | 7,443,261 | 13,752,382 |
| color | 3,125,045 | 1,655,401 | 1,431,804 | 1,115,800 | 6,070,058 | 10,537,391 |
| luminosity | 3,117,391 | 1,599,196 | 1,351,584 | 1,090,402 | 6,124,294 | 10,488,916 |

*Destination* is faster in `tiny-skia`, because we're exiting immediately,
while Skia uses null blitter, so edges processing is still in place.
@@ -144,17 +144,17 @@ Draws a large spiral using a subpixel stroke width.
| linear, three stops, evenly spread | 2,068,385 | 1,940,036 | 1,806,614 | 781,344 | 2,413,454 | 4,021,338 |
| linear, three stops, unevenly spread | 2,068,394 | 1,939,826 | 1,805,379 | 687,479 | 2,423,142 | 3,412,176 |
| simple radial | 2,293,883 | 2,205,554 | 2,050,437 | 805,376 | 4,704,141 | 5,531,178 |
| two point radial | 3,627,946 | 3,716,636 | 1,943,448 | 1,083,230 | 4,709,760 | 13,454,676 |
| two point radial | 3,627,946 | 2,516,614 | 1,943,448 | 1,083,230 | 4,709,760 | 13,454,676 |

### pattern

`pattern.rs`

| Test/Library | tiny-skia SSE2 | tiny-skia AVX2 | Skia SSE2 | Skia AVX2 | cairo | raqote |
| --------------------------- | -------------: | -------------: | ---------: | ---------: | ----------: | ---------: |
| plain (nearest, no ts) | 3,438,646 | 3,326,305 | 1,315,079 | 1,122,982 | 785,550 | 1,865,327 |
| lq (bilinear, with ts) | 9,982,212 | 8,247,785 | 4,484,023 | 2,646,523 | 17,612,685 | 24,906,379 |
| hq (bicubic/gauss, with ts) | 34,931,424 | 27,438,092 | 12,386,848 | 9,364,356 | 162,771,632 | - |
| plain (nearest, no ts) | 3,438,646 | 2,263,839 | 1,315,079 | 1,122,982 | 785,550 | 1,865,327 |
| lq (bilinear, with ts) | 9,982,212 | 4,865,226 | 4,484,023 | 2,646,523 | 17,612,685 | 24,906,379 |
| hq (bicubic/gauss, with ts) | 34,931,424 | 14,760,398 | 12,386,848 | 9,364,356 | 162,771,632 | - |

Note that `raqote` doesn't support high quality filtering.
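
To put the `pattern` numbers in perspective, here is a small worked calculation. It assumes that the first three data rows of the table are the pre-commit values and the last three are the post-commit values (the diff markers are not visible in this view, so this is an inference), and that lower numbers are better. Only the `tiny-skia AVX2` column differs between the two sets. The helper `speedup` is hypothetical.

```rust
/// Ratio of old to new timing; values above 1.0 mean the new timing is lower.
fn speedup(old: u64, new: u64) -> f64 {
    old as f64 / new as f64
}

fn main() {
    // (test name, tiny-skia AVX2 before, tiny-skia AVX2 after),
    // copied from the `pattern` table rows.
    let rows = [
        ("plain (nearest, no ts)", 3_326_305u64, 2_263_839u64),
        ("lq (bilinear, with ts)", 8_247_785, 4_865_226),
        ("hq (bicubic/gauss, with ts)", 27_438_092, 14_760_398),
    ];
    for (name, old, new) in rows.iter() {
        println!("{}: {:.2}x", name, speedup(*old, *new));
    }
}
```

Under those assumptions the three tests improve by roughly 1.47x, 1.70x and 1.86x, which is consistent with the "up to 2x faster in some cases" claim in the commit title.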
