Commit
Thanks to Mike Frysinger I could torture some real IA64 HW (and myself too...).

Throw away the first post of this patch; it's broken in several ways. (Note to self: don't try to be a cool kid and save on parentheses.) And even if fixed, it's only half as fast as the generic code. Counting instructions is a bad move on IA64.

So here it is, new and improved, with 400% more unrolling, on an IA64 (McKinley):

-------- orig ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 1912 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 1916 ms
a: 0x733CB174, 10000 * 159998 bytes t: 1912 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 1916 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 1916 ms
a: 0x1902A382, 10000 * 159984 bytes t: 1912 ms
-------- vec ------
a: 0x0CB4B676, 10000 * 160000 bytes t: 760 ms
a: 0x25BEB273, 10000 * 159999 bytes t: 764 ms
a: 0x733CB174, 10000 * 159998 bytes t: 760 ms
a: 0x1144AF76, 10000 * 159996 bytes t: 760 ms
a: 0x3F4ECB8A, 10000 * 159992 bytes t: 808 ms
a: 0x1902A382, 10000 * 159984 bytes t: 760 ms
speedup: 2.515789

Next stop is Blackfin, then the ARM iWMMXt version for XScale (N.B.: does someone have a link handy to the instruction reference?). When some time has passed there will be a complete repost, since there are little changes here and there.
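For readers who want to check the speedup figure, here is a minimal sketch that recomputes it from the timings in the listing above. The reported 2.515789 matches the ratio of the first pair of timings (1912 ms / 760 ms). Whether the benchmark tool derives it exactly that way is an assumption, and the names `orig_ms`, `vec_ms` and `speedup` are made up for illustration.

```python
# Timings in milliseconds, copied from the benchmark listing above.
orig_ms = [1912, 1916, 1912, 1916, 1916, 1912]
vec_ms = [760, 764, 760, 760, 808, 760]


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


# Ratio of the first pair only; this reproduces the reported figure.
first_run = speedup(orig_ms[0], vec_ms[0])

# Ratio over all six runs, which also includes the 808 ms outlier.
all_runs = speedup(sum(orig_ms), sum(vec_ms))

print(f"first run: {first_run:.6f}")
print(f"all runs:  {all_runs:.6f}")
```

The ratio over all six runs comes out slightly lower than the first-run ratio because of the single 808 ms sample in the vectorized set.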