bryance-oyang/sse_utils
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
Utilities to take advantage of the various SSE and AVX instruction sets on modern processors. Can potentially make programs run 2x-8x faster Usage notes: You will find documentation for functions in the comment block above each function definition in sse_utils.c Functions are suffixed with "s" or "d" depending on whether they operate on single-precision (32-bit) floats or double-precision (64-bit) doubles Highly suggest you use sse_utils_malloc instead of malloc for data to be passed to sse_utils functions, but this is not required Suggested compiler and OS: gcc and linux, or else you figure out how to get yours working. gcc version must be new enough to support Intel Intrinsics for SSE and AVX CPU must support the SSE and AVX instruction sets, Sandy Bridge or newer should do Speedup only comes from vectorizing, so make data structures appropriately, e.g. if you have a list of (x,y) coordinates, put them all in a single array in memory like: xyxyxyxyxyxy and make a single call to the appropriate sse_utils function. Making many calls to these functions can cause the function call overhead to slow down your program. In other words, you must know what you're doing! You may not notice a speedup if your arrays are too large, because the majority of time may be spent on cache misses and other memory operations rather than actual computation. "Too large" depends on the size of your CPU cache. NOTE: To those wishing to modify source code for this file, be aware that gcc may have problems with function pointers to intrinsics