Skip to content

bryance-oyang/sse_utils

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Utilities to take advantage of the various SSE and AVX instruction
sets on modern processors. Can potentially make programs run 2x-8x
faster

Usage notes:

You will find documentation for functions in the comment block above
each function definition in sse_utils.c

Functions are suffixed with "s" or "d" depending on whether they
operate on single-precision (32-bit) floats or double-precision
(64-bit) doubles

Highly suggest you use sse_utils_malloc instead of malloc for data to
be passed to sse_utils functions, but this is not required

Suggested compiler and OS: gcc and linux, or else you figure out how
to get yours working. gcc version must be new enough to support Intel
Intrinsics for SSE and AVX

CPU must support the SSE and AVX instruction sets, Sandy Bridge or
newer should do

Speedup only comes from vectorizing, so make data structures
appropriately, e.g. if you have a list of (x,y) coordinates, put them
all in a single array in memory like: xyxyxyxyxyxy and make a single
call to the appropriate sse_utils function. Making many calls to
these functions can cause the function call overhead to slow down
your program. In other words, you must know what you're doing!

You may not notice a speedup if your arrays are too large, because
the majority of time may be spent on cache misses and other memory
operations rather than actual computation. "Too large" depends on the
size of your CPU cache.

NOTE: To those wishing to modify source code for this file, be aware
that gcc may have problems with function pointers to intrinsics

About

Utilities to take advantage of the various SSE and AVX instruction sets on modern processors. Can potentially make programs run 2x-8x faster

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages