Chenhan D. Yu edited this page Jan 11, 2017 · 23 revisions

HMLP (High Performance Machine Learning Primitives) is not only a library of optimized primitives, but also a framework for quickly instantiating new ones. Depending on your needs, you may only use the existing primitives (listed in the table below), or you may create your own with our template framework. For advanced users willing to develop architecture-dependent instances, we also provide kernel templates that minimize the work without compromising performance.

I just want to use existing primitives

The following primitives are provided; availability depends on the target architecture.

| x = s, d  | SandyBridge | Haswell | KNL | ARM | GPU  |
|-----------|-------------|---------|-----|-----|------|
| xGEMM     | asm         | asm     | int | int | cuda |
| xGSKS     | asm         | asm     | int | int | -    |
| xGSKNN    | asm         | asm     | x   | int | -    |
| xSTRASSEN | asm         | -       | x   | -   | -    |
| xCONV2D   | -           | -       | -   | -   | -    |

Check out the corresponding wiki page for each primitive's specification.

Creating your own primitives

It is possible to create new GEMM-like primitives using the GKMX framework we provide.

| Primitive              | OPKERNEL | OP1 | OP2           | OPREDUCE          |
|------------------------|----------|-----|---------------|-------------------|
| GEMM                   | identity | add | mul           | -                 |
| CONV-RELU              | max(x,0) | add | mul           | -                 |
| 1-norm (Manhattan)     | identity | add | abs(a-b)      | -                 |
| 2-norm                 | identity | max | (a-b)^2       | -                 |
| p-norm                 | identity | max | pow(a-b)      | -                 |
| Inf-norm               | identity | max | abs(a-b)      | -                 |
| Gaussian               | gaussian | add | mul           | -                 |
| Linkage disequilibrium | div      | add | bitcount(a&b) | -                 |
| Kmeans iteration       | 2-norm   | add | mul           | row-wise argmin   |
| CONV-RELU-POOL         | max(x,0) | add | mul           | block-wise argmax |

Be a performance Ninja!

To develop architecture-dependent kernels yourself, check out the Microkernels wiki page for more information.