LibShalom

Contact: Weiling Yang (2954427597@qq.com)

LibShalom is a library for small and irregular-shaped matrix multiplications (GEMM) on ARMv8-based processors. It improves GEMM performance for these shapes by addressing shortcomings of existing BLAS libraries: packing that accounts for a large portion of the runtime, inefficient edge-case processing, and parallelization methods that are poorly suited to such matrices.

This work is still being optimized, and further improvements will take some time. Packing within the micro-kernel is key to improving performance, and the same technique can also be applied to large-scale GEMM.

Reference

Weiling Yang, Jianbin Fang, Dezun Dong, Xing Su, Zheng Wang. LIBSHALOM: optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores (SC 2021). DOI: https://dl.acm.org/doi/10.1145/3458817.3476217

Software dependencies

GNU Compiler (GCC) (>=v8.2)
OpenMP

Hardware platform

Phytium 2000+, Kunpeng 920, ThunderX2, or other ARMv8-based processors

Compile and install

$ cd NN_LIB && make  
$ make install PREFIX=<installation path>

These commands copy the LibShalom library and headers to the installation path PREFIX.

Compiling with LibShalom

All LibShalom definitions and prototypes may be included in your C source file by including a single header file, LibShalom.h:

#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"

API

LibShalom_sgemm(int transa, int transb, float *C, float *A, float *B, long M, long N, long K) // Interface of small SGEMM
LibShalom_sgemm_mp(int transa, int transb, float *C, float *A, float *B, long M, long N, long K) // Interface of irregular-shaped SGEMM
LibShalom_dgemm(int transa, int transb, double *C, double *A, double *B, long M, long N, long K) // Interface of small DGEMM
LibShalom_set_thread_nums(int num) // Set the total number of threads

Running Benchmark

The command

$ cd benchmark/small_SGEMM && make

compiles the fp32 small-GEMM benchmark and generates the executable file main. Running main reports the evaluation results for matrix sizes from 8x8x8 to 128x128x128.
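GEMM benchmarks conventionally report throughput in GFLOPS, counting 2*M*N*K floating-point operations per multiplication. The sketch below shows only that arithmetic. The helper gemm_gflops is hypothetical and not part of LibShalom, and the benchmark program may report its results in a different form.

```c
/* Hypothetical helper (not part of LibShalom): throughput in GFLOPS for
 * `iterations` GEMM calls of size M x N x K that took `seconds` in total.
 * One GEMM performs M*N*K multiplications and M*N*K additions. */
double gemm_gflops(long M, long N, long K, long iterations, double seconds)
{
	if (seconds <= 0.0)
		return 0.0;  /* avoid dividing by zero for an unmeasured run */
	double flops = 2.0 * (double)M * (double)N * (double)K * (double)iterations;
	return flops / seconds / 1e9;
}
```

As a purely hypothetical illustration, 100 iterations of an 80x80x80 SGEMM that finish in 0.001 s in total would correspond to 2 * 80^3 * 100 / 0.001 / 1e9 = 102.4 GFLOPS.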

Getting Started

The following C code focuses on a single functionality and can be considered a "Hello LibShalom" program.

#include <stdio.h>
#include <stdlib.h>
#include "LibShalom.h"

int main()
{

	int i, j;
	int loop = 100;
	long M, N, K;
	M = N = K = 80;
	/* row-major */
	float *A = ( float * ) malloc( K* M * sizeof( float ) );
	float *B = ( float * ) malloc( K* N * sizeof( float ) );
	float *C = ( float * ) malloc( M* N * sizeof( float ) );

	double drand48();
	/* initialize input matrices A and B*/
	for ( i = 0; i < M; i++ )
	{
		for ( j = 0; j < K; j++ )
			A [i* K + j]= 2.0 * (float)drand48( ) - 1.0 ;
	}

	for ( i = 0; i < K; i++ )
	{
		for ( j = 0; j < N; j++ )
			B [i * N + j]= 2.0 * (float)drand48( ) - 1.0 ;  /* stride matches the inner loop bound N, so all K*N elements are filled */
	}

	// warm up
	//perform C = A * B (B is transposed)
	for( i =0 ;i< 5; i++)
		LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);

	for( i= 0; i< loop ;i++)
		LibShalom_sgemm(NoTrans, Trans, C, A, B, M, N, K);


	free(A);
	free(B);
	free(C);
	return 0;
}

The makefile corresponding to this program:

LibShalom_PREFIX = <path where LibShalom is installed>
LibShalom_INC    = $(LibShalom_PREFIX)/SMM/include
LibShalom_LIB    = $(LibShalom_PREFIX)/SMM/lib/libsmm.a 

OTHER_LIBS  =-fopenmp

CC          = g++
CFLAGS      = -O3 -I$(LibShalom_INC)
LINKER      = $(CC)

OBJS        = Hello.o

%.o: %.c
	 $(CC) $(CFLAGS) -c -fopenmp $< -o $@

all: $(OBJS)
	$(LINKER) $(OBJS) $(LibShalom_LIB) $(OTHER_LIBS) -o a.out

Note

This library stores matrices in row-major format. We will keep the library updated and maintained.
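To make the row-major convention concrete, here is a minimal reference multiplication that could be used to check results on small inputs. It is a plain triple loop, not LibShalom's implementation, and the name ref_sgemm_rowmajor is hypothetical. It covers only the untransposed case; how LibShalom interprets the Trans flags should be checked against LibShalom.h.

```c
/* Hypothetical reference routine (not part of LibShalom): C = A * B with all
 * matrices in row-major order and no transposition.
 * A is M x K, B is K x N, C is M x N. */
void ref_sgemm_rowmajor(float *C, const float *A, const float *B,
                        long M, long N, long K)
{
	for (long i = 0; i < M; i++) {
		for (long j = 0; j < N; j++) {
			float acc = 0.0f;
			/* element (i, p) of A is A[i * K + p]; element (p, j) of B is B[p * N + j] */
			for (long p = 0; p < K; p++)
				acc += A[i * K + p] * B[p * N + j];
			C[i * N + j] = acc;
		}
	}
}
```

In row-major storage, element (i, j) of a matrix with n columns lives at offset i * n + j, which is why the row stride differs for A (K), B (N), and C (N).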

