gemm_optimize

Optimizing matrix multiplication (GEMM) on int64 matrices.
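For reference, an unoptimized int64 GEMM in C might look like the sketch below. This is an assumption about what a baseline does, not the repository's mat_mul.c; the function name matmul_naive and the row-major n x n layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: C = A * B for n x n row-major int64 matrices.
 * Textbook i-j-k loop order, no blocking, no explicit vectorization.
 * Signed overflow is undefined behavior in C, so keep input values small. */
void matmul_naive(size_t n, const int64_t *A, const int64_t *B, int64_t *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int64_t sum = 0;
            for (size_t k = 0; k < n; k++) {
                /* B is walked down a column (stride n): poor cache locality */
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

The strided access to B in the innermost loop is the main thing the optimized variants below try to avoid.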

Build with:

gcc -fopenmp -mavx512dq -mavx512f -O3 my_mat_mul.c -o my_mat_mul
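The -mavx512dq flag matters because the 64-bit integer multiply _mm512_mullo_epi64 (VPMULLQ) belongs to AVX-512DQ, while the broadcast, loads, stores and add are plain AVX-512F. Below is a minimal sketch of a vectorized int64 multiply-add, assuming row-major n x n matrices. The names matmul_avx512, matmul_scalar and matmul_vec are hypothetical, and the run-time dispatch through GCC's __builtin_cpu_supports is added only so the sketch also runs on CPUs without AVX-512; it is not taken from my_mat_mul.c.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AVX-512 kernel: C = A * B, n x n row-major int64 matrices.
 * i-k-j order: broadcast A[i][k] and multiply-add it into row i of C,
 * eight int64 lanes (one 512-bit register) at a time. The target attribute
 * lets GCC/Clang compile the intrinsics even without -mavx512f/-mavx512dq. */
__attribute__((target("avx512f,avx512dq")))
static void matmul_avx512(size_t n, const int64_t *A, const int64_t *B,
                          int64_t *C)
{
    for (size_t i = 0; i < n; i++) {
        int64_t *c = C + i * n;
        for (size_t j = 0; j < n; j++)
            c[j] = 0;
        for (size_t k = 0; k < n; k++) {
            const int64_t a = A[i * n + k];
            const int64_t *b = B + k * n;
            const __m512i va = _mm512_set1_epi64(a);        /* AVX-512F */
            size_t j = 0;
            for (; j + 8 <= n; j += 8) {
                __m512i vb = _mm512_loadu_si512((const void *)(b + j));
                __m512i vc = _mm512_loadu_si512((const void *)(c + j));
                /* _mm512_mullo_epi64 (VPMULLQ) requires AVX-512DQ */
                vc = _mm512_add_epi64(vc, _mm512_mullo_epi64(va, vb));
                _mm512_storeu_si512((void *)(c + j), vc);
            }
            for (; j < n; j++)                              /* scalar tail */
                c[j] += a * b[j];
        }
    }
}

/* Scalar fallback using the same i-k-j order. */
static void matmul_scalar(size_t n, const int64_t *A, const int64_t *B,
                          int64_t *C)
{
    for (size_t i = 0; i < n; i++) {
        int64_t *c = C + i * n;
        for (size_t j = 0; j < n; j++)
            c[j] = 0;
        for (size_t k = 0; k < n; k++) {
            const int64_t a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[j] += a * B[k * n + j];
        }
    }
}

/* Pick the AVX-512 path only when the CPU reports both F and DQ support. */
void matmul_vec(size_t n, const int64_t *A, const int64_t *B, int64_t *C)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        matmul_avx512(n, A, B, C);
    else
        matmul_scalar(n, A, B, C);
}
```

The i-k-j order makes every access to B and C contiguous, which is what allows the eight-lane loads and stores.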

Profile with:

# overview (each -d adds more detailed counters, up to three)
perf stat -d -d -d ./my_mat_mul <N>
# detail
perf record ./my_mat_mul <N>
perf report

The test logs were collected on an xcne node of the NUS SoC compute cluster (Intel Xeon Gold 6230), using the command:

perf stat -d -d -d -r 5 ./mat_mul <N>

where N ranges from 1000 to 10000 in steps of 1000, and -r 5 repeats each measurement five times.

There are four log files:

mat_mul1.csv: no optimization (baseline).
mat_mul2.csv: adds mul2, which uses a sequential vector multiply-add.
avx512.csv: optimized with AVX-512 loop unrolling only.
my_mat_mul.csv: optimized with AVX-512 plus cache blocking, reordering the loops over 8*8 kernel blocks.
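Cache blocking can be sketched as below, assuming row-major n x n matrices and an 8 x 8 tile (BS = 8, a hypothetical constant chosen to mirror the 8*8 kernel block). Eight int64 values fill exactly one 512-bit register, so each innermost j run corresponds to one AVX-512 vector; this portable version leaves the vectorization to the compiler instead of using intrinsics, and it is not the code in my_mat_mul.c. The OpenMP pragma is an assumption suggested by the -fopenmp flag and is compiled out when OpenMP is off.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 8  /* hypothetical tile edge: 8 int64 = one 512-bit vector */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical blocked GEMM: C = A * B, n x n row-major int64 matrices.
 * The outer ii/kk/jj loops walk 8x8 tiles so the working set of A, B and C
 * stays in cache; the inner i/k/j loops use i-k-j order for unit stride. */
void matmul_blocked(size_t n, const int64_t *A, const int64_t *B, int64_t *C)
{
    memset(C, 0, n * n * sizeof *C);
#ifdef _OPENMP
#pragma omp parallel for  /* each thread owns a disjoint band of rows of C */
#endif
    for (size_t ii = 0; ii < n; ii += BS) {
        const size_t i_end = min_sz(ii + BS, n);
        for (size_t kk = 0; kk < n; kk += BS) {
            const size_t k_end = min_sz(kk + BS, n);
            for (size_t jj = 0; jj < n; jj += BS) {
                const size_t j_end = min_sz(jj + BS, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        const int64_t a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
        }
    }
}
```

Tiles at the right and bottom edges are clipped with min_sz, so n need not be a multiple of 8. Parallelizing over ii is race-free because different ii values update different rows of C.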
