This repo contains several CPU implementations of GEMM (general matrix-matrix multiplication), each applying different optimizations. It accompanies the article on my blog, Optimizing CPU GEMM step by step.
To run all of the implementations on your machine, run the script benchmark_all.sh. It generates a file, gemm_benchmark_results.csv, which can later be turned into a plot with plot.py. OpenBLAS is a required dependency, since it is used as the reference implementation.
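
For readers new to the term, here is a minimal sketch of what a GEMM routine computes, using the standard BLAS definition C = alpha * A * B + beta * C. It is not code from this repo: the name gemm_naive, the row-major layout, and the single-precision type are assumptions made for illustration, and the implementations here may use a different signature or fix alpha = 1 and beta = 0.

```c
#include <stddef.h>

/* Naive reference GEMM: C = alpha * A * B + beta * C.
 * A is M x K, B is K x N, C is M x N, all row-major and contiguous.
 * gemm_naive is a hypothetical name, not a function from this repo. */
void gemm_naive(size_t M, size_t N, size_t K,
                float alpha, const float *A, const float *B,
                float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

A triple loop like this is the usual starting point that optimized versions are measured against; it performs 2 * M * N * K floating-point operations for the matrix product itself.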