mCemm is a GEMM (General Matrix Multiply) kernel generator for Apple Metal. It generates optimized Metal shaders with configurable tile sizes, warp sizes, data types (f16/f32), transpose modes (NN/NT/TN/TT), activations (ReLU/GELU/SiLU), optional bias, alpha/beta scaling, and more.
Requirements:
- macOS on Apple silicon (M-series chip), or any other platform that supports Metal
- Metal API
- Xcode
- json-c 0.18
- curl
- CMake and Ninja (only for building from source)
To install mCemm, run the following command, replacing `%s` with the latest release version:

```sh
curl -L -o mCemm.tar.gz https://github.com/MetalLikeCuda/osxiec/releases/download/%s/mCemm.tar.gz && tar -xvzf mCemm.tar.gz && cd mCemm && sudo sh install.sh
```
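To generate a Metal source file from a configuration file:

```sh
mCemm <path_to_config_file>
```

To update:

```sh
mCemm -update
```

To get help:

```sh
mCemm -help
```

To get version information:

```sh
mCemm --version
```

You can build mCemm from source with CMake and Ninja using the following command: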
```sh
mkdir build && cd build && cmake -S .. -B . -G "Ninja" && ninja
```

Example configuration file:

```json
{
  "output": "gemm_generated.metal",
  "defaults": {
    "tileM": 64,
    "tileN": 64,
    "tileK": 16,
    "warpM": 32,
    "warpN": 32,
    "alpha": true,
    "beta": true,
    "bias": true,
    "activation": "gelu"
  },
  "matrix": {
    "dtype": ["f16", "f32"],
    "transpose": ["nn", "nt", "tn", "tt"]
  },
  "kernels": [
    {
      "dtype": "f16",
      "transpose": "nn",
      "activation": "none",
      "bias": false
    },
    {
      "dtype": "f16",
      "transpose": "nt",
      "activation": "relu"
    },
    {
      "dtype": "f16",
      "transpose": "tn",
      "activation": "gelu"
    },
    {
      "dtype": "f16",
      "transpose": "tt",
      "activation": "silu"
    },
    {
      "dtype": "f32",
      "transpose": "nn",
      "tileM": 64,
      "tileN": 32,
      "tileK": 16,
      "warpM": 32,
      "warpN": 16,
      "activation": "none",
      "bias": false
    },
    {
      "dtype": "f32",
      "transpose": "nt",
      "tileM": 64,
      "tileN": 32,
      "tileK": 16,
      "warpM": 32,
      "warpN": 16,
      "activation": "relu"
    },
    {
      "dtype": "f32",
      "transpose": "tn",
      "tileM": 64,
      "tileN": 32,
      "tileK": 16,
      "warpM": 32,
      "warpN": 16,
      "activation": "gelu"
    },
    {
      "dtype": "f32",
      "transpose": "tt",
      "tileM": 64,
      "tileN": 32,
      "tileK": 16,
      "warpM": 32,
      "warpN": 16,
      "activation": "silu"
    }
  ]
}
```

mCemm only generates Metal source code; it does not compile or benchmark it. For benchmarking, you can use gpumkat.