MetalLikeCuda/mCemm


mCemm

mCemm is a GEMM (General Matrix Multiply) kernel generator for Apple Metal. It generates optimized Metal shaders with configurable tile sizes, warp sizes, data types (f16/f32), transpose modes (NN/NT/TN/TT), activations (ReLU/GELU/SiLU), bias, alpha/beta scaling, and other options.

Requirements:

  • macOS on Apple silicon (M-series chip), or any other platform that supports Metal
  • Metal API
  • Xcode
  • json-c 0.18
  • curl

Installation

To install mCemm, run the following command:

curl -L -o mCemm.tar.gz https://github.com/MetalLikeCuda/osxiec/releases/download/%s/mCemm.tar.gz && tar -xvzf mCemm.tar.gz && cd mCemm && sudo sh install.sh

Replace %s with the latest release version.

Usage:

mCemm <path_to_config_file>

Commands you can use

To update

mCemm -update

To get help

mCemm -help

To get version information

mCemm --version

Building

You can build mCemm with CMake and Ninja using the following command:

mkdir build && cd build && cmake -S .. -B . -G "Ninja" && ninja

Example config:

{
    "output": "gemm_generated.metal",
    "defaults": {
        "tileM": 64,
        "tileN": 64,
        "tileK": 16,
        "warpM": 32,
        "warpN": 32,
        "alpha": true,
        "beta": true,
        "bias": true,
        "activation": "gelu"
    },
    "matrix": {
        "dtype": [
            "f16",
            "f32"
        ],
        "transpose": [
            "nn",
            "nt",
            "tn",
            "tt"
        ]
    },
    "kernels": [
        {
            "dtype": "f16",
            "transpose": "nn",
            "activation": "none",
            "bias": false
        },
        {
            "dtype": "f16",
            "transpose": "nt",
            "activation": "relu"
        },
        {
            "dtype": "f16",
            "transpose": "tn",
            "activation": "gelu"
        },
        {
            "dtype": "f16",
            "transpose": "tt",
            "activation": "silu"
        },
        {
            "dtype": "f32",
            "transpose": "nn",
            "tileM": 64,
            "tileN": 32,
            "tileK": 16,
            "warpM": 32,
            "warpN": 16,
            "activation": "none",
            "bias": false
        },
        {
            "dtype": "f32",
            "transpose": "nt",
            "tileM": 64,
            "tileN": 32,
            "tileK": 16,
            "warpM": 32,
            "warpN": 16,
            "activation": "relu"
        },
        {
            "dtype": "f32",
            "transpose": "tn",
            "tileM": 64,
            "tileN": 32,
            "tileK": 16,
            "warpM": 32,
            "warpN": 16,
            "activation": "gelu"
        },
        {
            "dtype": "f32",
            "transpose": "tt",
            "tileM": 64,
            "tileN": 32,
            "tileK": 16,
            "warpM": 32,
            "warpN": 16,
            "activation": "silu"
        }
    ]
}

Notes:

mCemm only generates Metal source code; it does not compile or benchmark it.

For benchmarking, you can use gpumkat.
