# Benchmark
This notebook reports latency and memory benchmark results for all covered ops.

## Setup

In [1]:
from IPython.display import clear_output

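def build_package(name, url, commit=None, deps=True):
    """Clone `url` into `name` and install it in editable mode if it is not importable yet."""
    import importlib.util  # the `util` submodule must be imported explicitly
    import os

    if importlib.util.find_spec(name) is None:
        # Tolerate an existing checkout left over from a previous run.
        os.system(f"git clone {url} {name} || true")
        if commit is not None:
            os.system(f"cd {name} && git checkout {commit}")
        os.system(f"cd {name} && git submodule update --init --recursive")
        no_deps = ""
        if deps:
            # Install the repo's requirements if it ships a requirements.txt.
            os.system(f"cd {name} && pip3 install -r requirements.txt || true")
        else:
            no_deps = "--no-deps"
        clear_output()
        # Editable install, so the cloned source tree is what gets imported.
        os.system(f'cd {name} && pip3 install -e ".[dev]" {no_deps}')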

build_package("transformers", "https://github.com/huggingface/transformers.git", deps=False)
build_package("xformers", "https://github.com/facebookresearch/xformers.git")
build_package("epoi", "https://github.com/comaniac/epoi.git")

## Benchmark

In [3]:
from IPython.display import display, Javascript

disable_js = """
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}
"""

def load_ipython_extension():
    display(Javascript(disable_js))
    print("autoscrolling long output is disabled")

load_ipython_extension()

<IPython.core.display.Javascript object>

autoscrolling long output is disabled


In [4]:
!python3 -m epoi.benchmark

===== Environment =====

GPU: Tesla V100-SXM2-16GB

PyTorch Configuration
   Config         Value
-------------  ------------
   Version     1.12.1+cu116
Built w. CUDA      11.6


Other Libraries Configuration
  Package       Version                   Commit SHA
------------  -----------  ----------------------------------------
    epoi        0.1.dev    7581ef3e0ea1146c58b2633ef30444552a1120e3
transformers  4.24.0.dev0  12ce2941c7b67c0dedac0f0468b3ed854fa940ab
  xformers    0.0.14.dev   ba93c5012d00bd1b010514a7bc9bd938c1ad6149
   triton        2.0.0                       N/A
    apex          0.1                        N/A
===== Environment =====

[2022-10-25 01:32:41] INFO main: Selected bias_gelu
[2022-10-25 01:32:41] INFO main: Selected dropout_add_ln
[2022-10-25 01:32:41] INFO main: Selected bert_attention
[2022-10-25 01:32:41] INFO main: Selected gpt_attention
[2022-10-25 01:32:41] INFO main: Selected qkv_self_attn
[2022-10-25 01:32:41] INFO main: Selected layer_norm
[2022-10-25

[2022-10-25 01:49:23] INFO main: [5/7] Benchmarking qkv_self_attn
[----------------- QKV in Self-Attention ------------------]
                           |  NoFuse (FP16)  |  Fused (FP16)
1 threads: -------------------------------------------------
      (4, 512, 1024, 16)   |       1128.0    |       789.4  
      (8, 512, 1024, 16)   |       1291.0    |      1408.7  
      (16, 512, 1024, 16)  |       2305.1    |      2634.7  
      (16, 512, 8192, 64)  |     110519.4    |    117100.7  
      (4, 2048, 8192, 64)  |     110485.5    |    117164.0  

Times are in microseconds (us).

       Shape          NoFuse (FP16)    Fused (FP16)
-------------------  ---------------  --------------
(4, 512, 1024, 16)       42.0063         82.0063
(8, 512, 1024, 16)       78.0063         144.006
(16, 512, 1024, 16)      150.006         200.006
(16, 512, 8192, 64)      1152.05         1376.05
(4, 2048, 8192, 64)      1152.05         1376.05

Memory is in MBs and excludes inputs/outputs.
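
The two tables above are easier to compare as ratios. The following is a minimal sketch, not part of `epoi.benchmark`, that computes the fused-over-unfused speedup and the extra memory for each shape. The numbers are copied by hand from the tables, and the names `latency_us`, `memory_mb`, and `summarize` are hypothetical helpers introduced only for illustration.

```python
# Values copied from the "QKV in Self-Attention" tables above.
# Each entry maps a shape to (NoFuse FP16, Fused FP16).
latency_us = {
    (4, 512, 1024, 16): (1128.0, 789.4),
    (8, 512, 1024, 16): (1291.0, 1408.7),
    (16, 512, 1024, 16): (2305.1, 2634.7),
    (16, 512, 8192, 64): (110519.4, 117100.7),
    (4, 2048, 8192, 64): (110485.5, 117164.0),
}
memory_mb = {
    (4, 512, 1024, 16): (42.0063, 82.0063),
    (8, 512, 1024, 16): (78.0063, 144.006),
    (16, 512, 1024, 16): (150.006, 200.006),
    (16, 512, 8192, 64): (1152.05, 1376.05),
    (4, 2048, 8192, 64): (1152.05, 1376.05),
}


def summarize(latency, memory):
    """Return one dict per shape with the speedup and extra memory of the fused op."""
    rows = []
    for shape, (nofuse_us, fused_us) in latency.items():
        nofuse_mb, fused_mb = memory[shape]
        rows.append({
            "shape": shape,
            # A speedup above 1.0 means the fused version is faster.
            "speedup": nofuse_us / fused_us,
            # A positive value means the fused version uses more memory.
            "extra_mb": fused_mb - nofuse_mb,
        })
    return rows


rows = summarize(latency_us, memory_mb)
for row in rows:
    print(f"{str(row['shape']):<22} speedup={row['speedup']:.2f}x  "
          f"extra_mem={row['extra_mb']:+.1f} MB")
```

In this run, the fused version is faster only for the smallest shape and uses more memory for every shape listed.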

[2022-10-25 01