
Dynamic N:M Fine-grained Structured Sparse Attention Mechanism

This repo contains the artifact for our PPoPP paper Dynamic N:M Fine-grained Structured Sparse Attention Mechanism.
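For intuition: DFSS dynamically prunes the attention score matrix to an N:M fine-grained structured pattern (e.g., 2:4, keeping the 2 largest of every 4 consecutive entries along the key dimension) so the pruned matrix maps onto Ampere sparse tensor cores. Below is a minimal reference sketch of the 2:4 selection, not the fused CUDA kernel pydfss actually uses; shapes and names are illustrative.

import torch

def prune_2to4(scores: torch.Tensor) -> torch.Tensor:
    # Keep the 2 largest entries in every contiguous group of 4 along the
    # last (key) dimension and zero the rest -- the 2:4 structured pattern.
    *lead, n = scores.shape
    assert n % 4 == 0, "key dimension must be a multiple of 4"
    groups = scores.reshape(*lead, n // 4, 4)
    idx = groups.topk(2, dim=-1).indices  # positions of the top-2 per group
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, idx, True)
    return (groups * mask).reshape(*lead, n)

scores = torch.randn(2, 8, 8)  # (heads, queries, keys), illustrative shape
sparse_scores = prune_2to4(scores)

Keeping the largest values (rather than random ones) matters because softmax is dominated by the largest scores, which is why this dynamic pruning preserves accuracy.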

Requirements

The accuracy evaluation script requires two A100 GPUs; the speedup evaluation requires one A100 GPU. Pre-Ampere GPUs are not supported, as DFSS relies on the Ampere sparse tensor cores. All other requirements are covered by the Docker image.
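A quick way to confirm your GPU meets this requirement (sparse tensor cores need compute capability 8.0 or newer; this check is our suggestion, not part of the artifact):

import torch

# Ampere sparse tensor cores require compute capability >= 8.0 (A100 is 8.0).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
assert (major, minor) >= (8, 0), "DFSS needs an Ampere-or-newer GPU"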

Get Source Code

Clone the repository with

git clone https://github.com/apuaaChen/DFSS.git

Then fetch the submodules with

git submodule update --init --recursive

Using Docker

We use the NGC PyTorch container 21.06. To build the container image, run

cd docker && bash build.sh

To launch the container, run

cd .. && bash docker/launch.sh

The repository is mounted at /workspace/dfss inside the container.

Installation

Our package pydfss can be installed with

cd /workspace/dfss && bash install.sh
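A minimal smoke test to verify the installation (the module name pydfss comes from the package name above; anything beyond importing it is an assumption):

import pydfss  # should succeed inside the container after install.sh
print("pydfss imported successfully")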

Speedup under different sequence length

We provide a script that reproduces the attention speedup under different sequence lengths with the bfloat16 data type.

python benchmark.py

As mentioned in the paper, this script compares only the QK^T, softmax, and AV stages, since the optimizations in the other parts of the model are orthogonal to DFSS. The expected result is around

attention speedup: 1.38 ~ 1.86
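For context, here is a minimal sketch of the dense baseline these numbers are measured against (QK^T, softmax, and AV in bfloat16). The shapes are illustrative and benchmark.py may time things differently.

import torch

batch, heads, seq_len, head_dim = 4, 16, 1024, 64  # illustrative shapes
q = torch.randn(batch, heads, seq_len, head_dim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.matmul(q, k.transpose(-2, -1))  # warm-up; real benchmarks should average many runs
torch.cuda.synchronize()

start.record()
scores = torch.matmul(q, k.transpose(-2, -1)) / head_dim ** 0.5  # QK^T
probs = torch.softmax(scores, dim=-1)                            # softmax
out = torch.matmul(probs, v)                                     # AV
end.record()
torch.cuda.synchronize()
print(f"dense attention: {start.elapsed_time(end):.3f} ms")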

Accuracy

We provide training and inference scripts for BERT-large on SQuAD v1.1 with DFSS 2:4 under the bfloat16 data type (Table 2 in the paper). The script requires two A100 GPUs and takes about 1.5 hours to finish.

mkdir ckpt && python bert_squad_finetuning.py

The expected result is

F1 score on BERT-large SQuAD v1.1
Transformer: 93.10, DFSS 2:4: 93.19
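For reference, the F1 reported above is SQuAD's token-overlap F1 between the predicted and ground-truth answer spans. A simplified sketch follows; the official metric additionally lowercases and strips punctuation and articles before comparing.

from collections import Counter

def squad_f1(prediction: str, ground_truth: str) -> float:
    # Token-overlap F1 as used by SQuAD v1.1 (simplified: no text normalization).
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("the cat sat", "the cat"))  # 0.8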
