dhy2000/CUDA-Winograd

Fast CUDA Kernels for ResNet Inference.

Introduction

This code implements fast CUDA kernels for DNN inference, especially for the convolution layers and residual blocks in ResNet. Specifically, the kernels combine three parts into one piece (a minimal sketch of the fusion follows the list):

  • Convolution
  • Batch Normalization (BN + Scale)
  • Activation (ReLU)
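As a rough illustration of the fusion (a minimal sketch, not the repo's actual kernel: the toy 1x1 convolution, names, and NCHW layout are assumptions), BN can be folded offline into a per-channel scale and bias, and the scale, shift, and ReLU applied in registers just before the single global-memory write:

// BN folded offline:  scale[c] = gamma[c] / sqrtf(var[c] + eps)
//                     bias[c]  = beta[c]  - mean[c] * scale[c]
#include <cuda_runtime.h>

__device__ __forceinline__ float bn_relu(float acc, float scale, float bias) {
    // BN (scale + shift) followed by ReLU, all in registers
    return fmaxf(acc * scale + bias, 0.0f);
}

// Toy 1x1 convolution (a GEMM) with the fused epilogue; one thread per
// output element, NCHW layout with batch size 1. Real kernels tile the
// work and stage data through shared memory.
__global__ void conv1x1_bn_relu(const float* __restrict__ in,    // [c_in, hw]
                                const float* __restrict__ w,     // [c_out, c_in]
                                const float* __restrict__ scale, // [c_out]
                                const float* __restrict__ bias,  // [c_out]
                                float* __restrict__ out,         // [c_out, hw]
                                int c_in, int c_out, int hw) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= c_out * hw) return;
    int co = idx / hw;   // output channel
    int p  = idx % hw;   // spatial position
    float acc = 0.0f;
    for (int ci = 0; ci < c_in; ++ci)        // the convolution itself
        acc += w[co * c_in + ci] * in[ci * hw + p];
    out[idx] = bn_relu(acc, scale[co], bias[co]);  // fused BN + ReLU epilogue
}

Folding BN this way works because at inference time gamma, beta, mean, and var are constants, so BN reduces to y = scale[c] * x + bias[c].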

For implementation details, please refer to the technical report included in this repo. The Winograd algorithm is used for the 3×3 convolutional kernels.
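For reference, assuming the common F(2×2, 3×3) tiling (Lavin & Gray), the Winograd identity computes each 2×2 output tile Y from a 4×4 input tile d and the 3×3 filter g using fixed transform matrices B, G, and A:

Y = A^T [ (G g G^T) ⊙ (B^T d B) ] A

where ⊙ is the elementwise product. Each tile then costs 16 elementwise multiplications instead of the 36 a direct 3×3 convolution would need, and the filter transform G g G^T can be precomputed once per filter.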

Usage

mkdir data
python data_generator.py
make
./Test 0
  • Set the parameters in data_generator.py before generating the data.
  • Run the 6 test cases by passing a number from 0 to 5 to ./Test (the loop below runs all of them).
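With the data generated and the binary built, all six cases can be run back to back with a plain shell loop:

for i in 0 1 2 3 4 5; do ./Test $i; done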

Results

3×3 Kernels

Kernel       Operations             128 / 128   256 / 256
cuDNN        Gemm + BN + ReLU       214 us      384 us
cuDNN        Winograd + BN + ReLU   95 us       155 us
Our Kernel   Winograd + BN + ReLU   59 us       117 us

1×1 Kernels

Kernels      512 / 128          128 / 512   1024 / 256         256 / 1024
Operations   Gemm + BN + ReLU   Gemm + BN   Gemm + BN + ReLU   Gemm + BN + ReLU
cuDNN        119 us             115 us      219 us             214 us
Our Kernel   58 us              55 us       186 us             181 us
