LargeMM

A cuBLAS/CUDA-based implementation of multi-GPU large matrix multiplication. It is a standalone C/C++ command-line application that launches large matrix-matrix multiplications and collects profiled outputs on a GPU cluster. Its lightweight codebase makes it easy to turn into a C/C++ library.

Built With

CUDA - A parallel computing platform and programming model developed by NVIDIA for GPU-accelerated computing.
cuBLAS - The NVIDIA CUDA Basic Linear Algebra Subprograms (cuBLAS) library for efficient GPU-accelerated linear algebra operations.

Dependencies

The LargeMM application relies on the following dependencies:

| Dependency | Version |
| --- | --- |
| CUDA | 11.6.1+ |
| GCC | 10.3.0+ |
| CMake | 3.24.2+ |

CUDA modules should be loaded prior to compilation or execution.

Environment

This application is designed to run on 1-4 NVIDIA Tesla V100 SXM2 GPUs. The default environment is a GPU node on Gadi.

Important Files

data folder stores performance data of LargeMM.
profile folder stores profiler timeline files for the performance of v2_ngpus_reduction, v1_1_n_streams, and base_cublasDgemm.
test folder stores tests for v2_ngpus_reduction, v1_1_n_streams, and base_cublasDgemm.

Installation

Clone the repository into your workspace and navigate to the project directory:
```
git clone https://github.com/Zlisch/LargeMM.git
cd LargeMM
```
Run the installation script:
```
chmod +x ./INSTALL.sh
./INSTALL.sh
```

Or you can directly download the latest executable from the link.

Documentation

You can view the documentation in the header files of the cloned repository. Alternatively, if you are using Visual Studio Code, you can browse the HTML documentation as follows:

Install the Live Server extension in Visual Studio Code. To start Live Server, press Cmd+Shift+P, type live server in the prompt, and select Open with Live Server.
With Live Server running, open http://127.0.0.1:5500/docs/html/globals.html in your browser to view the documentation.

Running the Application

After running ./INSTALL.sh, use the following command to run v2_ngpus_reduction with the lookup table on 4 GPUs and print the output:

./bin/largemm -s "-1" -m 28377 -a 2 -g 4

To profile LargeMM with NVIDIA Nsight Systems (nsys), use:

nsys profile --stats=true ./bin/largemm -s "-1" -m 28377 -a 2 -g 4

Or you can build your own run script. A run script template is provided in ./run.sh.

Available Options

-s

Description: Specify the stream stride for each GPU, that is, the square root of the number of streams each GPU uses. If -1 is given, the number of streams for each GPU is taken from the lookup table instead.
Example: Run v2_ngpus_reduction with 9 streams (stride 3) for each GPU on 4 GPUs and print the output.

./bin/largemm -s 3 -m 28377 -a 2 -g 4
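
The relationship between the -s value and the stream count can be sketched in a few lines of C++. This is only an illustration: `streams_per_gpu`, `total_streams`, and `kUseLookupTable` are hypothetical names that do not come from the LargeMM source, the lookup table itself is not reproduced, and the total assumes every GPU uses the same number of streams.

```cpp
#include <stdexcept>

// Hypothetical sentinel mirroring "-s -1": defer to LargeMM's lookup table.
constexpr int kUseLookupTable = -1;

// Streams per GPU for a given stride (stride = sqrt(number of streams)).
// Returns kUseLookupTable when the stride is -1, because the table
// contents are not known here.
int streams_per_gpu(int stride) {
    if (stride == kUseLookupTable) return kUseLookupTable;
    if (stride <= 0) throw std::invalid_argument("stride must be positive or -1");
    return stride * stride;
}

// Total streams across all GPUs, assuming each GPU uses the same count.
int total_streams(int stride, int gpus) {
    if (gpus <= 0) throw std::invalid_argument("number of GPUs cannot be zero");
    const int per_gpu = streams_per_gpu(stride);
    return per_gpu == kUseLookupTable ? kUseLookupTable : per_gpu * gpus;
}
```

Under these assumptions, `-s 3 -g 4` corresponds to 9 streams per GPU and 36 streams in total.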

-a

Description: Specify the algorithm to run.

| Value | Algorithm Version |
| --- | --- |
| 0 | `base_cublasDgemm` |
| 1 | `v1_1_n_streams` |
| 2 | `v2_ngpus_reduction` |
| 3 | `v2_ngpus_parallel_a` |
| 4 | `v2_ngpus_parallel_a_n_streams_breadth` |
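
For run scripts or wrappers that need to label results by algorithm, the table can be mirrored as a small lookup. The sketch below is an assumption about how such a mapping might look, not how LargeMM parses -a internally; `algorithm_name` is a hypothetical helper, and only the names themselves are taken from the table.

```cpp
#include <string>

// Hypothetical helper: maps the -a value to the algorithm name listed in
// the table. Returns an empty string for values outside 0..4.
std::string algorithm_name(int a) {
    static const char* const kNames[] = {
        "base_cublasDgemm",
        "v1_1_n_streams",
        "v2_ngpus_reduction",
        "v2_ngpus_parallel_a",
        "v2_ngpus_parallel_a_n_streams_breadth",
    };
    constexpr int kCount = sizeof(kNames) / sizeof(kNames[0]);
    if (a < 0 || a >= kCount) return std::string();
    return kNames[a];
}
```

With this mapping, `algorithm_name(2)` yields `v2_ngpus_reduction`, the algorithm used in the examples in this README.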

-m

Description: Specify the row dimension of the matrix.
Example: Run v2_ngpus_reduction on a square matrix of about 6 GiB (28377 rows and columns in double precision).

./bin/largemm -s 3 -m 28377 -a 2 -g 4
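
The link between the -m value and the memory footprint is simple arithmetic: a square matrix with n rows of doubles occupies n * n * 8 bytes. The following sketch works through it; the helper names `matrix_bytes`, `gemm_bytes`, and `max_dim_for_bytes` are hypothetical, and the three-matrix total assumes that the operands A and B and the result C are all n x n, which the README does not state explicitly.

```cpp
#include <cmath>
#include <cstdint>

// Bytes occupied by one n x n matrix of doubles.
constexpr std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(double);
}

// Bytes for A, B and C in C = A * B, assuming all three are n x n.
constexpr std::uint64_t gemm_bytes(std::uint64_t n) {
    return 3 * matrix_bytes(n);
}

// Largest n such that one n x n matrix of doubles fits in `budget` bytes.
std::uint64_t max_dim_for_bytes(std::uint64_t budget) {
    const std::uint64_t elems = budget / sizeof(double);
    // Floating-point estimate, then corrected with exact integer checks.
    std::uint64_t n = static_cast<std::uint64_t>(
        std::sqrt(static_cast<long double>(elems)));
    while (n * n > elems) --n;
    while ((n + 1) * (n + 1) <= elems) ++n;
    return n;
}
```

For `-m 28377`, `matrix_bytes` gives 6,442,033,032 bytes, just under 6 GiB, and 28377 is the largest dimension that fits in a 6 GiB budget. Under the stated assumption, the three matrices together need roughly 18 GiB.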

-g

Description: Specify the number of GPU(s) to use. Cannot be zero.
