Sort

## 1 Overview

### 1.1 Location

\$(AMDAPPDKSAMPLESROOT)\samples\C++Amp\samples\

where AMDAPPDKSAMPLESROOT is an environment variable pointing to the installation path of SHOC-AMP in AMD APP SDK samples.

#### 1.2 How to Run

See the *Getting Started* guide for how to build samples. You must first compile the sample. Use the command line to change to the directory where the executable is located. The pre-compiled sample executable is at  $\arrowvert (AMDAPPSDKSAMPLESROOT) \simeq \arrowvert (AMDAPPSDKSAMPLESROOT) \simeq \arrowvert (AMDAPPSDKSAMPLESROOT) \simeq \arrowvert (C++Amp\bin\x86_64\ for 64-bit builds.)$ 

Ensure you have installed Microsoft® Visual Studio® 2012 or higher.

Type the following command(s).

- Sort
   This runs the program with the default option s = 8.
- 2. Sort -h
  This prints the help file.

# 1.3 Command Line Options

Table 1 lists, and briefly describes, the command line options.

Table 1 Command Line Options

| Short Form | Long Form  | Description                                                                        |
|------------|------------|------------------------------------------------------------------------------------|
| -h         | help       | Shows all command options and their respective meaning.                            |
| -d         | quiet      | Quiet mode. Suppresses all text output.                                            |
| -e         | verify     | Verify results against reference implementation.                                   |
| -t         | timing     | Print timing.                                                                      |
| -A         | version    | AMD APP SDK version string.                                                        |
| -đ         | deviceId   | Select deviceld to be used (0 to N-1, where N is the number of available devices). |
| -s         | size       | The number of M data to be sorted.                                                 |
| -i         | iterations | Number of iterations for kernel execution.                                         |

## 2 Introduction

This sample implements a radix sort by using C++ Amp. The radix sort algorithm is divided into three phases:

Sort 1 of 3

- 1. Calculate the histogram of an unsorted array.
- 2. Scan the histogram bins.
- 3. Rank and permute to keys to get a sorted array.

## 3 Implementation Details

The implemented radix sort breaks keys (32 integers) into 16-bit digits and sorts one 16-bit digit at a time, starting with the least significant one. It loops eight times to complete sorting. In each *i*th loop, the following three phases sort the input array using ith 16-bit digit.

1. Calculate histogram bins.

There are cNTiles \* cTileSize threads to calculate histogram bins. cTileSize is the number of threads per AMP tile. cNTiles is the number of AMP tiles. The input array is divided into blocks of regionSize \* cNTiles elements. In this case, cTileSize = 256, cNTiles = 64. Each work-item calculates its histogram bin from the allotted 512 elements and passes this histogram to next phase.

2. Scan histogram bins.

In this phase the histogram bins are scanned column-wise, where histogram bins are arranged in the following way.

There are B \* N histogram bins, where B is the number of blocks, and N is the number of work-items in a block. Histogram bins are arranged such that the 0th block bin comes first, and the (B - 1)th block comes last. Each block's histogram bins are arranged so that the 0th work-item bin comes first, and (N - 1)th work-item bin comes last.

The scanned histogram is passed to next phase.

3. Rank and permute keys to get the sorted array.

Each work-item permutes the allotted 128 elements by using its scanned histogram bins.

#### 4 References

- 1. Marcho Zagha and Guy E. Blelloch. "Radix Sort For Vector Multiprocessor." in: Conference on High Performance Networking and Computing, pp. 712-721, 1991.
- 2. Guy E. Blelloch, Prefix Sums and Their Applications, School of Computer Science, Carnegie Mellon University, Pittsburgh, 1990.

Contact

Advanced Micro Devices, Inc. One AMD Place P.O. Box 3453 Sunnyvale, CA, 94088-3453

Phone: +1.408.749.4000

URL:

Developing:

Support:

Forum:

notice.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without

developer.amd.com/appsdksupport

developer.amd.com/openclforum

For AMD Accelerated Parallel Processing:

developer.amd.com/appsdk

developer.amd.com/

#### **Copyright and Trademarks**

© 2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, Radeon, FireStream, and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Other names are for informational purposes only and may be trademarks of their respective owners.

3 of 3