Skip to content

Latest commit

 

History

History
95 lines (54 loc) · 6.39 KB

how-to-use-parallel-invoke-to-write-a-parallel-sort-routine.md

File metadata and controls

95 lines (54 loc) · 6.39 KB
description title ms.date helpviewer_keywords ms.assetid
Learn more about: How to: Use parallel_invoke to Write a Parallel Sort Routine
How to: Use parallel_invoke to Write a Parallel Sort Routine
11/04/2016
task_handle class, example
task_group class, example
make_task function, example
structured_task_group class, example
improving parallel performance with task groups [Concurrency Runtime]
53979a2a-525d-4437-8952-f1ff85b37673

How to: Use parallel_invoke to Write a Parallel Sort Routine

This document describes how to use the parallel_invoke algorithm to improve the performance of the bitonic sort algorithm. The bitonic sort algorithm recursively divides the input sequence into smaller sorted partitions. The bitonic sort algorithm can run in parallel because each partition operation is independent of all other operations.

Although the bitonic sort is an example of a sorting network that sorts all combinations of input sequences, this example sorts sequences whose lengths are a power of two.

Note

This example uses a parallel sort routine for illustration. You can also use the built-in sorting algorithms that the PPL provides: concurrency::parallel_sort, concurrency::parallel_buffered_sort, and concurrency::parallel_radixsort. For more information, see Parallel Algorithms.

Sections

This document describes the following tasks:

Performing Bitonic Sort Serially

The following example shows the serial version of the bitonic sort algorithm. The bitonic_sort function divides the sequence into two partitions, sorts those partitions in opposite directions, and then merges the results. This function calls itself two times recursively to sort each partition.

[!code-cppconcrt-parallel-bitonic-sort#1]

[Top]

Using parallel_invoke to Perform Bitonic Sort in Parallel

This section describes how to use the parallel_invoke algorithm to perform the bitonic sort algorithm in parallel.

To perform the bitonic sort algorithm in parallel

  1. Add a #include directive for the header file ppl.h.

    [!code-cppconcrt-parallel-bitonic-sort#10]

  2. Add a using directive for the concurrency namespace.

    [!code-cppconcrt-parallel-bitonic-sort#11]

  3. Create a new function, called parallel_bitonic_mege, which uses the parallel_invoke algorithm to merge the sequences in parallel if there is sufficient amount of work to do. Otherwise, call bitonic_merge to merge the sequences serially.

    [!code-cppconcrt-parallel-bitonic-sort#2]

  4. Perform a process that resembles the one in the previous step, but for the bitonic_sort function.

    [!code-cppconcrt-parallel-bitonic-sort#3]

  5. Create an overloaded version of the parallel_bitonic_sort function that sorts the array in increasing order.

    [!code-cppconcrt-parallel-bitonic-sort#4]

    The parallel_invoke algorithm reduces overhead by performing the last of the series of tasks on the calling context. For example, in the parallel_bitonic_sort function, the first task runs on a separate context, and the second task runs on the calling context.

    [!code-cppconcrt-parallel-bitonic-sort#5]

The following complete example performs both the serial and the parallel versions of the bitonic sort algorithm. The example also prints to the console the time that is required to perform each computation.

[!code-cppconcrt-parallel-bitonic-sort#8]

The following sample output is for a computer that has four processors.

serial time: 4353
parallel time: 1248

[Top]

Compiling the Code

To compile the code, copy it and then paste it in a Visual Studio project, or paste it in a file that is named parallel-bitonic-sort.cpp and then run the following command in a Visual Studio Command Prompt window.

cl.exe /EHsc parallel-bitonic-sort.cpp

Robust Programming

This example uses the parallel_invoke algorithm instead of the concurrency::task_group class because the lifetime of each task group does not extend beyond a function. We recommend that you use parallel_invoke when you can because it has less execution overhead than task group objects, and therefore lets you write better performing code.

The parallel versions of some algorithms perform better only when there is sufficient work to do. For example, the parallel_bitonic_merge function calls the serial version, bitonic_merge, if there are 500 or fewer elements in the sequence. You can also plan your overall sorting strategy based on the amount of work. For example, it might be more efficient to use the serial version of the quick sort algorithm if the array contains fewer than 500 items, as shown in the following example:

[!code-cppconcrt-parallel-bitonic-sort#9]

As with any parallel algorithm, we recommend that you profile and tune your code as appropriate.

See also

Task Parallelism
parallel_invoke Function