enhancements parallel

empyrical edited this page · 1 revision

This page describes potential support for parallel execution of code blocks in Cython and unPython based on OpenMP.

Note that this is different from parallel execution of Python code in threads or using the multiprocessing module, which can be easily achieved in both Python and Cython using decorated functions.

Usage scenario

Parallel loops are useful in many contexts, and they are particularly important for numerical applications. Consider the following loop:

for i in xrange(m):
    # C loop body goes here

If the programmer wishes to parallelize the above loop, there is currently no mechanism to do so short of writing C code by hand.

Design Constraints

The design constraints are

  1. It should have well-defined serial semantics to allow compilation and execution on systems without OpenMP support.
  2. Ideally, it should match Python syntax and be executable in CPython. Because the construct deals with non-Python code, however, this is a preference rather than a hard requirement.
  3. The construct should be extensible to include future enhancements like threadlocal variables or reduction variables.
  4. Nested parallelism should be easy. Ideally, the implementation would accept any iterator and loop over it in parallel: a producer thread iterates over the iterator and keeps producing values, and multiple worker threads consume them. (This producer-consumer scenario was proposed by Stefan Behnel.) It is also possible to have a more restricted proposal that supports only parallel C-style for loops.
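
The producer-consumer scenario in constraint 4 can be sketched in plain Python. This is only an illustration of the idea, not a proposed implementation: the names parallel_for_each, body and num_workers are hypothetical, the code is written for Python 3 (so range replaces xrange), and it makes no attempt to handle exceptions raised inside the loop body.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the iterator


def parallel_for_each(iterable, body, num_workers=4):
    """Call body(item) for every item, using one producer and several workers.

    Hypothetical helper for illustration only.
    """
    work = queue.Queue(maxsize=2 * num_workers)

    def producer():
        # A single thread walks the iterator and keeps producing values.
        for item in iterable:
            work.put(item)
        # One sentinel per worker so that every worker terminates.
        for _ in range(num_workers):
            work.put(_DONE)

    def worker():
        # Each worker consumes values until it sees the sentinel.
        while True:
            item = work.get()
            if item is _DONE:
                break
            body(item)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


results = []
results_lock = threading.Lock()


def record_square(i):
    # The body runs in a worker thread, so shared state needs a lock.
    with results_lock:
        results.append(i * i)


parallel_for_each(iter(range(10)), record_square)
print(sorted(results))
```

The order in which items are processed is not defined, which is why the example sorts the results before printing them.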

The proposals so far are as follows:

Proposal 1:

Due to Rahul Garg

"pragma parallel for"
for i in xrange(m):
    # parallel body goes here

The upside is that it is easy to understand and use, particularly for people familiar with OpenMP. It is also easy to extend, and it does not affect the semantics when the code runs in the interpreter.

The downside is that representing annotations as strings does not look very Pythonic, and the string contents follow their own mini-grammar.
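
The claim that the string annotation does not change interpreter semantics can be checked directly: a bare string literal used as a statement is evaluated and discarded by CPython. The function name squares_with_pragma below is made up for this sketch, and Python 3's range stands in for xrange.

```python
def squares_with_pragma(m):
    out = [0] * m
    "pragma parallel for"  # bare string statement: has no effect in CPython
    for i in range(m):
        out[i] = i * i
    return out


def squares_plain(m):
    # Same loop without the annotation, for comparison.
    out = [0] * m
    for i in range(m):
        out[i] = i * i
    return out


print(squares_with_pragma(5) == squares_plain(5))
```

A compiler that understands the annotation would be free to run the loop body in parallel, while the interpreter simply runs it serially.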

Proposal 2:

#pragma: parallel for
for i in xrange(m):
    # parallel body follows

The upside is that it is easy to understand, use, and extend.

The downside is that it is not Pythonic: annotations should not be comments.

Proposal 3:

Adapted from IPython1 and from the with nogil construct in Cython.

with parallel():
    for i in xrange(m):
        # parallel body follows

Could nogil() be used here as well?

Easy to understand. Can be extended by introducing keyword arguments to parallel().

The downside is that it requires an implementation of parallel() that returns an object with empty __enter__ and __exit__ methods. It also adds one level of nesting: if you want to parallelize n nested loops, you end up with 2n levels of indentation.
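
A serial stand-in for parallel() of the kind described here might look as follows. This is a minimal sketch under stated assumptions: the num_threads keyword and the scale function are invented for illustration, and Python 3's range is used in place of xrange.

```python
class parallel:
    """Serial stand-in: accepts keyword arguments and does nothing with them."""

    def __init__(self, **options):
        # Options such as num_threads are hypothetical; they are only stored.
        self.options = options

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning False lets exceptions from the block propagate normally.
        return False


def scale(values, factor):
    out = [0] * len(values)
    with parallel(num_threads=4):
        for i in range(len(values)):
            out[i] = values[i] * factor
    return out


print(scale([1, 2, 3], 2))
```

Under CPython the with block runs its body once, serially, so the loop behaves exactly as it would without the annotation.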

Proposal 4:

Due to Rahul Garg

for i in prange(m):
    # parallel body goes here.

Easy to use and understand, and extensible through keyword arguments. It requires an implementation of prange(). It works only with xrange()-style loops; other iterators are not considered.

Note from Rahul: prange will be implemented in unPython.
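
For execution under plain CPython, the implementation this proposal requires could be as small as a wrapper around range. The sketch below is an assumption about what such a serial fallback might look like, not the unPython implementation; the schedule keyword is a hypothetical example of an extension argument.

```python
def prange(*args, **options):
    """Serial fallback: ignore any parallel options and behave like range()."""
    return range(*args)


def squares_prange(m):
    out = [0] * m
    # The keyword argument is accepted and ignored in the serial fallback.
    for i in prange(m, schedule="static"):
        out[i] = i * i
    return out


print(squares_prange(5))
```

Because the fallback forwards its positional arguments unchanged, the start, stop and step forms of range keep working.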

Proposal 5:

Due to Stefan Behnel.

with thread_each(iterator,threadlocal=...):
    # parallel body

It only requires one level of nesting and makes it clear that what is supposed to happen is not a sequential loop but a parallel operation on the code block.

The downside is that this would be hard to support in CPython in a serialised form.
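
One way to see the difficulty: in CPython the body of a with block runs exactly once, so a context manager has no way to re-run it for each item. The sketch below is a hypothetical serial stand-in built on contextlib; the "as items" binding is an assumption, since the proposal does not say how the current item is named inside the block.

```python
from contextlib import contextmanager


@contextmanager
def thread_each(iterable, threadlocal=None):
    # A context manager may yield only once, so the best a serial
    # stand-in can do is hand the whole sequence to the block.
    yield list(iterable)


body_runs = 0
with thread_each([10, 20, 30]) as items:
    body_runs += 1  # executed once, not once per item
    seen = items

print(body_runs, seen)
```

Getting per-item execution would need an explicit for loop inside the block, which reintroduces the second level of nesting.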

Proposal 6:

Due to Rahul Garg

What about merging proposals 4 and 5 into:

for i in thread_each(iterator,threadlocal=...):
    # parallel body goes here

Compared to proposal 5, this makes it less obvious that things happen in parallel. The thing that is iterated over influences the way the loop is executed, i.e. it changes the semantics of the "for" keyword.

Proposal 7:

Use decorated inline functions.

cdef inline doit(item) nogil:
    # parallel body goes here

The main advantage is that this would work without special syntax support and still makes it clear what happens.

The downside is that calling a function requires passing all required state into that function. Closures would alleviate this disadvantage, but they would add overhead to code execution.
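
The trade-off between explicit state passing and closures can be illustrated in plain Python with a thread pool. This is only an analogy for the Cython proposal: run_explicit, run_closure and the factor parameter are hypothetical, and under CPython's global interpreter lock a pure-Python body like this one gains no speedup from threads.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def doit(item, factor):
    # Every piece of state the body needs must arrive as an argument.
    return item * factor


def run_explicit(items, factor, num_workers=4):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(partial(doit, factor=factor), items))


def run_closure(items, factor, num_workers=4):
    def body(item):
        # factor is captured from the enclosing scope instead of being passed.
        return item * factor

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(body, items))


print(run_explicit([1, 2, 3], 10))
print(run_closure([1, 2, 3], 10))
```

The closure version is shorter at the call site, at the cost of creating a new function object that carries the captured state.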
