An introduction to high-performance Python using Jupyter
A brief guide to parallel programming using Python and the Jupyter Notebook.
The main aim is to equip researchers who already have some knowledge of the Python scientific computing stack with an understanding of the key conceptual approaches to parallel programming, and of practical ways to realise those concepts using popular Python packages.
A secondary aim is to demonstrate the potential of JupyterHub on Grid Engine compute clusters as a high-performance Python programming environment for those with limited knowledge of Linux and the Unix shell. JupyterHub has been deployed on the University of Sheffield's ShARC cluster using funding from the OpenDreamKit project (see Acknowledgements). This teaching material was designed to be used on ShARC but is also relevant to other environments where Jupyter and sufficient hardware resources are available.
- Understanding of why parallelisation is of increasing importance given the end of Moore's Law
- Understanding of the different types of parallelism and their merits
- Basic understanding of theoretical speedups and Amdahl's Law
- Understanding of communication versus computation costs
- Ability to identify and use libraries that can distribute non-Python work between threads
- Ability to distribute Python work between processes using multiprocessing
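Amdahl's Law puts a ceiling on the theoretical speedup: if a fraction `p` of a program's runtime can be parallelised across `n` processors, the overall speedup is `S(n) = 1 / ((1 - p) + p / n)`. A minimal sketch of this formula (the function name is our own, not from any package):

```python
def amdahl_speedup(p, n):
    """Theoretical speedup predicted by Amdahl's Law.

    p: fraction of the runtime that can be parallelised (between 0 and 1)
    n: number of processors
    """
    return 1.0 / ((1.0 - p) + p / n)

# A program that is 90% parallelisable gains only ~5.3x on 10 processors,
# and can never exceed 10x (1 / (1 - 0.9)) no matter how many are added.
print(round(amdahl_speedup(0.9, 10), 2))  # 5.26
```

The serial fraction `(1 - p)` dominates as `n` grows, which is why reducing serial work often matters more than adding processors.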
- Parallelisation using packages that support multithreading
- Parallelising your own code using multiple Python processes on a single machine
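As a taste of the first topic: pure-Python bytecode cannot run concurrently across threads because of the Global Interpreter Lock, but many C-backed library routines release the GIL while they work, so threads can still deliver real parallelism. A small sketch using the standard library's `zlib` (whose compression routine releases the GIL for large inputs) with `concurrent.futures.ThreadPoolExecutor`:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Four 1 MB buffers of highly compressible data.
chunks = [bytes([i]) * 1_000_000 for i in range(4)]

# zlib.compress is implemented in C and releases the GIL while compressing,
# so these four calls can genuinely run in parallel on separate threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    compressed = list(pool.map(zlib.compress, chunks))

# Parallel results are identical to what serial calls would produce.
assert all(zlib.decompress(c) == chunk
           for c, chunk in zip(compressed, chunks))
```

The same pattern applies to NumPy linear algebra and similar libraries, where the heavy lifting happens outside the interpreter.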
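For CPU-bound pure-Python code, the standard approach on a single machine is the `multiprocessing` package, which sidesteps the GIL by running work in separate processes. A minimal sketch (the `slow_square` function is a stand-in for your own CPU-bound code):

```python
from multiprocessing import Pool

def slow_square(x):
    # Stand-in for an expensive, CPU-bound pure-Python function.
    return x * x

if __name__ == "__main__":
    # Distribute the inputs across four worker processes.
    with Pool(processes=4) as pool:
        results = pool.map(slow_square, range(8))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The `if __name__ == "__main__":` guard is required on platforms that start workers by re-importing the main module (e.g. Windows and macOS).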
- Wilkinson, B. and Allen, M. (1999). Parallel programming: techniques and applications using networked workstations and parallel computers. Prentice Hall, Upper Saddle River, N.J. ISBN: 0-13-671710-1
- Gorelick, M. and Ozsvald, I. (2014). High Performance Python, 1st ed. O'Reilly, Sebastopol, CA. ISBN: 978-1-4493-6159-4
- Using JupyterHub on the University of Sheffield's ShARC cluster