Programming Massively Parallel Hardware (PMPH), Block 1 2020

Course Structure

PMPH consists of four hours of virtual lectures and four hours of mixed (physical + virtual) labs per week. There may be no lectures in the last few weeks of the course, so that you can concentrate on project work (to be announced).

Course Catalog Web Page

Lectures (Zoom links will be posted on Absalon):

Tuesday 10:15 - 12:00
Thursday 10:15 - 12:00

Labs:

Thursday 13:00 - 17:00 (or later if students ask for it)

We have not yet been assigned classrooms, but the plan is that half of the enrolled students (say, according to alphabetical ordering) may physically attend the lab from 13:00 - 15:00 while the other half attends virtually on Zoom, and then the two halves switch roles for the session from 15:00 - 17:00. This way, each student is guaranteed two hours of physical lab attendance per week, if she/he wishes to attend.

If you like the idea of attending the labs physically, then I suggest you come to DIKU for the other lab session as well (the one in which you are only guaranteed virtual presence by default), because I will allow such students to join the lab until the corona class capacity is reached (on a first-come, first-served basis). Typically some students skip the physical lab sessions, so there might be space for others. If we are out of space, you can find a quiet place to attend by Zoom via eduroam.

Evaluation

Throughout the course, you will hand in four weekly assignments, which will count for 40% of the final grade. In the last month of the course, you will work on a group project (up to three students per group), and will submit the report and accompanying code. The group project will be presented orally at the exam together with the answers to some individual questions, and this will count for 60% of your final grade.

The "weekly-assignments" (W-assignments) are tentatively planned to be published on Thursdays during the first weeks of the course (see the Course Schedule below). You have one week to complete each of them. If a serious attempt was made but the solution is not satisfactory (or simply if you want to improve your assignment, and hence your grade), an updated solution may be resubmitted within one week of the assignment being graded. Extensions may be possible, but you will need to agree on them with the TA responsible for that particular assignment (see below).

For the group project no re-submission is possible; the deadline is October 30th (Friday before the exam week).

The oral examination will be held in the exam week (Wednesday and Thursday, and Friday if necessary). The final evaluation will take up to 20 minutes per student, but the whole group will probably be examined together (unless you wish otherwise).

Weekly and group assignment handin is still on Absalon.

Teacher and Teaching Assistants (TAs)

The main teacher is:

Cosmin Oancea.

Your TAs are:

Anders Holst
Dmitry Serykh

The plan is that the teacher will conduct the lectures and the labs. Anders will grade the hand-ins and resubmissions for weeklies 1-3, while Dmitry will grade the hand-ins and resubmissions for weekly 4. Questions about a particular weekly assignment should be directed to the TA responsible for that assignment, or posted on the Absalon discussion forum, which the TAs will be monitoring (other students are of course welcome to pitch in).

Course Tracks and Resources

All lectures and lab sessions will be delivered in English. The assignments and projects will be posted in English, and while you can choose to hand in solutions in either English or Danish, English is preferred. All course material except for the hardware book is distributed via this GitHub page. (Assignment handin is still on Absalon.)

The hardware track of the course covers (lecture) topics related to processor, memory and interconnect design, including cache coherency, which are selected from the book Parallel Computer Organization and Design, by Michel Dubois, Murali Annavaram and Per Stenstrom, ISBN 978-0-521-88675-8, Cambridge University Press, 2012. The book is available at the local bookstore (biocenter). It is not mandatory to buy it: Cosmin thinks that the lecture slides are detailed enough to understand the material. Note, however, that lecture notes are not provided for the hardware track because of copyright issues.
The software track covers (lecture) topics related to parallel-programming models and recipes to recognize and optimize parallelism and locality of reference. It demonstrates that compiler optimizations are essential to fully utilizing hardware, and that some optimizations can be implemented both in hardware and software, but with different pros and cons. The lecture notes are available here, and additional (optional) reading material (papers) will be linked with individual lectures; see the Course Schedule section below.
The lab track teaches GPGPU hardware specifics and programming in Futhark, CUDA, and OpenMP. The intent is that the lab track applies in practice some of the parallel programming principles and optimization techniques discussed in the software track. It is also intended to provide help with the weekly assignments, the project, etc.

Course Schedule

This course schedule is tentative and will be updated as we go along.

The lab sessions are aimed at providing help for the weeklies and group project. Do not assume you can solve them without attending the lab sessions.

Date	Time	Topic	Material
01/09	10:15-12:00	Intro, Hardware Trends and List Homomorphisms (SFT), Chapters 1 and 2 in Lecture Notes	Sergei Gorlatch, "Systematic Extraction and Implementation of Divide-and-Conquer Parallelism"; Richard S. Bird, "An Introduction to the Theory of Lists"; Jeremy Gibbons, "The third homomorphism theorem"
03/09	10:15-12:00	List Homomorphism & Parallel Basic Blocks (SFT), Chapters 2 and 3 in Lecture Notes	Various papers related to flattening, but which are not very accessible to students
03/09	13:00-17:00	Lab: Gentle Intro to CUDA, Futhark programming, First Weekly	Parallel Programming in Futhark, sections 1-4, futhark code for L1
03/09	some time	Assignment 1 handout
08/09	10:15-12:00	Parallel Basic Blocks & Flattening Nested Parallelism (SFT)	Chapters 3 and 4 in Lecture Notes
10/09	10:15-12:00	Flattening Nested Parallelism (SFT)	Chapter 4 in Lecture Notes
10/09	13:00-17:00	Lab: Fun Quiz; Reduce and Scan in CUDA	discussing second weekly, helping with the first
10/09	some time	Assignment 2 handout
15/09	10:15-12:00	In-Order Pipelines (HWD)	Chapter 3 of "Parallel Computer Organization and Design" Book
17/09	10:15-12:00	Optimizing ILP, VLIW Architectures (SFT-HWD)	Chapter 3 of "Parallel Computer Organization and Design" Book
17/09	13:00-17:00	Lab: GPU hardware: three important design choices.	helping with the first two weekly assignments.
17/09		No new weekly assignment this week; the third will be published next week
22/09	10:15-12:00	Finishing VLIW Architectures (SFT-HWD), Dependency Analysis of Imperative Loops	Chapter 3 of "Parallel Computer Organization and Design" Book, Chapter 5 of Lecture Notes
24/09	10:15-12:00	Dependency Analysis of Imperative Loops, Case Study: Matrix Multiplication and Transposition	Chapters 5 and 6 of Lecture Notes
24/09	13:00-17:00	Lab: Recognizing Scan and Reduce Patterns in Imperative Code	helping with the first two weekly assignments.
24/09	some time	Assignment 3 handout
29/09	10:15-12:00	Memory Hierarchy, Bus-Based Coherency Protocols (HWD)	Chapters 4 and 5 of "Parallel Computer Organization and Design" Book
01/10	10:15-12:00	Bus-Based Coherency Protocols (HWD)	Chapters 5 and 6 of "Parallel Computer Organization and Design" Book
01/10	13:00-17:00	Lab: Presenting Possible Group Project	helping with two weekly assignments.
06/10	10:15-12:00	Scalable Coherence Protocols, Scalable Interconnect (HWD)	Chapters 5 and 6 of "Parallel Computer Organization and Design" Book
08/10	10:15-12:00	Scalable Interconnect (HWD)	Chapter 6 of "Parallel Computer Organization and Design" Book, Help for the fourth weekly assignment
08/10	13:00-17:00	Lab: Working on the 4th Weekly Assignment	helping with the project and anything else.
13/10	10:15-12:00	Autumn break (no lecture)
15/10	10:15-12:00	Autumn break (no lecture)
15/10	13:00-17:00	Autumn break (no lab unless you ask for it!)
20/10	10:15-12:00	Inspector-Executor Techniques for Locality Optimizations (SFT)	Various scientific papers
22/10	10:15-12:00	Modern CPU Design: Tomasulo Algorithm (HWD)	Chapter 3 of "Parallel Computer Organization and Design" Book
22/10	13:00-17:00	Lab: help with group project
27/10	10:15-12:00	No lecture, BUT will help with group project and weeklies
29/10	10:15-12:00	No Lecture
29/10	13:00-17:00	Lab: help with group project and weeklies

Weekly assignments

The weekly assignments are mandatory, must be solved individually, and make up 40% of your final grade. Submission is on Absalon.

You will receive feedback a week after the handin deadline (at the latest). You then have another week to prepare a resubmission. That is, the resubmission deadline is two weeks after the original handin deadline.

Weekly 1 (due September 10th)

Weekly 2 (due September 17th)

Weekly 3 (due October 1st)

Weekly 4 (due October 13th)

Assignment text

Group project (due Friday, October 30th, the Friday before the exam week)

Several potential choices for the group project may be found in the group-projects directory, namely

Single Pass Scan in CUDA (basic block of parallel programming)
Bfast: a landscape change detection algorithm (Remote Sensing)
Local Volatility Calibration (Finance)
Trinomial-Tree Option Pricing (Finance)
HP Implementation for Fusing Tensor Contractions (Deep Learning): read the paper, implement the technique (some initial code is provided), and try to replicate the results of the paper.

You are also free to propose your own project, for example from the machine learning field.

GPU + MultiCore Machines

All students will be provided with individual accounts on machines that support multi-core programming via C++/OpenMP as well as CUDA programming on GPGPUs. Logins to the GPU and 16-core machines will become operational after the 3rd of September.

You log in by first SSHing to the bastion server ssh-diku-apl.science.ku.dk using your KU license plate (abc123) as the user name, and then SSHing on to one of the GPU machines.

$ ssh -l <ku_id> ssh-diku-apl.science.ku.dk
$ ssh gpu04-diku-apl

(or gpu02-diku-apl or gpu03-diku-apl).
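The two-step login above can usually be collapsed into a single command with OpenSSH's `ProxyJump` option. The following is only a sketch of a `~/.ssh/config` fragment for your own laptop: it assumes OpenSSH 7.3 or newer and that the bastion server permits jump connections, which has not been verified here. The alias `diku-bastion` is a made-up name, and `abc123` stands for your own KU ID.

```
# ~/.ssh/config on your own machine (sketch; replace abc123 with your KU ID)

# Hypothetical alias for the bastion server named above.
Host diku-bastion
    HostName ssh-diku-apl.science.ku.dk
    User abc123

# Reach the GPU machines by hopping through the bastion.
Host gpu02-diku-apl gpu03-diku-apl gpu04-diku-apl
    User abc123
    ProxyJump diku-bastion
```

With such a fragment in place, `ssh gpu04-diku-apl` from your own machine should perform both hops. If it does not, the two explicit commands above remain the reference procedure.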

Despite their names, each machine also has 16 CPU cores with 2-way hyperthreading and plenty of RAM. The GPUs are:

gpu02-diku-apl and gpu03-diku-apl each have dual GTX 780 Ti GPUs.
gpu04-diku-apl has an RTX 2080 Ti GPU (by far the fastest).

For CUDA to work, you may need to add the following to your $HOME/.bash_profile or $HOME/.bashrc file (on one of the gpu02-diku-apl to gpu04-diku-apl machines):

CUDA_DIR=/usr/local/cuda
export PATH=$CUDA_DIR/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_DIR/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$LD_LIBRARY_PATH:$LIBRARY_PATH
export CPLUS_INCLUDE_PATH=$CUDA_DIR/include:$CPLUS_INCLUDE_PATH
export C_INCLUDE_PATH=$CUDA_DIR/include:$C_INCLUDE_PATH
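After logging in again (or sourcing the edited file), a quick way to see whether the settings took effect is to check that the CUDA tools are reachable through `PATH`. The snippet below is a minimal sketch, not part of the official course setup: `check_tool` is a helper name introduced here for illustration, and it only tests whether `nvcc` (the CUDA compiler) and `nvidia-smi` (the driver's status tool) can be found, not whether they work correctly.

```shell
#!/bin/sh
# check_tool (hypothetical helper): report whether a command
# can be found through the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found, check PATH in your profile"
    fi
}

# nvcc should resolve to $CUDA_DIR/bin/nvcc if the exports above are active.
check_tool nvcc
# nvidia-smi is normally installed together with the GPU driver.
check_tool nvidia-smi
```

If `nvcc` is reported as missing on a GPU machine, the most likely cause is that the profile file was not re-read; log out and in again, or source the file manually.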

Other resources

Futhark and CUDA

We will use a basic subset of Futhark during the course. Futhark-related documentation can be found at Futhark's webpage, in particular a tutorial and user guide.
CUDA C Best Practices Guide: you may want to browse through this guide to see what it offers. There is no need to read all of it closely.

Other Related Books

Some of the compiler transformations taught in the software track can be found in the book Optimizing Compilers for Modern Architectures, by Randy Allen and Ken Kennedy, Morgan Kaufmann, 2001, but you are not expected to buy or read it for the purpose of PMPH.
Similarly, some course topics are further developed in the book High-Performance Computing Paradigm and Infrastructure, e.g., Chapters 3, 8 and 11, but again, you are not expected to buy or read it for the purpose of PMPH.
