# WEEK 03 - Procedural Programming 2

## Learning objectives

- Getting started with loops and arrays
- Consolidating the basics
- Learning to use variables and control statements in more intricate combinations
- Getting started with more complex tasks that involve multifaceted logic
- Understanding scheduling and the different methods for distributing tasks
- Understanding the basic implementation of matrix-vector multiplication

## Overview
In this notebook, we are going to get started with programming more complicated tasks, which more closely resemble real-world problems. For these tasks, we are going to need control statements such as loops and conditional statements, and data structures such as arrays. As working examples, we are going to employ two problems which, in their simplified form, are relevant to various scientific fields, namely the scheduling problem and matrix-vector multiplication. Note that, as always, a deep understanding of the underlying problem is only optional, and each task can be regarded purely as an example.

## Parallel computing

You may have heard about parallel computing in popular discourse, or you may have attended the talk by yours truly during the opening ceremony, when I discussed, albeit very briefly, the need for and ubiquitous presence of parallel computing in the sciences. Either way, here's the elevator pitch: Intuitively enough, computing power is tightly tied to the number of operations that can be done per unit time, which in turn is directly proportional to the number of cycles a processing unit can run per second. This number is the frequency of the processor. When you see a CPU advertised to run at 2.4 GHz, it means it runs $2.4 \times 10^{9} \frac{\text{cycles}}{\text{s}}$. It turns out that the achievable frequency is limited mainly by two factors, namely the size of the semiconductors on the chip and the generated heat. Both are physical constraints whose limits have been approached in the last decades through the steady advancements in the microprocessor industry. What this means on the ground is that computing units cannot operate at much higher frequencies than those already achieved, at least according to our current understanding of these physical constraints, and you may have noticed that the frequency of the latest iteration of microprocessors has not changed dramatically over the previous years.

So how do we achieve more computing capacity for our ever more power-hungry applications? The short answer is **parallel computing**. The main idea is to increase the number of processing units instead of increasing the base frequency of each processing unit. This idea is sometimes also referred to as *horizontal scaling*. Again, you may have noticed that the latest chips boast a higher *core* count compared to their previous-generation counterparts.

You may wonder if we are going to do any parallel computing here. Well, not really :( It would unfortunately be too intricate to start with that at this stage. Why am I telling you all this then? Although we won't be doing any parallel computing per se, we can address a standing problem in parallel computing! Namely **scheduling**.

## Scheduling

Scheduling can broadly be defined as the problem of distributing a number of tasks between a number of workers, and is encountered in many areas, including but not limited to parallel computing.

The task is relatively straightforward; nevertheless, it carries tremendous weight, as the success and performance of the entire procedure hinge upon it.

There are numerous scheduling solutions, each with their own advantages and disadvantages. Here, we are going to limit ourselves to the most basic ones, namely static and dynamic scheduling, as described in the following.

Your task revolves around implementing code that mimics the behavior of the scheduler in the parallel programming API. Let's get started!

### Static Scheduling

Static scheduling is possibly the most straightforward algorithm for distributing a given number of tasks between a given number of processes. Given $n$ tasks and $m$ processes, static scheduling, in its simplest form, starts by assigning tasks $t_{j}, j = 0, 1, \dots, n-1$ to processes $p_{i}, i = 0, 1, \dots, m-1$ as evenly as possible, as shown below for three participating processes.

<center><img src="figures/static_scheduling.png"/></center>

In other words, the scheduler starts assigning the first batch of tasks to the first process, the second batch of tasks to the second process, and so on, seeking to keep the batch size constant. It is important to note that the number of tasks $n$ need not be divisible by the number of processes $m$, i.e., there may be a remainder that is typically assigned to either the first or the last process.

Your first task is to write a program that mimics the behavior of the static scheduler. We are going to make some simplifications here. In this first iteration, it suffices to print to the standard output that process $i$ is going to run task $j$ whenever it is assigned to it, e.g., "Task j is assigned to Process i." It is assumed that a variable number of processes and tasks are given. You can assign the remainder of tasks to the last process in addition to its own share.

**Hints**:

- Make sure that your program produces the expected behavior described above in terms of the distribution. Namely, every process should receive the same number of tasks, with the exception of the last process, which may additionally receive the remainder.
- You can use the indices $i$ and $j$ to identify processes and tasks and do not have to store them in an array
- Think carefully about what you should loop over. Ask yourselves if you should loop over the tasks, the processes, or a combination thereof.

In [90]:
"""
Your task is to write a static scheduler according to the description above
""";

# Number of tasks
n_tasks = 20
# Number of processes
n_procs = 6

# Code here
print("#---VERSION 1---#")
# Separate the evenly divisible part of the tasks from the remainder
dividable_tasks = n_tasks - n_tasks % n_procs
group = n_tasks // n_procs  # number of tasks per process (batch size)
# Process numbering starts from 1
i = 1
# Loop over all tasks
# Note: this version hands out tasks cyclically (1, 2, ..., n_procs, 1, ...),
# i.e., not in contiguous batches as in the figure above
for j in range(n_tasks):
    if j < dividable_tasks:
        print(f"Task {j+1} is assigned to Process {i}")  # +1 because Python indices start at 0
        i += 1
        if i == n_procs + 1:
            i = 1  # wrap around to the first process
    else:
        # Assign the remaining tasks to the last process
        print(f"Task {j+1} is assigned to Process {n_procs}")
print("\n")
print("#---VERSION 2---#")
i = 1  # reset explicitly instead of relying on the value left over from Version 1
# Loop over all tasks
for j in range(n_tasks):
    if j < dividable_tasks:
        print(f"Task {j+1} is assigned to Process {i}")  # +1 because Python indices start at 0
        if j == group*i - 1:  # last task of the current batch: move on to the next process
            i += 1
    else:
        # Assign the remaining tasks to the last process
        print(f"Task {j+1} is assigned to Process {n_procs}")


#---VERSION 1---#
Task 1 is assigned to Process 1
Task 2 is assigned to Process 2
Task 3 is assigned to Process 3
Task 4 is assigned to Process 4
Task 5 is assigned to Process 5
Task 6 is assigned to Process 6
Task 7 is assigned to Process 1
Task 8 is assigned to Process 2
Task 9 is assigned to Process 3
Task 10 is assigned to Process 4
Task 11 is assigned to Process 5
Task 12 is assigned to Process 6
Task 13 is assigned to Process 1
Task 14 is assigned to Process 2
Task 15 is assigned to Process 3
Task 16 is assigned to Process 4
Task 17 is assigned to Process 5
Task 18 is assigned to Process 6
Task 19 is assigned to Process 6
Task 20 is assigned to Process 6


#---VERSION 2---#
Task 1 is assigned to Process 1
Task 2 is assigned to Process 1
Task 3 is assigned to Process 1
Task 4 is assigned to Process 2
Task 5 is assigned to Process 2
Task 6 is assigned to Process 2
Task 7 is assigned to Process 3
Task 8 is assigned to Process 3
Task 9 is assigned to Process 3
Task 10 is assigned to 

Make sure that the code is following the expected distribution as illustrated in the figure above.

Now start playing with the number of tasks and processes. Does your code still produce the correct results? In particular, try corner cases such as when $n < m$.
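
To make corner cases easier to check, one option is to return the assignment as a list instead of printing it. The following is only a minimal sketch under the rule stated above (equal batches, remainder to the last process); the function name `static_schedule` and the 0-based process indices are choices made here for illustration, not part of the exercise.

```python
def static_schedule(n_tasks, n_procs):
    """Return a list whose entry j is the 0-based process that runs task j.

    Each process receives n_tasks // n_procs consecutive tasks; whatever is
    left over goes to the last process in addition to its own share.
    """
    if n_procs < 1:
        raise ValueError("n_procs must be positive")
    batch = n_tasks // n_procs
    assignment = []
    for j in range(n_tasks):
        if batch == 0:
            # Fewer tasks than processes: under this rule every task is remainder
            proc = n_procs - 1
        else:
            # Tasks beyond batch * n_procs are clamped to the last process
            proc = min(j // batch, n_procs - 1)
        assignment.append(proc)
    return assignment


for n_tasks, n_procs in [(20, 6), (6, 3), (2, 5)]:
    print(n_tasks, n_procs, static_schedule(n_tasks, n_procs))
```

Note what happens for $n < m$ under this rule: all tasks land on the last process while the others stay idle. Whether that is acceptable depends on how you decide to treat the remainder.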

### Static scheduling with chunks

A first evolution of the most basic static scheduler above is the addition of chunks. It simply means that instead of dividing the tasks into evenly sized batches, a given chunk size is used and the scheduler distributes the chunks to the available processes in a round-robin fashion, as shown below for three participating processes, 10 tasks and a chunk size of two. If the number of chunks exceeds the number of available processes, the scheduler starts from the first process again when it reaches the last process.

<center><img src="figures/static_scheduling_chunk.png"/></center>

Write an application that, in addition to the number of tasks and processes as above, also receives an arbitrary chunk size and performs static scheduling with the given chunk size as described.

**Hints**:

- Think carefully about what you have to loop over to arrive at an elegant solution. In order to answer this question, you might want to think about the difference between static scheduling and static scheduling with chunks
- Think about [modular arithmetic](https://en.wikipedia.org/wiki/Modular_arithmetic), in particular the modulus operator (%), and how it could help determine the process responsible for a given task, or rather a given chunk of tasks

In [195]:
"""
Your task is to write a static scheduler with chunks according to the description above
""";

# Number of tasks
n_tasks = 21
# Number of processes
n_procs = 12
# chunk size
chunk_size = 4

# Code here

# Process numbering starts from 1
i = 1  # running chunk counter
print("---VERSION 1---")
# Loop over all tasks
for j in range(n_tasks):
    # Wrap the chunk counter around the number of processes (round-robin);
    # the modulus keeps this correct for any number of rounds
    print(f"Task {j+1} is assigned to Process {(i - 1) % n_procs + 1}")
    if j == chunk_size*i - 1:  # last task of the current chunk
        i += 1

print("\n")
print("---VERSION 2---")
for j in range(n_tasks):
    # Chunk index j // chunk_size, mapped onto the processes with the modulus
    i = (j // chunk_size) % n_procs
    print(f"Task {j+1} is assigned to Process {i+1}")

---VERSION 1---
Task 1 is assigned to Process 1
Task 2 is assigned to Process 1
Task 3 is assigned to Process 1
Task 4 is assigned to Process 1
Task 5 is assigned to Process 2
Task 6 is assigned to Process 2
Task 7 is assigned to Process 2
Task 8 is assigned to Process 2
Task 9 is assigned to Process 3
Task 10 is assigned to Process 3
Task 11 is assigned to Process 3
Task 12 is assigned to Process 3
Task 13 is assigned to Process 4
Task 14 is assigned to Process 4
Task 15 is assigned to Process 4
Task 16 is assigned to Process 4
Task 17 is assigned to Process 5
Task 18 is assigned to Process 5
Task 19 is assigned to Process 5
Task 20 is assigned to Process 5
Task 21 is assigned to Process 6


---VERSION 2---
Task 1 is assigned to Process 1
Task 2 is assigned to Process 1
Task 3 is assigned to Process 1
Task 4 is assigned to Process 1
Task 5 is assigned to Process 2
Task 6 is assigned to Process 2
Task 7 is assigned to Process 2
Task 8 is assigned to Process 2
Task 9 is assigned to Proc

Change the number of tasks and processes and the chunk size and make sure that your program produces the correct results, especially for corner cases such as when the chunk size exceeds the number of tasks, etc.
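
One way to probe such corner cases systematically is to wrap the index arithmetic in a small function and inspect the returned list. This is just a sketch: the name `chunked_schedule` and the 0-based process indices are assumptions made for this example.

```python
def chunked_schedule(n_tasks, n_procs, chunk_size):
    """Return the 0-based process index of every task for round-robin chunks."""
    if n_procs < 1 or chunk_size < 1:
        raise ValueError("n_procs and chunk_size must be positive")
    # j // chunk_size is the chunk a task belongs to; the modulus wraps the
    # chunk index around the available processes
    return [(j // chunk_size) % n_procs for j in range(n_tasks)]


# The setup from the figure: 10 tasks, three processes, chunk size two
print(chunked_schedule(10, 3, 2))
# Chunk size larger than the number of tasks: everything stays on one process
print(chunked_schedule(3, 4, 10))
```

With a chunk size of one, this reduces to a purely cyclic distribution of the tasks.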

## Matrix-vector multiplication

We now turn our attention to another ubiquitous operation in all of scientific computing: linear algebra, and in particular, matrix-vector multiplication. You'd be surprised how often this ostensibly simple operation emerges in various scientific applications.

At its core, matrix-vector multiplication is relatively simple. Given an $n \times m$ matrix $\mathbf{A}$ and an $m$-dimensional vector $x$, the $n$-dimensional result $b$ is defined as

\begin{equation}
    b = \mathbf{A} x, \quad b_{i} := \sum_{j=1}^{m} a_{i,j} x_{j} \quad \forall \quad i=1,\dots,n.
\end{equation}
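
As a small worked instance of this definition (the numbers are chosen here purely for illustration), take $n = 2$ and $m = 3$:

\begin{equation}
    \mathbf{A} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad
    x = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \quad
    b = \begin{pmatrix} 1 \cdot 1 + 2 \cdot 0 + 3 \cdot 2 \\ 4 \cdot 1 + 5 \cdot 0 + 6 \cdot 2 \end{pmatrix}
      = \begin{pmatrix} 7 \\ 16 \end{pmatrix}.
\end{equation}

Each entry of $b$ is the sum of the element-wise products of one row of $\mathbf{A}$ with $x$, so $b$ has as many entries as $\mathbf{A}$ has rows.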

It should of course be noted that this rather simple-looking operation has no shortage of intricacies and careers have been built on not much more than the humble matrix-vector multiplication. Nevertheless, we are going to concern ourselves with but the simplest setup in the following.

The goal is to write an application that computes the resultant vector $b = \mathbf{A} x$, given the matrix $\mathbf{A}$ and the vector $x$.

In [294]:
"""
Your task is to write a program that computes the resultant vector b = Ax, given a matrix A and a vector x.
""";

# Matrix A
A = [
     [1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
# Vector x
x = [1, 1, 1]

# Code here
# Manual coding
b1 = A[0][0] * x[0] + A[0][1] * x[1] + A[0][2] * x[2]
b2 = A[1][0] * x[0] + A[1][1] * x[1] + A[1][2] * x[2]
b3 = A[2][0] * x[0] + A[2][1] * x[1] + A[2][2] * x[2]
print([b1, b2, b3])

B = []
# Using for loops; the sizes are taken from A and x instead of being hard-coded
for i in range(len(A)):
    b = 0
    for j in range(len(x)):
        b += A[i][j] * x[j]
        print("Inner Loop")
        print(b)
    print("outer loop")
    print(b)
    B.append(b)

print(B)

[6, 15, 24]
Inner Loop
1
Inner Loop
3
Inner Loop
6
outer loop
6
Inner Loop
4
Inner Loop
9
Inner Loop
15
outer loop
15
Inner Loop
7
Inner Loop
15
Inner Loop
24
outer loop
24
[6, 15, 24]


Now let us generalize the task by taking the number of rows $n$ and number of columns $m$ from the user, initializing an $n \times m$ matrix $\mathbf{A}$ and an $m$-dimensional vector $x$ with random values, and computing the resultant vector $b = \mathbf{A}x$.

In [9]:
"""
Your task is to write a program that 
    - receives n and m from the user
    - initializes an nxm matrix A and an m-dimensional vector x with random values
    - computes the resultant vector b = Ax.
""";

import random

# random.random() returns a random float in the half-open interval [0.0, 1.0)

# Receive the number of rows and columns from the user

# Code here
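
One possible shape for the generalized version is sketched below. The helper names `random_matrix`, `random_vector` and `matvec` are made up for this example, and fixed sizes stand in for the user input so that the snippet runs unattended; in the notebook you would read them with something like `n = int(input("n: "))`.

```python
import random


def random_matrix(n, m):
    """Return an n x m matrix (a list of rows) of random floats in [0, 1)."""
    return [[random.random() for _ in range(m)] for _ in range(n)]


def random_vector(m):
    """Return a list of m random floats in [0, 1)."""
    return [random.random() for _ in range(m)]


def matvec(A, x):
    """Compute b = A x with two nested loops."""
    b = []
    for row in A:
        if len(row) != len(x):
            raise ValueError("row length must match the vector length")
        total = 0.0
        for a_ij, x_j in zip(row, x):
            total += a_ij * x_j
        b.append(total)
    return b


# Fixed sizes used here in place of values read from the user
n, m = 4, 3
A = random_matrix(n, m)
x = random_vector(m)
b = matvec(A, x)
print(b)
```

Since the inputs are random, a quick sanity check is to call `matvec` on the fixed $3 \times 3$ example from the previous cell and compare the result with `[6, 15, 24]`.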

## Homework

### Dynamic scheduling

In certain scenarios, e.g., when the computational intensity of the tasks is inhomogeneous, static scheduling generally fails to provide the most efficient use of resources. Try to validate this for yourself!

In such cases, dynamic scheduling provides a more suitable scheduling technique, in which each worker is given one task, or more generally a chunk of tasks, and the scheduler assigns the next task(s) to the first worker that becomes available.
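
A quick way to get a feeling for the difference is to compare the total finishing time of both strategies on paper, without running any real processes. The sketch below is a pure simulation: the function names and the task durations are hypothetical, the static variant uses contiguous batches with the remainder on the last worker, and the dynamic variant uses a chunk size of one.

```python
import heapq


def static_makespan(durations, n_workers):
    """Finishing time when contiguous, equally sized batches are fixed in advance."""
    batch = len(durations) // n_workers
    loads = [0] * n_workers
    for j, duration in enumerate(durations):
        worker = min(j // batch, n_workers - 1) if batch else n_workers - 1
        loads[worker] += duration
    return max(loads)


def dynamic_makespan(durations, n_workers):
    """Finishing time when every task goes to the worker that becomes free first."""
    free_at = [0] * n_workers  # min-heap of the times at which workers become free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)


durations = [8, 7, 1, 1, 1, 1]  # hypothetical task lengths in seconds
print(static_makespan(durations, 3))   # 15
print(dynamic_makespan(durations, 3))  # 8
```

In this example the static scheduler places both long tasks on the same worker, whereas the dynamic one spreads them out. For tasks of equal length, both strategies finish at the same time, so the benefit really does come from the inhomogeneity.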

Since the implementation of a dynamic scheduler, or even of an application that mimics its behavior, requires some background functionality that falls outside the scope of this exercise, I am going to set up some of it in the cell below; you can therefore run it and safely ignore its content. Note that the number of tasks and the number of processes are nevertheless defined there.

In addition, we are going to assume that the chunk size is one in order to make things even simpler.

In [2]:
"""
The following sets up some stuff for the dynamic scheduler that is out of the scope of this exercise.
You can safely run the cell and ignore its content.

""";

from multiprocessing import Process
from random import randrange
import time

# Number of tasks
n_tasks = 21
# Number of processes
n_procs = 12

tasks = []
for i_task in range(0, n_tasks):
    tasks.append(randrange(1, 10))

procs = []
for i_proc in range(0, n_procs):
    procs.append(Process())
    procs[i_proc].start()
    procs[i_proc].join()

def work(task):
    time.sleep(task)

def is_free(i_proc):
    return not procs[i_proc].is_alive()

def submit(i_proc, task):
    if (not procs[i_proc].is_alive()):
        procs[i_proc].join()
        procs[i_proc] = Process(target = work, args = (task,))
        procs[i_proc].start()
    else:
        raise RuntimeError("Process is not free")

With the help of the infrastructure above, you can implement the dynamic scheduler. The code above already defines a list of tasks and processes.

**Hints**:

- The dynamic scheduler should have a loop that runs for as long as there are tasks to be assigned
- Each process should be queried for availability before attempting to assign a task to it
- The current task should be assigned to the first process that is available
- Note that it is not important how the infrastructure above is implemented
- Make sure that you read the notes below on the technical usage of the infrastructure

In [3]:
"""
Your task is to write a dynamic scheduler with chunks according to the description above.

Here's some information about what is available from the code above:
    - The processes are stored in the list "procs"
    - The tasks are stored in the list "tasks"
    - is_free(i) returns True if Process i is available
    - submit(i, tasks[j]) submits Task j to Process i. You will receive an error if you submit a task to a process that
      is not free
    - Task j takes approximately tasks[j] seconds to complete
""";

# The current task to be assigned
i_task = 0
# Loop as long as there are tasks to be assigned
    # Loop through the processes
        # Check if the process is free
            # Submit task to the process if it is free

print ("\nAll tasks finished.")


Submitting Task 0 (approximately 3 seconds) is assigned to Process 0

Submitting Task 1 (approximately 7 seconds) is assigned to Process 1

Submitting Task 2 (approximately 4 seconds) is assigned to Process 2

Submitting Task 3 (approximately 1 seconds) is assigned to Process 3

Submitting Task 4 (approximately 7 seconds) is assigned to Process 4

Submitting Task 5 (approximately 5 seconds) is assigned to Process 5

Submitting Task 6 (approximately 7 seconds) is assigned to Process 6

Submitting Task 7 (approximately 1 seconds) is assigned to Process 7

Submitting Task 8 (approximately 2 seconds) is assigned to Process 8

Submitting Task 9 (approximately 1 seconds) is assigned to Process 9

Submitting Task 10 (approximately 2 seconds) is assigned to Process 10

Submitting Task 11 (approximately 8 seconds) is assigned to Process 11

Submitting Task 12 (approximately 4 seconds) is assigned to Process 3

Submitting Task 13 (approximately 8 seconds) is assigned to Process 7

Submitting Ta

Run the program multiple times. You should be able to observe that the order of execution does not remain constant as it depends on the random workload assigned to each process.

Change the number of tasks and processes in the previous cell and run the scheduler again. Does it produce the correct results?

How about when `n_tasks` < `n_procs`?
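
If you want to reason about the loop structure from the hints without waiting for real processes to finish, a simulated clock can stand in for the infrastructure. In the sketch below, `simulate_dynamic`, `busy_until` and `clock` are hypothetical names introduced for this example; the nested `is_free` and `submit` only imitate the helpers from the setup cell and do not start any processes.

```python
def simulate_dynamic(tasks, n_procs):
    """Return (task, process, start time) triples for a chunk size of one."""
    busy_until = [0] * n_procs  # simulated time at which each process is free again
    clock = 0                   # simulated time in seconds

    def is_free(i_proc):
        return busy_until[i_proc] <= clock

    def submit(i_proc, task):
        if not is_free(i_proc):
            raise RuntimeError("Process is not free")
        busy_until[i_proc] = clock + task

    log = []
    i_task = 0
    # Loop as long as there are tasks to be assigned
    while i_task < len(tasks):
        # Loop through the processes and hand the next task to each free one
        for i_proc in range(n_procs):
            if i_task < len(tasks) and is_free(i_proc):
                submit(i_proc, tasks[i_task])
                log.append((i_task, i_proc, clock))
                i_task += 1
        clock += 1  # a real scheduler would simply keep polling here
    return log


for i_task, i_proc, start in simulate_dynamic([3, 1, 2, 1, 2], 2):
    print(f"Task {i_task} -> Process {i_proc} at t = {start}")
```

Note the second check of `i_task` inside the inner loop: without it, the scheduler would try to read past the end of the task list when several processes are free at once, which is exactly the situation for `n_tasks` < `n_procs`.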

### Putting everything together

The more perceptive amongst you may already have anticipated what we are going to do next. To recap, we have created a couple of programs that target task scheduling. It was not important *how* the tasks were executed; however, we saw that, at least in principle, we could schedule tasks to be executed in a few different ways. On the other hand, we tackled arguably one of the most important operations in scientific computing, namely matrix-vector multiplication.

We can now use matrix-vector multiplication as the task for dynamic scheduling. In order to simplify the task even further, let us stay with square $n \times n$ matrices. Your task is to write a program that generates $m$ random positive integers up to a given upper limit, where each number represents a matrix-vector multiplication task. More specifically, task $t_{i}, i = 0, \dots, m - 1$ represents the computation of $b = \mathbf{A}x$, where $\mathbf{A}$ is a randomly initialized matrix of size $t_{i} \times t_{i}$ and $x$ is a randomly initialized vector of size $t_{i}$.

In [3]:
"""
Your task is to write a program that

    - generates a given number of tasks, where each task represents an nxn matrix-vector multiplication,
      where n is a randomly generated positive integer smaller than a given upper bound,
    - for each task, initializes a matrix A and a vector x of the correct size with random values and
    - performs the matrix-vector multiplication b = Ax
""";

from multiprocessing import Process
import random
from random import randrange
import time

# Number of tasks
n_tasks = 100
# Maximum task size
max_size = 10000

# Generate tasks
tasks = []
# Code here
# randrange(1, n) returns a random integer from 1 to n - 1 (n itself is excluded)

def work(n):
    # Code here
    # random.random() returns a random float between 0. and 1.
    
    # Initialize Matrix A

    # Initialize Vector x

    # Initialize Vector b

    # Compute b = Ax
    pass  # placeholder so that the empty template still parses; remove once implemented

# The dynamic scheduler
# The scheduler now automatically submits the work defined above
# The code remains the same as the exercise above; therefore, it can be reused
# Code here
# The current task to be assigned
i_task = 0
# Loop as long as there are tasks to be assigned
    # Loop through the processes
        # Check if the process is free
            # Submit task to the process if it is free

print ("\nAll tasks finished.")

10

Submitting Task 0 (size = 9382) to Process 0

Submitting Task 1 (size = 387) to Process 1

Submitting Task 2 (size = 1254) to Process 2

Submitting Task 3 (size = 823) to Process 3

Submitting Task 4 (size = 1012) to Process 4

Submitting Task 5 (size = 3152) to Process 5

Submitting Task 6 (size = 5043) to Process 6

Submitting Task 7 (size = 3645) to Process 7

Submitting Task 8 (size = 3610) to Process 8

Submitting Task 9 (size = 3811) to Process 9

All tasks finished.


Copyright 2024 &copy; Manuel Saberi, High Performance Computing, Ruhr University Bochum. All rights reserved. No part of this notebook may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher.