## Exercise 1: Use of registers

**Create a vector X of N random numbers, where N is on the order of 1e6 to 1e8 (depending on the speed of your computer).**

**Create the following implementations to calculate the difference between consecutive elements in X, resulting in a vector Y with N-1 elements:**
1. Use a regular for loop and calculate the difference as Y(i) = X(i+1) - X(i), where X and Y are implemented as Python lists.
2. Extend the above program with intermediate variables (e.g. x_next and x_now) to store the X(i+1) value for the next iteration.
3. Same as 1, but store X and Y as NumPy arrays.
4. Same as 2, but store X and Y as NumPy arrays.
5. Use a "diff" function to compute the result thereby exploiting vector computation (wide registers) - in Python this function is "numpy.diff". Remember to include "import numpy".

**Measure the execution time of all implementations and explain the difference in performance.**

In [1]:
import numpy as np
import random as rnd
from timeit import default_timer as timer

In [2]:
N = 10000000
X = [rnd.randint(1, 3) for i in range(N)]
X_np = np.random.randint(1, 4, N)  # NumPy's upper bound is exclusive, so this matches rnd.randint(1, 3)

In [3]:
# direct calculation
Y = []
start_time = timer()
for i in range(1, N):
    Y.append(X[i] - X[i-1])
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")

Elapsed time: 0.9888120129980962


In [4]:
# intermediate variables
Y = []
start_time = timer()
for i in range(1, N):
    x_current = X[i]
    x_prev = X[i-1]
    Y.append(x_current - x_prev)
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")

Elapsed time: 1.3310955070010095
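
The exercise asks for the X(i+1) value to be kept for the next iteration, so that each element is read from the list only once. A minimal sketch of that variant follows; the smaller `N` is an assumption chosen so the sketch runs quickly, and no timing is claimed for it.

```python
import random as rnd
from timeit import default_timer as timer

N = 1_000_000  # assumption: smaller than the N used above, to keep the sketch fast
X = [rnd.randint(1, 3) for _ in range(N)]

Y = []
start_time = timer()
x_prev = X[0]                     # value carried over between iterations
for i in range(1, N):
    x_current = X[i]              # a single list lookup per iteration
    Y.append(x_current - x_prev)
    x_prev = x_current            # reuse this value in the next iteration
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")
```

Compared with the cell above, this version performs one indexing operation per iteration instead of two, at the cost of an extra assignment.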


In [5]:
# Y as a NumPy array (note: X below is still the Python list, not X_np)
Y = np.zeros(N - 1)  # preallocate memory for the N-1 differences
start_time = timer()
for i in range(1, N):
    Y[i-1] = X[i] - X[i-1]
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")

Elapsed time: 1.5559040719999757


In [6]:
# Y as a NumPy array with intermediate variables (X is still the Python list)
Y = np.zeros(N - 1)  # preallocate memory for the N-1 differences
start_time = timer()
for i in range(1, N):
    x_current = X[i]
    x_prev = X[i-1]
    Y[i-1] = x_current - x_prev
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")

Elapsed time: 1.9672196399988024
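
The two cells above write into a NumPy array but still read from the Python list `X`. A sketch of the loop with both input and output stored as NumPy arrays could look as follows; the smaller `N`, the name `Y_np`, and the integer dtype for the output are assumptions made for this illustration.

```python
import numpy as np
from timeit import default_timer as timer

N = 1_000_000  # assumption: smaller than the N used above, to keep the sketch fast
X_np = np.random.randint(1, 4, N)          # values 1..3, upper bound exclusive
Y_np = np.zeros(N - 1, dtype=X_np.dtype)   # preallocated output, same dtype as the input

start_time = timer()
for i in range(1, N):
    # each X_np[i] creates a NumPy scalar object before the subtraction
    Y_np[i - 1] = X_np[i] - X_np[i - 1]
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")
```

Element-by-element indexing of a NumPy array from Python typically costs more than indexing a list, so a loop like this is not expected to benefit from the array storage on its own.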


In [7]:
# using np.diff (X is a Python list here, so the timing includes the list-to-array conversion)
start_time = timer()
np.diff(X)
end_time = timer()
print(f"Elapsed time: {end_time - start_time}")

Elapsed time: 0.33333928600040963


## Memory Organization - C vs Fortran

**We have 6 elements stored contiguously in memory in the order: 1, 2, 3, 4, 5, 6. In the following, we read this contiguous data into arrays in different ways. What do the arrays look like if we read the data as:**

1. a 2x3 matrix treating data as column-major (Fortran style) as F2x3?

\begin{matrix}
1 & 3 & 5\\
2 & 4 & 6
\end{matrix}

2. a 3x2 matrix treating data as column-major (Fortran style) as F3x2?

\begin{matrix}
1 & 4\\
2 & 5\\
3 & 6
\end{matrix}

3. a 2x3 matrix treating data as row-major (C style) as C2x3?

\begin{matrix}
1 & 2 & 3\\
4 & 5 & 6
\end{matrix}

4. a 3x2 matrix treating data as row-major (C style) as C3x2?

\begin{matrix}
1 & 2\\
3 & 4\\
5 & 6
\end{matrix}

**Explain the relations between the different matrices and how this may be utilized.**

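Whether we read the data in Fortran (column-major) or C (row-major) style determines which matrix the same block of memory represents. Reading the data column-major into an m x n matrix gives the transpose of reading it row-major into an n x m matrix: F2x3 is the transpose of C3x2, and F3x2 is the transpose of C2x3. A transpose can therefore be obtained without moving any data, simply by reinterpreting the layout. For performance, we ideally want the layout that matches the order in which we access the elements, i.e. sequentially in memory.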
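
The four matrices and the relations between them can be checked with NumPy's `reshape` and its `order` argument. This is a small sketch; the variable names mirror the labels used in the questions above.

```python
import numpy as np

data = np.arange(1, 7)  # the contiguous values 1, 2, 3, 4, 5, 6

# Interpret the same data column-major (Fortran) and row-major (C)
F2x3 = data.reshape((2, 3), order="F")
F3x2 = data.reshape((3, 2), order="F")
C2x3 = data.reshape((2, 3), order="C")
C3x2 = data.reshape((3, 2), order="C")

for name, A in (("F2x3", F2x3), ("F3x2", F3x2), ("C2x3", C2x3), ("C3x2", C3x2)):
    print(name)
    print(A)

# Column-major m x n equals the transpose of row-major n x m
print(np.array_equal(F2x3, C3x2.T))   # True
print(np.array_equal(F3x2, C2x3.T))   # True

# The transpose is a view on the same memory, not a copy
print(np.shares_memory(C3x2, C3x2.T))  # True
```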

In [8]:
N = 1000000
M = 100
X = np.random.randint(1, 3, (N, M))
Y = np.random.randint(1, 3, (M, N))

In [9]:
# sums for X
start_time = timer()
X_row_sum = np.sum(X, axis=1)
end_time = timer()
print(f"Elapsed time for rows: {end_time - start_time}, summed: {len(X_row_sum)} rows")

start_time = timer()
X_col_sum = np.sum(X, axis=0)
end_time = timer()
print(f"Elapsed time for cols: {end_time - start_time}, summed: {len(X_col_sum)} cols")

Elapsed time for rows: 0.05388033000053838, summed: 1000000 rows
Elapsed time for cols: 0.04886839500250062, summed: 100 cols


In [10]:
# sums for Y
start_time = timer()
Y_row_sum = np.sum(Y, axis=1)
end_time = timer()
print(f"Elapsed time for rows: {end_time - start_time}, summed: {len(Y_row_sum)} rows")

start_time = timer()
Y_col_sum = np.sum(Y, axis=0)
end_time = timer()
print(f"Elapsed time for cols: {end_time - start_time}, summed: {len(Y_col_sum)} cols")

Elapsed time for rows: 0.04925018100038869, summed: 100 rows
Elapsed time for cols: 0.08068446600009338, summed: 1000000 cols
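
Both `X` and `Y` above are created in NumPy's default C (row-major) order. To look at the effect of the memory layout itself, the same values can be stored in Fortran order and summed along both axes. The following is a sketch; the smaller `N` and the names `X_c` and `X_f` are assumptions, and the measured times will depend on the machine.

```python
import numpy as np
from timeit import default_timer as timer

N = 100_000  # assumption: smaller than the N used above, to keep the sketch fast
M = 100
X_c = np.random.randint(1, 3, (N, M))  # C order: each row is contiguous
X_f = np.asfortranarray(X_c)           # same values, each column is contiguous

print(X_c.flags["C_CONTIGUOUS"], X_f.flags["F_CONTIGUOUS"])
print(X_c.strides, X_f.strides)        # bytes to step along each axis

for name, A in (("C order", X_c), ("Fortran order", X_f)):
    for axis in (0, 1):
        start_time = timer()
        A.sum(axis=axis)
        end_time = timer()
        print(f"{name}, axis={axis}: {end_time - start_time}")
```

The strides show which axis is contiguous in memory: the axis with the smallest stride is the one that can be traversed sequentially.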
