# Exercises 1.1 - Use of register

- Create a vector X of N random numbers, where N is in the order of 1e6 to 1e8 (depending on the speed of your computer).
- Create the following implementations to calculate the difference between the consecutive elements in X: (resulting in a vector Y with N-1 elements)
  1. Use a regular for loop and calculate the difference as Y(i) = X(i+1) - X(i), where X and Y are implemented as python lists.
  2. Extend the above program with intermediate variables (e.g. x_next and x_now) to store the X(i+1) value for the next iteration.
  3. Same as 1, but store X and Y as numpy arrays.
  4. Same as 2, but store X and Y as numpy arrays.
  5. Use a diff-function to compute the result thereby exploiting vector computation (wide registers) - in Python this function is "numpy.diff". Remember to include "import numpy".
- Measure the execution time of all implementations and explain the difference in performance.


In [1]:
# Importing libraries
import numpy as np
import random as rnd
import time

In [2]:
# Creating different implementations
def simple_indexing(x,y,n):
    t1 = time.time()
    for i in range(n-1):
        y[i] = x[i+1] - x[i]
    t2 = time.time()
    
    return t2-t1
    
def intermediate_vars(x,y,n):
    t1 = time.time()
    x_now = x[0]
    for i in range(n-1):
        x_next = x[i+1]
        y[i] = x_next - x_now
        x_now = x_next
    t2 = time.time()
    
    return t2-t1

def np_function(x):
    t1 = time.time()
    np.diff(x)
    t2 = time.time()
    
    return t2-t1
    
# Random number generator
def random_number(N):
    return [rnd.random() for _ in range(N)]

In [3]:
# Setting up variables
N = int(1e6)
X = random_number(N)
Y = [0] * N

print("Python lists:       ",simple_indexing(X,Y,N))
print("Python intermediate:",intermediate_vars(X,Y,N))

X = np.array(X)
Y = np.zeros(N)
print("Numpy lists:        ",simple_indexing(X,Y,N))
print("Numpy intermediate: ",intermediate_vars(X,Y,N))
print("Numpy diff function:",np_function(X))

Python lists:        0.07985734939575195
Python intermediate: 0.08084487915039062
Numpy lists:         0.29164648056030273
Numpy intermediate:  0.21327924728393555
Numpy diff function: 0.0014886856079101562


Results indicate that in the case of python lists, calculating the difference between two elements is (generally) faster by simple indexing, than assigning intermediate variables.

As for numpy arrays, the execution times shows that calculating the difference between two elements is a lot slower than using python lists. However, a slight speedup is seen when using intermediate variables.

Using the built-in numpy function (np.diff()) which uses vectorized operations is a lot faster than traditional python lists or python arrays.


# Exercises 1.2 - Memory organization - C vs Fortran

##### Part A - Theoretical

- We have 6 elements stored contiguous in memory in the order: 1, 2, 3, 4, 5, 6. In the following, we read this contiguous data into arrays in different ways. What do the arrays look like if we read the data as:

  1. a 2x3 matrix treating data as column-major (Fortran style) as F2x3?
  2. a 3x2 matrix treating data as column-major (Fortran style) as F3x2?
  3. a 2x3 matrix treating data as row-major (C style) as C2x3?
  4. a 3x2 matrix treating data as row-major (C style) as C3x2?

- Explain the relations between the different matrices and how this may be utilized.


F2x3 (column-major, 2x3):
| 1 | 3 | 5 |
|---|---|---|
| 2 | 4 | 6 |

F3x2 (column-major, 3x2):
| 1 | 4 |
|---|---|
| 2 | 5 |
| 3 | 6 |

C2x3 (row-major, 2x3):
| 1 | 2 | 3 |
|---|---|---|
| 4 | 5 | 6 |

C3x2 (row-major, 3x2):
| 1 | 2 |
|---|---|
| 3 | 4 |
| 5 | 6 |

F3x2 column-major and C2x3 row-major both allows for the memory to be contiguous, and is a benefit when doing column/row-wise matrix operations respectively.


##### Part B - Practical

- Generate a random vector X with dimension N x M and another vector Y with opposite dimensions M x N, where N >> M, e.g. N = 100.000, M = 100.
- Make a program with two functions: one that loops over each row and calculates the row-sum (using numpy.sum()) and one that does the same, but loops over each column.
- Measure execution speed for each orientation for each for the two vectors.
- Do these results match your expectation given the memory layout difference between Fortran (Matlab) and C (Python)?
  - In Python: if this was implemented with a 2D list, you will probably not see a big difference. Why not?
- Extra info: In Python Numpy you can specify the memory layout for an array explicitly using the keyword order=‘C’ or order=‘F’.


In [4]:
def row_wise():
    return


def column_wise():
    return


def np_random_number(n, m):
    return np.random.random((n, m))

In [5]:
# Setting up variables
N = 100_000
M = 100

X = np_random_number(N,M)
Y = np_random_number(N,M)

print("X row-wise:   ")
print("X column-wise:")
print("Y row-wise:   ")
print("Y column-wise:")

X row-wise:   
X column-wise:
Y row-wise:   
Y column-wise:
