# Understanding the `order` Parameter in NumPy Arrays

In this notebook, we will explore the concept of memory ordering in NumPy arrays, specifically focusing on the `order` parameter. Understanding how data is stored in memory is crucial for efficient computations, particularly when working with multi-dimensional arrays or tensors in scientific computing and machine learning.

## Table of Contents
1. What is the `order` Parameter?
2. Row-Major (C-style) vs Column-Major (Fortran-style) Ordering
3. Ordering for multi-dimensional arrays (tensors)
4. Performance Implications of Different Memory Orders
5. Permuting and Reshaping
6. Manipulating Strides for Advanced Array Operations
7. Exercises


## 1. What is the `order` Parameter?

The `order` parameter in NumPy determines how a multi-dimensional array is stored in memory:
- `order='C'` (C-style): Row-major order, where rows are stored one after the other.
- `order='F'` (Fortran-style): Column-major order, where columns are stored one after the other.
- `order='A'` (any order): Uses Fortran order if the input array is Fortran-contiguous (and not also C-contiguous), and C order otherwise.
- `order='K'` (keep order): Matches the memory layout of the input array as closely as possible.

Knowing what this parameter does helps when optimizing array operations, because the memory layout determines which access patterns are cheap and which operations need to copy data.
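
As a small illustration of the four options, the sketch below copies the same Fortran-contiguous input with each value of `order` and prints the resulting contiguity flags. The names `f_input` and `copies` are chosen only for this example.

```python
import numpy as np

# An F-contiguous input (shape (2, 3), so it is not also C-contiguous).
f_input = np.asfortranarray(np.arange(6).reshape(2, 3))

# Copy the same data while requesting each of the four orders.
copies = {order: np.array(f_input, order=order) for order in ("C", "F", "A", "K")}

for order, arr in copies.items():
    print(order,
          "C-contiguous:", arr.flags["C_CONTIGUOUS"],
          "F-contiguous:", arr.flags["F_CONTIGUOUS"])
```

For this input, only `order='C'` changes the layout; `'A'` and `'K'` both follow the Fortran layout of the input.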

## 2. Row-Major (C-style) vs Column-Major (Fortran-style) Ordering

In **row-major (C-style) ordering**, the elements of a row are stored in contiguous memory locations. This is the default ordering for NumPy arrays. In **column-major (Fortran-style) ordering**, the elements of a column are stored in contiguous memory locations. This ordering is commonly used in scientific computing environments like Fortran and MATLAB.

### Row-Major (C-style) Example
Consider a 2D array:
$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} $

In row-major order, this array is stored in memory as: `[1, 2, 3, 4, 5, 6]`.

In [64]:
import numpy as np

# Default C-style ordering
matrix_c = np.array([[1, 2, 3], [4, 5, 6]], order='C')
print("C-style ordering:", np.ravel(matrix_c, order='K'))

C-style ordering: [1 2 3 4 5 6]



### Column-Major (Fortran-style) Example
The same array in column-major order is stored as: `[1, 4, 2, 5, 3, 6]`.

In [65]:
# Fortran-style ordering
matrix_f = np.array(matrix_c, order='F', copy=True)
print("Fortran-style ordering:", np.ravel(matrix_f, order='K'))

Fortran-style ordering: [1 4 2 5 3 6]


You can see whether an array is C-contiguous or Fortran-contiguous by checking the `flags` attribute:

In [66]:
matrix_c.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [67]:
matrix_f.flags

  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
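
Individual flags can also be read one at a time, which is convenient inside conditionals. The sketch below rebuilds the two matrices from above so that it runs on its own; `vector` is an extra array introduced only for this example.

```python
import numpy as np

matrix_c = np.array([[1, 2, 3], [4, 5, 6]], order="C")
matrix_f = np.array(matrix_c, order="F")

# A single flag can be read by key or by lowercase attribute.
c_is_c = matrix_c.flags["C_CONTIGUOUS"]
f_is_f = matrix_f.flags.f_contiguous

# A 1D array has only one possible layout, so both flags are set.
vector = np.arange(6)
vector_flags = (vector.flags.c_contiguous, vector.flags.f_contiguous)

print(c_is_c, f_is_f, vector_flags)
```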

## 3. Ordering for multi-dimensional arrays (tensors)

We've seen how C-style and Fortran-style ordering work for matrices. In C-style ordering, elements are ordered first by the row index, then by the column index, whereas in Fortran-style ordering the column index takes precedence. Tensors don't just have rows and columns but an arbitrary number of indices, so we need to generalise how elements are ordered.

### C-style ordering for tensors

In **C-style (row-major) order**, tensor elements are ordered first by the first index, then by the second, then the third, and so on. In other words, elements are stored in memory such that **the last axis (index) changes fastest**, and the first axis changes slowest.

Consider a 3D array as an example:

$$
A[i, j, k]
$$

In C-style order:
- The last index `k` changes the fastest.
- The second-to-last index `j` changes next.
- The first index `i` changes the slowest.

For example, if the shape of A was `(2, 3, 4)`, the memory layout in C-style order would be:

$$
[A[0, 0, 0], A[0, 0, 1], A[0, 0, 2], A[0, 0, 3], A[0, 1, 0], A[0, 1, 1], ..., A[1, 2, 3]]
$$

Here, the last index `k` changes first, then `j`, and finally `i`.
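
The position of `A[i, j, k]` in this layout can be written as a formula. The following is a minimal sketch, assuming the shape `(2, 3, 4)` used above; `c_offset` is a helper written for this example, not a NumPy function.

```python
import numpy as np

shape = (2, 3, 4)
# Fill A so that each element's value equals its position in C-order memory.
A = np.arange(np.prod(shape)).reshape(shape)

def c_offset(i, j, k, shape):
    """Position of A[i, j, k] in the flattened C-ordered buffer."""
    _, n_j, n_k = shape
    return (i * n_j + j) * n_k + k

print(c_offset(0, 1, 0, shape))  # 4: A[0, 1, 0] is the fifth element
print(c_offset(1, 2, 3, shape))  # 23: A[1, 2, 3] is the last element
```

NumPy provides the same computation as `np.ravel_multi_index((i, j, k), shape)`.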

### **Lexicographical Ordering and C-Style Order**

C-style ordering is equivalent to **lexicographical ordering**. It works as follows:

1. Start with the leftmost index and compare.
2. If the leftmost indices are the same, compare the next index to the right.
3. Continue this process until a difference is found or all indices are compared.

Lexicographical ordering is how words are ordered in a dictionary. Compare the first 7 words of the dictionary with the way the array elements are stored:

1. A
2. Aa
3. Aardvark
4. Aardwolf
5. Aasvogel
6. Ab
7. Abaca
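
The same comparison can be made in code. Python sorts tuples lexicographically, so sorting all index tuples should reproduce the order in which NumPy visits the elements of a C-ordered array. This is a small sketch, again assuming the shape `(2, 3, 4)`.

```python
import itertools
import numpy as np

shape = (2, 3, 4)

# np.ndindex walks the indices in C order (last index changes fastest).
c_order_indices = list(np.ndindex(*shape))

# Python compares tuples lexicographically, so sorted() gives dictionary order.
lexicographic = sorted(itertools.product(*(range(n) for n in shape)))

print(c_order_indices[:5])
print(c_order_indices == lexicographic)
```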

### **Strides**

When considering how elements are ordered in NumPy arrays, it is important to understand the **strides** attribute. The strides of a NumPy array are the number of bytes that must be skipped in memory to move to the next position along each dimension of the array. An array that is stored in C-contiguous order will have different strides than an array with the same shape stored in F-contiguous (Fortran) order.

In [68]:
# Create a 2D C-style ordered NumPy array
array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("Array:")
print(array)

# Display the strides of the array
print(f"Strides of the array: {array.strides}")
print(f"Data type: {array.dtype}")


Array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Strides of the array: (32, 8)
Data type: int64


### Explanation of Strides Output

The output `(32, 8)` indicates the number of bytes that must be moved in memory to proceed to the next row and the next column, respectively:

- `32` bytes to move to the next row (a row contains 4 elements of 8 bytes each, because the data type is `int64`)
- `8` bytes to move to the next column (as each element is 8 bytes)

**In a C-contiguous array the strides are always in decreasing order** (ignoring axes of length 1, whose stride is never used to step between elements).

In [69]:
# Create random C-contiguous tensor
array = np.random.rand(3, 5, 2, 4)
print(f"Strides of C-contiguous tensor are in decreasing order: {array.strides}")


Strides of C-contiguous tensor are in decreasing order: (320, 64, 32, 8)
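
These strides follow directly from the shape and the item size: the last axis steps by one item, and each earlier axis steps by the total size of everything to its right. The sketch below uses a helper `c_strides` written for this example (not part of NumPy) and also shows how strides turn an index into a byte offset.

```python
import numpy as np

def c_strides(shape, itemsize):
    """Byte strides of a C-contiguous array with the given shape."""
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)   # stride of the current axis
        step *= n              # the next axis to the left skips n of these
    return tuple(reversed(strides))

shape = (3, 5, 2, 4)
array = np.zeros(shape, dtype=np.float64)
print(c_strides(shape, array.itemsize))  # (320, 64, 32, 8)

# The byte offset of an element is the dot product of its index and the strides.
index = (2, 4, 1, 3)
byte_offset = sum(i * s for i, s in zip(index, array.strides))
print(byte_offset // array.itemsize)     # 119, the last of the 120 elements
```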




### Fortran-style ordering for tensors

In **Fortran-style (column-major) order**, the first index changes fastest and the last index changes slowest.

Consider again the 3D array:

$$
A[i, j, k]
$$

In Fortran-style order:
- The first index `i` changes the fastest.
- The second index `j` changes next.
- The last index `k` changes the slowest.

For the same shape `(2, 3, 4)`, the memory layout in Fortran-style order would be:
$$
[A[0,0,0],A[1,0,0],A[0,1,0],A[1,1,0],A[0,2,0],A[1,2,0],A[0,0,1],A[1,0,1],…,A[1,2,3]]
$$
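
The Fortran-order position of `A[i, j, k]` can be written as a formula too, with the roles of the indices reversed. This is a minimal sketch for the shape `(2, 3, 4)`; `f_offset` is a helper written for this example.

```python
import numpy as np

shape = (2, 3, 4)
# Fill A so that each element's value equals its position in Fortran-order memory.
A = np.arange(np.prod(shape)).reshape(shape, order="F")

def f_offset(i, j, k, shape):
    """Position of A[i, j, k] in the flattened Fortran-ordered buffer."""
    n_i, n_j, _ = shape
    return i + n_i * (j + n_j * k)

# Sort all indices by their Fortran offset and show the first eight.
first_eight = sorted(np.ndindex(*shape), key=lambda idx: f_offset(*idx, shape))[:8]
print(first_eight)
```

The printed indices match the start of the layout listed above.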

### **Colexicographical Ordering**

Fortran-style ordering is equivalent to **colexicographical ordering**, a less common variant of lexicographical ordering in which indices are compared starting from the rightmost one instead of the leftmost.

**In an F-contiguous array the strides are always in increasing order** (ignoring axes of length 1).


In [70]:
# Create random F-contiguous tensor
array = np.array(np.random.rand(3, 5, 2, 4), order='F')
print(f"Strides of F-contiguous tensor are in increasing order: {array.strides}")

Strides of F-contiguous tensor are in increasing order: (8, 24, 120, 240)
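
To connect colexicographical ordering with these strides, the sketch below sorts all index tuples by their reversed form and checks that walking them in that order visits memory sequentially. The variable names are chosen for this example only.

```python
import numpy as np

shape = (3, 5, 2, 4)
f_array = np.zeros(shape, dtype=np.float64, order="F")

# Colexicographical order: compare indices starting from the rightmost one.
colex = sorted(np.ndindex(*shape), key=lambda idx: idx[::-1])

# Byte offset of each element, computed from the strides.
byte_offsets = [sum(i * s for i, s in zip(idx, f_array.strides)) for idx in colex]

# Sequential memory means offsets 0, 8, 16, ... with no gaps.
is_sequential = byte_offsets == [f_array.itemsize * n for n in range(f_array.size)]

print(f_array.strides)
print(is_sequential)
```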


## 4. Performance Implications of Different Memory Orders

The choice of memory order can have a significant impact on performance, especially for large arrays. Contiguous access patterns (accessing data in the order it is stored in memory) are generally faster because they take advantage of CPU caching.
Let's measure the performance difference between row-wise and column-wise access for a large 2D array.

In [71]:
# Create a large 2D array with C-ordering
large_array = np.random.rand(1000, 1000)
# Check the ordering
large_array.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [72]:
%%timeit
# Measure column access time
col_sum = np.sum(large_array[:, 100])


2.41 μs ± 221 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [73]:
%%timeit
# Measure row access time
row_sum = np.sum(large_array[100, :])

1.65 μs ± 29.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
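
The difference comes from the strides of the two slices. The sketch below inspects them and repeats the measurement with the standard `timeit` module, so that it also runs outside a notebook. No timings are quoted here because they depend on the machine.

```python
import timeit
import numpy as np

large_array = np.random.rand(1000, 1000)

row = large_array[100, :]   # 1000 neighbouring elements
col = large_array[:, 100]   # 1000 elements, each a full row apart

print("row strides:", row.strides, "contiguous:", row.flags["C_CONTIGUOUS"])
print("col strides:", col.strides, "contiguous:", col.flags["C_CONTIGUOUS"])

# Time 1000 repetitions of each sum; the absolute numbers vary between machines.
row_time = timeit.timeit(lambda: np.sum(row), number=1000)
col_time = timeit.timeit(lambda: np.sum(col), number=1000)
print(f"row: {row_time:.4f} s, column: {col_time:.4f} s")
```

Reading a row touches one block of 8000 bytes, whereas reading a column touches 1000 locations that are 8000 bytes apart.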


## 5. Permuting and Reshaping

In the previous notebook we looked at permuting and reshaping tensors. We saw that `np.transpose` always returns a view of the array, only changing the `shape` and the `strides` attributes, without copying the data. Reshaping, on the other hand, sometimes requires copying data. Now that we have a better understanding of how NumPy arrays are stored, we can explain this behaviour.

Let's consider permuting a 3D tensor:

In [74]:
# Create a 3D, C-contiguous tensor
tensor_3d = np.arange(32).reshape(8, 2, 2)
# Swap the first and second axes
permuted_tensor = np.transpose(tensor_3d, (1, 0, 2))
# Now check the strides
print(f"Strides of permuted tensor: {permuted_tensor.strides}")

Strides of permuted tensor: (16, 32, 8)


We can see that the strides of the permuted tensor are in neither decreasing nor increasing order, meaning that the array is **neither** C-contiguous nor F-contiguous! We can confirm this by checking the `flags`:

In [75]:
print(permuted_tensor.flags)

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False



Indeed, permuting a tensor in NumPy can produce a view whose elements are laid out in a non-contiguous way. The reason for this behaviour is performance. It is usually desirable for the elements to be stored contiguously, since many operations are more efficient when this is the case, but depending on what we want to do with the permuted array, we may not care that the storage is non-contiguous. Copying data is also expensive, so it may not be worthwhile rearranging the elements if the performance gains later on are small or non-existent.

If we want, we can explicitly tell NumPy to make the array contiguous (this will of course involve copying data):

In [76]:
# Make array contiguous
contiguous_permuted_tensor = np.ascontiguousarray(permuted_tensor)
print(contiguous_permuted_tensor.flags)
print(f"Strides are now back in decreasing order: {contiguous_permuted_tensor.strides}")

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Strides are now back in decreasing order: (128, 16, 8)


Other operations like slicing can also lead to arrays stored in a non-contiguous way.
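
As a brief illustration of the slicing case, the sketch below compares a slice that takes whole rows with one that skips every other column. The array `base` is made up for this example, and `dtype=np.int64` is set explicitly so that the strides are the same on every platform.

```python
import numpy as np

base = np.arange(12, dtype=np.int64).reshape(3, 4)

row_block = base[1:3]           # whole rows: still one unbroken chunk of memory
every_other_col = base[:, ::2]  # skips elements, leaving gaps in memory

print(row_block.strides, row_block.flags["C_CONTIGUOUS"])
print(every_other_col.strides, every_other_col.flags["C_CONTIGUOUS"])

# Both are views: no data was copied.
print(np.shares_memory(base, row_block), np.shares_memory(base, every_other_col))
```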

### Reshape

Now, when you reshape an array in NumPy, whether or not a data copy is required depends on how the array is laid out in memory. If the array is contiguous in the order used for the reshape (C-style by default, or Fortran-style when `order='F'` is passed), then reshaping can be done without copying the data. However, if the array is non-contiguous, reshaping typically requires copying data because the new shape cannot simply reinterpret the existing memory layout.

- Contiguous arrays can be reshaped without copying data because their elements are already stored sequentially in the order in which the reshape reads them.
- Non-contiguous arrays (such as slices or transposed arrays) usually require a data copy when reshaped because the elements are not stored sequentially, and NumPy must rearrange them into a contiguous block to satisfy the new shape.

Let's look at how this works in detail, reshaping the permuted tensor to shape (2, 16):

In [77]:
reshaped_permuted_tensor = np.reshape(permuted_tensor, (2, 16))

For this non-contiguous tensor, the call above is effectively equivalent to:

In [78]:
raveled_permuted_tensor = np.ravel(permuted_tensor, order='C') # A copy is made if permuted_tensor is not C-contiguous
reshaped_permuted_tensor = np.reshape(raveled_permuted_tensor, (2, 16))
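
One way to check whether a particular reshape copied the data is `np.shares_memory`. The sketch below applies it to the tensors from this section and to a small Fortran-ordered array, `f_matrix`, introduced only for this example.

```python
import numpy as np

tensor_3d = np.arange(32).reshape(8, 2, 2)
permuted_tensor = np.transpose(tensor_3d, (1, 0, 2))

# C-contiguous input, default C-order reshape: a view.
print(np.shares_memory(tensor_3d, tensor_3d.reshape(4, 8)))

# Non-contiguous input, axes must be merged: a copy.
print(np.shares_memory(permuted_tensor, permuted_tensor.reshape(2, 16)))

# Non-contiguous input, but only a length-1 axis is added: still a view.
print(np.shares_memory(permuted_tensor, permuted_tensor.reshape(2, 8, 1, 2)))

# F-contiguous input: the order used for the reshape decides.
f_matrix = np.asfortranarray(np.arange(6).reshape(2, 3))
print(np.shares_memory(f_matrix, f_matrix.reshape(6)))             # C order: copy
print(np.shares_memory(f_matrix, f_matrix.reshape(6, order="F")))  # F order: view
```

The third case shows why the rule for non-contiguous arrays is "typically" rather than "always": a copy is only needed when the requested shape cannot be described by strides over the existing memory.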

## 6. Manipulating Strides for Advanced Array Operations

By directly manipulating strides, we can create powerful views into data without copying it. This can be useful for implementing sliding windows or accessing non-contiguous subsets of data.

### Example: Creating a View with a Different Stride

In [79]:
# Create a 2D NumPy array
array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("Array:")
print(array)

# Display the strides of the array
print("\nStrides of the array:")
print(array.strides)

# Create a new view of the original array with a stride of 8 bytes (one int64 element) along the first axis
new_strided_view = np.lib.stride_tricks.as_strided(array, shape=(9, 4), strides=(8, 8))
print("\nNew strided view:")
print(new_strided_view)

# Display the strides of the new strided view
print("\nStrides of the new strided view:")
print(new_strided_view.strides)

Array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Strides of the array:
(32, 8)

New strided view:
[[ 1  2  3  4]
 [ 2  3  4  5]
 [ 3  4  5  6]
 [ 4  5  6  7]
 [ 5  6  7  8]
 [ 6  7  8  9]
 [ 7  8  9 10]
 [ 8  9 10 11]
 [ 9 10 11 12]]

Strides of the new strided view:
(8, 8)


### Observation

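The new view with altered strides `(8, 8)` traverses the array in a non-standard way: each row starts one element (8 bytes) after the previous one, so the rows overlap in memory and form a sliding window of length 4 over the flattened data. No data is copied. Note that `as_strided` performs no bounds checking, so the shape and strides must be chosen so that the view stays inside the original buffer.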
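
For sliding windows specifically, recent NumPy versions (1.20 and later) also provide `sliding_window_view`, which works out the shape and strides itself. The sketch below, which assumes such a version is installed, rebuilds the view from above both ways and compares the results.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
flat = array.ravel()   # 1D view of the 12 elements
step = flat.itemsize   # bytes per element, instead of a hard-coded 8

# Manual version: 9 windows of length 4, each shifted by one element.
manual = as_strided(flat, shape=(9, 4), strides=(step, step))

# Library version: the shape and strides are derived from the window length.
windows = sliding_window_view(flat, window_shape=4)

print(windows.shape)
print(np.array_equal(manual, windows))
print(windows.flags["WRITEABLE"])  # read-only by default
```

Using `flat.itemsize` rather than the literal `8` keeps the manual version correct for other data types.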

## 7. Exercises

1. Create a 3D NumPy array with shape `(3, 4, 5)` and visualize its memory layout in both row-major and column-major order.
2. Measure the performance difference between row-wise and column-wise access for a 3D array with shape `(100, 100, 100)` in different memory orders.
3. Revisit the exercises at the end of the previous notebook and explain why some reshapes led to data copying and others didn't.
4. Create a 3D NumPy array with shape `(2, 3, 4)` and display its strides. Then, transpose the array and display the new strides.
5. Use `np.lib.stride_tricks.as_strided` to create a view of a 1D array with overlapping windows of size 3 and print the array.
6. Create a 2D NumPy array, reshape it, and compare the strides before and after reshaping.