# Chapter 2. An Array of Sequences
---
## ToC

[When a List Is Not the Answer](#when-a-list-is-not-the-answer)

1. [Arrays](#arrays)
2. [Memory Views](#memory-views)
3. [NumPy](#numpy)
4. [Deques and Other Queues](#deques-and-other-queues)
---

## When a List Is Not the Answer

The `list` type is flexible and easy to use, but depending on specific requirements,
there are better options. For example, an `array` saves a lot of memory when you need
to handle millions of floating-point values. On the other hand, if you are constantly
adding and removing items from opposite ends of a list, it’s good to know that a
`deque` (double-ended queue) is a more efficient FIFO data structure.

![Figure 29](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/29.PNG)

### Arrays

If a list only contains numbers, an `array.array` is a more efficient replacement. Arrays support all mutable sequence operations (including `.pop`, `.insert`, and `.extend`), as well as additional methods for fast loading and saving, such as `.frombytes` and `.tofile`. An array of float values does not hold full-fledged `float` instances, but only the packed bytes representing their machine values, similar to an array of `double` in the C language.

### Summary of Key Differences

| Feature         | Tuple                        | array.array / numpy.array         |
|----------------|------------------------------|-----------------------------------|
| Type flexibility | Heterogeneous                 | Homogeneous (fixed type)          |
| Memory layout   | Pointers to PyObjects         | Raw contiguous memory             |
| Performance     | Slower for numeric ops        | Fast for numeric operations       |
| Mutability      | Immutable (tuple)             | Mutable                           |
| Use cases       | General-purpose grouping      | Efficient numeric storage/computing |

For more details on the memory layout differences, see [Array VS Tuple](https://github.com/berserkhmdvhb/Training-Python/blob/main/src/Part_I/Chapter_2_ArrayOfSequences/ArrayVSTuple.md).

In [1]:
import sys
import array
import numpy as np

# Mixed-type tuple
t = (1, 'a', 3.14)
print("TUPLE:")
print(f"Total size of tuple: {sys.getsizeof(t)} bytes")
for i, item in enumerate(t):
    print(f"  Element {i}: value={item}, id={id(item)}, size={sys.getsizeof(item)} bytes")

TUPLE:
Total size of tuple: 64 bytes
  Element 0: value=1, id=140717557146536, size=28 bytes
  Element 1: value=a, id=140717557211056, size=42 bytes
  Element 2: value=3.14, id=2449950598896, size=24 bytes


In [2]:
# Integer array.array
a = array.array('i', [1, 2, 3])
print("\nARRAY.ARRAY:")
print(f"Total size of array.array: {sys.getsizeof(a)} bytes")
for i, item in enumerate(a):
    print(f"  Element {i}: value={item}")


ARRAY.ARRAY:
Total size of array.array: 92 bytes
  Element 0: value=1
  Element 1: value=2
  Element 2: value=3


In [3]:
# NumPy array
n = np.array([1, 2, 3], dtype=np.int32)
print("\nNUMPY.ARRAY:")
print(f"Total size of numpy.array: {n.nbytes} bytes (data only)")
print(f"Total size including metadata: {sys.getsizeof(n)} bytes")
for i, item in enumerate(n):
    address = n.ctypes.data + i * n.itemsize
    print(f"  Element {i}: value={item}, address offset={address}")


NUMPY.ARRAY:
Total size of numpy.array: 12 bytes (data only)
Total size including metadata: 124 bytes
  Element 0: value=1, address offset=2449954772880
  Element 1: value=2, address offset=2449954772884
  Element 2: value=3, address offset=2449954772888


When creating an array, you provide a typecode, a letter that determines the underlying C
type used to store each item in the array. For example, `b` is the typecode for what
C calls a `signed char`, an integer ranging from -128 to 127. If you create an
`array('b')`, then each item is stored in a single byte and interpreted as an integer.
For large sequences of numbers, this saves a lot of memory.
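
As a rough illustration of that saving, the sketch below compares the item width of a few typecodes and the payload size of a small `array('b')`. The variable names are illustrative, and the sizes reported by `sys.getsizeof` vary by platform and Python version, so no expected values are shown for them.

```python
import sys
from array import array

# itemsize reports how many bytes each item occupies for a given typecode.
widths = {code: array(code).itemsize for code in ('b', 'h', 'd')}
print(widths)

values = list(range(100))
packed = array('b', values)

# The payload of the array is simply the item count times the item width.
payload_bytes = len(packed) * packed.itemsize
print(payload_bytes)

# A list stores one pointer per item, so even its container alone is larger,
# before counting the int objects those pointers refer to.
print(sys.getsizeof(values), sys.getsizeof(packed))
```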

More info: [Python Docs - array](https://docs.python.org/3/library/array.html)

In [4]:
array.array('b', [-128,127])

array('b', [-128, 127])

In [5]:
array.array('b', [-129,127])

OverflowError: signed char is less than minimum

In [None]:
array.array('b', [-128,128])

OverflowError: signed char is greater than maximum

**Creating, saving, and loading a large array of floats**

In [6]:
from array import array
from random import random
floats = array('d', (random() for i in range(10**7)))
floats[-1]

0.45079958731707725

In [7]:
# Save the array to a binary file; the with block closes the file even on error.
with open('materials/floats.bin', 'wb') as fp:
    floats.tofile(fp)

In [8]:
# Create an empty array of doubles.
floats2 = array('d')
# Read 10 million numbers from the binary file.
with open('materials/floats.bin', 'rb') as fp:
    floats2.fromfile(fp, 10**7)
floats2[-1]

0.45079958731707725

In [9]:
# Verify that the contents of the arrays match.
floats2 == floats

True
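
The same round trip can be tried without a `materials/` directory. The following is a minimal sketch that writes to a temporary directory (an assumption made here so the example is self-contained) and also shows the in-memory counterpart, `.tobytes` and `.frombytes`. The array is kept small on purpose.

```python
import os
import tempfile
from array import array
from random import random

floats = array('d', (random() for _ in range(1000)))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'floats.bin')

    # tofile writes the raw machine values, with no header or separators.
    with open(path, 'wb') as fp:
        floats.tofile(fp)
    file_size = os.path.getsize(path)

    # fromfile needs the item count; the file itself does not record it.
    loaded = array('d')
    with open(path, 'rb') as fp:
        loaded.fromfile(fp, len(floats))

# The in-memory equivalent: export the packed bytes and load them back.
restored = array('d')
restored.frombytes(floats.tobytes())

print(file_size, loaded == floats, restored == floats)
```

Because the file holds only packed machine values, its size is exactly the item count times `itemsize`, and it is only portable between machines that share the same byte order and `double` representation.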

![Figure 30](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/30.PNG)

![Figure 31](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/31.PNG)

## Memory Views

The built-in `memoryview` class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. It was inspired by the NumPy library. Travis Oliphant, lead author of NumPy, answers the question, ["When should a memoryview be used?"](https://fpy.li/2-17) like this:

> A memoryview is essentially a generalized NumPy array structure in Python itself
(without the math). It allows you to share memory between data-structures (things like
PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very
important for large data sets.

Using notation similar to the `array` module, the `memoryview.cast` method lets you change the way multiple bytes are read or written as units without moving bits around. `memoryview.cast` returns yet another `memoryview` object, always sharing the same memory.
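
To make "without copying" concrete, here is a small sketch contrasting a slice of a `memoryview` with a slice of the underlying array. The names `data`, `window`, and `copied` are illustrative.

```python
from array import array

data = array('B', range(10))
view = memoryview(data)

# Slicing a memoryview yields another view onto the same bytes.
window = view[2:5]
window[0] = 99              # writes through to data[2]

# Slicing the array itself builds a new, independent array.
copied = data[2:5]
copied[1] = 77              # data[3] is left untouched

print(data[2], data[3], copied.tolist())
```

After running this, `data[2]` is 99 because the write went through the view, while `data[3]` is still 3 because `copied` owns its own bytes.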

**Example:** How to create alternate views on the same array of 6 bytes, to
operate on it as a 2×3 matrix or a 3×2 matrix.

In [10]:
from array import array
# Build array of 6 bytes (typecode 'B').
octets = array('B', range(6))
octets

array('B', [0, 1, 2, 3, 4, 5])

In [11]:
# Build memoryview from that array, then export it as a list.
m1 = memoryview(octets)
m1.tolist()

[0, 1, 2, 3, 4, 5]

In [12]:
# Build new memoryview from that previous one, but with 2 rows and 3 columns
m2 = m1.cast('B', [2, 3])
m2.tolist()

[[0, 1, 2], [3, 4, 5]]

In [13]:
# 3 rows 2 columns
m3 = m1.cast('B', [3, 2])
m3.tolist()

[[0, 1], [2, 3], [4, 5]]

In [14]:
# proving that the memory was shared among octets, m1, m2, and m3.
m2[1, 1] = 22
m3[1, 1] = 33
octets

array('B', [0, 1, 2, 33, 22, 5])

**Example:** How to change a single byte of an item in an array of 16-bit integers.

In [15]:
numbers = array('h', [-2, -1, 0, 1, 2])
# Build memoryview from array of 5 16-bit signed integers (typecode 'h').
memv = memoryview(numbers)
memv.tolist()

[-2, -1, 0, 1, 2]

In [16]:
memv[0]

-2

In [17]:
# Create memv_oct by casting the elements of memv to bytes (typecode 'B').
memv_oct = memv.cast('B')
# Export elements of memv_oct as a list of 10 bytes, for inspection.
memv_oct.tolist()

[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]

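How each 16-bit value maps to its two bytes on a little-endian machine (the last two rows show a 4 in the low byte versus a 4 in the high byte):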

| Int Value | 16-bit Binary        | Bytes (little-endian)  | Interpreted Unsigned Bytes  |
|-----------|----------------------|------------------------|-----------------------------|
|    -2     | 11111111 11111110    | 0xFE 0xFF              | 254, 255                    |
|    -1     | 11111111 11111111    | 0xFF 0xFF              | 255, 255                    |
|     0     | 00000000 00000000    | 0x00 0x00              | 0, 0                        |
|     1     | 00000000 00000001    | 0x01 0x00              | 1, 0                        |
|     2     | 00000000 00000010    | 0x02 0x00              | 2, 0                        |
|     4     | 00000000 00000100    | 0x04 0x00              | 4, 0                        |
|  1024     | 00000100 00000000    | 0x00 0x04              | 0, 4                        |

In [18]:
# Assign 4 to byte offset 5, the high byte of the third item (little-endian).
# A 4 in the most significant byte of a 2-byte integer is 1024.
memv_oct[5] = 4
memv_oct.tolist()

[254, 255, 255, 255, 0, 4, 1, 0, 2, 0]

In [19]:
numbers

array('h', [-2, -1, 1024, 1, 2])

```
[0,4]

First byte = 0 → 00000000 (low byte)

Second byte = 4 → 00000100 (high byte)

high byte + low byte = 00000100 00000000

   4           0     
00000100   00000000  
   ↑           ↑     
high byte   low byte 
```
$$4 \times 2^{8} + 0 = 2^{10} = 1024$$
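
The byte arithmetic above can be checked directly with `struct` and `int.from_bytes`. This sketch spells out the byte order explicitly (`'<'` and `byteorder='little'`) instead of relying on the machine's native order, which is what the `array` and `memoryview` cells use.

```python
import struct
import sys

# Pack -2 as a little-endian 16-bit signed integer: low byte first.
packed = struct.pack('<h', -2)
print(list(packed))            # [254, 255]

# Rebuild an integer from the bytes [0, 4]: 0 is the low byte, 4 the high byte.
value = int.from_bytes(bytes([0, 4]), byteorder='little', signed=True)
print(value)                   # 1024

# The same two bytes read as big-endian give a different number.
swapped = int.from_bytes(bytes([0, 4]), byteorder='big', signed=True)
print(swapped)                 # 4

# The native byte order of the current machine ('little' or 'big').
print(sys.byteorder)
```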

![Figure 32](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/32.PNG)

Link: [fluentpython.com: “Parsing binary records with struct”.](https://fpy.li/2-18)

Meanwhile, if you are doing advanced numeric processing in arrays, you should be
using the NumPy libraries.

## NumPy

NumPy implements multidimensional, homogeneous arrays and matrix types that hold not only numbers but also user-defined records, and provides efficient element-wise operations.

SciPy is a library, written on top of NumPy, offering many scientific computing algorithms from linear algebra, numerical calculus, and statistics. SciPy is fast and reliable because it leverages the widely used C and Fortran codebase from the [Netlib Repository](https://fpy.li/2-19). In other words, SciPy gives scientists the best of both worlds: an interactive prompt and high-level Python APIs, together with industrial-strength number-crunching functions optimized in C and Fortran.

**Basic operations with two-dimensional arrays**

In [20]:
import numpy as np
a = np.arange(12)
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [21]:
type(a)

numpy.ndarray

In [22]:
a.shape

(12,)

In [23]:
a.shape = 3, 4
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [24]:
a[2]

array([ 8,  9, 10, 11])

In [25]:
a[2, 1]

np.int64(9)

In [26]:
a[:, 1]

array([1, 5, 9])
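
One detail worth knowing about the indexing shown above: basic slices such as `a[:, 1]` are views onto the original data, not copies, much like slices of a `memoryview`. The sketch below illustrates this along with an element-wise operation; it uses `reshape` instead of assigning to `a.shape`, and the names `column` and `doubled` are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # same 3x4 layout as assigning a.shape = 3, 4

column = a[:, 1]                  # basic slicing returns a view, not a copy
column[0] = 100                   # so this write is visible through a

doubled = a * 2                   # element-wise arithmetic builds a new array

print(a[0, 1])                    # 100
print(np.shares_memory(a, column))
print(doubled[0, 1])              # 200
```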

**High-level operations for loading, saving, and operating on all elements of a numpy.ndarray**

In [27]:
a.transpose()

array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

In [28]:
import numpy
floats = numpy.loadtxt('materials/floats-10M-lines.txt')
floats[-3:]

array([[ 1.49890e+04,  7.05410e-02, -2.29033e-01,  0.00000e+00],
       [ 1.49900e+04,  7.64190e-02, -2.29033e-01,  0.00000e+00],
       [ 1.49910e+04,  7.93580e-02, -2.34926e-01,  0.00000e+00]])

In [29]:
floats *= .5
floats[-3:]

array([[ 7.494500e+03,  3.527050e-02, -1.145165e-01,  0.000000e+00],
       [ 7.495000e+03,  3.820950e-02, -1.145165e-01,  0.000000e+00],
       [ 7.495500e+03,  3.967900e-02, -1.174630e-01,  0.000000e+00]])

In [30]:
# Import the high-resolution performance measurement timer.
from time import perf_counter as pc
t0 = pc()
floats /= 3
pc() - t0

6.930000381544232e-05

In [31]:
numpy.save('materials/floats-10M', floats)

In [32]:
# Load the data as a memory-mapped file so it can be modified in place.
floats2 = numpy.load('materials/floats-10M.npy', mmap_mode='r+')
floats2 *= 6
floats2[-3:]

memmap([[ 1.49890e+04,  7.05410e-02, -2.29033e-01,  0.00000e+00],
        [ 1.49900e+04,  7.64190e-02, -2.29033e-01,  0.00000e+00],
        [ 1.49910e+04,  7.93580e-02, -2.34926e-01,  0.00000e+00]])

## Deques and Other Queues

The `.append` and `.pop` methods make a list usable as a stack or a queue (if you use `.append` and `.pop(0)`, you get FIFO behavior). But inserting and removing from the head of a list (the 0-index end) is costly because the entire list must be shifted in memory. 

The class `collections.deque` is a thread-safe double-ended queue designed for fast inserting and removing from both ends. It is also the way to go if you need to keep a list of “last seen items” or something of that nature, because a `deque` can be bounded, i.e., created with a fixed maximum length. If a bounded deque is full, when you add a new item, it discards an item from the opposite end.
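
Both uses described above fit in a few lines. In this sketch, `queue` is a plain FIFO queue and `recent` is a bounded "last seen items" list; the job and page names are made up for illustration.

```python
from collections import deque

# FIFO queue: append on the right, pop from the left.
queue = deque()
for job in ('a', 'b', 'c'):
    queue.append(job)
first = queue.popleft()

# Bounded deque: once full, each append drops the oldest item on the left.
recent = deque(maxlen=3)
for page in ('home', 'docs', 'blog', 'about'):
    recent.append(page)

print(first, list(queue), list(recent))
```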

In [33]:
from collections import deque
dq = deque(range(10), maxlen=10)
dq

deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

In [34]:
dq.rotate(3)
dq

deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)

Rotating with `n > 0` takes items from the right end and prepends them to the
left; when `n < 0`, items are taken from the left and appended to the right.

In [35]:
dq.rotate(-4)
dq

deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
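
The rotation rule can be checked by hand: `rotate(3)` gives the same result as moving one item from the right end to the left end three times, and `rotate(-3)` undoes it. A short sketch, with illustrative names:

```python
from collections import deque

rotated = deque(range(10))
manual = deque(range(10))

rotated.rotate(3)
for _ in range(3):
    manual.appendleft(manual.pop())   # move one item from the right to the left

# A negative rotation of the same size restores the original order.
restored = deque(rotated)
restored.rotate(-3)

print(rotated == manual, list(restored) == list(range(10)))
```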

In [36]:
dq.appendleft(-1)
dq

deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)

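Appending to a deque that is full (`len(d) == d.maxlen`) discards items from the other end: the `appendleft(-1)` call above dropped the `0` from the right, and the `extend` call below pushes `-1`, `1`, and `2` out from the left.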

In [37]:
dq.extend([11, 22, 33])

In [38]:
dq

deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)

**Note:** There is a hidden cost: removing items from the middle of a deque is not as fast. A deque is optimized for appending
and popping from the ends.

![Figure 33](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/33.PNG)