In [1]:
from Session.Session import seshTrack

Awwwwwwwwwyeah!
Initializing new session: 11/21/2021 09:58:57


# When to use Arrays and Deques

Important mutable alternatives to lists, for example:
- array: efficiently holds 10mil floats as packed bytes representing their machine vals
- deque (double-ended queue): allows repeated addition or removal of items at either end of the sequence
    - Works as a FIFO (first in, first out) queue when items are added at one end and removed from the other
        - FIFO means exactly that: the first in is the first out (oldest processed first)
    - Works as a LIFO (last in, first out) stack when items are added and removed at the same end

## Arrays

array.array is a better choice than a list when the sequence holds only numbers (restricted to a single type). It supports the same mutable operations as lists:
- .pop(), .insert(), .remove(), .extend()

Also supports fast loading and saving
- .fromfile and .tofile

Requires a typecode, designating the underlying C type used to store each item in the array
- 'b' is signed char: each item in array('b') is stored in a single byte and interpreted as an int from -128 to 127
    - a number outside that range does not fit in the item; trying to store it raises OverflowError
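
A minimal sketch of how the typecode limits what an array can hold. The variable names (`small`, `doubles`, `overflowed`) are made up for illustration.

```python
from array import array

small = array('b', [-128, 0, 127])   # every item fits in one signed byte
print(small.itemsize)                # bytes used per item for typecode 'b'

overflowed = False
try:
    small.append(128)                # 128 is outside the signed char range
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)

doubles = array('d', [1.0, 2.5])     # 'd' stores each item as a C double
print(doubles.itemsize)              # bytes used per item for typecode 'd'
```

The array is left unchanged when the append fails, so a typecode also acts as a cheap range check.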

An array of doubles (typecode 'd') is generally the default choice for floats, but that is another discussion
- Important note: doubles are binary floating point, so they should not be used where exact decimal values matter, such as currency
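
A short sketch of why doubles are a poor fit for currency, using the standard-library `decimal` module as one possible alternative (the notes above do not name a replacement, so treat that choice as an assumption).

```python
from decimal import Decimal

float_total = 0.1 + 0.2
print(float_total == 0.3)                 # False: 0.1 and 0.2 have no exact binary representation

decimal_total = Decimal('0.10') + Decimal('0.20')
print(decimal_total == Decimal('0.30'))   # True: decimal arithmetic keeps exact decimal digits
```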

In [4]:
# Ex 2-20 creating, saving and loading large array of floats

from array import array # imports array type
from random import random # imports random() -> x in [0,1)

# Create array and save to file
floats = array('d', (random() for i in range(10**7))) #store 10^7 random() vals into array of doubles 
last_float1 = floats[-1] # save last val for comparison
fp = open('testdata/floats.bin', 'wb') # open new file for writing in binary mode
floats.tofile(fp) # save array to binary file
fp.close()

# Create new array from reading prev file
floats2 = array('d') # empty array of doubles
fp = open('testdata/floats.bin', 'rb')
floats2.fromfile(fp, 10**7) # read 10mil floats from binary file
fp.close()
last_float2 = floats2[-1] # save last val for comparison
print("Same last vals: ", last_float1 == last_float2)
print("Same contents in arrays: ", floats2 == floats)

Same last vals:  True
Same contents in arrays:  True


### Example advantages of using array.array
- array.fromfile loads the binary file ~60x faster than reading the same floats from a textfile
- array.tofile writes the binary file ~7x faster than writing the floats into a textfile
- size of binary file (in this example) is less than half the size of the textfile with the same contents
    - ~80mil bytes (10mil doubles x 8 bytes each) vs ~180mil bytes
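
The size difference can be checked on a small scale. This is a rough sketch, not the benchmark behind the numbers above: it uses 1,000 floats instead of 10mil, writes to a temporary directory, and assumes the textfile holds one `repr()` per line.

```python
import os
import tempfile
from array import array
from random import random

n = 1000
floats = array('d', (random() for _ in range(n)))

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, 'floats.bin')
    txt_path = os.path.join(tmp, 'floats.txt')

    with open(bin_path, 'wb') as fp:
        floats.tofile(fp)                          # packed machine values
    with open(txt_path, 'w') as fp:
        fp.writelines(f'{x!r}\n' for x in floats)  # one float repr per line

    bin_size = os.path.getsize(bin_path)
    txt_size = os.path.getsize(txt_path)

    with open(txt_path) as fp:
        from_text = array('d', (float(line) for line in fp))  # parse the text back

print('binary bytes:', bin_size)
print('text bytes:  ', txt_size)
print('same contents:', from_text == floats)
```

The binary file is always `n * itemsize` bytes, while the textfile size depends on how many digits each float needs.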

In [18]:
seshTrack("Last edited:")

**celebratory karate moves**
Last edited: 10/28/2021 23:50:40


## Memory Views
- memoryview is a built-in class, described as a generalized NumPy array structure, that allows you to share memory b/w data structures (PIL imgs, SQLite dbs, NumPy arrays) without first copying
    - valuable in handling large datasets
- memoryview.cast method
    - takes a typecode, in the same notation as the array module
    - changes how multiple bytes are read or written as units, w/o moving the bytes around
    - returns another memoryview object that always shares the same memory
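
A small sketch of the "share without copying" point, using a `bytearray` as the underlying buffer. The buffer contents and variable names are illustrative.

```python
from array import array

data = bytearray(b'abcdefgh')
view = memoryview(data)

window = view[2:5]           # slicing a memoryview gives another view, not a copy
window[0] = ord('X')         # the write goes through to the underlying bytearray
print(data)

copied = data[2:5]           # slicing the bytearray itself copies the bytes
copied[0] = ord('Y')         # so this write does not affect data
print(data)

shorts = array('h', [1, 2, 3])
as_bytes = memoryview(shorts).cast('B')   # same memory, seen as unsigned bytes
print(len(as_bytes), len(shorts) * shorts.itemsize)
```

Both views stay tied to the original buffer, which is why no data moves when the cast changes the item size.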

In [14]:
# Ex 2-21 Change val of array item by poking one of its bytes
from array import array # imports array type

numbers = array('h', [-2, -1, 0, 1, 2])
memv = memoryview(numbers) # build memview from array of 5 short signed ints (typecode 'h')
len(memv)
# 5 originally

memv[0]
# -2 originally, memv should = numbers at this point

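memv_oct = memv.cast('B') # create memv_oct by casting eles of memv to typecode 'B' (unsigned char)
memv_oct.tolist() # [254, 255, 255, 255, 0, 0, 1, 0, 2, 0] on a little-endian machine, as a list for inspection
memv_oct[5] = 4 # byte 5 is the high byte of the third item (little-endian), so it becomes 4 * 256 = 1024
numbers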

array('h', [-2, -1, 1024, 1, 2])

In [15]:
seshTrack("Last edited:")

🫵💻📈😀🌎
Last edited: 11/16/2021 22:07:04


## NumPy and SciPy
- Major contributors to why Python became popular for scientific computing
- NumPy library
    - multi-dimensional, homogeneous arrays and matrix types
    - can hold numbers or user-defined records
    - efficient element-wise operations
- SciPy library
    - adds to NumPy with scientific computing algorithms from linear algebra, numerical calculus and statistics
    - interactive prompt, high-level python APIs, functions optimized in C and Fortran (Fast)

In [36]:
# Ex 2-22 basic operations w/ rows and cols in numpy.ndarray
import numpy
a = numpy.arange(12) # create 1d array with 12 elements
print('a:', a)
print('type(a):', type(a))
print('a.shape:', a.shape) # num of elements per axis/dimension
a.shape = 3, 4 # add one dimension to create 3 rows of 4 elements
print('a.shape:', a.shape)
print('reshaped a:\n', a)
print('a[2]:',a[2]) #single val indexes the row
print('a[2,1]:', a[2,1]) # [x, y] indices access [row, column]
print('a[:,1]:', a[:,1]) # all vals in col index 1
print('a.transpose():\n', a.transpose()) #swap columns with rows
a.transpose() # does not transpose in place, creates a new array
a # for comparison, unchanged


a: [ 0  1  2  3  4  5  6  7  8  9 10 11]
type(a): <class 'numpy.ndarray'>
a.shape: (12,)
a.shape: (3, 4)
reshaped a:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
a[2]: [ 8  9 10 11]
a[2,1]: 9
a[:,1]: [1 5 9]
a.transpose():
 [[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

NumPy supports loading, saving, and operating on all elements of a numpy.ndarray at once.

Earlier in this section, the array type holding 10mil floats was stored to and loaded from a bin file (Ex 2-20). Let's approach this now using NumPy.

In [80]:
# NumPy saving, loading and operations on numpy.ndarrays
import numpy as np
from random import random
from time import perf_counter as pc

np_floats = np.array([random() for i in range(10**7)]) # store 10^7 random() vals into numpy.ndarray
print('np_floats[-3:]:', np_floats[-3:]) # inspect last 3 eles
np_floats *= 0.5 # multiply every ele by 0.5, could also use /= 2 here
print('np_floats[-3:]:',np_floats[-3:]) # inspect

t0 = pc() # time zero
np_floats /= 3 # divide 10mil eles by 3 in numpy.ndarray
print('Time elapsed(np_floats /= 3): ', pc() - t0) # end time - time zero

np.save('testdata/floats-10M', np_floats) # save in binary .npy format (the .npy extension is appended)
np_floats2 = np.load('testdata/floats-10M.npy', 'r+') # load as memory-mapped file into another array*
np_floats2 *= 6 # multiplier to offset the *= 0.5 and /= 3
print('np_floats2[-3:]:', np_floats2[-3:]) # inspect
# *'r+' is the mmap_mode: the existing file is opened for reading and writing, so in-place changes are written back to it

# Recall that array.array was used previously, but it does not support element-wise arithmetic the way numpy arrays do
from array import array
arr_floats = array('d')
fp = open('testdata/floats.bin', 'rb') # must be a binary mode such as rb; text modes r and r+ do not work here
arr_floats.fromfile(fp, 10**7) # read 10mil floats from binary file
fp.close()
print('arr_floats[-3:]:', arr_floats[-3:]) # inspect
try:
    arr_floats *= 0.5 # try operation on array type
    print('arr_floats[-3:]', arr_floats[-3:]) # inspect
except TypeError:
    arr_to_np_floats = np.array(arr_floats) # try conversion to numpy.ndarray
    arr_to_np_floats *= 0.5 # try operation again
    print('arr_to_np_floats[-3:]:', arr_to_np_floats[-3:]) # inspect



np_floats[-3:]: [0.6920266  0.97449818 0.80439692]
np_floats[-3:]: [0.3460133  0.48724909 0.40219846]
Time elapsed(np_floats /= 3):  0.002105124999616237
np_floats2[-3:]: [0.6920266  0.97449818 0.80439692]
arr_floats[-3:]: array('d', [0.9650237399926022, 0.4984799428406267, 0.9576639769119581])
arr_to_np_floats[-3:]: [0.48251187 0.24923997 0.47883199]


In [81]:
seshTrack("Last edited:")

Don't stop believing in yourself!
Last edited: 11/18/2021 09:31:49


## Queues
### Treating lists as stacks or queues:
- .append
    - add an element to the end of a list
- .pop
    - pop() with no argument removes and returns the last element
    - pop(n) removes and returns the element at index n
    - pop(0) removes and returns the earliest (index 0) element in the list
- Use append + pop(0) for first-in-first-out (FIFO) behavior
    - Removing from the left (0-index end) of a list is time costly
        - every remaining element must be shifted one position
    - Deques get around this
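
The two list idioms above, sketched side by side with made-up item names:

```python
items = ['a', 'b', 'c']

# Stack (LIFO): add and remove at the same (right) end
stack = list(items)
stack.append('d')
top = stack.pop()        # removes the most recently added item
print(top, stack)

# Queue (FIFO): add at the right end, remove from the left end
queue = list(items)
queue.append('d')
first = queue.pop(0)     # removes the oldest item; the rest shift left by one
print(first, queue)
```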

### Deques
- class collections.deque
    - double-ended queue for fast inserting and removing from either end
    - "bounded" deques are valuable for keeping list of "last seen" items
        - defined max length, where exceeding the max discards items from the opposite end
    - .rotate
        - if n > 0, take n eles from the right end and move them to the left end
        - if n < 0, take |n| eles from the left end and move them to the right end
    - .append
        - appends a single element to the deque (default: right end)
        - Alternative: .appendleft
    - .extend
        - successively appends elements to the deque (default: right end)
            - this means appending to the left using .extendleft will reverse the order of the input elements
        - Alternative: .extendleft
    - .pop
        - removes and returns an element from the deque (default: right end)
        - Alternative: .popleft
    - NOTE: Removal from the middle of the deque is time costly; only use deques for appending to and popping from the ends
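
A minimal sketch of the "last seen" use case for a bounded deque. The page names and the `recent` variable are hypothetical.

```python
from collections import deque

recent = deque(maxlen=3)                 # remembers only the 3 most recent items
for page in ['home', 'docs', 'blog', 'about', 'contact']:
    recent.append(page)                  # once full, each append drops the leftmost item
print(recent)

recent.appendleft('home')                # appending on the left drops from the right instead
print(recent)
```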


In [19]:
# Ex 2-23 Working with a deque

from collections import deque
dq = deque(range(10), maxlen=10) # deque with max 10 eles, vals 0-9
print("dq:", dq)
dq.rotate(3) # takes n eles from the right end, and moves to the left end
print("dq.rotate(3):", dq)
dq.rotate(-4) # takes n eles from the left end, and moves to the right end
print("dq.rotate(-4):", dq)
dq.appendleft(-1) # appends the ele to the left end of the deque
print("dq.appendleft(-1):", dq)
dq.extend([11, 22, 33]) # *successively* add each ele to right end
print("dq.extend([11, 22, 33]):", dq)
dq.extendleft([10, 20, 30, 40]) # *successively* add each ele to left end
print("dq.extendleft([10, 20, 30, 40]):", dq)
dq.pop() # removes rightmost
print("dq.pop():", dq)
dq.popleft() # removes leftmost
print("dq.popleft():", dq)

dq: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
dq.rotate(3): deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)
dq.rotate(-4): deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)
dq.appendleft(-1): deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)
dq.extend([11, 22, 33]): deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)
dq.extendleft([10, 20, 30, 40]): deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)
dq.pop(): deque([40, 30, 20, 10, 3, 4, 5, 6, 7], maxlen=10)
dq.popleft(): deque([30, 20, 10, 3, 4, 5, 6, 7], maxlen=10)


### Other Python standard library packages implementing queues (REVISIT THIS)
#### queue
- synchronized (thread-safe) classes Queue, LifoQueue, and PriorityQueue
- used for safe communication b/w threads
- can be bounded to a maxsize
    - they do not discard items to make room
    - when queue is full, insertion of a new item is blocked
        - waits until some other thread makes room/takes an item from the queue
        - useful to throttle number of live threads
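
A single-threaded sketch of the points above. To avoid blocking forever without a second thread, it uses `put_nowait`, which raises `queue.Full` where a plain `put` would wait.

```python
import queue

bounded = queue.Queue(maxsize=2)
bounded.put('a')
bounded.put('b')

rejected = False
try:
    bounded.put_nowait('c')      # queue is full; nothing is discarded to make room
except queue.Full:
    rejected = True

fifo_order = [bounded.get(), bounded.get()]

lifo = queue.LifoQueue()
for x in (1, 2, 3):
    lifo.put(x)
lifo_order = [lifo.get() for _ in range(3)]      # most recent item comes out first

prio = queue.PriorityQueue()
for x in (3, 1, 2):
    prio.put(x)
prio_order = [prio.get() for _ in range(3)]      # smallest item comes out first

print(rejected, fifo_order, lifo_order, prio_order)
```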

#### multiprocessing
- implements bounded Queue, similar to queue.Queue
    - designed for interprocess communication
- See specialized multiprocessing.JoinableQueue for task management

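#### asyncio (new in Python 3.4)
- provides Queue, LifoQueue, PriorityQueue, and JoinableQueue with APIs inspired by classes contained in the queue and multiprocessing modules
    - adapted for managing tasks in asynchronous programming
    - in current Python versions there is no separate JoinableQueue in asyncio; join() and task_done() are methods of asyncio.Queue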
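
A minimal producer/consumer sketch with `asyncio.Queue`, assuming Python 3.7+ for `asyncio.run`. The coroutine names and the doubling step are made up for illustration.

```python
import asyncio

async def producer(q, items):
    for item in items:
        await q.put(item)            # waits while the bounded queue is full

async def consumer(q, results):
    while True:
        item = await q.get()         # waits while the queue is empty
        results.append(item * 2)
        q.task_done()                # mark this item as processed

async def main():
    q = asyncio.Queue(maxsize=2)
    results = []
    worker = asyncio.create_task(consumer(q, results))
    await producer(q, range(5))
    await q.join()                   # wait until every item has been processed
    worker.cancel()                  # the consumer loops forever, so stop it explicitly
    await asyncio.gather(worker, return_exceptions=True)
    return results

results = asyncio.run(main())
print(results)
```

Instead of blocking a thread, a full or empty queue suspends the coroutine and lets other tasks run.
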
#### heapq
- does not implement a queue class
- provides fxns like heappush and heappop
    - allows use of a mutable sequence as a heap queue or priority queue
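
A short sketch of a plain list used as a priority queue through `heapq`. The `(priority, task)` tuples are hypothetical; lower numbers come out first.

```python
import heapq

tasks = []                                   # an ordinary list used as the heap
heapq.heappush(tasks, (2, 'write tests'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (3, 'update docs'))

smallest = tasks[0]                          # the smallest item is always at index 0
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(smallest, order)
```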

In [20]:
seshTrack("Last edited:")

Ayyyyyyy! Well done!
Last edited: 11/21/2021 10:45:48
