Using Modules

Modules allow for code reuse and portability. Python is a "batteries included" language, meaning that many excellent modules
already ship with the base language.

Due to its legacy as a web programming
language, many of the standard library modules deal with network protocols and other topics
important to web development. The standard library modules are documented on the
main Python site. The built-in dir function lists the names that a module defines:


In [1]:
import math 
dir(math)

['__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'cbrt',
 'ceil',
 'comb',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'dist',
 'e',
 'erf',
 'erfc',
 'exp',
 'exp2',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'isqrt',
 'lcm',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'nextafter',
 'perm',
 'pi',
 'pow',
 'prod',
 'radians',
 'remainder',
 'sin',
 'sinh',
 'sqrt',
 'sumprod',
 'tan',
 'tanh',
 'tau',
 'trunc',
 'ulp']
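
As a small illustration of how the names listed above are used, the sketch below calls a few of them through the module prefix and also imports a single name directly. The particular functions chosen are arbitrary.

```python
import math
from math import pi  # bring one name into the current namespace

# Names from dir(math) are reached through the module prefix.
print(math.gcd(12, 18))       # 6
print(math.factorial(5))      # 120
print(math.hypot(3, 4))       # 5.0

# A name imported with "from ... import" needs no prefix.
print(math.isclose(math.sin(pi / 2), 1.0))  # True
```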

Writing and Using Your Own Modules

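We will create a separate .py file (OwnModuleExample.py) as our module and import it into our interactive session. The module defines a single function, sqrt, which (despite its name) returns x*x: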

In [2]:
from OwnModuleExample import sqrt 
sqrt(3)

9

In [3]:
import OwnModuleExample
dir(OwnModuleExample)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'sqrt']

In [4]:
OwnModuleExample.sqrt(3)

9

Next, we edit the module file, changing the body of sqrt from return x*x to return x/2 and adding a new function, poly, that was not there at the start. The running session does not reflect the changes made to the module, as shown below:

In [5]:
OwnModuleExample.sqrt(3)

9

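Why? Python caches every imported module, so repeating the import statement does not re-read the file.
You have to call importlib.reload on the module in order to get new changes in your file
into the interpreter.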
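
The reload step can be sketched end to end as follows. This is only an illustration: the module name reload_demo_module is hypothetical, and the file is written to a temporary directory so the example is self-contained. Bytecode writing is switched off so that a quick edit of the same length cannot be masked by a stale cache file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption for this demo: skip writing __pycache__ files.
sys.dont_write_bytecode = True

# Hypothetical module, created in a throwaway directory.
workdir = Path(tempfile.mkdtemp())
module_file = workdir / "reload_demo_module.py"
module_file.write_text("def sqrt(x):\n    return x * x\n")
sys.path.insert(0, str(workdir))

import reload_demo_module

before_edit = reload_demo_module.sqrt(3)   # original body: 9

# Edit the file on disk: change sqrt and add a new function.
module_file.write_text(
    "def sqrt(x):\n    return x / 2\n\n"
    "def poly(x):\n    return x ** 2 + 1\n"
)

stale = reload_demo_module.sqrt(3)         # still 9, the session has the old code
importlib.reload(reload_demo_module)       # re-execute the edited source
after_reload = reload_demo_module.sqrt(3)  # 1.5, the new body is active

print(before_edit, stale, after_reload, hasattr(reload_demo_module, "poly"))
```

Note that a name bound earlier with from module import name keeps pointing at the old object after a reload; it has to be imported again.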

A directory called __pycache__ will automatically appear in
the same directory as our module. It holds the compiled bytecode of the module.

Python automatically refreshes this directory when it imports a module whose
source file has changed. It is important never to
include the __pycache__ directory in your Git repository because when others
clone your repository, if the filesystem gets the timestamps wrong, it could be that
__pycache__ falls out of sync with the source code.

Using a Directory as a Module

Beyond sticking all your Python code into a single file, you can use a directory to
organize your code into separate files. The trick is to put an __init__.py file in
the top level of the directory you want to import from. The file can be empty.
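
A minimal sketch of such a directory is shown below. The package name shapes_pkg and its area.py file are hypothetical, and the directory is created in a temporary location purely so that the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, built in a throwaway directory:
#   shapes_pkg/
#       __init__.py   (empty)
#       area.py       (defines square)
root = Path(tempfile.mkdtemp())
package_dir = root / "shapes_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "area.py").write_text("def square(side):\n    return side * side\n")

sys.path.insert(0, str(root))   # the parent directory must be importable
importlib.invalidate_caches()   # make sure the import system sees the new files

from shapes_pkg import area

print(area.square(4))  # 16
```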

--------------------------------------------------------------------------

Dynamic Importing

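In case you do not know the names of the modules that you need to import ahead
of time, the __import__ function can load a module given its name as a string.
The importlib.import_module function does the same job and is the interface that
the Python documentation recommends for everyday code.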

In [8]:
sys = __import__('sys')   #import module from string argument
sys.version

'3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:03:56) [MSC v.1929 64 bit (AMD64)]'
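
When the names arrive as a list (for example, from a configuration file), one possible approach is to loop over them with importlib.import_module. The list below is made up for illustration.

```python
import importlib

# Hypothetical list of module names that is only known at run time.
module_names = ["math", "json", "collections"]

# Map each name to the module object it refers to.
loaded = {name: importlib.import_module(name) for name in module_names}

print(sorted(loaded))              # ['collections', 'json', 'math']
print(loaded["math"].sqrt(16.0))   # 4.0
```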

The __name__ variable distinguishes between importing and running a Python script: it is set to '__main__' only when the file is run directly.

In [None]:
if __name__ == '__main__':
    # these statements are not executed during import
    # do run statements here
    pass  # placeholder so that the block is valid Python

# There is also __file__, which is the filename of the imported file.

In addition
to the __init__.py file, you can put a __main__.py file in the top level of
your module directory if you want to call your module using the -m switch on
the command line. Doing this means the __init__.py file will also run.
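
The effect of the -m switch can be imitated inside a session with the standard runpy module, which is what the sketch below does. The package name runnable_pkg is hypothetical and is created in a temporary directory; from a shell, the equivalent would be python -m runnable_pkg run from the parent directory.

```python
import contextlib
import io
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical package with both special files, in a throwaway directory.
root = Path(tempfile.mkdtemp())
package_dir = root / "runnable_pkg"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("print('__init__.py ran')\n")
(package_dir / "__main__.py").write_text("print('__main__.py ran')\n")
sys.path.insert(0, str(root))

# Run the package as a script and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    runpy.run_module("runnable_pkg", run_name="__main__")

output_lines = buffer.getvalue().splitlines()
print(output_lines)  # ['__init__.py ran', '__main__.py ran']
```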

------------------------------------------------------------------------

1.0 NUMPY

1.1 Dtypes

In [1]:
import numpy as np 

a = np.array([0], np.int16) #16-bit integer
a.itemsize #size of one element in (8-bit) bytes

2

In [2]:
a.nbytes

2

In [4]:
a = np.array([0], np.int64) #64-bit integer
a.itemsize 

8

In [5]:
#Numerical arrays will also follow the same pattern 
a = np.array([0,1,23,4], np.int64)
a.shape 

(4,)

In [6]:
a.nbytes 

32

In [7]:
#we cannot tack on extra elements to a Numpy array after creation 
#raises an index error 
a = np.array([1,2])
a[2] = 32

IndexError: index 2 is out of bounds for axis 0 with size 2

The block of memory has already been delineated, and Numpy will
not allocate new memory and copy data without explicit instruction.
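
One way to give that explicit instruction is np.append or np.concatenate, both of which return a new array rather than growing the existing one. A short sketch:

```python
import numpy as np

a = np.array([1, 2])
b = np.append(a, 32)                # allocates a new array and copies the data
c = np.concatenate([a, [32, 64]])   # same idea for several new elements

print(a)  # [1 2]  (the original is unchanged)
print(b)  # [ 1  2 32]
print(c)  # [ 1  2 32 64]
```

Because every call copies the whole array, appending inside a loop is slow; collecting values in a Python list and converting once is usually the better design.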

Also, once you
create the array with a specific dtype, assigning to that array will cast to that type:

In [8]:
x = np.array(range(5), dtype=int)
x[0] = 1.33 #float assignment does not match dtype
x

array([1, 1, 2, 3, 4])

In [10]:
x[0] = 'this is a string'

#Different to Matlab 

ValueError: invalid literal for int() with base 10: 'this is a string'

1.2 Multidimensional Arrays

In [11]:
# they follow the same pattern
a = np.array([[1,3],[4,5]]) #omitting dtype picks default 
a 

array([[1, 3],
       [4, 5]])

In [12]:
a.dtype 

dtype('int32')

In [13]:
a.shape

(2, 2)

In [14]:
a.flatten()

array([1, 3, 4, 5])

The maximum number of dimensions is fixed when Numpy is built
(32 in Numpy 1.x releases, raised to 64 in Numpy 2.0). Numpy offers many ways to build
arrays automatically:

In [16]:
a = np.arange(10)  #like range(), but returns a Numpy array
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
a = np.ones((2,2))
a

array([[1., 1.],
       [1., 1.]])

In [18]:
a = np.linspace(0,1,5) #start, stop, number of points (not a step size)
a

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
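
To make the difference between a point count and a step size concrete, the sketch below asks linspace for the spacing it computed (via its retstep option) and compares it with arange, whose third argument is a step.

```python
import numpy as np

# linspace: third argument is the number of samples; the endpoint is included.
points, step = np.linspace(0, 1, 5, retstep=True)
print(points)  # 5 evenly spaced values from 0 to 1
print(step)    # 0.25

# arange: third argument is the step; the stop value is excluded.
stepped = np.arange(0, 1, 0.25)
print(stepped)  # 4 values: 0, 0.25, 0.5, 0.75
```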

meshgrid is used to create coordinate grids from two or more 1D arrays, often for plotting or evaluating functions over a grid:

In [19]:
X, Y = np.meshgrid([1,2,3],[5,6]) 
X

array([[1, 2, 3],
       [1, 2, 3]])

In [20]:
Y

array([[5, 5, 5],
       [6, 6, 6]])
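
As a sketch of the "evaluating functions over a grid" use case, the product x*y below is computed at every grid point in one vectorized expression. The choice of function is arbitrary.

```python
import numpy as np

X, Y = np.meshgrid([1, 2, 3], [5, 6])

# Elementwise arithmetic evaluates the function at every (x, y) pair.
Z = X * Y
print(Z)
# [[ 5 10 15]
#  [ 6 12 18]]
```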

In [22]:
a = np.zeros([2,2])
a

array([[0., 0.],
       [0., 0.]])

In [23]:
#create Numpy arrays from a function of the indices
np.fromfunction(lambda i,j: abs(i-j)<=1, (4,4))

array([[ True,  True, False, False],
       [ True,  True,  True, False],
       [False,  True,  True,  True],
       [False, False,  True,  True]])

In [24]:
#we can have field names on np arrays 
a = np.zeros((2,2), dtype = [('x','f4')])
a['x']

array([[0., 0.],
       [0., 0.]], dtype=float32)

In [25]:
x = np.array([(1,2)], dtype=[('value','f4'),('amount','c8')])
x['value']

array([1.], dtype=float32)

In [26]:
x['amount']

array([2.+0.j], dtype=complex64)

In [27]:
x = np.array([(1,9),(2,10),(3,11),(4,14)], dtype = [('value','f4'), ('amount','c8')])
x['value']

array([1., 2., 3., 4.], dtype=float32)

'value' is the field name for the first element in each tuple, and 'f4' is its data type, meaning it's a 32-bit floating point (single precision).

'amount' is the field name for the second element, and 'c8' is its data type, meaning it's an 8-byte complex number (complex64, made of two 32-bit floats). This is why the integers provided are stored as complex values such as 9.+0.j.
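
If a type code is unfamiliar, np.dtype can be queried directly. The short check below compares the two codes used above with their full names and contrasts 'c8' with 'S8', which is the code for an 8-byte string.

```python
import numpy as np

print(np.dtype('f4') == np.float32)     # True
print(np.dtype('c8') == np.complex64)   # True
print(np.dtype('c8').itemsize)          # 8 (bytes)
print(np.dtype('c8').kind)              # 'c' for complex

# By contrast, an 8-byte string type is spelled 'S8'.
print(np.dtype('S8').kind)              # 'S' for byte string
```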

In [28]:
#fields of a structured array can also be accessed as attributes using recarray

y = x.view(np.recarray)
y.amount #access as attribute

array([ 9.+0.j, 10.+0.j, 11.+0.j, 14.+0.j], dtype=complex64)

1.3 Reshaping and Stacking Numpy Arrays

In [29]:
#we can stack arrays horizontally and vertically
x = np.arange(5)
y = np.array([9,10,11,12,13])

In [30]:
np.hstack([x,y])  #stacking them horizontally

array([ 0,  1,  2,  3,  4,  9, 10, 11, 12, 13])

In [31]:
np.vstack([x,y])

array([[ 0,  1,  2,  3,  4],
       [ 9, 10, 11, 12, 13]])

There is also a dstack function if you want to stack along the third (depth) dimension.
Numpy's np.concatenate handles the general arbitrary-dimension case. In some
codes (e.g., scikit-learn), you may find the terse np.c_ and np.r_ used to
stack arrays column-wise and row-wise:

In [32]:
np.c_[x,x]  #stack column-wise 


array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])

In [33]:
np.r_[x,x]  #stack row-wise

array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
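
For completeness, here is a brief sketch of the dstack and np.concatenate functions mentioned above. The small arrays a and b are made up for the example.

```python
import numpy as np

x = np.arange(5)
y = np.array([9, 10, 11, 12, 13])

# dstack pairs the elements along a new third (depth) axis.
depth = np.dstack([x, y])
print(depth.shape)  # (1, 5, 2)

# concatenate joins along any existing axis chosen with the axis argument.
a = np.ones((2, 3))
b = np.zeros((2, 3))
rows = np.concatenate([a, b], axis=0)
cols = np.concatenate([a, b], axis=1)
print(rows.shape)   # (4, 3)
print(cols.shape)   # (2, 6)
```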

1.4 Duplicating Numpy Arrays

Numpy has a repeat function for duplicating elements and a more generalized
version in tile that lays out a block matrix of the specified shape:

In [34]:
x = np.arange(4)
np.repeat(x,2)

array([0, 0, 1, 1, 2, 2, 3, 3])

In [35]:
np.tile(x,(2,1))

array([[0, 1, 2, 3],
       [0, 1, 2, 3]])

In [37]:
np.tile(x,(2,2))  #repeat twice across rows and columns 

array([[0, 1, 2, 3, 0, 1, 2, 3],
       [0, 1, 2, 3, 0, 1, 2, 3]])

In [39]:
#you can also have non-numerics like strings as items in the array:
np.array(['a','b','cow','dive'])

array(['a', 'b', 'cow', 'dive'], dtype='<U4')

'<U4' is a Unicode string dtype with room for 4 characters, which is the length of the longest string in the array ('dive'). The '<' marks little-endian byte order.

In [41]:
#Reshaping numpy arrays 
a =np.arange(10).reshape(2,5)
a

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

You can be lazy and replace one of the dimensions above with negative one (i.e.,
reshape(-1,5)), and Numpy will infer the size of that dimension from the total number of elements.
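
A quick sketch of the negative-one shortcut, including the case where no conforming size exists:

```python
import numpy as np

a = np.arange(10)

inferred_rows = a.reshape(-1, 5)   # Numpy works out that 2 rows are needed
inferred_cols = a.reshape(2, -1)   # or that 5 columns are needed
print(inferred_rows.shape)  # (2, 5)
print(inferred_cols.shape)  # (2, 5)

# If the total size does not divide evenly, reshape raises an error.
try:
    a.reshape(-1, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```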

The array's transpose method is equivalent to the .T attribute:

In [42]:
a.transpose()

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

In [44]:
a.T

array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])

1.5 Slicing, Logical Array Operations

Numpy arrays follow the same zero-indexed slicing logic as Python lists and strings:

In [47]:
x = np.arange(50).reshape(5,10)
x

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [48]:
#the colon means select all along the indicated dimension(rows):
x[:,0] #all rows in the first column 

array([ 0, 10, 20, 30, 40])

In [49]:
x[0,:]  #all columns on the first row 

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [50]:
x = np.arange(2*3*4).reshape(2,3,4)
x

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [51]:
x[:,1,[2,1]]  #index each dimension 

# :     selects every block along the first dimension
# 1     selects the second row (index 1) of each block along the second dimension
# [2,1] selects the elements at index 2 and index 1, in that order, from the selected row of each block


array([[ 6,  5],
       [18, 17]])

Numpy’s where function can find array elements according to specific logical
criteria. Note that np.where returns a tuple of Numpy index arrays, one per dimension:

In [52]:
np.where(x % 2 ==0)

#The output is a tuple with one index array per dimension of x. Reading the three arrays
#together gives the coordinates of each even element: (0,0,0), (0,0,2), (0,1,0), and so on.

(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=int64),
 array([0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2], dtype=int64),
 array([0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0, 2], dtype=int64))

In [53]:
x[np.where(x % 2 == 0)]

#Here, you’re selecting the actual values from x at the indices where
#  the condition is True (even numbers in this case).

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22])

In [54]:
x[np.where(np.logical_and(x%2 ==0, x<9))] #must meet the two conditions combined 

array([0, 2, 4, 6, 8])
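
The same selections can also be written with a boolean mask, without calling np.where at all. The sketch below shows the equivalence for the combined condition; note that each comparison needs its own parentheses when combined with the & operator.

```python
import numpy as np

x = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Boolean mask with the same shape as x; & is the elementwise "and".
mask = (x % 2 == 0) & (x < 9)
selected = x[mask]
print(selected)  # [0 2 4 6 8]

# Same result as the np.where / np.logical_and form.
via_where = x[np.where(np.logical_and(x % 2 == 0, x < 9))]
print(np.array_equal(selected, via_where))  # True
```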

1.6 Numpy Arrays and Memory

Numpy uses pass-by-reference semantics so that slice operations are views into
the array without implicit copying, which is consistent with Python’s reference semantics.
Note, however, that slicing a built-in Python list does make a (shallow) copy.

This is particularly helpful with large arrays that already strain available memory.
In Numpy terminology, slicing creates views (no copying) and advanced indexing
creates copies.

Specifically, if the indexing object (i.e., the item between the brackets) is a non-tuple sequence
object, another Numpy array (of type integer or boolean), or a tuple with at least one sequence object or Numpy array, then indexing creates copies.

In [57]:
x = np.ones((3,3))
x

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [58]:
x[:,[0,1,2,2]]  #notice the last column is duplicated

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [59]:
y=x[:,[0,1,2,2]]

Because of advanced indexing, the variable y has its own memory because the
relevant parts of x were copied. 

To prove it, we assign a new element to x and
see that y is not updated:

In [60]:
x[0,0] = 999 #changed element in x
x

array([[999.,   1.,   1.],
       [  1.,   1.,   1.],
       [  1.,   1.,   1.]])

In [61]:
y  #NOT CHANGED

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

However, if we start over and construct y by slicing (which makes it a view) as
shown below, then the change we made does affect y because a view is just a
window into the same memory:

In [63]:
x = np.ones((3,3))
y = x[:2,:2]  #view of upper left piece
x[0,0] = 999 #change value
x

array([[999.,   1.,   1.],
       [  1.,   1.,   1.],
       [  1.,   1.,   1.]])

In [64]:
y

array([[999.,   1.],
       [  1.,   1.]])

Note that if you want to explicitly force a copy without any indexing tricks, you can
do y=x.copy().

The code below works through another example of advanced
indexing versus slicing:

In [65]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [66]:
y = x[[0,1,2]]  #indexed by integer list to make copy 
y

array([0, 1, 2])

In [67]:
z = x[:3]
z

array([0, 1, 2])

In [68]:
x[0] = 999
x

array([999,   1,   2,   3,   4])

In [69]:
y  #y is not affected

array([0, 1, 2])

In [70]:
z #z is affected since it's a view (slicing happened)

array([999,   1,   2])

In the above example, y is a copy, not a view, because it was created using advanced
indexing, whereas z was created using slicing. Thus, even though y and z have the
same entries, only z is affected by changes to x. 
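
Rather than changing an element and watching what happens, you can also ask Numpy directly whether two arrays overlap in memory. The sketch below rebuilds the three arrays and uses np.shares_memory and the base attribute for that check.

```python
import numpy as np

x = np.arange(5)
y = x[[0, 1, 2]]   # advanced indexing: a copy
z = x[:3]          # slicing: a view

print(np.shares_memory(x, y))  # False, y has its own memory
print(np.shares_memory(x, z))  # True, z is a window into x
print(z.base is x)             # True, a view records the array it came from
```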

Overlapping Numpy arrays

Manipulating memory using views is particularly
powerful for signal and image processing algorithms that require overlapping
fragments of memory. 

Here is how to use advanced Numpy
to create overlapping blocks that do not actually consume additional memory:

In [2]:
import numpy as np
from numpy.lib.stride_tricks import as_strided 

x = np.arange(16).astype(np.int32)
y=as_strided(x,(7,4),(8,4))  #overlapped slices 
y

array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13],
       [12, 13, 14, 15]])

The above code creates a range of integers and then overlaps the entries to create a
7x4 Numpy array. The final argument to the as_strided function is the tuple of strides,
which are the steps in bytes to move in the row and column dimensions, respectively.

Thus, the resulting array steps four bytes in the column dimension and eight bytes
in the row dimension. Because the integer elements in the Numpy array are four
bytes, this is equivalent to moving by one element in the column dimension and by
two elements in the row dimension.

The second row in the Numpy array starts at
eight bytes (two elements) from the first entry (i.e., 2) and then proceeds by four
bytes (by one element) in the column dimension (i.e., 2,3,4,5). 

The important part is that memory is re-used in the resulting 7x4 Numpy array. The code below
demonstrates this by reassigning elements in the original x array. The changes show
up in the y array because they point at the same allocated memory:

In [3]:
x[:2]  = 99  #assign the first two values
x

array([99, 99,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])

In [4]:
y #the changes appear because y is a view 

array([[99, 99,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13],
       [12, 13, 14, 15]])
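
Because as_strided performs no bounds checking, a wrong shape or stride can read outside the array. As a hedged alternative, assuming Numpy 1.20 or later, sliding_window_view builds the same overlapping blocks from the window length alone. The resulting view is read-only by default, but it still reflects changes made to x.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(16).astype(np.int32)

# All length-4 windows, then keep every second one (starts 0, 2, 4, ...).
windows = sliding_window_view(x, 4)[::2]
print(windows.shape)                 # (7, 4)
print(np.shares_memory(x, windows))  # True, no data was copied

x[:2] = 99                           # change the original array
print(windows[0])                    # first window now starts with 99, 99
```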