<a href="https://colab.research.google.com/github/YCCS-Summer-2023-DDNMA/project/blob/main/ramesh_natarajan/numpy_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!python --version

Python 3.10.12


First, import the numpy package and check its version number.

In [4]:
import numpy as np
np.__version__

'1.22.4'

Numpy arrays are multidimensional arrays (e.g. 1-d, 2-d, 3-d etc.). A 1-d array is similar to a list or tuple, a 2-d array is similar to a matrix or a list of lists, etc.; however, unlike Python lists, numpy arrays are stored contiguously in memory for faster processing of array computations.

We will mostly be interested in Numpy arrays that store floats or integers for numerical computations.


Numpy array values are mutable and can be accessed and modified by indices or index ranges, similar to the case with Python lists, lists of lists, etc.; however, unlike Python lists, all values in a numpy array must have the same type.
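
To make the same-type requirement concrete, here is a minimal sketch (the variable names `mixed` and `ints` are illustrative and not part of the original notebook). It shows numpy choosing one common dtype when an array is created, and converting a value assigned later to the dtype the array already has.

```python
import numpy as np

# mixing ints and floats in the input list yields a single float dtype
mixed = np.array([1, 2, 3.5])
print(f"mixed = {mixed}, dtype = {mixed.dtype}")

# an integer array keeps its integer dtype, so an assigned float is truncated
ints = np.array([1, 2, 3])
ints[0] = 7.9
print(f"ints = {ints}, dtype = {ints.dtype}")
```

The silent truncation in the second case is easy to miss, so it can be worth checking `dtype` before assigning floats into an existing array.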

A numpy array object has 3 key properties:

*   ndim - which is the number of array dimensions
*   shape - which is a tuple containing the size of each dimension in order
*   dtype - which is the type of the array values (e.g. float32)


When printed, the values of a numpy array are displayed with the last index moving fastest and the first index moving slowest.

We demonstrate these properties below.

In [24]:
# initialize 1-d numpy array from a list

a = np.array([x for x in range(10)])
print(f"the values of a are : \n {a}")
print(f"ndim  of a = {a.ndim}")
print(f"shape of a = {a.shape}")
print(f"type of values of a = {a.dtype}")

the values of a are : 
 [0 1 2 3 4 5 6 7 8 9]
ndim  of a = 1
shape of a = (10,)
type of values of a = int64


In [27]:
# initialize a 2-d array from a list of lists (5 rows starting at 0, 10, 20, 30, 40)
b = np.array([[i+j for j in range(5)] for i in range(0,50,10)])
print(f"the values of b are (second index moving fastest) : \n {b}")
print(f"ndim  of b = {b.ndim}")
print(f"shape of b = {b.shape}")
print(f"type of values of b = {b.dtype}")

the values of b are (second index moving fastest) : 
 [[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]
ndim  of b = 2
shape of b = (5, 5)
type of values of b = int64
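
The same properties carry over to higher dimensions. The following is a small sketch, assuming an illustrative 3-d array `t` that is not in the original notebook, which shows how the printed order relates to the indices.

```python
import numpy as np

# a 3-d array with shape (2, 3, 4): 2 blocks, each with 3 rows and 4 columns
t = np.arange(24).reshape(2, 3, 4)
print(f"the values of t are : \n {t}")
print(f"ndim  of t = {t.ndim}")
print(f"shape of t = {t.shape}")

# the last index moves fastest: neighbors along the last axis are consecutive
print(f"t[0, 0, 0], t[0, 0, 1] = {t[0, 0, 0]}, {t[0, 0, 1]}")

# the first index moves slowest: stepping it skips a whole 3x4 block
print(f"t[0, 0, 0], t[1, 0, 0] = {t[0, 0, 0]}, {t[1, 0, 0]}")
```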


Special numpy arrays of different shapes can be created (all zeros, all ones, all equal to a given constant).  The shapes of these arrays can also be taken from the shapes of other numpy arrays (using the corresponding _like function as shown below).  The default dtype can also be overridden when creating arrays.

In [35]:
# all zeros of a given shape
a = np.zeros([2,3])
print(f"a: {a}")

# all ones of a given shape
b = np.ones([2,3])
print(f"b: {b}")

# all constant values of a given shape
c = np.full([2,3], 2.0)
print(f"c: {c}")

# all zeros with the shape of c, overriding the default dtype
d = np.zeros_like(c, dtype=int)
print(f"d: {d}")

a: [[0. 0. 0.]
 [0. 0. 0.]]
b: [[1. 1. 1.]
 [1. 1. 1.]]
c: [[2. 2. 2.]
 [2. 2. 2.]]
d: [[0 0 0]
 [0 0 0]]
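
As a further sketch of the _like functions (the names `template`, `e`, `g` and `h` are illustrative only), note that the new array inherits the dtype of the template as well as its shape, which matters when the fill value has a different type.

```python
import numpy as np

template = np.zeros([2, 3], dtype=int)

# ones_like takes both the shape and the dtype from the template
e = np.ones_like(template)
print(f"e: {e} (dtype {e.dtype})")

# the fill value 2.5 is cast to the template's integer dtype, giving 2
g = np.full_like(template, 2.5)
print(f"g: {g} (dtype {g.dtype})")

# overriding the dtype keeps the fill value as given
h = np.full_like(template, 2.5, dtype=float)
print(f"h: {h} (dtype {h.dtype})")
```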


Random number generation involves creating a pseudo random number generator object (using default_rng) and calling methods on it to obtain random numbers from various distributions.

You should set the seed in the call to default_rng to get reproducible random numbers.  If no seed is set, the generator is initialized with fresh entropy obtained from the operating system, so the results are nondeterministic.  Not setting the seed is a common reason for non-reproducibility issues in machine learning programs.


In [38]:
# let's look at the case when no seed is set - create 2 rng objects

rng1 = np.random.default_rng()
rng2 = np.random.default_rng()

# generate 2 sets of 5 random integers from 0 to 9 (high is exclusive) and compare

randint1 = rng1.integers(low=0, high=10, size=5)
randint2 = rng2.integers(low=0, high=10, size=5)

print("different seeds: randint1 and randint2 will be different")
print(f"randint1: {randint1}")
print(f"randint2: {randint2}")

# let's try this with the same seed

seed = 1234
rng3 = np.random.default_rng(seed=seed)
rng4 = np.random.default_rng(seed=seed)

# generate 2 sets of 5 random integers from 0 to 9 (high is exclusive) and compare

randint3 = rng3.integers(low=0, high=10, size=5)
randint4 = rng4.integers(low=0, high=10, size=5)

print("same seed: randint3 and randint4 will be the same")
print(f"randint3: {randint3}")
print(f"randint4: {randint4}")

different seeds: randint1 and randint2 will be different
randint1: [6 4 6 3 5]
randint2: [7 9 9 6 3]
same seed: randint3 and randint4 will be the same
randint3: [9 9 9 3 1]
randint4: [9 9 9 3 1]
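
The same reproducibility holds for other distributions. The sketch below, with illustrative names `rng_a` and `rng_b`, draws uniform and normal samples from two generators created with the same seed. It assumes the calls are made in the same order on both generators, since each call advances the generator's internal state.

```python
import numpy as np

seed = 1234
rng_a = np.random.default_rng(seed=seed)
rng_b = np.random.default_rng(seed=seed)

# floats drawn uniformly from [-1, 1) and from a standard normal distribution
u_a = rng_a.uniform(low=-1.0, high=1.0, size=3)
n_a = rng_a.normal(loc=0.0, scale=1.0, size=3)

# repeating the same calls in the same order on rng_b reproduces the values
u_b = rng_b.uniform(low=-1.0, high=1.0, size=3)
n_b = rng_b.normal(loc=0.0, scale=1.0, size=3)

print(f"uniform draws match: {np.array_equal(u_a, u_b)}")
print(f"normal draws match:  {np.array_equal(n_a, n_b)}")
```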


You can extract individual elements or slices & dices of numpy arrays using index notation.  These extracted elements or slices may have fewer dimensions than the original array, depending on the extraction.

Note that when using array indexing, positive indices are counted from the front and negative indices are counted from the back.  Similarly, positive steps go forward while negative steps go backward.

In [11]:
# for slicing and dicing, generate a random 3x3x3 array
rng = np.random.default_rng(1234)

# draw from the seeded generator rng (np.random.uniform would ignore the seed)
a = rng.uniform(low = -1.0, high = 1.0, size = (3,3,3))
print(f"nontrivial a:\n {a}")

# print the value in a given position
print(f"a[0,0,0]: \n {a[0,0,0]}")
print(f"a[-1,-1,-1]: \n {a[-1,-1,-1]}")

# print all values in the slice (0, :, :)
print(f"a[0,:, :]:\n {a[0,:,:]}")

# print all values in the slice (-1, :, :)
print(f"a[-1,:, :]:\n {a[-1,:,:]}")

# print all values in the dice (0, 0:2, 0:2)
print(f"a[0, 0:2, 0:2]:\n {a[0, 0:2, 0:2]}")



nontrivial a:
 [[[ 0.43859803 -0.34896337  0.90079142]
  [-0.13148565  0.66358014  0.27385787]
  [-0.88819928  0.44367567  0.50913955]]

 [[ 0.17518658  0.43279415 -0.91064623]
  [ 0.24505127  0.71801429  0.30185831]
  [-0.39833683 -0.99293906  0.63498701]]

 [[-0.17368381  0.10584973 -0.13486481]
  [-0.74839853  0.3311068  -0.5903107 ]
  [ 0.99317435  0.24681786  0.7357125 ]]]
a[0,0,0]: 
 0.43859803246686613
a[-1,-1,-1]: 
 0.7357124989051038
a[0,:, :]:
 [[ 0.43859803 -0.34896337  0.90079142]
 [-0.13148565  0.66358014  0.27385787]
 [-0.88819928  0.44367567  0.50913955]]
a[-1,:, :]:
 [[-0.17368381  0.10584973 -0.13486481]
 [-0.74839853  0.3311068  -0.5903107 ]
 [ 0.99317435  0.24681786  0.7357125 ]]
a[0, 0:2, 0:2]:
 [[ 0.43859803 -0.34896337]
 [-0.13148565  0.66358014]]
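
Negative indices and steps are easier to follow on a small array with known values. The following sketch uses illustrative arrays `v` and `m` that are not part of the original notebook. It also shows that an integer index removes a dimension while a slice keeps it.

```python
import numpy as np

v = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

print(f"v[-3:]    = {v[-3:]}")      # last three elements
print(f"v[::2]    = {v[::2]}")      # every second element, moving forward
print(f"v[::-1]   = {v[::-1]}")     # all elements, moving backward
print(f"v[7:2:-2] = {v[7:2:-2]}")   # from index 7 down to (not including) 2

# an integer index removes a dimension, a slice keeps it
m = np.arange(12).reshape(3, 4)
print(f"m[0, :].shape   = {m[0, :].shape}")     # 1-d result
print(f"m[0:1, :].shape = {m[0:1, :].shape}")   # still 2-d
```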


You can flatten multi-dimensional arrays to a 1-d array, choosing to order the elements with the last index varying fastest (row order or 'c-style') or with the first index varying fastest (column order or 'fortran-style').

Similarly, the flattened arrays can be reshaped back to the original arrays.

In [22]:
# for flattening and reshaping, generate a random 3x3x3 array
rng = np.random.default_rng(1234)

# draw from the seeded generator rng (np.random.uniform would ignore the seed)
a = rng.uniform(low = -1.0, high = 1.0, size = (3,3,3))
print(f"original a:\n {a}")

# save the shape
ashape = a.shape

# flatten the array by row and col orders respectively
broworder = a.flatten(order='C')
bcolorder = a.flatten(order='F')

print(f"\nflatten a by row (c-style); \n {broworder}")
print(f"\nflatten a by column (fortran-style); \n {bcolorder}")

# restore the original array from flattened arrays

# pass the shape positionally so the call works across numpy versions
crow = np.reshape(broworder, ashape, order = 'C')
ccol = np.reshape(bcolorder, ashape, order = 'F')

print(f"\nrestore a from broworder by row (c-style); \n {crow}")
print(f"\nrestore a from bcolorder by column (fortran-style); \n {ccol}")

original a:
 [[[-0.40302699 -0.85880326 -0.32470971]
  [-0.90480572 -0.14246673  0.91024028]
  [ 0.78117461 -0.30083988 -0.5056109 ]]

 [[-0.38703013 -0.96567864  0.25073122]
  [ 0.01989169  0.99041451 -0.22785672]
  [ 0.83334238 -0.99844754 -0.85434489]]

 [[ 0.66643914 -0.60391865 -0.4933494 ]
  [ 0.88903308  0.94112035 -0.08617802]
  [ 0.51622226 -0.77290638  0.96178995]]]

flatten a by row (c-style); 
 [-0.40302699 -0.85880326 -0.32470971 -0.90480572 -0.14246673  0.91024028
  0.78117461 -0.30083988 -0.5056109  -0.38703013 -0.96567864  0.25073122
  0.01989169  0.99041451 -0.22785672  0.83334238 -0.99844754 -0.85434489
  0.66643914 -0.60391865 -0.4933494   0.88903308  0.94112035 -0.08617802
  0.51622226 -0.77290638  0.96178995]

flatten a by column (fortran-style); 
 [-0.40302699 -0.38703013  0.66643914 -0.90480572  0.01989169  0.88903308
  0.78117461  0.83334238  0.51622226 -0.85880326 -0.96567864 -0.60391865
 -0.14246673  0.99041451  0.94112035 -0.30083988 -0.99844754 -0.77290638
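
The two orderings are easier to compare on a tiny array with known values. This is a minimal sketch using an illustrative 2x3 array `s`. It also shows what happens if the array is reshaped with a different order from the one used to flatten it.

```python
import numpy as np

s = np.array([[0, 1, 2],
              [3, 4, 5]])

row_order = s.flatten(order='C')   # last index varies fastest
col_order = s.flatten(order='F')   # first index varies fastest
print(f"row order (c-style):          {row_order}")
print(f"column order (fortran-style): {col_order}")

# reshaping with the same order that was used to flatten recovers s
from_row = row_order.reshape(s.shape, order='C')
from_col = col_order.reshape(s.shape, order='F')
print(f"restored from row order:    {np.array_equal(from_row, s)}")
print(f"restored from column order: {np.array_equal(from_col, s)}")

# mixing the orders raises no error but scrambles the elements
scrambled = col_order.reshape(s.shape, order='C')
print(f"column order reshaped c-style: \n {scrambled}")
```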
 

Numpy arrays can be mutated by assignment to the indexed array elements. This is helpful as generally you don't want to be making copies of large arrays just to change a few elements.

In [12]:
# mutate an individual element or a slice of an array
a = np.ones(shape = [2,2])
print(f"initial a: \n {a}")

a[:, 0] = 0
a[1,1] = 2
print(f"after assignment a \n {a}")

initial a: 
 [[1. 1.]
 [1. 1.]]
after assignment a 
 [[0. 1.]
 [0. 2.]]
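
A related point, sketched below with illustrative names `first_row` and `snapshot`: a basic slice of a numpy array is a view that shares memory with the original, so assigning through the slice also changes the original array. An explicit `.copy()` is needed when an independent array is wanted.

```python
import numpy as np

a = np.zeros(shape=[2, 3])

# a slice is a view: it shares memory with a, so no data is copied
first_row = a[0, :]
first_row[:] = 5
print(f"a after writing through the view: \n {a}")
print(f"shares memory with a: {np.shares_memory(a, first_row)}")

# .copy() makes an independent array, so a is left unchanged
snapshot = a[0, :].copy()
snapshot[:] = -1
print(f"a after writing to the copy: \n {a}")
```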



Some array operations can be surprising: for example a = a + b and a += b seem to be identical, but the first creates a new array and rebinds the name a to it, while the second modifies the existing array in place.

In [13]:
# follow the behavior of c, which refers to the original array a

# a = a + b does not mutate the contents of the original a
a = np.ones(shape=[2,2])
b = np.full_like(a, 2)
c = a

a = a + b
print("unmutated case")
print(f"a = a+b: \n {a}")
print(f"c: \n {c}")

# a += b mutates the contents of the original a
a = np.ones(shape=[2,2])
b = np.full_like(a, 2)
c = a

a += b
print("\nmutated case")
print(f"a += b: \n {a}")
print(f"c: \n {c}")


unmutated case
a = a+b: 
 [[3. 3.]
 [3. 3.]]
c: 
 [[1. 1.]
 [1. 1.]]

mutated case
a += b: 
 [[3. 3.]
 [3. 3.]]
c: 
 [[3. 3.]
 [3. 3.]]
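
One way to check which case you are in is to test object identity, as in this minimal sketch (the names `rebound` and `same_object` are illustrative). The call to `np.add` with `out=` is shown as an explicit way to write the in-place update.

```python
import numpy as np

a = np.ones(shape=[2, 2])
b = np.full_like(a, 2)
c = a

# a = a + b builds a new array, so a and c no longer refer to the same object
a = a + b
rebound = a is not c
print(f"after a = a + b, a is a new object: {rebound}")

# a += b writes into the existing array, so a and c stay the same object
a = np.ones(shape=[2, 2])
c = a
a += b
same_object = a is c
print(f"after a += b, a is still c: {same_object}")

# np.add with out= is an explicit form of the in-place update
np.add(a, b, out=a)
print(f"c after np.add(a, b, out=a): \n {c}")
```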


When **elementwise** numerical operations (e.g. array addition or array multiplication) are performed on 2 numpy arrays of different shapes, the smaller array is, if possible, "broadcast" (i.e. its values are conceptually repeated to give a shape that is compatible with the shape of the larger array, without the data actually being copied).   The rules for this broadcasting are given [here](https://numpy.org/doc/stable/user/basics.broadcasting.html) and should be reviewed carefully.

Let's look at some examples of broadcasting:

In [34]:

# this broadcast works because the dimensions of the smaller array are compatible
a = np.ones(shape=[2,2])
b = np.array([5,6])

c = a + b
d = a*b

print("addition and multiplication example: note how b is repeated for each row of a to be compatible with a")
print(f"a: \n {a}")
print(f"b: \n {b}")
print(f"c = a + b: \n {c}")
print(f"d = a * b: \n {d}")

# this example will throw an error as the dimensions of the smaller array are not compatible
a = np.ones(shape=[2,2])
b = np.array([5,6, 7])
print("\naddition example: in this case b cannot be repeated for each row of a to be compatible with a")
print(f"a: \n {a}")
print(f"b: \n {b}")
try:
  c = a + b
except ValueError:
  print(f"\nthrows an exception since the dimensions of a: {a.shape} and b: {b.shape} are incompatible")


addition and multiplication example: note how b is repeated for each row of a to be compatible with a
a: 
 [[1. 1.]
 [1. 1.]]
b: 
 [5 6]
c = a + b: 
 [[6. 7.]
 [6. 7.]]
d = a * b: 
 [[5. 6.]
 [5. 6.]]

addition example: in this case b cannot be repeated for each row of a to be compatible with a
a: 
 [[1. 1.]
 [1. 1.]]
b: 
 [5 6 7]

throws an exception since the dimensions of a: (2, 2) and b: (3,) are incompatible
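
An incompatible pair of shapes can often be made compatible by adding an axis of length 1. The sketch below is illustrative only (the array shapes and the name `b_col` are assumptions, not from the original notebook). It uses `np.newaxis` to turn a 1-d array into a column, and `np.broadcast_shapes`, which is available in numpy 1.20 and later, to report the resulting shape.

```python
import numpy as np

a = np.ones(shape=[3, 2])
b = np.array([5, 6, 7])

# shapes (3, 2) and (3,) are aligned from the right: 2 and 3 do not match
try:
    a + b
except ValueError as err:
    print(f"a + b fails: {err}")

# reshaping b to (3, 1) lets it be repeated along the second axis
b_col = b[:, np.newaxis]
c = a + b_col
print(f"b_col shape: {b_col.shape}")
print(f"c = a + b_col: \n {c}")

# broadcast_shapes gives the result shape without doing the arithmetic
result_shape = np.broadcast_shapes(a.shape, b_col.shape)
print(f"broadcast shape: {result_shape}")
```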
