# Numpy
Multi-dimensional array library


![numpy array](Numpy_Array_Dims.png)


## Why Numpy Arrays instead of Lists
A - Lists are slow

Why? 

1. Numpy uses fixed types.

    So an integer (e.g. the number 5) in numpy can be stored as an 8-bit integer (int8), taking up only 1 byte of memory.

    In a list, the same number is a full Python object with 4 components: Object Value (4 bytes for a small integer), Object Type (8 bytes), Reference Count (8 bytes), and Size (8 bytes). The exact layout varies between Python versions, but it comes to roughly 28 bytes on a typical 64-bit CPython.

    That's about 28 times as large as an int8 for each integer value stored (and about 3.5 times as large as numpy's default int64).

    Reading fewer bytes of memory is generally faster.

    You also don't need to check the type when iterating through a numpy array (unlike a list), because all elements are of the same type (e.g. int32).
    


2. Numpy utilises contiguous memory

    A list stores pointers to objects that can be scattered all over memory, while numpy stores the values themselves together as one chunk.


![memory management](Memory_Management.png)


This allows us to use Single Instruction Multiple Data (SIMD) vector processing to perform computations on several values at once (instead of one at a time).

It also allows for effective cache management.
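To make the memory argument above concrete, here is a minimal sketch that compares the footprint of a list and an array holding the same 1000 integers. The variable names are made up for illustration, and the exact list figure depends on the Python version and platform, so no particular number is claimed for it.

```python
import sys
import numpy as np

n = 1000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# List: the list's own pointer table plus one Python object per element.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# Array: element count times bytes per element, stored in one data buffer.
array_bytes = arr.nbytes

print(list_bytes, array_bytes)
print(arr.flags['C_CONTIGUOUS'])  # True: the values sit in one contiguous block
```

Switching `dtype` to `np.int8` shrinks the array buffer further, at the cost of only being able to hold values from -128 to 127.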

B - Allows for all the functionalities of lists + much more

e.g. multiplying two arrays
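A small sketch of that difference, using made-up values: lists need an explicit loop to multiply element by element, while arrays do it with a single operator.

```python
import numpy as np

x = [1, 2, 3]
y = [4, 5, 6]

# Lists need an explicit loop (x * y raises a TypeError).
list_product = [p * q for p, q in zip(x, y)]

# Arrays multiply element by element with a single operator.
array_product = np.array(x) * np.array(y)

print(list_product)   # [4, 10, 18]
print(array_product)  # [ 4 10 18]
```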


C - Used everywhere:

* For mathematics (as a replacement for MATLAB)
* In plotting packages (e.g. Matplotlib)
* In backend & data-processing packages (pandas, image processing etc)
* Machine learning (tensors are quite similar to numpy arrays)


### Load in NumPy (pip install numpy if not installed)

In [1]:
import numpy as np

### The Basics

In [23]:
#Initialise an Array
a = np.array([1,2,3])
print(a)

[1 2 3]


In [24]:
#Could be a 2d array of floats
b = np.array([[9.0,8.0,7.0],[6.0,5.0,4.0]])
print(b)

[[9. 8. 7.]
 [6. 5. 4.]]


In [25]:
# Get Dimension
print (a.ndim)
print (b.ndim)

1
2


In [26]:
# Get Shape
print (a.shape)
print (b.shape)

(3,)
(2, 3)


We can also check the data type of our arrays and see how much memory they take up

In [28]:
# Get Type
print (a.dtype)
print (b.dtype)

c = np.array([1,2,3], dtype='int16')
print (c.dtype)

int64
float64
int16


In [30]:
# Get Size (Bytes)
print(a.itemsize)
print(c.itemsize)

8
2


In [31]:
# Get total size
# alt: a.size * a.itemsize
print(a.nbytes)
print(c.nbytes)

24
6


In [9]:
# Get number of elements
a.size

3
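The relationship between `size`, `itemsize` and `nbytes` can be checked for a few dtypes. This is only an illustrative sketch; the dtype list is an arbitrary choice.

```python
import numpy as np

for dtype in ('int8', 'int16', 'int32', 'int64', 'float64'):
    arr = np.array([1, 2, 3], dtype=dtype)
    # nbytes is the element count times the bytes per element
    assert arr.nbytes == arr.size * arr.itemsize
    print(dtype, arr.itemsize, arr.nbytes)
```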

### Accessing/Changing specific elements, rows, columns, etc

In [32]:
a = np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]]) #2x7 array
print(a)

[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]


In [34]:
a.shape #just to confirm the shape

(2, 7)

In [35]:
# Get a specific element [r, c] e.g. to get 13
a[1, 5] #alt a[1,-2]

13

In [36]:
# Get a specific row 
a[0, :] #basic slice syntax

array([1, 2, 3, 4, 5, 6, 7])

In [37]:
# Get a specific column
a[:, 2]

array([ 3, 10])

In [38]:
# Getting a little more fancy [startindex:endindex:stepsize] e.g. every other col between 2 and 6
# alt: a[0, 1:6:2]
a[0, 1:-1:2] #note: endindex is exclusive

array([2, 4, 6])
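A short sketch of the `start:end:step` syntax on the same 2x7 array, showing that a negative end index and its positive equivalent select the same columns. The variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])

negative_end = a[0, 1:-1:2]   # stop before the last column
explicit_end = a[0, 1:6:2]    # same stop written as a positive index
every_other = a[:, ::2]       # all rows, every second column

print(negative_end)  # [2 4 6]
print(explicit_end)  # [2 4 6]
print(every_other)
```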

In [39]:
#Change specific element
a[1,5] = 20
print(a)
#Change a column of values
#a[:,2] = 5 #changes all to 5's
a[:,2] = [1,2]
print(a)

[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 20 14]]
[[ 1  2  1  4  5  6  7]
 [ 8  9  2 11 12 20 14]]


3-dim example

In [44]:
b = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
print(b)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [41]:
# Get specific element (work outside in) e.g. getting the 4
b[0,1,1] #play around with it e.g. b[:,1,:] to see results

4

In [45]:
# Replace (first check dimensions)
print (b[:,1,:])


[[3 4]
 [7 8]]


In [46]:
b[:,1,:] = [[9,9],[8,8]] #must be same dimension
print (b)

[[[1 2]
  [9 9]]

 [[5 6]
  [8 8]]]


### Indexing Experiment

How would you get each set of values (blue, green and red) in the following array?

![Indexing Question](Indexing_Tricks_Qn.png)

### Solutions
![Indexing Solution](Indexing_Tricks_Soln.png)

### Initializing Different Types of Arrays
Reference: [https://numpy.org/doc/stable/reference/routines.array-creation.html](https://numpy.org/doc/stable/reference/routines.array-creation.html)

In [47]:
# All 0s matrix
#e.g np.zeros(5) # all it needs is a shape
np.zeros((2,3)) #try ((2,3,3,2))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [48]:
# All 1s matrix (datatype is optional)
np.ones((4,2,2), dtype='int32')

array([[[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]],

       [[1, 1],
        [1, 1]]], dtype=int32)

In [49]:
# Any other number (parameters are shape and values)
np.full((2,2), 99) #add , dtype='float32' to see difference

array([[99, 99],
       [99, 99]])

In [50]:
# Any other number (full_like)
a = np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
np.full_like(a, 4) #array whose shape (and dtype) you want to emulate + fill value (required)
#alt: np.full(a.shape, 4)

array([[4, 4, 4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4, 4, 4]])

In [51]:
# Random decimal numbers
np.random.rand(4,2) #NOTE: No inner tuple, just integers

array([[0.47050457, 0.5383274 ],
       [0.30183274, 0.34762344],
       [0.02784019, 0.43027933],
       [0.27458377, 0.4061509 ]])

In [52]:
#from existing shape
np.random.random_sample(a.shape)

array([[0.88112287, 0.4778928 , 0.16759709, 0.03648196, 0.33028403,
        0.47943578, 0.47286137],
       [0.32144266, 0.06263779, 0.21041109, 0.22928579, 0.49331647,
        0.63177872, 0.64450346]])

In [53]:
# Random Integer values
np.random.randint(-4,8, size=(3,3)) #low (inclusive), high (exclusive); with a single value, numbers are drawn from 0 up to that value (exclusive)

array([[-4,  7, -3],
       [ 4, -2,  2],
       [ 7,  7, -3]])
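The random outputs above change on every run. As a hedged alternative, NumPy also provides a `Generator` interface through `np.random.default_rng`, which can be seeded for reproducible results. The seed value below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for reproducibility

floats = rng.random((4, 2))               # uniform floats in [0, 1); takes a shape tuple
ints = rng.integers(-4, 8, size=(3, 3))   # low is inclusive, high is exclusive

print(floats.shape, ints.shape)  # (4, 2) (3, 3)
```

Creating a second generator with the same seed yields the same sequence of draws.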

In [54]:
# The identity matrix
np.identity(5) #only 1 parameter coz always a square matrix

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [55]:
# Repeat an array
arr = np.array([[1,2,3]])
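r1 = np.repeat(arr,3) #no axis: the array is flattened first, so the result is 1-D (axis=1 gives the same values but keeps the 2-D shape)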
print(r1)
r2 = np.repeat(arr,3, axis=0)
print(r2)

[1 1 1 2 2 2 3 3 3]
[[1 2 3]
 [1 2 3]
 [1 2 3]]
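The effect of the `axis` argument is easiest to see from the resulting shapes. The sketch below also includes `np.tile`, which is not used above, for the case where the whole row should be repeated as a block.

```python
import numpy as np

arr = np.array([[1, 2, 3]])

flat = np.repeat(arr, 3)           # no axis: input is flattened, result is 1-D
cols = np.repeat(arr, 3, axis=1)   # repeat each element along the columns, stays 2-D
rows = np.repeat(arr, 3, axis=0)   # repeat the row along the rows
tiled = np.tile(arr, 3)            # repeat the whole row as a block

print(flat.shape, cols.shape, rows.shape)  # (9,) (1, 9) (3, 3)
print(tiled)  # [[1 2 3 1 2 3 1 2 3]]
```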


## Experiment
Try to initialise the following array without typing in all the values


![Array Image](array_initialisation_experiment.png)

### Solutions

In [56]:
#Possible Solution 1
output = np.ones((5,5))
print(output)

z = np.zeros((3,3))
z[1,1] = 9
print(z)

output[1:-1,1:-1] = z
print(output)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[0. 0. 0.]
 [0. 9. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 9. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]


In [57]:
#Possible Solution 2
output = np.ones((5,5))
print(output)
output[1:4,1:4]=0 # or output[1:-1,1:-1]=0
print(output)
output[2,2]=9
print(output)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]
[[1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 0. 9. 0. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1.]]


In [58]:
#Possible Solution 3
output = np.zeros((5,5), dtype='int8')
output[:,0:5:4], output[0:5:4,:], output[2,2] = 1, 1, 9
print(output)

[[1 1 1 1 1]
 [1 0 0 0 1]
 [1 0 9 0 1]
 [1 0 0 0 1]
 [1 1 1 1 1]]
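One more possible approach, sketched here as an alternative rather than as one of the solutions above: build the 3x3 centre first and let `np.pad` add the border of ones.

```python
import numpy as np

inner = np.zeros((3, 3), dtype='int8')
inner[1, 1] = 9

# Add one layer of padding on every side, filled with 1s
output = np.pad(inner, pad_width=1, mode='constant', constant_values=1)
print(output)
```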


**NOTE: Be careful when copying arrays**

In [60]:
#This seems fine
a = np.array([1,2,3])
b = a

print(b)

[1 2 3]


In [61]:
#But let's see what happens when you change something
b[0] = 100
print (b)
print (a)

[100   2   3]
[100   2   3]


Why? When we set b=a, we just make *b* a second name for the same array (and the same memory) that *a* refers to. We didn't make a copy of *a* and call it *b*

In [62]:
a = np.array([1,2,3])
b = a.copy()
b[0]=100
print(a)
print(b)

[1 2 3]
[100   2   3]
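A quick way to check whether two arrays share data is `np.shares_memory`. The sketch below also shows that basic slices behave like the `b = a` case: they are views onto the same data, not copies. The variable names are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
alias = a          # second name for the same array
view = a[:2]       # basic slices are views onto the same data
copied = a.copy()  # independent data

print(alias is a)                    # True
print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, copied))   # False

view[0] = 100
print(a)       # [100   2   3]
print(copied)  # [1 2 3]
```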


### Mathematics
Reference: [https://docs.scipy.org/doc/numpy/reference/routines.math.html](https://docs.scipy.org/doc/numpy/reference/routines.math.html)

In [63]:
a = np.array([1,2,3,4])
print(a)

[1 2 3 4]


In [64]:
#Element-wise arithmetic
a + 2

array([3, 4, 5, 6])

In [65]:
a - 2

array([-1,  0,  1,  2])

In [66]:
a * 2

array([2, 4, 6, 8])

In [67]:
a / 2

array([0.5, 1. , 1.5, 2. ])

In [68]:
b = np.array([1,0,1,0])
a + b

array([2, 2, 4, 4])

In [69]:
a ** 2

array([ 1,  4,  9, 16])

In [70]:
# Take the sin and cosine
print(np.sin(a))
print(np.cos(a))

[ 0.84147098  0.90929743  0.14112001 -0.7568025 ]
[ 0.54030231 -0.41614684 -0.9899925  -0.65364362]


#### Linear Algebra

In [71]:
a = np.ones((2,3)) 
print(a)

b = np.full((3,2), 2)
print(b)

np.matmul(a,b) #(2x3) times (3x2) gives 2x2 (cols of a must be same as rows of b)

[[1. 1. 1.]
 [1. 1. 1.]]
[[2 2]
 [2 2]
 [2 2]]


array([[6., 6.],
       [6., 6.]])
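The same product can also be written with the `@` operator. This sketch additionally shows what happens when the inner dimensions do not match.

```python
import numpy as np

a = np.ones((2, 3))
b = np.full((3, 2), 2)

product = a @ b   # operator form of np.matmul
print(product.shape)                             # (2, 2)
print(np.array_equal(product, np.matmul(a, b)))  # True

try:
    a @ a  # (2x3) times (2x3): cols of the first do not match rows of the second
except ValueError as err:
    print("shape mismatch:", err)
```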

In [72]:
# Find the determinant
c = np.identity(3)
np.linalg.det(c)

1.0

Reference: [https://docs.scipy.org/doc/numpy/reference/routines.linalg.html](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html)

Other functions:

    * Determinant
    * Trace
    * Singular Value Decomposition
    * Eigenvalues
    * Matrix Norm
    * Inverse
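A brief sketch of how the functions listed above are called, using a small diagonal matrix chosen only because its results are easy to verify by hand.

```python
import numpy as np

m = np.array([[2.0, 0.0],
              [0.0, 3.0]])

det = np.linalg.det(m)        # determinant
tr = np.trace(m)              # trace (sum of the diagonal)
inv = np.linalg.inv(m)        # inverse
norm = np.linalg.norm(m)      # matrix norm (Frobenius by default)
eigenvalues, eigenvectors = np.linalg.eig(m)
u, s, vt = np.linalg.svd(m)   # singular value decomposition

print(det, tr, norm)
print(inv)
print(eigenvalues, s)
```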


#### Statistics

In [73]:
stats = np.array([[1,2,3],[4,5,6]])
stats

array([[1, 2, 3],
       [4, 5, 6]])

In [74]:
np.min(stats) #max etc

1

In [76]:
# can be on a row by row basis
np.max(stats, axis=1)

array([3, 6])

In [77]:
#or col by col
np.sum(stats, axis=0)

array([5, 7, 9])
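The same `axis` convention applies to other reductions. A short sketch with a few that are not shown above:

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(stats))             # 3.5
print(np.mean(stats, axis=0))     # [2.5 3.5 4.5]  (col by col)
print(np.median(stats, axis=1))   # [2. 5.]        (row by row)
print(np.argmax(stats))           # 5  (position of the max in the flattened array)
```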

### Reorganizing Arrays

In [78]:
#Let's say we have this array
before = np.array([[1,2,3,4],[5,6,7,8]])
print(before.shape)

(2, 4)


In [79]:
# and we want it to be an 8x1 array
after = before.reshape((8,1))
print(after)

[[1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]]


In [80]:
# or a 4x2 array
after = before.reshape((4,2))
print(after)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]


In [82]:
# could be anything, as long as the total number of values matches
after = before.reshape((2,2,2))
print(after)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [83]:
# if not, you get an error
after = before.reshape((2,3))
print(after)

ValueError: cannot reshape array of size 8 into shape (2,3)
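To avoid working out one of the dimensions by hand, `reshape` accepts `-1` for a single dimension and infers it from the total number of values. A minimal sketch:

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(before.reshape((-1, 2)).shape)  # (4, 2)
print(before.reshape(-1))             # [1 2 3 4 5 6 7 8]

try:
    before.reshape((3, -1))  # 8 values cannot be split into 3 equal rows
except ValueError as err:
    print(err)
```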

In [85]:
# Vertically stacking vectors
v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])

np.vstack([v1,v2])

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [96]:
# can be multiple stacked together in whatever order (as long as sizes along the axis of alignment match up)
np.vstack([v1,v2,v2,v1,v1])

array([[1, 2, 3, 4],
       [5, 6, 7, 8],
       [5, 6, 7, 8],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])

In [88]:
# Horizontal  stack
h1 = np.ones((2,4))
h2 = np.zeros((2,2))
print(h1)
print(h2)

np.hstack((h1,h2)) #can be () or [] for inner part - both work

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[0. 0.]
 [0. 0.]]


array([[1., 1., 1., 1., 0., 0.],
       [1., 1., 1., 1., 0., 0.]])

In [91]:
#other form of stacking: stack joins along a new axis (axis=1 pairs up the elements)
np.stack((v1,v2), axis=1) 

array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])
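The effect of `axis` in `np.stack` is clearest when comparing shapes. In this sketch `axis=0` matches `np.vstack` for 1-D inputs, and `axis=1` matches `np.column_stack`.

```python
import numpy as np

v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])

rows = np.stack((v1, v2), axis=0)    # each vector becomes a row
pairs = np.stack((v1, v2), axis=1)   # each vector becomes a column

print(rows.shape, pairs.shape)  # (2, 4) (4, 2)
print(np.array_equal(rows, np.vstack((v1, v2))))         # True
print(np.array_equal(pairs, np.column_stack((v1, v2))))  # True
```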

### Miscellaneous
#### Load Data from File

In [98]:
filedata = np.genfromtxt('data.txt', delimiter=',')
print(filedata)
#change type once loaded in
filedata = filedata.astype('int32') #astype returns a new array, so assign the result back
print(filedata)

[[  1.  13.  21.  11. 196.  75.   4.   3.  34.   6.   7.   8.   0.   1.
    2.   3.   4.   5.]
 [  3.  42.  12.  33. 766.  75.   4.  55.   6.   4.   3.   4.   5.   6.
    7.   0.  11.  12.]
 [  1.  22.  33.  11. 999.  11.   2.   1.  78.   0.   1.   2.   9.   8.
    7.   1.  76.  88.]]
[[  1  13  21  11 196  75   4   3  34   6   7   8   0   1   2   3   4   5]
 [  3  42  12  33 766  75   4  55   6   4   3   4   5   6   7   0  11  12]
 [  1  22  33  11 999  11   2   1  78   0   1   2   9   8   7   1  76  88]]
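Since `data.txt` is not included here, the sketch below reproduces the same steps with a small in-memory stand-in. The text content is made up, and `io.StringIO` is used only so the example runs without a file on disk.

```python
import io
import numpy as np

# Hypothetical stand-in for a comma-separated file
text = "1,13,21\n3,42,12\n1,22,33\n"

loaded = np.genfromtxt(io.StringIO(text), delimiter=',')
print(loaded.dtype)  # float64 (the default when no dtype is given)

as_int = loaded.astype('int32')  # returns a new array; loaded is unchanged
print(as_int)
```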


#### Boolean Masking and Advanced Indexing

In [99]:
#where in filedata are values greater than 50
filedata > 50

array([[False, False, False, False,  True,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])

In [100]:
#useful as an index to get those values back
filedata [filedata > 50]

array([196,  75, 766,  75,  55, 999,  78,  76,  88], dtype=int32)

In [101]:
#Yes, you can index in numpy with a list!
a = np.array([1,2,3,4,5,6,7,8,9])
#let's say we want 2,3 and 9 in the above array
a[[1,2,8]]

array([2, 3, 9])

In [102]:
#is any value in any column greater than 50
np.any(filedata > 50, axis = 0) #axis=1 would be on a row basis

array([False, False, False, False,  True,  True, False,  True,  True,
       False, False, False, False, False, False, False,  True,  True])

In [103]:
#only return true if all meet the criteria
np.all(filedata > 50, axis = 0) #only 5th column is True

array([False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False])

In [104]:
#you can combine criteria - syntax similar to pandas (which is built on numpy, so makes sense)
((filedata > 50) & (filedata < 100))

array([[False, False, False, False, False,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])

In [105]:
#~ means "not", so this is the reverse of the above, i.e. all values that are 50 or less, or 100 or more
(~((filedata > 50) & (filedata < 100)))

array([[ True,  True,  True,  True,  True, False,  True,  True,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True, False,  True, False,  True,
         True,  True,  True,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True, False,
         True,  True,  True,  True,  True,  True,  True, False, False]])

In [107]:
#numpy also useful for non-number values, but if you mix types, better to use dtype=object
a = np.array([["String",1,2]], dtype=object)
b = [["another string", 3, 4]]
a = np.vstack((a,np.asarray(b,object)))
print(a)

[['String' 1 2]
 ['another string' 3 4]]


In [111]:
a[0,0]

'String'