## Python arrays with Numpy

In [1]:
import numpy as np
import pandas as pd

### Numpy Arrays

* A Numpy Array is a grid of values, all of the same type
* The number of dimensions is the _rank_ of the array
* The _shape_ of the array is a tuple of integers giving the size of the array along each dimension

Regular Python lists can be converted to arrays with the `np.array()` function



In [2]:
# Convert a  list to a numpy array

list1 = [1, 2, 3, 4, 5]
array1 = np.array(list1)

In [3]:
array1

array([1, 2, 3, 4, 5])

*Note*: np.array() is a function that creates an instance of np.ndarray

To check whether an object is a numpy array, use `isinstance(a, np.ndarray)`
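As a minimal sketch of that check (the names `values` and `arr` are only illustrative), a plain list fails the test while the converted array passes it:

```python
import numpy as np

values = [1, 2, 3]        # a plain Python list
arr = np.array(values)    # np.array() returns an np.ndarray instance

print(isinstance(values, np.ndarray))  # False
print(isinstance(arr, np.ndarray))     # True
```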

### Shape

Whenever you program with arrays or matrices in Python, it helps to know (and check!) what the shapes of your arrays are.

Generally: if a matrix of shape `[m x n]` is multiplied by a matrix of shape `[n x p]`, then the resulting matrix has shape `[m x p]`
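The shape rule can be checked directly. This is a small sketch with arbitrarily chosen sizes `m`, `n`, `p` and illustrative names `left` and `right`; it uses the `@` matrix-multiplication operator:

```python
import numpy as np

m, n, p = 2, 3, 4
left = np.ones((m, n))    # shape (2, 3)
right = np.ones((n, p))   # shape (3, 4)

product = left @ right    # inner dimensions (n and n) match
print(product.shape)      # (2, 4)

# Swapping the operands makes the inner dimensions disagree (4 vs 2)
mismatch_raised = False
try:
    right @ left
except ValueError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```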

*Pro tip:* print the shape of your input and output vectors and matrices as you code to make sure they are as expected. This can save you much debugging time!

The _shape_ is given by the attribute `shape`

In [4]:
# What is the shape of this array?

array1.shape

(5,)

Why is there an empty slot in the shape tuple `(5,)`?

`(5,)` is Python's way of writing a tuple with one entry. Without the trailing comma, `(5)` is just the integer 5 in parentheses, so the comma is needed to remove the ambiguity. An array with shape `(5,)` has only one dimension, of length 5.
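A quick way to see the difference in plain Python (the variable names are only for illustration):

```python
one_tuple = (5,)   # trailing comma: a tuple with a single entry
just_int = (5)     # no comma: simply the integer 5 in parentheses

print(type(one_tuple), len(one_tuple))
print(type(just_int))
```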

Let's see what the difference is with an array of shape `(5, 1)`


In [13]:
print(array1)
print(array1.shape)
print(array1.size)

print("Now create a 1 column array")

new_array = array1[np.newaxis]  # array([[1, 2, 3, 4, 5]]); np.newaxis adds a dimension, giving shape (1, 5)

print(new_array.T)
print("Shape is:", new_array.T.shape)
print(new_array.size)

[1 2 3 4 5]
(5,)
5
Now create a 1 column array
[[1]
 [2]
 [3]
 [4]
 [5]]
Shape is: (5, 1)
5
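Transposing after `np.newaxis` is one route to a single-column array. As a sketch, two other common spellings give the same result: indexing with `[:, np.newaxis]` and calling `reshape(-1, 1)`. The names `col_a`, `col_b` and `col_c` are illustrative.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4, 5])

col_a = array1[np.newaxis].T      # (1, 5) transposed to (5, 1)
col_b = array1[:, np.newaxis]     # new axis added as the second dimension
col_c = array1.reshape(-1, 1)     # -1 lets numpy infer the number of rows

print(col_a.shape, col_b.shape, col_c.shape)
print(np.array_equal(col_a, col_b) and np.array_equal(col_b, col_c))  # True
```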


### Multi-dimensional arrays

In [16]:
# Create another list
list2 = [100, 200, 300, 400, 500]

# Create a list of lists
two_lists = [ list1, list2 ]
array2 = np.array(two_lists)

# show the resulting array
print(array2)

print("\nThe dimension of this array is:", array2.ndim)


[[  1   2   3   4   5]
 [100 200 300 400 500]]

The dimension of this array is: 2


In [17]:
# what is the shape of this array?
array2.shape

# => two rows, five columns

(2, 5)

In [65]:
# what is the type of this array?
array2.dtype

dtype('int32')

### Indexing arrays

Arrays are indexed using 0-based indexing

In [22]:
# let's look at our one-dimensional array again
print(array1)
# what is the element at index 3? (with 0-based indexing this is the fourth element)
print(array1[3])
# just like lists, arrays support indexing from the end of the array
print(array1[-1])

[1 2 3 4 5]
4
5


In [26]:
# indexing two-dimensional arrays is done by passing the index of each dimension
print(array2)
print(array2[0,0])
print(array2[0, 2])   # row 0, column 2

[[  1   2   3   4   5]
 [100 200 300 400 500]]
1
3


### Slicing Arrays

Numpy arrays support the rich slicing operators available to lists

In [19]:
# first create an array with 10 elements
a = np.arange(10)
print(a)
b = a[2:7:2]    # slice from index 2 up to (but not including) index 7, in steps of 2 (start:stop:step)
print(b)

[0 1 2 3 4 5 6 7 8 9]
[2 4 6]


In [23]:
#  slice items starting at an index and continuing to the end
c = a[2:]

print(c)

# and between the indexes
print(a[2:5])

[2 3 4 5 6 7 8 9]
[2 3 4]


In [27]:
# Multi-dimensional array slicing

a = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(a)
print()
# slice the items starting from row 1
print(a[1:])

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[[4 5 6]
 [7 8 9]]


In [38]:
# Slice out the elements from the second column
print(a); print()
print(a[...,1])
print()

# now get the items from the second row
print(a[1,...])
print()
# get the items from column 1 onwards
print(a[...,1:])

[[1 2 3]
 [4 5 6]
 [7 8 9]]

[2 5 8]

[4 5 6]

[[2 3]
 [5 6]
 [8 9]]
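For a two-dimensional array, the ellipsis `...` in these slices stands in for a full slice `:` over the remaining dimension. A short sketch comparing the two spellings on the same 3 x 3 array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# column at index 1, written with an ellipsis and with a colon
print(np.array_equal(a[..., 1], a[:, 1]))     # True

# row at index 1, written three ways
print(np.array_equal(a[1, ...], a[1, :]))     # True
print(np.array_equal(a[1, ...], a[1]))        # True
```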


### Matrix Operations

In [66]:
# Create an array initialized to zeros
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [39]:
# Initialize with ones
np.ones((5,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [40]:
np.ones((5,5)).dtype

dtype('float64')

In [41]:
# you can create empty arrays. Careful! These are uninitialized, so they contain whatever happened to be in memory.

np.empty((4, 4))

array([[6.23042070e-307, 4.67296746e-307, 1.69121096e-306,
        1.86922637e-306],
       [6.23060744e-307, 2.22522597e-306, 1.33511969e-306,
        1.37962320e-306],
       [9.34604358e-307, 9.79101082e-307, 1.78020576e-306,
        1.69119873e-306],
       [2.22522868e-306, 1.24611809e-306, 8.06632139e-308,
        1.60221208e-306]])

In [42]:
# create an identity matrix

I = np.identity(4)
print(I)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [43]:
# The arange(n) function is similar to the range(n) function, but it returns a numpy array
# It initializes the array with the range of integers specified
A = np.arange(5)

In [44]:
A

array([0, 1, 2, 3, 4])

In [47]:
# create some matrices 
x = np.array([[1, 2], [3, 4]]) 
y = np.array([[7, 8], [9, 10]]) 
  
print("x =\n", x)
print("y =\n", y)
print()

# using add() to add matrices element-wise
print ("The element-wise addition of the two matrices is : ") 
print (np.add(x,y)) 
  
# using subtract() to subtract matrices 
print ("The element-wise subtraction of the matrices is : ") 
print (np.subtract(x,y)) 
  
# using divide() to divide matrices 
print ("The element-wise division of matrix is : ") 
print (np.divide(x,y)) 

x =
 [[1 2]
 [3 4]]
y =
 [[ 7  8]
 [ 9 10]]

The element-wise addition of the two matrices is : 
[[ 8 10]
 [12 14]]
The element-wise subtraction of the matrices is : 
[[-6 -6]
 [-6 -6]]
The element-wise division of matrix is : 
[[0.14285714 0.25      ]
 [0.33333333 0.4       ]]


In [76]:
# element-wise multiplication of matrices
print(np.multiply(x, y))

[[ 7 16]
 [27 40]]


In [94]:
# another way to do it
print(x * y)

[[ 7 16]
 [27 40]]


In [50]:
# the matrix product of the matrices (for 2-D arrays, np.dot performs matrix multiplication)
print(x)
print(y)
print()
print(np.dot(x, y))

[[1 2]
 [3 4]]
[[ 7  8]
 [ 9 10]]

[[25 28]
 [57 64]]
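For two-dimensional arrays, the `@` operator gives the same matrix product as `np.dot`, and both differ from the element-wise `*`. A small sketch using the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[7, 8], [9, 10]])

print(x @ y)                                   # matrix product
print(np.array_equal(x @ y, np.dot(x, y)))     # True
print(np.array_equal(x @ y, x * y))            # False: * is element-wise
```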


In [51]:
# sum all the elements of the matrix
print(y)
print()
print("the sum of all the elements is", np.sum(y))

[[ 7  8]
 [ 9 10]]

the sum of all the elements is 34


In [52]:
# row-wise summation: axis=1 sums across the columns, giving one total per row
print("row-wise summation:", np.sum(y, axis=1))

row-wise summation: [15 19]
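The `axis` argument names the dimension that is collapsed by the sum. A small sketch with the same `y` matrix (the names `col_sums` and `row_sums` are illustrative):

```python
import numpy as np

y = np.array([[7, 8], [9, 10]])

col_sums = y.sum(axis=0)   # collapse the rows: one total per column
row_sums = y.sum(axis=1)   # collapse the columns: one total per row

print("per-column totals:", col_sums)   # [16 18]
print("per-row totals:", row_sums)      # [15 19]
```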


In [53]:
# Transpose the matrix
print(x)
print()
print(x.T)

[[1 2]
 [3 4]]

[[1 3]
 [2 4]]


In [55]:
# create an array
arr1 = np.array([[1,2,3],[8,9,10]])
arr1


array([[ 1,  2,  3],
       [ 8,  9, 10]])

In [58]:
# multiply arrays
arr1 * arr1

array([[  1,   4,   9],
       [ 64,  81, 100]])

In [62]:
# scalar operations on an array
arr1 * 4

array([[ 4,  8, 12],
       [32, 36, 40]])

In [63]:
arr1 **2

array([[  1,   4,   9],
       [ 64,  81, 100]], dtype=int32)

In [64]:
# take the element-wise reciprocal (this is not the matrix inverse)
1 / arr1

array([[1.        , 0.5       , 0.33333333],
       [0.125     , 0.11111111, 0.1       ]])

In [65]:
arr1 * (1/arr1)

array([[1., 1., 1.],
       [1., 1., 1.]])

In [66]:
# Create a (3,3) array
X = np.array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])

In [67]:
X

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [68]:
I2 = np.identity(3)
I2

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [69]:
# what do you get when you multiply a matrix by the identity matrix?
print(X)
print(X.dot(I2))
print(I2.dot(X))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


### Multiplicative inverse


In [70]:
# create an array and its inverse
A = np.array([[2., 3.], [3., 4.]])
# B = np.array([[-4, 3], [3, -2]])

# create its inverse
B = np.linalg.inv(A)


In [71]:
print(A)
print()
print(B)

[[2. 3.]
 [3. 4.]]

[[-4.  3.]
 [ 3. -2.]]


In [72]:
print(A.dot(B))
print(B.dot(A))

[[1. 0.]
 [0. 1.]]
[[1. 0.]
 [0. 1.]]
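In general, floating-point round-off means the product may not print as an exact identity, so a tolerance-based check such as `np.allclose` is a safer test. The sketch below also shows that `np.linalg.inv` raises `np.linalg.LinAlgError` for a matrix with no inverse; the `singular` matrix is an illustrative example whose second row is twice the first.

```python
import numpy as np

A = np.array([[2., 3.], [3., 4.]])
B = np.linalg.inv(A)

# compare against the identity within floating-point tolerance
print(np.allclose(A @ B, np.eye(2)))   # True

# a singular matrix (second row is a multiple of the first) has no inverse
singular = np.array([[1., 2.], [2., 4.]])
inverse_failed = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    inverse_failed = True
    print("Not invertible:", err)
```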


## Numpy Broadcasting

Broadcasting refers to how numpy handles arrays with different shapes during arithmetic operations.

In most cases, the smaller array is "broadcast" across the larger array so they have compatible shapes.

Broadcasting supports the concept of _vectorizing_ an operation so that loops are done very efficiently in C instead of python, without making needless copies of the data.

Broadcasting has the effect of "stretching" the smaller array so that it is compatible with a matrix operation with the other array
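As a rough illustration of vectorizing, the explicit Python loop and the broadcast expression below produce the same values; the vectorized form leaves the looping to numpy. The names `data`, `looped` and `vectorized` are only for this sketch.

```python
import numpy as np

data = np.arange(5)

# explicit Python-level loop over the elements
looped = np.array([value + 10 for value in data])

# vectorized: the scalar 10 is broadcast across the whole array
vectorized = data + 10

print(vectorized)                            # [10 11 12 13 14]
print(np.array_equal(looped, vectorized))    # True
```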



### Rules of broadcasting

1. If the two arrays differ in their number of dimensions, the shape of the one with the fewer dimensions is padded with ones on its left side.
2. If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape
3. If any dimension sizes disagree and neither is equal to 1, an error is raised.
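The three rules can be tried out on shapes alone. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; it computes the broadcast result shape without creating any arrays.

```python
import numpy as np

# Rule 1: (3,) is padded on the left to (1, 3); rule 2 then stretches it to (3, 3)
print(np.broadcast_shapes((3, 3), (3,)))   # (3, 3)

# Rule 2 applied to both operands: (3, 1) and (1, 3) both stretch to (3, 3)
print(np.broadcast_shapes((3, 1), (3,)))   # (3, 3)

# Rule 3: sizes 3 and 4 disagree and neither is 1, so an error is raised
rule3_raised = False
try:
    np.broadcast_shapes((3,), (4,))
except ValueError as err:
    rule3_raised = True
    print("Cannot broadcast:", err)
```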

Normally, array operations are done with arrays of compatible sizes

In [75]:
a = np.array([1,2,3])
b = np.array([4,4,4])
a + b

array([5, 6, 7])

When adding a scalar, broadcasting 'stretches' the scalar to have a compatible shape

In [77]:
a + 4

array([5, 6, 7])

This has the same effect as expanding the scalar to an array `[4,4,4]`

In [78]:
a + np.array([4,4,4])

array([5, 6, 7])

What happens when we add a one-dimensional array to a two-dimensional array?

In [79]:
M = np.ones((3,3))
print(M)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [81]:
print(a)
print()
M + a

[1 2 3]



array([[2., 3., 4.],
       [2., 3., 4.],
       [2., 3., 4.]])

Here, the one-dimensional array `a` was broadcast (stretched) along the first dimension, i.e. repeated for each row, to match the shape of `M`

What happens when we have two dimensions that differ?

In [82]:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]

print(a)
print(b)

[0 1 2]
[[0]
 [1]
 [2]]


In [83]:
a  + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

What happened? Here, the `[0, 1, 2]` was broadcast to `[[0,1,2],[0,1,2],[0,1,2]]`
and the `[[0],[1],[2]]` (single-column array) was extended to `[[0,0,0],[1,1,1],[2,2,2]]`

In [86]:
a1 = np.array([[0,1,2],[0,1,2],[0,1,2]])
a2 = np.array([[0,0,0],[1,1,1],[2,2,2]])
print("The effect of broadcasting")
print(a1)
print(a2)

The effect of broadcasting
[[0 1 2]
 [0 1 2]
 [0 1 2]]
[[0 0 0]
 [1 1 1]
 [2 2 2]]


In [88]:
print("Gives expected result")
print(a1 + a2)

Gives expected result
[[0 1 2]
 [1 2 3]
 [2 3 4]]
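Instead of typing the stretched arrays by hand, `np.broadcast_arrays` can be asked to show them. A sketch using the same `a` and `b` as above (the `_stretched` names are illustrative):

```python
import numpy as np

a = np.arange(3)                    # shape (3,)
b = np.arange(3)[:, np.newaxis]     # shape (3, 1)

# returns both inputs broadcast to their common shape (3, 3)
a_stretched, b_stretched = np.broadcast_arrays(a, b)

print(a_stretched)
print(b_stretched)
print(np.array_equal(a_stretched + b_stretched, a + b))   # True
```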


### Practice example - Center observations to have zero mean

1. Create an array of 10 random observations with three columns each. Call this array X `[10 x 3]`. Use `np.random.random((n, m))`; note that the shape is passed as a single tuple.
2. Take the mean of each column of X, giving an array of three values (shape `(3,)`).  _hint: use `<array>.mean(axis=0)`_
3. Subtract the mean to get a representation of the original array X that is 'centered' around zero
    
For extra credit, implement the Mean Normalization formula where x is the original value and x' is the normalized value.

$$x' = \frac{x - average(x)}{max(x) - min(x)}$$

In [102]:
# fill in the code here. Replace None with the proper code

X = None
# take the mean of the columns of X
X_mean = None
print("The mean of each column" , X_mean)
# subtract the mean to get the centered values
X_centered = None
# check your result by seeing if the mean of the new matrix is zero (to within floating-point precision)
if X_centered is not None:
    print("Check your work")
    print(X_centered.mean(axis=0))



The mean of each column None


In [103]:
# Answer: 
X = np.random.random((10, 3))
print(X)
print()
# take the mean of the columns of X
X_mean = X.mean(0)
print("The mean of each column" , X_mean)
# subtract the mean to get the centered values
X_centered = X - X_mean
print(X_centered)
print("Check your work:")
print(X_centered.mean(0))

[[0.10955671 0.23034082 0.12911881]
 [0.95548795 0.01460991 0.61146307]
 [0.64003555 0.70942536 0.70114952]
 [0.44551036 0.10059025 0.743953  ]
 [0.90540154 0.48825    0.64997915]
 [0.60747308 0.99377717 0.36802317]
 [0.80167975 0.62843945 0.55705164]
 [0.44060395 0.44999859 0.82859436]
 [0.42398147 0.80858677 0.80381626]
 [0.71381209 0.97684153 0.24298307]]

The mean of each column [0.60435424 0.54008599 0.5636132 ]
[[-0.49479754 -0.30974517 -0.4344944 ]
 [ 0.3511337  -0.52547608  0.04784987]
 [ 0.03568131  0.16933937  0.13753631]
 [-0.15884388 -0.43949573  0.1803398 ]
 [ 0.30104729 -0.05183598  0.08636594]
 [ 0.00311884  0.45369119 -0.19559003]
 [ 0.19732551  0.08835347 -0.00656156]
 [-0.1637503  -0.09008739  0.26498115]
 [-0.18037277  0.26850078  0.24020305]
 [ 0.10945784  0.43675554 -0.32063014]]
Check your work:
[-5.55111512e-17 -1.11022302e-17  1.11022302e-17]
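One possible sketch of the extra-credit Mean Normalization, assuming the formula is applied to each column separately (the name `X_norm` is illustrative). Broadcasting handles the column-wise subtraction and division:

```python
import numpy as np

X = np.random.random((10, 3))

# per-column statistics, each of shape (3,)
col_mean = X.mean(axis=0)
col_range = X.max(axis=0) - X.min(axis=0)

# x' = (x - average(x)) / (max(x) - min(x)), broadcast across the rows
X_norm = (X - col_mean) / col_range

print(X_norm.mean(axis=0))                        # close to zero
print(X_norm.max(axis=0) - X_norm.min(axis=0))    # close to one per column
```

After normalization each column has a mean of approximately zero and a max-minus-min range of approximately one.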
