# <span style="color:dodgerBlue">Video 1 Goals</span>

* Create Rank 1 and Rank 2 ndarrays
* Access elements in ndarrays using basic indexing
* Use built-in functions to quickly and easily create useful ndarrays

## Creating a Rank 1 numpy array

In [5]:
import numpy as np
an_array = np.array([3, 33, 333]) # Create a rank 1 array
print(type(an_array)) # The type of an ndarray is "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [2]:
# Check the shape of the array just created; it should have exactly one dimension
print(an_array.shape)

(3,)


In [3]:
# Because this is a rank 1 array, one index is enough to access each element
print(an_array[0], an_array[1], an_array[2])

3 33 333


In [4]:
# Since ndarrays are mutable, we can change an element's value
an_array[0] = 888
print(an_array)

[888  33 333]


## How to create a Rank 2 numpy array

In [10]:
another = np.array([[11, 12, 13], [21, 22, 23]]) # Create a rank 2 array
print(another)
print("The shape is 2 rows, 3 columns: ", another.shape) # print rows x columns
print("Accessing elements [0,0], [0, 1], and [1, 0] of the ndarray: ", another[0 ,0], ",", another[0, 1], ",", another[1, 0])

[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:  (2, 3)
Accessing elements [0,0], [0, 1], and [1, 0] of the ndarray:  11 , 12 , 21


### There are many ways to create numpy arrays

In [12]:
import numpy as np

# create a 2x2 array of zeros
ex1 = np.zeros((2, 2,))
print(ex1)

[[0. 0.]
 [0. 0.]]


In [13]:
# create a 2x2 array filled with 9.0
ex2 = np.full((2, 2,), 9.0)
print(ex2)

[[9. 9.]
 [9. 9.]]


In [14]:
# create a 2x2 matrix with the diagonal 1s and the others 0
ex3 = np.eye(2,2)
print(ex3)

[[1. 0.]
 [0. 1.]]


In [18]:
# create an array of ones
ex4 = np.ones((1, 2))
print(ex4)

# ex4 is actually rank 2: it is a 1x2 array
print(ex4.shape)

# we need two indices to access an element
print(ex4[0, 1])

[[1. 1.]]
(1, 2)
1.0


In [19]:
# create an array of random floats between 0 and 1
ex5 = np.random.random((2, 2))
print(ex5)

[[0.55043449 0.87648094]
 [0.61963682 0.07707372]]
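
A few more constructors are often handy alongside the ones above. The sketch below is an illustrative addition, not part of the original video; the variable names (`counts`, `grid`, `steps`, `blank`) are made up for the example.

```python
import numpy as np

# np.arange works like Python's range but returns an ndarray
counts = np.arange(6)                # values 0 through 5

# reshape presents the same six values as 2 rows x 3 columns
grid = counts.reshape(2, 3)

# np.linspace returns evenly spaced values, both endpoints included
steps = np.linspace(0.0, 1.0, 5)     # 0.0, 0.25, 0.5, 0.75, 1.0

# np.zeros_like copies the shape and dtype of an existing array
blank = np.zeros_like(grid)

print(grid)
print(steps)
print(blank.shape, blank.dtype == grid.dtype)
```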


# <span style="color:dodgerBlue">Video 2 Goals</span>

* Use slice indexing to access subsets of an ndarray
* Recognize that such indexing creates a second reference to the same underlying data

## Array Indexing

### Slice Indexing

In [24]:
import numpy as np

# Create a rank 2 array of shape (3, 4)
an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


Use array slicing to get a subarray consisting of the first 2 rows and columns 1 and 2 (the slice `1:3` excludes index 3)

In [26]:
a_slice = an_array[:2, 1:3]
print(a_slice)

[[12 13]
 [22 23]]


When you modify a slice, you actually modify the underlying array

In [27]:
print("Before:", an_array[0, 1]) # inspect the element at 0, 1
a_slice[0, 0] = 1000 # a_slice[0,0] is the same piece of data as an_array[0, 1]
print("After:", an_array[0, 1])

Before: 12
After: 1000
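
If the aliasing shown above is not wanted, one option is to take an explicit copy of the slice. The following is a minimal sketch; the names `view` and `independent` are chosen for illustration only.

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = an_array[:2, 1:3]                 # basic slice: shares memory with an_array
independent = an_array[:2, 1:3].copy()   # explicit copy: owns its own data

independent[0, 0] = -1                   # changes only the copy
print(an_array[0, 1])                    # still 12

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(an_array, view))          # True
print(np.shares_memory(an_array, independent))   # False
```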


### Using both integer indexing and slice indexing

In [30]:
# Create a rank 2 array of shape (3, 4)
an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [31]:
# Using both integer indexing and slicing generates an array of lower rank
row_rank1 = an_array[1, :] # rank 1 view
print(row_rank1, row_rank1.shape) # notice that there is only a single []

[21 22 23 24] (4,)


In [32]:
# Using only slicing generates an array of the same rank as the original array
row_rank2 = an_array[1:2, :] # Rank 2 view
print(row_rank2, row_rank2.shape) # notice that there is [[]]

[[21 22 23 24]] (1, 4)


In [33]:
# The same concept applies to columns of an array
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]
print(col_rank1, col_rank1.shape)
print()
print(col_rank2, col_rank2.shape)

[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)


### Array Indexing for changing elements

In [42]:
# Create a new array (4, 3)
an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])
print("Original Array:")
print(an_array)

Original Array:
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [43]:
# Create an array of indices
col_indices = np.array([0, 1, 2, 0])
print("\nColumn indices picked : ", col_indices)

row_indices = np.arange(4)
print("\nRow indices picked : ", row_indices)


Column indices picked :  [0 1 2 0]

Row indices picked :  [0 1 2 3]


In [44]:
# Use zip to examine the pairings of row_indices and col_indices
for row, col in zip(row_indices, col_indices):
    print(row, ", ", col)

0 ,  0
1 ,  1
2 ,  2
3 ,  0


In [46]:
# Select one element from each row
print("Values in the array at those indices: ",an_array[row_indices, col_indices])

Values in the array at those indices:  [11 22 33 41]


In [48]:
# Change an element from each row using the indices selected
an_array[row_indices, col_indices] += 100
print("\nChanged Array:")
print(an_array)


Changed Array:
[[121  12  13]
 [ 21 132  23]
 [ 31  32 143]
 [151  42  43]]
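
One point worth checking for yourself: unlike a slice, reading with integer index arrays produces a new array rather than a view, while assigning through the same indices still writes to the original. A small sketch, with `picked` as an illustrative name:

```python
import numpy as np

an_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])
row_indices = np.arange(4)
col_indices = np.array([0, 1, 2, 0])

# Reading with index arrays returns a copy of the selected elements
picked = an_array[row_indices, col_indices]
picked[0] = -1
print(an_array[0, 0])      # still 11: the original is untouched

# Assigning through the index arrays writes into the original
an_array[row_indices, col_indices] = 0
print(an_array[:, 0])      # first column now holds 0, 21, 31, 0
```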


# <span style="color:dodgerBlue">Video 3 Goals</span>

* Use boolean indexing to access and modify relevant data in ndarrays

In [50]:
# Create a 3x2 array
an_array = np.array([[11, 12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [53]:
# Create a filter which will assign boolean values for whether each element is > 15
filter = (an_array > 15)
filter

array([[False, False],
       [ True,  True],
       [ True,  True]])

The filter is an ndarray of the same shape as an_array. It holds True wherever the corresponding element of an_array is greater than 15 and False wherever that element is 15 or less.

In [54]:
# Select just those elements which meet the criteria
print(an_array[filter])

[21 22 31 32]


In [56]:
# We could do the previous two steps in one step
print(an_array[an_array > 15])

[21 22 31 32]


What is particularly useful is that we can also change elements in the array by applying a similar logical filter. Let's add 100 to all the even values.

In [57]:
an_array[an_array % 2 == 0] += 100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
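
Boolean filters can also be combined. The sketch below is an illustrative extension of the examples above; `mask` and `capped` are names made up here.

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# Use & (and), | (or), ~ (not) element-wise. The parentheses are required
# because & and | bind more tightly than the comparison operators.
mask = (an_array > 15) & (an_array % 2 == 0)
print(an_array[mask])      # even values greater than 15: 22 and 32
print(an_array[~mask])     # everything else: 11, 12, 21, 31

# np.where builds a new array instead of modifying an_array in place
capped = np.where(an_array > 30, 30, an_array)
print(capped)
```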


# <span style="color:dodgerBlue">Video 4 Goals</span>

* Examine and set the datatype of an ndarray
* Use common ndarray functions

## Datatypes and Array Operations

### Datatypes

In [58]:
ex1 = np.array([11, 12]) # NumPy infers the data type (the default integer width depends on the platform)
print(ex1.dtype)

int32


In [60]:
ex2 = np.array([11.0, 12.0]) # NumPy infers the data type automatically
print(ex2.dtype)

float64


In [65]:
ex3 = np.array([11, 21], dtype=np.int64) # We can also tell NumPy the data type
print(ex3.dtype)

int64


In [66]:
# We can use this to force floats into integers (the fractional part is truncated)
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)

int64

[11 12]
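
Truncation and flooring give the same result for positive values but differ for negative ones. A short sketch of the difference, using an illustrative array `values` that includes a negative number:

```python
import numpy as np

values = np.array([11.1, 12.7, -1.5])

truncated = values.astype(np.int64)            # drops the fraction (rounds toward zero)
floored = np.floor(values).astype(np.int64)    # rounds toward negative infinity
rounded = np.rint(values).astype(np.int64)     # rounds to nearest, ties to even

print(truncated)   # 11, 12, -1
print(floored)     # 11, 12, -2
print(rounded)     # 11, 13, -2
```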


In [67]:
# We can also force integers into floats, for example if the values may become fractional later
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)
print()
print(ex5)

float64

[11. 21.]


### Arithmetic Array Operations

In [68]:
x = np.array([[111,112],[121,122]], dtype=np.int64)
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)

print(x)
print()
print(y)

[[111 112]
 [121 122]]

[[211.1 212.1]
 [221.1 222.1]]


In [69]:
# Add arrays x and y
print(x + y)         # The plus sign works
print()
print(np.add(x, y))  # So does the numpy function "add"

[[322.1 324.1]
 [342.1 344.1]]

[[322.1 324.1]
 [342.1 344.1]]


In [70]:
# subtract
print(x - y)
print()
print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [71]:
# multiply
print(x * y)
print()
print(np.multiply(x, y))

[[23432.1 23755.2]
 [26753.1 27096.2]]

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [72]:
# divide
print(x / y)
print()
print(np.divide(x, y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]


In [73]:
# square root
print(np.sqrt(x))

[[10.53565375 10.58300524]
 [11.         11.04536102]]


In [74]:
# exponent (e ** x)
print(np.exp(x))

[[1.60948707e+48 4.37503945e+48]
 [3.54513118e+52 9.63666567e+52]]
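
Two details of the operations above are easy to miss: mixing an integer array with a float array produces a float result, and `*` multiplies element by element rather than computing a matrix product. The sketch below illustrates both with small made-up arrays `a`, `b`, and `c`; the `@` operator is shown only for contrast.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.int64)
b = np.array([[5, 6], [7, 8]], dtype=np.int64)
c = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=np.float64)

# int64 combined with float64 is promoted to float64
print((a + c).dtype)

# * is element-wise multiplication
print(a * b)

# @ (np.matmul) is the matrix product
print(a @ b)
```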


# <span style="color:dodgerBlue">Video 5 Goals</span>

* Use common ndarray functions for data analysis including statistical, sorting, and set operations

## Statistical Methods, Sorting, and Set Operations

### Basic Statistical Operations

In [76]:
# set up a random 2 x 5 matrix
arr = 10 * np.random.randn(2,5)
print(arr)

[[ 3.24124891 10.79221245 13.24476096 -2.60633465  1.50900523]
 [15.75695651 -3.2707305  -0.02088301  0.29872254  8.92361285]]


In [77]:
# compute the mean for all elements
print(arr.mean())

4.786857128183984


In [78]:
# compute the means by row
print(arr.mean(axis = 1))

[5.23617858 4.33753568]


In [79]:
# compute the means by column
print(arr.mean(axis = 0))

[ 9.49910271  3.76074097  6.61193897 -1.15380606  5.21630904]


In [80]:
# sum all the elements
print(arr.sum())

47.86857128183984


In [81]:
# compute the medians
print(np.median(arr, axis = 1))

[3.24124891 0.29872254]
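
Because the array above is random, its statistics change on every run. The sketch below repeats the same calls on a small fixed array so the `axis` behavior can be checked by hand, and then shows one way to make random data repeatable with a seeded generator. The array values and the seed are arbitrary choices for illustration.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 9.0]])

print(arr.mean())                # mean of all six elements: 4.0
print(arr.mean(axis=1))          # one mean per row: 2.0 and 6.0
print(arr.mean(axis=0))          # one mean per column: 2.5, 3.5, 6.0
print(np.median(arr, axis=1))    # one median per row: 2.0 and 5.0

# A seeded generator returns the same "random" numbers on every run
rng = np.random.default_rng(seed=0)
sample = 10 * rng.standard_normal((2, 5))
print(sample.shape)
```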


### Sorting

In [82]:
# create a 10 element array of randoms
unsorted = np.random.randn(10)

print(unsorted)

[ 1.91994769 -0.70920206 -1.05920345 -1.15961669  1.2622369  -0.21435793
  0.20314342  1.47261184 -0.10817332 -0.08749661]


In [83]:
# create a copy and sort it (named sorted_arr to avoid shadowing the built-in sorted)
sorted_arr = np.array(unsorted)
sorted_arr.sort()

print(sorted_arr)
print()
print(unsorted)

[-1.15961669 -1.05920345 -0.70920206 -0.21435793 -0.10817332 -0.08749661
  0.20314342  1.2622369   1.47261184  1.91994769]

[ 1.91994769 -0.70920206 -1.05920345 -1.15961669  1.2622369  -0.21435793
  0.20314342  1.47261184 -0.10817332 -0.08749661]


In [84]:
# inplace sorting
unsorted.sort() 

print(unsorted)

[-1.15961669 -1.05920345 -0.70920206 -0.21435793 -0.10817332 -0.08749661
  0.20314342  1.2622369   1.47261184  1.91994769]
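
As an alternative to copying and then sorting in place, `np.sort` returns a sorted copy directly and `np.argsort` returns the indices that would sort the array. A minimal sketch with a small fixed array (the values are arbitrary):

```python
import numpy as np

unsorted = np.array([3.0, -1.0, 2.0])

ordered = np.sort(unsorted)      # sorted copy; unsorted is left alone
order = np.argsort(unsorted)     # positions of the elements in sorted order

print(ordered)                   # -1.0, 2.0, 3.0
print(order)                     # 1, 2, 0
print(unsorted)                  # unchanged: 3.0, -1.0, 2.0
print(ordered[::-1])             # descending order via a reversed slice
```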


### Finding Unique Elements

In [85]:
array = np.array([1, 2, 1, 4, 2, 1, 4, 2])

print(np.unique(array))

[1 2 4]


### Set operations with np.array data type

In [86]:
s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb','chair'])
print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [87]:
print( np.intersect1d(s1, s2) ) 

['bulb' 'chair']


In [88]:
print( np.union1d(s1, s2) )

['bulb' 'chair' 'desk' 'lamp']


In [89]:
print( np.setdiff1d(s1, s2) ) # elements in s1 that are not in s2

['desk']


In [91]:
print( np.isin(s1, s2) ) # boolean mask: which elements of s1 are also in s2 (np.in1d is deprecated in NumPy 2.0)

[False  True  True]


# <span style="color:dodgerBlue">Video 5 Goals</span>

* Employ broadcasting to perform operations on different size ndarrays

## <a href="https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html">Broadcasting</a>

In [93]:
import numpy as np

start = np.zeros((4,3))
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [94]:
# create a rank 1 ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [95]:
y = start + add_rows  # add to each row of 'start' using broadcasting
print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [97]:
# create an ndarray which is 4 x 1 to broadcast across columns
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T # Transpose array

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [98]:
# add to each column of 'start' using broadcasting
y = start + add_cols 
print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [99]:
# this will just broadcast in both dimensions
add_scalar = np.array([1])  
print(start+add_scalar)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
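
The examples above work because the shapes are compatible: compared from the last dimension backwards, each pair of sizes must be equal or one of them must be 1. The sketch below checks this rule directly. It assumes NumPy 1.20 or later for `np.broadcast_shapes`; the flag `broadcast_failed` is an illustrative name.

```python
import numpy as np

start = np.zeros((4, 3))

# Resulting shape when broadcasting (4, 3) against (3,) and against (4, 1)
print(np.broadcast_shapes((4, 3), (3,)))
print(np.broadcast_shapes((4, 3), (4, 1)))

# A rank 1 array of length 4 does NOT broadcast against (4, 3):
# the trailing sizes 3 and 4 differ and neither is 1
broadcast_failed = False
try:
    start + np.array([0, 1, 2, 3])
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)

# Adding an axis turns shape (4,) into (4, 1), which is compatible
add_cols = np.array([0, 1, 2, 3])[:, np.newaxis]
print((start + add_cols).shape)
```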


# <span style="color:dodgerBlue">Video 6 Goals</span>

* Describe the speed benefits of ndarrays over lists

## Speedtest: ndarrays vs lists

First, set up the parameters for the speed test. We'll time how long it takes to sum the elements of an ndarray versus a list.

In [100]:
from numpy import arange
from timeit import Timer

size    = 1000000
timeits = 1000

In [101]:
# create the ndarray with values 0,1,2...,size-1
nd_array = arange(size)
print( type(nd_array) )

<class 'numpy.ndarray'>


In [103]:
# timer expects the operation as a parameter, 
# here we pass nd_array.sum()
timer_numpy = Timer("nd_array.sum()", "from __main__ import nd_array")

print("Time taken by numpy ndarray: %f seconds" % 
      (timer_numpy.timeit(timeits)/timeits))

Time taken by numpy ndarray: 0.000643 seconds


In [104]:
# create the list with values 0,1,2...,size-1
a_list = list(range(size))
print( type(a_list) )

<class 'list'>


In [105]:
# timer expects the operation as a parameter, here we pass sum(a_list)
timer_list = Timer("sum(a_list)", "from __main__ import a_list")

print("Time taken by list:  %f seconds" % 
      (timer_list.timeit(timeits)/timeits))

Time taken by list:  0.038216 seconds
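
The same comparison can be written as one self-contained script. This is a sketch rather than a benchmark: the `size` and `repeats` values are smaller than above so it finishes quickly, callables are passed to `timeit` instead of statement strings, and the absolute timings will vary by machine.

```python
import timeit
import numpy as np

size = 100_000
repeats = 20

# int64 is requested explicitly so the sum cannot overflow on platforms
# where the default integer is 32 bits wide
nd_array = np.arange(size, dtype=np.int64)
a_list = list(range(size))

# timeit accepts a callable; divide by repeats to get the time per call
numpy_time = timeit.timeit(lambda: nd_array.sum(), number=repeats) / repeats
list_time = timeit.timeit(lambda: sum(a_list), number=repeats) / repeats

print("Time taken by numpy ndarray: %f seconds" % numpy_time)
print("Time taken by list:          %f seconds" % list_time)

# Both approaches compute the same total
print(int(nd_array.sum()) == sum(a_list))
```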
