## Module 4: Python


# More on Data
## NUMPY
<br>

Asel Kushkeyeva<br>
Data Science Institute, University of Toronto<br>
2022

### Jupyter Notebook as a Slideshow

To see this notebook as a live slideshow, we need to install RISE (Reveal.js - Jupyter/IPython Slideshow Extension):

1. Insert a cell and execute the following code: `conda install -c conda-forge rise`
2. Restart the Jupyter Notebook.
3. At the top of your notebook you will see a new icon that looks like a bar chart; hover over the icon to see 'Enter/Exit RISE Slideshow'.
4. Click on the RISE icon and enjoy the slideshow.
5. You can edit the notebook in slideshow mode by double-clicking the line.

*The installation is done only once. From now on, all your notebooks will have the RISE extension (unless you re-install the Jupyter Notebook).*

# Agenda

1. Array Creation
2. ndarray
3. Basic Operations
4. Indexing, Slicing, and Iterating

# NumPy Install and Array Creation

If you haven't installed NumPy, please run `conda install numpy`.

In [1]:
import numpy as np

NumPy's main object is a __multidimensional array__ whose elements all have the __same type__. The dimensions of an array are called __*axes*__.
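To make the "same type" and "axes" ideas concrete, here is a minimal sketch. The names `mixed` and `grid` are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# Mixing integers and a float: NumPy stores every element with one common type,
# so the integers are converted to floats.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Two rows of three values give an array with two axes.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim)    # 2
```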

Let us look at array creation methods.

In [11]:
# method one: type the array out, paying attention to where the brackets go
a = np.array([(1, 2, 3),
            (3, 2, 1)])

In [12]:
a

array([[1, 2, 3],
       [3, 2, 1]])

In [17]:
# method two: array of zeros; the tuple in the parentheses sets the array's shape (the length of each axis)
np.zeros((2, 5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [15]:
# method three: the first two arguments set the range (the end value is excluded), the third sets the step
np.arange(1, 10, 2)

array([1, 3, 5, 7, 9])

In [18]:
# method four: the first two arguments set the start and end of the interval (both included),
# the third sets the number of evenly spaced elements
np.linspace(1, 3, 10)

array([1.        , 1.22222222, 1.44444444, 1.66666667, 1.88888889,
       2.11111111, 2.33333333, 2.55555556, 2.77777778, 3.        ])

In [36]:
# method five: create a range, then set the shape with reshape()
np.arange(20).reshape(2, 10)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])

In [95]:
# method six: random floats from 0 (inclusive) to 1 (exclusive)
np.random.seed(1) # setting the seed makes the example reproducible
np.random.random((3, 4))

array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, 3.02332573e-01],
       [1.46755891e-01, 9.23385948e-02, 1.86260211e-01, 3.45560727e-01],
       [3.96767474e-01, 5.38816734e-01, 4.19194514e-01, 6.85219500e-01]])

In [96]:
# method seven: random integers
np.random.randint(1, 10, (3, 7)) # random integers from 1 (inclusive) to 10 (exclusive), shaped 3 by 7

array([[3, 5, 8, 8, 2, 8, 1],
       [7, 8, 7, 2, 1, 2, 9],
       [9, 4, 9, 8, 4, 7, 6]])

In [105]:
# method eight: by tiling (repeating) a whole array
onedim_arr = np.array([1, 2, 3, 4, 5])

In [112]:
multidim_arr = np.tile(onedim_arr, (5,1))
multidim_arr

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5]])

In [117]:
another_arr = np.tile(onedim_arr, (3, 3))
another_arr

array([[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]])

In [114]:
# method nine: by repeating elements
np.repeat(onedim_arr, 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5])

You can find many more ways to create arrays on the [NumPy Quickstart](https://numpy.org/devdocs/user/quickstart.html) webpage.
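As a small taste of those other routines, the sketch below uses three standard NumPy functions that are not covered above: `np.ones`, `np.full`, and `np.eye`. The variable names are chosen only for this illustration.

```python
import numpy as np

ones = np.ones((2, 3))       # 2 by 3 array filled with 1.0
sevens = np.full((2, 2), 7)  # 2 by 2 array filled with the value 7
identity = np.eye(3)         # 3 by 3 identity matrix (ones on the diagonal)

print(ones)
print(sevens)
print(identity)
```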

# ndarray Attributes

NumPy's array class is called __ndarray__. Let us look at some of its attributes.

In [28]:
b = np.array([(1, 4, 5, 7, 2), 
              (4, 3, 2, 8, 0), 
              (0, 5, 3, 8, 1), 
              (0, 9, 5, 8, 3), 
              (5, 3, 1, 6, 8), 
              (5, 1, 7, 9, 0)])

In [22]:
b.ndim

2

In [29]:
b.shape

(6, 5)

In [30]:
b.size

30

In [31]:
b.dtype

dtype('int64')

In [32]:
# size in bytes of each element
b.itemsize

8
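These attributes are related to each other. The sketch below, which uses a hypothetical array `demo` with an explicitly chosen `int64` dtype, shows that `size` is the product of the entries of `shape`, and that the total memory used by the elements (`nbytes`) is `size` times `itemsize`.

```python
import numpy as np

# A 6 by 5 array of 64-bit integers (dtype set explicitly for this illustration)
demo = np.arange(30, dtype=np.int64).reshape(6, 5)

print(demo.shape)     # (6, 5)
print(demo.size)      # 30, that is 6 * 5
print(demo.itemsize)  # 8 bytes per element
print(demo.nbytes)    # 240, that is 30 * 8
```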

# Basic Operations

In [37]:
arr1 = np.array([5, 10, 15, 20])
arr1

array([ 5, 10, 15, 20])

In [41]:
arr2 = np.arange(5, 9)
arr2

array([5, 6, 7, 8])

In [43]:
arr3 = arr1 - arr2
arr3

array([ 0,  4,  8, 12])

In [45]:
arr1

array([ 5, 10, 15, 20])

In [46]:
arr1**2

array([ 25, 100, 225, 400])

In [47]:
arr2

array([5, 6, 7, 8])

In [50]:
arr2 < 8

array([ True,  True,  True, False])

In [51]:
arr1

array([ 5, 10, 15, 20])

In [52]:
arr2

array([5, 6, 7, 8])

In [53]:
# the product of two arrays is computed element by element
arr1 * arr2

array([ 25,  60, 105, 160])

In [54]:
arr1

array([ 5, 10, 15, 20])

In [55]:
arr1.sum()

50

In [56]:
arr1.min()

5

In [57]:
arr1.mean()

12.5

In [58]:
b = np.array([(1, 4, 5, 7, 2), 
              (4, 3, 2, 8, 0), 
              (0, 5, 3, 8, 1), 
              (0, 9, 5, 8, 3), 
              (5, 3, 1, 6, 8), 
              (5, 1, 7, 9, 0)])

In [60]:
# sum of each row
b.sum(axis = 1)

array([19, 17, 17, 25, 23, 22])

In [61]:
# sum of each column
b.sum(axis = 0)

array([15, 25, 23, 46, 14])
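One way to remember what `axis` does: the axis you pass is the one that disappears from the result. The sketch below reuses the same `b` and checks the shapes; the names `row_sums` and `col_sums` are introduced here only for illustration.

```python
import numpy as np

b = np.array([(1, 4, 5, 7, 2),
              (4, 3, 2, 8, 0),
              (0, 5, 3, 8, 1),
              (0, 9, 5, 8, 3),
              (5, 3, 1, 6, 8),
              (5, 1, 7, 9, 0)])

row_sums = b.sum(axis=1)  # axis 1 (columns) is summed away: one value per row
col_sums = b.sum(axis=0)  # axis 0 (rows) is summed away: one value per column

print(row_sums.shape)  # (6,)
print(col_sums.shape)  # (5,)
print(b.sum())         # 123, the sum of all elements
```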

# Indexing, Slicing, and Iterating

Indexing, slicing, and iterating over __one-dimensional__ arrays work much as they do for a Python list.

In [62]:
arr1

array([ 5, 10, 15, 20])

In [63]:
arr1[1]

10

In [64]:
arr1[:3]

array([ 5, 10, 15])

In [65]:
arr1

array([ 5, 10, 15, 20])

In [66]:
for i in arr1:
    print(i**2)

25
100
225
400


In [69]:
# reversed array:
arr1[::-1]

array([20, 15, 10,  5])
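One difference from lists is worth keeping in mind: slicing a list makes a new list, while slicing an array gives a view on the same data. The sketch below uses made-up names (`values`, `py_list`, `view`, `independent`) to show the effect and how `.copy()` avoids it.

```python
import numpy as np

py_list = [5, 10, 15, 20]
values = np.array([5, 10, 15, 20])

# Slicing a list creates a new list, so the original is untouched.
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)  # [5, 10, 15, 20]

# Slicing an array creates a view, so changing it changes the original.
view = values[:2]
view[0] = 99
print(values)   # [99 10 15 20]

# Use .copy() when you need an independent array.
independent = values[:2].copy()
independent[0] = 0
print(values)   # [99 10 15 20]
```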

In [102]:
arr_text = np.array(['pencil', 'marker', 'pen', 'sharpener'])

In [100]:
arr_text.dtype

dtype('<U9')

The dtype `<U9` means that the `arr_text` array holds Unicode strings of at most 9 characters.

In [101]:
arr_text[2] = 'paper made from recycled paper'
arr_text
# notice how the new element was truncated to fit the maximum length of 9:

array(['pencil', 'marker', 'paper mad', 'sharpener'], dtype='<U9')
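If you need to store longer strings, one option is to convert the array to a wider string dtype first. The sketch below assumes a width of 40 characters is enough; `wide` is a name introduced only for this illustration.

```python
import numpy as np

arr_text = np.array(['pencil', 'marker', 'pen', 'sharpener'])

# Convert to a dtype that can hold up to 40 characters per string
wide = arr_text.astype('U40')
wide[2] = 'paper made from recycled paper'

print(wide[2])  # paper made from recycled paper
```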

__Multidimensional__ arrays have one index per axis.

In [74]:
m = np.arange(60).reshape(6, 10)
m

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])

In [76]:
m[4, 9]

49

In [78]:
m

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])

In [77]:
m[0:3, 7]

array([ 7, 17, 27])
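A few more patterns follow from the "one index per axis" rule. This sketch rebuilds the same `m` and selects a whole row, a whole column, and a rectangular block; the names `third_row`, `first_column`, and `block` are made up for the example.

```python
import numpy as np

m = np.arange(60).reshape(6, 10)

third_row = m[2]        # a single index selects along the first axis (a row)
first_column = m[:, 0]  # ':' keeps every row, 0 picks the first column
block = m[1:3, 2:5]     # rows 1 and 2, columns 2 to 4

print(third_row)
print(first_column)
print(block)
```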

In [118]:
rows = np.arange(1, 11)
columns = rows[:,np.newaxis]
new_array = rows * columns
new_array

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100]])
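The table above relies on broadcasting: `np.newaxis` turns the one-dimensional array into a column, and NumPy stretches both operands to a common shape before multiplying. A smaller sketch with hypothetical names (`small_rows`, `small_columns`, `small_table`) makes the shapes easier to follow.

```python
import numpy as np

small_rows = np.arange(1, 4)                # shape (3,)
small_columns = small_rows[:, np.newaxis]   # shape (3, 1): same values as a column
small_table = small_rows * small_columns    # broadcast to shape (3, 3)

print(small_rows.shape, small_columns.shape, small_table.shape)
print(small_table)
```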

## PRACTICE IN YOUR NOTEBOOK

a. Create a multiplication table for the number 6.


b. Create the following arrays by slicing the `new_array`:

- [9, 18, 27]
- [30, 60, 90]

# References

- Basic NumPy. https://numpy.org/devdocs/user/quickstart.html
- NumPy Routines. https://numpy.org/doc/stable/reference/routines.html