# Lecture 8

# Section 3: Arrays with NumPy

Hint: All the examples and explanations from this part of today's lecture can be found in chapter 7 of the book.

### **NumPy** (**Numerical Python**) Library
* First appeared in 2006 and is the **preferred Python array implementation**.
* High-performance, richly functional **_n_-dimensional array** type called **`ndarray`**. 
* **Written in C** and **up to 100 times faster than lists**.
* Critical in big-data processing, AI applications and much more. 
* According to `libraries.io`, **over 450 Python libraries depend on NumPy**. 
* Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy. 

# 3.1 Creating `array`s from Existing Data 
* Creating an array with the **`array`** function 
* Argument is an `array` or other iterable
* Returns a new `array` containing the argument’s elements

In [2]:
import numpy as np

In [3]:
numbers = np.array([2, 3, 5, 7, 11])

In [4]:
type(numbers)

numpy.ndarray

In [5]:
numbers.ndim

1
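As a small sketch of the bullets above, `np.array` also accepts iterables other than lists, and passing it an existing `array` returns an independent copy by default. The variable names are illustrative only.

```python
import numpy as np

from_tuple = np.array((2, 3, 5))        # a tuple works as well as a list
from_range = np.array(range(1, 4))      # so does a range object

original = np.array([2, 3, 5, 7, 11])
copied = np.array(original)             # copies the data by default
copied[0] = 99                          # modifying the copy...

print(from_tuple, from_range)
print(original[0], copied[0])           # ...leaves the original unchanged
```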

### Multidimensional Arguments

In [6]:
np.array([[1, 2, 3], [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

In [7]:
np.array([[1, 2, 3], [4, 5, 6]]).ndim

2

In [8]:
shape = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).shape
print("This is a {}-by-{} matrix".format(*shape))

This is a 3-by-3 matrix


# 3.2 `array` Attributes 
* An `array` object’s **attributes** enable you to discover information about its structure and contents

In [9]:
import numpy as np

In [10]:
integers = np.array([[1, 2, 3], [4, 5, 6]])

In [11]:
integers

array([[1, 2, 3],
       [4, 5, 6]])

In [12]:
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

In [13]:
floats

array([0. , 0.1, 0.2, 0.3, 0.4])

* NumPy does not display trailing 0s to the right of the decimal point (for example, `0.0` is shown as `0.`)

### Determining an `array`’s Element Type

In [14]:
integers.dtype

dtype('int64')

In [15]:
floats.dtype

dtype('float64')

* For performance reasons, NumPy is written in the C programming language and uses C’s data types
* [Other NumPy types](https://docs.scipy.org/doc/numpy/user/basics.types.html)
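The element type can also be chosen explicitly rather than inferred. The following is a minimal sketch, assuming you want 32-bit integers; the default integer type NumPy picks (the `int64` shown above) may differ on other platforms.

```python
import numpy as np

small_ints = np.array([1, 2, 3], dtype=np.int32)   # request 32-bit integers
as_floats = small_ints.astype(np.float64)          # converted copy of the data

print(small_ints.dtype, small_ints.itemsize)
print(as_floats.dtype, as_floats.itemsize)
```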

### Determining an `array`’s Dimensions
* **`ndim`** contains an `array`’s number of dimensions 
* **`shape`** contains a _tuple_ specifying an `array`’s dimensions

In [16]:
integers.ndim

2

In [17]:
floats.ndim

1

In [18]:
integers.shape

(2, 3)

In [19]:
floats.shape

(5,)

### Determining an `array`’s Number of Elements and Element Size
* View an `array`’s total number of elements with **`size`**
* View the number of bytes required to store each element with **`itemsize`**

In [20]:
integers.size

6

In [21]:
integers.itemsize

8

In [22]:
floats.size

5

In [23]:
floats.itemsize

8
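These attributes fit together: multiplying `size` by `itemsize` gives the memory used by the element data, which NumPy also exposes as `nbytes`. A short sketch, with `dtype=np.int64` given explicitly so the numbers do not depend on the platform default:

```python
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

total_bytes = integers.size * integers.itemsize   # 6 elements * 8 bytes each
print(total_bytes)
print(integers.nbytes)                            # NumPy reports the same total

# The shape tuple has one entry per dimension.
print(len(integers.shape) == integers.ndim)
```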

### Iterating through a Multidimensional `array`’s Elements


In [24]:
for row in integers:
    for column in row:
        print(column, end='  ')
    print() 

1  2  3  
4  5  6  


* Iterate through a multidimensional `array` as if it were one-dimensional by using **`flat`**

In [25]:
for i in integers.flat:
    print(i, end='  ')

1  2  3  4  5  6  
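Besides iteration, `flat` supports one-dimensional indexing, and the related `flatten` method returns a new one-dimensional `array`. A brief sketch (variable names are illustrative):

```python
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]])

fifth = integers.flat[4]            # one-dimensional index into the 2D array
as_list = list(integers.flat)       # all elements in row-by-row order
flattened = integers.flatten()      # a new 1D array (a copy of the data)

print(fifth, as_list, flattened.shape)
```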

# 3.3 Filling `array`s with Specific Values
* Functions **`zeros`**, **`ones`** and **`full`** create `array`s containing  `0`s, `1`s or a specified value, respectively

In [26]:
import numpy as np

In [27]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

* For a tuple of integers, these functions return a multidimensional `array` with the specified dimensions

In [28]:
np.ones((2, 4), dtype=int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [29]:
np.full((3, 5), 13)

array([[13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13]])
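The outputs above hint at how the element type is chosen: `zeros` and `ones` produce floating-point values unless told otherwise, while `full` infers the type from the fill value. A small sketch to confirm this:

```python
import numpy as np

zeros = np.zeros((2, 3))            # float64 by default
ones = np.ones(4, dtype=int)        # integer 1s on request
halves = np.full((2, 2), 0.5)       # dtype inferred from the fill value

print(zeros.dtype, ones.dtype, halves.dtype)
print(zeros.shape, ones.shape, halves.shape)
```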

# 3.4 Creating `array`s from Ranges 
* NumPy provides optimized functions for creating `array`s from ranges

### Creating Integer Ranges with `arange`

In [30]:
import numpy as np

In [31]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [32]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [33]:
np.arange(10, 1, -2)

array([10,  8,  6,  4,  2])

### Creating Floating-Point Ranges with `linspace` 
* Produce evenly spaced floating-point ranges with NumPy’s **`linspace`** function
* Ending value **is included** in the `array`

In [34]:
np.linspace(0.0, 1.0, num=5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
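To contrast the two range functions, here is a short sketch: `linspace` includes the ending value unless `endpoint=False` is passed, whereas `arange` always excludes it.

```python
import numpy as np

inclusive = np.linspace(0.0, 1.0, num=5)                  # ends at 1.0
exclusive = np.linspace(0.0, 1.0, num=5, endpoint=False)  # stops before 1.0
stepped = np.arange(0.0, 1.0, 0.25)                       # arange excludes the end

print(inclusive)
print(exclusive)
print(stepped)
```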

### Reshaping an `array` 
* `array` method **`reshape`** returns the array’s elements arranged in a new shape, for example turning a one-dimensional array into a two-dimensional one
* New shape must have the **same** number of elements as the original

In [35]:
np.arange(1, 21).reshape(4, 5)

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
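The same-number-of-elements rule can be seen in a quick sketch: passing `-1` for one dimension lets NumPy infer it, and an incompatible shape raises a `ValueError`.

```python
import numpy as np

values = np.arange(1, 21)           # 20 elements

grid = values.reshape(4, -1)        # -1 asks NumPy to infer the 5 columns
print(grid.shape)

try:
    values.reshape(3, 7)            # 21 slots cannot hold 20 elements
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)
```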

### Displaying Large `array`s 
* When displaying an `array` with more than 1000 items (NumPy’s default print threshold), NumPy drops the middle rows, columns or both from the output and shows `...` in their place

In [36]:
np.arange(1, 100001).reshape(4, 25000)

array([[     1,      2,      3, ...,  24998,  24999,  25000],
       [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
       [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
       [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])

In [37]:
np.arange(1, 100001).reshape(100, 1000)

array([[     1,      2,      3, ...,    998,    999,   1000],
       [  1001,   1002,   1003, ...,   1998,   1999,   2000],
       [  2001,   2002,   2003, ...,   2998,   2999,   3000],
       ...,
       [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
       [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
       [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])
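If the full contents are needed, the summarizing behavior can be adjusted. The sketch below uses the `np.printoptions` context manager to raise the threshold temporarily; using `sys.maxsize` as the new threshold is just one possible choice.

```python
import sys
import numpy as np

data = np.arange(1, 2001)           # large enough to be summarized by default

summarized = repr(data)             # middle elements replaced by '...'

# Temporarily raise the threshold so every element is shown.
with np.printoptions(threshold=sys.maxsize):
    full = repr(data)

print('...' in summarized, '...' in full)
```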

# 3.5 List vs. `array` Performance: Introducing `%timeit` 
* Most `array` operations execute **significantly** faster than corresponding list operations
* IPython **`%timeit` magic** command times the **average** duration of operations

### Timing the Creation of a List Containing Results of 6,000,000 Die Rolls 

In [38]:
import random

In [39]:
%timeit rolls_list = \
   [random.randrange(1, 7) for i in range(0, 6_000_000)]

3.85 s ± 190 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


* By default, `%timeit` executes a statement in a loop, and it runs the loop _seven_ times
* If you do not indicate the number of loops, `%timeit` chooses an appropriate value
* After executing the statement, `%timeit` displays the statement’s _average_ execution time, as well as the standard deviation of all the executions
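Outside IPython, a comparable measurement can be sketched with the standard library's `timeit` module. The `roll_dice` helper and the smaller roll count are assumptions made so the example finishes quickly; this is not how `%timeit` itself is implemented.

```python
import random
import timeit

# Hypothetical helper with a smaller workload than the 6,000,000 rolls above.
def roll_dice(count=60_000):
    return [random.randrange(1, 7) for _ in range(count)]

# repeat=7 runs, each calling the function number=1 time,
# mirroring "7 runs, 1 loop each" in the %timeit output.
durations = timeit.repeat(roll_dice, number=1, repeat=7)

mean = sum(durations) / len(durations)
print(f"{mean * 1000:.1f} ms per loop (mean of {len(durations)} runs)")
```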

### Timing the Creation of an `array` Containing Results of 6,000,000 Die Rolls  

In [40]:
import numpy as np

In [41]:
%timeit rolls_array = np.random.randint(1, 7, 6_000_000)

67.8 ms ± 1.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
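Note that the upper bound `7` is excluded, so only the faces 1 through 6 are produced. The sketch below checks this using NumPy's newer `Generator` interface (`np.random.default_rng`) as an alternative to `np.random.randint`; the seed and the smaller roll count are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # seed chosen arbitrarily
rolls = rng.integers(1, 7, size=600_000)    # upper bound 7 is exclusive

print(rolls.min(), rolls.max())
print(np.bincount(rolls)[1:])               # how often each face 1-6 occurred
```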


### 60,000,000 and 600,000,000 Die Rolls  

In [1]:
import numpy as np  # re-import: `np` was not defined in this session

%timeit rolls_array = np.random.randint(1, 7, 60_000_000)

In [53]:
%timeit rolls_array = np.random.randint(1, 7, 600_000_000)

8.28 s ± 95.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# 3.6 `array` Operators
* `array` operators perform operations on **entire `array`s**. 
* Can perform arithmetic **between `array`s and scalar numeric values**, and **between `array`s of the same shape**.

In [54]:
import numpy as np

In [None]:
numbers = np.arange(1, 6)

In [None]:
numbers

In [None]:
numbers * 2

In [None]:
numbers ** 3

In [None]:
numbers  # numbers is unchanged by the arithmetic operators

In [None]:
numbers += 10

In [None]:
numbers
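For reference, the cells above can be condensed into one runnable sketch that shows which operations create new arrays and which modify `numbers` in place:

```python
import numpy as np

numbers = np.arange(1, 6)

doubled = numbers * 2       # new array; numbers itself is untouched
cubed = numbers ** 3        # also a new array
print(doubled, cubed, numbers)

numbers += 10               # augmented assignment modifies numbers in place
print(numbers)
```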

### Broadcasting 
* Normally, arithmetic operations require as operands two `array`s of the **same size and shape**.
* When one operand is a scalar, NumPy treats it as if it were an `array` of the other operand’s shape with the scalar in every element, so **`numbers * 2`** is equivalent to **`numbers * [2, 2, 2, 2, 2]`** for a 5-element array.
* This is called **broadcasting**.
* Broadcasting can also be applied between `array`s of different but compatible sizes and shapes, enabling some concise and powerful manipulations.
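A minimal sketch of broadcasting between arrays whose shapes differ but are compatible: a one-dimensional row is applied to every row of a 2D array, and a single-column array is applied to every column. The array names are illustrative.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])           # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
column = np.array([[100], [200]])       # shape (2, 1)

plus_row = table + row          # row is added to each of the 2 rows
plus_column = table + column    # column is added to each of the 3 columns

print(plus_row)
print(plus_column)
```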

### Arithmetic Operations Between `array`s 
* Can perform arithmetic operations and augmented assignments between `array`s of the _same_ shape

In [None]:
numbers2 = np.linspace(1.1, 5.5, 5)

In [None]:
numbers2

In [None]:
numbers * numbers2
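The multiplication is performed element by element, and mixing integer and floating-point arrays yields a floating-point result. A self-contained sketch, using a fresh `numbers` holding 1 through 5 rather than the values left by the earlier `+= 10`:

```python
import numpy as np

numbers = np.arange(1, 6)               # 1, 2, 3, 4, 5
numbers2 = np.linspace(1.1, 5.5, 5)     # 1.1, 2.2, 3.3, 4.4, 5.5

product = numbers * numbers2            # element-by-element multiplication
print(product, product.dtype)           # int * float gives a float array
```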

# 3.7 NumPy Calculation Methods
* By default, the calculation methods below **ignore the `array`’s shape** and **use all the elements in the calculations**.
* Consider an `array` representing four students’ grades on three exams:

In [None]:
import numpy as np

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

In [None]:
grades

* Can use methods to calculate **`sum`**, **`min`**, **`max`**, **`mean`**, **`std`** (standard deviation) and **`var`** (variance)
* Each is a functional-style programming **reduction**

In [None]:
grades.sum()

In [None]:
grades.min()

In [None]:
grades.max()

In [None]:
grades.mean()

In [None]:
grades.std()

In [None]:
grades.var()
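As a sanity check on these reductions, the sketch below recomputes the mean from `sum` and `size` and confirms that the variance is the square of the standard deviation:

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

total = grades.sum()                    # all 12 grades, regardless of shape
print(total, grades.min(), grades.max())

# mean is the sum divided by the number of elements
print(np.isclose(grades.mean(), total / grades.size))

# variance is the square of the standard deviation
print(np.isclose(grades.std() ** 2, grades.var()))
```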

### Calculations by Row or Column

* You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
* Each 2D+ array has [**one axis per dimension**](https://docs.scipy.org/doc/numpy-1.16.0/glossary.html)
* In a 2D array, **`axis=0`** indicates calculations should be **column-by-column**, that is, over all the row values within each column

In [None]:
grades.mean(axis=0)

* In a 2D array, **`axis=1`** indicates calculations should be **row-by-row**, that is, over all the column values within each row

In [None]:
grades.mean(axis=1)
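For the grades example, the two axes can be read as "per exam" and "per student". A short sketch with illustrative variable names:

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

exam_means = grades.mean(axis=0)        # one mean per column (exam)
student_means = grades.mean(axis=1)     # one mean per row (student)

print(exam_means.shape, student_means.shape)
print(exam_means)
print(student_means)
```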

* [Other NumPy `array` Calculation Methods](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 7 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  