# Lecture 8

# Section 2: Arrays with NumPy

Hint: All the examples and explanations from this part of today's lecture can be found in chapter 7 of the book.

### **NumPy** (**Numerical Python**) Library
* First appeared in 2006 and is the **preferred Python array implementation**.
* High-performance, richly functional **_n_-dimensional array** type called **`ndarray`**. 
* **Written in C** and **up to 100 times faster than lists**.
* Critical in big-data processing, AI applications and many more.  
* Nearly every popular data science and machine learning library, such as _Pandas_, _Scikit-learn_ (machine learning), _Matplotlib_ and _Seaborn_ (data visualization), _SciPy_ (Scientific Python) and _Keras_ (deep learning), is built on or **depends on NumPy**.

# 2.1 Creating `array`s from Existing Data 
* Creating an array with the **`array`** function 
* Argument is an `array` or other iterable
* Returns a new `array` containing the argument’s elements

In [2]:
import numpy as np  # conventional alias used throughout this section
numbers = np.array([2, 3, 5, 7, 11])

In [3]:
type(numbers)

numpy.ndarray

In [4]:
numbers

array([ 2,  3,  5,  7, 11])
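
The bullets above say that the argument may be an `array` or another iterable and that the result is a new `array`. The following is a minimal sketch of both points; the variable names are chosen here for illustration only.

```python
import numpy as np

# A list, a tuple and a range are all acceptable arguments.
from_list = np.array([2, 3, 5, 7, 11])
from_tuple = np.array((2, 3, 5, 7, 11))
from_range = np.array(range(5))

# Passing an existing array returns a new array (a copy by default),
# so changing the copy leaves the original untouched.
copy_of_list = np.array(from_list)
copy_of_list[0] = 99

print(from_tuple)
print(from_range)
print(from_list[0], copy_of_list[0])
```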

### Multidimensional Arguments

In [5]:
np.array([[1, 2, 3], [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

# 2.2 `array` Attributes 
* An `array` object's **attributes** enable you to discover information about its structure and contents

An array of integers:

In [6]:
integers = np.array([[1, 2, 3], [4, 5, 6]])

In [7]:
integers

array([[1, 2, 3],
       [4, 5, 6]])

An array of floats:

In [8]:
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

In [9]:
floats

array([0. , 0.1, 0.2, 0.3, 0.4])

NumPy does not display _trailing_ 0s to the right of the decimal point, so `0.0` is shown as `0.`

### Determining an `array`’s Element Type

In [10]:
integers.dtype  # a C data type!

dtype('int64')

In [11]:
floats.dtype

dtype('float64')

* For performance reasons, NumPy is written in the C programming language and uses C’s data types
* [Other NumPy types](https://docs.scipy.org/doc/numpy/user/basics.types.html)
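
Because element types map to C data types, you can also choose one explicitly. The sketch below assumes you want to control the element type yourself; it uses the `dtype` keyword argument, the `astype` method and the `itemsize` attribute, none of which appear in the cells above. The default integer type can depend on the platform and NumPy version, so only the explicitly requested types are shown here.

```python
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]])

# Request a specific element type when creating the array.
small_ints = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int8)

# Convert an existing array to another element type; astype returns a new array.
as_floats = integers.astype(np.float64)

# itemsize is the number of bytes used by each element.
print(small_ints.dtype, small_ints.itemsize)
print(as_floats.dtype, as_floats.itemsize)
```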

### Determining an `array`’s Dimensions
* **`ndim`** contains an `array`’s number of dimensions 
* **`shape`** contains a _tuple_ specifying an `array`’s dimensions

In [12]:
integers.ndim

2

In [13]:
floats.ndim

1

In [14]:
integers.shape

(2, 3)

In [15]:
floats.shape

(5,)
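
As a small sketch of how these attributes relate, `ndim` always equals the length of the `shape` tuple. The example also uses the `size` attribute (the total number of elements), which is not shown in the cells above.

```python
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]])
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

for name, a in (("integers", integers), ("floats", floats)):
    # The number of dimensions is the number of entries in the shape tuple.
    assert a.ndim == len(a.shape)
    print(name, a.ndim, a.shape, a.size)

# A 2D shape unpacks into a row count and a column count.
rows, columns = integers.shape
print(rows, columns)
```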

### Iterating through a Multidimensional `array`’s Elements


In [16]:
for row in integers:
    for column in row:
        print(column, end='  ')
    print() 

1  2  3  
4  5  6  


* Iterate through a multidimensional `array` as if it were one-dimensional by using **`flat`**

In [17]:
for i in integers.flat:
    print(i, end='  ')

1  2  3  4  5  6  
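
The two loops above visit the elements in the same order. A short sketch that checks this, and that also indexes `flat` directly (an extra not used in the cells above):

```python
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]])

# Nested iteration: row by row, then element by element.
nested = [int(value) for row in integers for value in row]

# flat visits the same elements in the same order with a single loop.
flattened = [int(value) for value in integers.flat]

print(nested == flattened)

# flat can also be indexed as if the array were one-dimensional.
third = integers.flat[2]
print(third)
```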

# 2.3 Creating `array`s from Ranges 
* NumPy provides optimized functions for creating `array`s from ranges

### Creating Integer Ranges with `arange`

In [18]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [19]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [20]:
np.arange(10, 1, -2)

array([10,  8,  6,  4,  2])
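
`arange` takes the same start, stop and step arguments as the built-in `range`. A minimal sketch comparing the two for the three calls above:

```python
import numpy as np

# Each tuple holds the arguments passed to both arange and range.
for args in [(5,), (5, 10), (10, 1, -2)]:
    from_arange = np.arange(*args)
    from_range = list(range(*args))
    # As with range, the stop value itself is never included.
    print(args, from_arange, from_arange.tolist() == from_range)
```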

### Creating Floating-Point Ranges with `linspace` 
* Produce evenly spaced floating-point ranges with NumPy’s **`linspace`** function
* Ending value **is included** in the `array`

In [21]:
np.linspace(1.0, 6.0, num=21)

array([1.  , 1.25, 1.5 , 1.75, 2.  , 2.25, 2.5 , 2.75, 3.  , 3.25, 3.5 ,
       3.75, 4.  , 4.25, 4.5 , 4.75, 5.  , 5.25, 5.5 , 5.75, 6.  ])
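
Because the ending value is included, `num` values span `num - 1` intervals, so the spacing is `(stop - start) / (num - 1)`. The sketch below checks this using `linspace`'s `retstep` and `endpoint` keyword arguments, which are not used in the cell above.

```python
import numpy as np

start, stop, num = 1.0, 6.0, 21

# retstep=True also returns the spacing between consecutive values.
values, step = np.linspace(start, stop, num=num, retstep=True)
expected_step = (stop - start) / (num - 1)
print(step, expected_step)

# endpoint=False excludes the ending value, as arange does.
without_end = np.linspace(start, stop, num=20, endpoint=False)
print(without_end[-1])
```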

# 2.4 List vs. `array` Performance: Introducing `%timeit` 
* Most `array` operations execute **significantly** faster than corresponding list operations
* IPython **`%timeit` magic** command times the **average** duration of operations

### Timing the Creation of a List Containing Results of 6,000,000 Die Rolls 

In [22]:
import random

In [25]:
%timeit rolls_list = [random.randrange(1, 7) for i in range(0, 6_000_000)]

16.8 s ± 391 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


* By default, `%timeit` executes a statement in a loop, and it runs the loop _seven_ times
* If you do not indicate the number of loops, `%timeit` chooses an appropriate value
* After executing the statement, `%timeit` displays the statement’s _average_ execution time, as well as the standard deviation of all the executions
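
`%timeit` works only in IPython and Jupyter. Outside them, a rough equivalent can be sketched with the standard library's `timeit` module. The number of rolls is deliberately reduced here so the sketch finishes quickly, and the helper `roll_list` is a name introduced only for this illustration; the summary statistics are computed by hand and may not match `%timeit`'s exact formula.

```python
import random
import statistics
import timeit

ROLLS = 60_000  # far fewer than 6,000,000, to keep the sketch fast


def roll_list():
    """Build a list of die rolls, as in the timed statement above."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]


# repeat=7 mirrors the seven runs; number=1 executes the statement once per run.
durations = timeit.repeat(roll_list, repeat=7, number=1)

mean = statistics.mean(durations)
spread = statistics.pstdev(durations)
print(f"{mean:.4f} s ± {spread:.4f} s per loop (7 runs, 1 loop each)")
```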

### Timing the Creation of an `array` Containing Results of 6,000,000 Die Rolls  

In [23]:
import numpy as np

In [24]:
%timeit rolls_array = np.random.randint(1, 7, 6_000_000)

326 ms ± 34.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### 60,000,000 and 600,000,000 Die Rolls  

In [None]:
%timeit rolls_array = np.random.randint(1, 7, 60_000_000) 

In [None]:
%timeit rolls_array = np.random.randint(1, 7, 600_000_000)
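
Before running the larger cells, it can help to estimate how much memory the resulting `array`s need. The sketch below is an estimate only: it multiplies the number of rolls by the element size in bytes, and it assumes you might pass `randint`'s `dtype` argument to request one-byte integers, which the cells above do not do.

```python
import numpy as np

# Small samples, used only to look up the element size in bytes.
sample = np.random.randint(1, 7, 6_000)                  # default integer type
compact = np.random.randint(1, 7, 6_000, dtype=np.int8)  # one byte per roll

for rolls in (6_000_000, 60_000_000, 600_000_000):
    default_mb = rolls * sample.itemsize / 1_000_000
    compact_mb = rolls * compact.itemsize / 1_000_000
    print(f"{rolls:>11,} rolls: {default_mb:>7,.0f} MB default, "
          f"{compact_mb:>5,.0f} MB as int8")
```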

# 2.5 NumPy Calculation Methods
* By default, an `array`'s calculation methods **ignore the `array`’s shape** and **use all the elements in the calculations**.
* Consider an `array` representing four students’ grades on three exams:

In [26]:
import numpy as np

In [27]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

In [28]:
grades

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

* Can use methods to calculate **`sum`**, **`min`**, **`max`**, **`mean`**, **`std`** (standard deviation) and **`var`** (variance)
* Each is a functional-style programming **reduction**

In [29]:
grades.sum()

np.int64(1054)

In [30]:
grades.min()

np.int64(70)

In [31]:
grades.max()

np.int64(100)

In [32]:
grades.mean()

np.float64(87.83333333333333)

In [33]:
grades.std()

np.float64(8.792357792739987)

In [34]:
grades.var()

np.float64(77.30555555555556)
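
To make the reductions concrete, the following sketch recomputes them from the flattened elements in plain Python. It assumes the default behavior of `std` and `var`, which divide by the number of elements (population statistics); the `ddof=1` call at the end is an addition showing the sample version.

```python
import math

import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# All twelve grades, ignoring the 4-by-3 shape.
values = [int(g) for g in grades.flat]

mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)

print(sum(values) == grades.sum())
print(math.isclose(mean, grades.mean()))
print(math.isclose(variance, grades.var()))
print(math.isclose(math.sqrt(variance), grades.std()))

# ddof=1 divides by n - 1 instead of n (sample standard deviation).
sample_std = grades.std(ddof=1)
print(sample_std)
```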

### Calculations by Row or Column

* You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
* Each 2D+ array has [**one axis per dimension**](https://docs.scipy.org/doc/numpy-1.16.0/glossary.html)
* In a 2D array, **`axis=0`** indicates calculations should be **column-by-column**

In [35]:
grades.mean(axis=0)

array([95.25, 85.25, 83.  ])

*  In a 2D array, **`axis=1`** indicates calculations should be **row-by-row**

In [36]:
grades.mean(axis=1)

array([84.33333333, 92.33333333, 87.        , 87.66666667])
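
One way to remember the `axis` argument is that the named axis is the one that disappears from the result. The sketch below checks the result shapes and recomputes both averages with explicit loops; it uses the `T` (transpose) attribute to iterate over columns, which is not covered in the cells above.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

per_exam = grades.mean(axis=0)     # collapses the 4 rows: one value per column
per_student = grades.mean(axis=1)  # collapses the 3 columns: one value per row

print(grades.shape, per_exam.shape, per_student.shape)

# The same results with explicit loops over columns and rows.
manual_per_exam = [column.mean() for column in grades.T]
manual_per_student = [row.mean() for row in grades]

print(np.allclose(per_exam, manual_per_exam))
print(np.allclose(per_student, manual_per_student))
```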

* [Other NumPy `array` Calculation Methods](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)

------
&copy;1992&ndash;2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 7 of the book [**Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud**](https://amzn.to/2VvdnxE).

DISCLAIMER: The authors and publisher of this book have used their 
best efforts in preparing the book. These efforts include the 
development, research, and testing of the theories and programs 
to determine their effectiveness. The authors and publisher make 
no warranty of any kind, expressed or implied, with regard to these 
programs or to the documentation contained in these books. The authors 
and publisher shall not be liable in any event for incidental or 
consequential damages in connection with, or arising out of, the 
furnishing, performance, or use of these programs.                  