# 7. Array-Oriented Programming with NumPy

# 7.1 Introduction

### **NumPy** (**Numerical Python**) Library
* First appeared in 2006 and is the **preferred Python array implementation**.
* High-performance, richly functional **_n_-dimensional array** type called **`ndarray`**. 
* **Written in C** and **up to 100 times faster than lists**.
* Critical in big-data processing, AI applications and much more. 
* According to `libraries.io`, **over 450 Python libraries depend on NumPy**. 
* Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy. 

### Array-Oriented Programming
* **Functional-style programming** with **internal iteration** makes array-oriented manipulations concise and straightforward, and reduces the possibility of error.
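
To make the contrast concrete, here is a small sketch comparing external iteration over a list with internal iteration over an `ndarray`. The data and variable names are illustrative only and do not come from the text.

```python
import numpy as np

values = [1, 2, 3, 4, 5]  # illustrative data

# External iteration: we write the loop and manage the result list ourselves
doubled_list = []
for value in values:
    doubled_list.append(value * 2)

# Internal iteration: NumPy applies the operation to every element for us
doubled_array = np.array(values) * 2

print(doubled_list)   # [2, 4, 6, 8, 10]
print(doubled_array)  # [ 2  4  6  8 10]
```

The array version has no loop counter or accumulator to get wrong, which is where the reduced possibility of error comes from.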

# 7.2 Creating `array`s from Existing Data 
* Creating an array with the **`array`** function 
* Argument is an `array` or other iterable
* Returns a new `array` containing the argument’s elements

In [1]:
import numpy as np

In [2]:
numbers = np.array([2, 3, 5, 7, 11])

In [3]:
type(numbers)

numpy.ndarray

In [4]:
numbers

array([ 2,  3,  5,  7, 11])

In [5]:
numbers_list = [2, 3, 5, 7, 11]

In [6]:
type(numbers_list)

list

In [7]:
numbers_list

[2, 3, 5, 7, 11]

## Speed up

### Comparing list and np.array to sum values

In [8]:
import time

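# Generate a large list and an equivalent ndarray
n = 1000000
lst = list(range(n))
arr = np.array(lst, dtype=np.int64)  # 64-bit integers so the sum cannot overflow

# Compute the sum of the list using the built-in sum function
start_time = time.perf_counter()  # high-resolution timer
lst_sum = sum(lst)
lst_time = time.perf_counter() - start_time

# Compute the sum of the array
start_time = time.perf_counter()
arr_sum = np.sum(arr)
arr_time = time.perf_counter() - start_time

# Print the results
print(f"List sum: {lst_sum}")
print(f"List time: {lst_time:.6f} seconds")
print(f"Array sum: {arr_sum}")
print(f"Array time: {arr_time:.6f} seconds")
print(f"NumPy ndarray is {lst_time/arr_time:.2f} times faster than list.")

* With NumPy's platform-default integer type, which is 32 bits on some systems, the array sum can overflow and print `1783293664` rather than `499999500000`; `dtype=np.int64` prevents this.
* `time.time()` can be too coarse to measure the array sum at all, reporting `0.000000` seconds and making the final division raise `ZeroDivisionError`; `time.perf_counter()` has the resolution needed for short timings.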

### Using a for-loop is extremely slow

In [None]:
# Compute the sum of the list using a for loop
start_time = time.perf_counter()  # high-resolution timer
lst_sum = 0
for i in range(n):
    lst_sum += lst[i]
lst_time = time.perf_counter() - start_time

# Print the results
print(f"List sum: {lst_sum}")
print(f"List time: {lst_time:.6f} seconds")
print(f"NumPy ndarray is {lst_time/arr_time:.2f} times faster than looping over a list.")
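
A single timed run can be noisy, so one alternative is the standard library's `timeit` module, which repeats each call and reports the total elapsed time. The sketch below is only an assumption about how such a comparison might be set up: `loop_sum` is a hypothetical helper, and `n` is smaller than in the cells above so that it runs quickly.

```python
import timeit
import numpy as np

n = 100_000  # smaller than the example above so the sketch runs quickly
lst = list(range(n))
arr = np.array(lst, dtype=np.int64)  # 64-bit integers avoid overflow

def loop_sum(values):
    """Sum values with an explicit Python for loop (hypothetical helper)."""
    total = 0
    for value in values:
        total += value
    return total

# timeit calls each function `number` times and returns the total seconds
loop_time = timeit.timeit(lambda: loop_sum(lst), number=10)
builtin_time = timeit.timeit(lambda: sum(lst), number=10)
numpy_time = timeit.timeit(lambda: arr.sum(), number=10)

print(f"for loop:     {loop_time:.6f} seconds")
print(f"built-in sum: {builtin_time:.6f} seconds")
print(f"ndarray sum:  {numpy_time:.6f} seconds")
```

The actual ratios depend on the machine and the NumPy version, so no timings are shown here.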

### Multidimensional Arguments

In [None]:
np.array([[1, 2, 3], [4, 5, 6]])

# 7.3 `array` Attributes 


In [None]:
integers = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
integers

In [None]:
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

In [None]:
floats

* NumPy does not display trailing 0s to the right of the decimal point in floating-point values

### Determining an `array`’s Element Type
* An `array` object’s **attributes** enable you to discover information about its structure and contents

In [None]:
integers.dtype

In [None]:
floats.dtype

* For performance reasons, NumPy is written in the C programming language and uses C’s data types
* [Other NumPy types](https://docs.scipy.org/doc/numpy/user/basics.types.html)
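
As an illustration of how element types can be chosen explicitly, the following sketch passes the `dtype` keyword argument to `np.array` and converts with `astype`. The array names are illustrative; `itemsize` (bytes per element) is a standard `ndarray` attribute that the text above does not cover.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)       # 8-bit signed integers
precise = np.array([1, 2, 3], dtype=np.float64)  # 64-bit floating point

print(small.dtype, small.itemsize)      # int8 1
print(precise.dtype, precise.itemsize)  # float64 8

# astype returns a new array converted to the requested type
converted = small.astype(np.float32)
print(converted.dtype)                  # float32
```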

### Determining an `array`’s Dimensions
* **`ndim`** contains an `array`’s number of dimensions 
* **`shape`** contains a _tuple_ specifying an `array`’s dimensions
* **`size`** contains an `array`’s total number of elements

In [None]:
integers.ndim

In [None]:
floats.ndim

In [None]:
integers.shape

In [None]:
floats.shape 

In [None]:
integers.size

In [None]:
floats.size

In [None]:
# 2D array with a single row: shape is (1, 5), not (5,)
floats_1d_matrix = np.array([[0.0, 0.1, 0.2, 0.3, 0.4]])
floats_1d_matrix.shape
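
The relationship between these attributes can be checked directly. This is a minimal sketch with illustrative names: `ndim` is the length of `shape`, and `size` is the product of the values in `shape`.

```python
import math
import numpy as np

one_d = np.array([0.0, 0.1, 0.2, 0.3, 0.4])    # shape (5,)
two_d = np.array([[0.0, 0.1, 0.2, 0.3, 0.4]])  # shape (1, 5)

for a in (one_d, two_d):
    # ndim is the length of shape; size is the product of shape's values
    assert a.ndim == len(a.shape)
    assert a.size == math.prod(a.shape)
    print(a.ndim, a.shape, a.size)
```

Both arrays hold the same five values, but the extra pair of brackets gives the second one two dimensions.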

### Iterating through a Multidimensional `array`’s Elements


In [None]:
for row in integers:
    for column in row:
        print(column, end='  ')
    print() 

* Iterate through a multidimensional `array` as if it were one-dimensional by using **`flat`**

In [None]:
for i in integers.flat:
    print(i, end='  ')

In [None]:
list(integers.flat)

# 7.4 Filling `array`s with Specific Values
* Functions **`zeros`**, **`ones`** and **`full`** create `array`s containing  `0`s, `1`s or a specified value, respectively

In [None]:
np.zeros(5)

* For a tuple of integers, these functions return a multidimensional `array` with the specified dimensions

In [None]:
np.ones((2, 4), dtype=int)

In [None]:
np.full((3, 5), 13)

### Creating Integer Ranges with `arange`

In [None]:
np.arange(5)

In [None]:
np.arange(5, 10)

In [None]:
np.arange(10, 1, -2)

### Creating Floating-Point Ranges with `linspace` 
* Produce evenly spaced floating-point ranges with NumPy’s **`linspace`** function
* Ending value **is included** in the `array`

In [None]:
np.linspace(0.0, 1.0, num=6)
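
As a rough check of how the spacing is determined, the sketch below computes the expected step by hand. The variable names are illustrative. The `endpoint=False` keyword argument, which excludes the ending value, is part of `linspace` but is not discussed above.

```python
import numpy as np

start, stop, num = 0.0, 1.0, 6
values = np.linspace(start, stop, num=num)

# With the ending value included, neighbors differ by (stop - start) / (num - 1)
step = (stop - start) / (num - 1)
print(values)
print(np.allclose(np.diff(values), step))  # True

# endpoint=False leaves the ending value out of the result
open_ended = np.linspace(start, stop, num=5, endpoint=False)
print(open_ended)
```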

### Reshaping an `array` 
* `array` method **`reshape`** returns the same elements arranged in a different shape, possibly with a different number of dimensions
* New shape must have the **same** number of elements as the original

In [None]:
np.arange(1, 21).reshape(4, 5)
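
The element-count requirement can be seen by trying a shape that does not fit. The following is a small sketch with illustrative names; passing `-1` for one dimension, which asks NumPy to infer it, is a standard `reshape` feature not covered above.

```python
import numpy as np

values = np.arange(1, 21)      # 20 elements

grid = values.reshape(4, 5)    # 4 * 5 == 20, so this works
print(grid.shape)              # (4, 5)

# -1 asks NumPy to infer that dimension from the element count
inferred = values.reshape(2, -1)
print(inferred.shape)          # (2, 10)

try:
    values.reshape(3, 7)       # 3 * 7 == 21, which is not 20
except ValueError as error:
    print(f"reshape failed: {error}")
```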

### Displaying Large `array`s 
* When displaying an `array` with more than 1000 items (the default print threshold), NumPy drops the middle rows, columns or both from the output

In [None]:
np.arange(1, 100001).reshape(4, 25000)

In [None]:
np.arange(1, 100001).reshape(100, 1000)
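
The abbreviated display is controlled by NumPy's print options. As a sketch, assuming the default print options are in effect, the code below checks for the `...` marker and uses the `np.printoptions` context manager to lower the threshold temporarily; the array names are illustrative.

```python
import numpy as np

small = np.arange(100)       # well below the default threshold: shown in full
large = np.arange(100_000)   # well above the threshold: middle is dropped

print('...' in str(small))   # False
print('...' in str(large))   # True

# np.printoptions changes display settings only inside the with block
with np.printoptions(threshold=5):
    print(np.arange(10))     # [0 1 2 ... 7 8 9]
```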

In [None]:
test = np.array([[1, 2, 3], [4, 5, 6]])

In [None]:
test

In [None]:
test.reshape(3,2)

# 7.7 `array` Operators
* `array` operators perform operations on **entire `array`s**. 
* Can perform arithmetic **between `array`s and scalar numeric values**, and **between `array`s of the same shape**.

In [None]:
numbers = np.arange(1, 6)

In [None]:
numbers

In [None]:
numbers * 2

In [None]:
numbers ** 3

In [None]:
numbers  # numbers is unchanged by the arithmetic operators

In [None]:
numbers += 10

In [None]:
numbers

### Broadcasting 
* Normally, arithmetic operations require as operands two `array`s of the **same size and shape**. 
* When one operand is a scalar, NumPy performs the calculation as if the scalar were an `array` of the same shape as the other operand, so **`numbers * 2`** is equivalent to **`numbers * [2, 2, 2, 2, 2]`** for a 5-element array.
* Stretching the smaller operand in this way is called **broadcasting**. 
* Broadcasting also can be applied between `array`s of different but compatible sizes and shapes, enabling some concise and powerful manipulations.
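
One way to see what a scalar is treated as is `np.broadcast_to`, which builds the stretched operand explicitly. This is only an illustration with its own small array; NumPy does not need this call to multiply by a scalar.

```python
import numpy as np

numbers = np.arange(1, 6)                 # [1 2 3 4 5]

# broadcast_to shows the array that the scalar 2 is treated as
twos = np.broadcast_to(2, numbers.shape)
print(twos)                               # [2 2 2 2 2]

print(numbers * 2)                        # [ 2  4  6  8 10]
print(np.array_equal(numbers * 2, numbers * twos))  # True
```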

### Arithmetic Operations Between `array`s 
* Can perform arithmetic operations and augmented assignments between `array`s of the _same_ shape

In [None]:
numbers2 = np.linspace(1.1, 5.5, 5)

In [None]:
numbers2

In [None]:
numbers * numbers2

### Comparing `array`s
* Can compare `array`s with individual values and with other `array`s
* Comparisons performed **element-wise**
* Produce `array`s of Boolean values in which each element’s `True` or `False` value indicates the comparison result

In [None]:
numbers

In [None]:
numbers >= 13

In [None]:
numbers2

In [None]:
numbers2 < numbers

In [None]:
numbers == numbers2

In [None]:
numbers == numbers
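
Boolean result arrays are often summarized rather than inspected element by element. The sketch below, using its own illustrative array, applies the `any`, `all` and `sum` methods to a comparison result.

```python
import numpy as np

numbers = np.arange(11, 16)   # [11 12 13 14 15]
mask = numbers >= 13          # [False False  True  True  True]

print(mask.any())   # True: at least one element is >= 13
print(mask.all())   # False: not every element is >= 13
print(mask.sum())   # 3: each True counts as 1 when summed
```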

# 7.8 NumPy Calculation Methods
* By default, these methods **ignore the `array`’s shape** and **use all the elements in the calculations**. 
* Consider an `array` representing four students’ grades on three exams:

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

In [None]:
grades

* Can use methods to calculate **`sum`**, **`min`**, **`max`**, **`mean`**, **`std`** (standard deviation) and **`var`** (variance)
* Each is a functional-style programming **reduction**

In [None]:
grades.sum()

In [None]:
grades.min()

In [None]:
grades.max()

In [None]:
grades.mean()

In [None]:
grades.std()

In [None]:
grades.var()

### Calculations by Row or Column

* You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
* Each 2D+ array has [**one axis per dimension**](https://docs.scipy.org/doc/numpy-1.16.0/glossary.html)
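* In a 2D array, **`axis=0`** runs down the rows, so the calculation combines the values in each column and produces one result per column (**column-by-column**)
* In a 2D array, **`axis=1`** runs across the columns, so the calculation combines the values in each row and produces one result per row (**row-by-row**)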

In [None]:
grades.mean(axis=0)

*  In a 2D array, **`axis=1`** indicates calculations should be **row-by-row**

In [None]:
grades.mean(axis=1)
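
Using `sum` instead of `mean` keeps the arithmetic easy to verify by hand. In this sketch the names `by_exam` and `by_student` are illustrative; the data is the same `grades` array as above.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])  # shape (4, 3)

by_exam = grades.sum(axis=0)     # collapses the 4 rows: one total per column
by_student = grades.sum(axis=1)  # collapses the 3 columns: one total per row

print(by_exam)        # [381 341 332]
print(by_student)     # [253 277 261 263]
print(grades.sum())   # 1054
```

The axis you name is the one that disappears from the result's shape: `(4, 3)` becomes `(3,)` for `axis=0` and `(4,)` for `axis=1`.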

* [Other NumPy `array` Calculation Methods](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html)

# 7.9 Universal Functions
* Standalone [**universal functions** (**ufuncs**)](https://docs.scipy.org/doc/numpy/reference/ufuncs.html) perform **element-wise operations** using one or two `array` or array-like arguments (like lists)
* Each returns a **new `array`** containing the results
* Some ufuncs are called when you use `array` operators like `+` and `*`

* Create an `array` and calculate the square root of its values, using the **`sqrt` universal function**

In [None]:
numbers = np.array([1, 4, 9, 16, 25, 36])

In [None]:
np.sqrt(numbers)

* Add two `array`s with the same shape, using the **`add` universal function**
* Equivalent to:
```python
numbers + numbers2
```

In [None]:
numbers2 = np.arange(1, 7) * 10

In [None]:
numbers2

In [None]:
np.add(numbers, numbers2)

In [None]:
numbers + numbers2

### Broadcasting with Universal Functions
* Universal functions can use broadcasting, just like NumPy `array` operators

In [None]:
numbers2

In [None]:
np.multiply(numbers2, 5)

In [None]:
numbers2 * 5

In [None]:
numbers3 = numbers2.reshape(2, 3)

In [None]:
numbers3

In [None]:
numbers4 = np.array([2, 4, 6])
numbers4

In [None]:
np.multiply(numbers3, numbers4)

In [None]:
numbers3 * numbers4

* [Broadcasting rules documentation](https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
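
Not every pair of shapes can be broadcast. The following sketch uses illustrative names (`table`, `per_column`, `per_row`) to show one combination that works, one that raises `ValueError`, and how reshaping to a column makes the second case compatible.

```python
import numpy as np

table = np.arange(1, 7).reshape(2, 3) * 10   # [[10 20 30], [40 50 60]]
per_column = np.array([2, 4, 6])             # shape (3,)
per_row = np.array([1, 100])                 # shape (2,)

# (2, 3) with (3,): each row is multiplied by [2 4 6]
print(np.multiply(table, per_column))

try:
    np.multiply(table, per_row)              # (2, 3) with (2,) does not align
except ValueError as error:
    print(f"not broadcastable: {error}")

# A column of shape (2, 1) is stretched across the 3 columns instead
print(np.multiply(table, per_row.reshape(2, 1)))
```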

### Other Universal Functions

| Category | NumPy universal functions |
| ---------- | ---------- |
| **_Math_** | `add`, `subtract`, `multiply`, `divide`, `remainder`, `exp`, `log`, `sqrt`, `power`, and more. |
| **_Trigonometry_** | `sin`, `cos`, `tan`, `hypot`, `arcsin`, `arccos`, `arctan`, and more. |
| **_Bit manipulation_** | `bitwise_and`, `bitwise_or`, `bitwise_xor`, `invert`, `left_shift` and `right_shift`. |
| **_Comparison_** | `greater`, `greater_equal`, `less`, `less_equal`, `equal`, `not_equal`, `logical_and`, `logical_or`, `logical_xor`, `logical_not`, `minimum`, `maximum`, and more. |
| **_Floating point_** | `floor`, `ceil`, `isinf`, `isnan`, `fabs`, `trunc`, and more. |
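
A few of the functions in the table can be tried on small inputs. The sample values here are illustrative and were chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([3.0, 5.0, 8.0])
b = np.array([4.0, 12.0, 15.0])

print(np.hypot(a, b))                         # [ 5. 13. 17.]
print(np.maximum(a, b))                       # [ 4. 12. 15.]
print(np.floor(np.array([1.7, -1.7])))        # [ 1. -2.]
print(np.bitwise_and(np.array([12, 10]), 6))  # [4 2]
```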

# 7.10 Indexing and Slicing 
* One-dimensional `array`s can be **indexed** and **sliced** like lists. 
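
For example, the familiar list notation for indices, negative indices and slices carries over directly. The `primes` array below is illustrative.

```python
import numpy as np

primes = np.array([2, 3, 5, 7, 11])

print(primes[0])     # 2
print(primes[-1])    # 11
print(primes[1:4])   # [3 5 7]
print(primes[::2])   # [ 2  5 11]
```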

### Indexing with Two-Dimensional `array`s
* To select an element in a two-dimensional `array`, specify a tuple containing the element’s row and column indices in square brackets

In [None]:
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

In [None]:
grades

In [None]:
grades[0, 1]  # row 0, column 1

### Selecting a Subset of a Two-Dimensional `array`’s Rows
* To select a single row, specify only one index in square brackets

In [None]:
grades[1]

* Select multiple sequential rows with slice notation

In [None]:
grades[0:2]

* Select multiple non-sequential rows with a list of row indices

In [None]:
grades[[1, 3]]

### Selecting a Subset of a Two-Dimensional `array`’s Columns
* The **column index** also can be a specific **index**, a **slice** or a **list** 

In [None]:
grades

In [None]:
grades[:, 0]

In [None]:
grades[:, 1:3]

In [None]:
grades[:, [0, 2]]
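
Row and column selections can be combined in a single pair of square brackets. As a short sketch with the same `grades` data (the names `subset` and `exam_two_mean` are illustrative):

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# Rows 1 and 2 (a slice) combined with columns 0 and 2 (a list)
subset = grades[1:3, [0, 2]]
print(subset)

# A selected column is itself an array, so calculation methods apply to it
exam_two_mean = grades[:, 1].mean()
print(exam_two_mean)   # 85.25
```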