# NumPy Guide

- NumPy provides the numerical backend for nearly every scientific or technical library for Python. It supplies data structures and high-performance functions that the Python standard library does not provide. Knowing how to use it well is therefore essential for numerical work, since correct use can greatly affect the performance of your computations.

- NumPy provides the following additional features:
  - `ndarray`: A multidimensional array that is much faster and more efficient than the containers provided by the Python standard library. The core of NumPy is implemented in C and provides efficient functions for manipulating and processing arrays.

  - `Element-wise computation`: A set of functions for performing this type of calculation with arrays and mathematical operations between arrays.
  
  - `Integration with other languages such as C, C++, and FORTRAN`: A set of tools to integrate code developed with these programming languages.

- At first glance, NumPy arrays bear some resemblance to Python’s list data structure. An important difference is that Python lists are generic containers of objects, whereas NumPy arrays are homogeneous, typed arrays of fixed size:
  - Homogeneous means that all elements in the array have the same data type.
  - Fixed size means that an array cannot be resized (without creating a new array).
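
As a small illustration of these two properties, the sketch below contrasts a mixed-type Python list with a NumPy array. The variable names are chosen only for this example.

```python
import numpy as np

# A Python list can hold objects of different types.
mixed = [1, 2.5, "three"]

# A NumPy array has a single dtype; mixing ints and floats upcasts to float.
arr = np.array([1, 2.5, 3])
print(arr.dtype)                 # float64

# "Resizing" really creates a new array; the original is left untouched.
longer = np.append(arr, 4.0)
print(arr.shape, longer.shape)   # (3,) (4,)
```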

## Importing the modules

In order to use the NumPy library, we need to import it in our program. By convention,
the NumPy module is imported under the alias `np`, like so:

In [None]:
import numpy as np

After this, we can access functions and classes in the numpy module using the np
namespace. Throughout this notebook, we assume that the NumPy module is imported in
this way.

## The NumPy Array Object

- The core of the NumPy Library is one main object: `ndarray` (which stands for N-dimensional array)
- This object is a multi-dimensional homogeneous array with a predetermined number of items
- In addition to the data stored in the array, this data structure also contains important metadata about the array, such as its shape, size, data type, and other attributes. 


**Basic Attributes of the ndarray Class**

| Attribute | Description                                                                                              |
|-----------|----------------------------------------------------------------------------------------------------------|
| shape     | A tuple that contains the number of elements (i.e., the length)  for each dimension (axis) of the array. |
| size      | The total number of elements in the array.                                                               |
| ndim      | Number of dimensions (axes).                                                                             |
| nbytes    | Number of bytes used to store the data.                                                                  |
| dtype     | The data type of the elements in the array.                                                              |
| itemsize  | The size in bytes of each item in the array.                                                             |
| data      | A buffer containing the actual elements of the array.                                                    |

In [None]:
data = np.array([[10, 2], [5, 8], [1, 1]])
data

In [None]:
type(data)

In [None]:
data.ndim

In [None]:
data.shape

In [None]:
data.size

In [None]:
data.dtype

In [None]:
data.nbytes

In [None]:
data.itemsize

In [None]:
data.data

Here the ndarray instance data is created from a nested Python list using the
function `np.array`. More ways to create ndarray instances from data and from rules of
various kinds are introduced later in this tutorial. 

### Data types

- `dtype` attribute of the `ndarray` describes the data type of each element in the array.
- Since NumPy arrays are homogeneous, all elements have the same data type. 

**Basic Numerical Data Types Available in NumPy**

| dtype   | Variants                            | Description                           |
|---------|-------------------------------------|---------------------------------------|
| int     | int8, int16, int32, int64           | Integers                              |
| uint    | uint8, uint16, uint32, uint64       | Unsigned (non-negative) integers      |
| bool    | bool_                               | Boolean (True or False)               |
| float   | float16, float32, float64, float128 | Floating-point numbers                |
| complex | complex64, complex128, complex256   | Complex-valued floating-point numbers |

Once a NumPy array is created, its `dtype` cannot be changed, other than by creating a new copy with type-cast array values.

In [None]:
data = np.array([5, 9, 87], dtype=np.float32)
data

In [None]:
data = np.array(data, dtype=np.int32) # use np.array function for type-casting
data

In [None]:
data = np.array([5, 9, 87], dtype=np.float32)
data

In [None]:
data = data.astype(np.int32) # Use astype method of the ndarray class for type-casting
data

**Data Type Promotion**

When working with NumPy arrays, the data type might get promoted from one type to another, if required by the operation. 
For instance, when adding a float-valued array and an integer-valued array, the resulting array is float-valued:

In [None]:
arr1 = np.array([0, 2, 3], dtype=float)
arr1

In [None]:
arr2 = np.array([10, 20, 30], dtype=int)
arr2

In [None]:
res = arr1 + arr2
res

In [None]:
res.dtype
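
To see which data type a mixed-type operation would produce without building the arrays first, one option is `np.result_type`. The following is a minimal sketch; the arrays `a` and `b` are made up for the illustration.

```python
import numpy as np

# np.result_type reports the dtype that combining these types would produce.
print(np.result_type(np.int32, np.float32))      # float64
print(np.result_type(np.float64, np.complex64))  # complex128

# The same promotion happens when the arrays are actually added.
a = np.array([1, 2, 3], dtype=np.int32)
b = np.array([0.5, 0.5, 0.5], dtype=np.float32)
promoted = (a + b).dtype
print(promoted)                                  # float64
```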

<div class="alert alert-info">

**Note:** 
    
In some cases, depending on the application and its requirements, it is essential to create arrays with the appropriate data type. For example, `np.sqrt` returns a real-valued result (with `nan` for negative inputs) unless the input array has a complex data type:
</div>


    


In [None]:
np.sqrt(np.array([0, -1, 2]))

In [None]:
np.sqrt(np.array([0, -1, 2], dtype=complex))

Here, using the `np.sqrt` function to compute the square root of each element in
an array gives different results depending on the data type of the array. Only when the data type of the array is complex does the square root of `-1` result in the imaginary unit (denoted as `1j` in Python); for a real-valued array the result is `nan`.

### Memory layout of multi-dimensional arrays

Multidimensional arrays are stored as contiguous data in memory. There's freedom of choice in how to arrange the array elements in this memory segment. Consider the case of a two-dimensional array, containing rows and columns:

- One possible way to store this array as a consecutive sequence of values is to store the rows after each other, and another equally valid approach is to store the columns one after another. 

- The former is called **row-major** format and the latter is **column-major** format. 


 Memory Layout                          | Format
:---------------------------------------:|:--------------:
![](../assets/images/row-major.jpg)    | row-major
![](../assets/images/column-major.jpg)    | column-major

- Whether to use row-major or column-major is a matter of convention. Row-major format is used, for example, in the C programming language, whereas Fortran uses the column-major format.

- A NumPy array can be specified to be stored in row-major format, using the keyword argument `order='C'`, and the column-major format, using the keyword argument `order='F'`, when the array is created or reshaped. 

- The default format is row-major. 

- The `'C'` or `'F'` ordering of NumPy array is particularly relevant when NumPy arrays are used in interfaces with software written in C and Fortran, which is often required when working with numerical computing with Python. 

- Row-major and column-major ordering are special cases of strategies for mapping
the index used to address an element, to the offset for the element in the array’s memory segment. 

- In general, the NumPy array attribute `ndarray.strides` defines exactly how this mapping is done. 

- The strides attribute is a tuple of the same length as the number of axes (dimensions) of the array. Each value in strides is the factor by which the index for the corresponding axis is multiplied when calculating the memory offset (in bytes) for a given index expression.


Let's see how this looks:

In [None]:
arrc = np.array([[1, 2, 3], [11, 12, 13], [21, 22, 23]], dtype='uint8', order='C')
arrf = np.array([[1, 2, 3], [11, 12, 13], [21, 22, 23]], dtype='uint8', order='F')

In [None]:
arrc

In [None]:
arrc.itemsize # Each item uses 1 byte because the data type is uint8

The strides attribute of this array is therefore `(1x3, 1x1) = (3, 1)`, because each increment of `m` in `A[n, m]` increases the memory offset by one item, or 1 byte. Likewise, each increment of `n` increases the memory offset by three items, or 3 bytes (because the second dimension of the array has length 3).

In [None]:
arrc.strides

In [None]:
'  '.join(str(x) for x in np.nditer(arrc))

In `"C"` order, elements of rows are contiguous, as expected. Let's try Fortran layout now:

In [None]:
arrf

In [None]:
arrf.strides

In [None]:
'  '.join(str(x) for x in np.nditer(arrf))
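
The rule that the byte offset of an element is the sum of each index multiplied by the stride of its axis can be checked by hand. The sketch below uses a separate `int16` array `a` (an assumption made for this example, so that the item size differs from 1) and reads the element back from the raw bytes.

```python
import numpy as np

a = np.arange(12, dtype=np.int16).reshape(3, 4)   # C order, 2 bytes per item
print(a.strides)                                  # (8, 2)

# Byte offset of a[n, m] = n * strides[0] + m * strides[1]
n, m = 2, 1
offset = n * a.strides[0] + m * a.strides[1]

# Read one int16 value located at that offset in the array's raw bytes.
raw = a.tobytes()                                 # bytes in C (row-major) order
value = np.frombuffer(raw, dtype=a.dtype, count=1, offset=offset)[0]
print(offset, value, a[n, m])                     # 18 9 9
```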

- Using strides to describe the mapping from array index to memory offset is clever because it can describe different mapping strategies. Many common operations on arrays, such as the transpose, can be implemented by simply changing the strides attribute, which eliminates the need to move data around in memory.

- Operations that only require changing the strides attribute result in new ndarray objects that refer to the same data as the original array. Such arrays are called views. 

- For efficiency, NumPy strives to create views rather than copies when applying operations on arrays. This is generally a good thing, but it is important to be aware that some array operations result in views rather than new independent arrays, because modifying their data also modifies the data of the original array.
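
As a rough illustration of the points above, the transpose of an array is a view whose strides are simply reversed. The sketch uses `np.shares_memory` to confirm that no data was copied; the array `a` is made up for this example.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)
t = a.T                          # transpose: a view, no data is moved

print(a.strides, t.strides)      # (12, 4) (4, 12)
print(np.shares_memory(a, t))    # True

# Writing through the view also changes the original array.
t[0, 1] = -1                     # same memory location as a[1, 0]
print(a[1, 0])                   # -1
```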

## Creating Arrays

In the previous section, we looked at NumPy’s basic data structure for representing arrays, the ndarray class, and we looked at the basic attributes of this class. In this section we focus on functions from the NumPy library that can be used to create ndarray instances.

NumPy provides a set of functions that generate ndarrays, depending on their properties and the applications they are used for. You will find these functions useful throughout this tutorial.

### Arrays Created from Lists and Other Array-like Objects

- Using the `np.array()` function, NumPy arrays can be constructed from explicit Python lists, iterable expressions and other array-like objects (such as other `ndarray` instances).

In [None]:
data = np.array([1, 3, 9]) # Create a one-dimensional array from a Python list
data

In [None]:
data.ndim

In [None]:
data.shape

In [None]:
data = np.array([[1, 2], [5, 7]]) # Create 2D array using nested Python lists
data

In [None]:
data.ndim

In [None]:
data.shape

### Arrays Filled with Constant Values

- The functions `np.zeros()` and `np.ones()` create and return arrays filled with zeros and ones respectively. 
- They take, as first argument, an integer or a tuple that describes the number of elements along each dimension of the array.


In [None]:
np.zeros((2, 3))

In [None]:
np.ones(5)

- Like other array-generating functions, these functions also accept an optional keyword argument that specifies the data type for the elements in the array.
- By default, the data type is `float64`:

In [None]:
data = np.ones(10)
data

In [None]:
data.dtype

In [None]:
data = np.ones(5, dtype=complex)
data

In [None]:
data.dtype

- Arrays filled with an arbitrary constant value can be generated by first creating an array filled with ones and then multiplying the array with the desired fill value. 
- However, NumPy also provides the function `np.full()` that does exactly this in one step:

In [None]:
x = 10.5 * np.ones(5)
x

In [None]:
y = np.full(shape=5, fill_value=10.5)
y

- An already created array can also be filled with a constant value using the `fill()` method of the `ndarray` class, which takes a value as its argument and sets all elements in the array to that value:

In [None]:
x1 = np.empty(3) # generates an array with uninitialized values, of the given size
x1

In [None]:
x1.fill(3.0)

In [None]:
x1

In [None]:
x2 = np.full(shape=5, fill_value=3.0)
x2

### Arrays with Incremental Sequences

In numerical computing it is very common to require arrays with evenly spaced values between a starting value and an ending value. NumPy provides two similar functions to create such arrays: `np.arange()` and `np.linspace()`.

In [None]:
np.arange(start=0.0, stop=10, step=1)

In [None]:
np.linspace(start=0, stop=10, num=11)

<div class="alert alert-info">

**Note:** 
    
- `np.arange()` does not include the end value, while by default `np.linspace()` does (although this behavior can be changed using the optional `endpoint` keyword argument).
    
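- Whether to use `np.arange()` or `np.linspace()` is mostly a matter of personal preference, but it is generally recommended to use `np.linspace()` whenever the increment is a noninteger, because floating-point rounding can make the number of elements returned by `np.arange()` hard to predict.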
</div>
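
The difference in endpoint handling can be seen in a short sketch. A step of 0.25 is used here because it is exactly representable in binary floating point, so the results are not affected by rounding.

```python
import numpy as np

step_based = np.arange(0, 1, 0.25)                 # stop value 1 is excluded
count_based = np.linspace(0, 1, 5)                 # stop value 1 is included
half_open = np.linspace(0, 1, 4, endpoint=False)   # stop value excluded again

print(step_based)    # [0.   0.25 0.5  0.75]
print(count_based)   # [0.   0.25 0.5  0.75 1.  ]
print(half_open)     # [0.   0.25 0.5  0.75]
```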


### Arrays Filled with Logarithmic Sequences

The function `np.logspace()` is similar to `np.linspace()`, but the increments between the elements in the array are logarithmically distributed, and the first two arguments, for the start and end values, are the powers of the optional `base` keyword argument (which defaults to 10):

In [None]:
np.logspace(start=0, stop=2, num=5) # Generate 5 points between 10**0= 1 and 10**2=100
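
The effect of the `base` keyword can be sketched as follows; the choice of base 2 is only for illustration.

```python
import numpy as np

# Four points between 2**0 = 1 and 2**3 = 8, evenly spaced in the exponent.
powers_of_two = np.logspace(start=0, stop=3, num=4, base=2)
print(powers_of_two)    # [1. 2. 4. 8.]

# Equivalent construction: raise the base to linearly spaced exponents.
same = 2.0 ** np.linspace(0, 3, 4)
```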

### Meshgrid Arrays

- Multidimensional coordinate grids can be generated using the function `np.meshgrid()`.
- Given two one-dimensional coordinate arrays, we can generate two-dimensional coordinate arrays using the `np.meshgrid()` function:

In [None]:
# 1-D arrays representing the coordinates of a grid
x = np.array([-10, 0, 10])
y = np.array([0, 50, 270])

In [None]:
X, Y = np.meshgrid(x, y)
X

In [None]:
Y

- It is also possible to generate higher-dimensional coordinate arrays by passing
more arrays as arguments to the `np.meshgrid()` function.
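
A minimal sketch of the three-dimensional case is shown below. The coordinate arrays `xs`, `ys`, and `zs` are made up for this example. Note that with the default `indexing='xy'` the first two axes of the output are swapped relative to the input order, while `indexing='ij'` keeps the input order.

```python
import numpy as np

xs = np.array([0, 1])            # 2 points
ys = np.array([0, 10, 20])       # 3 points
zs = np.array([0, 1, 2, 3])      # 4 points

# Default Cartesian ('xy') indexing: output shape is (len(ys), len(xs), len(zs)).
X, Y, Z = np.meshgrid(xs, ys, zs)
print(X.shape)                   # (3, 2, 4)

# Matrix ('ij') indexing: output shape follows the order of the inputs.
Xi, Yi, Zi = np.meshgrid(xs, ys, zs, indexing='ij')
print(Xi.shape)                  # (2, 3, 4)
```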

### Creating Uninitialized Arrays

- To create an array of specific size and data type, but without initializing the elements of the array to any particular values, we can use the function `np.empty()`

- The advantage of this function over, for example, `np.zeros` is that we can avoid the initialization step. If all elements are guaranteed to be initialized later in the code, this can save a little bit of time, especially when working with large arrays:

In [None]:
np.empty(shape=5, dtype=np.float64)

<div class="alert alert-info">

**Note:** 
    
- There's no guarantee that elements generated by `np.empty()` have any particular values. For this reason it is important that all values are explicitly assigned before the array is used; otherwise unpredictable errors are likely to arise. 
</div>


### Creating Arrays with Properties of Other Arrays

It is often necessary to create new arrays that share properties, such as shape and data type, with another array. NumPy provides a family of functions for this purpose:
    
- `np.ones_like()`
- `np.zeros_like()`
- `np.full_like()`
- `np.empty_like()`


A typical use-case is a function that takes arrays of unspecified type and size as arguments and requires working with arrays of the same size and type. 

In [None]:
data = np.array([[2, 3, 8], [6, 8, 10]], dtype=np.float32)
data

In [None]:
np.ones_like(data)

In [None]:
np.zeros_like(data)

In [None]:
np.full_like(data, fill_value=5)
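
The use-case mentioned above, a function that must return an array matching its input, could look roughly like this. The function `keep_positive` is a hypothetical helper written for this example, not part of NumPy.

```python
import numpy as np

def keep_positive(a):
    """Return an array like `a` with non-positive entries replaced by zero."""
    out = np.zeros_like(a)       # same shape and dtype as the input
    mask = a > 0
    out[mask] = a[mask]
    return out

sample = np.array([[-1, 2], [3, -4]], dtype=np.float32)
result = keep_positive(sample)
print(result.dtype, result.shape)   # float32 (2, 2)
```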

### Creating Matrix Arrays

NumPy provides functions for generating commonly used matrices. One of these functions is `np.identity()`, which generates a square matrix with ones on the diagonal and zeros elsewhere:

In [None]:
np.identity(5)

The similar function `np.eye()` generates matrices with ones on a diagonal (optionally offset):


In [None]:
np.eye(N=4, M=4, k=1)

In [None]:
np.eye(N=4, M=4, k=-1)

To construct a matrix with an arbitrary one-dimensional array on the diagonal, we
can use the `np.diag()` function (which also takes the optional keyword argument k to specify an offset from the diagonal):

In [None]:
d = np.arange(0, 20, 5)
d

In [None]:
np.diag(v=d)
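
The offset argument `k` mentioned above can be sketched as follows. The example also shows that `np.diag` works in the other direction: given a two-dimensional array, it extracts the diagonal.

```python
import numpy as np

d = np.arange(0, 20, 5)          # [ 0  5 10 15]

# Place d on the first superdiagonal; the matrix grows to 5 x 5 to fit it.
upper = np.diag(d, k=1)
print(upper.shape)               # (5, 5)

# Passing a 2-D array extracts its diagonal instead of building a matrix.
main_diagonal = np.diag(np.diag(d))
print(main_diagonal)             # [ 0  5 10 15]
```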

## Indexing and Slicing

- Elements and subarrays of NumPy arrays are accessed using the standard square bracket notation that is also used with Python lists. 
- Within the square bracket, a variety of different index formats are used for different types of element selection.
- In general, the expression within the bracket is a tuple, where each item in the tuple is a specification of which elements to select from each axis/dimensions of the array.


**Examples of Array Indexing and Slicing Expressions**


| Expression           | Description                                                                                                                                                                |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `a[m]`               | Select element at index `m`, where `m` is an integer (start counting from 0).                                                                                              |
| `a[-m]`              | Select the `m`th element from the end of the array, where `m` is an integer. The last element is addressed as -1, the second to last element as -2, and so on.             |
| `a[m:n]`             | Select elements with index starting at `m` and ending at `n-1` (`m` and `n` are integers).                                                                                 |
| `a[:]` or `a[0:]`    | Select all elements in the given axis.                                                                                                                                     |
| `a[:n]`              | Select elements starting with index 0 and going up to index `n-1` (integer).                                                                                               |
| `a[m:]`              | Select elements starting with index `m` (integer) and going up to the last element in the array.                                                                           |
| `a[m:n:p]`           | Select elements with index `m` through `n` (exclusive), with increment `p`.                                                                                                |
| `a[::-1]`            | Select all the elements, in reverse order.                                                                                                                                 |

### One-Dimensional Arrays

Along a single axis, integers are used to select single elements, and so-called slices are used to select ranges and sequences of elements. Positive integers are used to index elements from the beginning of the array (index starts at 0), and negative integers are used to index elements from the end of the array, where the last element is indexed with `-1`, the second to last element with `-2`, and so on.

In [None]:
data = np.arange(0, 8)
data

In [None]:
data[0] # First element

In [None]:
data[-1] # last element

In [None]:
data[4] # fifth element, at index 4

In [None]:
data[1:-1] # from the second element up to (and including) the second-to-last

In [None]:
data[1:-1:2] # same range, selecting every second element

In [None]:
data[:5] # select first five

In [None]:
data[-5:] # last five elements

In [None]:
data[::-2] # reverse the array and select only every second value

### Multidimensional Arrays

With multidimensional arrays, element selections like those introduced in the previous section can be applied on each axis/dimension. The result is a reduced array where each element matches the given selection rules.

In [None]:
f = lambda m, n: n + 10 * m

In [None]:
data = np.fromfunction(function=f, shape=(6, 6), dtype=np.int32)
data

In [None]:
data[:, 1] # second column

In [None]:
data[1, :] # second row

- By applying a slice on each of the array axes, we can extract subarrays:

In [None]:
data[:3, :3] # upper left diagonal block matrix

In [None]:
data[3:, :3] # lower left off-diagonal block matrix

- With element spacing other than 1, subarrays made up from nonconsecutive elements can be extracted:

In [None]:
data[::2, ::2] # every second element starting from 0, 0

In [None]:
data[1::2, 1::3] # every second and third element starting from 1, 1

This ability to extract subsets of data from a multidimensional array is a simple but very powerful feature.

### Copies and Views of Objects

- Subarrays that are extracted from arrays using slice operations are alternative **views** of the same underlying array data. This means that they are arrays that refer to the same data in memory as the original array, but with a different `strides` configuration. 
- When elements in a view are assigned new values, the values of the original array are therefore also updated:

In [None]:
data = np.fromfunction(function=f, shape=(6, 6), dtype=np.int32)
data

In [None]:
x = data[1:5, 1:5]
x

In [None]:
x[:, :] = 100
data

- Here, assigning new values to the elements in the array `x`, which is created from the array `data`, also modifies the values in `data` (since both arrays refer to the same data in memory).
- The fact that extracting subarrays results in views rather than new independent arrays eliminates the need for copying data and improves performance. 
- When a copy rather than a view is needed, the view can be copied explicitly by using the copy method of the ndarray instance.

In [None]:
y = x[1:3, 1:3].copy()
y

In [None]:
y[:, :] = 1 # does not affect x since y is a copy of the view x[1:3, 1:3]
y

In [None]:
x

### Fancy Indexing and Boolean-Valued Index

NumPy provides another convenient method to index arrays, called **fancy indexing**. 

- With fancy indexing, an array can be indexed with another NumPy array, a Python list, or a sequence of integers, whose values select elements in the indexed array.
- Fancy indexing requires that the elements in the array or list used for indexing are integers. 

In [None]:
data = np.linspace(0, 1, 11)
data

In [None]:
data[np.array([0, 2, 4])]

In [None]:
data[[0, 2, 4]]

- Another variant of indexing NumPy arrays is to use Boolean-valued index arrays. In this case, each element of the index array indicates whether or not to select the element with the corresponding index from the indexed array. This index method is handy when filtering out elements from an array.

In [None]:
data > 0.6

In [None]:
data[data > 0.6]

Unlike arrays created by using slices, the arrays returned using fancy indexing and Boolean-valued indexing **are not views but rather new independent arrays**. Nonetheless, it is possible to assign values to elements selected using fancy indexing:

In [None]:
data = np.arange(10)
data

In [None]:
indices = [3, 5, 7]

In [None]:
x = data[indices]
x

In [None]:
x[0] = -1 # this does not affect data
data

In [None]:
data[indices] = -1 # this affects data
data

In [None]:
data = np.arange(10)
x = data[data > 5]
x

In [None]:
x[0] = -1 # this does not affect data
data

In [None]:
data[data > 5] = -1 # this alters data
data
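
Whether an indexing expression returned a view or a copy can also be checked directly. The following is a minimal sketch using `np.shares_memory`; the array `values` is made up for this example.

```python
import numpy as np

values = np.arange(10)

sliced = values[2:5]             # slice: a view
fancy = values[[2, 3, 4]]        # fancy indexing: a copy
masked = values[values > 5]      # Boolean indexing: a copy

print(np.shares_memory(values, sliced))   # True
print(np.shares_memory(values, fancy))    # False
print(np.shares_memory(values, masked))   # False
```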

## Reshaping and Resizing

When working with data in array form, it is often useful to rearrange arrays and alter the way they are interpreted. For example, an `NxN` matrix array could be rearranged into a vector of length $N^2$, or a set of one-dimensional arrays could be concatenated together or stacked next to each other to form a matrix. 


**Summary of NumPy Functions for Manipulating the Dimensions and the Shape of Arrays**



| Function/Method                                           | Description                                                                                                                                                       |
|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `np.reshape`, `np.ndarray.reshape`                        | Reshape an N-dimensional array. The total number of elements must remain the same.                                                                                |
| `np.ndarray.flatten`                                      | Creates a copy of an N-dimensional array and reinterprets it as a one-dimensional array (i.e., all dimensions are collapsed into one).                            |
| `np.ravel`, `np.ndarray.ravel`                            | Create a view (if possible, otherwise a copy) of an N-dimensional array in  which it is interpreted as a one-dimensional array.                                   |
| `np.squeeze`                                              | Removes axes with length 1.                                                                                                                                       |
| `np.expand_dims`, `np.newaxis`                            | Add a new axis/dimension of length 1 to an array, where `np.newaxis` is used with array indexing.                                                                 |
| `np.transpose` or  `np.ndarray.transpose`, `np.ndarray.T` | Transpose the array. The transpose operation corresponds to reversing (or more generally, permuting) the axes of the array.                                       |
| `np.hstack`                                               | Stacks a list of arrays horizontally (along axis 1): for example, given a list of column vectors, appends the columns to form a matrix.                           |
| `np.vstack`                                               | Stacks a list of arrays vertically (along axis 0): for example, given a  list of row vectors, appends the rows to form a matrix.                                  |
| `np.dstack`                                               | Stacks arrays depth-wise (along axis 2).                                                                                                                          |
| `np.concatenate`                                          | Creates a new array by appending arrays after each other, along a  given axis.                                                                                    |
| `np.resize`                                               | Resizes an array. Creates a new copy of the original array, with the  requested size. If necessary, the original array will be repeated to fill up the new array. |
| `np.append`                                               | Appends an element to an array. Creates a new copy of the array.                                                                                                  |
| `np.insert`                                               | Inserts a new element at a given position. Creates a new copy of the array.                                                                                       |
| `np.delete`                                               | Deletes an element at a given position. Creates a new copy of the array.                                                                                          |

- Reshaping an array does not require modifying the underlying array data; it only changes how the data is interpreted, by redefining the array's **strides** attribute.

In [None]:
data = np.array([[10, 3], [5, 8]])
data

In [None]:
data.strides

In [None]:
data.data

In [None]:
x = np.reshape(data, (1, 4)) # positional arguments work across NumPy versions
x

In [None]:
data.reshape(4)

In [None]:
x.strides

- Note that reshaping an array produces a view of the array whenever possible, and if an independent copy of the array is needed, the view has to be copied explicitly (e.g., using `np.copy`).

In [None]:
x[0, 1] = -100
x

In [None]:
data

- The function `np.ravel()` is a special case of reshape, which collapses all dimensions of an array and returns a flattened one-dimensional array with a length that corresponds to the total number of elements in the original array. The `flatten()` method of the `ndarray` class does the same, but always returns a copy.

In [None]:
data

In [None]:
data.flatten()

In [None]:
data.flatten().shape
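
The difference between `ravel` (a view when possible) and `flatten` (always a copy) can be checked with a short sketch. The array `a` is made up for this example.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

raveled = a.ravel()              # contiguous input: returns a view
flattened = a.flatten()          # always returns a copy

print(np.shares_memory(a, raveled))     # True
print(np.shares_memory(a, flattened))   # False

# For a non-contiguous input such as a transpose, ravel must copy as well.
raveled_t = a.T.ravel()
print(np.shares_memory(a, raveled_t))   # False
```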

- While `np.ravel()` and `ndarray.flatten()` collapse the axes of an array into a one-dimensional array, it is also possible to introduce new axes into an array, either by using `np.reshape` or, when adding new empty axes, using indexing notation and the `np.newaxis` keyword at the place of a new axis. 

In [None]:
data = np.arange(8)
data

In [None]:
column = data[:, np.newaxis]
column

In [None]:
row = data[np.newaxis, :]
row

In [None]:
data.shape, column.shape, row.shape

- The function `np.expand_dims` can also be used to add new dimensions to an array. In the preceding example, the expression `data[:, np.newaxis]` is equivalent to `np.expand_dims(data, axis=1)`, and `data[np.newaxis, :]` is equivalent to `np.expand_dims(data, axis=0)`. Here the `axis` argument specifies the location relative to the existing axes where the new axis is to be inserted. 
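
A quick check of how `np.expand_dims` relates to indexing with `np.newaxis`, written as a small sketch with a made-up array `v`:

```python
import numpy as np

v = np.arange(8)

column = np.expand_dims(v, axis=1)   # new axis inserted after the existing one
row = np.expand_dims(v, axis=0)      # new axis inserted before the existing one

print(column.shape, row.shape)       # (8, 1) (1, 8)
print(np.array_equal(column, v[:, np.newaxis]))   # True
print(np.array_equal(row, v[np.newaxis, :]))      # True
```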

- In addition to reshaping and selecting subarrays, it is often necessary to merge arrays into bigger arrays, for example, when joining separately computed or measured data series into a higher-dimensional array, such as a matrix. For this task, NumPy provides the functions `np.vstack`, for vertical stacking of, for example, rows into a matrix, and `np.hstack` for horizontal stacking of, for example, columns into a matrix. The function `np.concatenate` provides similar functionality, but it takes a keyword argument `axis` that specifies the axis along which the arrays are to be concatenated.

In [None]:
data = np.arange(5)
data

In [None]:
np.vstack((data, data, data))

- If we instead want to stack the arrays horizontally, to obtain a matrix where the arrays are the column vectors, we might first attempt something similar using `np.hstack`:

In [None]:
np.hstack((data, data, data))

This does stack the arrays horizontally, but not in the way intended here. To make `np.hstack()` treat the input arrays as columns and stack them accordingly, we need to make the input arrays two-dimensional arrays of shape `(5, 1)` rather than one-dimensional arrays of shape `(5,)` by inserting a new axis by indexing with `np.newaxis`:

In [None]:
data

In [None]:
data = data[:, np.newaxis]
data

In [None]:
np.hstack((data, data, data))

The behavior of the functions for horizontal and vertical stacking, as well as
concatenating arrays using `np.concatenate`, is clearest when the stacked arrays have the same number of dimensions as the final array and when the input arrays are stacked along an axis for which they have length 1.
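
The relationship between the stacking functions and `np.concatenate` can be sketched for two-dimensional inputs as follows. The arrays `col` and `row` are made up for this example.

```python
import numpy as np

col = np.arange(5)[:, np.newaxis]    # shape (5, 1)
row = np.arange(5)[np.newaxis, :]    # shape (1, 5)

# Concatenating along axis 1 matches horizontal stacking of columns.
columns = np.concatenate((col, col, col), axis=1)
print(columns.shape)                 # (5, 3)

# Concatenating along axis 0 matches vertical stacking of rows.
rows = np.concatenate((row, row, row), axis=0)
print(rows.shape)                    # (3, 5)
```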

<div class="alert alert-info">

**Note:** 
    
- The number of elements in a NumPy array cannot be changed once the array has
been created. To insert, append, and remove elements from a NumPy array, for example, using the function `np.append`, `np.insert`, and `np.delete`, a new array must be created and the data copied to it. 
- It may sometimes be tempting to use these functions to grow or shrink the size of a NumPy array, but due to the overhead of creating new arrays and copying the data, it is usually a good idea to preallocate arrays with size such that they do not later need to be resized.
</div>
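
The two approaches mentioned in the note can be compared in a small sketch. The size `n` and the values stored are arbitrary choices for this example; both loops produce the same result, but the first one allocates a new array on every iteration.

```python
import numpy as np

n = 5

# Growing an array: every np.append call creates a new array and copies the data.
grown = np.array([], dtype=np.float64)
for i in range(n):
    grown = np.append(grown, i ** 2)

# Preallocating: the array is created once and filled in place.
prealloc = np.empty(n, dtype=np.float64)
for i in range(n):
    prealloc[i] = i ** 2

print(np.array_equal(grown, prealloc))   # True
```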


## Vectorized Expressions

- The purpose of storing numerical data in arrays is to be able to process the data with concise **vectorized** expressions that represent batch operations that are applied to all elements in the arrays. 
- Efficient use of vectorized expressions eliminates the need for many explicit **`for`** loops. This results in less verbose code, better maintainability, and higher-performing code.
- NumPy implements functions and vectorized operations corresponding to most fundamental mathematical functions and operators. 
- Many of these functions and operations act on arrays on an elementwise basis, and binary operations require all arrays in an expression to be of compatible size. The meaning of compatible size is normally that the variables in an expression represent either scalars or arrays of the same size and shape. More generally, a binary operation involving two arrays is well defined if the arrays can be **broadcast** into the same shape and size. 
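
As a minimal sketch of what a vectorized expression replaces, the example below computes the same result with an explicit loop and with a single array expression. The array `x` and the formula are made up for this example.

```python
import numpy as np

x = np.arange(5, dtype=np.float64)

# Explicit loop: one Python-level operation per element.
looped = np.empty_like(x)
for i in range(x.size):
    looped[i] = x[i] ** 2 + 1.0

# Vectorized expression: the same computation applied to all elements at once.
vectorized = x ** 2 + 1.0

print(vectorized)                        # [ 1.  2.  5. 10. 17.]
```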

### Broadcasting

**Broadcasting** allows an operator or a function to act on two or more arrays even if these arrays do not have the same shape. That said, not every combination of shapes can be broadcast; the dimensions must meet certain rules.

Two arrays can be broadcast when all their dimensions are compatible, i.e., in each dimension the two lengths must be equal or one of them must be 1. If neither condition is met, NumPy raises a `ValueError` stating that the arrays could not be broadcast together.
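
The compatibility check can be sketched in a few lines. The helper `shapes_compatible` below is hypothetical (it is not part of NumPy); it compares the dimensions starting from the last axis and treats a missing dimension as length 1. `np.broadcast_shapes`, available in NumPy 1.20 and later, performs the real check and returns the resulting shape.

```python
from itertools import zip_longest

import numpy as np


def shapes_compatible(shape_a, shape_b):
    """Hypothetical helper: True if the two shapes can be broadcast together."""
    # Walk the dimensions from the last axis backwards;
    # a dimension missing from the shorter shape counts as 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    return all(da == db or da == 1 or db == 1 for da, db in pairs)


print(shapes_compatible((4, 4), (4,)))      # True
print(shapes_compatible((4, 4), (4, 1)))    # True
print(shapes_compatible((4, 4), (3,)))      # False

# NumPy's own check also reports the broadcast result shape.
print(np.broadcast_shapes((4, 4), (4, 1)))  # (4, 4)
```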

In [None]:
x = np.arange(16).reshape(4, 4)
y = np.arange(4)
z = np.arange(4)[:, np.newaxis]
x

In [None]:
y

In [None]:
z

In [None]:
x.shape

In [None]:
y.shape

In [None]:
z.shape

To illustrate how broadcasting works, we will use two simple examples in which we will compute:

- `x + y`
- `x + z`

In this case, you obtain three arrays with shapes:
    
- x: `4 x 4`
- y: `4`
- z: `4 x 1`

**There are two rules of broadcasting:**
    
1) Add a 1 in front of the shape for each missing dimension, so that all arrays have the same number of dimensions. If the compatibility rules are now satisfied, broadcasting can be applied and you move to the second rule. For example:

- x: `4 x 4`
- y: `1 x 4`
- z: `4 x 1`

The rule of compatibility is met, so you can move to the second rule of broadcasting. 

2) The second rule explains how to extend the smaller array so that it matches the size of the larger one, so that the element-wise function or operator can be applied. This rule assumes that each dimension of length 1 is filled with replicas of the existing values along that dimension:

**Applying the second broadcasting rule**

![broadcasting](../assets/images/broadcasting.jpg)

The highlighted elements represent true elements of the arrays, while the light gray-shaded elements describe the broadcasting of the elements of the array of smaller size.


<div class="alert alert-info">

**Note:** 
    
The extra memory indicated by the gray-shaded boxes is never allocated, but it can be convenient to think about the operations as if it is.
</div>
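
One way to see that no extra memory is allocated is to inspect a broadcast view directly. The sketch below uses `np.broadcast_to`, which returns a read-only view; the name `y_view` is only illustrative.

```python
import numpy as np

y = np.arange(4)

# np.broadcast_to returns a read-only view with the requested shape.
y_view = np.broadcast_to(y, (4, 4))

print(y_view.shape)                 # (4, 4)
print(y_view.strides[0])            # 0: every row re-reads the same memory
print(np.shares_memory(y, y_view))  # True: no copy was made
print(y_view.flags.writeable)       # False
```

A stride of 0 along the first axis means that moving to the next row does not advance in memory, so the four rows are the same four values seen repeatedly.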




Now that the arrays have the same dimensions, the values inside may be added together:

In [None]:
x + y

In [None]:
x + z

This is a simple case in which one of the two arrays is smaller than the other. There
may be more complex cases in which the two arrays have different shapes and each is smaller than the other only in certain dimensions:

In [None]:
m = np.arange(6).reshape(3, 1, 2)
n = np.arange(6).reshape(3, 2, 1)
m

In [None]:
n

Even in this case, by analyzing the shapes of the two arrays, you can see that they are
compatible and therefore the rules of broadcasting can be applied:
    
- m: `3 x 1 x 2`
- n: `3 x 2 x 1`

In this case, both arrays undergo the extension of dimensions (broadcasting):
    
```python
# m and n after broadcasting to the common shape (3, 2, 2)
m_broadcast = [[[0, 1],
                [0, 1]],
               [[2, 3],
                [2, 3]],
               [[4, 5],
                [4, 5]]]

n_broadcast = [[[0, 0],
                [1, 1]],
               [[2, 2],
                [3, 3]],
               [[4, 4],
                [5, 5]]]
```

Then you can apply, for example, the addition operator between the two arrays, operating element-wise.

In [None]:
m + n
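
The expanded arrays written out above can also be obtained from NumPy itself. As a small sketch, `np.broadcast_arrays` returns views of its inputs expanded to the common shape; the names `m_b` and `n_b` are only illustrative.

```python
import numpy as np

m = np.arange(6).reshape(3, 1, 2)
n = np.arange(6).reshape(3, 2, 1)

# Expand both inputs to their common broadcast shape.
m_b, n_b = np.broadcast_arrays(m, n)

print(m_b.shape, n_b.shape)              # (3, 2, 2) (3, 2, 2)
print(np.array_equal(m + n, m_b + n_b))  # True
```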

### Arithmetic Operations

The standard arithmetic operations on NumPy arrays are performed elementwise:

In [None]:
x = np.arange(4, dtype=np.int32).reshape(2, 2)
x

In [None]:
y = np.ones(shape=(2, 2), dtype=np.int32)
y

In [None]:
x + y

In [None]:
y - x

In [None]:
x * y

In [None]:
x / y

For operations between scalars and arrays, the scalar value is applied to each element in the array:

In [None]:
x * 2

In [None]:
2 ** x

In [None]:
x / 2

In [None]:
(x / 2).dtype


<div class="alert alert-info">

**Note:** 
    
The `dtype` of the resulting array for an expression can be promoted if the computation requires it.
</div>
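
A few cases, sketched below, show when promotion happens and when it does not. The array `x` is the same `int32` array used in the cells above; `np.result_type` reports the type NumPy would choose when combining two types.

```python
import numpy as np

x = np.arange(4, dtype=np.int32).reshape(2, 2)

print((x * 2).dtype)    # int32: an int32 array times a Python int stays int32
print((x // 2).dtype)   # int32: floor division keeps the integer type
print((x / 2).dtype)    # float64: true division always produces floats
print((x + 0.5).dtype)  # float64: mixing with a Python float promotes

# The promoted type for a pair of dtypes can be queried directly.
print(np.result_type(np.int32, np.float32))  # float64
```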




If an arithmetic operation is performed on arrays with incompatible size or shape, a `ValueError` exception is raised:

In [None]:
x = np.arange(4).reshape(2, 2)
x

In [None]:
y = np.arange(3)
y

In [None]:
x / y
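
If a shape mismatch is a possibility in your own code, the exception can be caught like any other. The following is a minimal sketch that uses the same shapes as the cells above; the variable `message` is only illustrative.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)  # shape (2, 2)
y = np.arange(3)                # shape (3,)

message = None
try:
    x + y  # last dimensions are 2 and 3: not equal, and neither is 1
except ValueError as err:
    message = str(err)
    print("Broadcast failed:", message)
```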

### Elementwise Functions

NumPy provides vectorized functions for elementwise evaluation of many elementary mathematical functions:


| Function                                              	| Description                                                                                 	|
|-------------------------------------------------------	|---------------------------------------------------------------------------------------------	|
| `np.cos`, `np.sin`, `np.tan`                          	| Trigonometric functions.                                                                    	|
| `np.arccos`, `np.arcsin`, `np.arctan`                 	| Inverse trigonometric functions.                                                            	|
| `np.cosh`, `np.sinh`, `np.tanh`                       	| Hyperbolic trigonometric functions.                                                         	|
| `np.arccosh`, `np.arcsinh`, `np.arctanh`              	| Inverse hyperbolic trigonometric functions.                                                 	|
| `np.sqrt`                                             	| Square root.                                                                                	|
| `np.exp`                                              	| Exponential.                                                                                	|
| `np.log`, `np.log2`, `np.log10`                       	| Logarithms of base `e`, `2`,  and `10` respectively.                                        	|
| `np.add`, `np.subtract`,  `np.multiply`, `np.divide`  	| Addition, subtraction, multiplication, and division of two NumPy arrays.                    	|
| `np.power`                                            	| Raises first input argument to the power of the second input argument (applied elementwise).	|
| `np.remainder`                                        	| The remainder of division.                                                                  	|
| `np.reciprocal`                                       	| The reciprocal (inverse) of each element.                                                   	|
| `np.real`, `np.imag`, `np.conj`                       	| The real part, imaginary, and the complex conjugate of  the elements in the input arrays.   	|
| `np.sign`, `np.abs`                                   	| The sign and the absolute value.                                                            	|
| `np.floor`, `np.ceil`, `np.rint`                      	| Round to integer values (floating-point input keeps a floating-point dtype).                	|
| `np.round`                                            	| Rounds to a given number of decimals.                                                       	|


<div class="alert alert-info">

**Going Further:** 
    
For a complete list of the available elementwise functions in NumPy, see the [NumPy reference documentation](https://docs.scipy.org/doc/numpy/reference/routines.math.html)

</div>



Most of these functions take a single array as input and return a new array of the same shape; the arithmetic functions such as `np.add` and `np.power` take two input arrays:

In [None]:
x = np.linspace(-1, 1, 10)
x

In [None]:
np.sin(x)

In [None]:
np.round(a=x, decimals=2)

- When it is necessary to define new elementwise functions that operate on NumPy arrays, a good way to implement them is to combine existing NumPy operators and expressions. When this is not possible, the `np.vectorize` function can be a convenient tool.
- `np.vectorize()` takes a function that works on a scalar input and returns a vectorized function:

In [None]:
import math
def trig_func(x, y):
    return ((math.sin(x) ** 2) + (math.cos(y) ** 2))

In [None]:
trig_func(1, 1.5)

Seems reasonable. However, the functions in the `math` module only work on scalars. If we try to pass in lists or arrays, we'll get a `TypeError`.

In [None]:
trig_func([1, 2], [1, 2])

In [None]:
trig_func(np.arange(5), np.arange(5))

Using `np.vectorize` the scalar `trig_func` function can be converted into a
vectorized function that works with NumPy arrays or any array-like objects as input:

In [None]:
trig_func = np.vectorize(trig_func)

In [None]:
trig_func([1, 2], [1, 2])

In [None]:
trig_func(np.arange(5), np.arange(5))
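
For this particular function, the first approach mentioned above (building on existing NumPy functions) is also available. `np.vectorize` is provided mainly for convenience and is essentially a loop over the elements, so a version written with NumPy's own functions is usually preferable when one exists. The sketch below compares the two; the names `trig_scalar` and `trig_array` are hypothetical.

```python
import math

import numpy as np


def trig_scalar(x, y):
    # Scalar-only version, as in the cells above.
    return math.sin(x) ** 2 + math.cos(y) ** 2


def trig_array(x, y):
    # Same formula with NumPy functions; accepts scalars and arrays alike.
    return np.sin(x) ** 2 + np.cos(y) ** 2


a = np.arange(5)
via_vectorize = np.vectorize(trig_scalar)(a, a)
via_numpy = trig_array(a, a)

print(np.allclose(via_vectorize, via_numpy))  # True
```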

### Aggregate Functions

- NumPy provides another set of functions for calculating aggregates of NumPy arrays. They take an array as input and by default return a scalar; the cumulative functions `np.cumsum` and `np.cumprod` are the exception and return an array.

| Function                 	| Description                                                            	|
|--------------------------	|------------------------------------------------------------------------	|
| `np.mean`                	| The average of all values in the array.                                	|
| `np.std`                 	| Standard deviation.                                                    	|
| `np.var`                 	| Variance.                                                              	|
| `np.sum`                 	| Sum of all elements.                                                   	|
| `np.prod`                	| Product of all elements.                                               	|
| `np.cumsum`              	| Cumulative sum of all elements.                                        	|
| `np.cumprod`             	| Cumulative product of all elements.                                    	|
| `np.min`, `np.max`       	| The minimum/maximum value in an array.                                 	|
| `np.argmin`, `np.argmax` 	| The index of the minimum/maximum value in an array.                    	|
| `np.all`                 	| Returns True if all elements in the argument array  are nonzero.       	|
| `np.any`                 	| Returns True if any of the elements in the argument array  is nonzero. 	|

- By default, these functions aggregate over the entire input array. 
- Using the `axis` keyword argument with these functions, and their corresponding `ndarray` methods, it is possible to control the axis/dimension over which the aggregation is carried out.

The following example demonstrates how calling the aggregate `np.sum()` on the array of shape `(3, 3)` reduces the dimensionality of the array depending on the values of the axis argument:


![aggregate](../assets/images/agg.jpg)

In [None]:
x = np.arange(start=10, stop=100, step=10).reshape(3, 3)
x

In [None]:
x.sum()

In [None]:
x.sum(axis=0)

In [None]:
x.sum(axis=1)
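
The effect of `axis` on the result's shape can be checked directly. The sketch below uses the same `(3, 3)` array and, as an additional illustration, the `keepdims=True` argument, which keeps the reduced axis with length 1 so that the result still broadcasts against the original array. The names `col_sums`, `row_sums`, `kept`, and `row_fractions` are only illustrative.

```python
import numpy as np

x = np.arange(start=10, stop=100, step=10).reshape(3, 3)

col_sums = x.sum(axis=0)  # aggregate over rows: one value per column
row_sums = x.sum(axis=1)  # aggregate over columns: one value per row
print(col_sums, col_sums.shape)  # [120 150 180] (3,)
print(row_sums, row_sums.shape)  # [ 60 150 240] (3,)

# keepdims=True keeps the reduced axis with length 1.
kept = x.sum(axis=1, keepdims=True)
print(kept.shape)  # (3, 1)

# The (3, 1) result broadcasts against x, e.g. to normalize each row.
row_fractions = x / kept
print(row_fractions.sum(axis=1))
```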

### Boolean Arrays and Conditional Expressions

NumPy arrays can be used with the usual comparison operators, and the comparisons are made on an element-by-element basis:

In [None]:
x = np.linspace(start=10, stop=50, num=10, dtype=np.int32)
x

In [None]:
y = np.linspace(start=5, stop=60, num=10, dtype=np.int32)
y

In [None]:
x > y

- To use the result of a comparison between arrays in, for example, an if statement,
we need to aggregate the Boolean values of the resulting arrays in some suitable fashion, to obtain a single True or False value.

In [None]:
np.all(x > y)  # Test whether all array elements along a given axis evaluate to True.

In [None]:
np.any(x > y)  # Test whether any array element along a given axis evaluates to True.

- The advantage of Boolean-valued arrays, however, is that they often make it possible to avoid conditional `if` statements altogether. By using Boolean-valued arrays in arithmetic expressions, or as masks for indexing as in the following cell, conditional computations can be written in vectorized form.

In [None]:
x[(x > y)]
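
As a sketch of a Boolean array used inside an arithmetic expression, the cell below defines a rectangular pulse without any `if` statement. In arithmetic, `True` and `False` behave as 1 and 0. The helper `pulse` and the sample points `t` are hypothetical.

```python
import numpy as np


def pulse(t, position, height, width):
    """Hypothetical helper: `height` inside [position, position + width], 0 elsewhere."""
    # Each comparison yields a Boolean array; multiplying treats True as 1, False as 0.
    return height * (t >= position) * (t <= position + width)


t = np.linspace(-5, 5, 11)  # -5, -4, ..., 5
result = pulse(t, position=-2, height=1, width=5)
print(result)  # [0 0 0 1 1 1 1 1 1 0 0]
```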

**NumPy functions for conditional and logical expressions**

| Function                          	| Description                                                                           	|
|-----------------------------------	|---------------------------------------------------------------------------------------	|
| `np.where`                        	| Chooses values from two arrays depending on the value of a condition array.           	|
| `np.choose`                       	| Chooses values from a list of arrays depending on the  values of a given index array. 	|
| `np.select`                       	| Chooses values from a list of arrays depending on a list of conditions.               	|
| `np.nonzero`                      	| Returns an array with indices of nonzero elements.                                    	|
| `np.logical_and`                  	| Performs an elementwise AND operation.                                                	|
| `np.logical_or`, `np.logical_xor` 	| Elementwise OR/XOR operations.                                                        	|
| `np.logical_not`                  	| Elementwise NOT operation (inverting)                                                 	|

- The `np.where` function selects elements from two arrays **(second and third arguments)**, given a Boolean-valued array condition **(the first argument)**. For elements where the condition is True, the corresponding values from the array given as second argument are selected, and if the condition is False, elements from the third argument array are selected:

In [None]:
x

In [None]:
np.where(x<0, x*0, x**2)

In [None]:
np.where(x>0, x*0, x**2)
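
Because the array used above contains only positive values, the selection is easier to follow on a small array with mixed signs. The array `v` below is made-up example data.

```python
import numpy as np

v = np.array([-2, -1, 0, 1, 2])

# Where v < 0 take 0, elsewhere take v squared.
result = np.where(v < 0, 0, v ** 2)
print(result)  # [0 0 0 1 4]
```

Note that both value arguments are fully evaluated before `np.where` selects from them, so `np.where` does not protect against, for example, a division by zero in the branch that is not selected.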

- The `np.select` function works similarly, but instead of a Boolean-valued condition array, it expects a list of Boolean-valued condition arrays and a corresponding list of value arrays:

In [None]:
np.select(condlist=[x < 0, x > 0], choicelist=[x*0, x**2]) # Return an array drawn from elements in choicelist, depending on conditions
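
A small sketch with mixed-sign data shows how the conditions are matched. Elements for which no condition is true receive the `default` value (0 unless specified), and when several conditions are true, the first one in `condlist` is used. The array `v` is made-up example data.

```python
import numpy as np

v = np.array([-2, -1, 0, 1, 2])

result = np.select(
    condlist=[v < 0, v > 0],
    choicelist=[-v, v ** 2],  # negate the negatives, square the positives
    default=-1,               # used where no condition holds (here: v == 0)
)
print(result)  # [ 2  1 -1  1  4]
```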

- The `np.choose` function takes as its first argument a list or an array of indices that determine from which array in a given list of arrays each element is picked:

In [None]:
np.choose(a=[0, 0, 0, 1, 1, 1, 0, 0, 0, 1], choices=[x*0, x**2])
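
The selection is easier to trace when the choice arrays have recognizable values. In the sketch below (made-up data), element `i` of the result is taken from position `i` of the array whose number is `idx[i]`.

```python
import numpy as np

idx = np.array([0, 1, 2, 1])
choices = [
    np.array([10, 11, 12, 13]),  # choice 0
    np.array([20, 21, 22, 23]),  # choice 1
    np.array([30, 31, 32, 33]),  # choice 2
]

picked = np.choose(idx, choices)
print(picked)  # [10 21 32 23]
```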

- The function `np.nonzero` returns a tuple of index arrays, one per dimension, that can be used to index the array:

In [None]:
x

In [None]:
np.nonzero(x > 20)

In [None]:
x[np.nonzero(x > 20)]
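
For a two-dimensional array, the returned tuple contains one index array per dimension. The following sketch uses a small made-up array `g`.

```python
import numpy as np

g = np.array([[0, 3, 0],
              [4, 0, 5]])

rows, cols = np.nonzero(g)
print(rows)  # [0 1 1]
print(cols)  # [1 0 2]

# Indexing with the pair of index arrays returns the nonzero elements.
print(g[rows, cols])  # [3 4 5]
```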

## Summary

In this tutorial, we've given a brief introduction to array-based programming with Python's NumPy library. NumPy is a core library that provides a foundation for nearly all computational libraries for Python. Familiarity with the NumPy library and its usage patterns is a fundamental skill for using Python for scientific computing. The NumPy library is the topic of several books, including the *Guide to NumPy* by T. Oliphant, the creator of NumPy, available for free online at http://web.mit.edu/dvp/Public/numpybook.pdf, as well as *Numerical Python (2019)* and *Python for Data Analysis (2017)*.

## References

- [NumPy Reference Documentation](https://docs.scipy.org/doc/numpy/reference/)
- Johansson, Robert. Numerical Python, 2nd ed. Urayasu-shi: Apress, 2019.
- McKinney, Wes. Python for Data Analysis, 2nd ed. Sebastopol: O’Reilly, 2017.

In [None]:
%load_ext watermark
%watermark --iversion -g -h -m -v -u -d