**Tools - NumPy**

*NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.*

*Its array operations are fast because the core is implemented largely in C. Its N-dimensional arrays (`ndarray`s) hold elements that all share the same data type.*

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/ageron/handson-ml2/blob/master/tools_numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
  <td>
    <a target="_blank" href="https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml2/blob/master/tools_numpy.ipynb"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" /></a>
  </td>
</table>

# Creating Arrays

Now let's import `numpy`. Most people import it as `np`:

In [5]:
import numpy as np

## `np.zeros`

The `zeros` function creates an array containing any number of zeros:

In [None]:
np.zeros(5)

It's just as easy to create a 2D array (i.e., a matrix) by providing a tuple with the desired number of rows and columns. For example, here's a 3x4 matrix:

In [None]:
np.zeros((3,4))

## Some vocabulary

* In NumPy, each dimension is called an **axis**.
* The number of axes is called the **rank**.
    * For example, the above 3x4 matrix is an array of rank 2 (it is 2-dimensional).
    * The first axis has length 3, the second has length 4.
* An array's list of axis lengths is called the **shape** of the array.
    * For example, the above matrix's shape is `(3, 4)`.
    * The rank is equal to the shape's length.
* The **size** of an array is the total number of elements, which is the product of all axis lengths (e.g., 3*4=12).

In [None]:
a = np.zeros((3,4))

In [None]:
a.shape

In [None]:
a.ndim  # equal to len(a.shape)

In [None]:
a.size

## Python lists vs. NumPy arrays
* A list in Python is a built-in data structure that can store different types of data and change in size dynamically. It is flexible and easy to use for general purposes.

* A NumPy array, on the other hand, is specifically designed for numerical computations. It is faster and more memory-efficient than a list but requires all elements to be of the same data type. 


In [None]:
py_list = [1, 2, 3]
print("Python list multiplication ", py_list * 2)  # repeats the list: [1, 2, 3, 1, 2, 3]

np_array = np.array([1, 2, 3])
print("NumPy array multiplication ", np_array * 2)  # elementwise multiplication: [2 4 6]

import time
start = time.perf_counter()  # perf_counter is better suited to timing than time.time
py_list = [i*2 for i in range(1000000)]
print("\n List operation time: ", time.perf_counter() - start)

start = time.perf_counter()
np_array = np.arange(1000000) * 2
print("\n NumPy operation time: ", time.perf_counter() - start)



## N-dimensional arrays
You can also create an N-dimensional array of arbitrary rank. For example, here's a 3D array (rank=3), with shape `(2,3,4)`:

In [None]:
np.zeros((2,3,4))

## Array type
NumPy arrays have the type `ndarray`:

In [None]:
type(np.zeros((3,4)))

## `np.ones`
Many other NumPy functions create `ndarrays`.

Here's a 3x4 matrix full of ones:

In [None]:
np.ones((3,4))

## `np.full`
Creates an array of the given shape initialized with the given value. Here's a 3x4 matrix full of `π`.

In [None]:
np.full((3,4), np.pi)

## `np.empty`
An uninitialized 2x3 array (its content is not predictable, as it is whatever is in memory at that point):

In [None]:
np.empty((2,3))

## `np.array`
Of course you can initialize an `ndarray` using a regular Python list (or a list of lists). Just call the `array` function:

In [None]:
np.array([[1,2,3,4], [10, 20, 30, 40]])

## `np.arange`
You can create an `ndarray` using NumPy's `arange` function, which is similar to Python's built-in `range` function:

In [None]:
np.arange(1, 5)

It also works with floats:

In [None]:
np.arange(1.0, 5.0)

Of course you can provide a step parameter:

In [None]:
np.arange(1, 5, 0.5)

However, when dealing with floats, the exact number of elements in the array is not always predictable: floating-point rounding errors can decide whether the last value falls just below the stop value (and is included) or not.
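
The effect is easy to reproduce. In the sketch below, the two step values are arbitrary illustrations chosen just below and just above 1/3. `np.linspace`, which takes the number of points instead of a step, is shown as one way to get a predictable length.

```python
import numpy as np

stop = 5 / 3
a = np.arange(0, stop, 0.333333333)  # step slightly below 1/3
b = np.arange(0, stop, 0.333333334)  # step slightly above 1/3
print(len(a), len(b))  # the two lengths differ

# linspace takes the number of points, so the length is always known
c = np.linspace(0, stop, 6)
print(c)
```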

## `np.random.rand` and `np.random.randn`
A number of functions are available in NumPy's `random` module to create `ndarray`s initialized with random values.
For example, here is a 3x4 matrix initialized with random floats between 0 and 1 (uniform distribution):

In [None]:
np.random.rand(3,4)

Here's a 3x4 matrix containing random floats sampled from a univariate [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) (Gaussian distribution) of mean 0 and variance 1:
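
A minimal example of such a call, using the `randn` function from the same `random` module (the variable name `samples` is only illustrative, and the values differ on every run):

```python
import numpy as np

samples = np.random.randn(3, 4)  # standard normal: mean 0, variance 1
print(samples)
```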

## `np.fromfunction`
You can also initialize an `ndarray` using a function:

In [None]:
def my_function(z, y, x):
    return x + 10 * y + 100 * z

np.fromfunction(my_function, (3, 2, 10))

NumPy first creates three `ndarrays` (one per dimension), each of shape `(3, 2, 10)`. Each array has values equal to the coordinate along a specific axis. For example, all elements in the `z` array are equal to their z-coordinate:

    [[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
      [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
    
     [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
      [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
    
     [[ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
      [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]]]

So the terms `x`, `y` and `z` in the expression `x + 10 * y + 100 * z` above are in fact `ndarray`s (we will discuss arithmetic operations on arrays below).  The point is that the function `my_function` is only called *once*, instead of once per element. This makes initialization very efficient.
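
The single vectorized call can be imitated by hand. The sketch below assumes `np.indices` as a stand-in for the coordinate arrays (it returns integers, whereas `fromfunction` uses floats by default) and checks that both routes give the same values.

```python
import numpy as np

def my_function(z, y, x):
    return x + 10 * y + 100 * z

shape = (3, 2, 10)
z, y, x = np.indices(shape)      # one coordinate array per axis
manual = my_function(z, y, x)    # a single call on whole arrays
auto = np.fromfunction(my_function, shape)

print(z.shape, y.shape, x.shape)
print(np.array_equal(manual, auto))
```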

# Array data
## `dtype`
NumPy's `ndarray`s are also efficient in part because all their elements must have the same type (usually numbers).
You can check what the data type is by looking at the `dtype` attribute:

In [None]:
c = np.arange(1, 5)
print(c.dtype, c)

In [None]:
c = np.arange(1.0, 5.0)
print(c.dtype, c)

Instead of letting NumPy guess what data type to use, you can set it explicitly when creating an array by setting the `dtype` parameter:

In [None]:
d = np.arange(1, 5, dtype=np.complex64)
print(d.dtype, d)

Available data types include `int8`, `int16`, `int32`, `int64`, `uint8`|`16`|`32`|`64`, `float16`|`32`|`64` and `complex64`|`128`. Check out [the documentation](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html) for the full list.

## `itemsize`
The `itemsize` attribute returns the size (in bytes) of each item:

In [None]:
e = np.arange(1, 5, dtype=np.complex64)
e.itemsize
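
As a small illustration, the item size depends only on the data type, and the total buffer size reported by the `nbytes` attribute is `size * itemsize`. The choice of data types below is arbitrary.

```python
import numpy as np

for dtype in (np.int8, np.int32, np.float64, np.complex64):
    arr = np.arange(1, 5, dtype=dtype)
    # nbytes is the total buffer size: size * itemsize
    print(arr.dtype, arr.itemsize, arr.nbytes)
```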

# Reshaping an array
## In place
Changing the shape of an `ndarray` is as simple as setting its `shape` attribute. However, the array's size must remain the same.

In [None]:
g = np.arange(24)
print(g)
print("Rank:", g.ndim)

In [None]:
g.shape = (6, 4)
print(g)
print("Rank:", g.ndim)

In [None]:
g.shape = (2, 3, 4)
print(g)
print("Rank:", g.ndim)

## `reshape`
The `reshape` function returns a new `ndarray` object pointing at the *same* data whenever the memory layout allows it (otherwise it returns a copy). When the data is shared, modifying one array will also modify the other.

In [None]:
g2 = g.reshape(4,6)
print(g2)
print("Rank:", g2.ndim)

Set item at row 1, col 2 to 999 (more about indexing below).

In [None]:
g2[1, 2] = 999
g2

The corresponding element in `g` has been modified.

In [None]:
g
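
Whether two arrays share data can be checked with `np.shares_memory`. The sketch below uses illustrative names (`x`, `view`, `copied`) and also shows a case where `reshape` has to copy, because the transposed data is no longer laid out contiguously.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
view = x.reshape(3, 2)        # contiguous data: no copy needed
print(np.shares_memory(x, view))

copied = x.T.reshape(6)       # transposed data cannot be flattened in place
print(np.shares_memory(x, copied))
```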

## `ravel`
Finally, the `ravel` function returns a new one-dimensional `ndarray` that also points to the same data whenever possible:

In [None]:
g.ravel()
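
For comparison, the `flatten` method always returns a copy. A short sketch of the difference, with illustrative variable names:

```python
import numpy as np

g = np.arange(24).reshape(2, 3, 4)
flat_view = g.ravel()      # a view here, because g is contiguous
flat_copy = g.flatten()    # always a copy

flat_view[0] = -1          # writes through to g
flat_copy[1] = -2          # leaves g untouched
print(g[0, 0, :2])
```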

# Arithmetic operations
All the usual arithmetic operators (`+`, `-`, `*`, `/`, `//`, `**`, etc.) can be used with `ndarray`s. They apply *elementwise*:

In [None]:
a = np.array([14, 23, 32, 41])
b = np.array([5,  4,  3,  2])
print("a + b  =", a + b)
print("a - b  =", a - b)
print("a * b  =", a * b)
print("a / b  =", a / b)
print("a % b  =", a % b)


Note that the multiplication is *not* a matrix multiplication. We will discuss matrix operations below.

The arrays must have the same shape. If they do not, NumPy will apply the *broadcasting rules*.

# Broadcasting

In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called *broadcasting* rules:

## First rule
*If the arrays do not have the same rank, then a 1 will be prepended to the shapes of the lower-rank arrays until their ranks match.*

In [None]:
h = np.arange(5).reshape(1, 1, 5)
h

Now let's try to add a 1D array of shape `(5,)` to this 3D array of shape `(1,1,5)`. NumPy applies the first rule of broadcasting:

In [None]:
h + [10, 20, 30, 40, 50]  # same as: h + [[[10, 20, 30, 40, 50]]]

## Second rule
*Arrays with a 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is repeated along that dimension.*

In [None]:
k = np.arange(6).reshape(2, 3)
k

Let's try to add a 2D array of shape `(2,1)` to this 2D `ndarray` of shape `(2, 3)`. NumPy will apply the second rule of broadcasting:

In [None]:
k + [[100], [200]]  # same as: k + [[100, 100, 100], [200, 200, 200]]

Combining rules 1 & 2, we can do this:

In [None]:
k + [100, 200, 300]  # after rule 1: [[100, 200, 300]], and after rule 2: [[100, 200, 300], [100, 200, 300]]

And also, very simply:

In [None]:
k + 1000  # same as: k + [[1000, 1000, 1000], [1000, 1000, 1000]]

## Third rule
*After rules 1 & 2, the sizes of all arrays must match.*

In [None]:
try:
    k + [33, 44]
except ValueError as e:
    print(e)
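
The three rules can also be inspected without doing any arithmetic. This sketch relies on `np.broadcast` and `np.broadcast_to`; the name `row` is just for illustration.

```python
import numpy as np

k = np.arange(6).reshape(2, 3)
row = np.array([100, 200, 300])

print(np.broadcast(k, row).shape)     # shape of the broadcast result
print(np.broadcast_to(row, k.shape))  # row as it is "seen" during k + row

try:
    np.broadcast_to(np.array([33, 44]), k.shape)  # violates the third rule
except ValueError as e:
    print(e)
```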

Broadcasting rules are used in many NumPy operations, not just arithmetic operations, as we will see below.
For more details about broadcasting, check out [the documentation](https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html).

# Conditional operators

The conditional operators also apply elementwise:

In [None]:
m = np.array([20, -5, 30, 40])
m < [15, 16, 35, 36]

And using broadcasting:

In [None]:
m < 25  # equivalent to m < [25, 25, 25, 25]

This is most useful in conjunction with boolean indexing (discussed below).

In [None]:
m[m < 25]
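
Boolean arrays can be combined before indexing, and `np.where` picks between two values elementwise. A brief sketch (note the parentheses, which are needed because `&` binds tighter than the comparisons):

```python
import numpy as np

m = np.array([20, -5, 30, 40])
print(m[(m > 0) & (m < 35)])   # combine masks with & (and), | (or), ~ (not)
print(np.where(m < 25, 0, m))  # replace selected elements, keep the rest
```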

# Statistical functions

Many mathematical and statistical functions are available for `ndarray`s.

## `ndarray` methods
Some functions are simply `ndarray` methods, for example:

In [90]:
# Sample data
data = np.array([1, 2, 3, 4, 5])
data.mean()  # mean of all elements


Note that this computes the mean of all elements in the `ndarray`, regardless of its shape.

The same statistics are also available as NumPy functions. Here are a few useful ones:

In [None]:
# Mean (Average)
mean_value = np.mean(data)
print("Mean:", mean_value)

In [None]:
# Median
median_value = np.median(data)
print("Median:", median_value)

In [None]:
# Standard Deviation
std_deviation = np.std(data)
print("Standard Deviation:", std_deviation)

In [None]:
# Minimum and Maximum
min_value = np.min(data)
max_value = np.max(data)
print("Minimum Value:", min_value)
print("Maximum Value:", max_value)

These functions accept an optional `axis` argument, which asks for the operation to be performed along the given axis instead of over all elements.
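
A small sketch of the `axis` argument on a 2x3 array (the array `table` is made up for the illustration):

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(table.mean())         # all elements
print(table.mean(axis=0))   # down the rows: one value per column
print(table.sum(axis=1))    # across the columns: one value per row
```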

## Universal functions
NumPy also provides fast elementwise functions called *universal functions*, or **ufunc**. They are vectorized wrappers of simple functions. For example `square` returns a new `ndarray` which is a copy of the original `ndarray` except that each element is squared:

In [None]:
a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])
np.square(a)

Here are a few more useful unary ufuncs:

In [None]:
print("Original ndarray")
print(a)
for func in (np.abs, np.sqrt, np.exp, np.log, np.sign, np.ceil, np.modf, np.isnan, np.cos):
    print("\n", func.__name__)
    print(func(a))

## Binary ufuncs
There are also many binary ufuncs, that apply elementwise on two `ndarray`s.  Broadcasting rules are applied if the arrays do not have the same shape:

In [None]:
a = np.array([1, -2, 3, 4])
b = np.array([2, 8, -1, 7])
np.add(a, b)  # equivalent to a + b

In [None]:
np.greater(a, b)  # equivalent to a > b

In [None]:
np.maximum(a, b)

In [None]:
np.copysign(a, b)
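
Broadcasting with binary ufuncs works the same way as with operators. In this sketch the second operands (a scalar and a column) are arbitrary examples:

```python
import numpy as np

a = np.array([1, -2, 3, 4])
print(np.maximum(a, 0))          # scalar broadcast: negatives become 0
print(np.add(a, [[10], [20]]))   # shapes (4,) and (2, 1) broadcast to (2, 4)
```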

# Array indexing
## One-dimensional arrays
One-dimensional NumPy arrays can be accessed more or less like regular Python lists:

In [None]:
a = np.array([1, 5, 3, 19, 13, 7, 3])
a[3]

In [None]:
a[2:5]

In [None]:
a[2:-1]

In [None]:
a[:2]

In [None]:
a[2::2]

In [None]:
a[::-1]

Of course, you can modify elements:

In [None]:
a[3]=999
a

You can also modify an `ndarray` slice:

In [None]:
a[2:5] = [997, 998, 999]
a

## Differences from regular Python lists
Unlike regular Python lists, if you assign a single value to an `ndarray` slice, it is copied across the whole slice, thanks to the broadcasting rules discussed above.

In [None]:
a[2:5] = -1
a

Also, you cannot grow or shrink `ndarray`s this way:

In [None]:
try:
    a[2:5] = [1,2,3,4,5,6]  # too long
except ValueError as e:
    print(e)

You cannot delete elements either:

In [None]:
try:
    del a[2:5]
except ValueError as e:
    print(e)

Last but not least, `ndarray` **slices are actually *views*** on the same data buffer. This means that if you create a slice and modify it, you are actually going to modify the original `ndarray` as well!

In [None]:
a_slice = a[2:6]
a_slice[1] = 1000
a  # the original array was modified!

In [None]:
a[3] = 2000
a_slice  # similarly, modifying the original array modifies the slice!

If you want a copy of the data, you need to use the `copy` method:

In [None]:
another_slice = a[2:6].copy()
another_slice[1] = 3000
a  # the original array is untouched

In [None]:
a[3] = 4000
another_slice  # similarly, modifying the original array does not affect the slice copy
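
One way to tell a view from a copy, sketched with illustrative names, is the `base` attribute together with `np.shares_memory`:

```python
import numpy as np

a = np.array([1, 5, 3, 19, 13, 7, 3])
a_view = a[2:6]
a_copy = a[2:6].copy()

print(a_view.base is a)              # a view keeps a reference to its source
print(a_copy.base is None)           # a copy owns its data
print(np.shares_memory(a, a_copy))
```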

## Multi-dimensional arrays
Multi-dimensional arrays can be accessed in a similar way by providing an index or slice for each axis, separated by commas:

In [None]:
b = np.arange(48).reshape(4, 12)
b

In [None]:
b[1, 2]  # row 1, col 2

In [None]:
b[1, :]  # row 1, all columns

In [None]:
b[:, 1]  # all rows, column 1

**Caution**: note the subtle difference between these two expressions: 

In [None]:
b[1, :]

In [None]:
b[1:2, :]

The first expression returns row 1 as a 1D array of shape `(12,)`, while the second returns that same row as a 2D array of shape `(1, 12)`.
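
Printing the shapes makes the difference explicit. The column case in the last line is added here for symmetry.

```python
import numpy as np

b = np.arange(48).reshape(4, 12)
print(b[1, :].shape)     # integer index: the axis is dropped
print(b[1:2, :].shape)   # slice: the axis is kept with length 1
print(b[:, 1:2].shape)   # same idea for a column
```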

# Stacking arrays
It is often useful to stack together different arrays. NumPy offers several functions to do just that. Let's start by creating a few arrays.

In [None]:
q1 = np.full((3,4), 1.0)
q1

In [None]:
q2 = np.full((4,4), 2.0)
q2

In [None]:
q3 = np.full((3,4), 3.0)
q3

## `vstack`
Now let's stack them vertically using `vstack`:

In [None]:
q4 = np.vstack((q1, q2, q3))
q4

In [None]:
q4.shape

This was possible because q1, q2 and q3 all have the same shape (except for the vertical axis, but that's ok since we are stacking on that axis).

## `hstack`
We can also stack arrays horizontally using `hstack`:

In [None]:
q5 = np.hstack((q1, q3))
q5

In [None]:
q5.shape

This is possible because q1 and q3 both have 3 rows. But since q2 has 4 rows, it cannot be stacked horizontally with q1 and q3:

In [None]:
try:
    q5 = np.hstack((q1, q2, q3))
except ValueError as e:
    print(e)

## `concatenate`
The `concatenate` function stacks arrays along any given existing axis.

In [None]:
q7 = np.concatenate((q1, q2, q3), axis=0)  # Equivalent to vstack
q7

In [None]:
q7.shape

As you might guess, `hstack` is equivalent to calling `concatenate` with `axis=1`.
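
A quick check of that equivalence for the 2D arrays used here (for 1D inputs, `hstack` concatenates along the only axis instead):

```python
import numpy as np

q1 = np.full((3, 4), 1.0)
q3 = np.full((3, 4), 3.0)
side_by_side = np.concatenate((q1, q3), axis=1)
print(side_by_side.shape)
print(np.array_equal(side_by_side, np.hstack((q1, q3))))
```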

## `stack`
The `stack` function stacks arrays along a new axis. All arrays have to have the same shape.

In [None]:
q8 = np.stack((q1, q3))
q8

In [None]:
q8.shape
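
The position of the new axis can be chosen with the `axis` parameter of `stack`. A short sketch with illustrative names:

```python
import numpy as np

q1 = np.full((3, 4), 1.0)
q3 = np.full((3, 4), 3.0)
q_first = np.stack((q1, q3))           # new axis in front (the default, axis=0)
q_last = np.stack((q1, q3), axis=-1)   # new axis at the end
print(q_first.shape, q_last.shape)
```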

# Splitting arrays
Splitting is the opposite of stacking. For example, let's use the `vsplit` function to split a matrix vertically.

First let's create a 6x4 matrix:

In [None]:
r = np.arange(24).reshape(6,4)
r

Now let's split it in three equal parts, vertically:

In [None]:
r1, r2, r3 = np.vsplit(r, 3)
r1

In [None]:
r2

In [None]:
r3

There is also a `split` function which splits an array along any given axis. Calling `vsplit` is equivalent to calling `split` with `axis=0`. There is also an `hsplit` function, equivalent to calling `split` with `axis=1`:

In [None]:
r4, r5 = np.hsplit(r, 2)
r4

In [None]:
r5
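
The parts do not have to be equal. As a sketch, `split` also accepts a list of indices at which to cut, and `array_split` tolerates a number of sections that does not divide the axis length evenly (the cut points below are arbitrary):

```python
import numpy as np

r = np.arange(24).reshape(6, 4)
top, middle, bottom = np.split(r, [1, 4], axis=0)  # cut before rows 1 and 4
print(top.shape, middle.shape, bottom.shape)

parts = np.array_split(r, 4, axis=0)  # 6 rows into 4 chunks of unequal size
print([p.shape for p in parts])
```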

# Transposing arrays
The `transpose` method creates a new view on an `ndarray`'s data, with axes permuted in the given order.

For example, let's create a 3D array:

In [None]:
t = np.arange(24).reshape(4,2,3)
t

Now let's create an `ndarray` such that the axes `0, 1, 2` (depth, height, width) are re-ordered to `1, 2, 0` (depth→width, height→depth, width→height):

In [None]:
t1 = t.transpose((1,2,0))
t1

In [None]:
t1.shape

By default, `transpose` reverses the order of the dimensions:

In [None]:
t2 = t.transpose()  # equivalent to t.transpose((2, 1, 0))
t2

The `T` attribute is equivalent to calling `transpose()` when the rank is ≥2:

In [None]:
m1 = np.arange(10).reshape(2,5)
m1

In [None]:
m1.T

The `T` attribute has no effect on rank 0 (scalar) or rank 1 arrays:

In [None]:
m2 = np.arange(5)
m2

In [None]:
m2.T

We can get the desired transposition by first reshaping the 1D array to a single-row matrix (2D):

In [None]:
m2r = m2.reshape(1,5)
m2r

In [None]:
m2r.T
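
If a column is the goal, it can also be built directly. Two common idioms, shown as a sketch with illustrative names:

```python
import numpy as np

m2 = np.arange(5)
col_a = m2.reshape(-1, 1)    # -1 lets NumPy infer the number of rows
col_b = m2[:, np.newaxis]    # insert a new axis of length 1
print(col_a.shape, col_b.shape)
```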

## Matrix multiplication
Let's create two matrices and execute a [matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication) using the `dot()` method.

In [None]:
n1 = np.arange(10).reshape(2, 5)
n1

In [None]:
n2 = np.arange(15).reshape(5,3)
n2

In [None]:
n1.dot(n2)

**Caution**: as mentioned previously, `n1*n2` is *not* a matrix multiplication, it is an elementwise product (also called a [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices))).
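
The `@` operator (Python 3.5+) is an alternative spelling of the matrix product. With these particular shapes, the sketch below also shows that `n1 * n2` does not even broadcast:

```python
import numpy as np

n1 = np.arange(10).reshape(2, 5)
n2 = np.arange(15).reshape(5, 3)

print(n1 @ n2)                                # matrix product, shape (2, 3)
print(np.array_equal(n1 @ n2, n1.dot(n2)))

try:
    n1 * n2   # elementwise: shapes (2, 5) and (5, 3) are incompatible
except ValueError as e:
    print(e)
```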

## Matrix inverse 
Many of the linear algebra functions are available in the `numpy.linalg` module, in particular the `inv` function to compute a square matrix's inverse:

In [None]:
import numpy.linalg as linalg

m3 = np.array([[1,2,3],[5,7,11],[21,29,31]])
m3

In [None]:
linalg.inv(m3)

## Identity matrix
The product of a matrix by its inverse returns the identity matrix (with small floating point errors):

In [None]:
m3.dot(linalg.inv(m3))

You can create an identity matrix of size NxN by calling `eye(N)`.
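
For instance, a 3x3 identity matrix, compared against the product computed above. `np.allclose` is used here because an exact comparison would fail on the rounding errors.

```python
import numpy as np

m3 = np.array([[1, 2, 3], [5, 7, 11], [21, 29, 31]])
identity = np.eye(3)
print(identity)

# allclose tolerates the small floating-point errors in the product
print(np.allclose(m3.dot(np.linalg.inv(m3)), identity))
```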

## Determinant
The `det` function computes the [matrix determinant](https://en.wikipedia.org/wiki/Determinant):

In [None]:
linalg.det(m3)  # Computes the matrix determinant
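
Expanding the determinant of `m3` by hand gives 44, so the computed value should be close to that. The determinant also indicates whether an inverse exists: in the sketch below, the matrix `singular` is a made-up example with linearly dependent rows.

```python
import numpy as np

m3 = np.array([[1, 2, 3], [5, 7, 11], [21, 29, 31]])
print(np.isclose(np.linalg.det(m3), 44.0))

singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row = 2 * first row
print(np.linalg.det(singular))                 # zero determinant

try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as e:
    print(e)
```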