# 1. Introduction to NumPy Arrays

Arrays are the main data structure used in machine learning. In Python, arrays from the NumPy
library, called N-dimensional arrays or ndarrays, are used as the primary data structure for
representing data. In this tutorial, you will discover the N-dimensional array in NumPy for
representing and manipulating numerical data in Python.

## 1.1 NumPy N-dimensional Array

The main data structure in NumPy is the ndarray,
which is a shorthand name for N-dimensional array. 

When working with NumPy, data in an
ndarray is simply referred to as an array. It is a fixed-sized array in memory that contains data
of the same type, such as integers or floating point values.
The data type supported by an array can be accessed via the dtype attribute on the array.

The dimensions of an array can be accessed via the shape attribute, which returns a tuple
describing the length of each dimension. There are a host of other attributes. A simple way
to create an array from data or simple Python data structures like a list is to use the array()
function. The example below creates a Python list of 3 floating point values, then creates an
ndarray from the list and accesses the array's shape and data type.

In [1]:
from numpy import array

# list of data
l = [1.0, 2.0, 3.0]
# create array
a = array(l)
print(a)
# shape of the array
print(a.shape)
# data type of the array
print(a.dtype)

[1. 2. 3.]
(3,)
float64
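
Among the other attributes mentioned above are ndim, size, and itemsize. The following is a minimal sketch, reusing the same three values; the variable name is illustrative only.

```python
from numpy import array

# same three floating point values as in the example above
a = array([1.0, 2.0, 3.0])
# number of dimensions (axes)
print(a.ndim)
# total number of elements
print(a.size)
# bytes used by each element (8 for float64)
print(a.itemsize)
```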


## 1.2 Functions to create Arrays

#### 1.2.1 Empty

The empty() function will create a new array of the specified shape. The argument to the
function is a list or tuple that specifies the length of each dimension of the array to create.
The contents of the created array are uninitialized (arbitrary values left over in memory) and
need to be assigned before use. The example below creates an empty 3 × 3 two-dimensional array.

In [2]:
# create empty array 
from numpy import empty
a = empty([3,3])
print(a)

[[4.65671366e-310 0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000 0.00000000e+000]]
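
Because the contents returned by empty() should not be relied on, a value is normally assigned to every element before the array is read. A minimal sketch, assuming a fill value of 0.5 chosen purely for illustration:

```python
from numpy import empty

# allocate a 3 x 3 array; its initial contents are arbitrary
a = empty((3, 3))
# assign every element before use (0.5 is an arbitrary illustrative value)
a.fill(0.5)
print(a)
```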


#### 1.2.2 Zeros

The zeros() function will create a new array of the specified size with the contents filled with
zero values. The argument to the function is a list or tuple that specifies the length of each
dimension of the array to create. The example below creates a 3 × 5 two-dimensional array of zeros.

In [3]:
# create zero array
from numpy import zeros
a = zeros([3,5])
print(a)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


#### 1.2.3 Ones


The ones() function will create a new array of the specified size with the contents filled with
one values. The argument to the function is a list or tuple that specifies the length of each
dimension of the array to create. The example below creates a 5-element one-dimensional array.

In [4]:
# create one array
from numpy import ones
a = ones([5])
print(a)

[1. 1. 1. 1. 1.]
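
The shape argument of zeros() and ones() can also be a plain integer or a tuple, and an optional dtype argument selects the element type. The sketch below is illustrative only; the variable names are not from the original text.

```python
from numpy import ones, zeros

# a plain integer gives a one-dimensional array
a = ones(5)
print(a.shape)
# a tuple works the same way as a list for the shape
b = zeros((3, 5))
print(b.shape)
# dtype is optional; without it the elements are float64
c = ones((2, 2), dtype=int)
print(c.dtype.kind)
```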


## 1.3 Combining arrays

#### 1.3.1 Vertical stack

Given two or more existing arrays, you can stack them vertically using the vstack() function.
For example, given two one-dimensional arrays, you can create a new two-dimensional array
with two rows by vertically stacking them. 

In [5]:
from numpy import array
from numpy import vstack

a1 = array([1,2,3])
print(a1)

a2 = array([4,5,6])
print(a2)

a3 = vstack((a1, a2))

print(a3)
print(a3.shape)

[1 2 3]
[4 5 6]
[[1 2 3]
 [4 5 6]]
(2, 3)


#### 1.3.2 Horizontal stack

Given two or more existing arrays, you can stack them horizontally using the hstack() function.
For example, given two one-dimensional arrays, you can create a new one-dimensional array, or
one row, with the columns of the first and second arrays concatenated.

In [6]:
from numpy import array
from numpy import hstack

a1 = array([1,2,3])
print(a1)

a2 = array([4,5,6])
print(a2)

a3 = hstack((a1, a2))

print(a3)
print(a3.shape)

[1 2 3]
[4 5 6]
[1 2 3 4 5 6]
(6,)


#### 1.3.3 Extensions 

More array creation functions are listed at https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.array-creation.html

In [7]:
a4 = vstack((a3, a3))
print(a4)

a5 = hstack((a4, a4))
print(a5)

a6 = hstack((a3, a3, a3, a3))
print(a6)

[[1 2 3 4 5 6]
 [1 2 3 4 5 6]]
[[1 2 3 4 5 6 1 2 3 4 5 6]
 [1 2 3 4 5 6 1 2 3 4 5 6]]
[1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6]
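
For two-dimensional inputs, the same stacking can be expressed with NumPy's concatenate() function and an explicit axis. This function is not used elsewhere in this tutorial, so treat the following as a sketch with illustrative data rather than part of the original examples.

```python
from numpy import array, array_equal, concatenate, hstack, vstack

a = array([[1, 2, 3], [4, 5, 6]])
b = array([[7, 8, 9], [10, 11, 12]])
# joining along axis 0 appends rows, like vstack
rows = concatenate((a, b), axis=0)
print(rows.shape)
# joining along axis 1 appends columns, like hstack
cols = concatenate((a, b), axis=1)
print(cols.shape)
# both results match the stacking functions
print(array_equal(rows, vstack((a, b))), array_equal(cols, hstack((a, b))))
```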


# 2. Index, Slice and Reshape NumPy Arrays

## 2.1 From List to Arrays

#### 2.1.1 One-Dimensional List to Array

In [8]:
# create one-dimensional array
from numpy import array
# list of data
data = [11, 22, 33, 44, 55]
# array of data
data = array(data)
print(data)
print(type(data))

[11 22 33 44 55]
<class 'numpy.ndarray'>


#### 2.1.2 Two-Dimensional List of Lists to Array

Two-dimensional data is a table of data where each row represents a new observation and
each column a new feature.

In [9]:
# create two-dimensional array
from numpy import array
# list of data
data = [[11, 22],
[33, 44],
[55, 66]]
# array of data
data = array(data)
print(data)
print(type(data))

[[11 22]
 [33 44]
 [55 66]]
<class 'numpy.ndarray'>


## 2.2 Array Indexing

#### 2.2.1 One-Dimensional Indexing

In [10]:
# index a one-dimensional array
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
# index data
print(data[0])
print(data[4])

11
55


In [11]:
# negative array indexing
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
# index data
print(data[-1])
print(data[-5])

55
11


In [12]:
# index row of two-dimensional array
from numpy import array
# define array
data = array([
[11, 22],
[33, 44],
[55, 66]])
# index data
print(data[0,])

[11 22]
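
Individual elements of a two-dimensional array can be reached by giving a row index and a column index separated by a comma. A minimal sketch on the same data:

```python
from numpy import array

data = array([
    [11, 22],
    [33, 44],
    [55, 66]])
# row index first, then column index
print(data[1, 0])
# negative indexes count back from the end in each dimension
print(data[-1, -1])
# all rows of the second column
print(data[:, 1])
```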


## 2.3 Array Slicing

#### 2.3.1 One-Dimensional Slicing

In [13]:
# slice a one-dimensional array
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
print(data[:])

[11 22 33 44 55]


In [14]:
# slice a subset of a one-dimensional array
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
print(data[0:1])

[11]
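
A slice runs from the first index up to, but not including, the second index, and negative indexes can be used in slices as well. A short sketch on the same array:

```python
from numpy import array

data = array([11, 22, 33, 44, 55])
# the end index is excluded, so this returns the items at positions 1 and 2
print(data[1:3])
# a negative start index selects the last two items
print(data[-2:])
```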


#### 2.3.2 Two-Dimensional Slicing

In [15]:
# split input and output data
from numpy import array
# define array
data = array([
[11, 22, 33],
[44, 55, 66],
[77, 88, 99]])
# separate data
X, y = data[:, :-1], data[:, -1]
print(X)
print(y)

[[11 22]
 [44 55]
 [77 88]]
[33 66 99]


It is common to split a loaded dataset into separate train and test sets. This is a splitting of
rows where some portion will be used to train the model and the remaining portion will be used
to estimate the skill of the trained model. This would involve slicing all columns by specifying :
in the second dimension index. The training dataset would be all rows from the beginning to
the split point, and the test dataset would be all rows from the split point to the end.

In [16]:
# split train and test data
from numpy import array
# define array
data = array([
[11, 22, 33],
[44, 55, 66],
[77, 88, 99]])
# separate data
split = 2
train,test = data[:split,:],data[split:,:]
print(train)
print(test)

[[11 22 33]
 [44 55 66]]
[[77 88 99]]
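
In practice the split point is often derived from the number of rows rather than hard-coded. The sketch below assumes a 75% training fraction and an extra fourth row of made-up data, both purely for illustration.

```python
from numpy import array

data = array([
    [11, 22, 33],
    [44, 55, 66],
    [77, 88, 99],
    [12, 23, 34]])
# assumption for illustration: the first 75% of rows are used for training
train_fraction = 0.75
split = int(data.shape[0] * train_fraction)
train, test = data[:split, :], data[split:, :]
print(train.shape)
print(test.shape)
```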


## 2.4 Array Reshaping

After slicing your data, you may need to reshape it. For example, some libraries, such as
scikit-learn, may require that a one-dimensional array of output variables (y) be shaped as a
two-dimensional array with one column and one outcome in each row. Some algorithms, like
the Long Short-Term Memory recurrent neural network in Keras, require input to be specified
as a three-dimensional array comprised of samples, timesteps, and features. It is important to
know how to reshape your NumPy arrays so that your data meets the expectation of specific
Python libraries. We will look at these two examples.

#### 2.4.1 Data shape

NumPy arrays have a shape attribute that returns a tuple of the length of each dimension of
the array. 

In [17]:
# shape of one-dimensional array
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
print(data.shape)

(5,)


In [18]:
# shape of a two-dimensional array
from numpy import array
# list of data
data = [[11, 22],
[33, 44],
[55, 66]]
# array of data
data = array(data)
print(data.shape)

(3, 2)


You can use the sizes of your array's dimensions from the shape attribute elsewhere in your code,
for example when specifying parameters. The elements of the tuple can be accessed just like an
array, with the 0th index for the number of rows and the 1st index for the number of columns.

In [19]:
# row and column shape of two-dimensional array
from numpy import array
# list of data
data = [[11, 22],
    [33, 44],
    [55, 66]]
# array of data
data = array(data)
print('Rows: %d' % data.shape[0])
print('Cols: %d' % data.shape[1])

Rows: 3
Cols: 2


#### 2.4.2 Reshape 1D to 2D array

It is common to need to reshape a one-dimensional array into a two-dimensional array with
one column and multiple rows. NumPy provides the reshape() function on the NumPy array
object that can be used to reshape the data. The reshape() function takes a single argument
that specifies the new shape of the array. In the case of reshaping a one-dimensional array into
a two-dimensional array with one column, the tuple would be the length of the array as the first
dimension (data.shape[0]) and 1 for the second dimension.

In [20]:
# reshape 1D array to 2D
from numpy import array
# define array
data = array([11, 22, 33, 44, 55])
print(data.shape)
# reshape
data = data.reshape((data.shape[0], 1))
print(data.shape)
print(data)

(5,)
(5, 1)
[[11]
 [22]
 [33]
 [44]
 [55]]
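
As an alternative to spelling out data.shape[0], reshape() accepts -1 for one dimension and infers its length from the size of the array. A minimal sketch; the names column and flat are illustrative only.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])
# -1 asks NumPy to infer the number of rows from the array's size
column = data.reshape((-1, 1))
print(column.shape)
# reshaping back to one dimension recovers the original layout
flat = column.reshape(-1)
print(flat.shape)
```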


#### 2.4.3 Reshape 2D to 3D array

It is common to need to reshape two-dimensional data where each row represents a sequence
into a three-dimensional array for algorithms that expect multiple samples of one or more time
steps and one or more features. A good example is the LSTM recurrent neural network model
in the Keras deep learning library. The reshape function can be used directly, specifying the
new dimensionality. This is clear with an example where each sequence has multiple time steps
with one observation (feature) at each time step. We can use the sizes in the shape attribute on
the array to specify the number of samples (rows) and columns (time steps) and fix the number
of features at 1.

In [21]:
# reshape 2D array to 3D
from numpy import array
# list of data
data = [[11, 22],
[33, 44],
[55, 66]]
# array of data
data = array(data)
print(data.shape)
# reshape
data = data.reshape((data.shape[0], data.shape[1], 1))
print(data.shape)

(3, 2)
(3, 2, 1)
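
To see what the reshaped array contains, it can help to inspect a single sample. The sketch below repeats the reshape with illustrative variable names and checks that the values are unchanged.

```python
from numpy import array

data = array([
    [11, 22],
    [33, 44],
    [55, 66]])
samples, timesteps = data.shape
reshaped = data.reshape((samples, timesteps, 1))
# first sample: two time steps with one feature each
print(reshaped[0])
# selecting the single feature gives back the original table
print(reshaped[:, :, 0])
```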


# 3. NumPy Array Broadcasting

## 3.1 Limitation with Array Arithmetic

Two arrays can be added together to create a new array where the values at each
index are added together, provided the arrays have the same shape; without broadcasting,
arrays of different sizes cannot be combined this way. For example, an array a can be defined
as [1, 2, 3] and array b can be defined as [1, 2, 3], and adding them together will result in a
new array with the values [2, 4, 6].

- a = [1, 2, 3]
- b = [1, 2, 3]
- c = a + b
- c = [1 + 1, 2 + 2, 3 + 3]
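
The addition above can be sketched directly in NumPy:

```python
from numpy import array

a = array([1, 2, 3])
b = array([1, 2, 3])
# element-wise addition of two arrays with the same shape
c = a + b
print(c)
```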

## 3.2 Array Broadcasting

Broadcasting is the name given to the method that NumPy uses to allow array arithmetic
between arrays with a different shape or size. Although the technique was developed for NumPy,
it has also been adopted more broadly in other numerical computational libraries, such as
Theano, TensorFlow, and Octave. Broadcasting solves the problem of arithmetic between arrays
of differing shapes by in effect replicating the smaller array along the last mismatched dimension.

In the context of deep learning, we also use some less conventional notation. We
allow the addition of matrix and a vector, yielding another matrix: $C = A + b$, where
$C_{i,j} = A_{i,j} + b_j$. In other words, the vector b is added to each row of the matrix.
This shorthand eliminates the need to define a matrix with b copied into each row
before doing the addition. This implicit copying of b to many locations is called
broadcasting.
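
The implicit copying described above can be compared with an explicit copy. The sketch below uses NumPy's tile() function to build the matrix with b in every row; tile() and the sample values are illustrative choices, not part of the original text.

```python
from numpy import array, array_equal, tile

A = array([
    [1, 2, 3],
    [4, 5, 6]])
b = array([10, 20, 30])
# explicit version: copy b into each row, then add
B = tile(b, (A.shape[0], 1))
explicit = A + B
# shorthand: let NumPy broadcast b across the rows
implicit = A + b
print(array_equal(explicit, implicit))
```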

## 3.3 Broadcasting in NumPy

#### 3.3.1 Scalar and One-Dimensional Array

A single value or scalar can be used in arithmetic with a one-dimensional array. For example,
we can imagine a one-dimensional array a with three values [a1, a2, a3] added to a scalar b.

```
a = [a1, a2, a3]
b
```

The scalar will need to be broadcast across the one-dimensional array by duplicating its
value 2 more times.

```
b = [b1, b2, b3]
```

The two one-dimensional arrays can then be added directly.

```
c = a + b
c = [a1 + b1, a2 + b2, a3 + b3]
```

In [22]:
# broadcast scalar to one-dimensional array
from numpy import array
# define array
a = array([1, 2, 3])
print(a)
# define scalar
b = 2
print(b)
# broadcast
c = a + b
print(c)

[1 2 3]
2
[3 4 5]


#### 3.3.2 Scalar and Two-Dimensional Array

A scalar value can be used in arithmetic with a two-dimensional array. For example, we can imagine a two-dimensional array $A$ with 2 rows and 3 columns added to the scalar $b$.

$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,1} & a_{2,2} & a_{2,3}
\end{pmatrix}
$$

The scalar will need to be broadcast across each row of the two-dimensional array by duplicating it 5 more times.

$$
B = \begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3}
\end{pmatrix}
$$

The two two-dimensional arrays can then be added directly.

$$
C = A + B 
$$

$$
C = \begin{pmatrix}
a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & a_{1,3} + b_{1,3} \\
a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} & a_{2,3} + b_{2,3}
\end{pmatrix}
$$

In [23]:
# broadcast scalar to two-dimensional array
from numpy import array
# define array
A = array([
[1, 2, 3],
[1, 2, 3]])
print(A)
# define scalar
b = 2
print(b)
# broadcast
C = A + b
print(C)

[[1 2 3]
 [1 2 3]]
2
[[3 4 5]
 [3 4 5]]


#### 3.3.3 One-Dimensional and Two-Dimensional Arrays

A one-dimensional array can be used in arithmetic with a two-dimensional array. For example, we can imagine a two-dimensional array $A$ with 2 rows and 3 columns added to a one-dimensional array $b$ with 3 values.

$$
A = \begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,1} & a_{2,2} & a_{2,3}
\end{pmatrix}
$$

$$
b = \begin{pmatrix}
b_1 & b_2 & b_3
\end{pmatrix}
$$

The one-dimensional array is broadcast across each row of the two-dimensional array by creating a second copy to result in a new two-dimensional array $B$.

$$
B = \begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3}
\end{pmatrix}
$$

The two two-dimensional arrays can then be added directly.

$$
C = A + B
$$

$$
C = \begin{pmatrix}
a_{1,1} + b_{1,1} & a_{1,2} + b_{1,2} & a_{1,3} + b_{1,3} \\
a_{2,1} + b_{2,1} & a_{2,2} + b_{2,2} & a_{2,3} + b_{2,3}
\end{pmatrix}
$$

In [24]:
# broadcast one-dimensional array to two-dimensional array
from numpy import array
# define two-dimensional array
A = array([
[1, 2, 3],
[1, 2, 3]])
print(A)
# define one-dimensional array
b = array([1, 2, 3])
print(b)
# broadcast
C = A + b
print(C)

[[1 2 3]
 [1 2 3]]
[1 2 3]
[[2 4 6]
 [2 4 6]]


## 3.4 Limitations of Broadcasting

Broadcasting is a handy shortcut that proves very useful in practice when working with NumPy
arrays. That being said, it does not work for all cases, and in fact imposes a strict rule that
must be satisfied for broadcasting to be performed. Arithmetic, including broadcasting, can
only be performed when the size of each dimension in the arrays is equal or one of them has a
dimension size of 1. The dimensions are considered in reverse order, starting with the trailing
dimension; for example, looking at columns before rows in a two-dimensional case.

This makes more sense when we consider that NumPy will in effect pad missing dimensions
with a size of 1 when comparing arrays. Therefore, the comparison between a two-dimensional
array A with 2 rows and 3 columns and a vector b with 3 elements:

![image.png](attachment:image.png)
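
The rule described above can be sketched as a small check on two shape tuples. The function can_broadcast is a hypothetical helper written for illustration; it is not part of NumPy.

```python
def can_broadcast(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rule to two shape tuples."""
    # pad the shorter shape on the left with 1s
    ndim = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # every pair of sizes must be equal, or one of them must be 1
    return all(x == y or x == 1 or y == 1 for x, y in zip(padded_a, padded_b))

# (2, 3) against (3,) is compared as (2, 3) against (1, 3)
print(can_broadcast((2, 3), (3,)))
# (2, 3) against (2,) is compared as (2, 3) against (1, 2)
print(can_broadcast((2, 3), (2,)))
```

The first call prints True and the second prints False, since 3 and 2 are neither equal nor 1.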

In [25]:
# broadcasting error
from numpy import array
# define two-dimensional array
A = array([
[1, 2, 3],
[1, 2, 3]])
print(A.shape)
# define one-dimensional array
b = array([1, 2])
print(b.shape)
# attempt broadcast
C = A + b
print(C)

(2, 3)
(2,)


ValueError: operands could not be broadcast together with shapes (2,3) (2,) 
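
The error arises because the trailing dimensions, 3 and 2, are neither equal nor 1. One possible way to make the shapes compatible, assuming the intent is to add one value of b to each row of A, is to reshape b into a column. The sketch below catches the error and then applies that reshape.

```python
from numpy import array

A = array([
    [1, 2, 3],
    [1, 2, 3]])
b = array([1, 2])
try:
    C = A + b
except ValueError as error:
    # trailing dimensions are 3 and 2: not equal and neither is 1
    print('Broadcast failed:', error)
# a column of shape (2, 1) is compatible with shape (2, 3)
C = A + b.reshape((2, 1))
print(C)
```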