# 1 Index, Slice and Reshape NumPy Arrays

In machine learning, data is commonly represented as NumPy arrays. In this tutorial, we will see how to access and manipulate our data using NumPy arrays.

## 1.1 From List to Arrays

### 1.1.1 One Dimensional List to Array

We can convert a one-dimensional list of data to an array by calling the NumPy **array()** function.

In [1]:
# create one-dimensional array

from numpy import array

In [2]:
# list of data

data = [11,22,33,44]

In [3]:
# array of data

data = array(data)

In [4]:
print(data)
print(type(data))

[11 22 33 44]
<class 'numpy.ndarray'>
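
As a quick check of what the conversion produces, the sketch below inspects the resulting array and converts it back to a list. The names `values` and `arr` are illustrative and not part of the cells above.

```python
from numpy import array

# a plain Python list (illustrative values)
values = [11, 22, 33, 44]

# convert the list to a NumPy array
arr = array(values)

print(len(arr))      # 4
print(arr.dtype)     # element type inferred from the list (platform dependent)
print(arr.tolist())  # [11, 22, 33, 44]
```

The element type is inferred from the list contents, so a list of integers yields an integer array.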


### 1.1.2 Two-Dimensional List of Lists to Array

In machine learning, we most often work with two-dimensional data, i.e. data where each row is a new sample or observation and each column is an attribute or feature. In a list of lists, each inner list represents one observation. We can convert it to a NumPy array following the same procedure as above:

In [5]:
# create two-dimensional array
# list of data

data = [[11,22],[33,44]]

In [6]:
# array of data

data = array(data)

In [7]:
print(data)
print(type(data))

[[11 22]
 [33 44]]
<class 'numpy.ndarray'>
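
To see the row-per-observation layout, here is a small sketch with three observations of two features each. The names `rows` and `table` are illustrative.

```python
from numpy import array

# three observations (inner lists), two features each; values are illustrative
rows = [[11, 22], [33, 44], [55, 66]]
table = array(rows)

print(table.ndim)  # 2
print(len(table))  # 3
print(table.size)  # 6
```

Here `len()` counts the observations along the first dimension, while `size` counts every individual value.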


## 1.2 Array Indexing

Once your data is represented as a NumPy array, it's time to access it. Accessing individual elements is known as indexing.

### 1.2.1 One-Dimensional Indexing

Indexing works much like it does in other programming languages. We can access elements using the bracket **[ ]** operator, where the index starts at zero. Negative indexes count from the end of the array, so **-1** refers to the last item.

In [8]:
# index a one-dimensional array
# define array

data = array([11,22,33,44,55])

In [9]:
# index data

print(data[-1])
print(data[-5])

55
11
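
The sketch below lines up positive and negative indexes on the same array to show that they reach the same items. The names `first` and `last` are illustrative.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])

# positive indexes count from the start, beginning at 0
first = data[0]
last = data[4]

# negative indexes count from the end, beginning at -1
print(data[-1] == last)                 # True
print(data[-5] == first)                # True
print(data[-1] == data[len(data) - 1])  # True
```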


If we specify an index beyond the bounds of the array, an **IndexError** is raised.

In [10]:
data = array([[11,22],[33,44],[55,66]])

In [11]:
print(data[5])

IndexError: index 5 is out of bounds for axis 0 with size 3
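
If an out-of-bounds index is possible at run time, the error can be caught with a standard `try`/`except` block. This is a minimal sketch; the variable `message` is only there for illustration.

```python
from numpy import array

data = array([[11, 22], [33, 44], [55, 66]])

message = None
try:
    print(data[5])
except IndexError as error:
    # the first axis has only 3 rows, so index 5 is invalid
    message = str(error)
    print("IndexError:", message)
```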

### 1.2.2 Two-Dimensional Indexing

It's similar to one-dimensional indexing of data, except that a *comma* is used to separate the index of each dimension.

In [None]:
# index two-dimensional array
# define array

data = array([
    [11, 22],
    [33, 44],
    [55, 66]
])

# index data
print(data[0, 0])

If we are interested in all items in the first row, we can leave the second index empty:

In [12]:
print(data[0, ])

[11 22]
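
The same comma-separated form reaches single items, whole rows and whole columns. The sketch below uses an explicit colon to mean "everything along this dimension"; the variable names are illustrative.

```python
from numpy import array

data = array([
    [11, 22],
    [33, 44],
    [55, 66]
])

# single item: row 2, column 1
item = data[2, 1]
print(item)  # 66

# whole first row, with an explicit colon for the column index
first_row = data[0, :]
print(first_row)  # [11 22]

# whole first column
first_column = data[:, 0]
print(first_column)  # [11 33 55]
```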


## 1.3 Array Slicing

Structures like lists and NumPy arrays can be sliced. What does slicing mean?

It means indexing and retrieving a subsequence of the structure, i.e. of the list or NumPy array. It's pretty handy in machine learning: when splitting a dataset into training and testing sets, slicing lets us select the rows and columns we need.

A slice is specified using the colon **:** operator with a *from* and a *to* index, written as **from:to**. The slice starts at the *from* index and ends one item before the *to* index.

### 1.3.1 One-Dimensional Slicing

All data can be accessed using just the colon operator without mentioning any indices:

In [13]:
# define array

data = array([11,22,33,44,55])

In [14]:
print(data[:])

[11 22 33 44 55]


We can also use negative indexes in slices. The following cell says, *hey can you give all items from the second last element until the end?*

In [15]:
print(data[-2:])

[44 55]
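
Slices with explicit bounds follow the same rule: the start index is included and the end index is excluded. A short sketch (the name `middle` is illustrative):

```python
from numpy import array

data = array([11, 22, 33, 44, 55])

# items at indexes 1, 2 and 3; index 4 is excluded
middle = data[1:4]
print(middle)  # [22 33 44]

# omitting the start begins at the first item
print(data[:2])  # [11 22]

# omitting the end runs through the last item
print(data[3:])  # [44 55]
```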


### 1.3.2 Two-Dimensional Slicing

In machine learning, it's common to split your loaded data into input variables ($X$) and an output variable ($y$). This can be accomplished by slicing all rows and all columns up to, but not including, the last column for the input variables, and separately indexing the last column for the output variable.

In [16]:
# split input and output data
# define array

data = array([[11,22],[33,44],[55,66]])

In [17]:
# For input variable i.e. X specify : in rows and :-1 in columns
# For output variables i.e. y specify : in rows and -1 in columns

X,y = data[:,:-1], data[:,-1]

In [18]:
print(X)

[[11]
 [33]
 [55]]


In [19]:
print(y)

[22 44 66]
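
Rows can be split in the same way, for example into training and testing sets. This is a minimal sketch, assuming a hand-picked split point; the names `split`, `train` and `test` are hypothetical.

```python
from numpy import array

data = array([[11, 22], [33, 44], [55, 66]])

# hypothetical split point: first 2 rows for training, the rest for testing
split = 2
train, test = data[:split, :], data[split:, :]

print(train)
print(test)
```

Because the end of a slice is excluded, `data[:split, :]` and `data[split:, :]` never overlap and together cover every row.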


## 1.4 Array Reshaping

Once you're done slicing your data, you may need to reshape it. Why? Some model architectures expect their input in a specific shape, so your NumPy arrays must be reshaped to match.

### 1.4.1 Data Shape

The **shape** attribute returns a tuple containing the length of each dimension of the array.

In [20]:
# shape of one-dimensional array
# define array

data = array([11,22,33,44,55])

In [21]:
print(data.shape)

(5,)
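
For a two-dimensional array the tuple has two entries, the number of rows first and the number of columns second. A short sketch:

```python
from numpy import array

data = array([[11, 22], [33, 44], [55, 66]])

print(data.shape)     # (3, 2)
print(data.shape[0])  # 3
print(data.shape[1])  # 2
```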


### 1.4.2 Reshape 1D to 2D Array

When reshaping a one-dimensional array into a two-dimensional array with one column, the tuple passed to **reshape()** holds the length of the array (**data.shape[0]**) as the first dimension and 1 as the second dimension.

In [22]:
data = data.reshape((data.shape[0],1))

In [23]:
print(data.shape)

(5, 1)
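
`reshape()` also accepts -1 for one dimension, in which case NumPy infers that length from the size of the array. The sketch below compares the two forms; the names `column` and `inferred` are illustrative.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])

# explicit form: 5 rows, 1 column
column = data.reshape((data.shape[0], 1))

# equivalent form: -1 lets NumPy infer the number of rows
inferred = data.reshape((-1, 1))

print(column.shape)    # (5, 1)
print(inferred.shape)  # (5, 1)
print(column)
```

Reshaping changes only the layout; the values and their order stay the same.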


### 1.4.3 Reshape 2D to 3D Array

It's common to reshape 2D data (where each row represents a sequence) into a 3D array for algorithms that expect multiple samples of one or more time steps and one or more features, e.g. the LSTM in the Keras library.

In [24]:
# array of data

data = array([[11,22],[33,44],[55,66]])

In [25]:
data.shape

(3, 2)

We can use the sizes in the shape attribute of the array to specify the number of samples (rows) and time steps (columns), and fix the number of features at 1.

In [26]:
data = data.reshape(data.shape[0], data.shape[1], 1)

In [27]:
data.shape

(3, 2, 1)
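
To see how the values are arranged after the reshape, the sketch below unpacks the shape into named dimensions and prints the first sample. The names `samples`, `timesteps` and `sequences` are illustrative.

```python
from numpy import array

data = array([[11, 22], [33, 44], [55, 66]])

# rows become samples, columns become time steps
samples, timesteps = data.shape

# add a trailing dimension of length 1 for the single feature
sequences = data.reshape((samples, timesteps, 1))

print(sequences.shape)  # (3, 2, 1)

# the first sample: two time steps, one feature each
print(sequences[0])
```

Each original row is now a sequence of two time steps, and each time step holds a single feature value.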