# Index, Slice and Reshape NumPy Arrays

Machine learning data is typically represented as arrays, and in Python those arrays are almost always NumPy arrays. This tutorial shows how to convert list data into NumPy arrays and then how to index, slice, and reshape them.

## From List to Arrays

It's generally recommended to load data from a file using Pandas or NumPy functions. Here, let's assume you have already loaded the data and it is available as a Python list. Let's convert that data into a NumPy array.

### One-Dimensional List to Array
You can convert a one-dimensional list of data to an array by calling the **array()** NumPy function.

In [1]:
# Import the necessary function
from numpy import array

# list of data
data = [11,22,33,44,55]

# converting to numpy array
data = array(data)
print(data)
print(type(data))

[11 22 33 44 55]
<class 'numpy.ndarray'>


### Two-Dimensional List of List to Array
You can convert a two-dimensional list of lists into a 2D NumPy array by using the same **array()** NumPy function.

In [2]:
# Import the necessary function
from numpy import array

# list of data
data = [[11,22],
        [33,44],
        [55,66]]

# converting to numpy array
data = array(data)
print(data)
print(type(data))

[[11 22]
 [33 44]
 [55 66]]
<class 'numpy.ndarray'>


## Array Indexing
Once your data is represented as a NumPy array, you can access it using indexing.
The following examples show how.

### One-Dimensional Indexing
Generally, indexing works just as you would expect for a Python list, using the bracket operator **[ ]**.

In [3]:
# Import the necessary function
from numpy import array

# list of data
data = array([11,22,33,44,55])

# index data
print(data[0])
print(data[1])

11
22


Specifying a negative integer in the brackets returns values counted from the end of the array: -1 is the last element, -2 is the second-to-last, and so on.

In [4]:
# index data with -ve int value
print(data[-1])
print(data[-5])

55
11
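
To make the relationship between negative and positive indexes concrete, here is a minimal sketch using the same array. It assumes nothing beyond the example above; the variable names `n` and `k` are only illustrative.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])
n = len(data)

# A negative index -k refers to the same element as the positive index n - k
for k in range(1, n + 1):
    assert data[-k] == data[n - k]

print(data[-1], data[n - 1])  # last element, addressed both ways
print(data[-5], data[0])      # first element, addressed both ways
```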


### Two-Dimensional Indexing 
Indexing two-dimensional data is similar to indexing one-dimensional data, except that a comma separates the index for each dimension, written as **[int1,int2]**, where the first index selects the row and the second selects the column.

In [5]:
# Import the necessary function
from numpy import array

# define array
data = array([[11,22],
              [33,44],
              [55,66]])

# index data
print(data[0,0])
print(data[1,1])

11
44


You can also access an entire row by giving only the first index and leaving the second dimension blank.

In [7]:
# index data
print(data[0,])
print(data[1,])

[11 22]
[33 44]
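
As a small check, the sketch below compares three ways of writing the same row access on the array from the example. It uses `array_equal` from NumPy to compare the results; the names `row_a`, `row_b`, and `row_c` are only illustrative.

```python
from numpy import array, array_equal

data = array([[11, 22],
              [33, 44],
              [55, 66]])

row_a = data[0,]    # trailing comma, second dimension left blank
row_b = data[0]     # first index only
row_c = data[0, :]  # explicit "all columns" for the second dimension

# All three spellings select the same first row
print(array_equal(row_a, row_b))
print(array_equal(row_a, row_c))
print(row_c)
```

Writing the colon explicitly, as in `data[0, :]`, is often the easiest form to read because it shows that every column is being selected.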


## Array Slicing
Structures like lists and arrays can be sliced, which means a specific subsequence of the data can be indexed and retrieved. This is useful in machine learning when specifying input variables and output variables, or when splitting training rows from testing rows. Slicing is specified with the colon operator **:** and is written as **[idx1:idx2]**. The slice starts at index idx1 and extends up to, but does not include, index idx2.

### One-Dimensional Slicing 

In [12]:
# Import the necessary function
from numpy import array

# define array
data = array([11,22,33,44,55])

# slice the data
print(data[1:3])

[22 33]
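
The sketch below illustrates how the start and end of a slice behave, using the same array. Leaving out the start or the end of the slice is standard Python and NumPy behaviour rather than something shown in the example above.

```python
from numpy import array

data = array([11, 22, 33, 44, 55])

# The end index is excluded: this returns the items at indexes 1 and 2
print(data[1:3])

# Omitting the start index slices from the beginning of the array
print(data[:2])

# Omitting the end index slices through to the end of the array
print(data[2:])

# Omitting both returns every item
print(data[:])
```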


When negative indexes are used in a slice, they are counted from the end of the array.

In [16]:
# slice the data
print(data[-4:-1])

[22 33 44]
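
For a five-element array, the negative slice above selects the same items as a slice written with positive indexes. The following sketch checks that equivalence and also shows a common pattern for taking the last few items; the names `negative` and `positive` are only illustrative.

```python
from numpy import array, array_equal

data = array([11, 22, 33, 44, 55])

negative = data[-4:-1]  # from the fourth-from-last item up to, not including, the last
positive = data[1:4]    # the same positions written with positive indexes

print(array_equal(negative, positive))

# Leaving the end blank keeps everything through the last item
print(data[-2:])
```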


### Two-Dimensional Slicing
In machine learning, 2D slicing is generally used for the following:

#### Split Input and Output Features
It is common to split data into input variables (X) and output variables (Y). We can do this by slicing all rows and all columns up to, but not including, the last column, and then separately indexing the last column. For the input features, we select all rows and all columns except the last one by specifying : in the row index and :-1 in the column index.

X = data[:, :-1]

For the output column, we select all rows again using : and index just the last column by specifying the -1 index.

Y = data[:, -1]

In [2]:
# Import necessary libraries
from numpy import array

# define array
data = array([
    [11,22,33],
    [44,55,66],
    [77,88,99]
]
)

# Separate data into input and output columns
X,Y = data[:,:-1],data[:,-1]

print(X)
print(Y)

[[11 22]
 [44 55]
 [77 88]]
[33 66 99]
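
Note that the two results have different numbers of dimensions: slicing the columns keeps X two-dimensional, while indexing a single column makes Y one-dimensional. A minimal sketch that inspects the shapes, using the same data:

```python
from numpy import array

data = array([[11, 22, 33],
              [44, 55, 66],
              [77, 88, 99]])

X, Y = data[:, :-1], data[:, -1]

# X keeps both dimensions: 3 rows and 2 input columns
print(X.shape)

# Y is a flat array with one value per row
print(Y.shape)
```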


#### Split Train and Test Rows
This involves slicing all columns by specifying : in the second dimension index. The training dataset is all rows from the beginning up to the split point, and the test dataset is all rows from the split point to the end.

train = data[:split,:]
test  = data[split:,:]

In [4]:
# Import necessary libraries
from numpy import array

# define array
data = array([
    [11,22,33],
    [44,55,66],
    [77,88,99]
]
)

# Separate data into train and test rows
split = 2
train,test = data[:split,:],data[split:,:]

print(train)
print(test)

[[11 22 33]
 [44 55 66]]
[[77 88 99]]
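
In practice the split point is often computed from the number of rows rather than hard-coded. The sketch below is one way this might look; the name `train_fraction` and the value 0.67 are hypothetical choices for illustration and do not come from the example above.

```python
from numpy import array

data = array([[11, 22, 33],
              [44, 55, 66],
              [77, 88, 99]])

# Hypothetical choice: keep roughly two thirds of the rows for training
train_fraction = 0.67
split = int(len(data) * train_fraction)

train, test = data[:split, :], data[split:, :]

print(split)
print(train.shape, test.shape)
```

Because the two slices share the same split point, every row lands in exactly one of the two sets.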


## Array Reshaping

### Data Shape

NumPy arrays have a shape attribute that returns a tuple containing the length of each dimension of the array.

In [5]:
# Import necessary libraries
from numpy import array

# define array
data = array([11,22,33,44,55])
print(data.shape)

(5,)


A tuple of length 2 is returned for a 2D array.

In [6]:
# Import necessary libraries
from numpy import array

# define array
data = array([[11,22],
              [33,44],
              [55,66]])
print(data.shape)

(3, 2)


The elements of the tuple can be accessed by index, just like the elements of an array: index 0 gives the number of rows and index 1 gives the number of columns.

In [7]:
# Import necessary libraries
from numpy import array

# define array
data = array([[11,22],
              [33,44],
              [55,66]])
print('Rows: %d'%data.shape[0])
print('Cols: %d'%data.shape[1])

Rows: 3
Cols: 2
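
Since the shape is an ordinary Python tuple, it can also be unpacked into separate variables. A short sketch of this, where the names `rows` and `cols` are only illustrative:

```python
from numpy import array

data = array([[11, 22],
              [33, 44],
              [55, 66]])

# Unpack the shape tuple of a 2D array into two variables
rows, cols = data.shape
print(rows, cols)

# The number of dimensions equals the length of the shape tuple
print(data.ndim == len(data.shape))
```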


### Reshape 1D to 2D Array
NumPy provides the reshape() function on the NumPy array object that can be used to reshape the data. In the examples here, it is passed a single argument: a tuple that specifies the new shape of the array.

To reshape a 1D array into a 2D array with one column, the tuple uses the length of the array as the first dimension (data.shape[0]) and 1 as the second dimension.

In [8]:
# Import necessary libraries
from numpy import array

# define array
data = array([11,22,33,44,55])
print(data.shape)

# reshape
data = data.reshape((data.shape[0],1))
print(data.shape)

(5,)
(5, 1)
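
As an aside, NumPy also accepts -1 for one dimension of the new shape and works out that length from the size of the array. The sketch below compares this with the explicit form used above; the names `explicit` and `inferred` are only illustrative.

```python
from numpy import array, array_equal

data = array([11, 22, 33, 44, 55])

# Explicit form: use the array length as the first dimension
explicit = data.reshape((data.shape[0], 1))

# Alternative form: -1 asks NumPy to infer the first dimension
inferred = data.reshape((-1, 1))

print(explicit.shape, inferred.shape)
print(array_equal(explicit, inferred))
```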


### Reshape 2D to 3D Array
It is common to need to reshape 2D data, where each row represents a sequence, into a 3D array for algorithms that expect multiple samples of one or more time steps and one or more features. A common example is the **LSTM recurrent neural network model** in Keras.

The reshape function can be used directly by specifying the new dimensions. We can use the sizes in the shape attribute of the array to specify the number of samples (rows) and time steps (columns), and fix the number of features at 1.

data.reshape((data.shape[0],data.shape[1],1))

In [12]:
# Import necessary libraries
from numpy import array

# define array
data = array([[11,22],
             [33,44],
             [55,66]])
print(data.shape)

# reshape
data = data.reshape((data.shape[0],data.shape[1],1))
print(data.shape)

(3, 2)
(3, 2, 1)
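
Reshaping changes only how the values are arranged, not the values themselves. The sketch below names the three dimensions and checks that each element can still be found at the matching position; the variable names `samples`, `timesteps`, and `features` are illustrative and follow the description above.

```python
from numpy import array

data = array([[11, 22],
              [33, 44],
              [55, 66]])

samples, timesteps = data.shape
features = 1  # fixed at one feature per time step, as described above

reshaped = data.reshape((samples, timesteps, features))
print(reshaped.shape)

# Each original value data[i, j] now sits at reshaped[i, j, 0]
for i in range(samples):
    for j in range(timesteps):
        assert reshaped[i, j, 0] == data[i, j]

print(reshaped[1, 0, 0], data[1, 0])
```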
