# 02.02 - Basics of NumPy Arrays

Basic categories of array manipulation:

1. **Attributes**: Determining the size, shape, memory consumption, and data types of arrays
2. **Indexing**: Getting and setting the value of individual array elements
3. **Slicing**: Getting and setting smaller subarrays within a larger array
4. **Reshaping**: Changing the shape of a given array
5. **Concatenation and splitting**: Combining multiple arrays into one, and splitting one array into many

### 1. NumPy Array Attributes

Let's start by defining three random arrays: one-dimensional, two-dimensional, and three-dimensional.

To ensure that the same random arrays are generated each time the code is run, we can set a fixed _seed_ for NumPy's random number generator.

In [1]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
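Seeding is what makes the cells above reproducible. As a quick sanity check, a minimal sketch: re-seeding with the same value regenerates identical arrays. Recent NumPy versions also provide the Generator API through `np.random.default_rng`, which keeps its state in a generator object rather than in global state. Note that the two APIs do not produce the same numbers for the same seed; the variable names below are illustrative.

```python
import numpy as np

# Legacy global seeding, as used above: same seed -> same numbers
np.random.seed(0)
first = np.random.randint(10, size=6)
np.random.seed(0)
second = np.random.randint(10, size=6)
print(np.array_equal(first, second))  # True

# Generator API: the seed belongs to a generator object, not to global state
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
a = rng_a.integers(10, size=6)
b = rng_b.integers(10, size=6)
print(np.array_equal(a, b))  # True
```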

Each array exposes several attributes, including:

* <code>ndim</code>: number of dimensions  
* <code>shape</code>: size of each dimension  
* <code>size</code>: total size of the array (number of elements)  
* <code>itemsize</code>: size of _each_ array element (in bytes)
* <code>nbytes</code>: total size of the array (in bytes)
* <code>dtype</code>: data type  
    
For example, using our two-dimensional array:

In [2]:
print('x2 num_dim: ', x2.ndim)
print('x2 shape:', x2.shape)
print('x2 size: ', x2.size)
print('x2 item_size: ', x2.itemsize, 'bytes')
print('x2 num_bytes: ', x2.nbytes, 'bytes')
print('x2 data type: ', x2.dtype)

x2 num_dim:  2
x2 shape: (3, 4)
x2 size:  12
x2 item_size:  4 bytes
x2 num_bytes:  48 bytes
x2 data type:  int32


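As a general rule, <code>nbytes</code> = <code>itemsize</code> * <code>size</code> (here 4 * 12 = 48 bytes). The default integer type is platform-dependent, so on many systems you will see <code>int64</code> with an <code>itemsize</code> of 8 bytes instead.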
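The relationship can be checked across several data types. A short sketch using zero-filled arrays with the same shape as `x2` (the dtypes chosen are just examples): only `itemsize` depends on the dtype, and `nbytes` scales with it.

```python
import numpy as np

for dtype in (np.int8, np.int16, np.float32, np.float64):
    a = np.zeros((3, 4), dtype=dtype)  # 12 elements, like x2
    # nbytes is always itemsize (bytes per element) times size (element count)
    assert a.nbytes == a.itemsize * a.size
    print(a.dtype, a.itemsize, a.nbytes)
# int8 1 12
# int16 2 24
# float32 4 48
# float64 8 96
```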

### 2. Array Indexing 

For one-dimensional arrays, indexing works like Python list indexing: indices start at 0, and negative indices count from the end.

In [3]:
x1

array([5, 0, 3, 3, 7, 9])

In [4]:
x1[0] # First value

5

In [5]:
x1[1]  # Second value

0

In [6]:
x1[-1]  # Last value

9

In [7]:
x1[-2]  # Penultimate value

7
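In other words, a negative index `-k` refers to the same element as `len(x1) - k`. A small sketch checking this for every valid negative index, and showing that indexing past the end raises an error rather than wrapping around (the variable `n` is illustrative):

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])
n = len(x1)

# A negative index -k refers to the same element as n - k
for k in range(1, n + 1):
    assert x1[-k] == x1[n - k]

# Indexing past the end raises an IndexError instead of wrapping around
try:
    x1[n]
except IndexError as err:
    print("IndexError:", err)
```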

For multi-dimensional arrays, items are accessed with a comma-separated tuple of indices, one per axis; for a two-dimensional array this is (row, col):

In [8]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [9]:
x2[2,3]

7

In [10]:
x2[-1,-3]

6
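Negative indices apply per axis, so `x2[-1, -3]` is the last row and the third-from-last column, i.e. `x2[2, 1]`. Chained indexing such as `x2[2][3]` returns the same value, but it first builds the intermediate row `x2[2]` and then indexes into it; the single-tuple form is the usual idiom. A quick check:

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x2[2, 3], x2[2][3])    # 7 7
print(x2[-1, -3], x2[2, 1])  # 6 6
```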

Values can be modified by assigning to an element using the same index notation:

In [11]:
x2[2,3] = 8

In [12]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 8]])

**Note**: NumPy arrays have a fixed type, so assigning a floating-point value to an integer array silently truncates it (e.g. 3.14 becomes 3).
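A minimal sketch of the truncation described in the note, using a copy of the `x1` values. Truncation goes toward zero (it is not rounding), and no warning is raised. If fractional values are needed, the array should be created with a floating-point dtype in the first place, e.g. `np.array(..., dtype=float)`.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])  # integer dtype
x1[0] = 3.14159   # float silently truncated to fit the integer dtype
x1[1] = -2.7      # truncation is toward zero, so this stores -2
print(x1)         # [ 3 -2  3  3  7  9]
```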

### 3. Array Slicing

Once again, NumPy follows the standard Python notation: <code>x[start:stop:step]</code> with default values:  

* <code>start</code> = 0
* <code>stop</code> = size of dimension
* <code>step</code> = 1

**Note**: with a negative <code>step</code> value, the defaults for <code>start</code> and <code>stop</code> are swapped. This is a convenient way to reverse an array.

In [13]:
x = np.arange(15)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [14]:
x[::-1] # Reverse the array

array([14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

In [15]:
x[10::-2] # Reverse from index 10 with step 2

array([10,  8,  6,  4,  2,  0])
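Because the notation is the same as for Python lists, a NumPy slice and the corresponding list slice select the same elements. A small sketch comparing them and spelling out the defaults for a positive and a negative step (`as_list` is an illustrative name):

```python
import numpy as np

x = np.arange(15)
as_list = list(range(15))

# The same start:stop:step on an array and on a list selects the same elements
assert x[3:12:4].tolist() == as_list[3:12:4]   # [3, 7, 11]

# Omitted values fall back to the defaults: start=0, stop=len(x), step=1
assert np.array_equal(x[:], x[0:len(x):1])

# With a negative step, an omitted start defaults to the last element
assert np.array_equal(x[::-1], x[len(x) - 1::-1])
print(x[10::-2])  # [10  8  6  4  2  0]
```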

For **multi-dimensional arrays** we can get subarrays with multiple slices separated by commas. For example:

In [16]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 8]])

In [17]:
x2[:2, :3]  # first 2 rows, first 3 cols

array([[3, 5, 2],
       [7, 6, 8]])

In [18]:
x2[:3, ::2]  # all 3 rows, every other column

array([[3, 2],
       [7, 8],
       [1, 7]])

In [19]:
x2[::-1, ::-1]  # Reverse both rows and columns

array([[8, 7, 6, 1],
       [8, 8, 6, 7],
       [4, 2, 5, 3]])
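Each comma-separated slice applies to its own axis, so reversing only the first axis flips the row order, while reversing both flips rows and columns. For comparison, `np.flip` performs the same reversal as a function call (with no `axis` argument it reverses every axis). The variable names are illustrative:

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 8]])

rows_reversed = x2[::-1]        # only axis 0 is sliced; columns untouched
both_reversed = x2[::-1, ::-1]  # axis 0 and axis 1 both reversed

assert np.array_equal(rows_reversed, np.flip(x2, axis=0))
assert np.array_equal(both_reversed, np.flip(x2))
print(rows_reversed)
# [[1 6 7 8]
#  [7 6 8 8]
#  [3 5 2 4]]
```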

**Accessing a single row or column** can be done by combining indexing and slicing:

In [20]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 8]])

In [21]:
print(x2[:, 0])  # first column of x2

[3 7 1]


In [22]:
print(x2[1, :])  # second row of x2

[7 6 8 8]


**Note**: For **row access**, the empty slice can be omitted:

In [23]:
print(x2[0])  # equivalent to x2[0, :]

[3 5 2 4]
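One detail worth knowing: an integer index drops its axis, while a slice keeps it. A single column selected with `x2[:, 0]` is therefore one-dimensional, whereas `x2[:, 0:1]` keeps a two-dimensional column shape. A quick shape check:

```python
import numpy as np

x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 8]])

print(x2[0].shape, x2[0, :].shape)  # (4,) (4,)  row access: same result either way
print(x2[:, 0].shape)               # (3,)       integer index drops axis 1
print(x2[:, 0:1].shape)             # (3, 1)     slice keeps axis 1
```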


A useful thing to know is that array slices return **views** rather than _copies_ of the data: modifying a subarray also modifies the original array.

In [24]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[3 5]
 [7 6]]


In [25]:
x2_sub[0,0] = 42
print(x2_sub)

[[42  5]
 [ 7  6]]


In [26]:
x2 # original array has changed

array([[42,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  8]])

To actually **copy an array**, we can use the <code>copy()</code> method. For example:

In [27]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [28]:
x2_sub_copy[0, 0] = 99
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


In [29]:
print(x2) # original array has NOT changed 

[[42  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  8]]
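Whether two arrays share data can also be checked directly rather than by mutating them. A minimal sketch using `np.shares_memory` and the `base` attribute (which points to the array a view was taken from, and is `None` for an array that owns its data); `sub_view` and `sub_copy` are illustrative names:

```python
import numpy as np

x2 = np.array([[42, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 8]])

sub_view = x2[:2, :2]          # slicing: shares data with x2
sub_copy = x2[:2, :2].copy()   # explicit copy: owns its own data

print(np.shares_memory(x2, sub_view))  # True
print(np.shares_memory(x2, sub_copy))  # False
print(sub_view.base is x2)             # True: the view refers back to x2
print(sub_copy.base is None)           # True: the copy owns its memory
```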


### 4. Reshaping

The most flexible way to reshape an array is the <code>reshape()</code> method. The new shape must contain exactly as many elements as the original array:

In [30]:
grid = np.arange(1, 13).reshape((2, 6))
print(grid)

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]


In [31]:
grid = np.arange(1, 13).reshape((3, 4))
print(grid)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
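Two practical details of `reshape`, sketched below: passing `-1` for one dimension lets NumPy infer it from the total size, and an incompatible shape raises a `ValueError`. Also, `reshape` returns a view whenever it can (as for the contiguous array here), so writing to the result can change the original; for some non-contiguous inputs it has to copy instead.

```python
import numpy as np

a = np.arange(1, 13)

# -1 lets NumPy infer one dimension from the total size (12 / 4 = 3)
print(a.reshape((4, -1)).shape)  # (4, 3)

# The new shape must hold exactly a.size elements
try:
    a.reshape((5, 3))
except ValueError as err:
    print("ValueError:", err)

# For a contiguous array like this one, reshape returns a view, not a copy
grid = a.reshape((3, 4))
grid[0, 0] = 100
print(a[0])  # 100
```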


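Another common pattern is to use <code>np.newaxis</code> within a slice operation, which inserts a new axis of length 1. For example, converting a one-dimensional array into a row or column vector: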

In [32]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

array([[1, 2, 3]])

In [33]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [34]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [35]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
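`np.newaxis` is simply an alias for `None`, so `x[None, :]` is equivalent. The function `np.expand_dims` does the same job with the position of the new axis given as an argument. A short sketch comparing the resulting shapes:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)      # True: newaxis is just an alias for None
print(x[np.newaxis, :].shape)  # (1, 3)
print(x[:, np.newaxis].shape)  # (3, 1)

# np.expand_dims inserts a length-1 axis at the given position
assert np.array_equal(np.expand_dims(x, axis=0), x[np.newaxis, :])
assert np.array_equal(np.expand_dims(x, axis=1), x[:, np.newaxis])
```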

### 5. Concatenation and splitting

Arrays can be **joined** using various methods:

* <code>np.concatenate</code> takes a tuple or list of arrays as its first argument
* <code>np.vstack</code> for arrays of mixed dimensions (**vertical** stack)
* <code>np.hstack</code> for arrays of mixed dimensions (**horizontal** stack)

In [36]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

This also works for more than two arrays at once:

In [37]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


For multi-dimensional arrays, the axis along which to concatenate can be specified with the <code>axis</code> argument:

In [38]:
grid_1 = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [39]:
grid_2 = grid_1 * 2

In [40]:
# concatenate along the first axis | default is axis = 0
np.concatenate([grid_1, grid_2])

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 2,  4,  6],
       [ 8, 10, 12]])

In [41]:
# concatenate along the second axis
np.concatenate([grid_1, grid_2], axis=1)

array([[ 1,  2,  3,  2,  4,  6],
       [ 4,  5,  6,  8, 10, 12]])
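The choice of axis determines which dimension grows: joining along axis 0 adds rows, joining along axis 1 adds columns. All inputs must have the same number of dimensions, and their sizes must match on every axis except the one being joined. A sketch of the resulting shapes and of a mismatch:

```python
import numpy as np

grid_1 = np.array([[1, 2, 3],
                   [4, 5, 6]])
grid_2 = grid_1 * 2

print(np.concatenate([grid_1, grid_2]).shape)          # (4, 3): rows added
print(np.concatenate([grid_1, grid_2], axis=1).shape)  # (2, 6): columns added

# Mixing a 2-D and a 1-D array is rejected by np.concatenate
try:
    np.concatenate([grid_1, np.array([1, 2, 3])])
except ValueError as err:
    print("ValueError:", err)
```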

For working with arrays of mixed dimensions, it can be clearer to use the <code>np.vstack</code> (**vertical stack**) and <code>np.hstack</code> (**horizontal stack**) functions.  

For example:

In [42]:
x = np.array([1, 2, 3])
y = np.array([[10, 12, 14],
              [20, 30, 40]])

# vertically stack arrays
np.vstack([x, y])

array([[ 1,  2,  3],
       [10, 12, 14],
       [20, 30, 40]])

In [43]:
# horizontally stack arrays
z = np.array([[42],
              [96]])
np.hstack([z, y])

array([[42, 10, 12, 14],
       [96, 20, 30, 40]])
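`np.vstack` and `np.hstack` can be seen as convenience wrappers around `np.concatenate`: in the examples above, `vstack` treats the 1-D `x` as a single row before joining along axis 0, and `hstack` joins the 2-D inputs along axis 1 (for 1-D inputs, `hstack` joins along axis 0 instead). A sketch checking that equivalence for the same arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([[10, 12, 14],
              [20, 30, 40]])
z = np.array([[42],
              [96]])

# vstack: promote x to a (1, 3) row, then concatenate along axis 0
assert np.array_equal(np.vstack([x, y]),
                      np.concatenate([x[np.newaxis, :], y], axis=0))

# hstack on 2-D inputs: concatenate along axis 1
assert np.array_equal(np.hstack([z, y]),
                      np.concatenate([z, y], axis=1))
print(np.vstack([x, y]).shape, np.hstack([z, y]).shape)  # (3, 3) (2, 4)
```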

Conversely, arrays can be **split** using:

* <code>np.split</code> 
* <code>np.vsplit</code> 
* <code>np.hsplit</code>

In [44]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5]) # splitting before index 3 and 5 (after 3rd and 5th element)
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]
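With a list of split points, N points produce N + 1 pieces. Passing an integer instead asks for that many equal-sized pieces, which `np.split` only accepts when the length divides evenly; `np.array_split` relaxes this and allows unequal pieces. A small sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])

# N split points produce N + 1 pieces
pieces = np.split(x, [3, 5])
print(len(pieces))  # 3

# An integer asks for that many equal-sized pieces
print([len(p) for p in np.split(x, 4)])  # [2, 2, 2, 2]

# np.split requires an exact division; np.array_split allows unequal pieces
try:
    np.split(x, 3)
except ValueError as err:
    print("ValueError:", err)
print([len(p) for p in np.array_split(x, 3)])  # [3, 3, 2]
```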


For multi-dimensional arrays, we can use <code>np.vsplit</code> (**vertical split**, along the rows) and <code>np.hsplit</code> (**horizontal split**, along the columns).  

For example:

In [45]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [46]:
upper, lower = np.vsplit(grid, [2]) # split before row of index 2 (after 2nd row)
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [47]:
left, right = np.hsplit(grid, [2]) # split before col of index 2 (after 2nd col)
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
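As a final check, `np.vsplit` and `np.hsplit` behave like `np.split` with `axis=0` and `axis=1` respectively, and stacking the pieces back together recovers the original array. A short sketch of that round trip:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# vsplit splits along axis 0 (rows), hsplit along axis 1 (columns)
upper, lower = np.vsplit(grid, [2])
left, right = np.hsplit(grid, [2])
assert all(np.array_equal(a, b) for a, b in zip((upper, lower), np.split(grid, [2], axis=0)))
assert all(np.array_equal(a, b) for a, b in zip((left, right), np.split(grid, [2], axis=1)))

# Stacking the pieces back together recovers the original grid
assert np.array_equal(np.vstack([upper, lower]), grid)
assert np.array_equal(np.hstack([left, right]), grid)
print(upper.shape, left.shape)  # (2, 4) (4, 2)
```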
