<!--BOOK_INFORMATION-->
<img align="left" style="padding-right:10px;" src="https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/figures/PDSH-cover-small.png?raw=1">

*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*

<!--NAVIGATION-->
< [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) | [Contents](Index.ipynb) | [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) >

<a href="https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.02-The-Basics-Of-NumPy-Arrays.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>


# The Basics of NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas ([Chapter 3](03.00-Introduction-to-Pandas.ipynb)) are built around the NumPy array.
This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays.
While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book.
Get to know them well!

We'll cover a few categories of basic array manipulations here:

- *Attributes of arrays*: Determining the size, shape, memory consumption, and data types of arrays
- *Indexing of arrays*: Getting and setting the value of individual array elements
- *Slicing of arrays*: Getting and setting smaller subarrays within a larger array
- *Reshaping of arrays*: Changing the shape of a given array
- *Joining and splitting of arrays*: Combining multiple arrays into one, and splitting one array into many

## NumPy Array Attributes

First let's discuss some useful array attributes.
We'll start by defining three random arrays, a one-dimensional, two-dimensional, and three-dimensional array.
We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run:

1. ``np.random.randint`` generates arrays of random integers

In [105]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array
print('x1:')
print(x1)
print('x2:')
print(x2)
print('x3:')
print(x3)

x1:
[5 0 3 3 7 9]
x2:
[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
x3:
[[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]


2. ndim, shape and size attributes

Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total number of elements, i.e., the product of the sizes of all dimensions):

In [50]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
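
The topic list at the start of this section also mentions memory consumption. As a small sketch, the ``itemsize`` and ``nbytes`` attributes report the size in bytes of one element and of the whole array. The array ``a`` below is made up for illustration, and its dtype is set explicitly so that the numbers do not depend on the platform:

```python
import numpy as np

# Hypothetical example array; the dtype is fixed so the sizes are predictable
a = np.zeros((3, 4, 5), dtype=np.int64)

print("itemsize:", a.itemsize, "bytes")  # size of a single element
print("nbytes:  ", a.nbytes, "bytes")    # size of the whole data buffer

# nbytes is simply itemsize multiplied by the number of elements
print(a.nbytes == a.itemsize * a.size)
```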


3. Multi-dimensional arrays:
    - A one-dimensional array, e.g. of shape ``(6,)``, is simply an array containing 6 elements
    - A two-dimensional array, e.g. of shape ``(3, 4)``, is an array of 3 arrays, each containing 4 elements
    - A three-dimensional array, e.g. of shape ``(3, 4, 5)``, is an array of 3 arrays, each containing 4 arrays, each containing 5 elements

4. dtype attribute, the type of the elements of the array

Note: NumPy arrays, unlike Python lists, have a fixed type.

Another useful attribute is the ``dtype``, the data type of the array (which we discussed previously in [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb)):

In [51]:
print("dtype:", x3.dtype)

dtype: int64


Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.
This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [52]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 0, 3, 3, 7, 9])
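
As a minimal sketch of the truncation described above, the example below uses made-up arrays (``ints`` and ``floats`` are illustrative names, not from the text). It shows that the fractional part is discarded rather than rounded, and that converting with ``astype`` first is one way to keep fractional values:

```python
import numpy as np

ints = np.array([5, 0, 3])   # integer dtype inferred from the values
ints[0] = 3.99               # fractional part is discarded, not rounded
ints[1] = -3.99              # truncation is toward zero
print(ints)                  # [ 3 -3  3]

# astype returns a new array with the requested dtype
floats = ints.astype(float)
floats[2] = 3.99             # the fractional value is now preserved
print(floats)
```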

## Array Indexing: Accessing Single Elements

5. Accessing single elements in one-dimensional arrays; indices start at 0

If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.
In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [53]:
x1

array([3, 0, 3, 3, 7, 9])

In [54]:
x1[0]

3

In [55]:
x1[4]

7

6. Negative indexes to index from the end of the array

To index from the end of the array, you can use negative indices:

In [56]:
x1[-1]

9

In [57]:
x1[-2]

7

7. Accessing single elements in multi-dimensional arrays, and modifying them

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [58]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [59]:
x2[0, 0]

3

In [60]:
x2[2, 0]

1

In [61]:
x2[2, -1]

7

Values can also be modified using any of the above index notation. Don't forget that NumPy arrays have a single fixed type:

In [71]:
x2[0, 0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])
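
As a short sketch of the comma-separated index notation, the example below uses made-up arrays ``a`` and ``b``. It shows that ``a[i, j]` `reads the same element as chained indexing ``a[i][j]``, and that the notation extends to three dimensions:

```python
import numpy as np

a = np.array([[3, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 7]])

# A tuple of indices selects the same element as chained indexing
print(a[2, 0], a[2][0])   # 1 1

# Assignment through an index tuple modifies the array in place
a[2, -1] = 0
print(a[2])               # [1 6 7 0]

# One index per dimension in a three-dimensional array
b = np.arange(24).reshape((2, 3, 4))
print(b[1, 2, 3])         # 23
```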

## Array Slicing: Accessing Subarrays

### One-dimensional subarrays

8. One-dimensional array slicing uses the colon (``:``) character. This is very important:
``` python
x[start:stop:step]
```
    - If the step is unspecified or positive, any unspecified values default to ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.

    - If the step is **negative**, the defaults for ``start`` and ``stop`` are swapped: ``start`` defaults to the last index (*size of dimension* minus 1), and the slice runs backward through index 0. The step itself stays negative.

In [63]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [64]:
x[:5]  # first five elements, start unspecified =0, stop =5, step unspecified =1

array([0, 1, 2, 3, 4])

In [65]:
x[5:]  # elements from index 5 onward, start =5, stop unspecified =size of dimension 10 here, step unspecified =1

array([5, 6, 7, 8, 9])

In [72]:
x[4:7]  # middle sub-array, start =4, stop =7, step unspecified =1

array([4, 5, 6])

In [73]:
x[::2]  # every other element, start unspecified =0, stop unspecified =10, step =2

array([0, 2, 4, 6, 8])

In [68]:
x[1::2]  # every other element, starting at index 1, start =1, stop unspecified =10, step =2

array([1, 3, 5, 7, 9])

A potentially confusing case is when the ``step`` value is negative.
In this case, the defaults for ``start`` and ``stop`` are swapped.
This becomes a convenient way to reverse an array:

In [69]:
x[::-1]  # all elements, reversed, case 2, start unspecified =9 (last index), stop unspecified =runs through index 0, step =-1

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [74]:
x[5::-2]  # reversed every other from index 5, case 2, start specified =5, stop unspecified =runs through index 0, step =-2

array([5, 3, 1])

Note: as with a positive step, the ``stop`` index is never included, so a reversed slice ends just before ``stop``:

In [79]:
x[5:1:-2] # reversed every other from index 5 down to (but not including) index 1, case 2, start specified =5, stop specified =1, step =-2

array([5, 3])
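
One way to check the default rules above is Python's built-in ``slice.indices`` method, which fills in the unspecified values for a sequence of a given length. This is only a sketch using the slices from this section. Note that a resolved ``stop`` of ``-1`` here is a ``range`` endpoint meaning "just before index 0", not a negative index:

```python
import numpy as np

x = np.arange(10)
n = len(x)

# slice.indices(n) returns the concrete (start, stop, step) used
# for a sequence of length n once the defaults are filled in.
for s in (slice(None, 5), slice(None, None, -1),
          slice(5, None, -2), slice(5, 1, -2)):
    start, stop, step = s.indices(n)
    visited = list(range(start, stop, step))
    print(s, "->", (start, stop, step), "visits", visited)
    # Because x[i] == i here, the slice returns exactly the visited positions
    assert x[s].tolist() == visited
```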

### Multi-dimensional subarrays

8. Multi-dimensional slices work in the same way, with multiple slices separated by commas; each slice corresponds to one dimension.
For example:
``` python
x[start:stop:step, start:stop:step, start:stop:step, ... ]
```
    - If a slice is left empty (``:``), the entire dimension is selected.
    - How to leave a dimension unspecified:
        - Trailing dimensions: if the leading dimensions are specified, the trailing ones can simply be omitted, e.g. ``x[:2]``.
        - Leading dimensions: a **colon** is needed as a placeholder for each leading dimension before slicing a later one, e.g. ``x[:, :3]``.
    - In a two-dimensional array, the arrays along the first dimension are called rows, and their elements line up into columns.

In [80]:
x3

array([[[8, 1, 5, 9, 8],
        [9, 4, 3, 0, 3],
        [5, 0, 2, 3, 8],
        [1, 3, 3, 3, 7]],

       [[0, 1, 9, 9, 0],
        [4, 7, 3, 2, 7],
        [2, 0, 0, 4, 5],
        [5, 6, 8, 4, 1]],

       [[4, 9, 8, 1, 1],
        [7, 9, 9, 3, 6],
        [7, 2, 0, 3, 5],
        [9, 4, 4, 6, 4]]])

In [90]:
x3[:2, :3, :] 
# dimension 1: first two arrays
# dimension 2: first 3 arrays for each of the two arrays of dim 1
# dimension 3: unspecified, all elements of each of the array of dim 2

array([[[8, 1, 5, 9, 8],
        [9, 4, 3, 0, 3],
        [5, 0, 2, 3, 8]],

       [[0, 1, 9, 9, 0],
        [4, 7, 3, 2, 7],
        [2, 0, 0, 4, 5]]])

In [85]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [0]:
x2[:2, :3]  # dim 1: two rows, dim 2: three columns

array([[12,  5,  2],
       [ 7,  6,  8]])

In [91]:
x2[:, ::2]  # dim 1: all rows, dim 2: every other column

array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

Finally, subarray dimensions can even be reversed together:

In [95]:
x2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])
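
The rule about omitted trailing slices can be checked directly. The sketch below uses a made-up array ``a`` with the same shape as ``x3`` and compares the short and the fully written forms:

```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))

# Omitted trailing slices behave like empty slices
print(np.array_equal(a[:2], a[:2, :, :]))   # True

# Leading dimensions need a colon as a placeholder
print(a[:, :3].shape)       # (3, 3, 5)
print(a[:, :, ::2].shape)   # (3, 4, 3)
```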

#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

9. Combining indexing and slicing to obtain single rows or columns

In [106]:
print(x2[:, 0])  # first column of x2, dim 1: all rows, dim 2: first element of each row

[3 7 1]


In [107]:
print(x2[0, :])  # first row of x2, dim 1: first row, dim2: all elements of this row

[3 5 2 4]


In the case of row access, the empty slice can be omitted for a more compact syntax:

In [108]:
print(x2[0])  # equivalent to x2[0, :], see unspecified last dims

[3 5 2 4]


### Subarrays as no-copy views
10. Note: modifying a subarray obtained by slicing also modifies the original NumPy array!

One important–and extremely useful–thing to know about array slices is that they return *views* rather than *copies* of the array data.
This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.
Consider our two-dimensional array from before:

In [109]:
print(x2)

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]


Let's extract a $2 \times 2$ subarray from this:

In [110]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[3 5]
 [7 6]]


Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [111]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  5]
 [ 7  6]]


In [112]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
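
As a minimal sketch of the contrast with Python lists described above, the example below uses made-up data. The ``base`` attribute of a view refers to the array that owns the data, which gives a quick way to confirm that a slice is a view:

```python
import numpy as np

a = np.array([[3, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 7]])

sub = a[:2, :2]
print(sub.base is a)   # True: the slice is a view onto a's data

sub[0, 0] = 99
print(a[0, 0])         # 99: the original array changed

# A list slice, by contrast, is an independent copy
lst = [3, 5, 2, 4]
lst_sub = lst[:2]
lst_sub[0] = 99
print(lst[0])          # 3: the original list is unchanged
```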

### Creating copies of arrays

11. Copies of an array

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:

In [116]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


If we now modify this subarray, the original array is not touched:

In [117]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [118]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
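
To confirm whether two arrays share a data buffer, ``np.shares_memory`` can be used. The following sketch uses illustrative names (``a``, ``view``, ``dup``) and contrasts a slice with an explicit copy:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

view = a[:2, :2]          # slice: shares the buffer with a
dup = a[:2, :2].copy()    # explicit copy: owns its own buffer

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, dup))   # False

dup[0, 0] = 42
print(a[0, 0])            # 0: the original is untouched
```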


## Reshaping of Arrays

12. reshape

Another useful type of operation is reshaping of arrays.
The most flexible way of doing this is with the ``reshape`` method.
For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [119]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Note that for this to work, the size of the initial array must match the size of the reshaped array. 
Where possible, the ``reshape`` method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
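
Both points can be illustrated with a short sketch on a made-up array. A size mismatch raises a ``ValueError``, a reshape of a contiguous array returns a view, and a reshape of a transposed (non-contiguous) array has to copy the data:

```python
import numpy as np

a = np.arange(6)

# Sizes must match: 6 elements cannot fill a 4 x 2 grid
try:
    a.reshape((4, 2))
    size_error = False
except ValueError:
    size_error = True
print("size mismatch raises ValueError:", size_error)

# Contiguous data: the reshaped array is a view of the original
b = a.reshape((2, 3))
print(np.shares_memory(a, b))      # True

# The transpose is a non-contiguous view, so flattening it needs a copy
t = b.T
flat = t.reshape(6)
print(np.shares_memory(t, flat))   # False
print(flat)                        # [0 3 1 4 2 5]
```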

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
This can be done with the ``reshape`` method, or more easily by using ``np.newaxis`` within a slice operation:

13. newaxis: 
```python
x[np.newaxis, :]
```
This adds a new leading dimension of length 1 to ``x``. For example, if ``x`` has shape ``(3,)``, the result has shape ``(1, 3)``; if ``x`` has shape ``(3, 4)``, the result has shape ``(1, 3, 4)``.

In [140]:
x = np.array([1, 2, 3])
x.shape

(3,)

In [141]:
# row vector via reshape
m = x.reshape((1, 3))

In [142]:
m.shape

(1, 3)

In [147]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [144]:
x2.shape

(3, 4)

In [145]:
m2 = x2[np.newaxis, :]

In [146]:
m2.shape

(1, 3, 4)

In [148]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [150]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

We will see this type of transformation often throughout the remainder of the book.
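
As a brief sketch of how ``np.newaxis`` behaves, the example below uses illustrative variable names. ``np.newaxis`` is an alias for ``None``, and it can be placed at any position in the index to insert an axis of length 1 there:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)      # True: newaxis is an alias for None

row = x[np.newaxis, :]         # new axis first
col = x[:, np.newaxis]         # new axis last
print(row.shape, col.shape)    # (1, 3) (3, 1)

# The new axis can also go between existing axes
g = np.zeros((3, 4))
print(g[:, np.newaxis, :].shape)   # (3, 1, 4)
```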

## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

14. ``np.concatenate`` joins arrays that have the same number of dimensions

In [151]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:

In [152]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


It can also be used for two-dimensional arrays:

In [153]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [154]:
# concatenate along the first axis
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [155]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
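
When concatenating along one axis, the arrays must agree in size along every other axis. The sketch below uses made-up arrays ``a`` and ``b`` to show both a valid case and the ``ValueError`` raised on a mismatch:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.ones((4, 3))

# Along axis 0 only the column counts must agree (3 and 3)
rows = np.concatenate([a, b], axis=0)
print(rows.shape)   # (6, 3)

# Along axis 1 the row counts must agree, but 2 != 4
try:
    np.concatenate([a, b], axis=1)
    mismatch_error = False
except ValueError:
    mismatch_error = True
print("mismatched shapes raise ValueError:", mismatch_error)
```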

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

15. ``np.vstack`` and ``np.hstack`` stack arrays vertically or horizontally; watch out for the shapes

In [163]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
print(x.shape)
print(grid.shape)
# vertically stack the arrays
np.vstack([x, grid])

(3,)
(2, 3)


array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [164]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
print(y.shape)
print(grid.shape)
np.hstack([grid, y])

(2, 1)
(2, 3)


array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

Similarly, ``np.dstack`` will stack arrays along the third axis.
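
A minimal sketch of stacking along the third axis, using two made-up two-dimensional arrays: each input becomes one "layer" along the new last axis.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = a * 10

stacked = np.dstack([a, b])
print(stacked.shape)    # (2, 3, 2)
print(stacked[0, 0])    # [ 1 10]: one value from each input

# Slicing the third axis recovers the original arrays
print(np.array_equal(stacked[:, :, 0], a))   # True
```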

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

16. ``np.split`` accepts either an integer (the number of equal sections) or a list of indices at which to split

In [166]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [167]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [168]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [169]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, ``np.dsplit`` will split arrays along the third axis.
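
The examples above all pass a list of split points. As a sketch of the other form, an integer asks ``np.split`` for that many equal sections and raises a ``ValueError`` if the array cannot be divided evenly. The arrays below are made up for illustration, and the last lines show ``np.dsplit`` on a small three-dimensional array:

```python
import numpy as np

# An integer requests that many equal sections
a, b, c = np.split(np.arange(9), 3)
print(a, b, c)   # [0 1 2] [3 4 5] [6 7 8]

# Ten elements cannot be divided into three equal sections
try:
    np.split(np.arange(10), 3)
    uneven_error = False
except ValueError:
    uneven_error = True
print("uneven split raises ValueError:", uneven_error)

# dsplit splits along the third axis
cube = np.arange(16).reshape((2, 2, 4))
front, back = np.dsplit(cube, [2])
print(front.shape, back.shape)   # (2, 2, 2) (2, 2, 2)
```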

