# NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even tools like Pandas are built around the NumPy array.
NumPy arrays can be thought of as **mathematical vectors** and behave accordingly, in contrast to Python lists, which are containers that store arbitrary objects.



### Comparison with Python containers

Let's compare the behavior of NumPy arrays with Python lists.

In [None]:
list1 = [1, "string", {'a':1}, [[1, 3], set()]]  # arbitrary objects

In [None]:
listnumbers = [1, 2, 3]

Adding two lists concatenates them. No mathematical operation takes place: adding two containers that hold arbitrary objects simply means "combining" them.

In [None]:
list1 + listnumbers

In [None]:
import numpy as np

In [None]:
arrnumbers = np.array(listnumbers)
arrnumbers2 = np.array([5, 3, 42])
arrnumbers2

Adding two NumPy arrays adds them element-wise, which is a mathematical operation and in line with the behavior of vectors.

In [None]:
arrnumbers + arrnumbers2

In [None]:
arrnumbers * 3  # multiply each element by 3, as expected for vectors

In [None]:
listnumbers * 2  # two times a container just means repeating it

In [None]:
arrnumbers * arrnumbers2  # element-wise multiplication, as expected for vectors

In [None]:
listnumbers * list1  # raises a TypeError: multiplying a container with another container is not defined

In [None]:
listnumbers * listnumbers  # it's also not defined if the list is filled with numbers; they are just objects
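
To get an element-wise product from plain lists, one has to loop explicitly. The following is a minimal sketch of that contrast; the names `a_list`, `b_list`, `products_list`, and `products_arr` are illustrative and do not appear in the cells above.

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [5, 3, 42]

# Multiplying two lists is not defined and raises a TypeError ...
try:
    a_list * b_list
except TypeError as err:
    print("TypeError:", err)

# ... so an element-wise product needs an explicit loop over pairs
products_list = [a * b for a, b in zip(a_list, b_list)]

# With arrays, the same operation is a single expression
products_arr = np.array(a_list) * np.array(b_list)

print(products_list)  # [5, 6, 126]
print(products_arr)
```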

## NumPy Array Attributes

First let's discuss some useful array attributes.
We'll start by defining three random arrays: one one-dimensional, one two-dimensional, and one three-dimensional.
We'll use NumPy's random number generator, which we will *seed* with a set value in order to ensure that the same random arrays are generated each time this code is run.

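The old way is to set a global seed, but relying on global state is considered bad practice and can lead to subtle bugs.
A better way is to use a dedicated generator object such as ``numpy.random.RandomState``, which generates random numbers in isolation from the global state. (Current NumPy documentation treats ``RandomState`` as legacy and recommends ``numpy.random.default_rng`` for new code; the idea is the same.)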

(Correct random number generation is **hard**!)
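
As a rough sketch of what "isolated from the global state" means, the example below uses NumPy's ``Generator`` interface via ``np.random.default_rng``, which is not what the next cell uses. The names `rng_a`, `rng_b`, `sample_a`, `sample_b`, and `ints` are illustrative. Two generators created with the same seed produce the same numbers, independently of each other and of any global seed.

```python
import numpy as np

# Two independent generator objects, seeded identically
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.normal(0, 10, size=6)
sample_b = rng_b.normal(0, 10, size=6)

# Same seed, same sequence: the draws are reproducible
print(np.array_equal(sample_a, sample_b))  # True

# Generator.integers plays the role of RandomState.randint
ints = rng_a.integers(10, size=(3, 4))
print(ints.shape)  # (3, 4)
```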

In [None]:
import numpy as np

np.random.seed(0)  # global seed for reproducibility
rndgen = np.random.RandomState(42)

x1 = rndgen.normal(0, 10, size=6)  # One-dimensional array
x2 = rndgen.randint(10, size=(3, 4))  # Two-dimensional array
# Three-dimensional array; the module-level function relies on the global seed
x3 = np.random.normal(2, 3, size=(3, 4, 5))

In [None]:
x1

Each array has attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total size of the array):

In [None]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

Another useful attribute is ``dtype``, the data type of the array:

In [None]:
print("dtype:", x3.dtype)

## Array Indexing: Accessing Single Elements

If you are familiar with Python's standard list indexing, indexing in NumPy will feel quite familiar.
In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[4]

To index from the end of the array, you can use negative indices:

In [None]:
x1[-1]

In [None]:
x1[-2]

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, 0]

In [None]:
x2[2, -1]

Values can also be modified using any of the index notations above:

In [None]:
x2[0, 0] = 12
x2

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type.
This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don't be caught unaware by this behavior!

In [None]:
x2[0, 0] = 3.14159  # this will be truncated!
x2
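
If fractional values need to be kept, one option is to convert the array to a floating-point type first. A small sketch, using the illustrative names `int_arr` and `float_arr` (not part of the cells above):

```python
import numpy as np

int_arr = np.array([1, 2, 3])
int_arr[0] = 3.14159          # stored as 3: the array keeps its integer dtype
print(int_arr, int_arr.dtype)

# astype returns a new array with the requested dtype
float_arr = int_arr.astype(float)
float_arr[0] = 3.14159        # now the fractional part is kept
print(float_arr, float_arr.dtype)
```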

## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.

### One-dimensional subarrays

In [None]:
x = np.linspace(0, 10, 10)
x

In [None]:
x[:5]  # first five elements

In [None]:
x[5:]  # elements from index 5 onward

In [None]:
x[4:7]  # middle sub-array

In [None]:
x[::2]  # every other element

In [None]:
x[1::2]  # every other element, starting at index 1

A potentially confusing case is when the ``step`` value is negative.
In this case, the defaults for ``start`` and ``stop`` are swapped.
This becomes a convenient way to reverse an array:

In [None]:
x[::-1]  # all elements, reversed

In [None]:
x[5::-2]  # every other element in reverse order, starting from index 5

### Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas.
For example:

In [None]:
x2

In [None]:
x2[:2, :3]  # two rows, three columns

In [None]:
x2[:3, ::2]  # all rows, every other column

Finally, subarray dimensions can even be reversed together:

In [None]:
x2[::-1, ::-1]

#### Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

In [None]:
print(x2[:, 0])  # first column of x2

In [None]:
print(x2[0, :])  # first row of x2

In the case of row access, the empty slice can be omitted for a more compact syntax:

In [None]:
print(x2[0])  # equivalent to x2[0, :]

### Subarrays as no-copy views

One important (and extremely useful) thing to know about array slices is that they return *views* rather than *copies* of the array data.
This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies.
Consider our two-dimensional array from before:

In [None]:
print(x2)

Let's extract a $2 \times 2$ subarray from this:

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

Now if we modify this subarray, we'll see that the original array is changed! Observe:

In [None]:
x2_sub[0, 0] = 99
print(x2_sub)

In [None]:
print(x2)

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
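
The difference from list slicing can be checked directly. The sketch below uses illustrative names (`py_list`, `list_slice`, `arr`, `arr_slice`) and ``np.shares_memory`` to confirm that an array slice refers to the same data buffer as the original:

```python
import numpy as np

# List slicing copies: changing the slice leaves the original alone
py_list = [0, 1, 2, 3]
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)   # [0, 1, 2, 3]

# Array slicing returns a view: changing the slice changes the original
arr = np.arange(4)
arr_slice = arr[:2]
arr_slice[0] = 99
print(arr)       # first element is now 99

# Both objects refer to the same underlying buffer
print(np.shares_memory(arr, arr_slice))  # True
print(arr_slice.base is arr)             # True
```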

### Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the ``copy()`` method:

In [None]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

If we now modify this subarray, the original array is not touched:

In [None]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

In [None]:
print(x2)

## Array Concatenation

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one and, conversely, to split a single array into multiple arrays. Here we'll take a look at combining arrays.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

You can also concatenate more than two arrays at once:

In [None]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
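
For two-dimensional arrays, the ``axis`` argument of ``np.concatenate`` selects the direction of joining, and ``np.vstack`` and ``np.hstack`` stack vertically and horizontally. A brief sketch, with the array `grid` and the result names chosen purely for illustration:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Concatenate along the first axis (the default): rows are appended
rows = np.concatenate([grid, grid])
print(rows.shape)  # (4, 3)

# Concatenate along the second axis: columns are appended
cols = np.concatenate([grid, grid], axis=1)
print(cols.shape)  # (2, 6)

# vstack accepts a 1-D array and treats it as a single row
stacked_v = np.vstack([np.array([7, 8, 9]), grid])
print(stacked_v.shape)  # (3, 3)

# hstack joins arrays side by side; here a column is appended
stacked_h = np.hstack([grid, np.array([[99], [99]])])
print(stacked_h.shape)  # (2, 4)
```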

## Operations on NumPy Arrays

NumPy provides a number of functions to perform operations on NumPy arrays that are much more efficient than looping over the arrays.
They fall into three categories:
- *unary* operations, which operate on a single array
- *binary* operations, which operate on two arrays
- *aggregates*, which summarize the values in an array

In general, they act like mathematical vectors.
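
To make the comparison with looping concrete, here is a small sketch that computes the same result twice: once with an explicit Python loop and once as a single array expression. The names `values`, `looped`, and `vectorized` are illustrative. On large arrays the array expression is typically much faster, because the loop runs in compiled code rather than in the Python interpreter.

```python
import numpy as np

values = np.array([0.5, 1.0, 2.0, 4.0])

# Explicit loop: one Python-level operation per element
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] * 3 + 1

# Array expression: the same computation applied to all elements at once
vectorized = values * 3 + 1

print(np.array_equal(looped, vectorized))  # True
```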

In [None]:
shape = (3, 4)
x = rndgen.normal(0, 10, size=shape)
y = rndgen.normal(5, 1, size=shape)

In [None]:
x * 3  # multiply each element by 3

In [None]:
[4, 5] * 3  # for comparison: a list is repeated three times

In [None]:
x + y  # add each element of x to the corresponding element of y

In [None]:
np.maximum(x, y)  # element-wise maximum

In [None]:
np.cos(x)  # cosine of each element

### Aggregation

Aggregates are functions that summarize the values in an array; they can be computed over the whole array or only along one or more axes.

In [None]:
np.sum(x)  # sum of all elements

In [None]:
np.sum(x3, axis=(0, 2))  # sum over axes 0 and 2, leaving one value per index along axis 1
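
The effect of the ``axis`` argument is easiest to see on a small array with known values. In this sketch (the names `m`, `cube`, and the result variables are illustrative), the axes listed in ``axis`` are the ones that disappear from the result:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()             # all elements: 21
per_column = m.sum(axis=0)  # collapse the rows: [5, 7, 9]
per_row = m.sum(axis=1)     # collapse the columns: [6, 15]
print(total, per_column, per_row)

# With a tuple of axes, several dimensions are collapsed at once
cube = np.ones((3, 4, 5))
reduced = cube.sum(axis=(0, 2))
print(reduced.shape)  # (4,)
```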

### Boolean operations

NumPy also implements comparison operators such as ``<`` (less than) and ``>`` (greater than) or ``==`` (equal) as element-wise functions.
What do you expect the following to return?

In [None]:
x < 5

In [None]:
x + 5

In [None]:
x == y

### Boolean indexing

Boolean indexing is a powerful feature that allows you to select elements of an array that satisfy some condition.
When a boolean mask is used as the index, exactly those elements are selected for which the mask is ``True``.

In [None]:
select_lt1 = x < 1
select_lt1

In [None]:
x

In [None]:
x[select_lt1]

This operation is particularly useful for filtering data.
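
Masks can also be combined and used for assignment. A short sketch with illustrative names (`data`, `in_range`, `clipped`); note that the element-wise operators ``&`` and ``|`` are used instead of ``and`` and ``or``, and that each comparison needs its own parentheses:

```python
import numpy as np

data = np.array([-3.0, 0.5, 2.0, 7.5, 12.0])

# Combine two conditions element-wise
in_range = (data > 0) & (data < 10)
print(data[in_range])  # [0.5 2.  7.5]

# True counts as 1, so summing a mask counts the selected elements
print(in_range.sum())  # 3

# A mask on the left-hand side assigns only to the selected elements
clipped = data.copy()
clipped[clipped < 0] = 0
print(clipped)
```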

NumPy arrays are great for mathematical operations. A drawback, however, is that rows and columns are identified only by integer positions, so we have to remember what each one represents.
In the next section, we will see how we can use Pandas to work with named columns, rows, and indices.